-represent quantity of an attribute numerically
-also used to measure psychological characteristics such as IQ test scores, personality traits, interests
-a set of numbers whose properties model empirical properties of the objects to which the numbers are assigned
-the process of setting rules for assigning numbers in measurement
-to represent varying amounts of some trait, attribute, or characteristic
-no best way to assign numbers for all types of traits, attributes, or characteristics but there may be an optimal method for the construct you want to measure
-define when objects fall into the same or different categories with regard to an attribute
-examples: types of objects, college majors, sex, personality types
-procedures for organizing, summarizing, and describing quantitative information (e.g. test scores)
-pictorial (e.g. histogram, bar graph)
-measures of central tendency
-measures of variability (or dispersion)
-methods for making inferences about a population of objects based on information from a sample from that population
-contrast with descriptive statistics
-examples: correlation and regression; chi-square test of association; t-test and ANOVA
-other terms for variability: spread and dispersion
-each term refers to differences among scores within a sample or population
-three common types:
-range
-deviation scores
-variance and standard deviation
-a symmetrical, mathematically defined frequency distribution curve
-highest at the center (most frequent scores are at the mean) and tapering on both sides
-asymptotic to the horizontal axis (the tails approach it but never touch it)
-mean, median and mode are equal
-area under the curve is divided in terms of the standard deviation units and can aid in the interpretation of test scores
-distributions can be characterized by the extent to which they are asymmetrical or "skewed"
-describes the steepness of a distribution in its center
-a measure of how data are peaked or flat
-represent one transformation of z which overcomes the disadvantage of working with negative scores
-T-score = (z-score x 10) + 50
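The transformation above can be sketched in a few lines of Python. The function name `z_to_t` and the sample z values are illustrative assumptions, not part of the notes.

```python
def z_to_t(z: float) -> float:
    """Convert a z-score to a T-score (mean 50, SD 10)."""
    return z * 10 + 50

# Hypothetical z-scores: negative z values become positive T-scores
for z in (-1.5, 0.0, 2.0):
    print(f"z = {z:+.1f} -> T = {z_to_t(z):.1f}")
```

A z of -1.5 maps to a T of 35, which shows how the transformation removes negative values.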
-an expression of the percentage of people whose score on a test or measure falls below a particular score
-a disadvantage: real diff's b/w raw scores may be minimized near the ends of the distribution and exaggerated in the middle of the distribution
-NRT
-interpretation is based on an individual's relative standing in some known group
-percentiles
-CRT
-interpretation is based on measuring an individual's skill level in relation to a clearly specified standard
-can be used when more than one predictor variable is available
-multiple regression takes into account the correlation b/w each of the predictor scores and what is being predicted
-also taken into account are the correlations among the predictors
Y = a + b_{1}X_{1} + b_{2}X_{2}
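A minimal sketch of the two-predictor equation above, assuming ordinary least squares. It uses the standard textbook formulas that build each weight from the predictor-criterion correlations and the correlation between the predictors. The helper names and the toy data are hypothetical.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def two_predictor_regression(x1, x2, y):
    """Return (a, b1, b2) for Y = a + b1*X1 + b2*X2."""
    r_y1, r_y2, r_12 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)
    # Standardized weights: each predictor's correlation with Y,
    # adjusted for the correlation between the two predictors
    beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
    beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
    # Convert to raw-score weights and solve for the intercept
    b1 = beta1 * stdev(y) / stdev(x1)
    b2 = beta2 * stdev(y) / stdev(x2)
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2

# Toy data constructed so that Y = 1 + 2*X1 + 3*X2 exactly
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * p + 3 * q for p, q in zip(x1, x2)]
a, b1, b2 = two_predictor_regression(x1, x2, y)
print(round(a, 3), round(b1, 3), round(b2, 3))
```

With more than two predictors the same idea is usually handled with matrix methods in a statistics package.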
Coefficient of determination
-accurate interpretation of correlation coefficients requires another statistic, the coefficient of determination
-calculated by squaring the correlation coefficient (r^{2})
-tells us how much variance in one variable is accounted for by the variance in the other
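As a quick illustration of the squaring step, here is a small Python sketch. The function name and the example correlations are assumptions chosen for illustration.

```python
def coefficient_of_determination(r: float) -> float:
    """Proportion of variance in one variable accounted for by the other."""
    return r ** 2

# Hypothetical correlations: shared variance shrinks quickly as r drops
for r in (0.7, 0.4):
    print(f"r = {r:.1f} -> r^2 = {coefficient_of_determination(r):.0%} of variance")
```

A correlation of .7 accounts for about 49% of the variance, while a correlation of .4 accounts for only about 16%.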
-predicting values based on knowledge of scores on other variables is a practical use of correlation
-simple linear regression: 1 predictor (x), 1 criterion (y; continuous)
-multiple regression: more than 1 predictor, 1 criterion (continuous)
-logistic regression is used when the variable being predicted is dichotomous (ex. gender)
-every increase of one unit in X will result in an increase of b units in Y
-Y = predicted score on Y
-a = y-intercept
-b = slope or regression coefficient
-X = score on the predictor
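A minimal sketch of simple linear regression with one predictor, assuming the usual least-squares estimates for the slope and intercept. The function names and the toy scores are hypothetical.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    mx, my = mean(x), mean(y)
    # Slope: co-deviation of X and Y divided by the squared deviation of X
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx  # intercept: the line passes through the point of means
    return a, b

def predict(a, b, x_new):
    """Predicted score on Y for a given score on the predictor X."""
    return a + b * x_new

# Toy data that fall exactly on the line Y = 1 + 2X
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
a, b = fit_simple_regression(x, y)
print(a, b, predict(a, b, 6))
```

Here each one-unit increase in X raises the predicted Y by b = 2 units.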
-numerical values obtained by statistical methods that describe reliability
-affected by the number of items
-reliability generally increases with the number of items
-will generally have a range from 0 to 1
-again, % total variance attributable to "true variance" (true variance/ total)
-R is an index of the theoretical reliability of a test
-R = ratio b/w variance of true score to variance of observed score
-R = \sigma^{2}_{T} / \sigma^{2}_{X}
-where \sigma^{2}_{X} = \sigma^{2}_{T} + \sigma^{2}_{E} (true variance plus error variance)
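The ratio above can be sketched as follows. In practice true and error variance are not observed directly, so the numbers here are purely hypothetical and the function name is an assumption.

```python
def theoretical_reliability(true_variance: float, error_variance: float) -> float:
    """Ratio of true-score variance to observed-score variance."""
    observed_variance = true_variance + error_variance
    return true_variance / observed_variance

# Hypothetical variances: 80 units of true variance, 20 of error variance
print(theoretical_reliability(80, 20))
```

With no error variance the ratio is 1; as error variance grows, the ratio moves toward 0.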
-parallel forms: two different versions of test that measure the same construct (each form has the same mean and variance)
-alternate forms: two different versions of a test that measure the same construct (tests do not meet the equal means and variances criterion)
-coefficient of equivalence is calculated by correlating the two forms of the test
-items range from weaker to stronger expressions of the variable being measured
-arranged so that agreement with stronger statements implies agreement with milder statements as well
-produces ordinal data
-the item reliability index is the product of the item-score standard deviation and the correlation b/w the item score and the total test score
-provides an indication of the test's internal consistency; the higher the index, the higher the consistency
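A small sketch of the item reliability index as described above. It assumes the population standard deviation is used for the item scores (the notes do not say which), and the helper names and toy data are hypothetical.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def item_reliability_index(item_scores, total_scores):
    """Item-score SD multiplied by the item-total correlation."""
    return pstdev(item_scores) * pearson_r(item_scores, total_scores)

# Toy data: 1 = item correct, 0 = incorrect; high scorers pass the item
item = [1, 1, 1, 1, 0, 0, 0, 0]
total = [9, 8, 7, 6, 5, 4, 3, 2]
print(round(item_reliability_index(item, total), 3))
```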
-sources of error affect which reliability estimate is important
-each coefficient is affected by different sources of error
-goal: use the reliability measure that best addresses the sources of error associated with a test
-anything that occurs during the administration of the test that could affect performance
-environmental factors: temp., lighting, noise, how comfortable the chair is, etc.
-test-taker factors: mood, alertness, errors in entering answers, etc.
-examiner factors: physical appearance, demeanor, nonverbal cues, etc.
-subjectivity of scoring is a source of error variance
-more likely to be a problem with: non-objective personality tests; essay tests; behavioral observations
-the same test is administered twice to the same group with a time interval b/w administrations
-coefficient of stability is calculated by correlating the two sets of test results
-there are multiple sources of error that impact the coefficient of stability:
-stability of the construct
-time/maturation
-practice effects
-fatigue effects
-there are multiple sources of error that can impact the coefficient of equivalence:
-motivation and fatigue
-events that happen b/w the two administrations
-item selection will also produce error
-represents the degree of agreement (consistency) b/w multiple scorers (or judges, raters, observers, etc.)
-calculated with pearson r (or Spearman rho depending on the scale)
-training procedures and standardized scoring criteria increase consistency
-a measure of consistency within the test: how well do all of the items "hang together" or correlate with each other?
-homogeneity: the degree to which all items measure the same construct
1. split-half (with Spearman-brown)
2. Kuder-Richardson (KR-20 & KR-21)
3. Cronbach's alpha
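For the split-half method, the Spearman-Brown correction can be sketched as below. The general form estimates reliability for a test lengthened by a factor `n`; `n = 2` corrects a half-test correlation to full length. The function name and example values are assumptions.

```python
def spearman_brown(r: float, n: float = 2) -> float:
    """Estimated reliability of a test n times as long as the one yielding r."""
    return (n * r) / (1 + (n - 1) * r)

r_half = 0.60  # hypothetical correlation between two half-tests
print(round(spearman_brown(r_half), 3))       # corrected to full length
print(round(spearman_brown(r_half, n=3), 3))  # if the half-test were tripled
```

The same formula illustrates why reliability generally increases with the number of items.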
-SEM is an estimate of measurement precision
-high reliability = small spread of observed scores around the true score = small SEM
-in practice, the SEM is most frequently used in the interpretation of an individual's test scores
-another statistic, the standard error of the difference (sigma_{diff}) is better when making comparisons b/w scores
-scores b/w people, tests, or two scores from the same person over time
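A minimal sketch of both statistics, assuming the common textbook formulas SEM = SD x sqrt(1 - r_{xx}) and sigma_{diff} = sqrt(SEM_1^2 + SEM_2^2). The function names, the SD of 15, and the reliability of .91 are hypothetical values.

```python
from math import sqrt

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement from the test SD and its reliability."""
    return sd * sqrt(1 - reliability)

def se_difference(sem_1: float, sem_2: float) -> float:
    """Standard error of the difference between two scores."""
    return sqrt(sem_1 ** 2 + sem_2 ** 2)

sem_a = sem(sd=15, reliability=0.91)
print(round(sem_a, 2))                        # precision of a single score
print(round(se_difference(sem_a, sem_a), 2))  # precision of a score difference

# Approximate 95% band around an observed score of 110
observed = 110
print(round(observed - 1.96 * sem_a, 1), round(observed + 1.96 * sem_a, 1))
```

The standard error of the difference is larger than either SEM because a difference score carries error from both measurements.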
-does the test look like it measures what it is supposed to measure?
-has to do more with the judgments of the test TAKER, not the user
-not a statistical issue
-criterion: the standard against which a test or a test score is evaluated
-a good criterion is generally:
-relevant
-uncontaminated
-something that can be measured reliably
-validity coefficient: typically a correlation coefficient between scores on the test and some criterion measure (r_{xy})
-Pearson's r is the usual measure, but may need to use other types of correlation coefficients depending on the data scale
-can also use "expectancy tables" for categorical criterion
-construct: unobservable underlying trait hypothesized to describe or explain behavior
-construct validity is the process of determining the appropriateness of inferences about the construct drawn from test scores
-formulate and test hypotheses derived from theories about the nature of the construct
-a miss wherein the test predicted that the test taker did possess the characteristic when that person did not
-guessing is only an issue for tests where a "correct answer" exists
-not an issue when measuring attitudes
-faking can be an issue with attitudes
-faking good: positive self-presentation
-faking bad: malingering or trying to create a less favorable impression
-random responding
-known as item-endorsement index in other contexts
-the proportion of the total number of test takers who got the item right
-proportion of total test takers who get item right
-ideal average (p_{i}) is halfway b/w the chance success rate (from guessing) and 1.0
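The item-difficulty index and the "halfway" rule can be sketched as follows. The function names and the response data are hypothetical, and the chance rate is assumed to be 1 divided by the number of response options.

```python
def item_difficulty(responses):
    """Proportion of test takers who got the item right (1 = right, 0 = wrong)."""
    return sum(responses) / len(responses)

def optimal_difficulty(n_options: int) -> float:
    """Midpoint between the chance success rate and 1.0."""
    chance = 1 / n_options
    return (chance + 1.0) / 2

responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical scored responses
print(item_difficulty(responses))
print(optimal_difficulty(4))  # four-option multiple choice
print(optimal_difficulty(2))  # true/false
```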
-a simple correlation b/w the score on an item and the total score
-advantages: can test statistical significance of the correlation; can interpret % of variability item accounts for (r_{it}^{2})
-symbolized by d: compares proportion of high scorers getting item "correct" and proportion of low scorers getting item "correct"
-higher positive values indicate item passed by more examinees in the upper group, while negative values indicate more from lower group passed the item
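A short sketch of the discrimination index d as the difference between the two proportions. How the upper and lower groups are formed (for example, top and bottom 27% of total scores) is left out; the function name and the group data are hypothetical.

```python
def discrimination_index(upper_group, lower_group):
    """d = proportion correct in the upper group minus proportion in the lower group."""
    p_upper = sum(upper_group) / len(upper_group)
    p_lower = sum(lower_group) / len(lower_group)
    return p_upper - p_lower

# Hypothetical item responses (1 = correct, 0 = incorrect)
upper = [1, 1, 1, 0]  # high scorers on the total test
lower = [1, 0, 0, 0]  # low scorers on the total test
print(discrimination_index(upper, lower))
```

Swapping the two groups gives a negative d, the pattern that flags an item passed more often by low scorers.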
· “The most adequate conceptualization of a person’s behavior in all its detail” (McClelland, 1951)
· “Consistent behavior patterns and intrapersonal processes originating within the individual” (Burger, 1997)
· “an individual’s unique constellation of psychological traits and states” (C&S)
-the relatively distinctive and stable patterns of behavior that characterize an individual and their reactions to the environment
-3 common components: focus on individual diff's; the individual diff's are relatively stable; usually refer to intrapersonal processes of emotions, motivations and cognitive processes
· any relatively enduring characteristic of an individual that distinguishes that person from another
o Examples: extraversion, introversion, openness, conscientiousness
· a temporary, or transient presentation of some personality trait or disposition
o Examples: anxious, calm, fearful, embarrassed, happy, sad etc.
· unique sets of traits and states that are similar in pattern to an identified category of personality within a taxonomy of personalities
· Not all typologies are based on psychological theories with an empirical basis
· temperament of the blood, season of spring and the element of air
· associated with functioning of the liver (blood) makes a person optimistic and cheerful
· associated with the spleen, easily angered, bad tempered and controlling
· associated with the gall bladder; perfectionistic, depressive
· phlegm, winter, water
· associated with the lungs; calm and unemotional
Six (modern) approaches to Personality
· Unconscious minds are largely responsible for important differences in behavior styles
· people can be described along a continuum of various personality characteristics
· Inherited predispositions and physiological processes explain individual differences
· keys to individual differences are degree of personal responsibility and self acceptance
· consistent differences are the result of conditioning and expectations
· differences in the way people process information explain differences in behavior
How are Personality assessments used?
· Employment matching
· Adjustment issues for decisions about military service
· Academic opportunities
· Employment mobility
· Diagnoses, or degree of impact from some trauma
· Inform treatments
· Research and validation of theory
Assessment Methods
· Interviews
· Self-report responses to written questions
· Card sorts (q-sort)
· Responses to ambiguous stimuli
· Interviews or responses of friends, family, spouse, teacher, coworkers, etc.
· Case histories
· Ratings by judges or experts
· Paper and Pencil or computer aided
o Choose a response from options that represent various characteristics of personality
· Procedures for scoring require little judgment
· Allows for implementation of a variety of validity indices
o can be answered and scored quickly (scored reliably)
o Breadth of content
o Can be administered to groups or individuals
-can be answered quickly
-can be administered by computer
-can be scored quickly and reliably
-can be administered in groups or individually
-procedures for scoring require little interpretation
-allows for implementation of a variety of validity indices
o Assuming honesty and capacity/insight to answer questions accurately.
-categorical labels or integers, no meaningful middle grounds between categories
-e.g. 1-single, 2-married
-scales, or levels, of measurement help determine what statistical analyses are appropriate
-enable test users to make accurate score interpretations
-four levels:
-nominal
-ordinal
-interval
-ratio
**NOIR**
-nominal (AKA naming) level
-lowest level of measurement
-ordering is not important, only the label attached to designate a mutually exclusive and exhaustive category
-examples:
-medical diagnoses
-gender
-political party affiliation
-values imply nothing about magnitude of differences between one level to the next
-the numbers do not indicate units of measurement
-no absolute zero point; the ways in which data from ordinal scales can be analyzed statistically are limited
-because of equal intervals between values some mathematical operations are meaningful:
-addition and subtraction, statistical tests based on mean scores and/or variance
-the difference between the highest and lowest scores
-sensitive to outliers
-the average of the sum of the squared deviations of each score from the mean
s^{2} = \frac{1}{n-1} \sum (X_{i} - \bar{X})^{2}
-the average deviation of each score from the mean
-the standard deviation is the square root of the variance
-expressed in the same units of measurement as the original scores
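The range, variance, and standard deviation described above can be computed with Python's standard library. `statistics.variance` and `statistics.stdev` use the n - 1 denominator, matching the sample formula in these notes. The scores are made up for illustration.

```python
from statistics import mean, stdev, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical test scores

score_range = max(scores) - min(scores)  # highest minus lowest; sensitive to outliers
s2 = variance(scores)                    # sum of squared deviations / (n - 1)
s = stdev(scores)                        # square root of the variance, in score units

# The same variance written out from the definition
m = mean(scores)
s2_manual = sum((x - m) ** 2 for x in scores) / (len(scores) - 1)

print(score_range, round(s2, 3), round(s, 3), round(s2_manual, 3))
```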
-only a few extremely high scores and many low scores
-tail goes to the right
Negative skew
-only a few extremely low scores and many high ones
-tail goes to the left
-z-scores can be used to calculate percentiles when raw scores have a normal distribution
-when used in conjunction w/ a Z-table, the z-score reveals the area of the normal distribution below the score in question
-the z-table indicates the proportion of the total number of scores that fall into a certain range of z-scores
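In place of a printed z-table, the area below a z-score can be looked up with `statistics.NormalDist` from the standard library. This sketch assumes the raw scores are normally distributed; the function names and the mean of 100 and SD of 15 are hypothetical.

```python
from statistics import NormalDist

def z_score(raw: float, mean: float, sd: float) -> float:
    """Distance of a raw score from the mean in standard deviation units."""
    return (raw - mean) / sd

def percentile_from_z(z: float) -> float:
    """Percentage of a normal distribution falling below z."""
    return NormalDist().cdf(z) * 100

z = z_score(raw=115, mean=100, sd=15)
print(z, round(percentile_from_z(z), 1))
```

A score one standard deviation above the mean falls at roughly the 84th percentile.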
-strong negative: (r=-.7)
-moderate/weak negative: (r = -.4)
-a way of interpreting test scores by comparing an individual's results to the scores of a group of test takers
-interpretation is relative
-percentiles
-"top 5%"
-the group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test-takers
-a normative sample must be representative or typical of the intended population of interest
-diff's need to be proportionately represented in the sample
-e.g. gender, race/ethnicity
-sampling individuals from subgroups in the population in the same proportion as the population they are part of
-provides greater precision than a simple random sample of the same size
-can guard against the "unrepresentative" sample
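A minimal sketch of proportional allocation for a stratified sample: each subgroup gets a share of the sample equal to its share of the population. The function name, the strata labels, and the counts are hypothetical, and rounding can make the allocated total differ slightly from the requested size when the shares are not whole numbers.

```python
def proportional_allocation(strata_sizes: dict, sample_size: int) -> dict:
    """Number of people to sample from each stratum, proportional to its size."""
    total = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in strata_sizes.items()
    }

# Hypothetical population broken into three subgroups
population = {"group_a": 600, "group_b": 300, "group_c": 100}
print(proportional_allocation(population, sample_size=50))
```

Within each stratum, individuals would then be drawn at random up to the allocated count.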
-each individual from the population has an equal chance of being included in the sample
-true random sampling is very rare in practice (time & $, ethics, self selection)
-contrast w/ random assignment (random assignment of participants in the selected sample to different experimental conditions)
-a sample that is convenient or available for use
-ISU psychology participant pool
-average performance of different samples of test-takers at various ages/grades
-scores often used as evaluative standards for one's performance on a test (e.g. below average, average)
-concept of "mental age"
-can be interpreted as the mean of all possible split-half correlations, corrected by the Spearman-Brown formula
-Ranges from 0 to 1 with values closer to 1 indicating greater reliability
-"generally acceptable" values are .70-.90
-coefficient alpha above .90 may be "too high," indicating redundancy among items
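Coefficient alpha can be sketched with the usual variance-based formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). The function name and the tiny data set are hypothetical; real item analyses use far more test takers.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: one row per test taker, one column per item."""
    k = len(scores[0])                  # number of items
    items = list(zip(*scores))          # transpose rows into item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical ratings from four people on three items
data = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
print(round(cronbach_alpha(data), 3))
```

These toy items rise and fall together almost perfectly, so alpha lands above .90, the range the notes flag as possibly redundant.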
-twins raised apart support genetic component in intelligence based on degree of similarity
-though not as similar as if raised together
-children adopted from mothers with higher IQs tend to have higher IQs, irrespective of adoption family's SES
-though those in higher SES have higher IQs
-central tendency measures are used to describe the typical response seen in a sample of observations
-variability measures are used to describe how much fluctuation in scores there is in a sample of observations
-we need both to interpret a person's score
-both variance and standard deviation reflect the variability of scores about the mean of the group
-typical distance of a score from the mean
-strong relation: r=.7 or higher
-moderate relation: r=.4 or lower
-standard error of the estimate (SE)
-indicates magnitude of errors in estimation
-higher correlations produce smaller SE
-lower correlations produce larger SE
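The link between the correlation and the size of prediction errors can be sketched with the common large-sample formula SE = SD_{y} x sqrt(1 - r^{2}). The function name and values are assumptions, and the small-sample correction is omitted.

```python
from math import sqrt

def standard_error_of_estimate(sd_y: float, r: float) -> float:
    """Typical size of prediction errors when predicting Y from X."""
    return sd_y * sqrt(1 - r ** 2)

sd_y = 10  # hypothetical SD of the criterion scores
for r in (0.3, 0.6, 0.9):
    print(f"r = {r:.1f} -> SE = {standard_error_of_estimate(sd_y, r):.2f}")
```

As r approaches 1 the SE shrinks toward 0; with r = 0 the SE equals the criterion's own standard deviation.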
-most widely used psychological test in the world
-developed by Hathaway and McKinley in the late 1930s and early 1940s
-University of Minnesota hospital and persons w/in the community
-originally designed to assist w/ the diagnosis of different psychiatric disorders
-at one time was popular for use in employment screening
MMPI-2
-items revised, removed, replaced
-norm: 1138 males and 1462 females b/w 18 & 80 from several regions and diverse communities w/in the US
-increased attention to "non-pathological" interpretation
-checklist: list of behaviors, thoughts, events, etc.: each is marked for presence and/or intensity
-can be filled out by individual or an evaluator
-rating scale: evaluator provides a score to indicate relative standing on a list of characteristics
-clinicians sometimes want info from clients that is best obtained through psychological tests
-many purposes
-general diagnostic decisions
-ID positive and negative personality traits/states/types
-working with "stuckness"
-info from tests is more "scientifically reliable" than the info from a clinical interview
-the semantic distinction is blurred
-assessment requires greater education, training and skills than simply administering a test
-educational
-clinical
-counseling
-geriatric
-business
-military
-other settings
-intelligence: e.g. Wechsler Adult Intelligence Scale (WAIS), Stanford-Binet Intelligence Scales (SBIS)
-personality: e.g. Minnesota Multiphasic Personality Inventory-2 (MMPI-2)
-interview
-portfolio
-case history data
-behavioral observation
-role-play tests
-computers as tools
-by what (and how) they measure
-content
-format
-administration procedures (e.g. computer-assisted vs. paper/pencil)
-scoring and interpretation procedures
-psychometric quality
-reliability: does the test produce consistent measurement results?
-validity: does the test measure effectively what it purports to measure?
-adequate norms: was the test developed using samples similar to the people taking the test?
-test catalogues
-test manuals
-reference volumes
-journal articles
-online databases: e.g. PsycINFO and PsycARTICLES
-scaling involves quantifying and assigning a value
-classification involves categorization
-ordinal
-but, most of the time we treat psychological measures as being on interval scales
-ease of statistical manipulation
-works well in practice
-most frequently observed score
-only measure of central tendency that can be used with Nominal data
-the average of a set of scores
-found by summing all values and then dividing that sum by the total number of observed values
-requires interval or ratio data
-sensitive to every score in the sample, and may be inappropriate with skewed data/outliers