# PSYCH 440 Study Guide (2013-14 Armstrong)


**Created:**2014-04-08

**Last Modified:**2014-06-30

#### Related Textbooks:

Psychological Testing and Assessment

-examples: weight, height, age, IQ test scores, etc.

-represent quantity of an attribute numerically

-also used to measure psychological characteristics such as IQ test scores, personality traits, interests

-a set of numbers whose properties model empirical properties of the objects to which the numbers are assigned

-the process of setting rules for assigning numbers in measurement

-to represent varying amounts of some trait, attribute, or characteristic

-no best way to assign numbers for all types of traits, attributes, or characteristics but there may be an optimal method for the construct you want to measure

- College majors, sex, personality types

-define when objects fall into the same or different categories with regard to an attribute

-examples: types of objects, college majors, sex, personality types

- Objectivity
- Quantification (put in terms of numbers)
- Communication
- Economy (more efficient)
- Scientific generalizability

- Reliability- does the test produce consistent measurement results?
- Validity- does the test measure effectively what it purports to measure?
- Adequate norms- was the test developed using samples similar to the people taking the test?

- Test developers- Psychologists required to adhere to ethical standards (APA, AERA)
- Test users- counselors, other therapists, teachers, human resources, researchers
- Test takers
- Society at large

- Tests
- Interviews
- Case history data
- Behavioral observation
- Role-playing

- Psychological States and Traits can be measured.
- Various approaches to measuring aspects of the same thing can be useful.
- Various sources of error are part of the assessment process
- Test-related behavior can predict behavior in other settings
- Present-day behaviors can predict future behaviors

- Described human abilities w/ respect to reaction time, perception, and attention span

- Studied genetic influence
- Attempted to quantify differences through classification
- DEVELOPED FIRST CORRELATION COEFFICIENT
- Created anthropometric lab in London
- Major proponent of the eugenics movement
- Not very well respected

- Commissioned by France to identify "subnormal" children
- Developed first intelligence test in 1905
- Mental age proposed as evaluation criterion
- Test revised by Lewis Terman, current revisions still widely used

- First American to systematically study assessment of individual differences
- A student of Wundt, but more influenced by Galton's methods
- Studied differences in reaction time
- Brought over early intelligence tests
- Coined term "mental test"
- Named daughter "Psyche"

- Clinical psychologist
- Designed test to measure adult intelligence (Wechsler Adult Intelligence Scale)

- Many early tests had no minority individuals in standardization samples
- Translation problems
- Remains an issue

-procedures for organizing, summarizing, and describing quantitative information (e.g. test scores)

-pictorial (e.g. histogram, bar graph)

-measures of central tendency

-measures of variability (or dispersion)

-methods for making inferences about a population of objects based on information from a sample from that population

-contrast with descriptive statistics

-examples: correlation and regression; chi-square test of association; t-test and ANOVA

- Get more numberlike as they go down

- Range
- Deviation scores
- Variance and standard deviation

Range

-other terms for variability: spread and dispersion

-each term refers to difference among scores within a sample or population

-three common types:

-range

-deviation scores

-variance and standard deviation

- A symmetrical, mathematically defined frequency distribution curve
- Highest at the center (most frequent scores are at the mean)
- Asymptotic towards the abscissa
- Mean, median, and mode are equal

-a symmetrical, mathematically defined frequency distribution curve

-highest at the center (most frequent scores are at the mean) and tapering on both sides

-asymptotic towards the axis

-mean, median and mode are equal

-area under the curve is divided in terms of the standard deviation units and can aid in the interpretation of test scores

Negative - few extremely low scores and many high

-distributions can be characterized by the extent to which they are asymmetrical or "skewed"

- Platykurtic- relatively flat
- leptokurtic- relatively peaked
- mesokurtic- somewhere in the middle

-describes the steepness of a distribution in its center

-a measure of how data are peaked or flat

- More easily interpretable than raw scores
- We can tell where a score falls in relation to other scores
- Allow for easier comparisons of both similar and dissimilar scores

-represent one transformation of z which overcomes the disadvantage of working with negative scores

-t-score = (z score x 10) + 50
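A minimal Python sketch of the raw-to-z-to-T conversion described above; the function names and the example mean and SD are hypothetical, chosen only for illustration.

```python
def z_score(x, mean, sd):
    """Standard (z) score: distance of x from the mean in SD units."""
    return (x - mean) / sd

def t_score(z):
    """T score: rescales z to a mean of 50 and an SD of 10."""
    return z * 10 + 50

# Hypothetical raw score of 85 on a test with mean 100 and SD 15
z = z_score(85, 100, 15)
print(z, t_score(z))  # -1.0 40.0
```

A z of -1.0 becomes a T of 40, so the same standing is expressed without a negative sign.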

-an expression of the percentage of people whose score on a test or measure falls below a particular score

-a disadvantage: real diff's b/w raw scores may be minimized near the ends of the distribution and exaggerated in the middle of the distribution

- Approx 68% of the scores are between 1 and -1 SDs
- Approx 95% of scores are between 2 and -2 SDs
- Approx 99.7% of scores are between 3 and -3 SDs
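The 68/95/99.7 figures can be checked against the normal curve itself. A minimal sketch, assuming a standard normal distribution and using Python's `math.erf` to get cumulative proportions (the helper name `normal_cdf` is hypothetical):

```python
import math

def normal_cdf(z):
    """Proportion of a standard normal distribution falling below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Share of scores between -k and +k standard deviations
for k in (1, 2, 3):
    share = normal_cdf(k) - normal_cdf(-k)
    print(f"within +/-{k} SD: {share:.1%}")
```

The printed shares should come out close to 68%, 95%, and 99.7%.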

- Indicates each person's standings as compared to the group mean
- Can be easily converted to percentiles

- Negative z values can be difficult to work with and explain
- Dealing with fractional z values can be a hassle

Indicates standing compared to group mean

Easily converted to percentiles

Disadvantages:

Negative z-values are difficult

Fractional values are a hassle

- Can range from -1.0 to +1.0
- When direction is positive, high scores on one variable are associated with high scores on the other
- Reversed interpretation when correlation is negative
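A minimal sketch of Pearson's r computed from deviation scores; the function name and the five pairs of scores are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Cross-products and sums of squares of deviation scores
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for five people on two measures
hours = [1, 2, 3, 4, 5]
quiz = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, quiz), 3))  # positive: high with high
print(pearson_r([1, 2, 3], [3, 2, 1]))   # perfect negative relation
```

The first pair is positive (high hours go with high quiz scores); reversing one variable gives a negative r, which is read the opposite way.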

-Use correlation coefficient (strength & direction)

- Psychological states and traits exist
- Psychological states and traits can be quantified and measured
- Test-related behavior predicts non-test-related behavior
- Measures have both strengths and weaknesses
- Various sources of error are part of the assessment process
- Testings and assessments can be conducted in a fair and unbiased manner
- Testing and assessment benefit society

- Reliability
- Validity

-NRT

-interpretation is based on an individual's relative standing in some known group

-percentiles

-CRT

-interpretation is based on measuring an individual's skill level in relation to a clearly specified standard

-can be used when more than one predictor variable is available

-multiple regression takes into account the correlation b/w each of the predictor scores and what is being predicted

-also taken into account are the correlations among the predictors

Y = a + b_{1}X_{1} + b_{2}X_{2}
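A sketch of how the weights in the two-predictor equation can be estimated by ordinary least squares. Solving the 2x2 normal equations this way is one standard approach, not necessarily the one used in the course; the function name and data are hypothetical.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares estimates of a, b1, b2 in Y = a + b1*X1 + b2*X2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products of deviation scores
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # Solve the normal equations; s12 is where the correlation
    # among the predictors enters each weight
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [3, 7, 7, 11, 12]  # built as 2 + 3*X1 - 1*X2
print([round(v, 6) for v in fit_two_predictors(x1, x2, y)])
```

Because `y` was constructed from known weights, the fit should recover an intercept of 2 and weights of 3 and -1.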


Coefficient of determination

-accurate interpretation of correlation coefficients requires another statistic, the coefficient of determination

-calculated by squaring the correlation coefficient (r^{2})

-tells us how much variance in one variable is accounted for by the variance in the other

-the complement (1 - r^{2}) tells us how much variance is not accounted for
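A minimal sketch of the two proportions, assuming a hypothetical correlation of .60; the function names are illustrative only.

```python
def determination(r):
    """Coefficient of determination: share of variance accounted for (r squared)."""
    return r ** 2

def unexplained(r):
    """1 - r squared: share of variance not accounted for."""
    return 1 - r ** 2

r = 0.6  # hypothetical correlation
print(f"explained: {determination(r):.0%}, unexplained: {unexplained(r):.0%}")
# explained: 36%, unexplained: 64%
```

Note how a correlation of .60, which sounds large, still leaves most of the variance unexplained.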

- Predicting values of one variable based on knowledge of scores on other variables
- Simple Linear Regression is used when one variable is used to predict values
- Multiple regression is used when multiple predictors are used

-predicting values based on knowledge of scores on other variables is a practical use of correlation

-simple linear regression: 1 predictor (x), 1 criterion (y; continuous)

-multiple regression: more than 1 predictor, 1 criterion (continuous)

-logistic regression is used when the variable being predicted is dichotomous (ex. gender)

Y' = a + bX

-every one-unit increase in X changes the predicted score by b units (an increase when b is positive)

-Y' = predicted score on Y; a = y-intercept; b = slope or regression coefficient; X = score on the predictor
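A minimal least-squares sketch for the simple (one-predictor) case; the function name and the data are hypothetical.

```python
def simple_regression(x, y):
    """Least-squares intercept (a) and slope (b) for Y' = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx  # the line passes through the point of means
    return a, b

x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical criterion scores
a, b = simple_regression(x, y)
print(round(a, 2), round(b, 2))  # 2.2 0.6
print(round(a + b * 6, 2))       # predicted Y' for X = 6: 5.8
```

Each one-unit step in X raises the prediction by b = 0.6 units.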

- Are we measuring what we intended to measure?
- The appropriateness of how we use the info
- Test bias

- True score- the true standing on some construct
- Error- the part of the score that deviates from that true standing on the construct

Error (E) >> Deviation from construct

X = T + E

-X = observed score; T = true score; E = error

- Numerical values obtained by statistical methods that describe reliability
- Have similar properties to correlation coefficients
- Generally range from 0 to 1; negative values are possible but unlikely
- Usually indicates the proportion of true variance in the test scores divided by the total variance observed in the scores

Similar to correlation coefficients

-numerical values obtained by statistical methods that describe reliability

-affected by the number of items

-reliability generally increases with the number of items

-will generally have a range from 0 to 1

-again, % total variance attributable to "true variance" (true variance/ total)

-R is an index of the theoretical reliability of a test

-R = ratio b/w variance of true score to variance of observed score

-R = σ^{2}_{T} / σ^{2}_{X}

-where σ^{2}_{X} = true variance plus error variance (σ^{2}_{T} + σ^{2}_{E})
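The variance ratio can be illustrated with a small simulation. This is a sketch under assumed values (true-score SD of 15, independent error SD of 5, fixed random seed), not real test data; with those values the expected reliability is 225 / (225 + 25) = .90.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

N = 20_000
# Assumed model: true scores with SD 15 plus independent error with SD 5
true_scores = [random.gauss(100, 15) for _ in range(N)]
errors = [random.gauss(0, 5) for _ in range(N)]
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

var_t = statistics.pvariance(true_scores)
var_x = statistics.pvariance(observed)
print(round(var_t / var_x, 3))  # should land near 0.90
```

Raising the assumed error SD shrinks the ratio, which is exactly what "less reliable" means here.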

Leptokurtic- peaked distribution, too many scores at the center

Reliability - the degree to which the test's results are consistent and stable.

Accommodation - modifying existing schemas to take new info and/or experiences into account

- Norm referenced tests - a good item is one where people who score high tended to get it right, and vice versa
- Criterion referenced tests - items need to assess mastery of the concepts

-parallel forms: two different versions of test that measure the same construct (each form has the same mean and variance)

-alternate forms: two different versions of a test that measure the same construct (tests do not meet the equal means and variances criterion)

-coefficient of equivalence is calculated by correlating the two forms of the test

- Urine screen - either with card onsite or sent to a lab
- Blood - most popular for accidents
- Saliva - gaining popularity
- Sweat - to collect over time
- Hair - residues encased in hair shaft

- Often used when making employment related decisions
- Found to be valid predictors of future performance
- Group differences in performance make this controversial

- Test conceptualization
- Test construction
- Test tryout
- Analysis
- Revision

-Produces ordinal data

-items range from weaker to stronger expressions of the variable being measured

-arranged so that agreement with stronger statements implies agreement with milder statements as well

-produces ordinal data

-get ratings of items

-items selected using statistical evaluation

-individual score based on ratings

- Selected response items take less time to answer; used when breadth of knowledge is being assessed.
- Constructed response items more time consuming to answer; used to assess depth of knowledge.
- Selected response item scoring is more objective

-the item reliability index is the product of the item-score standard deviation and the correlation b/w the item score and the total test score

-provides an indication of the test's internal consistency; the higher the index, the higher the consistency
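A minimal sketch of the item reliability index as defined above (item-score SD times the item-total correlation). The function names and data are hypothetical; the total here includes the item itself, and the item SD is the population SD.

```python
import math

def _pearson(x, y):
    """Pearson r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_reliability_index(item, total):
    """Item-score SD times the correlation between item and total score."""
    n = len(item)
    mean_i = sum(item) / n
    s_i = math.sqrt(sum((v - mean_i) ** 2 for v in item) / n)  # population SD
    return s_i * _pearson(item, total)

item = [1, 1, 0, 1, 0]   # hypothetical 0/1 scores on one item
total = [9, 8, 4, 7, 5]  # the same people's total test scores
print(round(item_reliability_index(item, total), 3))
```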

-Appropriateness of using info

^^Concerned with validity!

R = σ^{2}_{true} / σ^{2}_{total}

σ^{2}_{total} = σ^{2}_{true} + σ^{2}_{error}

total variance = true variance + error variance

Goal = use measure that best addresses error for test

-sources of error affect which reliability estimate is important

-each coefficient is affected by different sources of error

-goal: use the reliability measure that best addresses the sources of error associated with a test

Test-taker factors

-anything that occurs during the administration of the test that could affect performance

-environmental factors: temp., lighting, noise, how comfortable the chair is, etc.

-test-taker factors: mood, alertness, errors in entering answers, etc.

-examiner factors: physical appearance, demeanor, nonverbal cues, etc.

Problem with:

-non-objective personality tests

-essay tests

-behavioral observations

-computer scoring

-subjectivity of scoring is a source of error variance

-more likely to be a problem with: non-objective personality tests; essay tests; behavioral observations

*Coefficient of Stability* is calculated by correlating two sets of results

-the same test is administered twice to the same group with a time interval b/w administrations

-coefficient of stability is calculated by correlating the two sets of test results

-Time

-Fatigue Effects

-there are multiple sources of error that impact the coefficient of stability:

-stability of the construct

-time/maturation

-practice effects

-fatigue effects

P=same mean & variance

A=don't meet equal means & variance

*Coefficient of Equivalence* is calculated by correlating two forms of test

-Item selection error

-there are multiple sources of error that can impact the coefficient of equivalence:

-motivation and fatigue

-events that happen b/w the two administrations

-item selection will also produce error

-represents the degree of agreement (consistency) b/w multiple scorers (or judges, raters, observers, etc.)

-calculated with Pearson r (or Spearman rho depending on the scale)

-training procedures and standardized scoring criteria increase consistency

-The degree to which all items measure the same construct

-a measure of consistency within the test: how well do all of the items "hang together" or correlate with each other?

-homogeneity: the degree to which all items measure the same construct

-Cronbach's Alpha

1. split-half (with Spearman-Brown)

2. Kuder-Richardson (KR-20 & KR-21)

3. Cronbach's Alpha

Steps:

Test items split in half

Scores of each half are correlated

Correlation-Coefficient is corrected using *Spearman-Brown formula*

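r_{sb} = n*r_{xy} / [1 + (n-1)*r_{xy}]

-r_{xy} = observed correlation (for split-half, between the two halves); n = new test length / original test length (n = 2 for split-half)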

-Shorten a 300-item test to 100 items:

n = 100/300 = .33

-Have 10 items and want 30 in total:

n = 30/10 = 3
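A minimal sketch of the Spearman-Brown correction, assuming n is the ratio of new length to original length as in the examples above. The correlations plugged in are hypothetical.

```python
def spearman_brown(r_xy, n):
    """Predicted reliability when test length is multiplied by n."""
    return n * r_xy / (1 + (n - 1) * r_xy)

# Split-half: halves correlate .70 (hypothetical); full test is twice a half
print(round(spearman_brown(0.70, 2), 3))          # 1.4 / 1.7, about 0.824
# Lengthening 10 items to 30 (n = 3), assuming r = .60
print(round(spearman_brown(0.60, 3), 3))          # 1.8 / 2.2, about 0.818
# Shortening 300 items to 100 (n = 1/3), assuming r = .90
print(round(spearman_brown(0.90, 100 / 300), 3))  # about 0.75
```

Lengthening a test raises predicted reliability and shortening lowers it, which is why the split-half correlation (based on half-length tests) has to be corrected upward.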

Statistic of choice with dichotomous items (yes & no)

-Mean of all possible split-half correlations corrected by Spearman-Brown

-Most popular reliability coefficient with psychological research

α = [k/(k-1)] * [1 - (Σσ^{2}_{i}) / σ^{2}]

-k = number of items; σ^{2}_{i} = variance of one item; σ^{2} = variance of total test scores
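A minimal sketch of coefficient alpha computed from a people-by-items score matrix. The data and function name are hypothetical; population variances are used for both the items and the total, and because the same divisor appears in numerator and denominator, using sample variances throughout would give the same alpha.

```python
import statistics

def cronbach_alpha(scores):
    """scores: one row per test taker, one column per item."""
    k = len(scores[0])                 # number of items
    items = list(zip(*scores))         # transpose rows into item columns
    item_vars = sum(statistics.pvariance(col) for col in items)
    total_var = statistics.pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 4 people x 3 items
data = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
]
print(round(cronbach_alpha(data), 3))
```

These made-up items rise and fall together across people, so alpha comes out high (above .90), which the notes elsewhere flag as possible redundancy.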

Criterion - reflect material that is mastered hierarchically

^^ Reduced variability in scores which reduces estimates

^^pilot comparison w/ & w/out mastery to assess items

-If all facets are the same, should expect the same score

-If facets vary, scores should vary

What situations will this be reliable?

What facets impact the test the most?

True-score theory doesn't differentiate finite sample of behaviors from universe of behaviors

Generalizability theory describes conditions (facets) over which one can generalize scores

-Provides estimate of amount of error in observed score or measurement

-Based on True-Score theory

-Inverse relation with reliability

-Used to estimate extent to which observed deviates from true score

-SEM is an estimate of measurement precision

-high reliability = small spread of a person's observed scores around the true score = small SEM

SEM - method of estimating amount of error in test score

^^is a function of reliability of test and variability of test scores

Leads to small standard deviation and SEM

Reliability and SEM are *inversely related*
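The inverse relation follows from the usual formula SEM = SD * sqrt(1 - r_xx), where r_xx is the reliability coefficient. A minimal sketch with a hypothetical SD of 15; the +/-1.96 SEM band assumes normally distributed errors.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

# Higher reliability -> smaller SEM (hypothetical SD = 15)
for r_xx in (0.80, 0.91, 0.96):
    print(r_xx, round(sem(15, r_xx), 2))

# Approximate 95% band around an observed score of 110 when r_xx = .91
e = sem(15, 0.91)
print(round(110 - 1.96 * e, 1), round(110 + 1.96 * e, 1))
```

The band treats the observed score as one point in a range of plausible scores, rather than as an exact value.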

-Standard error of measurement considers observed test scores as indicative of a potential range of scores

- Determines how large a difference should be before it becomes statistically significant

-in practice, the SEM is most frequently used in the interpretation of an individual's test scores

-another statistic, the standard error of the difference (sigma_{diff}) is better when making comparisons b/w scores

-scores b/w people, tests, or two scores from the same person over time
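A minimal sketch of the standard error of the difference for two scores on the same scale: sigma_diff = SD * sqrt(2 - r1 - r2), which equals the square root of the two squared SEMs. The subtest reliabilities are hypothetical.

```python
import math

def se_difference(sd, r1, r2):
    """Standard error of the difference between two scores sharing one SD."""
    return sd * math.sqrt(2 - r1 - r2)

# Hypothetical: two subtests on a T-score metric (SD = 10)
s = se_difference(10, 0.90, 0.80)
print(round(s, 2))
# Differences beyond about 1.96 * s would arise from measurement
# error alone only about 5% of the time
print(round(1.96 * s, 1))
```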

-Refers to degree of appropriateness

-Has more to do with test-taker, rather than test-user

-does the test look like it measures what it is supposed to measure?

-has to do more with the judgments of the test TAKER, not the user

-not a statistical issue

^^Use Lawshe's Content Validity Ratio (CVR)

a. Essential to construct

b. Useful but not essential

c. Not necessary

Negative: less than half "essential"

Zero: half "essential"

Positive: more than half "essential"

>>Items kept if agreement exceeds chance
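Lawshe's ratio is CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating an item "essential" and N is the panel size. A minimal sketch with a hypothetical 10-person panel that reproduces the negative/zero/positive cases above:

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR = (n_e - N/2) / (N/2); ranges from -1 to +1."""
    half = n_panelists / 2
    return (n_essential - half) / half

print(content_validity_ratio(3, 10))  # -0.4: fewer than half say "essential"
print(content_validity_ratio(5, 10))  # 0.0: exactly half
print(content_validity_ratio(9, 10))  # 0.8: more than half
```

Whether a given CVR "exceeds chance" depends on the panel size; that cutoff is not computed here.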

-Typically relevant, uncontaminated & something that can be measured reliably

-criterion: the standard against which a test or a test score is evaluated

-a good criterion is generally:

-relevant

-uncontaminated

-something that can be measured reliably

-validity coefficient: typically a correlation coefficient between scores on the test and some criterion measure (r_{xy})

-Pearson's r is the usual measure, but may need to use other types of correlation coefficients depending on the data scale

-can also use "expectancy tables" for categorical criterion

-Different formulas for un/related variables

*Umbrella Validity*

-construct: unobservable underlying trait hypothesized to describe or explain behavior

-construct validity is the process of determining the appropriateness of inferences about the construct drawn from test scores

-formulate and test hypotheses derived from theories about the nature of the construct

Increase or decrease as predicted

-Do subscales correlate with total score?

-Do individual items correlate with subscale or total scale scores?

-Do all of the items load onto a single factor using a factor analysis?

-Will the construct change after intervention?

-Doesn't have to measure the exact construct, similar ones are OK!

-a miss wherein the test predicted that the test taker did possess the characteristic when that person did not

-Categorical or Comparative

-Assess breadth of knowledge

-Assess depth of knowledge

-Three options: cumulative, class/categorical, & ipsative

-Faking can be issue with attitudes: Faking good & bad

-guessing is only an issue for tests where a "correct answer" exists

-not an issue when measuring attitudes

-faking can be an issue with attitudes

-faking good: positive self-presentation

-faking bad: malingering or trying to create a less favorable impression

-random responding

-May include enhancing

-known as item-endorsement index in other contexts

-the proportion of the total number of test takers who got the item right

-proportion of total test takers who get item right

-ideal average p_{i} is halfway b/w the probability of answering correctly by chance and 1.0 (e.g. (.25 + 1.0)/2 = .625 for a four-option multiple-choice item)
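A minimal sketch of item difficulty (p) and the "halfway between chance and 1.0" target; the function names and response vector are hypothetical, with 1 = correct and 0 = incorrect.

```python
def item_difficulty(responses):
    """p_i: proportion of test takers answering the item correctly."""
    return sum(responses) / len(responses)

def optimal_difficulty(n_options):
    """Midpoint between the chance-success rate and 1.0."""
    chance = 1 / n_options
    return (chance + 1.0) / 2

print(item_difficulty([1, 0, 1, 1, 0, 1, 1, 0]))  # 5/8 = 0.625
print(optimal_difficulty(4))                      # four options: 0.625
print(optimal_difficulty(2))                      # true/false: 0.75
```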

Can test statistical significance of the correlation & interpret % of variability item accounts

-a simple correlation b/w the score on an item and the total score

-advantages: can test statistical significance of the correlation; can interpret % of variability the item accounts for (r_{it}^{2})

Often evaluated using latent trait theory

Contrast number who got item correct in upper and lower ranges

-symbolized by d: compares proportion of high scorers getting item "correct" and proportion of low scorers getting item "correct"

-higher positive values indicate item passed by more examinees in the upper group, while negative values indicate more from lower group passed the item
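A minimal sketch of the discrimination index d as the difference between the proportions correct in the upper and lower groups. How the groups are formed (e.g. by total score) is left out; the function name and data are hypothetical.

```python
def discrimination_index(upper, lower):
    """d = proportion correct in upper group - proportion correct in lower group."""
    return sum(upper) / len(upper) - sum(lower) / len(lower)

# Hypothetical: 8 of 10 high scorers and 3 of 10 low scorers got the item right
upper = [1] * 8 + [0] * 2
lower = [1] * 3 + [0] * 7
print(round(discrimination_index(upper, lower), 2))  # positive: favors high scorers
print(round(discrimination_index(lower, upper), 2))  # negative: a suspect item
```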

· “The most adequate conceptualization of a person’s behavior in all its detail” (McClelland, 1951)

· “Consistent behavior patterns and intrapersonal processes originating within the individual” (Burger, 1997)

· “an individual’s unique constellation of psychological traits and states” (C&S)

-the relatively distinctive and stable patterns of behavior that characterize an individual and their reactions to the environment

-3 common components: focus on individual diff's; the individual diff's are relatively stable; usually refer to intrapersonal processes of emotions, motivations and cognitive processes

· any relatively enduring characteristic of an individual that distinguishes that person from another

o Examples: extraversion, introversion, openness, conscientiousness

· a temporary, or transient presentation of some personality trait or disposition

o Examples: anxious, calm, fearful, embarrassed, happy, sad etc.

· are defined as unique sets of traits and states that are similar in pattern to an identified category of personality within a taxonomy of personalities

· Not all typologies are based on psychological theories with an empirical basis

· temperament of the blood, season of spring and the element of air

· associated with functioning of the liver (blood) makes a person optimistic and cheerful

· associated with the gall bladder (yellow bile); easily angered, bad tempered and controlling

· associated with the spleen (black bile); perfectionistic, depressive

· phlegm, winter, water

· associated with the lungs, calm and unemotional

Six (modern) approaches to Personality

· Unconscious minds are largely responsible for important differences in behavior styles

· people can be described along a continuum of various personality characteristics

· Inherited predispositions and physiological processes explain individual differences

· keys to individual differences are degree of personal responsibility and self acceptance

· consistent differences are the result of conditioning and expectations

· differences are the result of the way people process information and explain differences in behavior

How are Personality assessments used?

· Employment matching

· Adjustment issues for decisions about military service

· Academic opportunities

· Employment mobility

· Diagnoses, or degree of impact from some trauma

· Inform treatments

· Research and validation of theory

Assessment Methods

· Interviews

· Self report to written questions

· Card sorts (q-sort)

· Responses to ambiguous stimuli

· Interviews or responses of friends, family, spouse, teacher, coworkers, etc.

· Case histories

· Ratings by judges or experts

· Paper and Pencil or computer aided

o Choose a response from options that represent various characteristics of personality

· Procedures for scoring require little judgment

· Allows for implementation of a variety of validity indices

o can be answered and scored quickly (scored reliably)

o Breadth of content

o Can be administered to groups or individuals

-can be answered quickly

-can be administered by computer

-can be scored quickly and reliably

-can be administered in groups or individually

-procedures for scoring require little interpretation

-allows for implementation of a variety of validity indices

o Assuming honesty and capacity/insight to answer questions accurately.

-categorical labels or integers, no meaningful middle grounds between categories

-e.g. 1-single, 2-married

-scales, or levels, of measurement help determine what statistical analyses are appropriate

-enable test users to make accurate score interpretations

-four levels:

-nominal

-ordinal

-interval

-ratio

**NOIR**

-nominal (AKA naming) level

-lowest level of measurement

-ordering is not important, only the label attached to designate a mutually exclusive and exhaustive category

-examples:

-medical diagnoses

-gender

-political party affiliation

-values imply nothing about magnitude of differences between one level to the next

-the numbers do not indicate units of measurement

-no absolute zero point; the ways in which data from ordinal scales can be analyzed statistically are limited

-because of equal intervals between values, some mathematical operations are meaningfully appropriate:

-addition and subtraction, statistical tests based on mean scores and/or variance

-the difference between the highest and lowest scores

-sensitive to outliers

-the average of the sum of the squared deviations of each score from the mean

s^{2} = [1/(n-1)] Σ(X_{i} - X̄)^{2}

-the average deviation of each score from the mean

-the standard deviation is the square root of the variance

-expressed in the same units of measurement as the original scores
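A minimal sketch of the sample variance (n - 1 in the denominator, as in the formula above) and standard deviation, alongside the range; the scores are hypothetical.

```python
import math

def sample_variance(scores):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)

def sample_sd(scores):
    """Standard deviation: square root of the variance, in the original units."""
    return math.sqrt(sample_variance(scores))

scores = [4, 8, 6, 5, 7]  # hypothetical test scores
print(max(scores) - min(scores))                  # range: 4
print(sample_variance(scores), sample_sd(scores))  # 2.5 and about 1.58
```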

-only a few extremely high scores and many low scores

-tail goes to the right

Negative skew

-only a few extremely low scores and many high ones

-tail goes to the left

-z-scores can be used to calculate percentiles when raw scores have a normal distribution

-when used in conjunction w/ a Z-table, the z-score reveals the area of the normal distribution below the score in question

-the Z-score table that indicates the proportion of the total number of scores that fall into a certain range of z-scores
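A Z-table lookup can be approximated with the normal cumulative distribution. A minimal sketch, assuming normally distributed raw scores and a hypothetical test with mean 100 and SD 15; the function name is illustrative.

```python
import math

def percentile_from_z(z):
    """Percent of a normal distribution falling below z (what a Z-table gives)."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical raw score of 115 on a test with mean 100, SD 15
z = (115 - 100) / 15
print(z, round(percentile_from_z(z), 1))  # z = 1.0, roughly the 84th percentile
```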

Interpret performance in various groups and easily understood

Disadvantages:

Units are not equal on all parts of the scale

-Differences between individuals near the middle are magnified while extremes are compressed

-strong negative: (r=-.7)

-moderate/weak negative: (r = -.4)

-a way of interpreting test scores by comparing an individual's results to the scores of a group of test takers

-interpretation is relative

-percentiles

-"top 5%"

-the group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test-takers

-a normative sample must be representative or typical of the intended population of interest

-diff's need to be proportionately represented in the sample

-e.g. gender, race/ethnicity

-sampling individuals from subgroups in the population in the same proportion as the population they are part of

-provides greater precision than a simple random sample of the same size

-can guard against the "unrepresentative" sample

-each individual from the population has an equal chance of being included in the sample

-true random sampling is very rare in practice (time & $, ethics, self selection)

-contrast w/ random assignment (random assignment of participants in the selected sample to different experimental conditions)

-a sample that is convenient or available for use

-ISU psychology participant pool

-average performance of different samples of test-takers at various ages/grades

-scores often used as evaluative standards for one's performance on a test (e.g. below average, average)

-concept of "mental age"

-can be interpreted as the mean of all possible split-half correlations, corrected by the Spearman-Brown formula

-Ranges from 0 to 1 with values closer to 1 indicating greater reliability

-"generally acceptable" values are .70-.90

-coefficient alpha above .90 may be "too high;" indicating redundancy