6. Evaluating selection techniques and decisions

Test (in I/O psych)

Refers to ANY TECHNIQUE used to evaluate someone.

Mental measurements yearbook (MMY)

The name of a book containing info about RELIABILITY and VALIDITY of various PSYCHOLOGICAL TESTS

1. References
2. Interviews
3. Assessment centers

EMPLOYMENT tests include such methods as:

1. Reliable
2. Valid
3. Cost-efficient
4. Fair
5. Legally defensible

5 characteristics of EFFECTIVE SELECTION TECHNIQUES

Reliability

The extent to w/c a score from a test or from an evaluation is CONSISTENT/STABLE and FREE FROM ERROR.

NOT USEFUL

If a score from a measure is NOT STABLE or ERROR-FREE, it is ______

Reliability

An essential characteristic of an effective measure

1. Test-retest reliability
2. Alternate-forms reliability
3. Internal reliability
4. Scorer reliability

Test RELIABILITY is determined in 4 ways:

Test-retest reliability

- The extent to w/c REPEATED ADMINISTRATION of the SAME TEST will achieve SIMILAR RESULTS
- Each one of several ppl takes the same test twice. The scores from the 1st administration of the test are correlated w/ scores from the 2nd to determine whether the 2 sets of scores are SIMILAR. If they are, the test is said to have TEMPORAL STABILITY

Temporal stability

- The CONSISTENCY of test scores ACROSS TIME
- Scores are STABLE across time and NOT HIGHLY SUSCEPTIBLE to such random daily conditions as illness, fatigue, stress, or uncomfortable testing conditions

NO STANDARD

In test-retest reliability, there is ______ amount of TIME that should elapse between the 2 administrations of the test. However, the time interval should be LONG ENOUGH so that the specific test answers have NOT been MEMORIZED, but SHORT ENOUGH so that the person has NOT CHANGED in any significant way.

3 days to 3 months

Typical TIME INTERVALS between test administration in TEST-RETEST reliability range from ____ days to ____ months.

LONGER, LOWER

Usually, the ______ time interval in test-retest reliability, the _____ reliability coefficient.

.86

The typical test-retest reliability coefficient for tests used by organizations is ____.

Short-term moods or feelings

TEST-RETEST reliability is NOT appropriate for ALL kinds of tests. It would not make sense to measure the test-retest reliability of a test designed to measure ___________.

Trait anxiety and State anxiety

STATE-TRAIT Anxiety Inventory measures 2 types of anxiety

Trait anxiety

Refers to the amount of anxiety that an individual normally has ALL THE TIME

State anxiety

The amount of anxiety an individual has at ANY GIVEN MOMENT

Alternate-forms reliability

- The extent to w/c 2 FORMS of the SAME TEST are SIMILAR
- 2 FORMS of the SAME TEST are constructed. The scores on the 2 forms are then correlated to determine whether they are SIMILAR. If they are, the test is said to have FORM STABILITY

Form stability

The extent to w/c the SCORES on 2 FORMS of a test are SIMILAR.

Counterbalancing

A method of controlling for ORDER EFFECTS by giving half of a sample Test A first, followed by Test B, and giving the other half of the sample Test B first, followed by Test A.

Counterbalancing

Designed to ELIMINATE any effects that taking one form of the test first may have on scores on the second form

Alternate-forms reliability

This type of reliability method is used because if there is a high probability that ppl will take a test more than once, 2 forms of the test are needed to REDUCE THE POTENTIAL ADVANTAGE to individuals who take the test a second time (e.g., police departments that allow applicants to retake an exam).

Multiple forms

Might be used in large grps of test takers where there is a possibility of cheating

Short as possible

The TIME INTERVAL with ALTERNATE-FORMS reliability should be as ________.

.89

The avg correlation between ALTERNATE FORMS of tests used in industry is ____

Mean and standard deviation (SD)

In addition to being CORRELATED, two forms of a test in alternate forms reliability should also have the SAME ________ and _______.

NOT EQUIVALENT

The difference in MEAN SCORES indicates that the 2 forms are ______. In such case, either the forms must be REVISED or DIFFERENT STANDARDS (norms) must be used to interpret the results of the test.

Reliability, Validity, Difficulty

ANY CHANGES in a TEST in ALTERNATE FORMS RELIABILITY potentially CHANGE its _______, ______, _______, or ALL THREE.

TEST OUTCOMES

Though ALTERNATE-FORM DIFFERENCES potentially affect the ________, most of the research indicates that these effects are either NONEXISTENT or rather SMALL.

Internal reliability

Consistency w/ which an applicant responds to items measuring a SIMILAR DIMENSION or CONSTRUCT

Internal consistency

The extent to w/c SIMILAR ITEMS are ANSWERED in SIMILAR WAYS; it measures ITEM STABILITY

Item stability

The extent to w/c RESPONSES to the SAME test ITEMS are CONSISTENT

LONGER, HIGHER

The ______ the test, the _______ its internal consistency - the AGREEMENT among RESPONSES to the various test items.

Item homogeneity

Another factor that can affect the internal reliability of a test is ________. This is defined as the extent to w/c TEST ITEMS MEASURE THE SAME CONSTRUCT - that is, whether ALL of the items measure the SAME CONSTRUCT

HIGHER

The MORE HOMOGENEOUS the items, the _______ the INTERNAL CONSISTENCY

1. Split-half method
2. Coefficient alpha
3. Kuder-Richardson formula 20 (K-R 20)

Methods used to determine INTERNAL CONSISTENCY

Split-half method

A form of INTERNAL RELIABILITY that is EASIEST to USE and in w/c the consistency of item responses is determined by COMPARING scores on HALF of the items w/ scores on the OTHER HALF of the items.

Split-half method

Usually, all of the odd-numbered items are in one group and all the even-numbered items are in the other group. The scores on the 2 groups of items are then correlated.

Spearman-Brown Prophecy Formula

Because the NUMBER OF ITEMS in the test has been REDUCED, researchers have to use a formula, called the ______, to ADJUST THE CORRELATION

Spearman-Brown Prophecy Formula

Used to CORRECT reliability coefficients resulting from the SPLIT-HALF method.

1. Cronbach's alpha
2. K-R 20

- 2 methods that are MORE POPULAR and ACCURATE of determining INTERNAL RELIABILITY, although they are MORE COMPLICATED TO COMPUTE

K-R 20

Used for tests containing DICHOTOMOUS ITEMS (e.g., yes-no, true-false)

Coefficient alpha

Can be used for tests containing DICHOTOMOUS ITEMS, INTERVAL and RATIO ITEMS such as 5-point scales
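Coefficient alpha can be sketched directly from its definition: the ratio of summed item variances to total-score variance, scaled by the number of items. With dichotomous items, as in the hypothetical data below, alpha equals K-R 20.

```python
# Cronbach's coefficient alpha (equals K-R 20 for dichotomous items).
from statistics import pvariance

# Rows = people, columns = items (hypothetical dichotomous responses)
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
]

k = len(responses[0])                          # number of items
item_vars = [pvariance(col) for col in zip(*responses)]
total_var = pvariance([sum(row) for row in responses])

# alpha = k/(k-1) * (1 - sum of item variances / total-score variance)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```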

.81

The MEDIAN INTERNAL RELIABILITY COEFFICIENT found in the research is

Coefficient alpha

By far the MOST COMMONLY reported measure of INTERNAL RELIABILITY

Coefficient alpha

A statistic used to determine INTERNAL RELIABILITY of tests that use INTERVAL or RATIO SCALES

Scorer reliability

The extent to which 2 PPL SCORING a test AGREE on a TEST SCORE, or the extent to w/c a test is SCORED CORRECTLY

Scorer reliability

- A test/inventory can have homogeneous items and yield heterogeneous scores and still not be reliable if the person scoring the test makes mistakes.
- An issue in PROJECTIVE or SUBJECTIVE tests in w/c there is NO ONE CORRECT ANSWER, but even objectively scored tests can suffer from SCORING MISTAKES

Interrater reliability

When HUMAN JUDGMENT of performance is involved, scorer reliability is discussed in terms of _____. "Will two interviewers give an applicant SIMILAR RATINGS? Or will two supervisors give an employee similar performance ratings?"

1. MAGNITUDE of the reliability coefficient
2. Test taker

When deciding whether a test demonstrates sufficient reliability, 2 factors must be considered:

1. Own data
2. Test manual
3. Journal articles
4. Test compendia

The reliability coefficient for a test can be obtained from:

Similar

To EVALUATE the coefficient, you can COMPARE it w/ reliability coefficients typically obtained for _____ types of tests.

Validity

The degree to w/c inferences from test scores are justified by the evidence.

Reliability, DOES NOT

As with _______, a test MUST be VALID to be USEFUL. But just because a test is reliable ______ it is valid.

RELATED

Even though reliability and validity are NOT the SAME, they are _____

Reliability

The POTENTIAL VALIDITY of a test is limited by its ______. Thus, if a test has POOR RELIABILITY, it CANNOT HAVE HIGH VALIDITY.

NECESSARY BUT NOT SUFFICIENT

A test's reliability does not imply validity. Instead, we think of RELIABILITY as having a __________ relationship w/ VALIDITY

1. Content
2. Criterion
3. Construct
4. Face
5. Known-group

5 common strategies to investigate the VALIDITY of scores on a test

Content validity

- The extent to w/c tests or test items SAMPLE the CONTENT that they are supposed to measure.
- For example, your instructor tells u that the final exam will measure your knowledge of Chapters 8, 9 and 10. The test will have 60 questions. For the test to be CONTENT VALID, the questions must SAMPLE material from ALL 3 chapters, not just 1 or 2

Job analysis

In industry, the appropriate content for a test or test battery is determined by the _________

RATE test items

One way to test CONTENT VALIDITY of a test is to have SMEs (subject matter experts) _______ on the extent to w/c the content and lvl of difficulty for each item are related to the job in question. These SMEs should be asked to indicate if there are important aspects of the job that are NOT COVERED by the test.

Criterion validity

The extent to w/c a test score is related to some measure of JOB PERFORMANCE called a CRITERION

1. Supervisor ratings of performance
2. Objective measures of performance (e.g., sales, # of complaints, # of arrests made)
3. Attendance (tardiness, absenteeism)
4. Tenure
5. Training performance (e.g., police academy grades)
6. Discipline problems

Commonly used criteria

1. Concurrent validity
2. Predictive validity

CRITERION validity is established using 1 of 2 research designs:

Concurrent validity

- A form of criterion validity that CORRELATES TEST SCORES w/ MEASURES OF JOB PERFORMANCE for employees CURRENTLY working for an organization
- A test is given to a grp of employees who are ALREADY ON THE JOB.

Predictive validity

- A form of criterion validity in which test scores of applicants are compared at a LATER DATE w/ a measure of job performance
- The test is administered to a group of job applicants who are GOING TO BE HIRED. The test scores are then compared w/ a FUTURE MEASURE of job performance.

Concurrent design

MOST CRITERION VALIDITY STUDIES use a _______

Concurrent design

_______ is WEAKER than predictive design due to HOMOGENEITY of PERFORMANCE SCORES.

Restricted

A ________ (NARROW) range of performance scores makes obtaining a significant validity coefficient DIFFICULT (ex. Very few employees are at the extremes of a performance scale)

Validity Generalization (VG)

A MAJOR ISSUE concerning the CRITERION validity of tests. The extent to w/c inferences from test scores from one organization CAN BE APPLIED to ANOTHER organization

SAMPLING ERROR

A test being valid in one location but NOT in another is primarily the product of ________ (use of SMALL sample sizes)

LARGE

With _______ sample sizes, a test found valid in one location probably will be VALID in another, providing that the jobs actually are SIMILAR and are not merely 2 separate jobs sharing the same job title.

1. Meta-analysis
2. Job analysis

2 Building blocks for VG

Meta-analysis

Can be used to determine the AVERAGE validity of SPECIFIC types of tests for a variety of jobs

Job analysis

VG SHOULD be used only if a _________ has been conducted, the results of w/c show that the job in question is similar to those used in meta-analysis

ADVERSE IMPACT

Though VG is generally accepted by the scientific community, federal enforcement agencies such as OFCCP SELDOM accept VG as a SUBSTITUTE for a local validation study if the test is shown to have _______

SINGLE-GROUP VALIDITY

The CHARACTERISTIC of a TEST that SIGNIFICANTLY PREDICTS a CRITERION FOR ONE CLASS of ppl BUT NOT FOR ANOTHER

SYNTHETIC VALIDITY

A form of VG in w/c validity is inferred on the basis of a MATCH between JOB COMPONENTS and tests PREVIOUSLY found valid for those job components

SYNTHETIC VALIDITY

Based on the assumption that tests that predict a particular component (e.g., customer service) of one job (e.g., a call center for a bank) should predict performance on the same job component for a different job (e.g., receptionist at a law office)

Validity generalization

We are trying to generalize the results of studies conducted on a PARTICULAR JOB to the SAME JOB at ANOTHER ORGANIZATION

Synthetic validity

We are trying to generalize the results of studies of DIFFERENT JOBS to a job that shares a COMMON COMPONENT (e.g., prob solving, customer service skills, mechanical ability)

Construct validity

MOST THEORETICAL of the validity types. Basically, it is defined as the extent to w/c a test actually MEASURES the CONSTRUCT that it PURPORTS TO MEASURE

CONSTRUCT, CONTENT

_______ validity is concerned w/ inferences about TEST SCORES, in contrast to ________ validity, w/c is concerned w/ inferences about TEST CONSTRUCTION

Construct validity

Usually determined by correlating scores on a test w/ scores from other tests, such as CONVERGENT and DISCRIMINANT validity.

CONVERGENT validity, DISCRIMINANT validity

________ is shown by HIGH correlations w/ other measures of the SAME construct, while __________ is shown by LOW correlations w/ measures of DIFFERENT constructs

Known-group validity

Another method of measuring CONSTRUCT validity. This method is NOT COMMON and should be used only when other methods for measuring construct validity are not practical.

Known-group validity

- A form of validity in w/c test scores from 2 CONTRASTING GROUPS "known" to DIFFER on a CONSTRUCT are compared
- Ex.: Administering an honesty test to priests and convicts
- If the known groups DO NOT DIFFER on test scores, consider the test INVALID. If they DO DIFFER, the test MAY BE VALID

SITUATION

There is NO BEST METHOD among the 3 common ways of measuring validity. IT DEPENDS on the _________ and what the person conducting the validity is TRYING TO ACCOMPLISH

CONTENT validity, CRITERION validity

If it is to decide whether the test will be a USEFUL PREDICTOR OF EMPLOYEE PERFORMANCE, then ________ will usually be used, and a _________ study will also be conducted if there are ENOUGH EMPLOYEES and if a GOOD MEASURE of job performance is available.

Next-door-neighbor rule

In deciding whether CONTENT VALIDITY is ENOUGH, organizations are advised to use the ________. That is, ask urself, "If my next-door neighbor were on a jury and I had to justify the use of my test, would content validity be enough?"

1. Good test
2. Good measure of performance
3. Decent sample size

To get a significant validity coefficient, you need:

.20 to .30

Most VALIDITY COEFFICIENTS are small, in the range of

TEST

A _____ itself can NEVER be VALID.

TEST SCORES

When we speak of VALIDITY, we are speaking about the validity of the ________ as they RELATE to a PARTICULAR JOB. A test may be a valid predictor of tenure for counselors but not of performance for shoe salespeople.

PARTICULAR JOB and PARTICULAR CRITERION

a TEST is considered VALID for a __________ and a __________. NO TEST WILL EVER BE VALID FOR ALL JOBS AND ALL CRITERIA.

Face validity

Though it is not one of the 3 major methods of determining test validity cited in the federal Uniform Guidelines on Employee Selection Procedures, it is still important.

Face validity

The extent to w/c a test APPEARS to be valid or job related. This perception is important bec if a test or its items DO NOT APPEAR valid, the test takers and administrators will NOT HAVE CONFIDENCE in the results, and their perception of its fairness DECREASES.

MOTIVATION, TEST PERFORMANCE

FACE-VALID tests resulted in high levels of test-taking ________, w/c in turn resulted in higher lvls of _______. Thus, face validity MOTIVATES applicants to do well on tests.

1. Decrease the chance of lawsuits
2. Reduce the # of applicants dropping out of the employment process
3. Increase in the chance that an applicant will accept a job offer.

Advantages of FACE-VALID tests

Applicants might be tempted to FAKE the test because the correct answers are obvious

Disadvantage of FACE-VALID tests

Informing, multimedia, honest

The face validity and acceptance of test results can be INCREASED by ______ the applicants HOW a test RELATES to job PERFORMANCE and by administering the test in a ________ format. Acceptance of test results also INCREASES when applicants receive _____ feedback about their test scores.

DOES NOT

But just because a test has face validity ______ mean it is accurate or useful.

Barnum statements

Statements so general that they CAN BE TRUE of ALMOST EVERYONE, such as those used in astrological forecasts. (Ex., describing u as "sometimes sad, sometimes successful, and at times not getting along w/ ur best friend")

NOT ENOUGH

Face validity by itself is _______

Validity coefficient

The CORRELATION between SCORES on a SELECTION METHOD (e.g., interview, cognitive ability test) and a measure of JOB PERFORMANCE (e.g., supervisor rating, absenteeism)

19th Mental Measurements Yearbook (MMY)

The MOST COMMON SOURCE of test info w/c contains info on over 2,700 psychological tests as well as reviews by test experts.

Test in Print VIII

Another excellent source of info of psychological tests w/c is a compendium entitled

COST

If 2 or more tests have SIMILAR VALIDITIES, the ____ should be considered.

Group testing

Usually less expensive and more efficient than individual testing, although important information may be lost in ____________.

Computer-assisted testing

An applicant takes a test online, the computer scores the test, and the results of the test and interpretation are immediately available.

Online testing

Many public and private employers are switching to this method because it can LOWER TESTING COSTS, DECREASE FEEDBACK TIME, and yield results in w/c the test takers can have great confidence

SIMILAR

Tests administered electronically seem to yield results ______ to those administered through the traditional paper-and-pencil format

Computer-adaptive testing (CAT)

An increasingly common use of computer testing. A type of test taken on a computer in w/c the computer ADAPTS the DIFFICULTY LEVEL of questions to the test taker's SUCCESS in ANSWERING previous QUESTIONS (ex., if the test taker successfully answers a question, the next question will be MORE DIFFICULT).

CAT

The logic behind ____ is that if a test taker can't answer easy questions, it doesn't make sense to ask difficult questions.

1. Fewer test items are required
2. Tests take less time to complete
3. Finer distinctions in applicant ability can be made
4. Immediate feedback is given
5. Test scores are interpreted based not only on the # of questions answered correctly but also on WHICH questions were answered correctly

ADVANTAGES of CAT

1. Wonderlic Personnel Test
2. WAIS

In selecting POLICE OFFICERS, it is common to use COGNITIVE ABILITY TESTS such as the __________ or the _____________.

AVERAGE DIFFICULTY

In CAT, the computer starts by asking questions of _________.

NOT NECESSARILY

Even when a test is BOTH reliable and valid, it is ______ useful.

1. Taylor-Russell tables
2. Expectancy charts
3. Lawshe tables
4. Utility formula

Formulas and tables used in establishing the USEFULNESS of a SELECTION DEVICE

Taylor-Russell tables

Provide an estimate of the PERCENTAGE of TOTAL NEW HIRES who will be SUCCESSFUL employees if a test is adopted (ORGANIZATIONAL SUCCESS)

1. Expectancy charts
2. Lawshe tables

Provide a probability of SUCCESS for a PARTICULAR APPLICANT based on test scores (INDIVIDUAL SUCCESS)

Utility formula

Provides an estimate of the AMOUNT OF MONEY an organization will SAVE if it adopts a new testing procedure

Taylor-Russell Tables

Designed to estimate the PERCENTAGE of FUTURE EMPLOYEES who will be SUCCESSFUL on the job if an organization uses a particular test

Taylor-Russell tables

A series of tables based on the SELECTION RATIO, BASE RATE, and TEST VALIDITY that yield info about the percentage of FUTURE EMPLOYEES who will be SUCCESSFUL if a particular test is used.

1. Test is VALID
2. The organization can be SELECTIVE in its hiring because it has more applicants than openings
3. There are plenty of CURRENT employees who are NOT PERFORMING well, thus there is room for improvement

The philosophy behind TAYLOR-RUSSELL tables is that a test will be useful to an organization if:

1. Criterion validity coefficient
2. Selection ratio
3. Base rate of current performance

To use the TAYLOR-RUSSELL tables, 3 pcs of info must be obtained:

1. Conduct a criterion validity study
2. Validity generalization

2 ways to obtain CRITERION VALIDITY COEFFICIENT

Criterion validity study

This is conducted to obtain a criterion validity coefficient. This method is conducted with test scores correlated w/ some measure of JOB PERFORMANCE.

Validity generalization

Often, however, an organization wants to know whether testing is useful before investing time and money in a criterion validity study. In such cases, ________ can be used to estimate the test's validity.

HIGHER, USEFUL

The ________ the validity coefficient, the GREATER the possibility the test will be ______

Selection ratio

The PERCENTAGE of APPLICANTS an organization HIRES

(Number hired)/(Number of applicants)

formula for SELECTION RATIO

LOWER

The ______ the SELECTION RATIO, the GREATER the POTENTIAL USEFULNESS of the test

Base rate

Percentage of CURRENT EMPLOYEES who are considered SUCCESSFUL

1. Employees are SPLIT into 2 equal groups
2. To choose a CRITERION MEASURE SCORE ABOVE which all employees are considered successful

BASE RATE of current performance is usually obtained in one of two ways:

Employees are split into 2 equal groups

The SIMPLEST but LEAST ACCURATE method of obtaining the BASE RATE of current performance. Employees are split based on their SCORES on some criterion such as tenure or performance.

.50

The BASE RATE when the method of SPLITTING employees into 2 equal groups is used, is ______ because one-half of the employees are considered satisfactory.

To choose a CRITERION MEASURE SCORE ABOVE which all employees are considered successful

- The MORE MEANINGFUL method
- Example: At one real estate agency, any agent who sells more than $700,000 of properties makes a profit for the agency after training and operating expenses have been deducted. In this case, agents selling more than $700,000 are considered SUCCESSFUL; those selling less are NOT.
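The criterion-score method above can be sketched in a few lines. The sales figures are hypothetical; only the $700,000 criterion comes from the example.

```python
# Base rate via the criterion-score method: the proportion of current
# employees above the chosen criterion score (hypothetical sales data).
sales = [850_000, 400_000, 720_000, 960_000, 300_000, 680_000, 710_000, 500_000]
criterion = 700_000

successful = [s for s in sales if s > criterion]
base_rate = len(successful) / len(sales)
```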

Proportion of correct decisions

A utility method that COMPARES the percentage of times a SELECTION DECISION was ACCURATE w/ the percentage of SUCCESSFUL EMPLOYEES

Proportion of correct decisions

EASIER to do but LESS ACCURATE than the Taylor-Russell tables

1. Employee TEST SCORES
2. SCORES on the CRITERION

The only info needed to determine the PROPORTION OF CORRECT DECISIONS

Quadrant I

Represent employees who SCORED POORLY on the TEST but PERFORMED WELL on the JOB (Poor test score + Performed well on the job)

Quadrant II

Represent employees who SCORED WELL on the test and were SUCCESSFUL on the JOB

Quadrant III

Represent employees who SCORED HIGH on the test, yet did POORLY on the JOB

Quadrant IV

Represent employees who SCORED LOW on the TEST and did POORLY on the JOB

Quadrants II and IV

If a TEST is a GOOD PREDICTOR of performance, there should be more points in QUADRANTS ____ and ____

Quadrants I and III

"PREDICTIVE FAILURES." NO CORRESPONDENCE is seen between test scores and criterion scores

Percentage of TIME

- (QII + QIV)/(Total points in ALL quadrants)
- To estimate the TEST's EFFECTIVENESS
- Represents the _____ that we expect to be accurate in making a selection decision in the future

Satisfactory performance baseline

- (QI + QII)/(Total points in ALL quadrants)
- To determine whether this is an IMPROVEMENT

INCREASE SELECTION ACCURACY

If percentage of TIME > SATISFACTORY PERFORMANCE BASELINE, the proposed test should _________. If NOT, it is probably better to STICK w/ the CURRENT SELECTION METHOD.
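The two formulas above can be sketched with hypothetical quadrant counts from a test-score/criterion scatterplot (QII and QIV are the correct decisions; QI and QIII are the predictive failures):

```python
# Proportion of correct decisions vs. the satisfactory-performance baseline
# (all quadrant counts below are hypothetical).
q1 = 5    # low test score,  high job performance (predictive failure)
q2 = 35   # high test score, high job performance (correct decision)
q3 = 10   # high test score, low job performance  (predictive failure)
q4 = 30   # low test score,  low job performance  (correct decision)

total = q1 + q2 + q3 + q4

# Expected selection accuracy if the test is adopted
accuracy = (q2 + q4) / total

# Baseline: proportion of current employees performing satisfactorily
baseline = (q1 + q2) / total

# Adopt the test only if it beats the baseline
use_test = accuracy > baseline
```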

Taylor-Russell tables, Lawshe tables

_________ were designed to determine the OVERALL impact of a testing procedure; _________ are used to know the probability that a PARTICULAR APPLICANT will be successful.

Lawshe tables

Tables that use the BASE RATE, TEST VALIDITY, and APPLICANT PERCENTILE on a test to determine the probability of FUTURE SUCCESS for that applicant

1. Validity coefficient
2. Base rate
3. Applicant's test score

To use LAWSHE TABLES, 3 pcs of info are needed:

Lawshe tables

Did the person score in the top 20%, the next 20%, the middle 20%, the next lowest 20%, or the bottom 20%?

Brogden-Cronbach-Gleser utility formula

Another way to determine the value of a test in a given situation is by computing the AMOUNT of MONEY an organization would SAVE if it used the test to select employees.

Utility formula

A fairly simple formula devised by I/O psychologists that estimates the MONETARY SAVINGS to an organization

Utility formula

Method of ASCERTAINING the extent to w/c an organization will BENEFIT from the use of a particular SELECTION system

1. Number of employees hired per year (n)
2. Average tenure (t)
3. Test validity (r)
4. SD of performance in dollars (SDy)
5. Mean standardized predictor score of selected applicants (m)

5 items of info that must be known to use the UTILITY FORMULA

Number of employees hired per year (n)

Easy to determine: simply the number of employees who are hired for a given position in a year

Average tenure (t)

Average amount of time that employees in the position TEND TO STAY with the company

Tenure

The LENGTH OF TIME an employee has been w/ an organization

Test validity (r)

The CRITERION VALIDITY coeff that was obtained through either VALIDITY STUDY or VG

Mean standardized predictor score of selected applicants (m)

Obtained in 2 ways:
1. To obtain the AVERAGE SCORE on the selection test for both the applicants who are hired and the applicants who are not hired.
2. To compute the PROPORTION OF APPLICANTS who are hired. This method is used when an organization plans to use the test but has NOT yet done so.

HIGH UTILITY ESTIMATES

The cost of daily poor performance, combined with the cost of occasional mistakes provides support for the validity of _____________.

1. Measurement bias
2. Predictive bias

Determining the FAIRNESS of a test

Ensure that the test is FAIR and UNBIASED

Once a test has been determined to be reliable and valid, and to have utility for an organization, the next step is to:

Test fairness

Most professionals agree that one must consider potential RACE, GENDER, DISABILITY, and other CULTURAL DIFFERENCES in both content of the test (MEASUREMENT BIAS) and the way in w/c scores from the test predict job performance (PREDICTIVE BIAS)

Measurement bias

Group differences in test scores that are UNRELATED to the CONSTRUCT being measured

Measurement bias

- Refers to TECHNICAL ASPECTS of a test. A test is considered to have ________ if there are GROUP DIFFERENCES (e.g., sex, race, or age) in test scores that are unrelated to the construct being measured
- Ex., If a test includes vocabulary words found more often in one culture than in another, score differences between groups reflect CULTURE rather than the construct being measured

Adverse impact

If differences in test scores result in one group (e.g., men) being selected at a SIGNIFICANTLY HIGHER RATE than another (e.g., women). The burden is on the ORGANIZATION using the test to prove that the test is valid

Adverse impact

An employment practice that results in members of a PROTECTED CLASS being NEGATIVELY AFFECTED at a HIGHER RATE than members of the majority class.

Four-fifths rule

ADVERSE IMPACT is usually determined by _______

LESS THAN 80%

ADVERSE IMPACT occurs if the selection rate for any group is ______ of the highest scoring group (PRACTICAL SIGNIFICANCE) and the difference is statistically significant (STATISTICAL SIGNIFICANCE)
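The practical-significance half of the four-fifths rule can be sketched directly; the hiring counts below are hypothetical (the statistical-significance test is omitted).

```python
# Four-fifths (80%) rule check for adverse impact (hypothetical counts).
def selection_rate(hired: int, applicants: int) -> float:
    return hired / applicants

rate_men   = selection_rate(40, 100)   # 40% of men hired
rate_women = selection_rate(20, 100)   # 20% of women hired

highest = max(rate_men, rate_women)
lowest  = min(rate_men, rate_women)

# Practical significance: lower rate less than 80% of the higher rate
adverse_impact = lowest < 0.80 * highest
```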

JOB RELATEDNESS, JOB-RELATED TEST

A LEGAL DEFENSE for adverse impact is __________ and that a VALID test is a ____________

Fairness

- Can include BIAS, but also includes POLITICAL and SOCIAL ISSUES
- Ex., Equal chance of being hired. Test is fair if it can predict performance equally well for all races, genders, and national origins.

Predictive bias

A situation in w/c the predicted level of success FALSELY FAVORS one group over another.

Predictive bias

Ex., If men scored higher on the test than women but the job performance of women was equal to or better than that of men.

1. Single-group validity
2. Differential validity

2 Forms of PREDICTIVE BIAS

Single-group validity

- The characteristic of a test significantly PREDICTS PERFORMANCE FOR ONE GROUP and NOT OTHERS
- Valid only for 1 group
- Happens by CHANCE

Single-group validity

Ex., A test of reading ability might predict performance of White clerks but not of African American clerks

BOTH correlations are SIGNIFICANT

Test does NOT exhibit single-group validity

ONE correlation is significant

Test is FAIR for only that one group

1. Small sample sizes
2. Other methodological problems

Single-group validity is VERY RARE and is usually the result of:

Disregarding single-group validity

Most appropriate choice when single-group validity occurs

Differential validity

- A test is VALID FOR 2 GROUPS but MORE VALID FOR ONE than for the other
- If a test does not lead to adverse impact, does not have single-group validity, and does not have differential validity, it is considered FAIR and can be used w/ complete CONFIDENCE

Differential validity

The characteristic of a test that significantly predicts a criterion for 2 groups, such as both minorities and nonminorities, but predicts significantly better for one of the 2 groups

Perception

Another important aspect of test fairness is the _____ of fairness held by the applicants taking the test. That is, a test may not have measurement/predictive bias, but applicants might perceive the test itself, or the way in w/c it is administered, as UNFAIR.

1. Difficulty of the test
2. Amount of time allowed to complete the test
3. Face validity of the test items
4. Manner in which hiring decisions are made
5. Policies about retaking the test
6. The way in which requests for testing accommodations for disabilities are handled

Factors that MIGHT AFFECT applicants' PERCEPTION of FAIRNESS

Selection

Looking for the RIGHT PERSON

Placement

Looking for the RIGHT JOB

Nepotism

Preference for HIRING RELATIVES of CURRENT employees

Qualified workforce

The percentage of ppl in a given GEOGRAPHIC AREA who have the QUALIFICATIONS (skills, educ, etc) to perform a certain job

COMBINED

If MORE THAN ONE criterion-valid test is used, the scores on the tests must be _______. Usually, this is done by a statistical procedure known as MULTIPLE REGRESSION

Multiple regression

A statistical procedure in w/c the scores from more than one criterion valid test are weighted according to HOW WELL EACH TEST SCORE PREDICTS THE CRITERION
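A minimal multiple-regression sketch of the idea above: each test score is given the weight that best predicts the criterion. To keep it self-contained, the criterion below is constructed as an exact weighted sum (0.5 and 0.3) of two hypothetical test scores, the model has no intercept, and the two weights are recovered by solving the normal equations with Cramer's rule.

```python
# Combining two criterion-valid tests via regression weights (hypothetical data).
cognitive = [20, 35, 50, 40, 25]     # test 1 scores
integrity = [30, 10, 40, 20, 50]     # test 2 scores
criterion = [0.5 * c + 0.3 * i for c, i in zip(cognitive, integrity)]

# Sums of squares / cross-products for the normal equations
s11 = sum(c * c for c in cognitive)
s22 = sum(i * i for i in integrity)
s12 = sum(c * i for c, i in zip(cognitive, integrity))
s1y = sum(c * y for c, y in zip(cognitive, criterion))
s2y = sum(i * y for i, y in zip(integrity, criterion))

# Cramer's rule for the 2x2 system
det = s11 * s22 - s12 * s12
b1 = (s1y * s22 - s2y * s12) / det   # weight for the cognitive test
b2 = (s11 * s2y - s12 * s1y) / det   # weight for the integrity test

def predict(c: float, i: float) -> float:
    """Predicted criterion score from the two weighted test scores."""
    return b1 * c + b2 * i
```

In practice the weights come from a real validation sample via standard regression software rather than an exact construction like this one.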

1. Unadjusted top-down selection
2. Rule of three
3. Passing scores
4. Banding

LINEAR APPROACHES to hiring take one of four forms

Top-down selection

Selecting applicants in STRAIGHT RANK ORDER of their test scores

Top-down selection

Applicants are rank-ordered on the basis of their test scores. Selection is then made by STARTING w/ the HIGHEST score and moving down until all openings have been filled.

By hiring the top scores on a valid test, an organization will GAIN THE MOST UTILITY

ADVANTAGE of TOP-DOWN selection

1. Can result in HIGH LEVELS of ADVERSE IMPACT
2. REDUCES an organization's FLEXIBILITY to use nontest factors such as references or organizational fit

DISADVANTAGES of TOP-DOWN selection

Compensatory approach

A method of making selection decisions in which a HIGH SCORE on ONE TEST can COMPENSATE for a LOW SCORE on another test (ex., A high GPA might compensate for a low GRE score)

Multiple regression

To determine whether a score on one test can compensate for a score on another, ________ is used, in w/c each test score is weighted accdg to HOW WELL IT PREDICTS THE CRITERION

Contamination

The condition in w/c a criterion score is AFFECTED by things OTHER than those UNDER THE CONTROL of the employee

Top-down selection

- Who will PERFORM THE BEST?
- Selecting applicants in straight rank ORDER of their TEST SCORES
- STARTING with the HIGHEST score and MOVING DOWN until all openings have been filled

Rule of three or five

A variation on top-down selection in w/c the names of the TOP 3 or 5 applicants are given to a HIRING AUTHORITY (e.g., police chief, HR director) who can then select any of the 3 or 5

PUBLIC SECTOR

Rule of three or five is often used in the _________

Rule of three or five

This method ensures that the person hired WILL BE WELL QUALIFIED but PROVIDES MORE CHOICE than does top-down selection

Possibly HIGHER QUALITY of selected APPLICANTS and OBJECTIVE DECISION MAKING

ADVANTAGES of RULE OF THREE Or FIVE

1. Less flexibility in decision making
2. IGNORES measurement error
3. Assumes TEST SCORE ACCOUNTS for ALL the VARIANCE in PERFORMANCE

DISADVANTAGES of RULE OF THREE or FIVE

Passing scores approach

The MINIMUM TEST SCORE that an applicant must achieve to be CONSIDERED for HIRE

Passing scores approach

Are a means of REDUCING ADVERSE IMPACT and INCREASING FLEXIBILITY. With this system, an organization determines the LOWEST SCORE on a test that is associated w/ ACCEPTABLE PERFORMANCE on the job

TOP-DOWN selection, PASSING SCORES

______________: "Who will perform the BEST in the future?" while _____________: "Who will be able to perform at an ACCEPTABLE LEVEL in the future?"

Affirmative action goals

Use of PASSING SCORES allows us to reach our _________, w/c would NOT HAVE BEEN MET w/ TOP-DOWN SELECTION

Passing score

A point in the distribution of scores that DISTINGUISHES ACCEPTABLE from UNACCEPTABLE performance
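The passing-scores approach amounts to a simple filter: everyone at or above the cutoff is eligible. The names, scores, and cutoff below are hypothetical:

```python
# Hypothetical applicants and a hypothetical passing score -- the lowest
# score associated w/ acceptable performance on the job.
applicants = {"Ana": 92, "Ben": 88, "Cara": 95, "Dan": 81, "Eve": 90}
passing_score = 85

# Everyone at or above the passing score is considered for hire; nontest
# factors (references, organizational fit) can guide the choice among them.
eligible = [name for name, score in applicants.items() if score >= passing_score]
print(eligible)  # ['Ana', 'Ben', 'Cara', 'Eve']
```

Note how this keeps more applicants in play than strict top-down ranking, which is the source of the added flexibility (and the reduced utility).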

Multiple-cutoff approach or multiple hurdle approach

If there is MORE THAN ONE TEST for w/c we have passing scores, a decision must be made regarding the use of a _____________

Multiple-cutoff approach

- A selection strategy in which applicants must MEET or EXCEED the passing score on MORE THAN ONE SELECTION TEST.
- Applicants would be administered ALL of the TESTS at ONE TIME. If they failed any of the tests (scored below the passing score), THEY WOULD NOT BE CONSIDERED FOR EMPLOYMENT
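A minimal sketch of the multiple-cutoff rule, w/ hypothetical test names and passing scores:

```python
# Hypothetical passing scores for three tests, all administered at one time.
cutoffs = {"cognitive": 70, "integrity": 60, "work_sample": 75}

def passes_all(scores):
    """Multiple-cutoff: the applicant must meet or exceed EVERY passing score;
    a high score on one test cannot compensate for a low score on another."""
    return all(scores[test] >= cut for test, cut in cutoffs.items())

print(passes_all({"cognitive": 85, "integrity": 72, "work_sample": 80}))  # True
print(passes_all({"cognitive": 85, "integrity": 55, "work_sample": 80}))  # False
```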

NOT LINEAR

MULTIPLE-CUTOFF APPROACH and MULTIPLE-HURDLE APPROACH are used when one score cannot compensate for another or when the relationship between the selection test and performance is ______

COST

One problem with MULTIPLE-CUTOFF APPROACH is the ______.

MULTIPLE-HURDLE APPROACHES

To REDUCE the COSTS associated w/ applicants failing one or more tests, __________ are often used.

MULTIPLE-HURDLE APPROACH

- Selection practice of administering ONE TEST AT A TIME, usually beginning w/ the LEAST EXPENSIVE, so that applicants must pass each test before being allowed to take the next test.
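The cost logic of the multiple-hurdle approach can be sketched as below. The hurdle names, administration costs, and cutoffs are hypothetical; the point is that a failed early hurdle means the expensive later tests are never paid for:

```python
# Hurdles ordered from least to most expensive to administer; each entry is
# (test name, cost to administer, passing score). All values are hypothetical.
hurdles = [("application screen", 5, 60),
           ("written test", 40, 70),
           ("assessment center", 400, 75)]

def hurdle_result(scores):
    """Administer one test at a time; stop (and stop paying) at the first
    failed hurdle. Returns (passed_all, total_cost_incurred)."""
    total = 0
    for name, cost, cutoff in hurdles:
        total += cost
        if scores[name] < cutoff:
            return False, total
    return True, total

# An applicant who fails the written test never incurs the $400 stage:
print(hurdle_result({"application screen": 80, "written test": 65,
                     "assessment center": 90}))  # (False, 45)
```

The trade-off named in the surrounding cards also shows up here: each hurdle adds a round of scoring and waiting, which lengthens the time to a hiring decision.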

1. Many of the tests TAKE TIME to conduct or score
2. The LONGER the TIME BETWEEN SUBMISSION of a job application and the HIRING decision, the smaller the number of African American applicants who will remain in the applicant pool.

Reasons why MULTIPLE-HURDLE APPROACH is not always used

African American

_______ populations have HIGHER UNEMPLOYMENT rates than Whites, and ppl who are UNEMPLOYED are in MORE OF A HURRY to obtain employment than PEOPLE WITH JOBS

MULTIPLE-HURDLE APPROACH

Because ___________ takes LONGER than multiple-CUTOFF, it may bring an UNINTENDED ADVERSE IMPACT, and affirmative action goals MAY NOT BE MET

Cutoff approach

A method of hiring in w/c an applicant MUST SCORE HIGHER than a particular score to be considered for employment

Uniform Guidelines (1978) Sec. 5H

Passing scores should be REASONABLE and CONSISTENT with expectations of ACCEPTABLE PROFICIENCY

1. INCREASED FLEXIBILITY in decision making
2. LESS ADVERSE impact against protected grps

ADVANTAGES of PASSING SCORES approach

1. LOWERED UTILITY
2. Can be DIFFICULT to SET

DISADVANTAGES of PASSING SCORES approach

Composite score

A SINGLE SCORE that is the SUM of the SCORES of SEVERAL ITEMS or DIMENSIONS

TOP-DOWN hiring, PASSING SCORES

A problem with _________ is that the process results in the HIGHEST LEVEL of ADVERSE IMPACT. On the other hand, use of ________ DECREASES adverse impact but REDUCES UTILITY.

BANDING

As a COMPROMISE between top-down hiring and passing scores, _____ attempts to HIRE the TOP SCORERS while still allowing some FLEXIBILITY FOR AFFIRMATIVE ACTION

Banding

A statistical technique based on the SEM that allows SIMILAR TEST SCORES to be GROUPED

ERROR

BANDING takes into consideration the degree of _____ associated w/ any test score

SEM

The number of points that a test score could be off due to test unreliability. SD and reliability values are needed to obtain SEM

Banding

Hire ANYONE WITHIN a "HIRING BAND"

1. Standard error of the test
2. Other statistical criteria

The WIDTH of the band is based upon the:
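The SEM and band-width cards above can be worked through numerically. The SD and reliability values below are hypothetical, and the 1.96 × SEM × √2 band width is one common convention (a 95% confidence interval on the difference between two scores), not the only way bands are set:

```python
import math

# Hypothetical test statistics.
sd = 10.0           # standard deviation of test scores
reliability = 0.86  # e.g., a test-retest reliability coefficient

# SEM: the number of points a score could be off due to test unreliability.
sem = sd * math.sqrt(1 - reliability)

# One common banding convention: two scores differ reliably only if they are
# more than 1.96 * SEM * sqrt(2) points apart; that quantity is the band width.
band_width = 1.96 * sem * math.sqrt(2)

# Everyone within one band width of the top scorer is treated as equivalent:
top_score = 95
band_floor = top_score - band_width
print(round(sem, 2), round(band_width, 2), round(band_floor, 2))  # 3.74 10.37 84.63
```

So w/ these (hypothetical) numbers, an applicant scoring 95 and one scoring 86 fall in the same band: the 9-point gap could be CHANCE (error) rather than an actual difference in ability.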

Banding

Can help to achieve certain hiring goals such as improving diversity

1. Increase workforce diversity and perceptions of fairness
2. Allows you to consider secondary criteria relevant to the job

ADVANTAGES of BANDING

Standard error

How many points apart do 2 applicants have to be before we say their test scores are significantly different?

CHANCE (error)

Differences in scores in BANDING might be the result of _______ rather than actual differences in ability

Banding

How many POINTS APART do two applicants have to be before we say their test scores are significantly different?

MINORITIES

Only selecting ______ in a band would be illegal