HR Tests - Recruitment, assessment, and personnel selection: Race differences


Thursday, March 05, 2015

Mega research update

I hope you like research, because there's a lot of it coming your way...and many are free as of this posting!

Without further ado...

Let's start with the Journal of Applied Psychology, January 2015 issue:

- We see a lot of research involving large candidate groups, but much less for individuals. In this meta-analysis of individual assessments, the authors found support for their usefulness, but it varied significantly across studies. Highest validity was found for managerial jobs and assessments that included a cognitive ability test.

- Being in the wrong job can be frustrating for both the employee and the employer. In this study, the authors show a relationship between poor vocational fit and counterproductive work behaviors (CWBs).

- Speaking of CWB, there may be more of them going on than you would think based on the assessment literature...

- And even more on CWB! These authors found support for both self- and acquaintance-reported personality ratings, specifically conscientiousness and agreeableness, in predicting "workplace deviance".

- Unfortunately, gender bias still exists in selection. In this meta-analysis, the authors found this to be particularly the case in male-dominated jobs. On a positive note, they do suggest ways of mitigating this: provide clear evidence of the competence of applicants, encourage careful decision making, and use experienced raters.

- The debate over whether cognitive ability tests over- or under-predict performance for different ethnic groups continues. In this study, the authors find support for overprediction for African Americans, suggesting the tests are not predictively biased against them.
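For readers who haven't seen how over- or under-prediction is usually checked, here's a minimal sketch of the basic idea: fit one common regression line for everyone, then look at the average prediction error for each group. The data and group labels below are made up for illustration and have nothing to do with the study above, and real analyses typically use moderated regression with significance tests rather than this bare-bones comparison.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mean_residual_by_group(scores, performance, groups):
    """Mean (actual - predicted) per group, using one common regression line.

    A negative mean residual means the common line overpredicts that group's
    performance; a positive one means it underpredicts.
    """
    intercept, slope = fit_line(scores, performance)
    residuals = {}
    for s, p, g in zip(scores, performance, groups):
        residuals.setdefault(g, []).append(p - (intercept + slope * s))
    return {g: mean(r) for g, r in residuals.items()}


# Hypothetical data: same test scores in both groups, different performance.
scores = [10, 12, 14, 16, 18, 10, 12, 14, 16, 18]
performance = [3.0, 3.4, 3.8, 4.2, 4.6, 2.6, 3.0, 3.4, 3.8, 4.2]
groups = ["A"] * 5 + ["B"] * 5

result = mean_residual_by_group(scores, performance, groups)
for group, residual in result.items():
    direction = "overpredicted" if residual < 0 else "underpredicted"
    print(f"Group {group}: mean residual {residual:+.2f} ({direction})")
```

In this toy data the common line underpredicts group A and overpredicts group B by the same amount, which is the pattern the over-/under-prediction debate is about.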

Next, the March issue of J.A.P.:

- More support for the predictive validity of emotional intelligence, but more importantly, how the concept overlaps with other constructs such as the Big 5 and self-efficacy.

- Not all situational judgment tests (SJTs) are equal, and according to these authors, in a large number of instances the context that is presumably so important? Not so much.

- Speaking of SJTs, these researchers suggest that putting the "situational" back in SJTs--i.e., assessing how the situation is analyzed rather than the response options--is a useful method.

- A fascinating update of effect size benchmarks that can be used for a variety of purposes.

- Trying to predict safety-related behavior? This research suggests that personality traits, particularly agreeableness, can usefully predict this behavior.

Moving on to the March issue of IJSA (free right now!):

- Some guidelines on preparing norms for personality inventories.

- Evidence that different cultures have different procedural justice perceptions of different selection mechanisms

- Some important findings on the equivalence and stability of job performance ratings over time

- Development of a new measure of subjective career success

- More evidence that both technical knowledge and prosocial knowledge are important factors in predicting medical student clinical performance

- This study found that CWBs are under-reported and organizational commitment increases the likelihood that peers will report them

- Evidence that forced-choice and Likert-type scales used in personality inventories have similar measurement properties

On to the Spring issue of Personnel Psych (also free right now!):

- This meta-analysis on narcissism showed that it is related to leadership emergence (through extraversion) and leadership effectiveness in a curvilinear fashion.

- More evidence of the importance of political skill--particularly the aspects of networking ability, interpersonal influence, and apparent sincerity--in predicting a range of important outcomes, including task performance beyond GMA and the Big 5. It would be interesting to see how this is related to emotional intelligence (yes this is a foreshadowing).

Turning to the March issue of Psych Bulletin:

- More on narcissism: this time, researchers found that men consistently report higher levels of narcissism compared to women, which is interesting when taken in combination with the study above.

In the December issue of Industrial and Organizational Psychology:

- The first focal article calls out researchers for using incorrect assumptions about criterion reliabilities, thus impacting criterion validity values. They make suggestions for how to improve meta-analyses moving forward.

- The second makes the important argument that utility analyses should consider measures of well-being when determining the effectiveness of interventions (such as an employment test).
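To see why the assumed criterion reliability matters so much, here's a small sketch of the standard correction for attenuation due to criterion unreliability (observed validity divided by the square root of the criterion reliability). The observed correlation and the reliability values are hypothetical, chosen only to show how much the "corrected" validity moves with the assumption.

```python
import math


def correct_for_criterion_unreliability(observed_r, criterion_reliability):
    """Classic correction for attenuation in the criterion only."""
    if not 0 < criterion_reliability <= 1:
        raise ValueError("criterion_reliability must be in (0, 1]")
    return observed_r / math.sqrt(criterion_reliability)


observed_r = 0.25  # hypothetical observed validity coefficient

# Same observed correlation, different assumed criterion reliabilities.
for r_yy in (0.5, 0.6, 0.7, 0.8):
    corrected = correct_for_criterion_unreliability(observed_r, r_yy)
    print(f"assumed reliability {r_yy:.1f} -> corrected validity {corrected:.3f}")
```

The lower the reliability you assume, the larger the corrected value, so a wrong assumption flows straight into the validity estimate.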

Finally, in the January issue of JOB (also free right now):

- a proposal for improving the calculation and reporting of Cronbach's alpha

- a fascinating study showing that high conscientiousness may hinder performance during stressful situations

- in support of EI, this study found a link between emotion recognition ability and income (interestingly through political skill and interpersonal facilitation...remember the earlier study on political skill?).
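As a refresher on the Cronbach's alpha item above, here's a minimal sketch of the standard textbook calculation. To be clear, this is the conventional formula, not the improved approach the article proposes, and the response data are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha from a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_columns = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 5 people answering 3 Likert-type items.
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 3],
    [4, 4, 5],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
```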

That's all for now!

Monday, October 27, 2014

Just kidding...more research update!

Seriously? Just yesterday I did my research update, ending with a note that the December 2014 issue of the International Journal of Selection and Assessment should be out soon.

Guess what? It came out today.

So that means--you guessed it--another research update! :)

- First, a test of Spearman's hypothesis, which states that the magnitude of White-Black mean differences on tests of cognitive ability varies with the test's g loading. Using a large sample of GATB test-takers, these authors found support for Spearman's hypothesis, and that reducing g saturation lowered validity and increased prediction errors.

So does that mean practitioners have to choose between high-validity tests of ability or increasing the diversity of their candidate pool? Not so fast. Remember...there are other options.

- Next, international (Croatian) support for the Conditional Reasoning Test of Aggression, which can be used to predict counterproductive work behaviors. I can see this increasingly being something employers are interested in.

- Applicants that do well on tests have favorable impressions of them, while those that do poorly don't like them. Right? Not necessarily. These researchers found that above and beyond how people actually did on a test, certain individual differences predict applicant reactions, and suggest these be taken into account when designing assessments.

- Although personality testing continues to be one of the most popular topics, concerns remain about applicants "faking" their responses (i.e., trying to game the test by responding inaccurately in hopes of increasing their chances of obtaining the job). This study investigates the use of blatant extreme responding, consistently selecting the highest or lowest response option, to detect faking, and looked at how this behavior correlated with cognitive ability, other measures of faking, and demographic factors (level of job, race, and gender).

- Next, a study of assessment center practices in Indonesia.

- Do individuals high in neuroticism have higher or lower job performance? Many would guess lower performance, but according to this research, the impact of neuroticism on job performance is moderated by job characteristics. This supports the more nuanced view that the relationship between personality traits and performance is in many cases non-linear and depends on how performance is conceptualized.

- ...which leads oh so nicely into the next article! In it, the authors studied air traffic controllers and found results consistent with previous studies--ability primarily predicted task performance while personality better predicted citizenship behavior. Which raises an interesting question: which version of "performance" are you interested in? My guess is for many employers the answer is both--which suggests of course using multiple methods when assessing candidates.

- Last but not least, an important study of using cognitive ability and personality to predict job performance in three studies of Chilean organizations. Results were consistent with studies conducted elsewhere, namely ability and personality significantly predicted performance.

Okay, I think that's it for now!

Sunday, April 27, 2014

Mobile assessment comes of age + research update

The idea of administering employment tests on mobile devices is not new. But serious research into it is in its infancy. This is to be expected for at least two reasons: (1) historically it has taken a while with new technologies to have enough data to analyze (although this is changing), and (2) it takes a while for researchers to get through the arcaneness of publishing (this, to my knowledge, isn't changing, but please prove me wrong).

Readers interested in the topic have benefited from articles elsewhere, but we're finally at a point where good research is being published on this topic. Case in point: the June issue of the International Journal of Selection and Assessment.

The first article on this topic in this issue, by Arthur, Doverspike, Munoz, Taylor, & Carr, studied data from over 3.5 million applicants who completed unproctored internet-based tests (UIT) over a 14-month period. And while the percentage that completed them on mobile devices was small (2%), it still yielded data on nearly 70,000 applicants.

Results? Some in line with research you may have seen before, but some may surprise you:

- Mobile devices were (slightly) more likely to be used by women, African-Americans and Hispanics, and younger applicants. (Think about that for a minute!)

- Scores on a personality inventory were similar across platforms.

- Scores on a cognitive ability test were lower for those using mobile devices. Without access to the entire article, I can only speculate on proffered reasons, but it's interesting to think about whether this is a reflection of the applicants or the platform.

- Tests of measurement invariance found equivalence across platforms (which basically means the same thing(s) appeared to be measured).

So overall, I think this is promising for including a mobile component in UITs.

The next article, by Morelli, Mahan, and Illingworth, also looked at measurement invariance of mobile versus non-mobile (i.e., PC-delivered) internet-based tests, with respect to four types of assessment: cognitive ability, biodata, a multimedia work simulation, and a text-based situational judgment test. Data was gathered from nearly 600,000 test-takers in the hospitality industry who were applying for maintenance and customer-facing jobs in 2011 and 2012 (note the different job types). Nearly 25,000 of these applicants took the assessment on mobile devices.

Results? The two types of administrations appeared to be equivalent in terms of what they were measuring. However, interestingly, mobile test-takers did worse on the SJT portion. The authors reasonably hypothesize this may be due to the nature of the SJT and the amount of attention it may have required compared to the other test types. (btw this article appears to be based on Morelli's dissertation, which can be found here--it's a treasure trove of information on the topic)

Again, overall these are promising results for establishing the measurement equivalence of mobile assessments. What does this all mean? It suggests that unproctored tests delivered using mobile devices are measuring the same things as tests delivered using more traditional internet-based methods. It also looks like fakability or inflation may be a non-issue (compared to traditional UIT). This preliminary research means researchers and practitioners should be more confident that mobile assessments can be used meaningfully.

I agree with others that this is only the beginning. In our mobile and app-reliant world, we're only scratching the surface not only in terms of research but in terms of what can be done to measure competencies in new--and frankly more interesting--ways. Not to mention all the interesting (and important) associated research questions:

- Do natively developed apps differ in measurement properties--and potential--compared to more traditional assessments simply delivered over mobile?

- How does assessment delivery model interact with job type? (e.g., may be more appropriate for some, may be better than traditional methods for others)

- What competencies should test developers be looking for when hiring? (e.g., should they be hiring game developers?)

- What do popular apps, such as Facebook (usage) and Candy Crush (score), measure--if anything?

- Oh, and how about: does mobile assessment impact criterion-related validity?

Lest you think I've forgotten the rest of this excellent issue...

- MacIver, et al. introduce the concept of user validity, which uses test-taker perceptions to focus on ways we can improve assessments, score interpretation, and the provision of test feedback.

- Bing, et al. provide more evidence that contextualizing personality inventory items (i.e., wording the items so they more closely match the purpose/situation) improves the prediction of job performance--beyond noncontextual measures of the same traits.

- On the other hand, Holtrop, et al. take things a step further and look at different methods of contextualization. Interestingly, this study of 139 pharmacy assistants found a decrease in validity compared to a "generic" personality inventory!

- A study by Ioannis Nikolaou in Greece of social networking websites (SNWs) found that job seekers still use job boards more than SNWs, that SNWs may be particularly effective for passive candidates (!), and that HR professionals found LinkedIn to be more effective than Facebook.

- An important study of applicant withdrawal behavior by Brock Baskin, et al., that found withdrawal tied primarily to obstructions (e.g., distance to test facility) rather than minority differences in perception.

- A study of Black-White differences on a measure of emotional intelligence by Whitman, et al., that found (N=334) Blacks had higher face validity perceptions of the measure, but Whites performed significantly better.

- Last, a study by Vecchione that compared the fakability of implicit personality measures to explicit personality measures. Implicit measures are somewhat "hidden" in that they measure attitudes or characteristics using perceptual speed or other tools to discover your typical thought patterns; you may be familiar with Project Implicit, which has gotten some media coverage. Explicit measures are, as the name implies, more obvious items--in this case, about personality aspects. In this study of a relatively small number of security guards and semiskilled workers, the researchers found the implicit measure to be superior in terms of fakability resistance. (I wonder how the test-takers felt?)

That's it for this excellent issue of IJSA, but in the last few months we also got some more great research care of the March issue of the Journal of Applied Psychology:

- An important (but small N) within-subjects study by Judge, et al. of the stability of personality at work. They found that while traits exhibited stability across time, there were also deviations that were explained by work experiences such as interpersonal conflict, which has interesting implications for work behavior as well as measurement. In addition, the authors found that individuals high in neuroticism exhibited more variation in traits over time compared to those who were more emotionally stable. You can find an in press version here; it's worth a read, particularly the section beginning on page 47 on practical implications.

- Smith-Crowe, et al. present a set of guidelines for researchers and practitioners looking to draw conclusions from tests of interrater agreement that may assume conditions that are rarely true.

- Another interesting one: Wille & De Fruyt investigate the reciprocal relationship between personality and work. The researchers found that while personality shapes occupational experiences, the relationship works in both directions and work can become an important source of identity.

- Here's one for you assessment center fans: this study by Speer, et al. adds to the picture through findings that ratings taken from exercises with dissimilar demands actually had higher criterion-related validity than ratings taken from similar exercises!

- Last but not least, presenting research findings in a way that is understandable to non-researchers poses an ongoing--and important--challenge. Brooks et al. present results of their study that found non-traditional effect size indicators (e.g., a common language effect size indicator) were perceived as more understandable and useful when communicating results of an intervention. Those of you that have trained or consulted for any length of time know how important it is to turn correlations into dollars or time (or both)!
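If you haven't run into a common language effect size before, here's a rough sketch of one common formulation: under the assumption of two normal distributions with equal variances, a standardized mean difference d converts into the probability that a randomly chosen person from one group outscores a randomly chosen person from the other. The d value below is hypothetical, and this is not necessarily the exact indicator used in the study.

```python
from math import sqrt
from statistics import NormalDist


def common_language_effect_size(d):
    """Probability that a random member of group 1 outscores a random member
    of group 2, assuming normal distributions with equal variances."""
    return NormalDist().cdf(d / sqrt(2))


d = 0.5  # hypothetical standardized mean difference for an intervention
cl = common_language_effect_size(d)
print(f"d = {d:.2f} means a trained employee outperforms an untrained one "
      f"about {cl:.0%} of the time")
```

"Trained" and "untrained" are just illustrative labels; the point is that a percentage like this tends to be easier to explain to a manager than a d or a correlation.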

That's it for now!

Monday, February 18, 2013

Research update addendum

Okay, I know I just did a research update but I had a couple stragglers, including a pretty important one, namely the Spring 2013 Personnel Psychology, which is free right now!

- First, an important piece by Bobko and Roth updating d values for Black-White differences on selection measures. Their updated analysis indicates measures such as biodata and assessment centers may have d values as large as paper-and-pencil tests of cognitive ability. Personality measures still benefit from small differences. They include a helpful table that breaks down the values by construct, and they also include a list of factors that can impact d, such as job complexity and range restriction.

- Second, a fascinating study of aberrant personality tendencies and their impact on career outcomes conceptualized using the Five-Factor Model and measured using the NEO PI-R. More evidence that "dark side" personality traits are an important consideration in predicting career trajectories.
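Since d values come up constantly in this literature, here's a minimal sketch of how a standardized mean difference (Cohen's d with a pooled standard deviation) is computed from two groups of scores. The scores are invented for illustration and the function name is my own; meta-analytic d values like those Bobko and Roth report involve additional steps (e.g., accounting for range restriction) that this doesn't attempt.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b), pooled SD."""
    na, nb = len(group_a), len(group_b)
    pooled_variance = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical test scores for two applicant groups.
group_a = [72, 85, 90, 78, 88, 95, 81]
group_b = [70, 80, 86, 75, 83, 91, 77]

print(f"d = {cohens_d(group_a, group_b):.2f}")
```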

The other that just came out is the February 2013 Journal of Applied Social Psychology:

- First, a look at moderators of the relationship between employee weight and job-related outcomes.

- Second, a study that I think has implications for selection: looking at circumstances under which competitors copy their opponents' choices. I've observed over and over again that when an employee gets a competing offer, suddenly their attractiveness increases. Perhaps not the same phenomenon, but worth exploring.

- Next, another study of discrimination, this time age discrimination in within- and between-career job changes. Results indicated discrimination against older workers was particularly pronounced when older applicants were making between- rather than within-career changes.

- Okay, I'm sensing a theme to this post. In this study, the authors looked at how the wording of occupational descriptions activates gender stereotypes.

- Finally, something not about discrimination: the authors of this (small N) study found that a perceived aspect of emotional intelligence predicted perceived negotiation success beyond traditional personality traits.

And on that note...until next time!

Tuesday, April 26, 2011

Research update: Political skill, stereotype threat, and NFL players

A few research articles for us...

First up, several articles from the latest issue of Human Performance:

Lee & Dalal demonstrate in a policy-capturing study that performance "troughs" exceed "peaks" in their influence on performance ratings.

Next, a fascinating study by Meurs et al. where they show how political skill (or networking ability) moderates the relationship between the HEXACO factor of sincerity and task performance. In other words, for individuals high on political skill the authors found a positive relationship between sincerity and task performance (and a negative relationship for those low on the skill).

Are you recruiting highly educated graduates? Then you'll want to read Jaidi, et al.'s piece. In it, they describe a study where recruitment advertising and positive word of mouth related positively to job pursuit intention and behavior. Somewhat surprisingly, on-campus presence related negatively to these measures.

If you like football and/or physical ability tests, you'll be interested in the study by Lyons, et al. of NFL players. In it, they demonstrate that collegiate game performance out-predicted physical ability tests administered during the NFL Combine when looking at future NFL performance. And unlike physical ability, past performance remained a consistent predictor across four years of performance, although the criterion coefficients deteriorated over time, similar to what we find with cognitive ability scores.

Finally, over in the Journal of Applied Social Psychology, Nadler & Clark report the results of research on stereotype threat. The results of their meta-analysis indicated that attempts to nullify stereotype threat (e.g., by dismissing it or disguising the task) resulted in a moderate improvement in score (d=.52) for both African Americans and Hispanic Americans, and there appeared to be no difference between the groups in terms of the effect.

Sidenote: those of you with an interest in HR technology and talent management might want to check out the six sessions being streamed live from Bersin & Associates' IMPACT 2011 conference on April 27th and 28th.

Monday, January 24, 2011

January, 2011 J.A.P.: Interests, trainability tests, interns, and personality tests

The January issue of the Journal of Applied Psychology is out with some great content, so let's jump right in.

First up, an intriguing study by Van Iddekinge, et al. on an interest inventory. Even though many suspect vocational interest plays a part in motivating work behavior, historically the published relationship between interest and job performance has not been strong. The authors of this study created a new measure of interest and using a decent sample size (418) found surprisingly high (corrected) correlation values between scores and various criteria, including job knowledge, job performance, and continuance intentions (mean R = .31). Scores also predicted additional variance beyond cognitive ability and measures of the Big 5 personality dimensions. Could we be on the cusp of a revolution in measures of interest? This could help bridge the gap between KSAs and discretionary effort.

Next, Roth et al. with an update on trainability tests--their predictive validity as well as Black-White score differences. As a refresher, trainability tests are a sub-category of work sample tests that involve a structured period of learning for applicants and are designed to measure how well they can learn a new skill. Previous (limited) research indicated they predict training performance fairly well but job performance less so, and this decreases over time. However, the authors of the current study show using data from a recent video-based trainability exam that the validity may be higher than we thought. Unfortunately it also showed a high level of mean differences between Black and White applicants, matching or exceeding that typically found for cognitive ability (which the test correlated highly with).

Did someone say personality testing? (no, but you knew it was coming) Le et al. are up next with an update on the curvilinear relationship between personality scores and job performance. Using two different samples, the authors found not only the hypothesized curvilinear relationship but that the inflection point (after which the relationship disappears) occurs later in jobs that are more complex--similar to the relationship between experience and performance. So for example, in a more complex job, scores on Conscientiousness may keep correlating with job performance (and OCBs, and CWBs) up to a higher score level than they would for, say, a retail clerk. Important for anyone making assumptions about what personality inventory scores imply.

Next up is your second personality test article, this time by Landers et al., who provide a warning about faking. The authors noticed a new trend in responses, which they label "blatant extreme responding" (BER; not listed as an X Games sport), indicated by answering all "1"s or "5"s on an inventory. They hypothesize that this is due to a coaching rumor, which seems to have been supported by the fact that internal retesters showed a higher prevalence of BER than the general sample. On the plus side, an interactive warning seems to have reduced the spread. Hard to tell if this is anything new, since we know faking does indeed occur--the debate is over its impact on validity.

Last but not least, Zhao and Liden write about internship programs and the impression management that occurs on the part of interns as well as the organizations. Not surprisingly, interns that wished to get hired by the organization were more likely to use self-promotion and ingratiation, which increased the likelihood of receiving a job offer. Perhaps more interesting is the finding that organizations wishing to hire the interns permanently exhibited more openness to creativity on the part of the interns, which in turn increased the likelihood that interns would apply. Lesson? If you have a good intern that you want to bring on full-time, solicit and be open to their suggestions.

Friday, July 16, 2010

July 2010 J.A.P.

A new round of journals is out, so let's start with the July issue of the Journal of Applied Psychology.

First up, Schleicher et al. looked at whether there were demographic differences in how much candidate scores improved upon retesting. Turns out there were several. Whites showed larger improvements than Blacks or Hispanics on several assessments, particularly on written tests. Women and applicants under 40 showed greater improvements than men and applicants 40+. Implications? In some situations allowing applicants to retest may exacerbate adverse impact.
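For anyone newer to the term, here's a minimal sketch of how adverse impact is commonly screened for using the four-fifths rule of thumb from the Uniform Guidelines: compare each group's selection rate to that of the group with the highest rate. The applicant and pass counts are hypothetical (not from the study), and in practice this ratio is usually paired with a statistical significance test.

```python
def adverse_impact_ratio(focal_selected, focal_applicants,
                         reference_selected, reference_applicants):
    """Selection rate of the focal group divided by that of the reference
    (highest-rate) group."""
    focal_rate = focal_selected / focal_applicants
    reference_rate = reference_selected / reference_applicants
    return focal_rate / reference_rate


def flags_four_fifths(ratio, threshold=0.8):
    """True when the ratio falls below the four-fifths (80%) rule of thumb."""
    return ratio < threshold


# Hypothetical pass counts before and after a retest opportunity.
before = adverse_impact_ratio(40, 100, 50, 100)  # 40% vs. 50% pass rates
after = adverse_impact_ratio(45, 100, 65, 100)   # reference group improves more

for label, ratio in (("before retest", before), ("after retest", after)):
    print(f"{label}: ratio = {ratio:.2f}, flagged = {flags_four_fifths(ratio)}")
```

In this made-up example both groups improve on retest, but because the reference group improves more, the ratio drops below the four-fifths threshold, which is the kind of scenario the retesting finding warns about.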

Next, an important piece by Aguinis et al. (that you can read here) about test bias. This follows on the heels of the June IOP articles on the same topic and seems to represent a resurgence of interest in a topic that seemed dormant. In this article the authors report the results of a very large Monte Carlo simulation (billions and billions of data points) where they found that if bias is measured using slope-based techniques, it's likely to go undetected, and intercept-based bias favoring minority group members is likely to be found when in fact it does not exist. This study, combined with points made in the IOP article, suggests that some of the "established" conclusions regarding test bias may not be as solid as we thought.

Third, for those of you interested in differential functioning (of items or scales), you should check out the piece by Adam Meade where he presents a taxonomy of potential differential functioning effect sizes and also describes a software program created for computing the indices and graphing differential functioning.

Next, a piece by Wang et al. on locus of control. Importantly, they found that when locus of control (LOC) is specific to work-related issues, there are stronger correlations between LOC and work-related criteria such as job satisfaction and commitment. Similarly, when LOC is defined more broadly to include non-work issues, there are some stronger correlations with non-work criteria such as life satisfaction. Implications? Much like research on personality items, specifying a work-related context would seem to increase the predictive power of LOC measures.

Last but not least, an important article on counterproductive work behavior (CWB) and organizational citizenship behavior (OCB) by Spector, et al. CWB and OCB seem like they should be opposites of each other--one demonstrated by disengaged, unhappy workers, the other by engaged, happy ones--right? Not so fast. The authors report the results of an experiment that suggest that the concepts are unrelated and do not necessarily have opposite relationships with other variables. The authors also recommend that when measuring these behaviors, frequency of performance be used rather than level of agreement.

Sunday, May 23, 2010

June 2010 IJSA

The summer journal season continues with the June 2010 issue of the International Journal of Selection and Assessment. Take a deep breath, there's a lot of stuff packed into this issue:

- Roth et al. provide evidence that women outperformed men on work sample exams that involved social skills, writing skills, or a broad array of KSAs. To the extent that an employer is trying to avoid discriminating against female applicants, this provides support for work sample usage.

- In a study of managers in Taiwan, Tsai et al. show that the most effective way an applicant can make up for a slip in an interview is to apologize (vs. attempting to justify or use an excuse).

- Jackson et al. strive to add some clarity on task-based assessment centers

- Blickle & Schnitzler provide evidence of the construct and criterion-related validity of the political skill inventory

- Colarelli et al. studied how racial prototypicality and affirmative action policies impact hiring decisions. Results of a resume review indicated more jobs were awarded to black candidates as racial prototypicality and affirmative action policy strength increased, but stronger AA policies decreased the percentage of minority hires attributed to higher qualifications.

- In my personal favorite article of the issue, Karl et al. found in a study of U.S. and German students that those low on conscientiousness (especially), agreeableness, and emotional stability were more likely to post "Facebook Faux Pas". This provides some support for employers who screen out applicants based on inappropriate social networking posts. I'll talk more about this in my upcoming webinar.

- Denis, et al. provide support for the NEO PI-R's ability to predict job performance in two French-Canadian samples.

- Bilgiç and Acarlar report results of a study of Turkish students and perceptions of various selection instruments. Interviews were rated most highly and there were some differences in terms of privacy perceptions depending on the goal orientation of the student.

- Trying to figure out how to hire better direct support professionals (e.g., those providing long-term residential care or care to those with disabilities)? Robson, et al. describe the development of a composite predictor composed of various measures (e.g., agreeableness, numerical ability) that predicted performance, satisfaction, and turnover.

- Ahmetoglu et al. provide support for using the Fundamental Interpersonal Relationship Orientations-Behaviour (FIRO-B) to predict leadership capability.

- Ispas et al. describe results of a study that showed support for a nonverbal cognitive ability measure (the GAMA) in predicting job performance in two samples.

- Last but not least, in another win for context-specific assessments, Pace & Brannick show how a measure of openness to experience tailored to specific work outpredicted the comparable general NEO PI-R scale. IMHO this is how personality measures will eventually become more prominent and accepted as pre-hire assessments.

Sunday, January 31, 2010

Lessons from the NYC Fire case - part 1

Part 1 of 2

New York City, like the cities of New Haven and Chicago, has a long history of employment discrimination litigation related to its firefighter testing.

Since the 1970s and cases like Guardians, the city has been under scrutiny for its woefully low number of black firefighters.

In 2007 the city found itself faced with another lawsuit over its firefighter hiring practices, and in July of 2009, a U.S. District Court judge found that the city had violated Title VII by administering written exams from 1999-2007 that had high levels of adverse impact. The city marshaled an inadequate defense. In January of 2010, the same judge (Nicholas Garaufis) found the city liable for a pattern and practice of disparate treatment for those same exams. An adverse impact finding, particularly for written exams, and especially for public safety tests, is not earth-shattering. But a finding of disparate treatment in this situation is less common.

This case, while only one example and limited in its impact, has some valuable lessons for test users and sheds some light on how judges look at our field. In particular, I describe below nine points the judge specifically made and what lessons we can draw from them:

1) While the city conducted a job analysis with an "extensive" list of tasks and surveyed incumbents, the city offered "no evidence of 'the relationship of abilities to tasks.'" They conducted a linkage, but the judge found that the SMEs were confused about what they were supposed to do and didn't understand several of the abilities they were rating.

Lesson: simply having subject matter experts (SMEs) link essential tasks and knowledge, skills, and abilities (KSAs) is not sufficient. You need to ensure they understand the statements they are linking as well as how exactly they are supposed to be linking them.

2) In conducting the job analysis, the city inappropriately retained tasks and KSAs that could be learned on the job. It is quite clear (e.g., per the Uniform Guidelines) that only tasks and KSAs that are required upon entry to the job should be identified as critical in terms of exam development.

Lesson: make sure that when you are developing exams based on job analysis results, you focus only on those tasks and KSAs that are required upon entry to the job. This should be determined by your SMEs.

3) The city relied to some extent upon the work of a previous test developer, Dr. Frank Landy (who sadly recently passed away). In addition to a tenuous link between Dr. Landy's work and the current exams, the judge makes it clear that "reliance on the stature of a test-maker cannot stand in for a proper showing of validity." At the same time, the judge emphasizes that exams should be constructed by "testing professionals."

Lesson: tests should be developed by people who know what they're doing. This means HR professionals with the requisite background in test validation and construction in conjunction with job experts. Do not rely solely on previous efforts, particularly when (as in this case) the results of those efforts were either incomplete or not fully relevant to your current situation.

4) The city performed no "sample testing" to ensure that the questions were reliable as well as "comprehensible and unambiguous."

Lesson: few steps in the test development process are as easy--or as valuable--as pilot testing. I have yet to see an exam that didn't benefit from a "trial run" with a group of incumbents. Not only will you catch unintended flaws, you will verify that the exam is doing what you claim it is.

5) There was insufficient evidence that the exams actually measured the (nine cognitive) KSAs the city claimed they intended to measure. Plaintiffs were able to suggest the opposite through analyzing convergent and discriminant validity as well as by conducting a factor analysis.

Lesson: there are two linkages of primary importance in test development. The first was described in #1. The second is the link between critical KSAs and the exam(s). At the very least, you must be able to show evidence that there is a logical link between the two. When you claim to be measuring cognitive abilities, you incur an additional responsibility, which is gathering statistical evidence that supports this claim.

Next time: more lessons and the relief order.

Monday, October 19, 2009

Is recruiting using SNS discriminatory?

I keep reading/hearing about how recruiting using social networking sites (SNS) opens employers up to discrimination lawsuits because of who uses the sites. For the most part, this just plain isn't true.

A recent Pew study is the latest to show that when it comes to using SNS like Facebook, MySpace, and LinkedIn, you really should have one primary demographic concern when it comes to ensuring a diverse candidate pool: age.

Not gender, at least not in the traditional sense. While four years ago SNS users tilted slightly male (55%), the balance has essentially flipped today (54% female).

Not race: there simply do not appear to be generalizable differences in racial groups when it comes to these sites (in fact I've seen some data that suggest the user base on these sites is more diverse)--but things change, and this may vary with particular sites, so keep an eye on this one.

But when it comes to age, SNS users are disproportionately younger than the overall Internet population. In the words of the Pew report, "[this] doesn't mean that more older adults aren't flocking to SNS--they are--but younger adults are ALSO flocking to the sites, so the overall representation of the age cohorts in the SNS user population has actually gotten younger."

One demographic difference I don't see a whole lot about: disability status. Are individuals with disabilities more/less likely to use SNS? I think that's an important question we need to address if we're truly trying to diversify our candidate pools.

Tuesday, November 25, 2008

Giving thanks for research

It's almost Thanksgiving here in the U.S., a time to give thanks, and I'd like to thank a largely unsung group of people. Thank you to all the researchers out there who try to help us put some science around the art we call personnel recruitment and selection. Thank you for all your work and insights.

What better way to celebrate this wish of thanks than by talking about a new issue of the International Journal of Selection and Assessment (v16, #4)! As usual it's chock full of good articles, so let's take a look at some of them.

First, a study of applicant perceptions of credit checks, something many of us do for sensitive positions. Using samples of undergraduates, Kuhn and Nielsen found mostly negative reactions, especially for older participants, but they varied with the explanation given as well as privacy expectations. Worth a look for any of you that conduct large numbers of background checks (and if you do, don't miss the Oppler et al. study below).

Next up, a fascinating study of police officer selection in the Netherlands. Using data from over 3,000 applicants, De Meijer et al. found evidence for differential validity between ethnic majority and minority participants. Specifically, cognitive ability tests predicted training performance for minorities but not for those in the majority. Performance prediction for the latter group was low for cognitive ability tests and somewhat better using non-cognitive ability variables. By the way, the dissertation of the primary author, a fascinating look at similar issues, can be found here.

The third article is one of those articles that almost (...almost) makes me want to pay for it, and anybody interested in electronic applicant issues take note. In this study, Dunleavy et al. used simulations to show the tremendous impact that small numbers of applicants can have on adverse impact (AI) analysis. In fact, the authors reveal situations where AI can be caused or masked by a single applicant applying multiple times! The authors present ways of identifying and handling these cases. Scary stuff. Hope the OFCCP is reading.
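
To make the repeat-applicant problem concrete, here is a minimal Python sketch (mine, not the authors'). It computes the impact ratio used in the four-fifths rule from a list of application records, then recomputes it after collapsing each person's applications into a single record. The record layout, the function names, the made-up numbers, and the rule "count a person as hired if any of their applications succeeded" are all assumptions for illustration; Dunleavy et al. present their own ways of identifying and handling these cases.

```python
from collections import namedtuple

# Hypothetical record layout: one row per application, not per person.
Application = namedtuple("Application", ["applicant_id", "group", "hired"])


def selection_rates(applications):
    """Return {group: hires / applications} for each group present."""
    totals, hires = {}, {}
    for app in applications:
        totals[app.group] = totals.get(app.group, 0) + 1
        hires[app.group] = hires.get(app.group, 0) + int(app.hired)
    return {group: hires[group] / totals[group] for group in totals}


def impact_ratio(applications, focal, reference):
    """Focal group's selection rate divided by the reference group's."""
    rates = selection_rates(applications)
    return rates[focal] / rates[reference]


def dedupe(applications):
    """Collapse to one record per applicant_id (assumed rule: hired if
    any of that person's applications resulted in a hire)."""
    merged = {}
    for app in applications:
        previous = merged.get(app.applicant_id)
        hired = app.hired or (previous.hired if previous else False)
        merged[app.applicant_id] = Application(app.applicant_id, app.group, hired)
    return list(merged.values())


# Made-up data: 20 reference applicants (10 hired), 10 focal applicants (5 hired).
reference = [Application(f"R{i}", "reference", i < 10) for i in range(20)]
focal = [Application(f"F{i}", "focal", i < 5) for i in range(10)]
# One unsuccessful focal applicant (F9) applies six more times.
repeats = [Application("F9", "focal", False) for _ in range(6)]
raw_applications = reference + focal + repeats

print("Raw impact ratio:", impact_ratio(raw_applications, "focal", "reference"))
print("Deduplicated impact ratio:",
      impact_ratio(dedupe(raw_applications), "focal", "reference"))
```

In this invented data set both groups are hired at the same rate once each person is counted once, yet the raw application counts put the ratio below four-fifths. One persistent applicant is enough to do it.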

Fourth, Lievens and Peeters present results of a study of elaboration and its impact on faking situational judgment tests. Using master's students, the researchers found that requiring elaboration on items (i.e., the reason they chose the response) had several positive results. It reduced faking on items with high familiarity. It also reduced the percentage of "fakers" in the top of the distribution. Lastly, candidates reported that the elaboration allowed them to better demonstrate their KSAs. This could be a great strategy for those of you worried about the inflation effects of administering SJTs online.

Next, Furnham et al. with a study of assessment center ratings. The authors found that expert ratings of "personal assertiveness", "toughness and determination", and "curiosity" were significantly correlated with participant personality scores, particularly Extraversion. Correlations with intelligence test scores were low.

Last but definitely not least, Oppler et al. discuss results of a rare empirical study of financial history and its relationship to counterproductive work behaviors (CWBs). Using a "random sample of 2519 employees" the authors found that those with financial history "concerns" were significantly more likely to demonstrate CWBs after hire. Great support for conducting these types of checks.

There are other articles in here, so I encourage you to check them all out. Thank goodness for research!

Tuesday, September 02, 2008

Power v. Group Differences

In a recent post I wrote about a chart my co-workers and I created to help us communicate with hiring supervisors about the pros and cons of various testing instruments. That graph mapped power (validity) on one axis, and speed of administration on the other.

One of the comments on that post mentioned it would be nice to see power vs. group differences. I agreed. So here it is!

The bottom line on this graph (no pun intended): if you're looking for the best combination of both, look to the upper left quadrant.

A few notes of caution before interpreting the graph:

- this graph charts only Black-White differences, which is the largest data set we have. It's important to remember that combinations of other groups (including gender) will yield slightly different results.

- the evidence on group differences for T&Es is rather scant. Not much has been found, but that doesn't mean differences couldn't be found in the future, depending on what specific training or experience is being measured.

- finally, as the excellent recent article by Roth, et al. reminds us, adverse impact in your selection process depends on several factors, including the specific test or construct, the selection ratio, your applicant pool, and the order you place your assessments in.

Monday, August 25, 2008

Adverse impact on personality and work sample tests

The latest issue (Autumn, 2008) of Personnel Psychology has so much good stuff in it that I'm going to split it into two parts.

The first part, which I'll do today, focuses on the more selection-oriented articles which have to do with adverse impact on personality tests and in work sample exercises. In my next post I'll talk about three more articles that have to do with strategic HRM practices.

Today let's talk about adverse impact. It's a persistent dilemma, particularly given many employers' desire to promote diversity and the legal consequences of failing to avoid it. One of the "holy grails" of employee assessment is finding a tool that is generally valid, inexpensive to implement, and does not result in large amounts of adverse impact.

One type of instrument that has been suggested as fitting these criteria is the personality test. They're easy to administer and can be valid predictors of performance, but our knowledge of group differences has up until now been limited. In this issue of Personnel Psych, Foldes, Duehr, and Ones present meta-analytic evidence that attempts to fill in the blanks.

Their study of Big 5 personality factors and facets is based on over 700 effect sizes. So what did they find? There is definitely value to separating the factors from the facets, as they show different levels of group difference. And most of the group differences (in cases with decent sample sizes) were small to moderate. Here are some of the largest and most robust findings (e.g., 90% confidence interval does not include zero):

- Whites scored higher than Asians on even-temperedness (an aspect of emotional stability; d=.38)
- Hispanics scored higher than Whites on self-esteem (an aspect of emotional stability; d=.25)
- Blacks outscored Asians on global measures of emotional stability (d=.58)
- Blacks outscored Asians on global measures of extraversion (d=.41)
- Hispanics outscored Blacks on sociability (d=.30)

The article includes a very useful chart that summarizes the findings and includes indications of when adverse impact may occur given certain selection ratios. What I take away from all this is that the classic racial discrimination situation employers are worried about in the U.S. (Whites scoring higher than another group) is less of a concern with personality tests than with, say, cognitive ability tests. But (and this is a big but), it doesn't take much group difference to result in adverse impact (see Sackett & Ellingson, 1997).
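
To see why a small d can still matter, here is a rough Python sketch of the arithmetic. It is my own illustration under textbook assumptions, not a reproduction of the chart in the article or of the Sackett & Ellingson tables: scores in both groups are assumed to be normally distributed with equal standard deviations, the lower-scoring group's mean sits d standard deviations below the other group's, and everyone above a single cut score is selected. The function name and the example values are hypothetical.

```python
from statistics import NormalDist


def adverse_impact_ratio(d, higher_group_selection_rate):
    """Ratio of the lower-scoring group's selection rate to the
    higher-scoring group's, given a standardized mean difference d.

    Assumes normal scores, equal SDs in both groups, and one cut score.
    """
    std_normal = NormalDist()  # mean 0, SD 1
    # Cut score expressed in SD units of the higher-scoring group.
    cutoff = std_normal.inv_cdf(1 - higher_group_selection_rate)
    # The lower-scoring group's mean is d SDs lower, so relative to that
    # group the same cut score sits d SDs higher.
    lower_group_rate = 1 - std_normal.cdf(cutoff + d)
    return lower_group_rate / higher_group_selection_rate


# The same modest difference (d = .25) at a lenient and a strict cut score.
for rate in (0.90, 0.50, 0.10):
    ratio = adverse_impact_ratio(0.25, rate)
    flag = "below four-fifths" if ratio < 0.8 else "at or above four-fifths"
    print(f"selection rate {rate:.0%}: impact ratio {ratio:.2f} ({flag})")
```

Under these assumptions the same d of .25 clears the four-fifths threshold when most applicants are selected and falls below it when only the top 10% are, which is the selection-ratio point made above.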

The second article is also about group differences. This time it's work sample tests and it's a meta-analysis of Black-White differences by Roth, Bobko, McFarland, and Buster.

The authors analyzed 40 effect sizes in their quest to dig further into this subject--and it's a good thing they did. A group difference (d) benchmark often cited for these exercises is .38 in favor of Whites. These authors obtained a value of .73, but with an important caveat--this value depends greatly on the particular work sample test.

For example, in-basket and technical exercises (e.g., reading a construction map) yielded d values of .74 and .76, respectively. On the lower end, oral briefings and role-plays had d values of .22 and .21, respectively. Scheduling exercises were in the middle at d=.52.

Why the difference? The authors provide data that indicates the more saturated with cognitive ability/job knowledge the measure, the higher the d values. The more the exercise requires demonstrating social skills, the lower the d values.
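
For anyone less familiar with the d statistic used throughout these studies, here is a short Python sketch of the usual pooled-standard-deviation version (Cohen's d). The function name and the toy scores are made up for illustration, and meta-analyses like this one typically apply corrections that this sketch leaves out.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference: (mean A - mean B) / pooled SD.

    A positive value means group A scored higher on average.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Toy exercise scores for two groups of three candidates each.
print(cohens_d([4, 5, 6], [3, 4, 5]))
```

The sign depends only on which group you list first, so it pays to check each study's convention before comparing d values across articles.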

Bottom line? Your choice of selection measure should always be based on the KSAs required per the job analysis. But given a choice between different exercises, consideration should be given to the group differences described above. Blindly selecting a work sample over, say, a cognitive ability test, may not yield the diversity dividends you anticipate (in addition to the fact that they may not be as predictive as we previously thought!).

Some important caveats should be noted about both of these pieces of research: (1) adverse impact is heavily dependent on factors other than group differences, such as applicant population, selection ratio, and stage in the selection process; and (2) from a legal perspective, adverse impact is only a problem if you don't have the validity evidence to back it up. Of course you should have this evidence anyway, because that's how you're deciding how to filter your candidates...right?

Wednesday, May 14, 2008

Adverse impact of assessment centers (May Applied Psych)

The May '08 issue of the Journal of Applied Psychology is out with lots of great content. Unfortunately only one article is directly related to recruitment and assessment, so let's check that one out, then I'll give you links to some others that look interesting.

The study is Ethnic and gender subgroup differences in assessment center ratings: A meta-analysis by Dean, Roth, and Bobko. The authors found overall d-values of .52 for Black-White differences, .28 for Hispanic-White differences, and -.19 for male-female differences. (the second group in these comparisons performs better)

The results suggest that the Black-White difference is larger than previously thought but may be a more "diversity friendly" option for Hispanics and females.

There are some other great articles in here for fans of organizational behavior, including:

Subjective cognitive effort: A model of states, traits, and time. (which, by the way, suggests another reason why conscientiousness may predict job performance)

Early predictors of job burnout and engagement.

Event justice perceptions and employees' reactions: Perceptions of social entity justice as a moderator.

Harmful help: The costs of backing-up behavior in teams.

Trust that binds: The impact of collective felt trust on organizational performance.

Stirring the hearts of followers: Charismatic leadership as the transferal of affect.

The influence of psychological flexibility on work redesign: Mediated moderation of a work reorganization intervention.

...and several more!

Tuesday, March 25, 2008

Too fat or too thin? You may not get hired.

Job candidates that are either too fat or too thin may have a more difficult time getting hired than those in the middle weight ranges, according to a study by Swami, et al. reported in the most recent issue of the Journal of Applied Social Psychology.

Weighting in line
The authors found that when men were asked to rate a variety of female pictures for either a management position or for providing help (N=30 and 28, respectively), they were less likely to hire or help women with body mass indices (BMI) over 30 or under 15. Those with a slender body (BMI = 19-20) were most likely to be hired or helped. This shouldn't be surprising, given that studies have consistently linked physical attributes, including weight, with employment decisions, but it's certainly a reminder to watch your biases when evaluating candidates!

Predict-ability
In another article, Truxillo et al. found a relationship between cognitive ability and the ability to accurately judge one's performance on an employment test. Using a video-based situational judgment test of customer service skills, the authors found that those with high cognitive ability were able to predict their performance while those with low cognitive ability were not. Practical implications? Providing thorough test feedback may be particularly important for candidates lower in cognitive ability as they may be more likely to be surprised (and dismayed) by the results. This means providing information prior to the test as well as afterward (e.g., how it was developed, how it is scored, how you can improve your performance).

Working IT
In a third study, Johnson, et al. found gender and ethnic group differences in how IT careers are perceived as well as in self-efficacy related to IT. Using data from 159 African- and 98 Anglo-Americans, the authors found that African American men reported higher levels of IT self-efficacy than all other groups, whereas Anglo women reported the lowest levels. In addition, Anglos had more negative stereotypes of IT professionals than did African Americans. This study had a small sample size, but the implication is that how people see their own ability related to an occupation, as well as how they perceive those in it, influences their career choices. This will in turn impact your applicant demographics as well as your recruiting success.

The rest
There are some other interesting reads in here, including:

When emotional displays of leaders may increase follower performance

How to give performance feedback

Self-perceptions of ethical behavior

Tuesday, May 08, 2007

2007 SIOP Conference: Highlights, Part 2

This is the third in a series of posts about the 2007 SIOP Conference. In Part 1 I talked about some of the new products out there and in Part 2 I went over some of the research that was presented. In this post I'll point out some more research that you may find interesting...

Legal risks and defensibility factors for employee selection procedures

Posthuma, Roehling, and Campion analyzed nearly 600 federal district court cases and came up with some very interesting results:

- Employers are most likely to win (by far) when defending tests of math or mechanical ability. Employers also fare well when defending assessments of employment history and interviews.

- Employers did worst when defending physical ability tests and medical examinations. Tests of verbal ability and job knowledge were also more likely to result in a plaintiff win.

Predicting Internet job search behavior and turnover

Using a sample of 110 nurses in Texas, Posthuma et al. found using longitudinal survey data that (among other things) Internet job search behavior was related to turnover--folks weren't just surfing for fun. This suggests that organizations need to pay close attention to job searching behavior among employees; not necessarily to curtail it but instead to figure out why high performers want to leave.

Gender differences in career choice influences

After analyzing survey data from nearly 1,400 fourth-year medical students from two U.S. schools, Behrend et al. found a gender difference in preferred career: specifically, female medical students valued "opportunities to provide comprehensive care" when choosing a specialty much more than men. This is consistent with other work that has shown women to be more "relationship-oriented" than men when it comes to choosing a career.

Portraying an organization's culture through properties of a recruitment website

In this study of 278 undergraduate students, Kroustalis and Meade found that inclusion of pictures on a website that were intended to portray a certain organizational culture did so--but only for certain cultural characteristics. Specifically, pictures that implied a culture of either innovation or diversity had the intended effect--but pictures representing a team orientation did not. Interestingly, "employee testimonials" designed to emphasize these cultural aspects failed to do so for any of the three aspects studied. Finally, individuals who perceived a greater fit between themselves and the organization (in terms of the three cultural aspects) reported being more attracted to the organization.

Recruiting solutions for adverse impact: Race differences in organizational attraction

Last but definitely not least, Lyon and Newman gathered data from nearly 600 university students on their reactions to 40 hypothetical job postings...and came away with some very interesting results. For example:

- Conscientious individuals were more likely to apply to postings that explicitly stated a preference for conscientious applicants.

- Conscientious individuals were more likely to apply to postings that described the company as results-oriented.

- Black applicants with higher cognitive ability were more likely to respond to ads seeking conscientious individuals while White applicants with higher cognitive ability were less likely to do so.

- When a company was described as innovative, Black applicants high on conscientiousness were more likely to apply; this was not the case for White applicants.