You may have heard that Outback Steakhouse, a restaurant chain based in Tampa, Florida, has agreed to settle a gender discrimination lawsuit for $19M. What's interesting about this isn't the size of the settlement, but rather the conditions attached.
Background: The EEOC sued Outback in 2006, claiming the company systematically discriminated against its female employees by denying them promotions to the more lucrative profit-sharing management positions. The agency also claimed that female employees were denied promotional job assignments such as kitchen management, which were required for employees to be considered for top management positions.
The settlement: Outback agreed to a four-year consent decree and $19M in monetary relief. So far, pretty standard. But there were additional settlement requirements, and here's where it gets interesting. In addition to the monetary relief, Outback has agreed to:
1. Create an online application system for employees interested in management positions. This is the first time I've seen this in a settlement (which isn't to say it hasn't happened) and seems to indicate that the EEOC views this as a more "objective" screening mechanism.
2. Create a new human resources executive position--Vice President of People--and hire someone to fill it. Again, this is a new one for me.
3. Hire an outside consultant for at least two years to monitor the online application system, ensure women are being provided equal opportunities for promotion, and report to the EEOC every six months.
The main thing that strikes me about this settlement is the faith that is being placed in an online application system to somehow ensure equal opportunity. Sure, having a standardized application system may cut down on some of the subjectivity of individual hiring supervisors, but it leaves me wondering:
- What will the screening criteria for management positions be?
- How will the outside consultant define "equal opportunities"?
- How will access to the online system be controlled, and who will be making screening/hiring decisions?
- What happens if there continues to be adverse impact, which you would expect if applicants continue to be screened on experience?
- What will be the duties of the Vice President of People, how will they be hired, and how will they interact with the consultant?
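On the adverse impact question in particular, the usual first screen is the EEOC's four-fifths (80%) rule: compare each group's selection rate to the highest group's rate. Here's a minimal sketch of the arithmetic; all the numbers are invented for illustration, not taken from the Outback case:

```python
# The EEOC's four-fifths (80%) rule of thumb for adverse impact.
# All numbers below are invented for illustration.

def selection_rate(selected, applicants):
    return selected / applicants

def impact_ratio(focal_rate, highest_rate):
    """Focal group's selection rate divided by the highest group's rate.
    A ratio below 0.80 is the conventional flag for adverse impact."""
    return focal_rate / highest_rate

# Suppose 30 of 100 men and 18 of 90 women are promoted to kitchen manager.
men_rate = selection_rate(30, 100)      # 0.30
women_rate = selection_rate(18, 90)     # 0.20
ratio = impact_ratio(women_rate, men_rate)
print(f"impact ratio = {ratio:.2f}")    # below 0.80 -> potential adverse impact
```

Note the rule is only a screening heuristic; whether the consultant applies this, a statistical significance test, or something else entirely is exactly the kind of detail the settlement leaves open.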
This will be interesting to watch.
Wednesday, December 30, 2009
Sunday, December 20, 2009
What makes a test "valid"? What is the best way to develop a selection system? These are two of the most fundamental questions we try to answer as personnel assessment professionals, yet the answers are strangely elusive.
First of all, let's get two myths out of the way: (1) that a test is either valid or invalid, and (2) that there is a single approach to "validating" a test. It is the conclusions drawn from test results that are ultimately judged on their validity, not simply the instruments themselves. You may have the best test of color vision in the world--that doesn't mean it's useful for hiring computer programmers. And many sources of evidence can be used when making the validity determination; this is the so-called "unitary" view of validity described in references like the APA Standards and the SIOP Principles. Unitary in this case refers to validity being a single, multi-faceted concept, not that psychologists agree on the concept of validity--a point we'll come back to shortly.
Although we can debate test validation concepts ad infinitum, the bottom line is we create tests to do one primary thing: help us determine who will perform the best on the job. The validation concept that most closely matches this goal is criterion-related validity: statistical evidence that test scores predict job performance. So we should gather this evidence to show our tests work, right? Here's where things get complicated.
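In its simplest form, that evidence is just the correlation between test scores and a later measure of job performance. A toy sketch with invented data (a real study obviously needs a decent sample size and a credible criterion measure, which is precisely the problem discussed next):

```python
# Minimal sketch of criterion-related validity evidence: the correlation
# between test scores and a later job-performance measure. Data invented.
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

test_scores = [72, 85, 64, 90, 78, 69, 88, 75]   # hypothetical exam scores
performance = [3.1, 4.0, 2.8, 4.4, 3.5, 3.0, 4.1, 3.3]  # later appraisal ratings
print(f"validity coefficient r = {pearson_r(test_scores, performance):.2f}")
```

The coefficient is only as meaningful as the criterion measure behind it--which is why a crude appraisal scale undermines the whole exercise.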
It's likely that many organizations can't, for various reasons, conduct criterion-related validity studies (though baseline evidence of how common this is would be helpful). Most of the time, it's because they lack the statistical know-how or high-quality criterion measures (a 3-point appraisal scale won't do it). So in a strange twist of fate, the evidence we are most interested in is the evidence we are least likely to obtain.
So what are organizations to do? Historically the answer is to study the requirements of the job and select/create exams that target the KSAs/competencies required; this matching of test and job is often referred to as "content validity" evidence. But Kevin Murphy, in a recent article in SIOP's journal Industrial and Organizational Psychology, makes an important point: this is good practice, but not a guarantee that our tests will be predictive of job performance. Why not? For a number of reasons, including poor item writing and applicant frame of reference. Murphy makes a passionate argument that we rely way too heavily on unproven content validation approaches when we should focus more on criterion-related validation evidence. Instead of focusing on job-test match, we should focus on selecting proven, high quality exams.
Not surprisingly, the article is accompanied by 12 separate commentaries that argue with various points he makes. It's also interesting to compare this piece with Charley Sproule's recent IPAC monograph where he makes an impassioned defense of content validity.
A complete discussion of the pros and cons of different forms of validation evidence is obviously beyond a simple blog post. My main issues with Murphy's emphasis on criterion-related validation are threefold. First, as stated above, most organizations likely don't have the expertise to gather criterion-related validation evidence for every selection decision (maybe this is his way of creating a need for more I/O psychologists?). Perhaps "insufficient resources" is a poor excuse, particularly for an issue as important as employment, but it is a reality we face.
Second, even if we were to shift our focus to individual test performance, following a content validation approach for development enhances job relatedness (which Murphy acknowledges). Should your selection system face an adverse impact challenge, the ability to show job relatedness will be essential.
Finally, let's not forget that high test-job match gives candidates a realistic job preview--hardly an unimportant consideration. RJPs help candidates decide whether the job would be a good match for their skills and interests. And no employer that I know of enjoys answering this question from candidates: "What does this have to do with the job?"
The approach advocated by Murphy, taken to its extreme, would result in employers focusing exclusively on the performance of particular exams rather than on their content in relation to the job. This seems unwise from a legal as well as face validity perspective.
In the end, as a practitioner, my concern is more with answering the second question I posed at the beginning of this post: What is the best way to develop a selection system? Given everything we know--technically, legally, psychologically--I return to the same advice I've been giving for years: know the job, select or create good tests that relate to KSAs/competencies required on the job, and base your selection decision on the accumulation of test score evidence.
Should researchers work harder to show that job-test content "works" in terms of predicting job performance? Sure. Should employers take criterion-related validation evidence into consideration and work to collect it whenever possible? Absolutely. Will job-test match guarantee a perfect match between test score and job performance? No. But I would argue this approach will work for the vast majority of organizations.
By the way, if you are interested in learning more about the different ways to conceptualize validity--"content validity" in particular--Murphy's focal article as well as the accompanying commentaries are highly recommended. He acknowledges that he is purposely being provocative, and it certainly worked. It's also obvious that our profession has a ways to go before we all agree on what content validity means.
Last point: the first focal article in this issue--about identifying potential--looks to be good as well. Hopefully I'll get around to posting about it, but if not, check it out.
Sunday, December 13, 2009
With 2010 right around the corner, here are some predictions for what the new year will bring in the area of recruitment and assessment:
1) More personality testing. Year after year personality testing continues to be one of the hottest topics. Look for more research, more online personality testing, and new measurement methods.
2) More boring job ads. Even though we know better, don't expect to see any big leaps in readability for 80% of job ads. Same old job descriptions. Maybe we'll see some pictures. On the plus side, more organizations will focus on making their career portals attractive.
3) A slow trickle of research on recruiting. The amount of large-scale, sophisticated research on recruiting methods remains a shadow of that found in the assessment literature. Don't expect this to change.
4) More focus on simulations. Look for more attention in 2010 to simulations, particularly those delivered online, both as highly predictive assessments and as realistic job previews. Oh, and they likely have low adverse impact (research, anyone?).
5) Leadership assessment gets even hotter. With the economy improving and more boomers deciding the time is right to retire, finding and placing the right people in leadership positions becomes an even more important strategic objective.
6) Federal oversight agencies get more aggressive. With more funding and backing from the Obama administration, expect to see the EEOC and OFCCP go after employers with renewed vigor. By the way, have you seen the EEOC's new webpage? It's actually quite well done.
7) More fire departments get sued. In the wake of the Ricci decision, fire dept. candidates feel emboldened when they fail a test or fail to get hired/promoted. Look for departments to try to get out ahead of this one by revamping their selection systems.
8) More age discrimination lawsuits. With so many boomers, expect to see more claims of discrimination, particularly over terminations. Keep words like "energetic" and "fresh" out of your job ads.
9) Automation providers slowly focus on simplicity. Whether we're talking applicant tracking or talent management systems, vendors slowly realize that they need to make their applications simpler to increase usability and buy-in. No, simpler than that. Keep going...
10) Employers get more sophisticated about social networking sites. Many realize that rather than jumping on the latest Twitter-wagon, it's best to figure out where these sites fit with their recruitment/assessment strategy. Watch for more positions whose sole role is managing social media.
11) Online candidate-employer matching continues to be a jumbled mess. Without a clear winner in terms of a provider, job seekers are forced to maintain 400 profiles on different sites and may give up altogether and focus more on social networking. Meanwhile, employers continue to try to figure out how to reach passives; LinkedIn continues to look good here but needs to expand its reach a la Facebook.
12) More employers face the disappointing results of online training and experience questionnaires. Will they go back to the drawing board and try to improve them (hint: don't use the same scale throughout), or abandon them for more valid methods, such as biodata, SJT, and simulations? More research on T&Es is badly needed, even if we are just putting lipstick on a pig.
13) Decentralized HR shops centralize. Centralized ones decentralize. Particularly in the public sector, these decisions unfortunately continue to be made based on budgets rather than best practice. Hiring supervisors wonder why HR still can't get it right.
14) Fortunately, HR continues to professionalize. With much of the historical knowledge walking out the door and the job market improving, HR leaders are forced to re-conceptualize how they recruit and train recruitment and assessment professionals. This is a good thing, as it means more focus on analytical and consultative skills.
Keep up the good work everybody. And Happy Holidays!
Sunday, December 06, 2009
What's the best way to set a cutoff score for a personality test, knowing that some candidates inflate their score? It all depends on your goal. Are you trying to maximize validity or minimize the impact of inflation?
According to a research study by Berry & Sackett published in the Winter '09 issue of Personnel Psychology, if your goal is to maximize validity, your best bet is to wait until applicants have taken the exam, then set your cut score (e.g., at the top two-thirds); this was particularly true when the selection ratio was small (i.e., the organization is very selective).
If your goal is to minimize the number of deserving applicants who are displaced by "fakers," you're better off establishing the cut point ahead of time, using a sample not drawn from applicants (e.g., job incumbents, a research group). The results were generated using a Monte Carlo simulation.
Interestingly, the authors also replicated the work of other researchers who have shown that the impact of faking on the criterion-related validity of personality measures is relatively low. There are a few other very good points made in this article:
- Expert judgment methods of establishing pass points (e.g., Angoff method) may be difficult to use for personality tests since experts may find it difficult to judge individual items. Methods used to select a certain number of applicants or methods based on a criterion-related validity study (both used as variables in this study) are more appropriate for personality tests.
- There is no consensus on how prevalent faking on personality exams is; estimates range from 5-71%. It likely depends on the situation and how motivated test takers are to engage in impression management.
- Some recommend setting a very low cutoff score for personality tests, which would exclude only those likely not suitable for the position (and not faking), while others prefer a more stringent cutoff to maximize utility.
- A reasonable range of d-values for score inflation on personality inventories is .5-1.0 (used in this study).
- There is very little research on the skewness of faking score increases. A positively skewed distribution (meaning most people faked a small amount) was used in this study. (I would think this would also vary with the situation.)
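The flavor of the simulation is easy to reproduce. Below is a toy version--my own sketch, not the authors' code--assuming normally distributed true scores, a 30% faking rate, and positively skewed inflation averaging about 0.7 SD (all assumptions within the ranges the article discusses). It compares a cutoff set post hoc (top two-thirds of applicants) with one set ahead of time from an honest, non-applicant sample:

```python
# Toy Monte Carlo in the spirit of Berry & Sackett's simulation (not their
# actual code). Assumptions: true scores ~ N(0,1); 30% of applicants inflate
# their score by a positively skewed (exponential) amount with mean 0.7 SD.
import random

random.seed(1)
N, FAKER_RATE, MEAN_INFLATION = 10_000, 0.30, 0.7

true_scores = [random.gauss(0, 1) for _ in range(N)]
observed = [t + (random.expovariate(1 / MEAN_INFLATION)
                 if random.random() < FAKER_RATE else 0)
            for t in true_scores]

# Strategy 1: post-hoc cutoff -- pass the top two-thirds of observed scores.
cut_posthoc = sorted(observed)[N // 3]
# Strategy 2: cutoff set ahead of time from an honest, non-applicant sample.
honest_sample = [random.gauss(0, 1) for _ in range(N)]
cut_preset = sorted(honest_sample)[N // 3]

for name, cut in [("post-hoc", cut_posthoc), ("pre-set", cut_preset)]:
    passers = [i for i, s in enumerate(observed) if s >= cut]
    inflated = sum(1 for i in passers if observed[i] != true_scores[i])
    print(f"{name}: {len(passers)} pass, {inflated} of them inflated their score")
```

Because inflation shifts the applicant distribution upward, the post-hoc cutoff ends up higher than the pre-set one, squeezing out honest applicants near the line--which is the displacement effect the authors quantify.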
So bottom line: where--and how--you set your cutoff score on personality inventories depends on whether you want to maximize predictive validity or minimize the number of deserving applicants left out of the process.
Other good reads in this issue:
- Police officer applicants' reactions to promotional assessment methods
- The impact of diversity climate on retail store sales
- The construct validity of multisource performance ratings
- Labor market influences on CEO compensation