Sunday, December 20, 2009

Validity: An elusive (unitary?) concept

What makes a test "valid"? What is the best way to develop a selection system? These are two of the most fundamental questions we try to answer as personnel assessment professionals, yet the answers are strangely elusive.

First of all, let's get two myths out of the way: (1) a test is valid or invalid, and (2) there is a single approach to "validating" a test. It is the conclusions drawn from test results that are ultimately judged on their validity, not simply the instruments themselves. You may have the best test of color vision in the world--that doesn't mean it's useful for hiring computer programmers. And many sources of evidence can be used when making the validity determination; this is the so-called "unitary" view of validity described in references like the APA Standards and the SIOP Principles. Unitary in this case refers to validity being a single, multi-faceted concept, not that psychologists agree on the concept of validity--a point we'll come back to shortly.

Although we can debate test validation concepts ad infinitum, the bottom line is that we create tests to do one primary thing: help us determine who will perform best on the job. The validation concept that most closely matches this goal is criterion-related validity: statistical evidence that test scores predict job performance. So we should gather this evidence to show our tests work, right? Here's where things get complicated.
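To make the idea concrete, here's a minimal sketch (in Python, with entirely made-up numbers) of what criterion-related evidence boils down to: correlate scores on the selection test with a later measure of job performance and look at the size of the resulting validity coefficient.

```python
# Hypothetical example: test scores gathered at hire, supervisor ratings
# gathered a year later. All numbers are invented for illustration.
from statistics import correlation  # Python 3.10+

test_scores = [62, 71, 55, 80, 67, 74, 59, 88, 63, 77]            # selection test
performance = [3.1, 3.8, 2.9, 4.2, 3.3, 3.9, 2.7, 4.5, 3.0, 4.0]  # ratings, 1-5 scale

# The criterion-related validity coefficient is simply the correlation
# between predictor (test score) and criterion (job performance).
r = correlation(test_scores, performance)
print(f"Validity coefficient: r = {r:.2f}")
```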

Many organizations likely can't, for various reasons, conduct criterion-related validity studies (although some baseline evidence on just how common this is would be helpful). Most of the time it's because they lack the statistical know-how or high-quality criterion measures (a 3-point appraisal scale won't do it). So in a strange twist of fate, the evidence we are most interested in is the evidence we are least likely to obtain.
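As an aside, here's a quick simulated (purely illustrative) demonstration of why a coarse criterion measure is such a handicap: collapsing a continuous performance measure into a 3-point appraisal scale throws away variance, and the observed validity coefficient shrinks accordingly.

```python
# Simulated data only; nothing here comes from a real study.
import random
from statistics import correlation

random.seed(42)
n = 500
test = [random.gauss(0, 1) for _ in range(n)]
# Build a criterion that is moderately related to the test (true r of about .5).
performance = [0.5 * t + random.gauss(0, 0.87) for t in test]

def to_three_point(p):
    """Collapse continuous performance into a 3-point appraisal rating."""
    if p < -0.5:
        return 1   # below expectations
    elif p < 0.5:
        return 2   # meets expectations
    return 3       # exceeds expectations

ratings = [to_three_point(p) for p in performance]

print(f"r with the continuous criterion: {correlation(test, performance):.2f}")
print(f"r with the 3-point rating scale: {correlation(test, ratings):.2f}")
```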

So what are organizations to do? Historically the answer has been to study the requirements of the job and select/create exams that target the KSAs/competencies required; this matching of test and job is often referred to as "content validity" evidence. But Kevin Murphy, in a recent article in SIOP's journal Industrial and Organizational Psychology, makes an important point: this is good practice, but it is no guarantee that our tests will predict job performance. Why not? For a number of reasons, including poor item writing and applicant frame of reference. Murphy makes a passionate argument that we rely way too heavily on unproven content validation approaches when we should focus more on criterion-related validation evidence. Instead of emphasizing job-test match, he argues, we should select proven, high-quality exams.

Not surprisingly, the article is accompanied by 12 separate commentaries that argue with various points he makes. It's also interesting to compare this piece with Charley Sproule's recent IPAC monograph where he makes an impassioned defense of content validity.

A complete discussion of the pros and cons of different forms of validation evidence is obviously beyond a simple blog post. My main issues with Murphy's emphasis on criterion-related validation are threefold. First, as stated above, most organizations likely don't have the expertise to gather criterion-related validation evidence for every selection decision (maybe this is his way of creating a need for more I/O psychologists?). Perhaps "insufficient resources" is a poor excuse, particularly for an issue as important as employment, but it is a reality we face.

Second, even if we were to shift our focus to individual test performance, following a content validation approach for development enhances job relatedness (which Murphy acknowledges). Should your selection system face an adverse impact challenge, the ability to show job relatedness will be essential.

Finally, let's not forget that high test-job match gives candidates a realistic job preview--hardly an unimportant consideration. RJPs help candidates decide whether the job would be a good match for their skills and interests. And no employer that I know of enjoys answering this question from candidates: "What does this have to do with the job?"

The approach advocated by Murphy, taken to its extreme, would result in employers focusing exclusively on the performance of particular exams rather than on their content in relation to the job. This seems unwise from both a legal and a face validity perspective.

In the end, as a practitioner, my concern is more with answering the second question I posed at the beginning of this post: What is the best way to develop a selection system? Given everything we know--technically, legally, psychologically--I return to the same advice I've been giving for years: know the job, select or create good tests that relate to KSAs/competencies required on the job, and base your selection decision on the accumulation of test score evidence.

Should researchers work harder to show that job-test content "works" in terms of predicting job performance? Sure. Should employers take criterion-related validation evidence into consideration and work to collect it whenever possible? Absolutely. Will job-test match guarantee a perfect match between test score and job performance? No. But I would argue this approach will work for the vast majority of organizations.

By the way, if you are interested in learning more about the different ways to conceptualize validity--"content validity" in particular--Murphy's focal article as well as the accompanying commentaries are highly recommended. He acknowledges that he is purposely being provocative, and it certainly worked. It's also obvious that our profession has a ways to go before we all agree on what content validity means.

Last point: the first focal article in this issue--about identifying potential--looks to be good as well. Hopefully I'll get around to posting about it, but if not, check it out.
