Peroni writes in his top-level comment, "Give the candidate a realistic technica...

davesims · on Sept 20, 2013

The challenge here, in my experience on both sides of this, is that a developer's deep domain knowledge of a large or large-ish app is essential to a 'work-sample' environment. And that's virtually impossible to duplicate even in the longest interview time frame.

Making decisions about managing technical debt, adding architecturally-significant changes, balancing good OOP with responsiveness, knowing the difference between future-proofing and conscientious coding -- all of those are both crucial (in many cases the most crucial) to day-to-day work, and also so highly context-specific that those decision-making traits are nearly impossible to identify during a technical exercise.

So for coding challenges, that leaves short-term tactical/analytic/algorithmic exercises, which in (anecdotally) 95% of cases cannot begin to approach a 'work-sample' environment. I've yet to encounter a technical challenge that would tell me much more about a candidate than basically how fluent they are with their tools, how well they know syntax and some general design principles, and what their TDD workflow (or lack of one) is like. You also probably get some insight into line-level analytic and algorithmic ability.

All of that is helpful, but -- Trust Me Here!! -- can also be very deceiving. The same coders that can knock those challenges out of the park can also be highly proficient Debt Machines, all the more destructive because of their special genius for cranking out architecturally suspect code at a breathtaking rate.

Getting into a real 'work-context' flow on a large app requires weeks, sometimes months, and only then can you get full perspective on how a given coder is going to contribute to your team on an ongoing basis. To get a feel for what that will look like in an interview, I've found I have to pretty much rely on the candidate's past projects, and informal conversations around larger architectural and OO principles.

tokenadult · on Sept 20, 2013

I think your point is well made that what you can sample in a work-sample test is not the full set of long-term skills that benefit a for-profit company. That's why the predictive validity of work-sample tests is only about .50 across a wide range of industries. But the key point is that EVERY other hiring procedure, except for general cognitive ability tests, has lower validity, so a company is throwing away a lot of opportunity to hire good workers if it doesn't use a combination of work-sample testing and cognitive ability testing for all of its hiring. Your sound analysis can just as well be turned on interviews (a much more commonplace hiring procedure than work-sample testing) to make the correct point that an applicant who looks good in an interview may not be a "team player" once hired. Any hiring procedure is a sample of applicant behavior, not fully representative of how the worker will behave on the job after being hired. But work-sample tests get much closer to what the worker will do on the job long-term than any other procedure besides general cognitive ability tests. Because work-sample tests and cognitive ability tests each have incremental validity when added to the other, it's best to use both in combination to get a hiring procedure with somewhat more than .50 validity in finding good workers.
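The "incremental validity" point can be made concrete with the standard formula for the multiple correlation R of a criterion on two standardized predictors: R² = (r1² + r2² - 2·r1·r2·r12) / (1 - r12²), where r1 and r2 are each predictor's validity and r12 is their intercorrelation. Below is a minimal Python sketch; the validities (.50 each) and the intercorrelations are hypothetical numbers chosen for illustration, not figures from any of the studies cited in this thread, and `multiple_r` is just an illustrative name.

```python
import math


def multiple_r(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation of a criterion on two standardized predictors.

    r1, r2: each predictor's correlation (validity) with job performance.
    r12:    the correlation between the two predictors themselves.
    """
    if abs(r12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
    return math.sqrt(r_squared)


# Hypothetical validities: both predictors assumed to correlate .50 with performance.
work_sample = 0.50
cognitive = 0.50

# The less the two predictors overlap (lower r12), the more the second one adds.
for r12 in (0.8, 0.4, 0.0):
    combined = multiple_r(work_sample, cognitive, r12)
    print(f"r12={r12:.1f}  combined R={combined:.3f}  gain={combined - work_sample:.3f}")

# r12=0.8  combined R=0.527  gain=0.027
# r12=0.4  combined R=0.598  gain=0.098
# r12=0.0  combined R=0.707  gain=0.207
```

Under these assumed numbers, combining the two predictors always beats either one alone, and the gain grows as the predictors become less correlated with each other, which is why a predictor that measures something different from cognitive ability can add the most.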

HelloMcFly · on Sept 20, 2013

That's a very well-researched comment! There are a few things probably worth noting. Roth, Bobko and McFarland have been pretty active in this topic for the past decade. They've found the validity coefficient cited by Schmidt and Hunter in 1998 is likely an over-estimate, because it relied on research conducted when statistical and methodological best practices were less rigorous.

[1]http://www.psychologie.uni-mannheim.de/cip/Tut/seminare_witt...

The validity coefficient provided by Roth and Bobko is likely more accurate. That isn't to diminish the value of work samples; they are still valuable, but they aren't the cure-all we'd like them to be. They do continue to show promise for reducing adverse impact, though, which is great (note: the full article is behind a paywall - what is the HN-approved method of sharing the information?):

[2]http://onlinelibrary.wiley.com/doi/10.1111/j.1468-2389.2010....

That is for gender. With regard to ethnicity, the evidence isn't quite as optimistic yet. As with other predictors, including cognitive ability tests, if they are showing notable adverse impact you may be in trouble despite their validity.
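For readers unfamiliar with how adverse impact is usually screened in U.S. practice: the EEOC's Uniform Guidelines on Employee Selection Procedures use a "four-fifths rule," under which a group's selection rate that is less than 80% of the rate for the highest-rate group is generally regarded as evidence of adverse impact. A minimal Python sketch with hypothetical applicant counts; the function names are illustrative, and the rule is a rule of thumb rather than a statistical significance test.

```python
def impact_ratio(selected_focal: int, applicants_focal: int,
                 selected_reference: int, applicants_reference: int) -> float:
    """Focal group's selection rate divided by the highest-rate (reference) group's."""
    focal_rate = selected_focal / applicants_focal
    reference_rate = selected_reference / applicants_reference
    return focal_rate / reference_rate


def flags_adverse_impact(ratio: float, threshold: float = 0.8) -> bool:
    """Four-fifths rule of thumb: a ratio below 0.8 suggests adverse impact."""
    return ratio < threshold


# Hypothetical pool: 30 of 100 focal-group applicants pass, vs. 50 of 100 in the reference group.
ratio = impact_ratio(selected_focal=30, applicants_focal=100,
                     selected_reference=50, applicants_reference=100)
print(f"impact ratio = {ratio:.2f}, flagged = {flags_adverse_impact(ratio)}")
# impact ratio = 0.60, flagged = True
```

A predictor can be valid and still fail this screen, which is the "trouble despite their validity" problem described above.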

[3]http://onlinelibrary.wiley.com/doi/10.1111/j.1744-6570.2008....

It's a problem with lots of predictors, though the scope of the problem varies. There is work being done all the time, even on the most reliably stalwart predictor, the cognitive ability test:

[4]http://psycnet.apa.org/journals/apl/92/3/794/

Anyone interested in the great "diversity-validity dilemma" can check out this article for more information, though the research is always progressing. It's a great article.

[5]http://onlinelibrary.wiley.com/doi/10.1111/j.1744-6570.2008....

For my money, I endorse integrity tests as part of the solution: decent validity, incremental validity over cognitive ability (because the two are only weakly correlated), and small subgroup differences.

[6]http://onlinelibrary.wiley.com/doi/10.1111/j.1744-6570.2007....

Having said all that, I imagine the efficacy of work samples is moderated by the type of work, and I'd have to believe they are more amenable to demonstrations of technical skill like coding (I don't know of any references for this now, but I'll look later). Coding-related jobs would be nice because it would be possible to judge the output blindly as well, and in programming-type jobs it would be much easier and more cost-efficient to test large numbers of applicants than it would be for many other jobs. Cost and the difficulty of large-scale administration are their big problems, so overcoming those would be gravy. I don't know how subgroup differences are impacted, though.

tokenadult · on Sept 20, 2013

I've bookmarked all the references you kindly shared and will be doing further research on this. Thanks.