Studies [1] show that a work sample test is the best predictor of candidate performance on the job, which is why many software engineering teams use take-home tests as one step in their hiring process. But designing an effective test is difficult and time-consuming: candidates are reluctant to complete tests that are too long or not engaging enough, yet make them too short and teams won't get the signal they need for a proper evaluation.
To encourage more thoughtful test design (and hopefully save future candidates from the worst offenders), my team compiled the largest library of non-“whiteboard” take-home tests that real engineering teams have used. You’ll find the challenges that Stripe and Microsoft gave to their full-stack candidates, front-end tests from Tailwind and Rivian, and back-end ones from Basecamp and Revolut. Whether you’re looking to evaluate an Android, DevOps, or Data Science candidate, a bootcamp grad, or senior engineer, we found a few options for each.
Having built 20+ tests ourselves, we also rated the design of each test. The criteria for a 5-star rating:
1. Tests for skills highly relevant to those required for the position
2. Includes a well-written description of the prompt and even motivation for using a take-home test
3. Sets clear expectations for candidates (e.g. time requirements, evaluation criteria, submission details)
4. Asks for a reasonable time commitment from candidates (<4 hours)
A few notes:
- We found most of these test prompts in public GitHub repos, usually owned by the hiring team but occasionally in a candidate's submission repo. We sifted through hundreds of tests and filtered out those overly focused on algorithms (aka LeetCode), leaving us with 142 tests in the library.
- The larger and more recognizable companies didn’t always have the best tests. Some of the most interesting prompts we found were from smaller teams (e.g. YC startups). This shouldn’t be surprising. Startups need to design candidate-friendly hiring experiences to compete for talent against more established players.
- There were common themes among the tests we found. For example, front-end candidates were often given a Figma design + content feed to implement, while back-end candidates had to implement an API given a set of requirements. Data scientists were usually given a data set to clean, analyze, and submit a Jupyter notebook with their findings.
- We’ll continue to update this library and add descriptions of each test so it’s easier to compare.
Have feedback, or another take-home test we should add? We’d love to hear from you!
While these certainly sound like reasonable criteria for rating a take-home, do they truly account for general candidate reluctance to do take-homes due to the time commitment? Even 2-4 hours is more time than conventional Leetcode-style initial assessments, and many would rather use that time to interview with 2-4 companies instead of just one.
I recently interviewed with Ramp and enjoyed their model of practical problems, like a take home, but condensed into just one hour with less of the end-to-end expectations of a true take home.
I think most 2-4 hour take-homes can be condensed to 1-2 hours with some thoughtful choices. A few ideas I've used:
- Provide starter code and setup instructions so candidates don't waste time on boilerplate (see the sketch after this list).
- Abbreviate requirements to what actually matters. E.g. do you really need 100% test coverage on a take-home? Ask candidates to write a few tests and then tell you what else they'd do given more time.
- Use an open-ended, time-boxed format instead of having end-to-end expectations. IMO a hybrid format, where a short (1 hr) take-home is followed by a live discussion/pairing, can be the core component of a hiring process.
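On the starter-code point, here's a minimal sketch of what I mean, using Flask purely as a stand-in for whatever stack the role actually uses. The endpoint, the in-memory store, and the TODO scope are all invented for illustration:

```python
# starter.py - candidates run `pip install flask && python starter.py`,
# so their first minutes go to the problem rather than project setup.
from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory store provided up front so nobody burns time wiring a database.
ORDERS: dict[int, dict] = {}

@app.post("/orders")
def create_order():
    # TODO (candidate): validate the payload and handle duplicate submissions.
    order = request.get_json()
    order_id = len(ORDERS) + 1
    ORDERS[order_id] = order
    return jsonify({"id": order_id, **order}), 201

@app.get("/orders/<int:order_id>")
def get_order(order_id: int):
    # Already implemented, as a working example of the expected style.
    if order_id not in ORDERS:
        return jsonify({"error": "not found"}), 404
    return jsonify(ORDERS[order_id])

if __name__ == "__main__":
    app.run(debug=True)
```

The candidate inherits a running app, one worked endpoint showing the expected style, and one clearly scoped TODO, instead of an empty repo and an hour of setup.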
I'd love to hear more about the Ramp process. Do you mind sharing what sort of practical problems they used?
I interviewed for a backend role and the problems were greatly simplified versions of day-to-day backend SWE work. For example, use a server to complete a task. The focus was on how you make the requests and handle any edge cases that might come up, and the server was actually live so you could tinker and get immediate feedback. Hopefully that’s not too vague :)
Edit: to be clear, the interview was live with an interviewer. So it wasn’t a take home in the scheduling sense either.
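For anyone curious about the shape of such a task, here's a rough sketch of the kind of client code it involves. To be clear, the URL, endpoint, and retry policy below are my own invention, not Ramp's actual problem:

```python
# Hypothetical sketch: complete a task against a live server while handling
# the edge cases a grader might probe for (timeouts, rate limits, 5xx errors).
import time

import requests

BASE_URL = "https://interview.example.com"  # invented placeholder

def fetch_with_retries(path: str, max_attempts: int = 5) -> dict:
    """GET a JSON resource, retrying on rate limits and transient failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            resp = requests.get(f"{BASE_URL}{path}", timeout=5)
        except requests.Timeout:
            time.sleep(2 ** attempt)  # back off, then retry
            continue
        if resp.status_code == 429:
            # Rate limited: respect Retry-After if the server sends one.
            time.sleep(int(resp.headers.get("Retry-After", 2 ** attempt)))
            continue
        if resp.status_code >= 500:
            time.sleep(2 ** attempt)  # transient server error, retry
            continue
        resp.raise_for_status()  # unrecoverable 4xx: fail loudly
        return resp.json()
    raise RuntimeError(f"gave up on {path} after {max_attempts} attempts")

if __name__ == "__main__":
    print(fetch_with_retries("/tasks/next"))  # endpoint is made up too
```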
I agree that take homes can be simplified with your suggestions above, and that certainly makes a better experience for the candidate. The hybrid format is also great - future interviews become an extension of your previous work, so it’s more comfortable than having to context switch for a new challenging problem each round.
I didn’t find this in your linked database, but I also enjoyed GitHub’s take home. I only recall spending 45-90 minutes on it, and the setup process was seamless. A recent blog post describes their approach here: https://github.blog/2022-03-31-how-github-does-take-home-tec...
Yes! I spoke to Andy (the author of the article you linked) when he posted this. I'm a big fan of this approach. And the software we're building is quite similar to the interview-bot one that GitHub uses internally. Why shouldn't every eng team be able to benefit from tools like theirs?
I'd love to add GitHub's take-home to the library, but I feel the article describes the exercise without sharing the actual prompt. If there's a public link to it, lmk and I'll add it.
You’re right, I don’t think their exercise is public. I just wasn’t sure if you knew about it and felt it was good enough to call out.
This is awesome to hear! I did check out your website but wasn’t sure how closely it tracked with the GitHub method.
Honestly, I had a thought to build something like this after my GitHub interview but wasn’t sure how much traction it would get. It’s cool that you’re working on it and that my random thought was validated; best of luck!
> Even 2-4 hours is more time than conventional Leetcode-style initial assessments,
Hard disagree on this one. Unless your day-to-day work includes many leetcode style problems, you need to put in significant time training on leetcode if you want to pass the interview.
You should be able to complete a take-home based on your current skill set. Yes, it may take 4-6 hours (or 8-10), and yes, that is a big ask of a candidate, but leetcode can take 5-10x the prep time and you can still muff the interview.
Is there literally any other field where there's an expectation to essentially do fairly intense studying to pass interviews that apparently have very little to do with day-to-day jobs? Maybe they exist; I've never encountered them. The limit of my "studying" for an interview is to just learn a bit about the company, their strategy, etc., either by online research or by talking to people I know who are connected with them.
ADDED: I suppose one could argue that the Bar exam is a bit like that, but that's a credential, as are degrees, which don't necessarily overlap all that much with the real world.
> ADDED: I suppose one could argue that the Bar exam is a bit like that, but that's a credential, as are degrees, which don't necessarily overlap all that much with the real world.
Yeah, I think most other professions where one might imagine something like a stereotypical software interview use a credential instead: one obtained by a (perhaps very difficult!) test, maybe with periodic re-tests or required refresher courses to keep it valid.
I have a suspicion that a big part of why top-comp software companies keep their interviews so incredibly unpleasant has more to do with discouraging job-hopping among them (so, suppressing wages) than with its being the best process for hiring good developers.
> I have a suspicion that a big part of why top-comp software companies keep their interviews so incredibly unpleasant has more to do with discouraging job-hopping among them (so, suppressing wages) than with its being the best process for hiring good developers.
You have to wonder if this has backfired. ;) Plenty of people job-hop every 1-2 years, and it's pretty common knowledge which companies ask which LC difficulty for which seniority level.
It's probably pretty common within a certain cluster. A bunch of other people probably see the barriers and just pass. Or, in my experience, others just find the grass is distinctly not greener and move on.
Tons of them. How much time do you think the typical actor or dancer spends auditioning and preparing for the audition? How much time do you think people in many creative fields spend putting together portfolios? And if I'm going to interview with just about anyone, I'm going to spend at least 4 hours prepping for an interview.
I'm not a big fan of take homes in general, but a time-consuming job search and interview process is absolutely normal beyond low-skill jobs, even in a time of labor shortages.
1. Some candidates only need to brush up to be in interview shape.
2. To mitigate the need to spend time preparing, simply use initial assessments to get back in shape. That is, first apply to and interview at companies you don't want to work at, to get more practice problems. If you pass, you get the added benefit of more leverage when negotiating your compensation!
3. As another commenter mentioned, since Leetcode-style interviews are shorter, there should be a threshold where $prep_time + $leetcode_interview_time <= $take_home_interview_time. Of course, the threshold adjusts up and down based on your initial level of preparedness.
I honestly hate spending 4-6 hours of my time on a pointless take-home, but I suck at leetcode and I absolutely refuse to spend any time on learning it, so they’re the lesser of two evils.
The difference is that you are spending 4-10 hours of your time trying to pass the process for 1 company. If you instead invest 4-10 hours doing leetcode prep, you are studying to pass the process for LOTS of companies. Economies of scale are a huge factor here.
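To put illustrative (made-up) numbers on it: 10 hours of prep plus a 1-hour screen at each of ten companies is 20 hours total, i.e. 2 hours per company, whereas ten separate 2-4 hour take-homes run 20-40 hours with nothing carrying over between them. The more companies you apply to, the better that fixed prep cost amortizes.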
If you are interviewing for 10-20 jobs in a particular week, a 2-4 hour take-home test from each adds up to a full-time job's worth of work: 20-80 hours.
I would accept a 2-4 hour time commitment but only much later in the interview process, when much of the interviewing was done and the list was whittled down to 2-3 serious companies.
The test might be longer, but I don't have to spend time studying and practicing something I'll never use on the job. I use the same knowledge I build up while working.
"This article summarizes the practical and theoretical implications of 85 years of research in personnel selection. On the basis of meta-analytic findings, this article presents the validity of 19 selection procedures for predicting job performance and training performance and the validity of paired combinations of general mental ability (GMA) and the 18 other selection procedures. Overall, the 3 combinations with the highest multivariate validity and utility for job performance were GMA plus a work sample test (mean validity of .63), GMA plus an integrity test (mean validity of .65), and GMA plus a structured interview (mean validity of .63)"
1. The research is dated (1998), long before many of the current best practices in SW Eng were established.
2. Says it is based on 85 years of research. Obviously not IT-related then.
3. Even if we get past that, it gives 3 almost equally good methods of hiring, where the highest one is GMA plus an integrity test - not a work sample test.
4. Even if we get past all that, work sample means work sample. It does not have to be produced under pressure over a weekend as unpaid work, which, as any professional knows, is very hard to bring yourself to do right (being a professional means getting paid for my services, as I live from selling them). It can very well be some past work on GitHub, etc.
So, unless there is better, more IT-focused, up-to-date research proving that take-home tests (which, btw, can be offshored/gamed very easily) lead to better hires, I remain highly skeptical of all that and a big fan of whiteboard/pair programming.
Re: 3 - if you read the full paper, you'll see that on its own a work sample test has (barely) the highest predictive value (though with lower confidence than GMA, which is more heavily studied). I think this quote does more to demonstrate that GMA and integrity are less correlated than GMA and work sample testing or structured interviews, which is intuitive.
Re: 4 - I'm not sure why you would consider a take-home test a less valid work sample and then prefer a whiteboard test. Certainly the latter is less representative of real working conditions. (I probably would not want to work in a place where whiteboarding is more representative, at least!)
Your other two points are reasonable threats to validity but I don't think especially strong ones. The research covers many different professions so I think the onus would be on you to explain why software engineering is so different.
Yeah I think #1 and #2 are valid criticisms. My issue with the parent comment's line of reasoning is that it doesn't apply the same level of rigor to evaluating whiteboarding as it does to work sample tests. Our goal should be to identify which evaluation approach is most predictive based on the evidence available.
One thing I wish I could see, but probably isn't compilable, is how much of an impact the test has on the final decision, or what stage it's given at.
Lots of the comments are pointing out that a 2-4h take-home is a poor value prop from the candidate's side compared to a 1h leetcode, which transfers better across companies. (I'd agree with this.) But a 1h leetcode is usually just a first hurdle before what's often another 1-3 rounds, while for me the point of the 2-4h take-home is that it's the only thing. You do it, we grade it and talk about it, we can arrange a more formal team meeting if you want, but that's the whole process.
Am I alone in wanting shorter interviews but longer tests?
While it's not compilable, I did reach out to a few of the 5-star test designers and asked them this question. How teams use their tests (both in weight and stage) varies a lot, but the well-designed tests usually featured as a central component of the hiring process. They were often given at the stage right before the final round of interviews, but occasionally earlier. The 5-star test designers I spoke with weighted the test heavily because it:
1. helped them reduce bias, identifying great candidates even when those candidates didn't have the backgrounds the team expected
2. gave them a foundation for future interviews. It's less stressful on candidates when they're already familiar with the code being discussed live (because they wrote it!)
Anecdotally, on-the-job performance of the candidates they ultimately hired seemed to correlate with performance on the test. I realize this is very hard to compare against other evaluation approaches though!
After scanning the first page, I quickly noticed "Want something unique to your team? Tell us what you need."
I'm not a marketing guy (by any stretch), so maybe my opinion isn't worth much, but I think this is outstanding: provide something already useful, and then invite people to purchase your customized, even more useful stuff.
It's always nice to see something well done — kudos!
(You seem to be focused on software development, but if you have anything along the lines of practical math used in a lab/shop/factory environment around machining, mixing, monitoring, testing, etc., it'd be great to hear about it)
Thank you! I appreciate this because I think I'm terrible at marketing.
I happen to be an Applied Math major and spent years teaching competition math classes, though this was a while ago. If I can be helpful, shoot me an email: alex@trytapioca.com.
The reason there aren't any 1-star tests in the library isn't that they don't exist; it's that we didn't think anyone would want to see them. Cataloguing these was a ton of work (we sifted through hundreds of tests), and including the 1-star ones seemed like it would only shame the companies that used them.
I appreciate that you might not want to name-and-shame. I wonder if you could just scrub the identifying info and post them, though?
Personally I think it'd still be tremendously valuable to show clear examples of what "bad" looks like in your ranking system. Not only does it help you show off more of the hard work you did, but it also helps all your readers get a more complete understanding of your framework.
Beyond that, it's generally helpful -- in learning anything complex -- to see examples of "what not to do, and why."
[1] Schmidt & Hunter (1998), The Validity and Utility of Selection Methods in Personnel Psychology (https://www.researchgate.net/publication/232564809_The_Valid...)