
Added 2022-10-20: To clarify up-front, the main point of this article is not to give hiring advice. More nuance is required for that. The main point is to tell you that you have to be careful with hiring and especially with outsourcing hiring.


Who makes up an organization, a community? People. People do the work. People make the decisions. Choosing people well is the most important thing we do. Choices about people are self-reinforcing, too. Choose one bad COO and he’ll hire more bad people. Have a few bad people in an organization and nobody good will want to work there anymore. Hiring, therefore, is vital.

What do you want from a hiring process? A good hire. Crucially, no bad hire. And for those people whom you haven’t hired to be mostly happy with how things went. Because you care for them.

Sadly, you’re at risk of making a bad hire and disgruntling your other applicants if you don’t know what you’re doing. If you don’t know what you’re doing, outsourcing isn’t a solution, either, because you don’t know how to judge the actions of those you’re outsourcing to. I will demonstrate this with the example of a hiring process I’ve observed as an outsider, in which the hiring firm (call them Hirely) acted in a way that would have seemed sensible to the average founder who knows little about hiring, but to me looked like blundering. Even if you don’t plan to outsource hiring, the following points are worth thinking about.

Added 2022-10-16: I won't be arguing every point fully. One commenter even wrote that I make ‘lots of general assertions without a clear explanation as to why people should believe [me]’. (I appreciate this comment.) That's because doing otherwise would have made the article ten times longer. Hiring is a wide field and I've only tilled a small patch of it myself. I encourage you to follow the links to Manager Tools podcasts/whitepapers that I've included in the article. They argue many of the claims properly. I will also be glad to explain more in the comments.

Added 2022-10-22: A commenter points out, and I agree, that there are ways to outsource something safely without knowing about it. In the case of hiring, one way is to find strong evidence that the hiring processes run by a firm have resulted in many good hires, vanishingly few bad hires and the people who weren't hired being mostly happy with how things went. Although if you're paying attention to these three things, you already know something about hiring. (The above is not exactly what the commenter wrote. His comment is here. If you read it, please note that he wrote it based on ‘casually listening’ to my post and also read my reply, which counters many of his claims.)

Aside: Manager Tools is a decent authority

In the two main sections I will repeatedly reference Manager Tools in support of my arguments. Why is Manager Tools (MT) a decent authority to reference? (By the way, I’m in no way affiliated with or paid by MT. They have no idea that I’m writing this.)

Their core guidance is based on data

As far as I know, MT measure how well their core guidance works for their clients. And they adjust accordingly. Since they’ve presented their data on one-on-ones (mto3s) and feedback (mtfeedback), you can judge for yourself whether their research makes sense. They haven’t presented their data on hiring behaviours, but I suspect that they do have a lot of it. Additionally, the authors of MT’s hiring guidance, Wendii Lord and Mark Horstman, were recruiters themselves and claim to have in sum spent thousands (tens of thousands?) of hours interviewing. If this is true (and I have little reason to doubt it), they must have learned a thing or two about what works and what doesn’t.

Their guidance leads to success

Applying MT guidance has worked out well for many people. I conducted the selection phase of AI Safety Camp #5 (aisc5) based on MT hiring guidance (mthiringfeed). The participants whom my volunteers and I selected did well. One person thanked me when I rejected his application. And participants gave the selection phase the highest rating of all AISC selection phases so far.

I also followed MT guidance (mtinterv) for my last job search, writing résumés, preparing for the interview etc. It helped me get my current job at Spark Wave. On the other end of the job lifecycle, my wife used MT guidance (mtresign) to resign and she is still in good standing with her company, unlike others before her.

More generally, MT are a small company that has been training and advising organizations for a long time. They have been podcasting for seventeen years and get > 100 k downloads per week. And they appear to have made good hiring decisions themselves (mtsuccession).

Hirely puts you at risk of making a bad hire

They might not give you many (good) candidates

The more good applicants you have, the higher the likelihood that one of them will meet your standard. If you don’t have enough and none meets your standard, you will be tempted to hire the best of those who don’t meet your standard. This would be bad. Hirely, on the one hand, is good at getting many eyes on a job ad. On the other hand, the format of their job ad is a turn-off, especially for busy high-caliber people.
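
As a toy illustration (a back-of-the-envelope model of my own, assuming each applicant independently meets the standard with probability p), the chance that at least one of n applicants meets it is

$P = 1 - (1 - p)^n$

With $p = 0.05$, ten applicants give $1 - 0.95^{10} \approx 0.40$, while forty give $1 - 0.95^{40} \approx 0.87$. Widening the top of the funnel is what keeps you away from that temptation.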

A good thing I’ve heard about Hirely is that they use marketing channels skillfully to get a job ad in front of many people. So they might bring in a decent number of candidates. The conversion rates of the job ad and the application form, however, likely won’t be as good as they could be. The drop-out throughout the hiring process would also be high because the process is opaque and wastes people’s time.

At least the one job ad I’ve seen was almost three pages, with long sentences and big, empty words. Busy people don’t have time to read this. And good candidates tend to be busy because they’re good. Besides, we in the EA community care about imposter syndrome, right? I don’t have imposter syndrome, but to me,

Convert strategy into executable steps for growth, putting into practice operations planning, organization-wide goal setting, and performance management to scale the business effectively (quote rewritten to be not Googleable)

sounds much less like ‘I can do it’ than this:

Project planning and management: Take rough, high-level guidance from the CEO, turn it into concrete plans, then make sure that they’re carried out.

Further time wasters/points of opacity:

  • Irrelevant questions on the application form, such as this for an operations job: ‘What do you think the field of AI Alignment most needs?’ Would you rule out a good ops person because they haven’t thought about whole-field strategy? I hope not. If a question doesn’t help you make a hiring decision, why do you ask it?
  • They ask applicants to take a Big Five assessment. At least the Big Five version I know can be gamed easily. Even integrity and conscientiousness tests that can't easily be gamed appear not predictive enough of job performance to be worth the applicants' time (selval). I concede that some assessments make sense in some hiring processes and I don't have space here to discuss that properly. The minimum viable point is: A Big Five assessment didn't make sense for this role. And Hirely's hiring process draft didn't mention taking it into account for any decision. It would have been a waste of applicant time. (Rewritten 2022-10-19. See also discussion.)
  • They don’t outline the whole hiring process up-front. Most egregiously, the process draft I saw didn’t tell candidates about a work trial (1-2 weeks of full-time work?) until after some test task, the Big Five test and two interviews. (By the way, work trials are a nice idea, but in many cases don’t make sense practically. I don’t have space to discuss this here.)
  • They haven’t figured out the whole process up-front, leading to delays between steps, which causes candidates to drop out. (Remember that people need a job and giving someone a job quickly is a competitive advantage. And you’re competing for good people.)

Their filters don’t work

What is bad? To turn down a candidate who did badly in the interview because of nervousness, only to later see them kill it at a different (possibly non-EA) org. What is far worse? To hire someone who did great in the interview, but then sucks at the job itself. Sadly, with Hirely you’re at risk of getting both the bad and the worse.

How do you do better? By having filters that let nervous-but-awesome people shine and that expose the flaws in people who only seemed awesome. Call the former nervous aces and the latter confident duds. Sounds impossible? Not impossible, but of course difficult. (Added 2022-10-16: Another way to look at this is to think about spreading the field of applicants. (‘Spreading’ in the sense of ‘widening’, not ‘distributing’.) I'm too lazy to revise this section based on that frame. So just an example. – If I were interviewing for a senior developer position at GuidedTrack, I could ask: ‘Tell me about a time you successfully added two numbers using a pocket calculator.’ Since almost everyone can use a pocket calculator, the answers would barely distinguish candidates. If, however, I ask, ‘Tell me about a time you successfully fixed a defect in an interpreter’, it would open a wide gap between candidates who likely are and aren't fit for the job.)

I will first describe a way to do it right, then how Hirely does it wrong. Basically this is an extremely compressed selection of MT guidance. It is much too short to be a guide to hiring well. If you want to hire well, go to the source (mthiringfeed, mtehm). (Added 2022-10-21: Of course, MT is not the only valid source about hiring. Some of their guidance might not apply to your case. Some of it might be outperformed by more EA-fashionable approaches. But I think they're an excellent starting point, and you will do decently if you rely on their material and common sense alone.)

An inkling of how to filter well

(I’m focussing on structured behavioural interviews as the main filter. Further filters that work are structured technical interviews, work samples and IQ tests (selval). I have thoughts on those, but no space to include them here.)

  1. Set a standard. A high standard. In the rest of the process, compare people against this standard. Do not compare them against one another. Comparing people against one another feeds your biases (‘I like that guy better’). Do not rank them. The best of ten candidates who don’t meet the standard still doesn’t meet the standard. (You may rank people at the very end when you have to decide whom to offer first.) In the whole process look for reasons to say no. Because hiring the wrong person results in Hell on Earth. (mtbarhigh)
  2. Narrow the field of applicants by sorting résumés (mtresumes). (Yes, résumés are a flawed filter. If you care about false negatives, you need to modify the approach. I don’t have space to discuss this here.)
  3. Design a structured, behavioural interview (mtcreation, mtbehavioural) that helps you compare candidates against the standard you set. Every question must help you find out whether a candidate meets the standard. In particular, it must dig up any reasons to say no. If it doesn’t help you make a decision, it’s a waste of time.
    • A structured interview is one with the same set of questions for each candidate and a fixed procedure for scoring each answer. See also the appendix. MT doesn’t go as far as assigning numeric scores, but their interview creation tool (mtcreation) at least provides criteria for each question.
    • A behavioural question sounds like this: ‘Tell me about a time when you successfully planned an event with more than twenty people.’ I.e. you ask an open-ended question about past behaviour. Past behaviour because that’s a hard fact: A confident dud can’t confabulate a shiny ‘I would do this and that’ answer like he can for a scenario question. And even if he tells you about a past ‘success’, you can probe it until you get to its empty bottom. In contrast, the nervous ace gets more confident as you probe into the awesome details of her past behaviour, which she forgot to tell you about because she was nervous. (By confident duds I don’t mean people who are simply unfit for a certain job. They will do well somewhere else. And I don’t mean those who are confident and talk a lot, but still get the job done. I mean those people who get jobs easily because of their confidence and then cause harm. They are rare, but destructive.)
  4. Narrow the field further by doing phone screens.
    • Here you decide whom you will spend hours of interviewing time on in the next step. Do you want to outsource this decision to a hiring consultant or HR? Only if you can trust them to know what you’re looking for. Often you can’t. (mthrscreen)
    • The way to do this is similar to a behavioural interview that is cut off after the second or third question. For details see mtphonescr.
  5. Interview the remaining few candidates intensely.
    • Reduce stress. Be clear before the interview about what will happen and how. Be as friendly as you can be. Smile. Start with small talk. See also mtstress. Reserve matters of salary until you make an offer. I don’t have data on this, but ‘salary negotiation’ must be one of the most dreaded parts of an interview. So leave it out. It has no bearing on whether the candidate can do the job.
    • Ask behavioural questions (see item 3) and probe (mtprobe). A candidate will seldom tell you everything you need to know. You must find it out by asking follow-up questions.
    • Several people in your organization interview each candidate. Everyone uses the same set of questions, with exceptions for technical interviews (mtehm, p. 111).
      • Yes, this means that the candidate has to answer the same question repeatedly. But it won’t be the same because every interviewer will hear the answer differently and ask different follow-up questions.
      • You might say that this takes a lot of time from the candidate (and the organization) and you’re right. But you would only do this with a few candidates. And it’s in the interest of the candidate: Wouldn’t you be glad to see a manager take hiring seriously and know that your potential colleagues will all be up to par? Wouldn’t you want to be listened to in detail before it’s decided whether you’ll be allowed to contribute the next two, five, ten years of productivity to an organization? (And mind you, a few hours is still mighty short for judging whether someone can do a job.) Finally, would you want to be hired by mistake into a role that doesn’t fit you? I would rather spend an hour more being interviewed than three months suffering in a job that wasn’t for me, only to be fired or resign and then be asked by every potential employer about that three-month engagement on my résumé.
      • This is also a way to reduce bias. Having diverse interviewers is an easy way to increase diversity on your team without introducing quotas or lowering standards for certain groups of people.
  6. After a series of interviews hold an interview results capture meeting (mtcapture). Each interviewer has to say ‘hire’ or ‘don’t hire’ and justify this recommendation. One ‘don’t hire’ is generally enough to rule out the candidate, but you as the hiring manager make the final decision.

There is much more to know around offers, turning down candidates, onboarding etc. No space for that here.

How Hirely doesn’t filter well

Now how does Hirely do it wrong? In short, they are mediocre at setting a standard and many of their questions aren’t dispositive, i.e. they don’t help find out whether the candidate meets the standard. They do the phone screen themselves and plan on only one further interview by the hiring manager. And they increase interviewee stress. I’m assessing this based on a draft hiring process document from Hirely. Since I was advising the hiring manager in parallel to Hirely, I’m familiar with the requirements for the role, which is a ‘COO’/high-level ops role.

To get into the worst detail first: scenario questions. A good candidate will have a lot to tell about past successes that match the requirements for the role you’re hiring for. (These don’t have to be suit-and-tie, commended-by-CEO ‘professional’ successes. If you’re hiring an intern and are looking for planning skills, having built a weather-proof tree house in ninth grade can be an adequate accomplishment to talk about.) Detailed information about past successes is highly useful, but Hirely barely asks about it. (Talk about disgruntling applicants!) Instead, their main interview appears to focus on scenario questions, which put the nervous ace on the spot while being a perfect opportunity for the confident dud to confabulate something that sounds great. For each follow-up you ask, he’ll invent a good answer. How do you know it’s the right answer? You don’t because it’s all hypothetical! And what if you’re dealing with a clever interviewee who asks you a question back? I’m being facetious. Case study interviewing is a well-known technique. But it’s hard. There is a lot more to a case study than asking a three-sentence scenario question. Stick to behavioural questions about the past.

Let me backtrack and point out that Hirely is using fixed criteria and scoring rubrics. This is good. Unfortunately, the criteria I can see in the draft are only mediocre. Some are irrelevant, such as ‘Strong familiarity with AI Safety and Existential Risks’. – This is nice to have. But you wouldn’t reject an application from a highly successful ops person who only heard about AI safety last week. – They are not specific, either. For example, when they ask for ‘high levels of conscientiousness’, every interviewer has to make up their own idea of what ‘high levels’ means.

I also have to point out that Hirely has good intentions. They try to save the hiring manager time by doing the phone screen themselves. And they plan one main interview with the hiring manager. This is internally consistent, but doesn’t make sense if you want to hire well. I’ve established above that one main interview is not enough (see also mtmult). You need to dig deep. You need to have multiple perspectives. Even if you don’t buy the diversity thing, wouldn’t you want to give your team a say about whom they have to work with for several years? If you’re going to spend a lot of time interviewing, you need to choose well whom to spend that time on. You can only outsource this decision if the consultants are intimately familiar with your hiring needs. And the previous paragraph hints that they are not.

Now, Hirely might say: ‘That’s why we record the phone screens.’ And later in the draft it sounds as if the hiring manager is supposed to watch all those recordings. First of all, if I have to spend time watching all the recordings, why would I not do the phone screens myself and ask the questions I want to ask? Second, recording stresses people. What else stresses people? Panel interviews (mtpanel). And questions about compensation! Both Hirely interviews are panel interviews (with good intentions: they want to assist the hiring manager). And they plan to discuss compensation at the end of the main interview. Once more, they’re putting the nervous ace at a disadvantage.

Coming back to questions. If a question doesn’t help you decide whether or not to hire, it’s a waste of both your and the candidate’s time. Additionally, it takes time away from questions that do help you decide. A Big Five assessment is one group of questions that don’t work. I’ve commented on this in the section about time wasters. Most questions in the phone screen aren’t good at digging up information, either. Example: ‘What are you really good at that might apply to this position?’ Candidate: ‘Er, I’m good at project management.’ Interviewer: ‘What else are you good at?’ Candidate: ‘Hm, event planning and, uh, accounting.’ Interviewer: ‘Give me an example of a time you used that skill.’ Candidate: ‘I used it for organizing last year’s orchid auction.’ – You’ve asked three questions, gotten three answers and still don’t know much more about the candidate’s ability. Of course, you could now start probing into that orchid auction piece. But Hirely doesn’t mention probing once. And you could have circumvented that awkward back-and-forth by instead asking an open-ended question about a particular skill you’re interested in: ‘Tell me about a time you successfully ran an event, including managing its finances.’

Hirely puts you at risk of disgruntling applicants

(This section is much shorter than the previous only because I’ve addressed many points above. It is important nevertheless.)

When I ran the applications process for AI Safety Camp, one applicant whom I had to turn down replied: ‘Thank you for your thoughtful and graceful rejection email.’ How do we treat candidates in a way that keeps them happy even when we have to reject their applications? We don’t waste their time during the process. We don’t increase their anxiety and imposter syndrome. And ideally we give them something useful, such as an option to get feedback. This is an aspect of hiring which Manuel Allgaier of EA Germany has made me especially aware of. There have been at least two popular posts on this forum about this, too: eaf1a, eafcost. I don’t agree with all of that, but it is important feedback.

Unfortunately again, Hirely doesn’t appear to take it into account. Above I’ve addressed time wasters, anxiety (nervousness) and imposter syndrome. And you know what makes you as a company look even worse? Rejecting a candidate who has gone through application form, test task, personality test, two interviews and a work trial … by email! That’s what Hirely is planning to do. Rejection hurts. It hurts only a little bit when it comes from a warm voice on the other end of the telephone (mtturndown). Show them that you care.

(I mention above that I sent rejection emails. That’s acceptable at earlier stages when the candidate hasn’t had to spend much time yet.)

One last gripe: In their rejection email, at least the one after the second interview, they preclude feedback. I grant that feedback in application processes is a difficult thing. It allows people to argue and, in the worst case, to sue you. So you don’t have to offer it explicitly. But precluding it outright makes you look bad. And if a candidate asks you and you do it right (mtdemo), you provide value even to someone you’ve had to turn down.

Added 2022-10-20: Let me remind you one more time, the main point of this article is not to give hiring advice. More nuance is required for that. The main point is to tell you that you have to be careful with hiring and especially with outsourcing hiring.

Appendix: On structured vs. unstructured interviews

Unstructured interviews have no fixed format or set of questions to be answered. In fact, the same interviewer often asks different applicants different questions. Nor is there a fixed procedure for scoring responses; in fact, responses to individual questions are usually not scored, and only an overall evaluation (or rating) is given to each applicant, based on summary impressions and judgments. Structured interviews are exactly the opposite on all counts. In addition, the questions to be asked are usually determined by a careful analysis of the job in question. As a result, structured interviews are more costly to construct and use, but are also more valid. – selval

References

Background: This is a repurposed article with a history

(Expanded on 2022-10-16 from the last paragraph of the originally published introduction. Moved to the end of the article on 2023-08-22. This describes the article's history in boring detail.)

This article is strange because it's repurposed from a direct critique of the organization behind ‘Hirely’, which is a fictional name. (Please don't try to find out who is behind that name.) This is the first part of the article's history step-by-step:

  1. I observe the hiring process in which Hirely is advising the hiring manager. (I'm also advising the hiring manager, mostly telling him to be more involved and listen to Manager Tools.)
  2. I think Hirely is giving harmful advice.
  3. I write this article as a direct critique.
  4. I give the article to Hirely to react to. I tell them that I will edit it to be more general and not point the finger at them (meaning I won't out the organization by name) if they convince me that they're on a better trajectory. You may view this as being kind or you may view it as blackmail.
  5. Hirely deliberates internally.
  6. Hirely responds to me with the improvements they've made and are making, and asks me to deliver on my promise to edit the critique before publishing. Their response does make me think they're on a better path. (So again, please don't try to find out who they are.)
  7. Since I'm too lazy to rewrite the whole damn article, I search and replace the organization name with ‘Hirely’, and rewrite only the introduction.

Despite this laziness, I was happy with the way it demonstrated the new main point with specificity (lwspec): If you know very little about X, you can’t safely outsource X. If X = hiring, it’s especially bad. So you better learn something about hiring.

You're already yawning, but the story isn't over. To explain all the strangenesses of this article, I have to describe the rest:

  1. I publish the article.
  2. It receives a lot of downvotes in addition to upvotes.
  3. I add a prescript (opposite of postscript) asking the downvoters to at least hint at what they don't like about this article.
  4. alexrjl and Kirsten helpfully comment with their speculations about why people might be downvoting. (Thank you, Alex and Kirsten!) None of the downvoters comment.
  5. More downvotes and upvotes, steadying at 3 total karma from 41 votes.
  6. It turns out that this article was mass-downvoted by anonymous accounts created for this purpose. The EA Forum team confirms this to me (and you'll see a note from them at the top of the comments).
  7. The EA Forum team reverts the mass-downvoting.
  8. The EA Forum team graciously reruns the article, which might be how you came to read it.


Comments

Lizka (moderator comment):

A note from the moderation team: this post was shared on September 5, but a user used a number of anonymous/throwaway accounts to downvote it,[1] so we are reposting it. 

Down-voting using multiple accounts is a serious breach of Forum norms (you can see the guide). We are banning the user for 6 months.

We will not be sharing the identity of this user, because our old Guide to Norms was not clear on which situations would lead to de-anonymization of any part of voting behavior. We take privacy and protecting anonymity seriously, and didn’t want to act in a way that would endanger users’ ability to trust us on this (by, for instance, interpreting our guide in a way that is less generous than would be possible, or applying new guidelines retroactively). We have updated the Guide to make it clear that, given serious breaches of Forum norms (e.g. if something leads to a ban), we may leave a public comment about the reason for the ban, even if that threatens the user’s anonymity. (We will try to maximally protect privacy and pseudonymity, as long as it does not seriously interfere with our ability to enforce important norms on the Forum.) Please let us know if you have any questions about this. 

  1. ^

    which led to the post's quick disappearance from the Frontpage

Thank you so much for your hard work guarding the forum against malicious behavior.

There's a decent amount in this post that I agree with, but also sufficiently poor epistemics/reasoning transparency[1] (in my view) and enough advice that seems inaccurate[2] that I would recommend people either (a) don't read this post or (b) read it only with substantial skepticism and alongside other sources of advice (and not just Manager Tools, though maybe use that as well, just with skepticism). One place other advice can be found is in this quick collection I made: Quick notes/links on hiring/vetting

I do think this post has some advice many people would benefit from, but the post also confidently asserts some things that I think are probably inaccurate or bad advice.[2] And (in my view) the post doesn't make it easy for people to discern the good from bad advice unless they already have good knowledge here (in which case they'll also gain little from the post), since the post provides very few sources or arguments and has the same confident tone for all its advice (rather than presenting some things as more tentative).

Overall, I think that this post is below the epistemic standards we should want for the Forum. (I'm aware this is somewhat rude/aggressive to write, and I promise I come across as friendlier in person! Also, in part [1] of this comment I provide some concrete recommendations / constructive advice that may be helpful for future posts. And to be clear, I do expect that the author was well-intentioned and trying to be helpful when writing this.)


[1] Regarding poor epistemics / reasoning transparency in my view: It sounds like the author deliberately chose to release a short-ish and maybe quickly-ish written post rather than no post at all ("I won't be arguing every point fully. [...] That's because doing otherwise would have made the article ten times longer"). That's a reasonable choice. But in this case I think it turns out too much of this advice is probably bad for the resulting quick post to be useful to many people. I think the ideal version of this post would involve maybe a bit more word count and a bit longer time spent (but doesn't have to be much more) and would:

  • Indicate differing levels of confidence in different claims
  • Probably be substantially less confident on average
    • I'm not saying that all posts should be low-confidence. But I think that should probably be the case when:
      • the author has limited experience
        • I'm basing this on the author saying "Hiring is a wide field and I've only tilled a small patch of it myself". To be clear, I'm not saying people should only write posts about things they have lots of experience on, just that limited experience plus the following things should warrant lower confidence.
      • The author didn't have time to properly find sources or explain the arguments
        • This is relevant because in the process of doing so one often realises oneself that the claims rest on shakier ground than one thought.
      • The main source seems to be a podcast/site (Manager Tools) which (I think) itself has poor epistemics
        • I've engaged a fair bit with Manager Tools, and think it's useful overall, but in my experience they also usually provide no evidence or arguments for their claims except their personal experience, and I think many of their claims/advice are bad. So when I'm engaging with it I'm doing so with skepticism and sifting for a few takeaways that are in hindsight obviously true or that I can test out easily myself. 
  • Provide more of the reasoning/arguments for various claims, even if that just means saying more of what one already believes or thinks rather than thinking things through in more detail or finding more sources
  • Be based on more careful reading of sources, to avoid drawing from one source a conclusion that seems exactly opposite to what the source finds
    • (I haven't checked any of the other claims/sources properly, so this may not be the only such issue)
  • Probably defer less to sources which don't themselves have strong evidence bases or reasoning transparency (Manager Tools)
  • Maybe find more high-quality sources (though it'd also be reasonable to not bother with that)

[2] Some things in this post that I think are inaccurate or bad advice (though unfortunately don't have time to explain my reasoning for most of these):

  • The claim that personality tests aren't predictive of job performance
  • Saying it's bad to outsource things you don't know because you can't assess performance well in those cases, without flagging that this might not apply anywhere near as much in cases where an outsourcee comes highly recommended by people you trust and whose situation is similar to yours, and when the outsourcee has credible signals of sharing a lot of your values 
    • (E.g., if they're a credibly EA-aligned hiring service/advisor who comes recommended from relevant other people based on performance there.)
  • Suggesting filtering only via resumes rather than instead/also via screening questions that are basically a very short work test
  • Suggesting giving many interviews and relying heavily on them, and suggesting candidates answering the same questions from each interviewer
    • I think that latter advice could only make sense if interviewers don't have a pre-set rubric and are just assessing answers subjectively, which seems unwise to me, or if the interviews are close to unstructured rather than structured, which also seems unwise to me. Also, relative to work tests, this takes up a lot of candidate and esp. hirer time.
  • Making final decisions based on a meeting where people verbally share their assessments one at a time (which could then be anchored by whoever speaks first or groupthink or whatever), rather than collecting scores along the way and sticking to them by default (though with room to manoeuvre) to minimise noise (see also the book Noise)

Those last three things basically seem consistent with the ways traditional hiring is bad and seems based on prizing subjective judgement and not really quantitatively assessing things or checking what research shows works best, in contrast to what's common in EA orgs + leading tech firms (as far as I'm aware). And that seems consistent with my experience of the kind of advice Manager Tools tends to give.

Also, these are just most of the things that stood out to me when casually listening to the post (in audio form), and that's enough to make me think that there may also be some other issues. 

It would be nice if you moved your last paragraph first. Recommending (in bold), based on a casual listening, that people not read a post feels unfair. (Speaking of fairness, there are other posts about hiring (e.g. https://forum.effectivealtruism.org/posts/XpnJKvr5BKEKcgvdD/perhaps-the-highest-leverage-meta-skill-an-ea-guide-to#comments) that are argued more thinly than my post and have gotten much less criticism.) I agree with your alternative recommendation to read the post with skepticism. That's the case with any post.

I agree with some of your points. Especially the one about ways to outsource without knowing about the subject. I might work it into the article. (Two sentences added 2022-10-19.) (2022-10-22: I've worked it into the article.) Overall I think there are misunderstandings, which I should have worked harder to avoid.

  • The main point is not hiring advice. (2022-10-20: Added clarifying comments to this end.) The main point is that people who hire need to learn more about hiring. I'm merely demonstrating this by countering Hirely, which I do based on hiring guidance that I'm fairly confident is better than theirs. How this came about I describe in the section about the article's history.
  • I agree that it would be nice to have more epistemic nuance and diligence. But this article wouldn't exist had I tried for that, because I don't have more time. Given the choice between not publishing the article and publishing it with some epistemic hubris, I thought the latter was more beneficial.
  • The main resource for this article is Manager Tools because I think their hiring guidance is an excellent starting point. This is based on wider reading and listening and thinking and experiencing, which I have been doing for years. I agree that their arguments are often sloppy and there are areas other than hiring in which they're weaker than they admit. Your claim that ‘they also usually provide no evidence or arguments for their claims except their personal experience’ is wrong as it stands. They provide evidence for some of their claims, which I've linked in my section on why Manager Tools is a decent authority. (Note that I wrote ‘a decent’, not ‘the ultimate’ – epistemic humility.) I wish they would provide evidence for other claims, too. And the bulk of each podcast/whitepaper is made up of arguments for their claims, spotty though they might be at times. Personal experience can be a valid support for arguments, too, if it's believable that you have a lot of it. It is believable that MT has a lot of personal experience (and data) about hiring, which I also point out in my section about them.

By the way:

  • I do think I'm fairly transparent and indicate levels of confidence to some degree, perhaps in a different way than you're used to. I don't have time to read and digest the full Reasoning Transparency article. But if I look at the top recommendations:
    • ‘Open with a linked summary of key takeaways.’ – That's the title in my article.
    • ‘Throughout a document, indicate which considerations are most important to your key takeaways.’ – With some squinting that's the table of contents in my article.
    • ‘Throughout a document, indicate how confident you are in major claims, and what support you have for them.’ – I do indicate my support (Manager Tools) and why I think it's a support. I am quite confident in my main claim and in the main supporting claims. Note also that I don't write that outsourcing hiring is terrible, but that you need to be careful. And I don't write ‘Hirely will cause you to make a bad hire’, but that they put you at risk of making a bad hire. Among others.
  • I think I've refuted part of your claim about personality tests in other comments. Added 2022-10-22: Even if Big Five assessments can predict some things well, I would expect that to only be the case when they're not gamed. The one I know can easily be gamed. (E.g. just answer ‘very accurate’/‘very inaccurate’ to all the questions that sound like pushing up/down the conscientiousness score.) And an important part of hiring is keeping out people who make themselves look better than they are.
  • Expanded 2022-10-23: Some of the arguments against the ‘bad’ hiring advice had already been addressed in my article. Eg.: I point out that the résumé screen is a flawed filter and needs to be supplemented or replaced, depending on the situation. I briefly argue why it's okay to take a lot of time interviewing. There is also an easy fix for anchoring/groupthink in the interview results capture meeting (as well as other, more subtle, reasons to do it): Ask people to send you their recommendations and justifications before the meeting.
  • Added 2022-10-19, edited 2022-10-23: In general, the items about ‘inaccurate or bad advice’ have common-sense arguments for and against them. (If anyone wants me to write any of them out, let me know.) It would come down to analysing Manager Tools' (unfortunately non-public) data as well as the research you're referring to. I would expect ‘traditional hiring’ when done right to work roughly equally well to whatever good EA orgs and leading tech firms do. My default expectation when people say that the ‘traditional way to do X is bad’ is that it actually works well when done right, but that it has a bad image because it got corrupted over time/is usually done mediocrely.

They ask applicants to take a Big Five assessment. Personality tests are not predictive of job performance (at least integrity and conscientiousness tests aren’t: selval). And they can be gamed.

This is not how I read your link (p. 265 in Psychological Bulletin 1998, Vol. 124, No. 2; p. 4 in the PDF). In the relevant metrics, it seems like incremental validity (above just using intelligence tests) for both conscientiousness and integrity tests is very high, on comparable levels (!) to work sample performance and structured interviews.

Thanks for pointing that out. I had only looked at the validity of each method on its own and not at the validity gain numbers. Don't the results indicate, though, that you would have to also subject the candidate to a GMA test if you want to get validity from conscientiousness and integrity tests? And GMA tests are rarely performed in hiring processes.

Pulling up an addendum from below (added 2022-10-19):

I would explain the high incremental validity by the fact that a GMA test barely measures conscientiousness and integrity. In fact, footnote ‘c,d’ mentions that ‘the correlation between integrity and ability is zero’. But conscientiousness and integrity are important for job performance (depending on the job [and the hiring manager]). I would expect much lower incremental validity over structured interviews or work samples. Because these, when done well, tell a lot about conscientiousness and integrity by themselves.
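
As a back-of-the-envelope check (a sketch assuming the textbook formula for the multiple correlation of two uncorrelated predictors, with the table's validities of .51 for GMA and .41 for integrity tests):

$R_{\text{GMA+integrity}} = \sqrt{0.51^2 + 0.41^2} \approx 0.65$

i.e. a gain of roughly .14 over GMA alone. So the high incremental validity falls straight out of that zero intercorrelation.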

[Update: This comment of mine was wrong, but I still think the claim in the post is contradicted by the source cited; see below.]

It looks like integrity and conscientiousness tests were also the 3rd and 4th highest rated things for the "Validity" column itself, out of a large list? And they appear to have ranked above interviews and some CV-like things (e.g., reference checks and job experience (years)), yet your post recommends interviews and CVs. 

I'm pretty confused by this apparent misreading of the source. I think readers should treat that as a reason for being more skeptical of the rest of the post. My guess is it'd be worth you "moving a bit slower" (to check sources more carefully etc.) and stating things less confidently in future posts. 

They are the third and fourth row in the table, but the rows aren't ordered by the validity column. When you order by the validity column, integrity tests are 8th and conscientiousness tests are 12th unless I've miscounted.

I admit that I cherry-picked this article, basically only looked at the validity numbers in the table, and don't know anything else from that literature. This post has a wider view for those interested: https://forum.effectivealtruism.org/posts/j9JnmEZtgKcpPj8kz/hiring-how-to-do-it-better On the other hand, the validity table was excerpted on an 80,000 Hours page for years, which gives me some confidence by proxy. Also, the points of my post rely only lightly on this article.

Added 2022-10-19: It would be nice if you removed the bolding on the text that you found to be wrong. Otherwise the impatient reader is apt to miss the bracketed text above.

Oh crap, my bad, should've looked closer! (And that's ironic given my comment, although at least I'd say the epistemic standards for comments can/should be lower than for posts.) Sorry about that. 

Though still I think "not predictive" seems like a strong misrepresentation of that source. I think correlations of .41 and .31 would typically be regarded as moderate/weak and definitely not "approximately zero". (And then also the apparently high incremental validity is interesting, though not as immediately easy to figure out the implications of.)

I agree that "the points of [your] post rely only lightly on this article". But I still think this is a sufficiently strong & (in my view, seemingly) obvious* misrepresentation of the source that it would make sense to see this issue as a reason for readers to be skeptical of any other parts of the post that they haven't closely checked. 

(I also find it surprising that in your two replies in this thread you didn't note that the table indeed seems inconsistent with "not predictive".)

*I.e., it doesn't seem like it requires a very close/careful reading of the source to determine that it's saying something very different to "not predictive". (Though I may be wrong - I only jumped to the table, and I guess it's possible other parts of the paper are very misleadingly written.)

Particularly when the most predictive things in that table were .51 and .54.

Okay, you've convinced me. I've rewritten that item.

The reason why I wasn't noting that the table is inconsistent with ‘not predictive’ is that I was unconsciously equating ‘not predictive’ with ‘not sufficiently predictive to be worth the candidate's time’. Only your insistence made me think about it more carefully. Given that unconscious semantics, it's not a strong misrepresentation of the source either. But of course it's sloppy and you were right to point it out.

I hope this somewhat restores your expectation of my epistemic integrity. Also, I think there is not just evidence against, but also evidence for epistemic integrity in my article. That should factor into how ‘skeptical’ readers ought to be. Examples: The last paragraph of the introduction. The fact that I edit the article based on comments once I'm convinced that a comment is correct. The fact that I call out edits and don't just rewrite history. The fact that it's well-structured overall (not necessarily at the paragraph level), which makes it easy to respond to claims. The fact that I include and address possible objections to my points.

Addendum: I would explain the high incremental validity by the fact that a GMA test barely measures conscientiousness and integrity. In fact, footnote ‘c,d’ mentions that ‘the correlation between integrity and ability is zero’. But conscientiousness and integrity are important for job performance (depending on the job). I would expect much lower incremental validity over structured interviews or work samples. Because these, when done well, tell a lot about conscientiousness and integrity by themselves.

I'm guessing here, but I imagine that the source of the downvotes might be that this piece is a specific criticism of one organisation, framed as a more general commentary on hiring. I also suspect that the organisation is guessable (there's lots of quite specific detail, including quotes from job ads), though I haven't guessed.

I suspect that either a general piece about pitfalls to avoid when hiring, or an open criticism of "hirely" (potentially having given them a chance to respond), would be better received.

(I haven't up or down voted, as I haven't dug into the object level claims yet)

I agree with Alex.

In this article I also saw lots of general assertions without a clear explanation as to why people should believe you. For example, you express disdain for scenario questions, but I'm an experienced hiring manager and I've often seen scenario questions (with follow-up questions to understand the candidate's thought process and normal way of working) referenced as an example of good practice in an interview. I would need a clearer explanation from you than "people could be nervous" as to why you think these are not useful. (I also disliked the repeated use of the word "stud" given its sexual connotations!)

I also didn't upvote or downvote.

It turns out that the article was mass-downvoted by throwaway accounts. So there weren't many downvotes to find a reason for after all. Thanks for your comment anyway. It allowed me to partially address some weaknesses of my article.

Thank you, too, for explaining your reservations! And sorry about the word ‘stud’ – I didn't know the connotations. I've now replaced it with ‘ace’.

I'll try to explain the general assertions and my disdain for scenario questions. This is not arguing against the points you made. It's just a clarification, which I should work into the article.

  • General assertions – I wasn't aware of this as a problem of my article. Thanks for pointing it out! – There are so many claims in the article that I don't have space and time to argue them all properly. That's why I have lots of Manager Tools links throughout, since they have whole podcasts/whitepapers on each topic, where they do argue things (mostly) properly.

  • Scenario questions – a better way to structure the argument: Assume you need someone who can manage projects well and you have limited time. Which is going to give you more (and more reliable) information about a candidate's ability to manage projects? Asking about how they've managed past projects or about how they would manage project X? Which is more predictive? Now, an experienced person would probably justify scenario answers with examples from their experience. And someone inexperienced could only cite from the course they just attended. So I see how scenario questions can work. Behavioural questions are just a more direct way of getting at what I want to know. (This is still only a plausibility argument. Ultimately it comes down to data, which I haven't looked at. But I mostly trust MT to have looked at the data.)

Thanks for attempting an explanation! I've now added a bit of clarification at the end of the introduction. I.e. I did write a closed criticism and gave them a chance to respond. They didn't want me to publish it as-is, so I ‘rewrote’ it to be more general. As is obvious now, I should have been less lazy and rewritten the whole thing rather than only searching and replacing the org name and rewriting the introduction.

To reduce guessability further, I've now rewritten the quote as much as I could while preserving its flavour.

It turns out that the article was mass-downvoted by throwaway accounts. So there weren't many downvotes to find a reason for after all. Your comment helped me clarify some points, though.

The performance of the org you're describing is wildly bad by itself. What you wrote seems credible (besides the style things I mentioned). 

You have also experienced serious misconduct/abuse by voting manipulation against you.

Can you say whether the org you're criticizing is an "EA org", or "EA funded", or has people known to the community? Can you say if the people who abusively mass voted you are part of this org you criticized?

EA isn't some big corporation, or friends who get funding for each other. It seems good to communicate this.

I appreciate your questions and can't comment on them right now, I'm afraid. Sorry. I hope to be able to answer in a few weeks' or months' time. Please have patience.

(Thanks also for your other comment. I'll respond to that later today or tomorrow.)

[Unfortunately didn't have time to read this whole post but thought it was worth chiming in with a narrow point.]

I like Manager Tools and have recommended it but my impression is that some of their advice is better optimized for big, somewhat corporate organizations than for small startups and small nonprofits with an unusual amount of trust among staff. I'd usually recommend somebody pair MT with a source of advice targeted at startups (e.g. CEO Within though the topics only partially overlap) so you know when the advice differs and can pick between them.

Good point, thanks! Manager Tools usually explain their guidance in detail, which makes it adaptable to all kinds of organizations. And since MT itself is a small company with, I guess, an unusual amount of trust among staff, I don't think they would put out material that fails to apply to them.

But I do agree that wider reading is necessary. Paul Graham's essays, for example, are a good counterpoint to MT's corporate emphasis, too.

"I don't think they would put out material that fails to apply to them."

I think we mostly agree but I don't think that's necessarily true. My impression is that they mainly study what's useful to their clients and, from what I can glean from their book, those clients are mostly big and corporate. I think MT themselves might fall outside their own main target audience.

+1 to Paul Graham's essays.

Addendum – from https://www.manager-tools.com/2019/01/manager-tools-data-one-ones-part-1-hall-fame-guidance:

Going back to company size, we've done 3 studies comparing the effect of WO3s in small, medium, and large organizations. We have never been able to find any significant difference in R&R improvements based on organization size. We have measured statistically similar improvements in companies of less than 50, and companies greater than 100,000 employees.

WO3s … weekly one-on-ones, R&R … results and retention

This is not directly relevant to the article above, but it's about one-on-ones, which are a core MT thing and which chapter 4 of the Effective Manager book is about.

Another excerpt, talking about their data in general:

We've measured and followed tens of thousands of managers in academia, and government, and non-profits. Hospitals, and charities, and religious organizations, and medical practices, and retail firms like grocery stores and mall-based clothing chains. We've measured over a quarter of a million managers in the Fortune 500 alone.

They often say that their guidance is for 90 % of people 90 % of the time. And their goal is: ‘Every manager effective, every professional productive.’

(Since I realize that I sound like a shill for MT, I'll say again that I'm not affiliated with them, nor do I have any hidden agenda. It's just that my article refers to a lot of MT material and I'm trying to add evidence for their authority.)

Makes sense. I'm a bit worried that people reading this will take away: ‘We're a small shop, therefore MT doesn't apply at all.’ This is not the case and I think Howie would agree. I've never worked at a big organization and MT still has helped me a lot. I've also read and listened to a ton of non-MT material on leadership, doing work, business, processes etc. So I could well be putting MT guidance in its proper context without being aware of it.

I definitely agree that takeaway would be a mistake. I think my view is more like "if the specifics of what MT says on a particular topic don't feel like they really fit your organisation, you should not feel bound to them. Especially if you're a small organisation with an unusual culture or if their advice seems to clash with conventional wisdom from other sources, especially in Silicon Valley."

I'd endorse their book as useful for managers at any org. A lot of the basic takeaways (especially having consistent one on ones) seem pretty robust and it would be surprising if you shouldn't do them at all.

I agree. Thanks for taking the time to hash this out with me!

This article is hard to read and the tone is strong, which makes it come off as ranty. This is bad because it seems like it has substantive content that you've thought out.

For example, the second paragraph has a few good ideas, but these have been chopped up. This makes it laborious to read:

What do you want from a hiring process? A good hire. Crucially, no bad hire. And for those people whom you haven’t hired to be mostly happy with how things went. Because you care for them.

The issue continues in the third paragraph. Even though the ideas are good, you're coming across as overbearing. This is distracting, which is especially bad since this paragraph gives the overview/purpose of the article and introduces the key org (Hirely) that you're talking about.

Sadly, you’re at risk of making a bad hire and disgruntling your other applicants if you don’t know what you’re doing. If you don’t know what you’re doing, outsourcing isn’t a solution, either, because you don’t know how to judge the actions of those you’re outsourcing to. I will demonstrate this with the example of a hiring process I’ve observed as an outsider, in which the hiring firm (call them Hirely) acted in a way that would have seemed sensible to the average founder who knows little about hiring, but to me looked like blundering. Even if you don’t plan to outsource hiring, the following points are worth thinking about.

Other comments:

  • After this, it seems like there are multiple sections of meta ("Added 2022-10-16" and "This is a repurposed article with a history"). These suggest serious misconduct by someone hostile to you, but this is sort of buried.
  • Terms like "Manager Tools" and "Hirely" are really important to you, but this requires closer reading to figure out what they really mean and most people won't push past this. 
    • Your views/promotion of Manager Tools seem pretty disjoint from the other issues in this post.
  • You've given each hyperlink its own custom tag. This convention/process seems wildly different and seems to make a lot of extra work for you?


A lot of your content is thoughtful and thinks from the perspective of the "users"/"customers".

It sounds like you have good perspectives, and small tweaks, like a summary up front, would add a lot.

Thanks for taking the time to write up balanced feedback! I was surprised.

I'm starting to understand the message from your and other comments that the tone of the article is distracting from the content and even causing people to misunderstand it. When I write another article, I will take more time to work on the tone.

I knew that the introduction was written in a choppy way, but I didn't expect it to be hard to read. Thanks for telling me that. Good point also that I should introduce key terms more explicitly. You pointing this out made me see it as another source of confusion.

One clarification about ‘promoting Manager Tools’: The reason is that I'm using arguments from authority a lot. From the authority of Manager Tools. In order for those arguments to be convincing, I need to establish that Manager Tools is reliable. That's why I write how their guidance leads to success etc. Now, arguments from authority tend to be weak (see also https://en.wikipedia.org/wiki/Argument_from_authority). If I had the time, I would write out all the arguments properly, gather data etc. But I don't have the time, unfortunately. Also, it's not crucial for my main point, particularly my original main point. My original main point is that Hirely does bad work (they're hopefully doing better now – that's why I half-assedly repurposed the article) and one can be convinced of that without being convinced by each and every sub-point.

The hyperlink thing is to comply with my own demand from an old LessWrong post: https://www.lesswrong.com/posts/ytStDZ7BaC23GvGPb/please-give-your-links-speaking-names It wasn't that much extra work because I wrote a little Clojure script to transform one Markdown file into another.
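
For illustration, here is a minimal sketch of the idea in Clojure (not my actual script; the speaking-names map and URL are made-up examples):

```clojure
(ns link-names
  (:require [clojure.string :as str]))

;; Hypothetical map from URL to a speaking name; a real list would be
;; maintained by hand alongside the article.
(def speaking-names
  {"https://example.com/interviewing-series" "mtinterv"})

(defn rewrite-links
  "Turn inline Markdown links [text](url) into reference-style links
  [text][name] and append the [name]: url definitions at the end."
  [markdown]
  (let [used (atom {})
        body (str/replace markdown
                          #"\[([^\]]+)\]\(([^)]+)\)"
                          (fn [[_ text url]]
                            (let [nm (get speaking-names url url)]
                              (swap! used assoc nm url)
                              (str "[" text "][" nm "]"))))]
    (str body
         "\n\n"
         (str/join "\n"
                   (for [[nm url] @used]
                     (str "[" nm "]: " url))))))

;; (rewrite-links "See [the series](https://example.com/interviewing-series).")
;; => "See [the series][mtinterv].\n\n[mtinterv]: https://example.com/interviewing-series"
```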

As to the summary up-front: Recently I've been trying to structure articles so that the title and the table of contents can be read as the summary. In this article this got lost a bit because of the repurposing.

Thank you for this! I'm hoping that this enables me to spend a lot less time on hiring in the future. I feel that this is a topic that could easily have taken me 3x the effort to understand if I hadn't gotten some very good resources from this post so I will definitely check out the book and again, awesome post!

Thanks! I'm glad you liked it!
