Stop Hiring Data Scientists

We are hiring armies of Data Scientists when what we need are armies of analysts and engineers (or developers, or however else you self-identify).

At the moment there’s a colossal mismatch between what hiring managers and executives want delivered and who they’re hiring to do the job. There are still lots of situations where a Data Scientist is the right person for the job, but we need to get a better understanding of what those situations are.

The world is very data rich and insights poor. We need more passionate people mining and delivering insights and fewer people doing surface-level analysis that is ungrounded in real science. Many times, an analyst with strong knowledge of the business can deliver far more insights than a Data Scientist who is new to the business. For many true “Data Science” roles, we’d probably be well served by a Statistician. For most mislabeled Data Science roles, we’re probably better off with an analyst or engineer.

  1. Stop hiring Data Scientists when you need an analyst.

  2. Stop hiring Data Scientists when you need an engineer.

Sounds simple! Let’s dive in.

Stop hiring Data Scientists when you need an analyst.

I’m not entirely sure what the right ratio of analysts to scientists is on a data team. I think it’s totally dependent on the problem you’re trying to solve. What I do know is that hiring 5 Data Scientists to build basic reports on a business’s baseline metrics is a misguided pursuit.

There are lots of proper Statisticians out there who do great work, day in and day out. We really never hear about those folks anymore in the data world. Lots of the really good Data Scientists out there are really just statisticians with a new title. Others are a mix of stats, engineering, math, and programming: jacks of all trades, masters of none.

It’s no secret why everyone is calling themselves a Data Scientist in 2020. Imagine you’re a Statistician who does great, honest work (science?), day in and day out. According to Indeed, Glassdoor, etc., your salary is somewhere between $70k and $110k each year. And then you look over your shoulder and see a Data Scientist bringing in somewhere between $90k and $165k each year. And in your heart you know that this “Data Scientist” doesn’t have nearly the Statistics chops that you’ve got. Maybe they can write a little bit more Python than you or maybe a little bit of R, probably nothing you couldn’t pick up in a few months... Maybe they can do SELECT * FROM Table; and get some data but nothing too sophisticated.

So you go to a few company websites, a few job boards, email a few friends and colleagues, and all of a sudden you’ve landed a shiny new job at a different company where you’re now the proud wearer of the title “Data Scientist.”

Congrats! Conveniently, you also came in with your shiny graduate degree and your prestigious research background, and all of a sudden your $95k salary at your previous company is $130k at your new company. Wow! What an upgrade. And then your boss meets with you and assigns you your work plan. You toil over the work and very quickly realize your day-to-day has gone from a highly skilled statistician’s workload to that of a SQL warrior. All of a sudden you’re spending 90% of your time building reports, delivering PowerPoints, and building Power BI dashboards to share daily user metrics. You half laugh, half cry as you realize you’re now doing the work of an entry-level data analyst. 10 years of graduate school and another 5 as a postdoc, only to spend your day writing a few SQL queries and maintaining old dashboards.

That’s not to say you don’t appreciate the work of analysts. In fact, you believe analysts are foundational to the business; without them the data world wouldn’t go round. If the analysts of the world suddenly stopped showing up to work, leadership would lose their minds; many worlds would collapse into Excel spreadsheets from hell. You just know that at your core you’re a statistician, and you prefer to provide the big win at the end of a long research cycle or small one-off consultancies, rather than a bunch of small wins day to day or week to week as an analyst. One isn’t greater than the other; it’s just a different mindset.

And yet again, Data Science has claimed another highly skilled worker and given them the job of an analyst with the salary of a mid-level manager. This story isn’t unique, and plenty of people from various backgrounds can relate to this experience.

Not only has this statistician found herself in a win-lose situation (win because she’s making lots of money, lose because she hates her job), the business has found itself in a lose-lose situation. The business is paying $130k for a role they could have filled with a highly skilled analyst for $75–100k. They’re also getting a lower quality of work because the statistician just isn’t interested in doing the work. The statistician is clearly overqualified for this work, but studies show that overqualified workers underperform when their heart isn’t in the work.

In today’s world, an analyst is often just someone humble enough not to label themselves a Data Scientist. And many “Data Scientists” are analysts with enough confidence to slap a more prestigious title on their nameplate. Heck, their boss can’t tell the difference, so why not change their title and ask for $40k more/yr?

Stop hiring Data Scientists when you need engineers.

I’m hearing story after story come out of data teams that reflect this second reality. Many data teams are at the core of building applications that leverage state-of-the-art models to deliver some novel use case. Let’s imagine a data team that is building a text classifier using BERT. Imagine this is a core feature of a soon-to-be-deployed web application.

It’s hilarious to me how much team building involves gathering a bunch of Data Scientists in a room to build this sort of application. What you clearly need is a team of engineers: maybe some front-end devs, a few back-end folks, maybe an ML Engineer or two who’s worked with these models in the past. The last thing you need is a bunch of Data Scientists running around trying to tell you how the model works or sleeping on the desk while they await the opportunity to run model validation.

It probably doesn’t hurt to have one Data Scientist in the room. However, you probably don’t need someone with intermediate R knowledge hopping in at every turn. You need someone with really great statistical chops who can do really great science and keep the team honest as they deliver a working solution. Leave the code writing to the engineers. A dashboard isn’t going to deliver inference at scale.

Data Scientists are not here to build you scalable applications, build you front-ends, build your data pipeline, or really do any of the tasks that are foundational to building a shippable piece of software. Where a good Data Scientist can chip in is in making sure you’re performing good science, making sure the application has the capability needed to deliver real-world results, etc. They’re not there to write performant code. They’re essential in certain scenarios, a hindrance in others. Good management is essential for delegating the work and building these teams properly.

In today’s world, lots of developers and engineers are finding it convenient to label themselves Data Scientists. They’ve got a CS undergrad and maybe even an MS in CS. Their Statistics background is a few quick Wikipedia articles or maybe an Engineering Statistics course. But meh, they don’t really need to understand it because their boss just wants an application that works by using existing technologies and pre-built Stats libraries. So they slap the Data Scientist title on their nameplate and try to command a few extra bucks. And they’re great engineers, so they’re gonna ship some really great software.

The problem is that the boss of that engineer (labeled a Data Scientist) now thinks they need to go hire 3 more Data Scientists to deliver more of the same end product. All of a sudden they go out and hire some really great Stats people who, 3 months later, realize they’re trying to do an engineer’s job with a Statistician’s background. Fast forward a little while and they’ve lost interest, and it’s a win-lose for them (win because they’re making a great salary, lose because they hate their job) and a lose-lose for the business.

Sound familiar?

by Luke Posey

Katie Reynolds-Da Silva