Data Scientist: "Sexiest" Job of the 21st Century?

According to an article published in the Harvard Business Review last year, Data Scientist is the sexiest job of the 21st century. That's right: the guy who writes computer code to find new insights in data (to put it simply).

Of course, this is only if you define a "sexy job" as one that is in high demand and where you can command a sizable paycheck. Turns out, pay for data scientists is upwards of $225,000 even for people straight out of graduate school.

Did programming just become more interesting? See this infographic on the so-called Data Scientists.

 [Sexiest job of the 21st century]

Comments (22)

Sep 7, 2013

I work with a guy who does this. I think he's aiming for a trader or PM type role though. These are the same people that try to convince you that being an engineer is the best deal ever.

Sep 7, 2013

Hate to say it, but Harvard calling a data scientist sexy is like Gary Gygax (creator of Dungeons and Dragons) calling World of Warcraft a game for well-adjusted jocks.

Also, the data scientists and financial/industrial engineers are going to be putting HBS and the other B-schools out of business in thirty years. I don't know why they're promoting this. We have been watching the death of the service economy over the past 40 years, and while MBAs will be the last, and most valuable, employees to succumb, they'll succumb too.

Sep 7, 2013
IlliniProgrammer:

We have been watching the death of the service economy over the past 40 years

Who is this 'we' you refer to? Services are a much larger percent of the economy than they were 40 years ago.

Sep 7, 2013
DickFuld:
IlliniProgrammer:

We have been watching the death of the service economy over the past 40 years

Who is this 'we' you refer to? Services are a much larger percent of the economy than they were 40 years ago.

40 years ago, there were these people that sat in elevators all day long and took the elevator to the floor you wanted to go to. They were called "elevator operators".

30 years ago, you didn't pump your own gas. People pumped it for you.

20 years ago, when you wanted to order from a catalog, or dialed 411, you got a live human being on the other end of the phone.

10 years ago, when you executed a trade, there was a live human being at the other end. When you wanted to get a mortgage, a human being reviewed the facts and circumstances and made a decision. An accountant probably actually filed your parents' taxes by hand. And you went to the mall to buy stuff from a sales clerk, rather than going online. All checkout counters were staffed by individuals, too.

You get my point. And I believe it shows in the numbers: a lot of service economy sectors have lost jobs over the past 10 years. Information, Retail, and Transportation are all on that list. And despite a growing population of college graduates, Business Services has only broken even:

http://www.bls.gov/emp/ep_table_201.htm

Sep 7, 2013

A data scientist isn't merely a programmer. In fact, most data scientists are mediocre programmers but great at numerical computing. What I find really appealing about the field, besides the fact that it revolves around repeatedly 'finding the needle in the haystack', is that it involves combining area expertise, stats chops and programming skills, with one no less important than the other[1]. I think buyside quants were the first data scientists and there is negligible difference between the two roles. The dude who coined the term 'data scientist' while at Facebook, Jeff Hammerbacher, was a former fixed income quant at Bear.

[1] http://www.niemanlab.org/images/drew-conway-data-s...

Sep 7, 2013

I think the whole data scientist thing is over-rated. Not in the sense that analyzing big data is not going to be important - it is going to become mandatory to survive...but the fact that people assume that vendors aren't going to start offering tools/UIs that the layman with basic stats knowledge would be able to leverage. Look at Palantir or Tableau for instance.

Please don't quote Patrick Bateman.

Sep 7, 2013
DBCooper:

I think the whole data scientist thing is over-rated. Not in the sense that analyzing big data is not going to be important - it is going to become mandatory to survive...but the fact that people assume that vendors aren't going to start offering tools/UIs that the layman with basic stats knowledge would be able to leverage. Look at Palantir or Tableau for instance.

Exactly. I'm having a hard time seeing how this won't be commoditized.

Sep 7, 2013

It is already commoditized.

Sep 7, 2013

There are R and Python packages to run data mining techniques. Some business intelligence software comes with those tools as well. The value of a data scientist is knowing which method to use (random forest, regression, etc.) and being able to make adjustments to the algorithms, and then being able to communicate the results effectively and "tell the story." Also, you need to understand the problem and have some understanding of statistics to avoid drawing the wrong conclusion, e.g. confusing correlation with causation.

I was doing something for work where I wanted to know if there was a correlation between 2 items and lag. So X happens and days later Y happens. It was easy enough to run this in R, but I didn't know how to interpret the results. Sure, 3 day lag had the highest correlation but what does that actually mean? And what are the drawbacks of the method used that would make the result misleading? That's where you need a data scientist that knows these things.
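For readers wondering what such a lag analysis looks like, here is a minimal sketch in plain Python rather than R. The data, the function names, and the 1.96 / sqrt(n) cutoff are all illustrative assumptions, not the poster's actual method. The cutoff is only a rough 95% band for "no correlation" and assumes independent, non-trending observations; series that trend or are autocorrelated can show large lagged correlations that mean nothing, and scanning many lags and keeping the best one inflates the chance of a false positive.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def lagged_correlations(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for lag = 0..max_lag.

    Returns {lag: (r, exceeds_rough_band)}.
    """
    result = {}
    for lag in range(max_lag + 1):
        n = len(x) - lag  # number of overlapping observations
        r = pearson(x[:n], y[lag:lag + n])
        # Rough 95% band under the (strong) assumption of independent,
        # non-trending observations.
        threshold = 1.96 / math.sqrt(n)
        result[lag] = (r, abs(r) > threshold)
    return result

# Hypothetical toy data: y is simply x delayed by 3 days.
x = [1, 4, 2, 8, 5, 7, 1, 9, 3, 6, 2, 8, 4, 7, 1, 5, 9, 2, 6, 3]
y = [0, 0, 0] + x[:-3]

corrs = lagged_correlations(x, y, max_lag=5)
best_lag = max(corrs, key=lambda k: abs(corrs[k][0]))
print(best_lag, corrs[best_lag])
```

In this toy case the 3-day lag stands out because y was built that way. On real data the same number says only that X and Y tend to move together three days apart, not that X causes Y.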

Sep 7, 2013

The IT bubble occurred a decade ago. Evidently, data engineers/scientists never conquered the world with their alleged superiority. Humility is a virtue.

Enjoy your evening

Victory at all costs, victory in spite of all terror, victory however long and hard the road may be; for without victory, there is no survival.

Winston Churchill

Sep 7, 2013
Beli3f:

The IT bubble occurred a decade ago. Evidently, data engineers/scientists never conquered the world with their alleged superiority. Humility is a virtue.

Enjoy your evening

Exactly. DS is a new hype in a long trend of hypes. These are the statisticians, data miners, etc. of days past, now hired at inflated salaries thanks to the VC-istan bubble needing to provide their networks with nice jobs, or not realizing how easy the job actually is. Significance of lagged correlation, for example, is something taught in high school (or that was taught at mine, anyway) - hardly something you need a PhD in stats from Harvard for. The programming side of things is laughable and can be picked up in weeks by someone motivated and unemployed, if you want to use industry tools (do Andrew Ng's coursera course, and then teach yourself pandas or R and a bit of unix).

This being said I use the hype extensively to get visas approved for otherwise borderline programmers, since Singapore has as usual jumped onto the bandwagon and companies here are pressuring the MOM to draw more DS into the country or train them locally. It's $250k in Silicon Valley for well connected Stanford kids, over here I pick up 30 year olds with killer code for $60k.

If you want to run basic stats tests on your data to figure out what's going on, learn stats, it won't take you very long and it's free.

If you want to build systems like recommendation engines (the other part, allegedly, of DS) then hire a programmer comfortable in something like Clojure or Haskell, not just because these are great languages to build reliable fast systems in quickly, but because the kind of people who willingly choose these less employable languages over Java are perfect for this kind of job.

Where the DS hype surprised me is in reinforcing the fact that most management teams barely look at data in even the simplest ways, since apparently it is now news that you should test for significance before making a decision based on an apparent trend.

Sep 7, 2013

Guys this is basically my job, but it's hardly sexy. It's about as sexy as playing dungeons and dragons. The fact that Harvard is calling this job sexy makes all of this even more funny to me. I'm laughing at this about as hard as Eddie would be laughing if HBS called him an elite blue-blooded east coast douchebag.

Next they are going to call insurance actuaries exciting people who love taking risk.

But seriously, a data scientist is an OK gig. An insurance actuary has an even better gig. He's just afraid of spiders. And dirt. And driving a car without a seatbelt. And data scientists are about as sexy as the weird table full of D&D, Magic the Gathering, and Warhammer 40,000 nerds from high school who know exactly how a nuclear reactor operates and find that more fun to talk about than girls.

Back to my rusty Honda. Actually my Level 9 Cleric needs to cast a flamestrike spell on the air elemental first. Crap, those things have good reflex saves.

Sep 7, 2013

Yeah, I'm under the impression that the % of full-time employment represented by service sectors (which would include both programming and financial services) has only increased. ;)

Sep 8, 2013

There is a difference between "service economy" and a "knowledge based economy". North America is moving more and more towards a "knowledge based economy".

Sep 8, 2013

There's an overall trend of convergence toward completely data-driven decision making at a strategic level. Data scientists and statisticians are becoming more and more prevalent in strategy groups, and firms like Correlation Ventures and Google Ventures use statistical models to find good venture capital investments. This appears to be a trend that's just starting as well. For those who don't know - VC is probably one of the most qualitative investment strategies in existence.

Sep 8, 2013

I'd have agreed with a lot of the dismissive comments a few months ago, and to a certain extent I still do. But recently I decided to take part in a data science competition for the fun of it. The competition was basically about designing a predictive analytics algorithm. I had two undergrad courses' worth of stats and a year of R on the job.

I actually did quite well, finishing in the top 10% through massive feature engineering and a final ensemble blend of 6 models, iirc. I was mainly self-taught, after taking Andrew Ng's Machine Learning course on Coursera. And hell, screw this talk of commoditized data science tools; the open-source tools are way better and still easy to use. Finding the R packages and figuring out how to integrate them was actually quite easy.
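For anyone unfamiliar with the term, an "ensemble blend" in its simplest form is just an average of several models' predictions. The sketch below is not the poster's actual setup (it uses three made-up models rather than six, and contrived numbers chosen so the errors partly cancel); it only illustrates why blending can beat each individual model.

```python
import math

def rmse(predictions, actual):
    """Root mean squared error between predictions and actual values."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predictions, actual)) / len(actual)
    )

def blend(*model_predictions):
    """Simple unweighted average of several models' predictions."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]

# Hypothetical target values and predictions from three imperfect models.
actual = [3, 5, 2, 8, 6]
model_a = [4, 4, 3, 7, 6]
model_b = [2, 6, 2, 9, 5]
model_c = [3, 6, 1, 8, 7]

blended = blend(model_a, model_b, model_c)

for name, preds in [("a", model_a), ("b", model_b),
                    ("c", model_c), ("blend", blended)]:
    print(name, round(rmse(preds, actual), 3))
```

The blend only helps when the models make different mistakes; averaging several copies of the same model gains nothing. Competition entries typically go further and weight the models, which is not shown here.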

So far this pretty much confirms what a lot of the dismissive posters are saying... BUT the winners and the top 10 or so in that competition were on a whole other level. They were typically PhDs, had years of experience, or were already top Data Scientists themselves. They had all the skills that people say make up an ideal Data Scientist. The interesting thing, however, was that they won by squeezing out a measly 1-2% in performance compared to someone like me. That doesn't sound impressive, but after going through that competition I was seriously impressed by their skills; it takes a thorough understanding of the underlying models, which many now use brainlessly, to accomplish their feat.

Of course, for many (most probably) companies and organizations, that extra squeeze in performance does not really matter and it's better if they just hire the typical statistician, data miner, etc. of days past. In certain critical industries and tasks, that extra oomph might matter more and that's where a data scientist would be needed.

73 good sir!

Sep 8, 2013
AgentBishop:

The interesting thing, however, was that they won by squeezing out a measly 1-2% in performance compared to someone like me. That doesn't sound impressive, but after going through that competition I was seriously impressed by their skills; it takes a thorough understanding of the underlying models, which many now use brainlessly, to accomplish their feat.

That's another point. You took a 20h online class, have some CS skills, and you get within 1-2% - in a Kaggle type contest, which is gameable unlike most real life problems - of the absolute top guys in the field.

When I look at this, and I can hire someone like you (actually, with much more ML experience) for $60k, and the top guy gets me 1% extra for $250k, on top of which I have to worry that he's top ranked on Kaggle so might leave at any point, costing me more hiring fees, a new learning period, etc. it's not a very tough choice.

On top of this, let's say you built an engine that boosts average basket by 15%, and him 17%. I probably wouldn't detect the difference in an A/B test. For example, one of my engines is doing +20% in one country and +15% in another. It's then very hard to go to management and say "need to spend almost 5 times as much on this guy for that extra 1%". I don't know whether it would be different in big corp (where 1% is a lot of money) but I think there are other issues there like 10 layers of management ossifying any decision making and code changes and improvements, together with being forced to use inferior, outdated tools because that's all you had when you started (cf C++ at Google).
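A rough way to see why a 15% versus 17% lift is so hard to tell apart is the standard sample-size formula for a two-sample z-test, n per arm = 2 * ((z_alpha + z_beta) * sigma / delta)^2. The sketch below uses hypothetical numbers (a baseline basket of 100 with a standard deviation of 80) and a hypothetical helper name; nothing here comes from the poster's actual data.

```python
import math
from statistics import NormalDist

def orders_per_arm(mean_a, mean_b, std_dev, alpha=0.05, power=0.8):
    """Approximate orders needed in EACH arm of a two-sided, two-sample
    z-test to detect the difference between two mean basket values."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = abs(mean_a - mean_b)
    return math.ceil(2 * ((z_alpha + z_beta) * std_dev / delta) ** 2)

# Hypothetical numbers: baseline basket of 100, standard deviation of 80.
baseline, std_dev = 100.0, 80.0

# Engine (+15%) against no engine at all.
vs_control = orders_per_arm(baseline, baseline * 1.15, std_dev)

# One engine (+15%) against a slightly better one (+17%).
vs_rival = orders_per_arm(baseline * 1.15, baseline * 1.17, std_dev)

print(vs_control, vs_rival)
```

Because the required sample grows with the inverse square of the difference, shrinking the gap from 15 points to 2 multiplies the traffic needed by (15/2)^2, about 56 times, whatever the basket variance is. That fits the point above: the extra 2% may be real, but a typical A/B test would not show it.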

I agree that open source tools are superior to a lot of the commoditized stuff. We tried a few commercial solutions for rec engines, none of which had any impact whatsoever on sales. This being said, pulling a library effortlessly and rolling it out without having to think much about its content is my definition of commoditized content, it is like Java programmers with Eclipse, they don't have to think much and just push bits around until it works.

Sep 8, 2013

Agree that it's too hyped up these days (sort of like quant finance pre-2008).

What's contributing to the hype, in my opinion, is the low barrier to entry. Anyone can take Andrew Ng's ML course online, participate in a Kaggle competition, and call himself a "data scientist" (and in fact, many do). But there is a huge difference between this "data scientist" and a truly expert data scientist who can (for example) implement a parallelized deep neural network over GPU clusters.

The former will be making 60K-100K at a large corporation (banks/insurance etc.) being "part of the wheel". The latter will be making 200K at Google/Facebook doing cutting edge research.

As previous posters have noted, the difference in performance between a world-class data scientist and an average one (such as myself) is not that much. And in traditional corporate settings, your senior executives will not care that you improved the lift of the previous model by 5% using the state-of-the-art algorithm. But at Google, they will most definitely care because they know that's where the $$ is.

As a side note, I am a little bit annoyed by the media's portrayal of data scientists as math wizards. For most aspects of data science, the mathematics is actually quite shallow. It's the programming aspect that's the differentiator.

And I find it hilarious that the anatomy of a data scientist pic is that of a woman.

Sep 9, 2013
is-t:

Agree that it's too hyped up these days (sort of like quant finance pre-2008).

What's contributing to the hype, in my opinion, is the low barrier to entry. Anyone can take Andrew Ng's ML course online, participate in a Kaggle competition, and call himself a "data scientist" (and in fact, many do). But there is a huge difference between this "data scientist" and a truly expert data scientist who can (for example) implement a parallelized deep neural network over GPU clusters.

The former will be making 60K-100K at a large corporation (banks/insurance etc.) being "part of the wheel". The latter will be making 200K at Google/Facebook doing cutting edge research.

Ah, but these are computer scientists working for Google and Facebook's research department, who happen to work on problems currently branded as DS. In the 80s, parallel processing was called parallel processing, although machine learning was known as AI. These guys would work in Lisp instead of Python or C++ or Haskell (where parallelism is trivial).

And I find it hilarious that the anatomy of a data scientist pic is that of a woman.

Hilary... :P
