Data Scientist: "Sexiest" Job of the 21st Century?

According to an article published in the Harvard Business Review last year, Data Scientist is the sexiest job of the 21st century. That's right, the guy who writes computer code to find new insights in data (putting it simply).

Of course, this is only if you define a "sexy job" as one that is in high demand and commands a sizable paycheck. Turns out, pay for data scientists is upwards of $225,000 even for people straight out of graduate school.

Did programming just become more interesting? See this infographic on the so-called Data Scientists.

 [Sexiest job of the 21st century]

 

Hate to say it, but Harvard calling a data scientist sexy is like Gary Gygax (creator of Dungeons and Dragons) calling World of Warcraft a game for well-adjusted jocks.

Also, the data scientists and financial/industrial engineers are going to be putting HBS and the other B-schools out of business in thirty years. I don't know why they're promoting this. We have been watching the death of the service economy over the past 40 years, and while MBAs will be the last, and most valuable, employees to succumb, they'll succumb too.

 
DickFuld:
IlliniProgrammer:

We have been watching the death of the service economy over the past 40 years

Who is this 'we' you refer to? Services are a much larger percentage of the economy than they were 40 years ago.

40 years ago, there were these people that sat in elevators all day long and took the elevator to the floor you wanted to go to. They were called "elevator operators".

30 years ago, you didn't pump your own gas. People pumped it for you.

20 years ago, when you wanted to order from a catalog, or dialed 411, you got a live human being on the other end of the phone.

10 years ago, when you executed a trade, there was a live human being at the other end. When you wanted to get a mortgage, a human being reviewed the facts and circumstances and made a decision. An accountant probably actually filed your parents' taxes by hand. And you went to the mall to buy stuff from a sales clerk, rather than going online. All checkout counters were staffed by individuals, too.

You get my point. And I believe it shows in the numbers: a lot of service economy sectors have lost jobs over the past 10 years. Information, Retail, and Transportation are all on that list. And despite a growing population of college graduates, Business Services has only broken even:

http://www.bls.gov/emp/ep_table_201.htm

 

A data scientist isn't merely a programmer. In fact, most data scientists are mediocre programmers but great at numerical computing. What I find really appealing about the field, besides the fact that it revolves around repeatedly 'finding the needle in the haystack', is that it involves combining area expertise, stats chops and programming skills, with none less important than the others [1]. I think buyside quants were the first data scientists and there is negligible difference between the two roles. The dude who coined the term 'data scientist' while at Facebook, Jeff Hammerbacher, was a former fixed income quant at Bear.

[1] http://www.niemanlab.org/images/drew-conway-data-science-venn-diagram.j…

 

I think the whole data scientist thing is over-rated. Not in the sense that analyzing big data is not going to be important (it is going to become mandatory to survive), but in the sense that people assume vendors aren't going to start offering tools/UIs that the layman with basic stats knowledge would be able to leverage. Look at Palantir or Tableau for instance.

Please don't quote Patrick Bateman.
 
DBCooper:

I think the whole data scientist thing is over-rated. Not in the sense that analyzing big data is not going to be important (it is going to become mandatory to survive), but in the sense that people assume vendors aren't going to start offering tools/UIs that the layman with basic stats knowledge would be able to leverage. Look at Palantir or Tableau for instance.

Exactly. I'm having a hard time seeing how this won't be commoditized.

 

There are R and Python packages to run data mining techniques. Some business intelligence software comes with those tools too. The value of a data scientist is knowing what method to use (random forest, regression, etc.) and being able to make adjustments to the algorithms, and then being able to communicate the results effectively and "tell the story." Also, you need to understand the problem and have some understanding of statistics to avoid drawing the wrong conclusion, e.g. confusing correlation with causation.

I was doing something for work where I wanted to know if there was a correlation between two items at some lag. So X happens and days later Y happens. It was easy enough to run this in R, but I didn't know how to interpret the results. Sure, the 3-day lag had the highest correlation, but what does that actually mean? And what are the drawbacks of the method used that would make the result misleading? That's where you need a data scientist who knows these things.
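For anyone curious what that lag check looks like in practice, here is a minimal sketch. It is in Python rather than the R I actually used, the function names (`pearson`, `lagged_correlations`) are made up for illustration, and the data is synthetic: Y is built to echo X three days later, plus noise. It also computes the rough ±1.96/sqrt(n) band that correlations between two unrelated white-noise series would mostly stay within, which is one crude way to ask whether a peak means anything.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_correlations(x, y, max_lag):
    """Correlate x on day t with y on day t + lag, for lag = 0..max_lag."""
    return {
        lag: pearson(x[: len(x) - lag], y[lag:])
        for lag in range(max_lag + 1)
    }


# Synthetic data (an assumption for illustration only):
# y on day t is x from three days earlier, plus independent noise.
rng = random.Random(0)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
y = [(x[t - 3] if t >= 3 else 0.0) + 0.5 * rng.gauss(0, 1) for t in range(n)]

corrs = lagged_correlations(x, y, max_lag=7)
best_lag = max(corrs, key=corrs.get)

# Rough 95% band for the correlation of two unrelated white-noise series.
noise_band = 1.96 / math.sqrt(n)

for lag, r in corrs.items():
    flag = "outside band" if abs(r) > noise_band else "within noise"
    print(f"lag {lag}: r = {r:+.3f} ({flag})")
print("highest correlation at lag", best_lag)
```

The catch, and probably the drawback I was worried about, is that the noise band only holds if the two series have no trend or autocorrelation of their own. Two series that both drift upward will show high correlation at nearly every lag without either driving the other, so the series usually need detrending or differencing before the peak lag can be taken seriously.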

 

The IT bubble occurred a decade ago. Evidently, data engineers/scientists never conquered the world with their alleged superiority. Humility is a virtue.

Enjoy your evening

Victory at all costs, victory in spite of all terror, victory however long and hard the road may be; for without victory, there is no survival. Winston Churchill
 
Beli3f:

The IT bubble occurred a decade ago. Evidently, data engineers/scientists never conquered the world with their alleged superiority. Humility is a virtue.

Enjoy your evening

Exactly. DS is a new hype in a long trend of hypes. These are the statisticians, data miners, etc. of days past, now hired at inflated salaries thanks to the VC-istan bubble needing to provide their networks with nice jobs, or not realizing how easy the job actually is. Significance of lagged correlation, for example, is something taught in high school (or that was taught at mine, anyway), hardly something you need a PhD in stats from Harvard for. The programming side of things is laughable and can be picked up in weeks by someone motivated and unemployed, if you want to use industry tools (do Andrew Ng's Coursera course, and then teach yourself pandas or R and a bit of Unix).

This being said I use the hype extensively to get visas approved for otherwise borderline programmers, since Singapore has as usual jumped onto the bandwagon and companies here are pressuring the MOM to draw more DS into the country or train them locally. It's $250k in Silicon Valley for well connected Stanford kids, over here I pick up 30 year olds with killer code for $60k.

If you want to run basic stats tests on your data to figure out what's going on, learn stats, it won't take you very long and it's free.

If you want to build systems like recommendation engines (the other part, allegedly, of DS) then hire a programmer comfortable in something like Clojure or Haskell, not just because these are great languages to build reliable fast systems in quickly, but because the kind of people who willingly choose these less employable languages over Java are perfect for this kind of job.

Where the DS hype surprised me is in reinforcing the fact that most management teams barely look at data in even the simplest ways, since apparently it is now news that you should test for significance before making a decision based on an apparent trend.

 
Best Response

Guys this is basically my job, but it's hardly sexy. It's about as sexy as playing dungeons and dragons. The fact that Harvard is calling this job sexy makes all of this even more funny to me. I'm laughing at this about as hard as Eddie would be laughing if HBS called him an elite blue-blooded east coast douchebag.

Next they are going to call insurance actuaries exciting people who love taking risk.

But seriously, a data scientist is an OK gig. An insurance actuary has an even better gig. He's just afraid of spiders. And dirt. And driving a car without a seatbelt. And data scientists are about as sexy as the weird table full of D&D, Magic the Gathering, and Warhammer 40,000 nerds from high school who know exactly how a nuclear reactor operates and find that more fun to talk about than girls.

Back to my rusty Honda. Actually my Level 9 Cleric needs to cast a flamestrike spell on the air elemental first. Crap, those things have good reflex saves.

 

There's an overall trend of convergence toward completely data-driven decision making at a strategic level. Data scientists and statisticians are becoming more and more prevalent in strategy groups, and firms like Correlation Ventures and Google Ventures use statistical models to find good venture capital investments. This appears to be a trend that's just starting as well. For those who don't know - VC is probably one of the most qualitative investment strategies in existence.

 

A few months ago I'd have agreed with a lot of the dismissive comments, and to a certain extent I still do. But recently I decided to take part in a data science competition for the fun of it. The competition was basically designing a predictive analytics algorithm. I had two undergrad courses' worth of stats and a year of R on the job.

I actually did quite well, finishing in the top 10% through massive feature engineering and a final ensemble blend of 6 models, iirc. I was mainly self-taught, after taking Andrew Ng's Machine Learning course on Coursera. And hell, screw this talk of commoditized data science tools, the open-source tools are way better and still easy to use. Finding the R packages and figuring out how to integrate them was quite easy actually.

So far this pretty much confirms what a lot of the dismissive posters are saying... BUT the winners and top 10ish of that competition were on a whole other level. They were typically PhDs or had years of experience or were already top Data Scientists themselves. They had all the skills that people say make up an ideal Data Scientist. The interesting thing however was that they won by squeezing out a measly 1-2% in performance compared to someone like me. That doesn't sound impressive, but after going through that competition I was seriously impressed by their skills: it takes a thorough understanding of the underlying models, which many people now use brainlessly, to accomplish their feat.

Of course, for many (most probably) companies and organizations, that extra squeeze in performance does not really matter and it's better if they just hire the typical statistician, data miner, etc. of days past. In certain critical industries and tasks, that extra oomph might matter more and that's where a data scientist would be needed.

73 good sir!
 
AgentBishop:

The interesting thing however was that they won by squeezing out a measly 1-2% in performance compared to someone like me. That doesn't sound impressive, but after going through that competition I was seriously impressed by their skills: it takes a thorough understanding of the underlying models, which many people now use brainlessly, to accomplish their feat.

That's another point. You took a 20h online class, have some CS skills, and you get within 1-2% - in a Kaggle type contest, which is gameable unlike most real life problems - of the absolute top guys in the field.

When I look at this, and I can hire someone like you (actually, with much more ML experience) for $60k, and the top guy gets me 1% extra for $250k, on top of which I have to worry that he's top ranked on Kaggle so might leave at any point, costing me more hiring fees, a new learning period, etc. it's not a very tough choice.

On top of this, let's say you built an engine that boosts average basket by 15%, and him 17%. I probably wouldn't detect the difference in an A/B test. For example, one of my engines is doing +20% in one country and +15% in another. It's then very hard to go to management and say "need to spend almost 5 times as much on this guy for that extra 1%". I don't know whether it would be different in big corp (where 1% is a lot of money) but I think there are other issues there like 10 layers of management ossifying any decision making and code changes and improvements, together with being forced to use inferior, outdated tools because that's all you had when you started (cf C++ at Google).
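To put rough numbers on the "wouldn't detect the difference" point, here is a back-of-the-envelope sketch using the textbook two-sample sample-size formula for comparing means, n per arm = 2 * (z_alpha + z_power)^2 * sd^2 / delta^2. All the inputs are hypothetical assumptions for illustration, not figures from my engines: a baseline average basket of 100, a standard deviation of 80, a 5% two-sided significance level and 80% power. The function name `samples_per_arm` is likewise made up.

```python
import math
from statistics import NormalDist


def samples_per_arm(mean_a, mean_b, sd, alpha=0.05, power=0.8):
    """Orders needed in each arm of an A/B test to tell two mean basket
    sizes apart, using the normal-approximation two-sample formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    delta = abs(mean_a - mean_b)
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Hypothetical inputs: baseline basket of 100 with a standard deviation of 80.
baseline, sd = 100.0, 80.0
engine_15 = baseline * 1.15  # the +15% engine
engine_17 = baseline * 1.17  # the +17% engine

n_vs_baseline = samples_per_arm(baseline, engine_15, sd)
n_engines = samples_per_arm(engine_15, engine_17, sd)

print("orders per arm, +15% engine vs no engine:", n_vs_baseline)
print("orders per arm, +15% engine vs +17% engine:", n_engines)
```

Under those assumptions, separating the two engines from each other takes on the order of 50 times as many orders per arm as showing that either engine beats having none, because the required sample grows with the inverse square of the gap. A noisier basket distribution makes it worse.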

I agree that open source tools are superior to a lot of the commoditized stuff. We tried a few commercial solutions for rec engines, none of which had any impact whatsoever on sales. This being said, pulling a library effortlessly and rolling it out without having to think much about its content is my definition of commoditized content, it is like Java programmers with Eclipse, they don't have to think much and just push bits around until it works.

 

Agree that it's too hyped up these days (sort of like quant finance pre-2008). What's contributing to the hype, in my opinion, is the low barrier to entry. Anyone can take Andrew Ng's ML course online, participate in a Kaggle competition, and call himself a "data scientist" (and in fact, many do). But there is a huge difference between this "data scientist" and a truly expert data scientist who can (for example) implement a parallelized deep neural network over GPU clusters.

The former will be making 60K-100K at a large corporation (banks/insurance etc.) being "part of the wheel". The latter will be making 200K at Google/Facebook doing cutting edge research.

As previous posters have noted, the difference in performance between a world-class data scientist and an average one (such as myself) is not that much. And in traditional corporate settings, your senior executives will not care that you improved the lift of the previous model by 5% using the state-of-the-art algorithm. But at Google, they will most definitely care because they know that's where the $$ is.

As a side note, I am a little bit annoyed by the media's portrayal of data scientists as math wizards. For most aspects of data science, the mathematics is actually quite shallow. It's the programming aspect that's the differentiator.

And I find it hilarious that the anatomy of a data scientist pic is that of a woman.

 
is-t:

Agree that it's too hyped up these days (sort of like quant finance pre-2008).
What's contributing to the hype, in my opinion, is the low barrier to entry. Anyone can take Andrew Ng's ML course online, participate in a Kaggle competition, and call himself a "data scientist" (and in fact, many do). But there is a huge difference between this "data scientist" and a truly expert data scientist who can (for example) implement a parallelized deep neural network over GPU clusters.

The former will be making 60K-100K at a large corporation (banks/insurance etc.) being "part of the wheel". The latter will be making 200K at Google/Facebook doing cutting edge research.

Ah, but these are computer scientists working for Google and Facebook's research department, who happen to work on problems currently branded as DS. In the 80s, parallel processing was called parallel processing, although machine learning was known as AI. These guys would work in Lisp instead of Python or C++ or Haskell (where parallelism is trivial).

And I find it hilarious that the anatomy of a data scientist pic is that of a woman.

Hilary... :P
 

