Data Scientist: "Sexiest" Job of the 21st Century?
According to an article published in the Harvard Business Review last year, Data Scientist is the sexiest job of the 21st century. That's right, the guy who writes code to find new insights in data (putting it simply).
Of course, this is only if you define a "sexy job" as one that is in high demand and where you can demand a sizable paycheck. Turns out, pay for data scientists is upwards of $225,000 even for people straight out of graduate school.
Did programming just become more interesting? See this infographic on the so-called Data Scientists.
I work with a guy who does this. I think he's aiming for a trader or PM type role though. These are the same people that try to convince you that being an engineer is the best deal ever.
Hate to say it, but Harvard calling a data scientist sexy is like Gary Gygax (creator of Dungeons and Dragons) calling World of Warcraft a game for well-adjusted jocks.
Also the data scientists and financial/industrial engineers are going to be putting HBS and the other B-schools out of business in thirty years. I don't know why they're promoting this. We have been watching the death of the service economy over the past 40 years, and while MBAs will be the last, and most valuable, employees to succumb, they'll succumb too.
30 years ago, you didn't pump your own gas. People pumped it for you.
20 years ago, when you wanted to order from a catalog, or dialed 411, you got a live human being on the other end of the phone.
10 years ago, when you executed a trade, there was a live human being at the other end. When you wanted to get a mortgage, a human being reviewed the facts and circumstances and made a decision. An accountant probably actually filed your parents' taxes by hand. And you went to the mall to buy stuff from a sales clerk, rather than going online. All checkout counters were staffed by individuals, too.
You get my point. And I believe it shows in the numbers: a lot of service economy sectors have lost jobs over the past 10 years. Information, Retail, and Transportation are all on that list. And despite a growing population of college graduates, Business Services has only broken even:
http://www.bls.gov/emp/ep_table_201.htm
A data scientist isn't merely a programmer. In fact most data scientists are mediocre programmers but great at numerical computing. What I find really appealing about the field, besides the fact that it revolves around repeatedly 'finding the needle in the haystack', is that it involves combining area expertise, stats chops and programming skills, with one no less important than the other [1]. I think buyside quants were the first data scientists and there is negligible difference between the two roles. The dude who coined the term 'data scientist' while at Facebook, Jeff Hammerbacher, was a former fixed income quant at Bear.
[1] http://www.niemanlab.org/images/drew-conway-data-science-venn-diagram.j…
I think the whole data scientist thing is over-rated. Not in the sense that analyzing big data is not going to be important - it is going to become mandatory to survive...but the fact that people assume that vendors aren't going to start offering tools/UIs that the layman with basic stats knowledge would be able to leverage. Look at Palantir or Tableau for instance.
Exactly. I'm having a hard time seeing how this won't be commoditized.
It is already commoditized.
There are R and Python packages to run data mining techniques. Some business intelligence software comes with those tools too. The value of a data scientist is knowing what method to use (random forest, regression, etc.) and being able to make adjustments to the algorithms. And then being able to communicate that effectively and "tell the story." Also, you need to understand the problem and have some understanding of statistics to avoid the wrong conclusion, e.g. confusing correlation with causation.
I was doing something for work where I wanted to know if there was a correlation between 2 items and lag. So X happens and days later Y happens. It was easy enough to run this in R, but I didn't know how to interpret the results. Sure, 3 day lag had the highest correlation but what does that actually mean? And what are the drawbacks of the method used that would make the result misleading? That's where you need a data scientist that knows these things.
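For what it's worth, the lag check described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is made up (Y is built to echo X three days later, plus noise), and `pearson` and `lagged_correlations` are hypothetical helper names, not functions from any package.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlations(x, y, max_lag):
    """Correlation between x[t] and y[t + lag] for lag = 0..max_lag."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Pair x at day t with y at day t + lag; the unmatched ends are dropped.
    return {
        lag: pearson(x[: len(x) - lag], y[lag:])
        for lag in range(max_lag + 1)
    }


# Hypothetical data: y repeats x three days later, plus noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [(x[t - 3] if t >= 3 else 0.0) + 0.5 * rng.gauss(0, 1) for t in range(200)]

corrs = lagged_correlations(x, y, max_lag=7)
best_lag = max(corrs, key=lambda lag: abs(corrs[lag]))
```

The interpretation caveats are the hard part. Picking the best of several lags is a multiple-comparison exercise, so the winning correlation is biased upward. And if both series trend or are autocorrelated, raw correlations can look strong at many lags without X driving Y at all. Differencing or detrending first is a common precaution.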
The IT bubble occurred a decade ago. Evidently, data engineers/scientists never conquered the world with their alleged superiority. Humility is a virtue.
Enjoy your evening
This being said I use the hype extensively to get visas approved for otherwise borderline programmers, since Singapore has as usual jumped onto the bandwagon and companies here are pressuring the MOM to draw more DS into the country or train them locally. It's $250k in Silicon Valley for well connected Stanford kids, over here I pick up 30 year olds with killer code for $60k.
If you want to run basic stats tests on your data to figure out what's going on, learn stats, it won't take you very long and it's free.
If you want to build systems like recommendation engines (the other part, allegedly, of DS) then hire a programmer comfortable in something like Clojure or Haskell, not just because these are great languages to build reliable fast systems in quickly, but because the kind of people who willingly choose these less employable languages over Java are perfect for this kind of job.
Where the DS hype surprised me is in reinforcing the fact that most management teams barely look at data in even the simplest ways, since apparently it is now news that you should test for significance before making a decision based on an apparent trend.
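To make "test for significance before acting on a trend" concrete, here is a minimal sketch of a permutation test for a difference in means, using only the Python standard library. The weekly figures are invented for illustration, and `permutation_p_value` is a hypothetical helper name. It is one simple way to do this, not the only one.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: if the change did nothing, any split is as likely.
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical weekly sales before and after some change.
before = [102, 98, 105, 99, 101, 97, 103, 100]
after = [104, 101, 107, 100, 106, 99, 105, 103]
p_value = permutation_p_value(before, after)
```

The "after" weeks average 2.5 units higher, which looks like a trend on a chart. The p-value estimates how often a gap at least that large would show up from random relabeling alone, which is the question to ask before anyone acts on it.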
Guys this is basically my job, but it's hardly sexy. It's about as sexy as playing dungeons and dragons. The fact that Harvard is calling this job sexy makes all of this even more funny to me. I'm laughing at this about as hard as Eddie would be laughing if HBS called him an elite blue-blooded east coast douchebag.
Next they are going to call insurance actuaries exciting people who love taking risk.
But seriously, a data scientist is an OK gig. An insurance actuary has an even better gig. He's just afraid of spiders. And dirt. And driving a car without a seatbelt. And data scientists are about as sexy as the weird table full of D&D, Magic the Gathering, and Warhammer 40,000 nerds from high school who know exactly how a nuclear reactor operates and find that more fun to talk about than girls.
Back to my rusty Honda. Actually my Level 9 Cleric needs to cast a flamestrike spell on the air elemental first. Crap, those things have good reflex saves.
Yeah, I'm under the impression that the % of full-time employment represented by service sectors (which would include both programming and financial services) has only increased. ;)
There is a difference between "service economy" and a "knowledge based economy". North America is moving more and more towards a "knowledge based economy".
There's an overall trend of convergence toward completely data-driven decision making at a strategic level. Data scientists and statisticians are becoming more and more prevalent in strategy groups, and firms like Correlation Ventures and Google Ventures use statistical models to find good venture capital investments. This appears to be a trend that's just starting as well. For those who don't know - VC is probably one of the most qualitative investment strategies in existence.
I'd have agreed with a lot of the dismissive comments a few months ago and to a certain extent still do. But recently I decided to take part in a data science competition for the fun of it. The competition was basically designing a predictive analytics algorithm. I had two undergrad courses' worth of stats and a year of R on the job.
I actually did quite well, reaching a decent finish in the top 10% through massive feature engineering and a final ensemble blend of 6 models iirc. I was mainly self-taught, after taking Andrew Ng's Machine Learning course on Coursera. And hell, screw this talk of commoditized data science tools, the open-source tools are way better and still easy to use. Finding the R packages and figuring out how to integrate them was quite easy actually.
So far this pretty much confirms what a lot of the dismissive posters are saying... BUT the winners and top 10ish of that competition were on a whole other level. They were typically PhDs or had years of experience or were already top Data Scientists themselves. They had all the skills that people say make up an ideal Data Scientist. The interesting thing however was that they won by squeezing out a measly 1-2% in performance compared to someone like me. That doesn't sound impressive, but after going through that competition I was seriously impressed by their skills. It takes a thorough understanding of the underlying models, which many now use brainlessly, to accomplish their feat.
Of course, for many (most probably) companies and organizations, that extra squeeze in performance does not really matter and it's better if they just hire the typical statistician, data miner, etc. of days past. In certain critical industries and tasks, that extra oomph might matter more and that's where a data scientist would be needed.
When I look at this, and I can hire someone like you (actually, with much more ML experience) for $60k, and the top guy gets me 1% extra for $250k, on top of which I have to worry that he's top ranked on Kaggle so might leave at any point, costing me more hiring fees, a new learning period, etc. it's not a very tough choice.
On top of this, let's say you built an engine that boosts average basket by 15%, and he built one that boosts it by 17%. I probably wouldn't detect the difference in an A/B test. For example, one of my engines is doing +20% in one country and +15% in another. It's then very hard to go to management and say "need to spend almost 5 times as much on this guy for that extra 1%". I don't know whether it would be different in a big corp (where 1% is a lot of money) but I think there are other issues there, like 10 layers of management ossifying any decision making, code changes and improvements, together with being forced to use inferior, outdated tools because that's all you had when you started (cf. C++ at Google).
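A back-of-the-envelope sketch of why the 15% vs 17% gap is so hard to see in an A/B test. It uses the textbook normal-approximation sample size for comparing two means with equal variance. The baseline basket of 100 and the standard deviation of 80 are pure assumptions picked for illustration, and `samples_per_arm` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def samples_per_arm(sigma, delta, alpha=0.05, power=0.8):
    """Approximate customers needed per arm to detect a mean difference `delta`.

    Two-sided test at level `alpha`, equal variance `sigma**2` in both arms.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


baseline = 100.0  # assumed average basket, arbitrary currency units
sigma = 80.0      # assumed standard deviation of basket size

# Engine A (+15%) against engine B (+17%): a 2-point gap in mean basket.
n_15_vs_17 = samples_per_arm(sigma, delta=baseline * (1.17 - 1.15))

# Either engine against no engine at all: a 15-point gap.
n_0_vs_15 = samples_per_arm(sigma, delta=baseline * 0.15)
```

Under these assumed numbers, showing that the +15% engine beats nothing takes a few hundred customers per arm, while separating +15% from +17% takes roughly 25,000 per arm. The required sample grows with the inverse square of the gap, so a gap 7.5 times smaller needs about 56 times the traffic, whatever the variance is.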
I agree that open source tools are superior to a lot of the commoditized stuff. We tried a few commercial solutions for rec engines, none of which had any impact whatsoever on sales. This being said, pulling a library effortlessly and rolling it out without having to think much about its content is my definition of commoditized content, it is like Java programmers with Eclipse, they don't have to think much and just push bits around until it works.
Agree that it's too hyped up these days (sort of like quant finance pre-2008). What's contributing to the hype, in my opinion, is the low barrier to entry. Anyone can take Andrew Ng's ML course online, participate in a Kaggle competition, and call himself a "data scientist" (and in fact, many do). But there is a huge difference between this "data scientist" and a truly expert data scientist who can (for example) implement a parallelized deep neural network over GPU clusters.
The former will be making 60K-100K at a large corporation (banks/insurance etc.) being "part of the wheel". The latter will be making 200K at Google/Facebook doing cutting edge research.
As previous posters have noted, the difference in performance between a world-class data scientist and an average one (such as myself) is not that much. And in traditional corporate settings, your senior executives will not care that you improved the lift of the previous model by 5% using the state-of-the-art algorithm. But at Google, they will most definitely care because they know that's where the $$ is.
As a side note, I am a little bit annoyed by the media's portrayal of data scientists as math wizards. For most aspects of data science, the mathematics is actually quite shallow. It's the programming aspect that's the differentiator.
And I find it hilarious that the "anatomy of a data scientist" pic is that of a woman.
selling more nappies for walmart (or getting more clicks on fbook)... not really interesting at all.