'Big data'

I've been invited to interview for an analyst position at a financial services 'startup' (specifically, a peer-to-peer lending company). The job posting said that experience in SQL/Tableau would be helpful, but they are aware that I have limited experience in these areas.

So with that introduction out of the way, I have been informed I will be required to undertake a 'big data manipulation' exercise in Excel as part of the interview. Google hasn't been helpful in terms of precedents, so now I come to the wise heads of WSO seeking guidance.

Has anyone encountered something similar? Have you come across any guides/introductions to 'big data' manipulation in Excel?

 

Pretty sure after a few million data points Excel just freezes, so there won't be any "big data" manipulation at all. If you're working with a real statistical language, you might be expected to clean the data, filter on certain conditions, etc.
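For what it's worth, the "clean the data, filter on certain conditions" part might look roughly like this outside Excel. This is only a sketch in Python using the standard library; the column names (loan_id, amount, status) and the sample rows are made up for illustration and don't come from any real interview exercise.

```python
import csv
import io

# Hypothetical sample data; columns and values are invented for illustration.
RAW = """loan_id,amount,status
1, 5000 ,Current
2,,Late
3,12000,current
4,abc,Default
5,800,LATE
"""


def clean_rows(text):
    """Parse CSV text, drop rows whose amount is blank or non-numeric,
    and normalise the status column to lower case."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            value = float(row["amount"].strip())
        except ValueError:
            continue  # skip blank and non-numeric amounts
        cleaned.append({
            "loan_id": int(row["loan_id"]),
            "amount": value,
            "status": row["status"].strip().lower(),
        })
    return cleaned


def filter_rows(rows, status, min_amount=0.0):
    """Keep rows with the given status and at least min_amount."""
    return [r for r in rows if r["status"] == status and r["amount"] >= min_amount]


rows = clean_rows(RAW)
current_big = filter_rows(rows, "current", min_amount=1000)
```

The same two steps (drop bad rows, then filter on a condition) map fairly directly onto Excel's filters and helper columns, which is probably closer to what an Excel-based exercise would test.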

 

Thanks for taking the time to respond hughwattmate. Indeed, I assume the actual exercise won't deal in what is commonly described as 'big data'. However, I still think there are certain data manipulation 'tricks' I should consider reviewing in Excel.

 
ladubs111:
Of course it will continue. What do you think the talking heads are paid to do? Be objective? Fuck no. They are there to draw viewers to drive ratings, which then drive the affiliate fees and ad revenues for the company. Extreme political views draw viewers, like Fox News.

I don't think it'll ever fully go away. My point is more that these guys really looked bad, with egg on their faces, because the data was right in front of them and they simply chose to ignore it. At some point, their credibility goes to zero with the average voter (I'm discounting the tribal voters here).

At least from the perspective of campaigns themselves, I imagine candidates on both sides of the aisle would benefit from analyzing the data objectively and adjusting their operations accordingly.

Anyway. In my view, the best analysis uses hard data and draws conclusions. The soft stuff can play a part, and it's really great when a narrative is built around data, as opposed to trying to cherry pick data to build a narrative (which is what many talking heads have made a living doing.)

 
ladubs111:
Of course it will continue. What do you think the talking heads are paid to do? Be objective? Fuck no. They are there to draw viewers to drive ratings, which then drive the affiliate fees and ad revenues for the company. Extreme political views draw viewers, like Fox News.

Exactly. No one wants to watch a "well... If this then this, and if this then this, but most likely that..."

 

You will always have more Republicans respond to these polls than Democrats (for a million different reasons), and that, my friend, is why Romney was leading in most of the polls. Plus, it's viewer ratings over political correctness, for the TV stations at least.

 

That's all great, but there was a rational argument as to why the "talking heads" thought that way. Voter turnout was truly a guessing game and many thought that oversampling of democrats occurred, which it did.

 
johnwayne7:
That's all great, but there was a rational argument as to why the "talking heads" thought that way. Voter turnout was truly a guessing game and many thought that oversampling of democrats occurred, which it did.

Based on what, exactly? Cherry picking data to fit a worldview is not rational, it's wishful thinking.

 

What about no. 3?

3. People who run campaigns and know how the system actually works*, as opposed to no. 1 who are primarily propagandists or self promoters.

Nearly everyone they had on Stanford University's excellent series of talks (Election 2012, on iTunes U) predicted Obama winning, or rather that the Republicans would lose, well ahead of the debates. People with real expertise and historical appreciation, like Mark McKinnon, the guy who put Bush in office twice, and others of varying political stripes, were among those** who called it well in advance.

I think a combination of 2 & 3 would be most effective. Big data analytics with an understanding of how the system really works. Unfortunately I can more easily, perhaps cynically, see this used for more manipulative campaigning rather than to inform voters, i.e. more sophisticated types of talking heads.

P.S. I'm a Moneyball fan too. P.P.S. Parallels with finance / investing? Buffett, Renaissance Technologies or maybe even LTCM?

Notes
* I can't think of a snazzy name that ends with heads.
** They did have a polling expert / researcher, can't remember his name, give his data driven views as well. He was even more emphatic about an Obama win, or rather a republican loss.

 

The idea that Democratic turnout would be somewhere between 2004 and 2008 levels while Republican turnout would be greater than 2008 levels was very reasonable. Gallup and Rass both posted more pro-GOP identification skew and virtually all polls prior to the election had GOP enthusiasm as greater. There really was no reason to expect a virtually identical +D split as in 2008, during which the GOP hated their candidate and were demoralized by 8 years of Bush disappointment while the Democrats were very very energized by an incredibly optimistic Obama campaign.

There are a ton of assumptions built into polls as they try to correct for various things. It wasn't all that unreasonable to balk at the heavy Dem split assumptions in some of the polls. Intrade gave Romney a roughly one-in-three chance prior to the election.

They weren't using gut instincts, they were using internal polls with different assumptions that were plausible but turned out to be wrong.

 

Stats are fine, but when you are dealing with the human element it is best to not put all your weight on the numbers. Democrats relied on low propensity voters. If it rained, snowed or a bunch of other things you could have a lot of people stay home.

http://en.wikipedia.org/wiki/United_States_presidential_election,_2008

http://en.wikipedia.org/wiki/United_States_presidential_election,_2012

Obama got 8MM fewer votes than in 2008. Turnout was lower. It was a closer race. The demographics switched, with Obama losing the white vote and maintaining the minority vote.

I am sorry, but while the stats are nice and more objective than Fox and MSNBC, stats cannot predict that Romney gets over 1MM fewer total votes than McCain, especially considering the way the economy is and Obama fatigue.

Republicans just didn't do a good job getting people to the polls, and the Obama racial component helped him rule the day. Either way, turnout sucks, which really is surprising considering the situation we are in. Americans are just lazy fucks.

 

I can't speak for the OP, but I'm interested in knowing more about the big data consulting industry in general. I know a lot about being a data scientist, but much less about being a data science consultant. A search of the forums turned up a lot of unanswered questions on this topic.

My main question is really how does it compare to regular management consulting - in terms of hours, workload, travel, salary, exit opportunities, etc. I'm also curious if the job itself is less technical than regular data science positions due to its consulting nature.

I took OP's question to be about the recruiting timeline - upon rereading it, that's not what he was asking, but I am curious as to what the recruiting timeline looks like.

If you are/were a big data consultant, and you feel like sharing some thoughts (in response to my questions or just in general) it will be much appreciated.

 

I can only speak to this second hand. I have a friend who is in Deloitte's Data Analytics group (name might be slightly off) but from what I gather it's basically big data consulting. He went that route right after school where he majored in comp sci and econ.

From what I've heard, he works slightly easier hours than the S&O folks, salary is slightly less at the pre-MBA level, he still travels Monday to Thursday, projects are usually 3-6 months, and it's still team based like mgmt consulting. The biggest difference he described to me was how he feels much less "front office" and client-facing than the S&O teams. He thinks he's building a focused skill set that's very different from what mgmt consulting gets you, and this concerns him about his exit opps. I'm pretty sure it's less technical than regular data science positions, because he said that while he does code daily, there are also support teams that back his team up remotely and handle a lot of the more intense coding. This is also a concern of his, because he knows he's not getting as technically skilled as those folks are and is also missing out on the client skills that S&O folks get.

All that said, he's very happy with the work; his main complaint is not how it fares versus regular data positions or mgmt consulting, but the travel and the impact that has on his life.

Hopefully this was helpful. It'd be great if others out there could chime in too.

 

It will really depend on the projects. There might be far more analytics projects than data science projects, and even fewer big data projects. Many of the big data projects are at such an early stage that a data scientist isn't needed for another 6, 12 or even 18 months. There's no point in having a big data scientist if, say, the Hadoop ecosystem isn't even implemented yet.

I think there are tons of data science and advanced analytics opportunities, but the challenge is that if everything being sold is 'big data' this and 'big data' that, you need the big data platform in place before you can even think about data science. Now, if you're talking strategy, figuring out which business use cases are involved, or architecting some NextGen system that includes real-time and data science use cases, there are needs for that, but again, it all depends on what projects are won and what has been agreed upon.

There are many partnerships in the space, even among competitors. So if your company is doing the early strategy work, but somebody else is doing the implementation and somebody else the offshoring/maintenance of said system, what you wind up doing will depend on where your company lies in that realm.

If you're strategy only, you might get to do some cool things, but if you're more into actually doing the hands-on work, you might be disappointed. On a site like this most people seem to prefer strategy, but if you're more a techie or stat guy than an MBA person, you probably prefer actually implementing something. So if it's implementation, you might get to build models or create infographics and use some algorithms. Then again, it depends, as some firms' "implementation" will mean business implementation and not tech implementation. Far different. If it's maintenance, you'll probably just be doing things like system admin, Hadoop admin, tuning, etc.

And it will depend where you work, because there are certain firms that just want bodies more than anything else, so you wind up on a big data project that lasts 3-12 months and all you do all day long is write Hive SQL queries to replace their existing database technologies. The "architecture" is pretty much just trying to replace what they already have, because nobody wants to 'rock the boat' but they want Hadoop or big data. Nobody would really call that big data or data science, but more than half the projects wind up being that exact scenario.

Or you wind up on a data science project that is 'replace SAS with R' and so on. It sounds exciting, but it really is just as it seems and not much more. That is the challenge in the space as everybody talks about all the things they do, but many projects are simply just projects that replace what they already had with open source tools or cheaper tools. Or at least try to do that. And half the time it's not even big data.

I will say this though. If you wind up on projects in and around Silicon Valley or at certain banks around the world, they are so far ahead of everybody else, and it's not just talk at some conference or on Bloomberg, so you probably will work on big data and data science.

Then again the other challenge is some clients want people to stick around for years and not just do something cool for 3 months and walk away.
I think that's where a lot of these big data projects struggle. Strategy is great until nobody can implement that strategy. Even if you wind up creating some great NextGen architecture and build out all these use cases showing how they will gain a huge ROI from a new system that might, in a way, predict the future, it all ends if nobody can actually implement it or nobody is around to keep it running.

I ran into various projects where the client really had blank checks to write for consultants to stick around for 2 years, but the company just didn't have the resources. And it wasn't exactly local so it was one of those "who was going to relocate for 2+ years" and so the project died before it ever had a chance.

One last thing, in consulting, and especially in the big data and data science space, you can become pigeonholed very fast if you're any good and work hard. It might not matter, but it might screw you over long term. It's one of those "you're too valuable to do anything else" concepts. If you prove to be good at say using Tableau and R and Hive and so on and aren't a person who rocks the boat, you might wind up always being utilized, but you also might wind up doing the same kind of projects over and over and over again.
Some people don't mind, others wind up being the Tech guy at a consulting firm who winds up stuck because they are valuable, but they aren't exactly promoted because of it. Plus always being the rock star techie at some firms, and at something that many others don't know yet, you wind up rarely being able to network because you're the only guy really doing all the work on these kinds of projects.

So if you just see it as 'get the experience and leave in a few years', go for it. But if you want to learn the business side of things, learn strategy, network, be a good consultant, etc., well, big data can be a dangerous area in consulting if you know your stuff and work hard. Great right now, but in a few years when it's no longer the "in thing", what do you do as the person who isn't that cool anymore, but isn't exactly a 'consultant' per se?

 

The other important aspect of cheap natural gas in the US is that it means cheap ethane/ethylene and their derivatives. Just think of the possible effects on the US economy and manufacturing if it starts to become cheaper to produce certain types of plastic in the US than it is in Asia. This is in addition to the energy benefits mentioned above.

 

You use both.

You absolutely need to learn SQL for data extraction. For the data analysis, you can use SAS, R, SPSS, or even Excel.

You can also run SQL code from within SAS or R and then handle the resulting objects within those programs.
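To make the "SQL for extraction, something else for analysis" split concrete, here is a minimal sketch. It uses Python with its built-in sqlite3 module rather than SAS or R, purely because it is self-contained; the loans table, its columns and the numbers are all invented for illustration.

```python
import sqlite3
import statistics

# In-memory database with a made-up table; names and numbers are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (id INTEGER, grade TEXT, rate REAL)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [(1, "A", 6.0), (2, "A", 8.0), (3, "B", 11.0), (4, "B", 13.0), (5, "C", 18.0)],
)


def rates_for_grade(connection, grade):
    """Extraction step: pull only the rows needed with a parameterised SQL query."""
    cursor = connection.execute(
        "SELECT rate FROM loans WHERE grade = ? ORDER BY id", (grade,)
    )
    return [row[0] for row in cursor]


# Analysis step: done in the host language on the extracted values.
a_rates = rates_for_grade(conn, "A")
mean_a = statistics.mean(a_rates)
```

The point of the split is that the database does the filtering, so only the rows you actually need end up in the analysis tool.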

 

Just download R and attempt the modules.

I would steer away from Excel for anything sophisticated.

My favorite is SAS, though R is quickly unseating it. Also try talkstats.com.

Making money is art and working is art and good business is the best art - Andy Warhol
 
dwight schrute:
Just download R and attempt the modules.

I would steer away from Excel for anything sophisticated.

My favorite is SAS, though R is quickly unseating it. Also try talkstats.com.

Exactly! What I love about R is that it is completely free too! You should definitely check it out http://www.r-project.org/

My formula for success is rise early, work late and strike oil - JP Getty
 

I've noticed a lot of people are saying that about R. However, I am slightly skeptical because I know a private sector statistician who told me that R doesn't handle big data sets well (among other things). In any event, it's probably good to learn SAS and R, if you have the time.

 
econ:
I've noticed a lot of people are saying that about R. However, I am slightly skeptical because I know a private sector statistician who told me that R doesn't handle big data sets well (among other things). In any event, it's probably good to learn SAS and R, if you have the time.

Yeah, I've noticed some performance issues starting at ~500 MB. There are some packages that allow you to store the data on your hard drive, although it's counterproductive once you have to load the data into memory.
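One way around the memory problem, when all you need is an aggregate, is to read the file one row at a time and never hold the whole thing in memory. Below is a rough sketch of that idea in Python (not R, and not one of the packages mentioned above); the balance column and the three sample rows are hypothetical stand-ins for a large file on disk.

```python
import csv
import io


def streaming_mean(lines, column):
    """Compute the mean of one numeric column while reading one row at a time."""
    total = 0.0
    count = 0
    for row in csv.DictReader(lines):
        total += float(row[column])
        count += 1
    return total / count if count else float("nan")


# Stand-in for a large file; with a real file you would pass open(path, newline="").
sample = io.StringIO("id,balance\n1,100\n2,250\n3,400\n")
mean_balance = streaming_mean(sample, "balance")
```

This only helps for calculations that can be done in a single pass (sums, counts, means); anything that needs the full data set at once still runs into the memory limit.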

Quarterlife:
Speaking of R, I recommend this for self-teaching.

I've heard approving reviews of it before. Is it primarily aimed at beginners, or will intermediate users still find it useful?

Making money is art and working is art and good business is the best art - Andy Warhol
 
