'Big data'
So I've been invited to interview with a financial services 'startup', specifically, it's a peer-to-peer lending company, for an analyst position. The job posting specified that experience in SQL/Tableau would be helpful but they are aware that I have limited experience in these areas.
So with that introduction out of the way, I have been informed I will be required to undertake a 'big data manipulation' exercise in Excel as part of the interview. Google hasn't been helpful in terms of precedents, so now I come to the wise heads of WSO seeking guidance.
Has anyone encountered something similar? Have you come across any guides/introductions to 'big data' manipulation in Excel?
Pretty sure after a few million data points Excel just freezes, so there won't be any "big data" manipulation at all. If you're working with a real statistical language you might be expected to clean the data, filter on certain conditions, etc.
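For anyone wondering what "clean the data, filter on certain conditions" might look like in practice, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the loan records, the column names and the 10% rate cutoff are invented and have nothing to do with the actual interview exercise.

```python
import csv
import io

# Hypothetical loan records; column names and values are invented for illustration.
RAW = """loan_id,amount,rate,status
1, 5000 ,0.12,Current
2,12000,0.09,late
3,,0.15,Current
4,8000,0.21,DEFAULT
"""

def clean(rows):
    """Trim whitespace, normalise status casing, and drop rows with no amount."""
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip records that are missing the loan amount
        yield {
            "loan_id": int(row["loan_id"]),
            "amount": float(amount),
            "rate": float(row["rate"]),
            "status": row["status"].strip().lower(),
        }

loans = list(clean(csv.DictReader(io.StringIO(RAW))))

# Filter on a condition: loans that are current and priced above 10%.
high_rate_current = [l for l in loans if l["status"] == "current" and l["rate"] > 0.10]
total = sum(l["amount"] for l in high_rate_current)
```

The same two steps (tidy the columns, then filter and aggregate) map roughly onto text functions, filters and pivot tables in Excel.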
Thanks for taking the time to respond hughwattmate. Indeed, I assume the actual exercise won't deal in what is commonly described as 'big data'. However, I still think there are certain data manipulation 'tricks' I should consider reviewing in Excel.
How Big Data Triumphs Over Gut Instincts (Originally Posted: 11/09/2012)
However you may feel about the outcome of Tuesday's Presidential election, one thing is clear:
Big Data triumphed over Gut Instincts.
This is not a post about policy disagreements or who had the best vision for America's future. This is a post about the ongoing rise of Big Data and the People who embrace it.
It's also a post about the difficulty of objective analysis when the information you're analyzing is contrary to your beliefs and desires. The lead-up to the election brought prognosticators out of the woodwork. Countless pundits and "experts" weighed in on what they believed would happen. For the most part, they fit into two camps:
Talking Heads could be found on any number of cable news shows. They based their electoral predictions on their gut instincts. On flighty ideas like "momentum," and their opinion on "the mood of the country." One well-respected pundit claimed that the "vibrations were right" for a Romney win, whatever that even means. They might throw a few numbers out there, like unemployment figures or the debt, but they'd rarely look to opinion polls. Unless of course they could cherry pick polls in their favor.
Stat Heads could mainly be found on the internet. They based their electoral predictions not on gut feelings or nebulous concepts, but on unbiased analysis of hard data. They didn't ignore the polls, they studied them and ran them through models. Their analysis delivered probabilities that matched up extraordinarily well with the election's actual results.
What was clear to a clear-minded outsider was that the Talking Heads almost exclusively made predictions that matched their desired outcome. They weren't conducting well-thought-out analyses so much as they were trying desperately to convince the world that what they wanted to believe was actually the truth.
To be clear, this happened on both sides of the aisle. While folks like Dick Morris were the worst of the Talking Heads, this sort of thing happened with the left-wing pundits of the world after polls started coming out post-Debate #1.
Watching all of this reminded me of my post on the baseball MVP race, which pits Triple Crown winner Miguel Cabrera against once-in-a-lifetime rookie Mike Trout. As I see it, the talking heads who were ranting and raving about nebulous gut feelings and "momentum" while ignoring key data are no different than the dinosaurs of baseball who judge a player on his RBI total while scoffing at advanced statistics like WAR and OPS.
What I wonder is, will this sort of nonsense continue in future elections? Now, I obviously don't think Talking Heads are going to just fall off the face of the Earth. The voting public at large is much less informed about election issues than a big-time baseball fan is about his sport. But could we perhaps see a hybrid of sorts going forward?
In Nate Silver's book, The Signal and the Noise, he dives into the post-Moneyball era of baseball and points out something incredibly interesting. While stats and deep data analysis are paramount to a modern club's success, they have yet to kill off scouting. In fact, at teams like the Oakland A's, scouting is more important than ever. Only through a hybrid of soft and hard analysis can teams consistently and reliably forecast players' expected abilities and contributions. And only through that combination can teams consistently build out a winning ball club.
My question for WSO is, will we see this sort of hybrid approach play a bigger role in political coverage going forward? Or are we permanently doomed to overblown horse-race reporting that ignores the data and fails to report on the issues?
Of course it will continue. What do you think the talking heads are paid to do? Be objective? Fuck no. They are there to draw viewers to drive ratings, which then drive the affiliate fees and ad revenues for the company. Extreme political views draw viewers, as Fox News shows.
I don't think it'll ever fully go away. My point is more that these guys really looked bad, like they've got egg on their face, because the data was right in front of them and they simply chose to ignore it. At some point, their credibility goes to zero with the average voter (I'm discounting the tribal voters here).
At least from the perspective of campaigns themselves, I imagine candidates on both sides of the aisle would benefit from analyzing the data objectively and adjusting their operations accordingly.
Anyway. In my view, the best analysis uses hard data and draws conclusions. The soft stuff can play a part, and it's really great when a narrative is built around data, as opposed to trying to cherry pick data to build a narrative (which is what many talking heads have made a living doing.)
Exactly. No one wants to watch a "well... If this then this, and if this then this, but most likely that..."
I want the Fox News chicks to analyze my poll.
You will always have more Republicans than Democrats respond to these polls (a million different reasons why), and that, my friend, is why Romney was leading in most of the polls. Plus, it's viewer ratings over political correctness, for the TV stations at least.
That's all great, but there was a rational argument as to why the "talking heads" thought that way. Voter turnout was truly a guessing game and many thought that oversampling of democrats occurred, which it did.
Based on what, exactly? Cherry picking data to fit a worldview is not rational, it's wishful thinking.
What about no. 3?
3. People who run campaigns and know how the system actually works*, as opposed to no. 1 who are primarily propagandists or self promoters.
Nearly everyone they had on the Stanford University's excellent series of talks (election 2012, on itunesU) predicted Obama winning, or rather that the republicans would lose, well ahead of the debates. People with real expertise and historical appreciation like Mark McKinnon, the guy who put Bush in office twice, and others of varying political stripes were among those** who called it well in advance.
I think a combination of 2 & 3 would be most effective. Big data analytics with an understanding of how the system really works. Unfortunately I can more easily, perhaps cynically, see this used for more manipulative campaigning rather than to inform voters, i.e. more sophisticated types of talking heads.
P.S. I'm a Moneyball fan too. P.P.S. Parallels with finance / investing? Buffett, Renaissance Technologies or maybe even LTCM?
Notes * I can't think of a snazzy name that ends with heads. ** They did have a polling expert / researcher, can't remember his name, give his data driven views as well. He was even more emphatic about an Obama win, or rather a republican loss.
The idea that Democratic turnout would be somewhere between 2004 and 2008 levels while Republican turnout would be greater than 2008 levels was very reasonable. Gallup and Rasmussen both posted a more pro-GOP skew in party identification, and virtually all polls prior to the election had GOP enthusiasm as greater. There really was no reason to expect a virtually identical +D split as in 2008, during which the GOP hated their candidate and were demoralized by 8 years of Bush disappointment while the Democrats were very, very energized by an incredibly optimistic Obama campaign.
There are a ton of assumptions built into polls as they try to correct for various things. It wasn't all that unreasonable to balk at the heavy Dem split assumptions in some of the polls. Intrade gave Romney roughly a one-in-three chance prior to the election.
They weren't using gut instincts, they were using internal polls with different assumptions that were plausible but turned out to be wrong.
Stats are fine, but when you are dealing with the human element it is best to not put all your weight on the numbers. Democrats relied on low propensity voters. If it rained, snowed or a bunch of other things you could have a lot of people stay home.
http://en.wikipedia.org/wiki/United_States_presidential_election,_2008
http://en.wikipedia.org/wiki/United_States_presidential_election,_2012
Obama got 8MM fewer votes than in 2008. Turnout was lower. It was a closer race. The demographics switched, with Obama losing the white vote and maintaining the minority vote.
I am sorry, but while the stats are nice and more objective than Fox and MSNBC, stats cannot predict that Romney gets over 1MM fewer total votes than McCain, especially considering the way the economy is and Obama fatigue.
Republicans just didn't do a good job getting people to the polls and the Obama racial component helped him rule the day. Either way, turnout sucks, which really is surprising considering the situation we are in. Americans are just lazy fucks.
At least we will get Obama care.
Big Data Consulting (Originally Posted: 04/20/2015)
How does recruiting at the associate level work for big data consulting? I'm quite interested in this aspect of consulting as a stats masters student.
Bump! I'm interested in knowing as well.
do you have specific firms you're asking about? I don't think many people are going to help you with such a vague question
I can't speak for the OP, but I'm interested in knowing more about the big data consulting industry in general. I know a lot about being a data scientist, but much less about being a data science consultant. A search of the forums turned up a lot of unanswered questions on this topic.
My main question is really how does it compare to regular management consulting - in terms of hours, workload, travel, salary, exit opportunities, etc. I'm also curious if the job itself is less technical than regular data science positions due to its consulting nature.
I took OP's question to be about the recruiting timeline - upon rereading it, that's not what he was asking, but I am curious as to what the recruiting timeline looks like.
If you are/were a big data consultant, and you feel like sharing some thoughts (in response to my questions or just in general) it will be much appreciated.
I can only speak to this second hand. I have a friend who is in Deloitte's Data Analytics group (name might be slightly off) but from what I gather it's basically big data consulting. He went that route right after school where he majored in comp sci and econ.
From what I've heard, he works slightly easier hours than the S&O folks, salary is slightly less at the pre-MBA level, he still travels Monday to Thursday, projects are usually 3-6 months, and it's still team-based like mgmt consulting. The biggest difference he described to me was how he feels much less "front office" and client-facing than the S&O teams. He thinks he's building a focused skill set that's very different from what mgmt consulting gets you, and this concerns him about his exit opps. I'm pretty sure it's less technical than regular data science positions, because he said that while he does code daily there are also support teams that back his team up remotely and handle a lot of the more intense coding. This is also a concern of his because he knows he's not getting as technically skilled as those folks are and is also missing out on the client skills that S&O folks get.
All that said, he's very happy with the work; his main complaint is not how it fares versus regular data positions or mgmt consulting, it is the travel and the impact that has on his life.
Hopefully this was helpful. It'd be great if others out there could chime in too.
It will really depend on the projects. There might be far more analytics projects than data science projects, and even fewer big data projects. Many of the big data projects are at such an early stage that a data scientist isn't needed for another 6, 12 or even 18 months. No point in having a big data scientist if, say, the Hadoop ecosystem isn't even implemented yet.
I think there are tons of data science and advanced analytics opportunities, but the challenge is that if everything being sold is 'big data' this and 'big data' that, you need the big data platform in place before you can even think about data science. Now if you're talking strategy and figuring out what business use cases are involved, or architecting some NextGen system that includes real-time and data science use cases, there are needs for that, but again, it all depends on what projects are won and what has been agreed upon.
There are many partnerships in the space, even among competitors. So if your company is doing the early strategy work, but somebody else is doing the implementation and somebody else the offshoring/maintenance of said system, what you wind up doing will depend on where your company lies in that realm.
If you're strategy only, you might get to do some cool things, but if you're more into actually doing the hands-on work, you might be disappointed. On a site like this most people seem to prefer strategy, but if you're more a techie or stat guy than an MBA person, you probably prefer actually implementing something. So if it's implementation, you might get to build models or create infographics and use some algorithms. Then again, it depends, as some firms' "implementation" will mean business implementation and not tech implementation. Far different. If it's maintenance, you'll probably just be doing things like system admin, Hadoop admin, tuning, etc.
And it will depend where you work, because there are certain firms that just want bodies more than anything else, and so you wind up on a big data project that lasts 3-12 months and all you do all day long is write Hive SQL queries to replace their existing database technologies. The "architecture" is pretty much just trying to replace what they already have because nobody wants to 'rock the boat' but they want Hadoop or big data. Nobody would really call that big data or data science, but more than half the projects wind up being that exact scenario.
Or you wind up on a data science project that is 'replace SAS with R' and so on. It sounds exciting, but it really is just as it seems and not much more. That is the challenge in the space as everybody talks about all the things they do, but many projects are simply just projects that replace what they already had with open source tools or cheaper tools. Or at least try to do that. And half the time it's not even big data.
I will say this though. If you wind up on projects in and around Silicon Valley or at certain banks around the world, they are so far ahead of everybody else, and it's not just talk at some conference or on Bloomberg, so you probably will work on big data and data science.
Then again the other challenge is some clients want people to stick around for years and not just do something cool for 3 months and walk away.
I think that's where a lot of these big data projects struggle. Strategy is great until nobody can implement that strategy. Even if you wind up creating some great nextGen architecture and build out all these use cases and how they will gain a huge ROI from this new system that might predict the future in a way, it all ends if nobody can actually implement it or nobody is around to keep it running.
I ran into various projects where the client really had blank checks to write for consultants to stick around for 2 years, but the company just didn't have the resources. And it wasn't exactly local so it was one of those "who was going to relocate for 2+ years" and so the project died before it ever had a chance.
One last thing, in consulting, and especially in the big data and data science space, you can become pigeonholed very fast if you're any good and work hard. It might not matter, but it might screw you over long term. It's one of those "you're too valuable to do anything else" concepts. If you prove to be good at say using Tableau and R and Hive and so on and aren't a person who rocks the boat, you might wind up always being utilized, but you also might wind up doing the same kind of projects over and over and over again.
Some people don't mind, others wind up being the Tech guy at a consulting firm who winds up stuck because they are valuable, but they aren't exactly promoted because of it. Plus always being the rock star techie at some firms, and at something that many others don't know yet, you wind up rarely being able to network because you're the only guy really doing all the work on these kinds of projects.
So if you just see it as 'get the experience and leave in a few years', go for it. But if you want to learn the business side of things, learn strategy, network, be a good consultant, etc., well, big data can be a dangerous area in consulting if you know your stuff and work hard. Great right now, but in a few years when it's no longer the "in thing", what do you do as the person who isn't that cool anymore, but isn't exactly a 'consultant' per se?
Is Big Data getting Bigger? (Originally Posted: 12/08/2012)
Here we are at the end of 2012. In the coming weeks there will be a whole host of social media chatter about the best and worst of 2012 in movies, politics, business, finance and even YouTube videos. So, which industries do you think will have the biggest movers in 2013? With improvement in jobs numbers, the housing recovery gaining momentum and the biggest increase in home prices in six years according to CoreLogic, will this positive trend continue into 2013? Following are some of the industries/businesses that have been making a big splash in the news this year and that I expect will keep doing so in the coming years. What do you think?
Shale Oil & Gas: There has been a lot of discussion this year about the vast increase in production of oil and natural gas expected from US shale plays going forward. According to the Energy Information Administration’s 2013 Annual Energy Outlook, and with natural gas prices significantly higher in many parts of Asia, the US is expected to become a net exporter of liquefied natural gas by 2016, which is supported by the findings of a study by NERA Economic Consulting on the economic benefits of doing so. There have also been many debates about the sustainability of shale gas plays, because of the high expected decline rates in production over time, and about the economic impact of exports on prices inside the US. This will be a raging debate in 2013 as the Department of Energy makes decisions about approvals for LNG export.
Moving on, another interesting piece of news this year has been Starbucks’ Teavana acquisition and its foray into the tea industry and more recently, its $7 ‘Geisha’ Grande that caused a lot of curiosity and amusement. Check out this segment that was on
if you haven’t. While I don’t see the ‘Geisha’ making any big waves that will be sustained among coffee drinkers, another implementation on the part of Starbucks, payment through the Square Wallet app that was rolled out in its stores, which lets us pay by registering a credit card and paying automatically at the store, will be a fast growing trend in 2013. Here’s an article that I stumbled upon, about using the Square Wallet to pay as well as tip by Daniel Terdiman on CNet.
Which brings us to ‘Big Data’. In terms of frequency of usage, it has to be up there alongside socialism and capitalism, which were the most looked-up words on Merriam-Webster this year. The term denotes the processing and use of the enormous amount of data generated every day through social media and other online activity. It gained particular prominence after the Presidential election, where it was used to gather data from all over the country and target individual voters for campaigning. Time magazine covered this immediately after the election, describing how the Obama campaign simulated the election 66,000 times every night and allocated resources based on the results the next morning. With every click and every detail of our lives stored and used for targeted advertising online, retail is the next most obvious place where we see Big Data used every day. With startups like Cloudera making analysis of Big Data much easier and more accessible, I am looking forward to seeing how this changes the business landscape in the coming years.
And finally, the most exciting of all is Google’s Kansas City experiment. I would love to see a Google cable company extended to the rest of the US, and who wouldn’t want to switch to a 1 Gbps broadband speed from a measly Mbps! At the least, it will force the cable companies to take a hard look at their service, especially since Google has promised that ‘its installer will come to your house at the time of your appointment, not in some vague "window" that requires you to be home for 4 hours at a stretch’ according to a BI report.
What do you think? What other major ideas/trends do you think will be sustained going into 2013?
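For the curious, running an election 66,000 times a night is presumably some form of Monte Carlo simulation. Below is a rough sketch of the idea in Python, not the campaign's actual model: the toy states, electoral votes, win probabilities and thresholds are all invented, and states are treated as independent coin flips, which a real model would not assume.

```python
import random

# Hypothetical inputs: state -> (electoral votes, assumed probability of carrying it).
STATES = {
    "A": (29, 0.50),
    "B": (18, 0.75),
    "C": (13, 0.80),
    "D": (10, 0.35),
}
SAFE_VOTES = 40     # votes assumed already locked in (invented number)
VOTES_TO_WIN = 70   # winning threshold on this toy map (invented number)

def win_probability(states, safe_votes, votes_to_win, runs=66_000, seed=0):
    """Estimate the chance of winning by replaying the election many times."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    wins = 0
    for _ in range(runs):
        # Each state is an independent weighted coin flip (a simplifying assumption).
        total = safe_votes + sum(
            votes for votes, p in states.values() if rng.random() < p
        )
        if total >= votes_to_win:
            wins += 1
    return wins / runs

estimate = win_probability(STATES, SAFE_VOTES, VOTES_TO_WIN)
print(f"Estimated win probability: {estimate:.3f}")
```

Rerunning this after nudging one state's probability shows how much that state moves the overall odds, which is roughly the kind of signal a campaign could use to decide where to spend resources.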
I for one wouldn't mind seeing more compressed natural gas powered vehicles like buses, shuttles, etc. I know CAT has looked into a CNG/hybrid drivetrain for some of their products. That would be cool.
tablets continue to slowly ease into all daily functions in life. further consolidation of the financial sector, and massive layoffs! growing number of trolls on WSO
The other important aspect of cheap natural gas in the US is that it means cheap ethane/ethylene and their derivatives. Just think of the possible effects on the US economy and manufacturing if it starts to become cheaper to produce certain types of plastic in the US than it is in Asia. This is in addition to the energy benefits mentioned above.
This is interesting
True about tablets. I wonder if payment mechanisms will be based on facial recognition next
The Age of Big Data (Originally Posted: 02/14/2012)
Anybody know a good online training program that teaches data analysis?
You mean SQL?
SAS is more for data analysis. SQL is more about pulling data from databases, updating databases, etc.
Oh ok.
Similar to SAS, we use Stata here.
You use both.
You absolutely need to learn SQL for data extraction. For the data analysis, you can use SAS, R, SPSS, or even Excel.
You can also write SQL code inside SAS or R and then handle the resulting objects within those programs.
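To make the split concrete (SQL to pull the data, a separate tool to analyse it), here is a small sketch using Python's built-in sqlite3 and statistics modules as stand-ins for a real database and for SAS/R. The table name, columns and numbers are made up for illustration.

```python
import sqlite3
import statistics

# Build a throwaway in-memory database with a hypothetical trades table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (desk TEXT, pnl REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [("rates", 1.5), ("rates", -0.5), ("credit", 2.0), ("credit", 4.0), ("rates", 2.0)],
)

# Step 1: extraction. SQL pulls only the rows we care about.
rows = conn.execute(
    "SELECT pnl FROM trades WHERE desk = ? ORDER BY pnl", ("rates",)
).fetchall()
conn.close()

# Step 2: analysis. The extracted values are handled as ordinary objects.
pnl = [row[0] for row in rows]
mean_pnl = statistics.mean(pnl)
median_pnl = statistics.median(pnl)
```

The workflow is the same when the query runs from inside SAS or R: the SQL does the pulling, and the statistical work happens on whatever object comes back.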
Just download R and attempt the modules
I would steer away from Excel for anything sophisticated.
My favorite is SAS and R is quickly unseating it. Also try talkstats.com.
Exactly! What I love about R is that it is completely free too! You should definitely check it out http://www.r-project.org/
I've noticed a lot of people are saying that about R. However, I am slightly skeptical because I know a private sector statistician who told me that R doesn't handle big data sets well (among other things). In any event, it's probably good to learn SAS and R, if you have the time.
Use EViews.
Speaking of R, I recommend this for self-study