How is data science used in deal sourcing?

Dumb question here, but I've heard more and more about how some firms are using data science to drive their sourcing efforts (i.e., identifying targets to approach outside of a process). Can anyone explain this in a little more detail? What kind of metrics would the algorithms be looking at, and from what sources?

  • Associate 1 in CorpDev
May 11, 2022 - 9:48am

SourceScrub uses data science to scrape conferences and industry publications for potential targets. They use hiring numbers as a proxy for growth. It's the only tool I have seen that advertises using data science to help with sourcing.
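Using hiring numbers as a growth proxy can be sketched roughly as below. This is a generic illustration, not SourceScrub's actual method: it assumes you already have two headcount snapshots per company taken some time apart, and the `HeadcountSnapshot` class and company names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeadcountSnapshot:
    company: str
    headcount_then: int  # employees at the earlier snapshot (hypothetical data)
    headcount_now: int   # employees at the later snapshot


def headcount_growth(s: HeadcountSnapshot) -> float:
    """Fractional change in headcount between the two snapshots."""
    if s.headcount_then <= 0:
        raise ValueError("earlier headcount must be positive")
    return (s.headcount_now - s.headcount_then) / s.headcount_then


def rank_by_hiring(snapshots: list[HeadcountSnapshot]) -> list[tuple[str, float]]:
    """Sort companies from fastest- to slowest-growing headcount."""
    scored = [(s.company, headcount_growth(s)) for s in snapshots]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical companies, for illustration only.
    data = [
        HeadcountSnapshot("Acme Logistics", 80, 100),
        HeadcountSnapshot("Beta Software", 50, 75),
        HeadcountSnapshot("Gamma Foods", 200, 190),
    ]
    for name, growth in rank_by_hiring(data):
        print(f"{name}: {growth:+.0%}")
```

Headcount growth is only a proxy: a company can hire ahead of revenue, so the ranking is a starting point for prioritizing outreach rather than a valuation input.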

  • Associate 3 in PE - LBOs
May 11, 2022 - 10:41am

Makes sense, thanks! I think a few other service providers (Grata?) use AI/ML in their scraping too. Any thoughts on how an in-house team would use it? I'm guessing in a similar way: an in-house data team grabs similar metrics from public sources (LinkedIn, conference/trade attendee lists, websites, etc.) so the sourcing team can prioritize direct outreach?

Anyone here have experience with this?

May 11, 2022 - 11:01am
PWM Hopeful

Coming from data science in a different area, I can maybe help you take a step in the right direction. When people think of data science they might think of fancy algos, but in reality it's more like "How can we use data to make smarter decisions?", and then you work backwards from there. So here's how I'd work backwards for sourcing to answer the question "Who should I reach out to?":
A) Historically, in X vertical, companies begin entering a process at Y revenue/employee count with Z probability. Capture this data in an Excel sheet for all I care.
B) In X vertical, revenues are forecasted to increase by Y percent. Relate this back to the relationship in A.
C) Capture the name of those companies, and then scrape LinkedIn using Python for "Company Name, Title = VP"
D) Populate the name, title, and chance this person could enter a process.
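Steps A, B and D above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the revenue bucket edges, the historical records, the 20% growth forecast, and the contacts are all hypothetical, and Step C (finding the VP at each company) is assumed to have been done already rather than scraped here.

```python
import bisect
from collections import defaultdict

# Hypothetical revenue bucket edges in $M: <10, 10-25, 25-50, 50+.
BUCKET_EDGES = [10.0, 25.0, 50.0]


def bucket(revenue: float) -> int:
    """Return the index of the revenue bucket a company falls into."""
    return bisect.bisect_right(BUCKET_EDGES, revenue)


def entry_rates(history: list[tuple[float, bool]]) -> dict[int, float]:
    """Step A: share of past companies in each bucket that entered a process."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [entered, total]
    for revenue, entered in history:
        tally = counts[bucket(revenue)]
        tally[0] += int(entered)
        tally[1] += 1
    return {b: entered / total for b, (entered, total) in counts.items()}


def build_pipeline(targets, rates, growth_rate):
    """Steps B and D: project revenue with the vertical's forecast growth,
    then attach the historical entry rate for the projected bucket.

    targets: (company, contact_name, title, current_revenue) tuples; Step C
    (finding the contact) is assumed to have been done already.
    """
    rows = []
    for company, contact, title, revenue in targets:
        projected = revenue * (1 + growth_rate)
        probability = rates.get(bucket(projected), 0.0)  # no history -> 0.0
        rows.append((company, contact, title, round(probability, 2)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    # Hypothetical history: (revenue in $M, did it enter a process?)
    history = [(5, False), (8, False), (15, True), (20, False),
               (30, True), (40, True), (45, False), (60, True)]
    rates = entry_rates(history)
    targets = [("Acme Co", "J. Doe", "VP Finance", 22.0),
               ("Beta LLC", "A. Smith", "VP Operations", 9.0)]
    for row in build_pipeline(targets, rates, growth_rate=0.20):
        print(row)
```

The "probability" here is just a historical frequency per bucket, which matches the "rough probability" framing; with small samples per bucket it will be noisy.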

Ta da. You now have a rough probability of a company entering a process based on historical data and have started building a pipeline. This took me all of 5 minutes to type and I have zero expertise in this industry so imagine the possibilities. Make sense?

Just had my trade dispute rejected by Schwab for a loss of 35k. This single issue alone should be a gigantic red flag to anyone who trades on their platform.

If they have a system error, and you do not video record your trading (they actually said this), they will not honour their fuck up. Switching everything away from them. Fuck this company.

May 11, 2022 - 5:32pm
arb432

First off, most PE/GE firms aren't doing any real data science, if by that you mean analyzing very large datasets in programs like SQL and drawing conclusions from that data.

As another poster mentioned, PE/GE firms might use tools like Sourcescrub to find leads. And Sourcescrub itself just scrapes the web and probably does some manual data gathering / cleansing so that they can sell their platform to PE end users. 

I work in growth equity and the process PWM Hopeful outlined is spot on. You are looking for targets in X vertical, with revenue/EBITDA ranging between $Y-$Z, and growth rates above W%. You start by going on Sourcescrub and looking at other industry market maps to find all the relevant companies in X vertical. You pull in LinkedIn headcount info, which tends to be a good way to back into revenue/growth rates, maybe pull in prior funding history, then use LinkedIn to find the person you want to reach out to at each target company (almost always the CEO). I doubt most PE firms are using Python to scrape LinkedIn; usually you just find the CEO and get his/her email to reach out (Sourcescrub has CEO email addresses).
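The screen described above (X vertical, $Y-$Z revenue, growth above W%, with headcount used to back into revenue and growth) could look something like this. It is a rough sketch: the revenue-per-employee figure, the thresholds, and the companies are hypothetical assumptions, and real revenue-per-employee varies a lot by vertical.

```python
from dataclasses import dataclass


@dataclass
class Company:
    name: str
    vertical: str
    headcount_prior: int  # headcount roughly a year ago (hypothetical data)
    headcount_now: int


def screen(companies, vertical, rev_min, rev_max, min_growth, revenue_per_employee):
    """Return (name, est_revenue, est_growth) for companies that pass the screen.

    Revenue is backed into as headcount * revenue_per_employee, and headcount
    growth stands in for revenue growth; both are rough, assumed proxies.
    """
    hits = []
    for c in companies:
        if c.vertical != vertical or c.headcount_prior <= 0:
            continue
        est_revenue = c.headcount_now * revenue_per_employee
        est_growth = c.headcount_now / c.headcount_prior - 1
        if rev_min <= est_revenue <= rev_max and est_growth > min_growth:
            hits.append((c.name, est_revenue, round(est_growth, 3)))
    return hits


if __name__ == "__main__":
    universe = [
        Company("Acme", "healthcare IT", 100, 130),
        Company("Beta", "healthcare IT", 400, 420),
        Company("Gamma", "logistics", 100, 150),
        Company("Delta", "healthcare IT", 60, 66),
    ]
    # $Y-$Z = $10M-$50M, W% = 20%, and an assumed $0.2M revenue per employee.
    print(screen(universe, "healthcare IT", 10, 50, 0.20, 0.2))
```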

Once you have all this info, you populate your firm's CRM (Salesforce, Dealcloud, etc.) and start reaching out, usually with a template email or something that is slightly adjusted to reflect the interest your firm has in X vertical. Then you keep reaching out (often several times) until you get a response, or if you don't get a response, you try to find another avenue to get in touch with the company. That could be through a banker, through the PE firm that currently owns the business, 2nd-degree LinkedIn connections, etc.
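The follow-up cadence described above can be expressed as a small decision rule per target. This is an illustrative sketch only: the `next_step` helper, the cap of four emails, and the seven-day gap are hypothetical choices, not a standard, and in practice this logic would live in whatever CRM the firm uses.

```python
from datetime import date, timedelta

# Fallback avenues named in the thread, tried once email outreach is exhausted.
FALLBACKS = ["banker", "current PE owner", "2nd-degree LinkedIn connection"]


def next_step(emails_sent: list[date], responded: bool, today: date,
              max_emails: int = 4, gap_days: int = 7) -> str:
    """Pick the next outreach action for one target in the CRM.

    max_emails and gap_days are illustrative assumptions, not a standard cadence.
    """
    if responded:
        return "respond and set up a call"
    if not emails_sent:
        return "send templated intro email"
    if len(emails_sent) < max_emails:
        if today - max(emails_sent) >= timedelta(days=gap_days):
            return "send follow-up email"
        return "wait"
    return "try another avenue: " + ", ".join(FALLBACKS)


if __name__ == "__main__":
    sent = [date(2022, 5, 1), date(2022, 5, 9)]
    print(next_step(sent, responded=False, today=date(2022, 5, 20)))
```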

Hopefully this helps. As you can probably tell, this is not a very technically-sophisticated process. You could probably train a high-schooler to do it...

May 11, 2022 - 11:26pm
PWM Hopeful
Replying to arb432:

You're stating it's not technically sophisticated because you don't know anything about the technicals and are posing like you do. It can be infinitely technical if you have any respect for or knowledge of the skillset, instead of just regurgitating what the first poster and I said. Class is now in session.

First off - SQL isn't a program - it's a querying language. It queries the data, hence SQL stands for Structured Query Language... This is the equivalent of saying SUMIF is a program instead of an Excel function, or C# is a program instead of a language. Categorically wrong.

Second, nobody is doing analysis of datasets using SQL. Since - you know - it queries the data, not analyzes it. That's what Python, R, Stata, SPSS, etc. are for.

I see why you believe it isn't technical, you're ignorant of technicalities. 

Don't make me continue, but I will if you pose or disrespect the skillset again. 

  • Analyst 2 in IB - Cov
May 12, 2022 - 1:42am

Math guy here. 95% of finance people don't understand the difference between a query language and a general programming language, don't bother. 

May 12, 2022 - 9:29pm
arb432

When I spoke about it being "not very technically-sophisticated" I was referring to the sourcing process I described in paragraphs 3 and 4, not what you described above, which I agree is technical. Sorry I didn't make that clear.

Most PE firms aren't using any sort of sophisticated data analysis (which again, I agree is technical, and beyond my understanding) to source deals. Maybe a few, but not most. 

May 26, 2022 - 9:40pm
dafftt

There is nothing in what you said that entails a skill set that a high schooler cannot acquire. The fact that you have zero expertise is evident from how you described the process to identify prospects. Which is totally fine, except that you seem to think you have the expertise to lecture some anonymous person trying to help. Not cool.

The tool for analyses depends on the task at hand, type of data, skills available, etc. SQL can be used to analyze data in some use cases, but you seem to think "select * from linkedin" is its peak capability.
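As a small illustration of SQL doing more than `select *`, the sketch below computes average headcount growth per vertical with `GROUP BY`/`HAVING` entirely inside the query. It uses Python's built-in `sqlite3` module with an in-memory database; the table name, columns, and rows are hypothetical.

```python
import sqlite3


def growth_by_vertical(rows):
    """Average headcount growth per vertical, computed entirely in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE companies (name TEXT, vertical TEXT, hc_prior INTEGER, hc_now INTEGER)"
    )
    conn.executemany("INSERT INTO companies VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT vertical,
               COUNT(*) AS n,
               ROUND(AVG(1.0 * hc_now / hc_prior - 1), 3) AS avg_growth
        FROM companies
        WHERE hc_prior > 0
        GROUP BY vertical
        HAVING COUNT(*) >= 2          -- skip verticals with a single data point
        ORDER BY avg_growth DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Hypothetical rows: (name, vertical, headcount a year ago, headcount now)
    rows = [
        ("Acme", "healthcare IT", 100, 130),
        ("Delta", "healthcare IT", 60, 66),
        ("Gamma", "logistics", 100, 150),
        ("Omega", "logistics", 50, 55),
        ("Solo", "fintech", 10, 20),
    ]
    for vertical, n, avg_growth in growth_by_vertical(rows):
        print(vertical, n, avg_growth)
```

The `1.0 *` forces floating-point division, since SQLite would otherwise divide the two integer columns as integers.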

It's cool to show off your expertise. But two caveats. First, acquire some expertise. Second, please be humble. We all make mistakes, and we learn.

May 12, 2022 - 9:45pm
m_1

We aren't using data science but we use some light tech.

Plugging into PPP data, import/export records, the SEMRush API, BuiltWith.com's API, and a handful of others to build a list of targets. Then our overseas employee scrapes their emails when they're not picked up by our scraper. Then auto email!
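Combining several data sources into one target list, then routing targets without an email to a manual lookup queue, might look roughly like this. It is a sketch under assumptions: it does not call the SEMRush or BuiltWith APIs, it assumes each source has already been downloaded into a dict keyed by company domain, and the source names, fields, and companies are hypothetical.

```python
def merge_signals(sources: dict[str, dict[str, dict]]) -> dict[str, dict]:
    """Combine per-source records (keyed by company domain) into one record each."""
    merged: dict[str, dict] = {}
    for source_name, records in sources.items():
        for domain, fields in records.items():
            entry = merged.setdefault(domain, {"domain": domain, "sources": []})
            entry["sources"].append(source_name)
            for key, value in fields.items():
                entry.setdefault(key, value)  # first source to supply a field wins
    return merged


def split_for_outreach(merged: dict[str, dict]):
    """Targets with an email go to the auto-email queue; the rest need a manual lookup."""
    ready = [r for r in merged.values() if r.get("email")]
    manual = [r for r in merged.values() if not r.get("email")]
    return ready, manual


if __name__ == "__main__":
    # Hypothetical, already-downloaded data; no real API calls are made here.
    sources = {
        "ppp": {"acme.com": {"ppp_loan_usd": 350_000}},
        "imports": {"acme.com": {"containers_2021": 120}, "beta.io": {"containers_2021": 40}},
        "scraper": {"acme.com": {"email": "ceo@acme.com"}},
    }
    ready, manual = split_for_outreach(merge_signals(sources))
    print("auto-email:", [r["domain"] for r in ready])
    print("manual lookup:", [r["domain"] for r in manual])
```

Keying on the company domain is a simplifying choice; real sources often need fuzzy name matching before they can be joined.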

Works strangely well.

One example of these indicators in use would be import/export records. If the number of containers being imported is over X, you know the target is roughly in your range. You can use growth in imports as a proxy for growth too.
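The import-records check above (containers over X as a size filter, import growth as a growth proxy) can be sketched as a single function. The yearly container counts and the threshold of 90 are hypothetical; the function simply compares the two most recent years on record.

```python
def import_signal(containers_by_year: dict[int, int], min_containers: int):
    """Return (in_size_range, yoy_import_growth) from container import counts.

    in_size_range: the latest year's containers exceed the threshold X.
    yoy_import_growth: change between the two most recent years on record,
    or None if there is no usable prior year.
    """
    if not containers_by_year:
        return False, None
    years = sorted(containers_by_year)
    latest = containers_by_year[years[-1]]
    in_range = latest > min_containers
    growth = None
    if len(years) >= 2 and containers_by_year[years[-2]] > 0:
        growth = latest / containers_by_year[years[-2]] - 1
    return in_range, growth


if __name__ == "__main__":
    # Hypothetical target importing 80 containers in 2020 and 100 in 2021, X = 90.
    print(import_signal({2020: 80, 2021: 100}, min_containers=90))
```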

None are perfect, but it saves a lot of time.
