Complex Task in Excel

Having exhausted many other options, I have turned to the resident Excel monkeys to see if any of you could help with this project I have to do.

I have two spreadsheets, both with lists of names of companies. The first spreadsheet has about 4000 company names and the second has about 9000. The one with 9000 includes all of the company names on the 4000-name sheet plus about 5000 more.

In a third spreadsheet, I need to match all of the 4000 names with their own names from the 9000-name sheet, in two columns. The reason this is difficult is because the names are spelled differently on each spreadsheet (in many, but not all, of the cases). For example, on one sheet, the company could be called "____ Brothers Inc." while the same company on the other sheet could be listed as "_____ Brothers". This is why I can't (?) use one of Excel's functions to match and list the companies on the third spreadsheet.

So any ideas on how to progress with this? I will probably need to use some VBA program but I don't know where to start. I would greatly appreciate any advice.

17 Comments
 

try using vlookup true for approximate matches, it might work

9000 list is in workbook A, cells A1:A9000 4000 list is in workbook B, cells A1:A4000

In workbook A, copy paste the 9000 list from column A to column B so that the same thing is repeated in adjoining columns. This is your array

Select cells B1:4000 in workbook B and type in =vlookup(A1,array,2,true). it should display the company name from your 9000 list even if it's slightly different

hopefully I explained ok

 
Naboorutry using vlookup true for approximate matches, it might work

9000 list is in workbook A, cells A1:A9000 4000 list is in workbook B, cells A1:A4000

In workbook A, copy paste the 9000 list from column A to column B so that the same thing is repeated in adjoining columns. This is your array

Select cells B1:4000 in workbook B and type in =vlookup(A1,array,2,true). it should display the company name from your 9000 list even if it's slightly different

hopefully I explained ok

i wasn't aware that you could do approximate matches for text strings. is it reliable?

Money Never Sleeps? More like Money Never SUCKS amirite?!?!?!?
 

Isn't the range lookup value here FALSE (for approximate) matches? If you use the FALSE value I'm pretty sure you can use an asterik wild card. But otherwise, I'm pretty sure Naroobru is right. It should work.

If you use a VBA code, you'll have to use string matching or percent string matching (I think). If you have more than one company with Brothers in the name...ugh.

 
Best Response

dude....

i did a TON of projects like these when i was just starting out. a vlookup will probably help with some matches, but unless you can develop an extremely complex algorithm to match the two lists, there is going to be a manual aspect to this, no way around it.

there are some methods that can make your life a little easier though. let's call your list of 4000 "List A", and your list of 9000 "List B". For both List A and List B, insert a column before the list, and using the "left" function, pull in the first 14 letters of each company name. If your actual list starts in cell B1, your formula should look something like this (copied down for the entire list):

=+LEFT(B1, 14)

Now, for both lists, replace the number "14" with a cell reference, so you can easily change the number (like put the number 14 in Cell Z1, and reference that cell... be sure to anchor down to cell Z1 so that it looks like $Z$1).

Next, do a vlookup off the truncated names in List B to the truncated names in List A... pulling in the ORIGINAL name from List A. Filter the vlookup column to anything that isn't returning an error value, and eyeball the list to see if any of the matches don't make sense (if they don't, then delete the formula in the cell).

Once this is done, replace that "14" number (in this case it was in Cell Z1) with 13, and repeat the process. Keep doing it until you get to 6 or 7... this should get you through most of the list. The rest you will have to do manually.

The general idea of the above method is to match the first "X" letters of each list to see if you get a match. You need to be careful as the number of letters get lower, as the matches become less reliable. For example, if you were matching the first 3 letters of both lists, then any company that started with "The ..." would show up as a match.

Hope this helps.

Money Never Sleeps? More like Money Never SUCKS amirite?!?!?!?
 
sayandaruladude....

i did a TON of projects like these when i was just starting out. a vlookup will probably help with some matches, but unless you can develop an extremely complex algorithm to match the two lists, there is going to be a manual aspect to this, no way around it.

there are some methods that can make your life a little easier though. let's call your list of 4000 "List A", and your list of 9000 "List B". For both List A and List B, insert a column before the list, and using the "left" function, pull in the first 14 letters of each company name. If your actual list starts in cell B1, your formula should look something like this (copied down for the entire list):

=+LEFT(B1, 14)

Now, for both lists, replace the number "14" with a cell reference, so you can easily change the number (like put the number 14 in Cell Z1, and reference that cell... be sure to anchor down to cell Z1 so that it looks like $Z$1).

Next, do a vlookup off the truncated names in List B to the truncated names in List A... pulling in the ORIGINAL name from List A. Filter the vlookup column to anything that isn't returning an error value, and eyeball the list to see if any of the matches don't make sense (if they don't, then delete the formula in the cell).

Once this is done, replace that "14" number (in this case it was in Cell Z1) with 13, and repeat the process. Keep doing it until you get to 6 or 7... this should get you through most of the list. The rest you will have to do manually.

The general idea of the above method is to match the first "X" letters of each list to see if you get a match. You need to be careful as the number of letters get lower, as the matches become less reliable. For example, if you were matching the first 3 letters of both lists, then any company that started with "The ..." would show up as a match.

Hope this helps.

great idea

 

still looking? this is a vba solution, involving reversing a concatenate, standardising the format, and looking for a match in a range of cells.

i'm knackered, but if you need a more detailed explanation I can code it up for you.

example the fucktard brothers inc. on A

fucktard brothers on B

the reverse concat will split it into the fucktard brothers inc and fucktard brothers, it'll match the 2 in the second one against 2 cells in the first and score it as a 2. You'll find all cells with a value of 2 or higher and you know you've got the right one.

 

If the problem is mostly with things like Co. or Inc., you can do the following: Ctrl+H, in the 'find what" field you can type " Co," or " Inc." and in replace field - just space. Then, make sure to hit "replace all" button. After that, use VLOOKUP as described above. It should solve the problem with most common things like Inc. but you may still need to go manually through other mistakes.

 

Almost commented with something I learned in training incorporating ASCII stuff into a lookup function, but doesn't work. Then remembered we had an almost identical problem in one of my algorithms class in ug. It was a f'ing pain in the ass problem where you write an algorithm that cell references the lexicographical ordering or something, was a while ago. Couldn't answer any specific technical questions, but essentially you'd want create a variable, let's say "i" that would be the upper bound of the algorithm i.e. number of letters you'd want to iterate through, write a FOR loop that would check / match your text strings characters length, until they match, then a build in a nested loop until n-1 string lengths. Essentially the point up to here is to match the first letters in the string (if true), and do the same for each parced out string length you've already looped through until there's nothing left to match.

One thing to keep in mind, your matching strings not arrays. A string isn't a collection of "letters" or what have you, i.e. excel reads "fuckbug" as one thing not a collection of independent letters next to each other like we (humans) do.

Ace all your PE interview questions with the WSO Private Equity Prep Pack: http://www.wallstreetoasis.com/guide/private-equity-interview-prep-questions
 

Thanks for the suggestions. I have used the left function to pull the first 14 letters from each column. Then I put the two columns with the "left" truncated names next to each other to make a field for the vlookup function. I am trying to match the name in the first row of the first truncated column to its other name in the second column. So my formula looks like this:

=VLOOKUP(F3,F3:G9016,2,FALSE)

F3 is name 1 in left column 1, the second part is the range of both columns (Should it only be the column of the other names?), 2 is the column of the other names, false for exact match ( I have tried TRUE but it doesn't change anything). I am only getting the name that is in cell F3. What is wrong with the function?

 
arguewithatreeyou could do it easily with SAS. Sort by the company name, merge the two spreadsheets, and drop the duplicate entrees

Piggybacking on this idea, why not alphabetize both lists by first letter in order to more easily compare companies?

 

In addition to the recommended workarounds here, a good place to start would also be to copy and paste both columns side by side onto a new spreadsheet, sort both columns alphabetically, and use conditional formatting-->Highlight Cell Rules-->Duplicate Values to identify what cells are identical. Once those are accounted for, unfortunately you're going to have go through the values that it didn't hit and cross reference them manually.

I'm always trying to find better ways to compare between two identical (yet not) datasets, so if there are better practices around, I'd be interested in learning about them too.

 

Cum voluptatibus earum odio quos. Perspiciatis velit expedita at ipsa qui porro laborum. Velit deleniti et a reiciendis blanditiis fuga. Nostrum odit vero est. Et qui repellendus quis aut.

Career Advancement Opportunities

June 2026 Investment Banking

  • Evercore 01 99.4%
  • Moelis & Company 01 98.8%
  • JPMorgan 01 98.2%
  • Guggenheim Partners 01 97.7%
  • Morgan Stanley 07 97.1%

Overall Employee Satisfaction

June 2026 Investment Banking

  • Moelis & Company No 99.4%
  • Morgan Stanley 01 98.8%
  • Evercore 01 98.2%
  • BMO Capital Markets 12 97.6%
  • Banco Santander 01 97.1%

Professional Growth Opportunities

June 2026 Investment Banking

  • Moelis & Company No 99.4%
  • Evercore No 98.8%
  • Morgan Stanley 05 98.2%
  • JPMorgan No 97.7%
  • BMO Capital Markets 12 97.1%

Total Avg Compensation

June 2026 Investment Banking

  • Vice President (14) $434
  • Associates (43) $259
  • 3rd+ Year Analyst (8) $210
  • 2nd Year Analyst (22) $179
  • Intern/Summer Associate (13) $156
  • 1st Year Analyst (75) $151
  • Intern/Summer Analyst (65) $101
notes
16 IB Interviews Notes

“... there’s no excuse to not take advantage of the resources out there available to you. Best value for your $ are the...”

Leaderboard

1
redever's picture
redever
99.2
2
BankonBanking's picture
BankonBanking
99.0
3
kanon's picture
kanon
99.0
4
Secyh62's picture
Secyh62
99.0
5
DrApeman's picture
DrApeman
98.9
6
Betsy Massar's picture
Betsy Massar
98.9
7
CompBanker's picture
CompBanker
98.9
8
dosk17's picture
dosk17
98.9
9
GameTheory's picture
GameTheory
98.9
10
Jamoldo's picture
Jamoldo
98.8
success
From 10 rejections to 1 dream investment banking internship

“... I believe it was the single biggest reason why I ended up with an offer...”