Complex Task in Excel
Having exhausted many other options, I have turned to the resident Excel monkeys to see if any of you could help with this project I have to do.
I have two spreadsheets, both with lists of names of companies. The first spreadsheet has about 4000 company names and the second has about 9000. The one with 9000 includes all of the company names on the 4000-name sheet plus about 5000 more.
In a third spreadsheet, I need to match all of the 4000 names with their own names from the 9000-name sheet, in two columns. The reason this is difficult is because the names are spelled differently on each spreadsheet (in many, but not all, of the cases). For example, on one sheet, the company could be called "____ Brothers Inc." while the same company on the other sheet could be listed as "_____ Brothers". This is why I can't (?) use one of Excel's functions to match and list the companies on the third spreadsheet.
So any ideas on how to progress with this? I will probably need to use some VBA program but I don't know where to start. I would greatly appreciate any advice.
Also interested
try using vlookup true for approximate matches, it might work
9000 list is in workbook A, cells A1:A9000 4000 list is in workbook B, cells A1:A4000
In workbook A, copy paste the 9000 list from column A to column B so that the same thing is repeated in adjoining columns. This is your array
Select cells B1:4000 in workbook B and type in =vlookup(A1,array,2,true). it should display the company name from your 9000 list even if it's slightly different
hopefully I explained ok
i wasn't aware that you could do approximate matches for text strings. is it reliable?
No, not always, but it's a good starting point. I would recommend quickly scanning afterwards to see if it did anything outrageous just in case.
Isn't the range lookup value here FALSE (for approximate) matches? If you use the FALSE value I'm pretty sure you can use an asterik wild card. But otherwise, I'm pretty sure Naroobru is right. It should work.
If you use a VBA code, you'll have to use string matching or percent string matching (I think). If you have more than one company with Brothers in the name...ugh.
build a left function into a vlookup function and that should get you a lot of the names
dude....
i did a TON of projects like these when i was just starting out. a vlookup will probably help with some matches, but unless you can develop an extremely complex algorithm to match the two lists, there is going to be a manual aspect to this, no way around it.
there are some methods that can make your life a little easier though. let's call your list of 4000 "List A", and your list of 9000 "List B". For both List A and List B, insert a column before the list, and using the "left" function, pull in the first 14 letters of each company name. If your actual list starts in cell B1, your formula should look something like this (copied down for the entire list):
=+LEFT(B1, 14)
Now, for both lists, replace the number "14" with a cell reference, so you can easily change the number (like put the number 14 in Cell Z1, and reference that cell... be sure to anchor down to cell Z1 so that it looks like $Z$1).
Next, do a vlookup off the truncated names in List B to the truncated names in List A... pulling in the ORIGINAL name from List A. Filter the vlookup column to anything that isn't returning an error value, and eyeball the list to see if any of the matches don't make sense (if they don't, then delete the formula in the cell).
Once this is done, replace that "14" number (in this case it was in Cell Z1) with 13, and repeat the process. Keep doing it until you get to 6 or 7... this should get you through most of the list. The rest you will have to do manually.
The general idea of the above method is to match the first "X" letters of each list to see if you get a match. You need to be careful as the number of letters get lower, as the matches become less reliable. For example, if you were matching the first 3 letters of both lists, then any company that started with "The ..." would show up as a match.
Hope this helps.
great idea
still looking? this is a vba solution, involving reversing a concatenate, standardising the format, and looking for a match in a range of cells.
i'm knackered, but if you need a more detailed explanation I can code it up for you.
example the fucktard brothers inc. on A
fucktard brothers on B
the reverse concat will split it into the fucktard brothers inc and fucktard brothers, it'll match the 2 in the second one against 2 cells in the first and score it as a 2. You'll find all cells with a value of 2 or higher and you know you've got the right one.
If the problem is mostly with things like Co. or Inc., you can do the following: Ctrl+H, in the 'find what" field you can type " Co," or " Inc." and in replace field - just space. Then, make sure to hit "replace all" button. After that, use VLOOKUP as described above. It should solve the problem with most common things like Inc. but you may still need to go manually through other mistakes.
Almost commented with something I learned in training incorporating ASCII stuff into a lookup function, but doesn't work. Then remembered we had an almost identical problem in one of my algorithms class in ug. It was a f'ing pain in the ass problem where you write an algorithm that cell references the lexicographical ordering or something, was a while ago. Couldn't answer any specific technical questions, but essentially you'd want create a variable, let's say "i" that would be the upper bound of the algorithm i.e. number of letters you'd want to iterate through, write a FOR loop that would check / match your text strings characters length, until they match, then a build in a nested loop until n-1 string lengths. Essentially the point up to here is to match the first letters in the string (if true), and do the same for each parced out string length you've already looped through until there's nothing left to match.
One thing to keep in mind, your matching strings not arrays. A string isn't a collection of "letters" or what have you, i.e. excel reads "fuckbug" as one thing not a collection of independent letters next to each other like we (humans) do.
Thanks for the suggestions. I have used the left function to pull the first 14 letters from each column. Then I put the two columns with the "left" truncated names next to each other to make a field for the vlookup function. I am trying to match the name in the first row of the first truncated column to its other name in the second column. So my formula looks like this:
=VLOOKUP(F3,F3:G9016,2,FALSE)
F3 is name 1 in left column 1, the second part is the range of both columns (Should it only be the column of the other names?), 2 is the column of the other names, false for exact match ( I have tried TRUE but it doesn't change anything). I am only getting the name that is in cell F3. What is wrong with the function?
you could do it easily with SAS. Sort by the company name, merge the two spreadsheets, and drop the duplicate entrees
Piggybacking on this idea, why not alphabetize both lists by first letter in order to more easily compare companies?
Yeah but the names don't necessarily match up, one list has all the names from the other list plus a lot more.
In addition to the recommended workarounds here, a good place to start would also be to copy and paste both columns side by side onto a new spreadsheet, sort both columns alphabetically, and use conditional formatting-->Highlight Cell Rules-->Duplicate Values to identify what cells are identical. Once those are accounted for, unfortunately you're going to have go through the values that it didn't hit and cross reference them manually.
I'm always trying to find better ways to compare between two identical (yet not) datasets, so if there are better practices around, I'd be interested in learning about them too.
Cum voluptatibus earum odio quos. Perspiciatis velit expedita at ipsa qui porro laborum. Velit deleniti et a reiciendis blanditiis fuga. Nostrum odit vero est. Et qui repellendus quis aut.
See All Comments - 100% Free
WSO depends on everyone being able to pitch in when they know something. Unlock with your email and get bonus: 6 financial modeling lessons free ($199 value)
or Unlock with your social account...