Help me overcome this nightmare: PDF to Excel
I have been out of IB for almost a year, but cannot shake this recurring nightmare: people asking me to build models from financials in PDF. It was cute at first, but has quickly grown burdensome as my industry has complex financials (e.g. dozens of line items just for revenue and 30+ year projections). Please help me keep my sanity.
What is the best software for converting PDFs to Excel? Cost is of no concern to me, but it has to be a software package that I can install on my computer.
You can write VBA code to handle the delimiters, so that you might be able to just copy/paste the text and run a macro.
Essentially any packaged product you find likely isn't going to solve your problem 100%.
Thanks for the suggestion. Any idea where I can start? I can of course do some Googling myself, but without too much knowledge of VBA, I'm not sure where I'd start.
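One possible starting point for the delimiter idea, sketched in Python rather than VBA (that swap is my assumption, the same logic ports to a macro). It assumes each pasted line is a text label followed by whitespace-separated numbers; the sample rows and the `split_row` helper are made up for illustration.

```python
import re

# Hypothetical sample: text as it might look after copying a table out of a PDF.
SAMPLE = """Product revenue   1,200   1,350   1,500
Service revenue     300     320     345
Total revenue     1,500   1,670   1,845"""

# A token counts as numeric if it looks like 1,200 or (45.5) or 12%.
NUMERIC = re.compile(r"\(?-?[\d,]+(\.\d+)?\)?%?")

def split_row(line):
    """Split one pasted line into [label, cell, cell, ...].

    Assumes the label comes first and the numeric cells follow it,
    separated by whitespace, with no spaces inside a number.
    """
    tokens = line.split()
    i = len(tokens)
    # Walk back from the end while the tokens still look like numbers.
    while i > 0 and NUMERIC.fullmatch(tokens[i - 1]):
        i -= 1
    label = " ".join(tokens[:i])
    return [label] + tokens[i:]

rows = [split_row(line) for line in SAMPLE.splitlines() if line.strip()]

# Tab-separated output pastes into Excel as separate columns.
for row in rows:
    print("\t".join(row))
```

Labels that end in a number (e.g. "Note 5") would be split wrongly under this assumption, so it is worth eyeballing the result before trusting it.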
I'll do it for you for some IB referrals :)
You can use Able2Extract or PDF2XL.
Or I can do it for you and we can arrange payment in workout supplements.
I was going to say I think MrExcel.com recommended the first one. If you really did want to automate a delimiter macro you could check out some freelance programming sites (www.fiverr.com). Obviously don't send anything real, but send something close enough that what you get back will actually be useful. Otherwise there is plenty of legacy code out there to modify, but this would require some knowledge of VBA. If I find anything more substantial I will PM you.
Have you tried Able2Extract or PDF2XL? I keep hearing good things about them.
What workout supplements are you looking for? I know a creatine guy. Just saying.
Once you get hooked on high end designer synthetic cocktails made for farm animals it is difficult to go back to stuff like creatine.
I prefer Able2Extract of the two but they both have free trials so you can decide for yourself.
I frequently use PDF2XL by CogniView for my current internship, works great! You simply mark the rows/columns by moving the margins around and, at the touch of a button, it pops out in an Excel worksheet.
cogniview.com/pdf-to-excel/pdf2xl-basic
+1
I've looked into this too. The best solution I could find: an intern.
Struggling with getting PDF data to Excel? (Originally Posted: 09/30/2017)
Question for all (junior?) monkeys: is getting data from PDF files (annual reports, investor presentations, industry reports etc.) to Excel something you struggle with in your role? If so, how often?
And how do you solve it? Type by hand/ship to India/use Factset instead/hand it over to the intern (...)/existing tool etc.?
Unfortunately, they won't let me bring in an intern...But, you're definitely correct.
Answers here:
https://www.wallstreetoasis.com/forums/help-me-overcome-this-nightmare-…
Thanks. I saw this one earlier but was curious if it is commonplace at banks to have this kind of software? Are the needs that frequent?
Use tabula http://tabula.technology/ if the data is in table form. Used widely in the big data space and works like a charm.
I've tried Tabula a long time ago but actually wasn't very happy with it. It was quite slow and also cumbersome to work with for my needs.
Sounds like the OP wants to sell a new product he will develop or has developed.
Give Nuance Power PDF a try. I think there is a free trial available
I've used pdftables.com, which is pretty good for standard-formatted items. Had to use it to strip returns from 10 years of investor letters.
Acrobat can also do this and is slightly better, but you have to select each table individually.
pdftables sucks ass, I just tried it :/
Doing a vba macro is probably the best way to go.
However, if that's not an option, you could also try to first convert the PDF to Word, and then see if the raw data in Word is easier to manipulate than the PDF. It probably will be, since Word and Excel both run on VBA at least.
Another nice thing is that if you're tech-stupid, Word has a handy tool that lets you "record" a macro, which basically means you hit "record", do some actions in Word, then hit "record" again. Any actions you performed will be recorded automatically as a VBA script, so if you're creative enough, you may be able to utilize VBA without actually understanding its mechanics. I have definitely used this trick before to bring some order to unstructured PDF text.
Try the premium Adobe Acrobat (believe it is called DC). You can export files in a number of formats, including Excel. Has worked fairly well for me the few times I've used it.
I love PDF Pro, but unfortunately, it cannot handle some scanned PDFs. Yes, you read that right....I'm getting financials that someone exported to PDF, then printed, and then scanned in again.
Bet you are loving Corp Dev right now..!
The OCR does not help with the scanned text? I find Pro is 85-90% accurate with OCR.
Can you get it in Notepad instead? Converting from Notepad to Excel is fairly easy. PDF is very finicky and my bank has issues with this too.
Unfortunately, I don't think this is an option, but I will try. Thanks.
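For anyone who can get the data out as plain text, here is a rough sketch of the Notepad-to-Excel step in Python. It assumes columns are separated by at least two spaces while labels contain only single spaces; the `text_to_csv` helper and the sample text are made up for illustration. The result is CSV text that Excel can open once saved with a .csv extension.

```python
import csv
import io
import re

def text_to_csv(text):
    """Convert space-aligned text into CSV.

    Assumes a run of two or more spaces marks a column break and that
    labels such as "Net income" contain only single spaces.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            writer.writerow(re.split(r"\s{2,}", line.strip()))
    return out.getvalue()

# Hypothetical sample of what the text file might contain.
sample = "Item          2016    2017\nNet income     410     455\n"
csv_text = text_to_csv(sample)
print(csv_text)
```

Values that contain commas (e.g. 1,200) are quoted by the csv module, so they stay in one cell when opened in Excel.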
I was once hired as an intern to do this... listening to an audiobook helps
There are software packages, but since you say cost is not a problem for you, I recommend outsourcing to a group in India. My team uses an outsourcing company in India for $1,800 / month to do literally every mundane thing you can imagine. We have pre-built something like 100 company and news screens with them and they just send us weekly one-off reports, they screen massive company lists for us and yes, they will turn around any PDF financial statement document within about 24 hours. They also tend to do this stuff while you're sleeping given time zones.
It's a different type of solution but for my team it's low cost to outsource stuff that isn't super time sensitive.
Who do you use? I haven't had rates quoted at that level before, so very interested.
Can you not just google "PDF to Excel" and use all the free ones online?
My only worry is that you have no idea what they're doing with the documents, but especially if they're public financials it shouldn't matter.
I literally just used one of those free online PDF to XXX to get an industry rankings list into Excel
http://www.datawatch.com/in-action/use-cases/pdf-excel-data-extraction/
Assuming that the PDFs you're looking at aren't images, you do realize that within Adobe Reader, if you hold down Alt, it will allow you to highlight any column in a straight line by selecting with your mouse (regardless of any formatting), for easy copy/pasting into Excel... Doing that should cut down your time by like 20x vs manually typing out numbers or trying to organize an unformatted mess.
As a poster above alluded to, no software solution will be 100% accurate. You're better off with either a quasi-manual approach or farming it out to some other humans.
I had the same problem and bought Adobe Acrobat Reader DC... just download it from their site. It requires a subscription (equates to ~$30/yr) and it has a function that allows you to convert PDFs to Excel, Word, PowerPoint, etc. Saves a ton of time for PDFs with huge tables.
IDEA software can convert PDF to Excel quite well. I haven't used it too much but know people who have. The software also has a variety of functions that extend beyond PDF-to-Excel conversions, which may be an added bonus.
FactSet doc sourcing will do this for you
There are a ton of tools online that you can use.
If you can't use those, PDF -> Word. It'll all be there in a janky format, so you need to use a VBA macro/Python/etc. to sanitize the data and throw it into Excel. If you're familiar with Python, use xlwings - it lets you plug Python code right into VBA.
ABBYY FineReader = magic. Thank me later.
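To give a flavor of the "sanitize" step, here is a minimal Python sketch that turns accounting-style strings into numbers before they go into Excel. The `parse_amount` helper is made up for illustration, and treating a lone dash as zero is my assumption about how the statements are formatted, so adjust to taste.

```python
def parse_amount(cell):
    """Turn an accounting-style string into a float, or None if not numeric.

    Assumptions: parentheses mean a negative number, commas are thousands
    separators, and a lone dash stands for zero.
    """
    text = cell.strip().replace("$", "").replace(",", "")
    if text == "":
        return None
    if text in ("-", "\u2013", "\u2014"):  # hyphen, en dash, em dash
        return 0.0
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    try:
        value = float(text)
    except ValueError:
        return None  # leave labels and junk for manual review
    return -value if negative else value

# Hypothetical cells as they might come out of a converted PDF.
cells = ["$2,000", "(1,234.5)", "-", "n/a"]
cleaned = [parse_amount(c) for c in cells]
print(cleaned)
```

Returning None for anything unrecognized keeps bad cells visible instead of silently turning them into numbers.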
Hire an intern