Help me overcome this nightmare: PDF to Excel
I have been out of IB for almost a year, but cannot shake this recurring nightmare: people asking me to build models from financials in PDF. It was cute at first, but has quickly grown burdensome, as my industry has complex financials (e.g. dozens of line items just for revenue, and 30+ year projections). Please help me keep my sanity.
What is the best software for converting PDFs to Excel? Cost is of no concern to me, but it has to be a software package that I can install on my computer.
You can write VBA code to handle the delimiters, so that you can copy/paste the text from the PDF and then run a macro to split it into columns.
Essentially any packaged product you find likely isn't going to solve your problem 100%.
Thanks for the suggestion. Any idea where I can start? I can of course do some Googling myself, but without too much knowledge of VBA, I'm not sure where I'd start.
I'll do it for you for some IB referrals :)
You can use Able2Extract or PDF2XL.
Or I can do it for you and we can arrange payment in workout supplements.
I was going to say I think MrExcel.com recommended the first one. If you really did want to automate a delimiter macro you could check out some freelance programming sites (www.fiverr.com). Obviously don't send anything real, but send something close enough that what you get back will actually be useful. Otherwise there is plenty of legacy code out there to modify, but this would require some knowledge of VBA. If I find anything more substantial I will PM you.
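For anyone wondering what a "delimiter macro" would actually do, here is a rough sketch of the idea, written in Python rather than VBA. It assumes each pasted line is a text label followed by number-like columns separated by spaces; the names `split_row` and `to_tsv` and the sample figures are made up for illustration. It will get confused by labels that end in a number (e.g. "Revenue 2017"), so treat it as a starting point, not a finished tool.

```python
import re

# Assumption: a "value" token is a number with optional $, thousands commas,
# decimals, % sign, or accounting-style parentheses, or a lone dash for blanks.
NUMERIC = re.compile(r"^\(?-?\$?\d[\d,]*(\.\d+)?%?\)?$|^-$")


def split_row(line):
    """Split one pasted line into [label, value1, value2, ...]."""
    tokens = line.split()
    values = []
    # Walk in from the right: trailing number-like tokens are the data columns.
    while tokens and NUMERIC.match(tokens[-1]):
        values.insert(0, tokens.pop())
    return [" ".join(tokens)] + values


def to_tsv(text):
    """Turn pasted PDF text into tab-separated text that pastes into Excel columns."""
    rows = [split_row(line) for line in text.splitlines() if line.strip()]
    return "\n".join("\t".join(row) for row in rows)


# Hypothetical sample of what a copy/paste from a PDF table might look like.
pasted = """Residential revenue 1,234 1,301 1,377
Commercial revenue 980 (12) 1,002
Growth rate 5.4% 5.8% -"""
print(to_tsv(pasted))
```

Pasting the tab-separated output into a worksheet puts the label in the first column and each value in its own column.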
Have you tried Able2Extract or PDF2XL? I keep hearing good things about them.
What workout supplements are you looking for? I know a creatine guy. Just saying.
Once you get hooked on high end designer synthetic cocktails made for farm animals it is difficult to go back to stuff like creatine.
I prefer Able2Extract of the two but they both have free trials so you can decide for yourself.
I frequently use PDF2XL by CogniView for my current internship, works great! You simply indicate the rows/columns by moving the margins around, and at the touch of a button it pops out in an Excel worksheet.
cogniview.com/pdf-to-excel/pdf2xl-basic
+1
I've looked into this too. The best solution I could find: an intern.
Struggling with getting PDF data to Excel? (Originally Posted: 09/30/2017)
Question for all (junior?) monkeys: is getting data from PDF files (annual reports, investor presentations, industry reports etc.) to Excel something you struggle with in your role? If so, how often?
And how do you solve it? Type by hand/ship to India/use Factset instead/hand it over to the intern (...)/existing tool etc.?
Unfortunately, they won't let me bring in an intern...But, you're definitely correct.
Answers here:
https://www.wallstreetoasis.com/forums/help-me-overcome-this-nightmare-…
Thanks. I saw this one earlier but was curious if it is commonplace at banks to have this kind of software? Are the needs that frequent?
Use tabula http://tabula.technology/ if the data is in table form. Used widely in the big data space and works like a charm.
I've tried Tabula a long time ago but actually wasn't very happy with it. It was quite slow and also cumbersome to work with for my needs.
Sounds like the OP wants to sell a new product he will develop or has developed.
Give Nuance Power PDF a try. I think there is a free trial available
I've used pdftables.com, which is pretty good for standard-formatted items. Had to use it to strip returns from 10 years of investor letters.
Acrobat can also do this and is slightly better, but you have to select each table individually.
pdftables sucks ass, I just tried it :/
Writing a VBA macro is probably the best way to go.
However, if that's not an option, you could also try to first convert the PDF to Word, and then see if the raw data in Word is easier to manipulate than the PDF. It probably will be, since Word and Excel both support VBA, at least.
Another nice thing is that if you're tech-stupid, Word has a handy tool that lets you "record" a macro, which basically means you hit "record", do some actions in Word, then stop the recording. Any actions you performed will be recorded automatically as a VBA script, so if you're creative enough, you may be able to use VBA without actually understanding its mechanics. I have definitely used this trick before to bring some order to unstructured PDF text.
Try the premium Adobe Acrobat (I believe it is called DC). You can export files in a number of formats, including Excel. It has worked fairly well for me the few times I've used it.
I love PDF Pro, but unfortunately, it cannot handle some scanned PDFs. Yes, you read that right....I'm getting financials that someone exported to PDF, then printed, and then scanned in again.
Bet you are loving Corp Dev right now..!
Does the OCR not help with the scanned text? I find Pro is 85-90% accurate with OCR.
Can you get it in Notepad instead? Converting from Notepad to Excel is fairly easy. PDF is very finicky and my bank has issues with this too.
Unfortunately, I don't think this is an option, but I will try. Thanks.
I was once hired as an intern to do this... listening to an audiobook helps
There are software packages, but since you say cost is not a problem for you, I recommend outsourcing to a group in India. My team uses an outsourcing company in India for $1,800 / month to do literally every mundane thing you can imagine. We have pre-built something like 100 company and news screens with them and they just send us weekly one-off reports. They screen massive company lists for us and, yes, they will turn around any PDF financial statement document within about 24 hours. They also tend to do this stuff while you're sleeping, given the time zones.
It's a different type of solution but for my team it's low cost to outsource stuff that isn't super time sensitive.
Who do you use? I haven't had rates quoted at that level before, so very interested.
Can you not just google "PDF to Excel" and use all the free ones online?
My only worry is that you have no idea what they're doing with the documents, but especially if they're public financials it shouldn't matter.
I literally just used one of those free online PDF to XXX to get an industry rankings list into Excel
http://www.datawatch.com/in-action/use-cases/pdf-excel-data-extraction/
Assuming that the PDFs you're looking at aren't images, you do realize that in Adobe Reader, if you hold down Alt while selecting with your mouse, you can highlight any column in a straight line (regardless of any formatting) for easy copy/pasting into Excel? Doing that should cut down your time by something like 20x vs. manually typing out numbers or trying to organize an unformatted mess.
As a poster above alluded to, no software solution will be 100% accurate. You're better off with either a quasi-manual approach or farming it out to some other humans.
I had the same problem and bought Adobe Acrobat Reader DC. Just download it from their site; it requires a subscription (equates to ~$30/yr) and has a function that lets you convert PDFs to Excel, Word, PowerPoint, etc. Saves a ton of time for PDFs with huge tables.
IDEA software can convert PDF to Excel quite well. I haven't used it much myself but know people who have. The software also has a variety of functions that extend beyond PDF to Excel conversions, which may be an added bonus.
FactSet doc sourcing will do this for you
There are a ton of tools online that you can use.
If you can't use those, PDF -> Word. It'll all be there in a janky format, so you need to use a VBA macro/Python/etc. to sanitize the data and throw it into Excel. If you're familiar with Python, use xlwings - it lets you call Python code right from VBA.
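To give a feel for the "sanitize and throw it into Excel" step, here is a minimal sketch using only the Python standard library (no xlwings). It assumes the janky text has already been split into rows of a label plus raw value strings; `parse_amount`, `rows_to_csv`, and the sample rows are hypothetical names and data for illustration. Treating a blank or a dash as zero is an assumption, so change that if your statements use dashes differently.

```python
import csv
import io


def parse_amount(text):
    """Convert strings like '1,234', '(56.7)', '$2,000' or '7.5%' to a float."""
    s = text.strip().replace("$", "").replace(",", "")
    if s in ("", "-"):
        return 0.0  # assumption: a blank cell or a dash means zero
    # Accounting style: parentheses mean a negative number.
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    percent = s.endswith("%")
    if percent:
        s = s[:-1]
    value = float(s)  # raises ValueError on anything unexpected
    if percent:
        value /= 100
    return -value if negative else value


def rows_to_csv(rows):
    """rows: lists of [label, raw1, raw2, ...]; returns CSV text Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label, *raw_values in rows:
        writer.writerow([label] + [parse_amount(v) for v in raw_values])
    return buffer.getvalue()


# Hypothetical rows, as they might look after splitting the raw text.
print(rows_to_csv([["Revenue", "1,234", "(56.7)"], ["Margin", "7.5%", "-"]]))
```

Saving that text to a .csv file gives you something Excel opens directly, with real numbers instead of text that merely looks like numbers. Letting `float` raise on anything it doesn't recognize is deliberate: a loud failure is easier to catch than a silently wrong figure in a model.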
ABBYY FineReader = magic. Thank me later.
Hire an intern