Help me overcome this nightmare: PDF to Excel
I have been out of IB for almost a year, but cannot shake this recurring nightmare: people asking me to build models from financials in PDF. It was cute at first, but it has quickly grown burdensome because my industry has complex financials (e.g. dozens of line items just for revenue, and 30+ year projections). Please help me keep my sanity.
What is the best software for converting PDFs to Excel? Cost is of no concern to me, but it has to be a software package that I can install on my computer.
You can write VBA code to handle the delimiters, so that you might be able to copy/paste the PDF text and then run a macro on it.
Essentially any packaged product you find likely isn't going to solve your problem 100%.
Thanks for the suggestion. Any idea where I can start? I can of course do some Googling myself, but without too much knowledge of VBA, I'm not sure where I'd start.
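For anyone wondering what the delimiter idea above could look like in practice, here is a minimal sketch. It is written in Python rather than VBA, and it rests on an assumption that will not hold for every PDF: each pasted line is a text label followed by its numbers, separated by spaces. The function names (`split_line`, `to_tsv`) are made up for illustration.

```python
import re

# Assumed number formats: 1,234   (567)   89.5   -3   $1,000
NUMERIC = re.compile(r"^\(?-?\$?\d[\d,]*(\.\d+)?\)?$")


def to_number(token):
    """Convert a pasted token to a float; accounting-style (567) becomes -567."""
    negative = token.startswith("(") and token.endswith(")")
    cleaned = token.strip("()").replace("$", "").replace(",", "")
    value = float(cleaned)
    return -value if negative else value


def split_line(line):
    """Peel numeric tokens off the right-hand end; whatever is left is the label."""
    tokens = line.split()
    values = []
    while tokens and NUMERIC.match(tokens[-1]):
        values.append(to_number(tokens.pop()))
    values.reverse()
    return " ".join(tokens), values


def to_tsv(text):
    """Turn pasted PDF text into tab-separated rows that paste cleanly into Excel."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        label, values = split_line(line)
        rows.append("\t".join([label] + [repr(v) for v in values]))
    return "\n".join(rows)


if __name__ == "__main__":
    pasted = "Revenue 1,000 2,000\nCOGS (400) (800)"
    print(to_tsv(pasted))
```

One known weakness: a label that ends in a number (say, a year) will have that number treated as a value, so header rows still need a manual check.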
I'll do it for you for some IB referrals :)
You can use Able2Extract or PDF2XL.
Or I can do it for you and we can arrange payment in workout supplements.
I was going to say I think MrExcel.com recommended the first one. If you really did want to automate a delimiter macro, you could check out some freelance programming sites (www.fiverr.com). Obviously don't send anything real, but send something close enough that what you get back will actually be useful. Otherwise there is plenty of legacy code out there to modify, but this would require some knowledge of VBA. If I find anything more substantial I will PM you.
Have you tried Able2Extract or PDF2XL? I keep hearing good things about them.
What workout supplements are you looking for? I know a creatine guy. Just saying.
Once you get hooked on high end designer synthetic cocktails made for farm animals it is difficult to go back to stuff like creatine.
I prefer Able2Extract of the two but they both have free trials so you can decide for yourself.
I frequently use PDF2XL by CogniView for my current internship, works great! You simply indicate the rows/columns by moving the margins around, and with the touch of a button it pops out in an Excel worksheet.
cogniview.com/pdf-to-excel/pdf2xl-basic
+1
I've looked into this too. The best solution I could find: an intern.
Struggling with getting PDF data to Excel? (Originally Posted: 09/30/2017)
Question for all (junior?) monkeys: is getting data from PDF files (annual reports, investor presentations, industry reports etc.) to Excel something you struggle with in your role? If so, how often?
And how do you solve it? Type by hand/ship to India/use Factset instead/hand it over to the intern (...)/existing tool etc.?
Unfortunately, they won't let me bring in an intern...But, you're definitely correct.
Answers here:
https://www.wallstreetoasis.com/forums/help-me-overcome-this-nightmare-…
Thanks. I saw this one earlier but was curious if it is commonplace at banks to have this kind of software? Are the needs that frequent?
Use tabula http://tabula.technology/ if the data is in table form. Used widely in the big data space and works like a charm.
I've tried Tabula a long time ago but actually wasn't very happy with it. It was quite slow and also cumbersome to work with for my needs.
Sounds like the OP wants to sell a new product he will develop or has developed.
Give Nuance Power PDF a try. I think there is a free trial available
I've used pdftables.com, which is pretty good for standard-formatted items. Had to use it to strip returns from 10 years of investor letters.
Acrobat can also do this and is slightly better, but you have to select each table individually.
pdftables sucks ass, I just tried it :/
Doing a vba macro is probably the best way to go.
However, if that's not an option, you could also try to first convert the PDF to Word, and then see if the Word raw data is easier to manipulate than the PDF. It probably will be, since Word and Excel both run on VBA at least.
Another nice thing is that if you're tech-stupid, Word has a handy tool that lets you "record" a macro, which basically means you hit "record", do some actions in Word, then hit "record" again. Any actions you performed will be recorded automatically as a VBA script, so if you're creative enough, you may be able to utilize VBA without actually understanding its mechanics. I have definitely used this trick before to bring some order to unstructured PDF text.
Try the premium Adobe Acrobat (I believe it is called DC). You can export files in a number of formats, including Excel. It has worked fairly well for me the few times I've used it.
I love PDF Pro, but unfortunately, it cannot handle some scanned PDFs. Yes, you read that right....I'm getting financials that someone exported to PDF, then printed, and then scanned in again.
Bet you are loving Corp Dev right now..!
The OCR does not help with the scanned text? I find pro is 85-90% with OCR.
Can you get it in notepad instead? Convert from notepad to Excel is fairly easy. PDF is very finicky and my bank has issues with this too.
Unfortunately, I don't think this is an option, but I will try. Thanks.
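If the numbers can be had as plain text, the Notepad route mentioned above is roughly the following. This is a sketch only, in Python, and it assumes the columns in the text file are separated by two or more spaces while single spaces occur only inside labels. The function name `text_to_csv` is hypothetical.

```python
import csv
import io
import re


def text_to_csv(text):
    """Split each line on runs of 2+ spaces and return the result as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumption: 2+ spaces mark a column break, 1 space is part of a label
        writer.writerow(re.split(r"\s{2,}", line.strip()))
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "Revenue   1,000   2,000\nGross profit   600   1,200"
    print(text_to_csv(sample))
```

The csv module quotes any field containing a comma, so "1,000" stays in a single cell when the file is opened in Excel.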
I was once hired as an intern to do this... listening to an audiobook helps
There are software packages, but since you say cost is not a problem for you, I recommend an outsource group in India. My team uses an outsource company in India for $1,800 / month to do literally every mundane thing you can imagine. We have pre-built like 100 company and news screens with them and they just send us weekly one-off reports, they screen massive company lists for us and yes, they will turn around any PDF financial statement document within about 24 hours. They also tend to do this stuff while you're sleeping given time zones.
It's a different type of solution but for my team it's low cost to outsource stuff that isn't super time sensitive.
Who do you use? I haven't had rates quoted at that level before, so very interested.
Can you not just google "PDF to Excel" and use all the free ones online?
My only worry is that you have no idea what they're doing with the documents, but especially if they're public financials it shouldn't matter.
I literally just used one of those free online PDF to XXX to get an industry rankings list into Excel
http://www.datawatch.com/in-action/use-cases/pdf-excel-data-extraction/
Assuming that the PDFs you're looking at aren't images, you do realize that within Adobe Reader, if you hold down Alt, it will allow you to highlight any column in a straight line by selecting with your mouse (regardless of any formatting), for easy copy/pasting into Excel... Doing that should cut down your time by like 20x vs. manually typing out numbers or trying to organize an unformatted mess.
As a poster above alluded to no software solution will be 100% accurate. You're better off with either a quasi-manual approach or farming it out to some other humans
I had the same problem and bought Adobe Acrobat Reader DC. Just download it from their site; it requires a subscription (equates to ~$30/yr) and has a function that lets you convert PDFs to Excel, Word, PowerPoint, etc. Saves a ton of time for PDFs with huge tables.
IDEA software can convert PDF to Excel quite well. I haven't used it too much but know people who have. The software also has a variety of functions that extend beyond PDF-to-Excel conversions, which may be an added bonus.
FactSet doc sourcing will do this for you
There are a ton of tools online that you can use.
If you can't use those, PDF -> Word. It'll all be there in a janky format, so you need to use a VBA macro/Python/etc. to sanitize the data and throw it into Excel. If you're familiar with Python, use xlwings; it lets you plug Python code right into VBA.
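To give a flavor of the "sanitize" step, here is a minimal Python sketch using only the standard library. It assumes the jankiness is of the usual kind: non-breaking spaces, ligatures, and typographic minus signs that Excel will not read as negatives. The function name `sanitize` is illustrative, and the xlwings hookup is left out.

```python
import re
import unicodedata


def sanitize(line):
    """Normalize one line of text exported from a PDF or Word file."""
    # NFKC folds non-breaking spaces and ligatures (e.g. the single "fi" glyph)
    # into their plain equivalents
    line = unicodedata.normalize("NFKC", line)
    # A Unicode minus sign or dash directly before a digit becomes an ASCII "-"
    line = re.sub(r"[\u2212\u2013\u2014](?=\d)", "-", line)
    # Collapse runs of whitespace into single spaces
    return re.sub(r"\s+", " ", line).strip()


if __name__ == "__main__":
    print(sanitize("Net\u00a0pro\ufb01t   \u22121,234"))
```

Cleaning each line this way before splitting it into columns avoids the case where a value looks like a number on screen but lands in Excel as text.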
ABBYY FineReader = magic. Thank me later.
Hire an intern