A new tool for analytical workloads on 10K+ pages of PDF documents within minutes
Hi IB friends, I'm working on a tool for fast and accurate insight extraction from long, complex documents. It is based on a new paradigm: using SQL to drive document analysis. It would be great to get a quick thumbs up / down on how helpful this could be.
How it works if you are non-technical
It's like using ChatGPT to interrogate each file individually. The results are laid out in an Excel sheet. Then, you can use ChatGPT to summarize that Excel further.
How it works if you are technical
- There is a user interface that lets you do all of the steps below 👇
- Define the insights to extract from documents in a data schema
- Then a specialized SQL query applies the data schema to many documents. It looks like:
SELECT agent(<schema_id>, document_column) FROM document_table;
where document_column is a column that holds the actual PDF files.
- Then, LLMs will work behind the SQL engine to extract insights from the documents and collect them into SQL tabular results, based on the data schema.
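To make the flow above concrete, here is a minimal sketch in Python using SQLite's user-defined functions. It is an illustration under assumptions, not the tool's actual implementation: the `SCHEMAS` registry, the `run_extraction` helper, and the body of `agent` are hypothetical, the documents are plain text rather than PDF files, and a simple "field: value" lookup stands in for the LLM call.

```python
import json
import sqlite3

# Hypothetical schema registry: schema_id -> fields to extract.
SCHEMAS = {1: ["company", "revenue"]}


def agent(schema_id, document):
    """Stand-in for the LLM call; returns one JSON object per document.

    A real system would send the document and the schema to a model.
    Here each field is simply looked up as a 'field: value' line.
    """
    fields = SCHEMAS[schema_id]
    found = {}
    for line in document.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in fields:
            found[key.strip().lower()] = value.strip()
    # Fields that were not found come back as null.
    return json.dumps({f: found.get(f) for f in fields})


def run_extraction(docs, schema_id=1):
    """Load (name, text) pairs into a table and apply agent() to each row."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL under the name "agent".
    conn.create_function("agent", 2, agent)
    conn.execute("CREATE TABLE document_table (name TEXT, document_column TEXT)")
    conn.executemany("INSERT INTO document_table VALUES (?, ?)", docs)
    rows = conn.execute(
        "SELECT name, agent(?, document_column) FROM document_table ORDER BY name",
        (schema_id,),
    ).fetchall()
    conn.close()
    return [(name, json.loads(result)) for name, result in rows]
```

Calling `run_extraction([("a.pdf", "Company: Acme\nRevenue: 10M")])` gives one row per document, with one value per schema field, which is the tabular shape described above.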
Difference between us and vector-search-based solutions
We do not use vector search in this process, because we found vector-search-based solutions to be hit or miss: retrieval is similarity-based, and information is lost during the data transformation steps.
Instead, we prefer to work with the raw documents in .pdf format (of course, you can control which documents get processed via a SQL filter).
Advantages over chatbots
Compared to chatbot-based solutions, this paradigm is:
- More interpretable: the insights are derived through SQL steps, so each intermediate SQL result can be inspected and explained. That makes the end output easier to trust
- More flexible: you can control which documents to pass in via SQL, and you can get further consolidated insights via aggregation + LLM
- More scalable: basically, you can get insights from 10K+ pages within minutes
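As a rough illustration of the filter and aggregation points above, the sketch below restricts the documents with a WHERE clause, keeps the per-document extraction as an inspectable intermediate result, and then aggregates it with plain SQL. Everything here is assumed for the example: the `extract_sentiment` function is a keyword-based stand-in for an LLM step, and the `sector` and `year` columns are hypothetical. It covers only the SQL aggregation, not a follow-up LLM summary.

```python
import sqlite3


def extract_sentiment(text):
    """Hypothetical stand-in for an LLM extraction step."""
    lowered = text.lower()
    if "growth" in lowered:
        return "positive"
    if "decline" in lowered:
        return "negative"
    return "neutral"


def sentiment_by_sector(docs, year):
    """Filter documents by year, extract per document, then aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("extract_sentiment", 1, extract_sentiment)
    conn.execute(
        "CREATE TABLE document_table (sector TEXT, year INTEGER, document_column TEXT)"
    )
    conn.executemany("INSERT INTO document_table VALUES (?, ?, ?)", docs)
    rows = conn.execute(
        """
        SELECT sector, sentiment, COUNT(*) AS n
        FROM (
            -- Intermediate result: one extracted value per filtered document.
            SELECT sector, extract_sentiment(document_column) AS sentiment
            FROM document_table
            WHERE year = ?
        )
        GROUP BY sector, sentiment
        ORDER BY sector, sentiment
        """,
        (year,),
    ).fetchall()
    conn.close()
    return rows
```

The inner SELECT can be run on its own to see exactly which value was extracted from which document before anything is aggregated.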
I would love any high-level input, but if you think this might be useful to your daily job and want to give it a spin, let me know, and we'll set it up with no charge for experimentation workloads.
Can it extract any insight from this long advertisement I didn’t read?
You bet. It can extract insights from this post :)