A new document analytics tool designed for PE teams
Hi PE friends, I'm building a tool that enables data teams to extract insights from long, complex documents. This tool is based on a new paradigm: using SQL to drive document analysis. It would be great to get a quick thumbs up / down on how helpful this can be.
How it works if you are non-technical
It's like using ChatGPT to interrogate each file individually. The results are laid out in an Excel sheet. Then, you can use ChatGPT to summarize that Excel further.
How it works if you are technical
1. There is a user interface that lets you do everything below 👇
2. Define the insights to extract from documents in a data schema.
3. Use the specialized SQL query to apply the data schema to many documents. It looks like:
SELECT agent(<schema_id>, document_column) FROM document_table;
where document_column is a column that holds the actual PDF files.
4. LLMs work behind the SQL engine to extract insights from the documents and collect them into tabular SQL results, based on the data schema.
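To make the steps above concrete, here is a minimal sketch of the query pattern using Python's built-in sqlite3 module. It is not our actual engine: the `SCHEMAS` registry, the field names, and the sample documents are made up for illustration, the documents are plain text rather than PDFs, and the `agent` function uses regexes as a stand-in where the real system would call an LLM.

```python
import json
import re
import sqlite3

# Hypothetical schema registry: schema_id -> {field name: regex}.
# The regexes are a stand-in for the LLM extraction step.
SCHEMAS = {
    1: {
        "revenue_musd": r"revenue of \$(\d+)M",
        "ebitda_musd": r"EBITDA of \$(\d+)M",
    },
}

def agent(schema_id, document_text):
    """Stub extractor: returns one JSON object per document, shaped by the schema."""
    result = {}
    for name, pattern in SCHEMAS[schema_id].items():
        match = re.search(pattern, document_text)
        result[name] = int(match.group(1)) if match else None
    return json.dumps(result)

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call agent(schema_id, document).
conn.create_function("agent", 2, agent)
conn.execute("CREATE TABLE document_table (name TEXT, document_column TEXT)")
conn.executemany(
    "INSERT INTO document_table VALUES (?, ?)",
    [
        ("acme_cim.pdf", "Acme reported revenue of $120M and EBITDA of $30M."),
        ("beta_cim.pdf", "Beta reported revenue of $45M."),
    ],
)

rows = conn.execute(
    "SELECT name, agent(1, document_column) FROM document_table ORDER BY name"
).fetchall()
extracted = {name: json.loads(payload) for name, payload in rows}
```

Each row of the result holds one document's extracted fields, and a field the document does not mention comes back empty rather than guessed.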
How we differ from vector search-based solutions
We did not use vector search in this process because we found vector search-based solutions to be hit or miss: retrieval is similarity-based, and information is lost during the data transformation steps.
Instead, we prefer to work with the raw documents in .pdf format. (Of course, you can control which documents are processed with a SQL filter.)
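As a rough illustration of the SQL filter, the sketch below uses Python's sqlite3 module with a stub `agent` function that only records which documents it was asked to read. The `doc_type` and `fiscal_year` columns and the sample rows are assumptions made for the example, not part of the actual product.

```python
import sqlite3

calls = []  # names of documents the stub extractor was actually asked to read

def agent(schema_id, document_name, document_text):
    """Stub standing in for the LLM step; it only records what it was given."""
    calls.append(document_name)
    return f"schema {schema_id}: read {len(document_text)} characters"

conn = sqlite3.connect(":memory:")
conn.create_function("agent", 3, agent)
conn.execute(
    "CREATE TABLE document_table "
    "(name TEXT, doc_type TEXT, fiscal_year INTEGER, document_column TEXT)"
)
conn.executemany(
    "INSERT INTO document_table VALUES (?, ?, ?, ?)",
    [
        ("acme_cim.pdf", "CIM", 2023, "Acme confidential information memorandum"),
        ("acme_nda.pdf", "NDA", 2023, "Acme non-disclosure agreement"),
        ("beta_cim.pdf", "CIM", 2019, "Beta confidential information memorandum"),
    ],
)

# Only rows that pass the WHERE clause are handed to the extractor.
rows = conn.execute(
    "SELECT name, agent(1, name, document_column) FROM document_table "
    "WHERE doc_type = 'CIM' AND fiscal_year >= 2023"
).fetchall()
selected = [name for name, _ in rows]
```

In this sketch the filter runs before extraction, so documents that fail it are never passed to the extractor.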
Advantages over chatbots
Compared to chatbot-based solutions, this paradigm is:
- More interpretable: the insights are derived through SQL steps, so each intermediate SQL result can be inspected and explained. Because of this, you can trust the end output you get.
- More flexible: you can control which documents to pass in via SQL, and you can get further consolidated insights via aggregation + LLM.
- More scalable: you can get insights from 10K+ pages within minutes.
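To illustrate the aggregation idea, here is a small sketch in Python's sqlite3 module. It assumes the per-document extraction results have already landed in a table; the `extracted` table, its columns, and the numbers are invented for the example, and the final LLM summarization step is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical intermediate result: one row of extracted fields per document.
conn.execute("CREATE TABLE extracted (name TEXT, sector TEXT, revenue_musd REAL)")
conn.executemany(
    "INSERT INTO extracted VALUES (?, ?, ?)",
    [
        ("acme_cim.pdf", "software", 120.0),
        ("beta_cim.pdf", "software", 80.0),
        ("gamma_cim.pdf", "healthcare", 45.0),
    ],
)

# Consolidate across documents with ordinary SQL aggregation.
summary = conn.execute(
    "SELECT sector, COUNT(*), AVG(revenue_musd) "
    "FROM extracted GROUP BY sector ORDER BY sector"
).fetchall()
```

Because the intermediate table is plain SQL, you can inspect the per-document rows behind any aggregated number before handing the summary to an LLM.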
I would love any high-level input, but if you think this might be useful to your daily job and want to give it a spin, let me know, and we'll set it up with no charge for experimentation workloads.