Semantic Search Experiment
This tool uses OpenAI's text-embedding-ada-002 model to transform a search term into a 1,536-element vector. That vector is then compared with the vectors of document chunks stored in a PostgreSQL table using the pgvector extension. The documents are pre-parsed into chunks of roughly 1,000 words, each chunk is embedded through the OpenAI API, and the results are stored in the table. When a search is made, the back-end makes a remote procedure call to a stored function in the PostgreSQL database. The function returns the 3 closest matches, ordered by cosine similarity. Those 3 chunks are concatenated, and a second API call is made to ChatGPT to digest the information and generate an answer to the question based on the provided data.
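The steps above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function names (chunk_words, cosine_similarity, top_matches, build_prompt) and the prompt wording are hypothetical, the ranking that the stored function performs in PostgreSQL is done here in plain Python, and the embeddings are assumed to have been computed elsewhere.

```python
import math


def chunk_words(text, size=1000):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def top_matches(query_vec, rows, k=3):
    """Return the text of the k rows most similar to query_vec, best first.

    `rows` is a list of (chunk_text, embedding) pairs, standing in for the
    database table described above.
    """
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(query_vec, row[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Concatenate the matched chunks into one prompt for the chat model.

    The wording of the prompt is an assumption made for this sketch.
    """
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

In the real pipeline the query vector and the chunk vectors would each have 1,536 elements and come from the embedding model; the toy functions above work with vectors of any length, which makes the ranking step easy to try out by hand.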
Suggested Queries:
- What is the Tech Model Railroad Club?
- What is the author's sentiment about IBM?
- Who was Jack Dennis?
PDF Document: True Hackers
Upon successful search match, the PDF viewer automatically takes you to the first relevant page.
Conclusion
Thanks to the semantic similarity search, I am able to feed the most relevant passages from the body of text to ChatGPT. ChatGPT can not only interpret the meaning of the question, but also summarize an answer based on very limited information.
This technology is going to utterly change the world as we know it. Our individual access to information and knowledge will be limitless.