We built an open-source library that traverses an existing vector database in a graph-like manner to answer complex questions. On the hotpot_qa dataset, we saw a 1.7x increase in “perfect” retrievals over plain vector search, as well as a 5x decrease in catastrophic failures.
The architecture is built on DataStax’s Astra DB for easy vector management and Pongo’s semantic filter for pinpoint retrieval performance.
RAG is a powerful tool, but if you ask a complex question like “What is the CEO of Pongo's favorite color?”, many systems will fall flat because the query is incomplete: it never names the person whose favorite color is being asked about. If the target document says “Caleb's favorite color is orange”, naive RAG will fail to retrieve it, even though Caleb is the CEO of Pongo. Things get even worse if you have multiple documents stating people’s favorite colors, which may lead you to return a confidently incorrect answer about someone else’s favorite color.
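To make this failure mode concrete, here is a minimal Python sketch. It assumes a toy bag-of-words cosine similarity as a stand-in for a real embedding model, and the three documents are hypothetical; it is not the library's code. It ranks the documents against the original question and then against a hand-completed version of the question that names Caleb.

```python
import math
import re

# Hypothetical toy corpus, for illustration only.
DOCS = [
    "Caleb is the CEO of Pongo.",
    "Caleb's favorite color is orange.",
    "Jamie's favorite color is blue.",
]

# Tiny stopword list so the toy scores reflect content words only.
STOPWORDS = {"what", "is", "the", "of"}


def tokenize(text: str) -> set[str]:
    """Lowercase, drop possessive 's, and keep alphabetic words minus stopwords."""
    words = re.findall(r"[a-z]+", text.lower().replace("'s", ""))
    return {w for w in words if w not in STOPWORDS}


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity between two binary bag-of-words vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def retrieve(query: str, docs: list[str]) -> list[tuple[float, str]]:
    """Rank documents by similarity to the query, best first (ties keep corpus order)."""
    q = tokenize(query)
    scored = [(round(cosine(q, tokenize(d)), 3), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Original, incomplete question: it never mentions Caleb.
    for score, doc in retrieve("What is the CEO of Pongo's favorite color?", DOCS):
        print(score, doc)
    print()
    # The same question once the missing entity is written into the query by hand.
    for score, doc in retrieve("What is Caleb's favorite color?", DOCS):
        print(score, doc)
```

With the original question, the top hit is the CEO sentence (0.577), which says nothing about color, while Caleb's and Jamie's favorite-color sentences tie at 0.5, so nothing in the query separates the answer from the distractor. Once the query names Caleb, the correct sentence ranks first (0.866). Real embeddings produce different numbers; the sketch only illustrates why an incomplete query leaves retrieval guessing.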
There are several solutions to this problem, such as setting up and maintaining a graph database, along the lines of Microsoft’s recent GraphRAG. We propose something simpler: reuse your existing vector database and traverse it with the following recursive approach: