20240606
RAG Reranker
Rerankers and Two-Stage Retrieval | Pinecone
Q: Why can't an LLM's long context window beat RAG?
A: See Lost in the Middle: How Language Models Use Long Contexts (2023). When information sits in the middle of a context window, the LLM's ability to recall it becomes worse than if it had not been provided in the first place.
In other words, the LLM has a recall problem of its own.
Q: What core problem does a reranker solve?
A: We want to maximize retrieval recall by retrieving plenty of documents, and then maximize LLM recall by minimizing the number of documents that actually reach the LLM. To achieve both, we reorder the retrieved documents and keep just the most relevant ones for our LLM; that reordering step is reranking.
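A minimal sketch of that trade-off, assuming hypothetical `vectorSearch` and `rerank` helpers (these names are illustrative, not a real library API):

```ts
type Doc = { id: string; text: string };

// Hypothetical stage-one and stage-two helpers (assumptions, not a real API).
declare function vectorSearch(query: string, topK: number): Promise<Doc[]>;
declare function rerank(query: string, docs: Doc[]): Promise<Doc[]>;

async function retrieveForLLM(query: string): Promise<Doc[]> {
  // Stage 1: cast a wide net to maximize retrieval recall.
  const candidates = await vectorSearch(query, 50);
  // Stage 2: reorder by relevance, then keep only a few documents
  // so the LLM's own recall is not degraded.
  const reordered = await rerank(query, candidates);
  return reordered.slice(0, 3);
}
```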
Q: How does a reranker work?
A: A reranking model, also known as a cross-encoder, takes a query and document pair and outputs a similarity score. We use this score to reorder the documents by relevance to our query. Together with a vector DB, it forms a two-stage retrieval system.
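A sketch of the reranking step itself, assuming a hypothetical `crossEncoderScore(query, doc)` that runs one full transformer forward pass per pair:

```ts
// Hypothetical scorer: one transformer inference per (query, document) pair,
// returning a relevance score. The name and signature are assumptions.
declare function crossEncoderScore(query: string, doc: string): Promise<number>;

async function rerank(query: string, docs: string[]): Promise<string[]> {
  const scored = await Promise.all(
    docs.map(async (doc) => ({ doc, score: await crossEncoderScore(query, doc) }))
  );
  // Reorder by similarity to the query, highest score first.
  return scored.sort((a, b) => b.score - a.score).map((s) => s.doc);
}
```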

Q: Bi-encoder model (embedding) vs. reranker?
A: A bi-encoder compresses the meaning of a document or query into a single vector. Note that the bi-encoder processes our query in the same way it processes documents, but at user query time; document vectors are precomputed when the index is built.

A reranker considers the query and the document together and produces a single similarity score through a full transformer inference step. (In the Pinecone article's cross-encoder diagram, our query is fed in as "document A".)

Rerankers avoid the information loss of bi-encoders, but they come with a different penalty: time.
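A sketch of the difference, with hypothetical `embed` and `crossEncoderScore` signatures: the bi-encoder compares precomputed vectors cheaply, while the cross-encoder must run a full inference per (query, document) pair at query time:

```ts
declare function embed(text: string): Promise<number[]>; // hypothetical bi-encoder

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Bi-encoder path: document vectors are precomputed at index time, so only
// the query needs encoding at query time. Fast, but each text is squashed
// into one vector before comparison.
async function biEncoderScore(query: string, docVector: number[]): Promise<number> {
  return cosine(await embed(query), docVector);
}

// Cross-encoder path: the model sees query and document together, one full
// inference per pair. Less information loss, but cost grows with the number
// of candidate documents. (Hypothetical signature.)
declare function crossEncoderScore(query: string, doc: string): Promise<number>;
```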
Three rules of JSX
Writing Markup with JSX – React
- Return a single root element. An empty tag <></> is called a Fragment.
- Close all the tags.
- camelCase most of the things!
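For example, a small component that follows all three rules (an illustrative sketch, loosely based on the linked page's Hedy Lamarr example):

```tsx
function Profile() {
  return (
    // 1. Return a single root element; <>...</> is a Fragment.
    <>
      <h1>Hedy Lamarr</h1>
      {/* 2. Close all the tags, including self-closing ones like <img />. */}
      <img src="avatar.png" alt="Hedy Lamarr" />
      {/* 3. camelCase most attributes: className instead of class. */}
      <p className="bio">Inventor and actress.</p>
    </>
  );
}
```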