20240606
RAG Reranker
Rerankers and Two-Stage Retrieval | Pinecone
Q: Why can't an LLM's long context window beat RAG?
A: See Lost in the Middle: How Language Models Use Long Contexts (2023). When information sits in the middle of a context window, the LLM's ability to recall it becomes worse than if it had not been provided in the first place.
In other words, the LLM has a recall problem of its own.
Q: What core problem does a reranker solve?
A: We want to maximize retrieval recall by retrieving plenty of documents, and then maximize LLM recall by minimizing the number of documents that actually reach the LLM. To achieve both, we reorder the retrieved documents and keep just the most relevant ones for our LLM; that reordering step is reranking.
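A minimal sketch of that trade-off, assuming hypothetical `vectorSearch` and `rerank` helpers (these names are illustrative, not a real library API):

```ts
type Doc = { id: string; text: string };

// Hypothetical stage-one and stage-two helpers (assumptions, not a real API).
declare function vectorSearch(query: string, topK: number): Promise<Doc[]>;
declare function rerank(query: string, docs: Doc[]): Promise<Doc[]>;

async function retrieveForLLM(query: string): Promise<Doc[]> {
  // Stage 1: cast a wide net to maximize retrieval recall.
  const candidates = await vectorSearch(query, 50);
  // Stage 2: reorder by relevance, then keep only a few documents
  // so the LLM's own recall is not degraded.
  const reordered = await rerank(query, candidates);
  return reordered.slice(0, 3);
}
```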
Q: How does a reranker work?
A: A reranking model, also known as a cross-encoder, takes a query and document pair and outputs a similarity score. We use this score to reorder the documents by relevance to our query. Together with a vector DB, it forms a two-stage retrieval system.
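A sketch of the reranking step itself, assuming a hypothetical `crossEncoderScore(query, doc)` that runs one full transformer forward pass per pair:

```ts
// Hypothetical scorer: one transformer inference per (query, document) pair,
// returning a relevance score. The name and signature are assumptions.
declare function crossEncoderScore(query: string, doc: string): Promise<number>;

async function rerank(query: string, docs: string[]): Promise<string[]> {
  const scored = await Promise.all(
    docs.map(async (doc) => ({ doc, score: await crossEncoderScore(query, doc) }))
  );
  // Reorder by similarity to the query, highest score first.
  return scored.sort((a, b) => b.score - a.score).map((s) => s.doc);
}
```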

Q: Bi-encoder model (embedding) vs. reranker?
A: A bi-encoder compresses the meaning of a document or query into a single vector. Note that the bi-encoder processes our query in the same way it processes documents, but at user query time; document vectors are precomputed when the index is built.

A reranker considers the query and the document together and produces a single similarity score through a full transformer inference step. (In the Pinecone article's cross-encoder diagram, our query is fed in as "document A".)

Rerankers avoid the information loss of bi-encoders, but they come with a different penalty: time.
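A sketch of the difference, with hypothetical `embed` and `crossEncoderScore` signatures: the bi-encoder compares precomputed vectors cheaply, while the cross-encoder must run a full inference per (query, document) pair at query time:

```ts
declare function embed(text: string): Promise<number[]>; // hypothetical bi-encoder

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Bi-encoder path: document vectors are precomputed at index time, so only
// the query needs encoding at query time. Fast, but each text is squashed
// into one vector before comparison.
async function biEncoderScore(query: string, docVector: number[]): Promise<number> {
  return cosine(await embed(query), docVector);
}

// Cross-encoder path: the model sees query and document together, one full
// inference per pair. Less information loss, but cost grows with the number
// of candidate documents. (Hypothetical signature.)
declare function crossEncoderScore(query: string, doc: string): Promise<number>;
```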
Three rules of JSX
Writing Markup with JSX – React
- Return a single root element. An empty tag <></> is called a Fragment.
- Close all the tags.
- camelCase most of the things!
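For example, a small component that follows all three rules (an illustrative sketch, loosely based on the linked page's Hedy Lamarr example):

```tsx
function Profile() {
  return (
    // 1. Return a single root element; <>...</> is a Fragment.
    <>
      <h1>Hedy Lamarr</h1>
      {/* 2. Close all the tags, including self-closing ones like <img />. */}
      <img src="avatar.png" alt="Hedy Lamarr" />
      {/* 3. camelCase most attributes: className instead of class. */}
      <p className="bio">Inventor and actress.</p>
    </>
  );
}
```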