BM25
BM25, also known as Okapi BM25, is a ranking function used in information retrieval systems to estimate the relevance of documents to a given search query.
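For reference, Okapi BM25 is usually written as follows. Here f(q_i, D) is how often query term q_i occurs in document D, |D| is the length of D in tokens, avgdl is the average document length in the collection, N is the number of documents, n(q_i) is the number of documents containing q_i, and k_1 and b are free parameters (k_1 between 1.2 and 2.0 and b = 0.75 are common choices). Several IDF variants exist. The one below adds 1 inside the logarithm to keep it non-negative, and the variant used by the library may differ.

$$
\operatorname{score}(D, Q) = \sum_{q_i \in Q} \operatorname{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
$$

$$
\operatorname{IDF}(q_i) = \ln\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
$$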
You can use it as part of your retrieval pipeline to rerank documents as a postprocessing step after retrieving an initial set of documents from another source.
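To make the reranking step concrete, here is a minimal from-scratch TypeScript sketch of BM25 reranking. It is not the code BM25Retriever uses. The tokenizer, the non-negative IDF variant, and the defaults k1 = 1.2 and b = 0.75 are assumptions, and `bm25Scores` and `bm25Rerank` are hypothetical helper names. The sketch scores each candidate against the query and keeps the k highest-scoring documents.

```typescript
// A from-scratch Okapi BM25 reranker, for illustration only. This is NOT the
// @langchain/community implementation; the tokenizer, the IDF variant, and the
// k1/b defaults are assumptions.

interface Doc {
  pageContent: string;
}

// Assumed tokenizer: lowercase, split on anything that isn't a letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// BM25 score of every document for the query (higher is more relevant).
function bm25Scores(query: string, docs: Doc[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map((d) => tokenize(d.pageContent));
  const n = tokenized.length;
  const totalLength = tokenized.reduce((sum, tokens) => sum + tokens.length, 0);
  const avgdl = totalLength / Math.max(n, 1);
  const queryTerms = Array.from(new Set(tokenize(query)));

  // Document frequency: number of documents containing each query term.
  const df = new Map<string, number>();
  for (const term of queryTerms) {
    df.set(term, tokenized.filter((tokens) => tokens.includes(term)).length);
  }

  return tokenized.map((tokens) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = tokens.filter((t) => t === term).length;
      if (tf === 0) continue; // absent terms contribute nothing
      const nq = df.get(term) ?? 0;
      // "+ 1" inside the log keeps IDF non-negative for very common terms.
      const idf = Math.log((n - nq + 0.5) / (nq + 0.5) + 1);
      // Length normalization: longer documents are penalized when b > 0.
      const norm = tf + k1 * (1 - b + (b * tokens.length) / avgdl);
      score += (idf * tf * (k1 + 1)) / norm;
    }
    return score;
  });
}

// Rerank an initial candidate set and keep the k best-scoring documents.
// Array.prototype.sort is stable, so ties keep their input order.
function bm25Rerank<T extends Doc>(query: string, docs: T[], k: number): T[] {
  const scores = bm25Scores(query, docs);
  return docs
    .map((doc, i) => ({ doc, score: scores[i] }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.doc);
}

// Usage: candidates as they might come back from a first-stage retriever.
const candidates = [
  { pageContent: "Buildings are made out of brick" },
  { pageContent: "Buildings are made out of wood" },
  { pageContent: "Cars are made out of metal" },
];
const top = bm25Rerank("wood buildings", candidates, 2).map((d) => d.pageContent);
// top is ["Buildings are made out of wood", "Buildings are made out of brick"]
console.log(top);
```

"wood" occurs in only one candidate, so its IDF outweighs that of "buildings," which occurs in two. This lifts the wood document above the brick document, and the car document, which matches neither term, drops out of the top 2.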
Setup
The BM25Retriever is exported from @langchain/community. You’ll need to install it like this:
npm i @langchain/community @langchain/core
yarn add @langchain/community @langchain/core
pnpm add @langchain/community @langchain/core
This retriever uses code from this implementation of Okapi BM25.
Usage
You can now create a new retriever with previously retrieved documents:
import { BM25Retriever } from "@langchain/community/retrievers/bm25";
const retriever = BM25Retriever.fromDocuments(
[
{ pageContent: "Buildings are made out of brick", metadata: {} },
{ pageContent: "Buildings are made out of wood", metadata: {} },
{ pageContent: "Buildings are made out of stone", metadata: {} },
{ pageContent: "Cars are made out of metal", metadata: {} },
{ pageContent: "Cars are made out of plastic", metadata: {} },
{ pageContent: "mitochondria is the powerhouse of the cell", metadata: {} },
{ pageContent: "mitochondria is made of lipids", metadata: {} },
],
{ k: 4 }
);
// Returns the 4 highest-scoring documents, ranked by BM25 against the query
const results = await retriever.invoke("mitochondria");
console.log(results);
[
{ pageContent: 'mitochondria is made of lipids', metadata: {} },
{
pageContent: 'mitochondria is the powerhouse of the cell',
metadata: {}
},
{ pageContent: 'Buildings are made out of brick', metadata: {} },
{ pageContent: 'Buildings are made out of wood', metadata: {} }
]
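In the output above, only the first two documents contain "mitochondria." The two "Buildings" documents share no term with the query, so their BM25 score is zero, and they appear to fill the remaining slots up to k = 4. If you'd rather not return such documents, one option is to post-filter the results. The sketch below assumes a simple lowercase tokenizer, which may not match the library's own, and `dropNonMatching` is a hypothetical helper, not part of @langchain/community. The `results` array is a hand-written stand-in for what `retriever.invoke` returned above.

```typescript
// Hypothetical helper, not part of @langchain/community: keep only documents
// that share at least one token with the query. The tokenizer is an assumption.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

function dropNonMatching<T extends Doc>(query: string, docs: T[]): T[] {
  const queryTerms = new Set(tokenize(query));
  // A document with no query term in common would score zero under BM25.
  return docs.filter((doc) => tokenize(doc.pageContent).some((t) => queryTerms.has(t)));
}

// Stand-in for the four documents returned by retriever.invoke("mitochondria").
const results: Doc[] = [
  { pageContent: "mitochondria is made of lipids", metadata: {} },
  { pageContent: "mitochondria is the powerhouse of the cell", metadata: {} },
  { pageContent: "Buildings are made out of brick", metadata: {} },
  { pageContent: "Buildings are made out of wood", metadata: {} },
];

// filtered keeps only the two "mitochondria" documents, in their original order.
const filtered = dropNonMatching("mitochondria", results);
console.log(filtered.map((d) => d.pageContent));
```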
Related
- Retriever conceptual guide
- Retriever how-to guides