Advanced Retrieval-Augmented Generation: From Theory to LlamaIndex Implementation | by Leonie Monigatti | Feb, 2024


For more ideas on how to improve the performance of your RAG pipeline to make it production-ready, continue reading here:

This section discusses the required packages and API keys to follow along in this article.

Required Packages

This article will guide you through implementing a naive and an advanced RAG pipeline using LlamaIndex in Python.

pip install llama-index

In this article, we will be using LlamaIndex v0.10. If you are upgrading from an older LlamaIndex version, you need to run the following commands to install and run LlamaIndex properly:

pip uninstall llama-index
pip install llama-index --upgrade --no-cache-dir --force-reinstall

LlamaIndex offers an option to store vector embeddings locally in JSON files for persistent storage, which is great for quickly prototyping an idea. However, we will use a vector database for persistent storage since advanced RAG techniques aim for production-ready applications.
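If you only want to prototype with the local JSON storage option, a minimal sketch could look like the following (the ./data and ./storage directory names are assumptions for this example, not part of the tutorial's setup):

from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

# Build an index from a local directory of documents and persist it as JSON files
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./storage")

# Later, reload the index from disk instead of re-embedding the documents
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)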

Since we will need metadata storage and hybrid search capabilities in addition to storing the vector embeddings, we will use the open source vector database Weaviate (v3.26.2), which supports these features.

pip install weaviate-client llama-index-vector-stores-weaviate

API Keys

We will be using Weaviate Embedded, which you can use for free without registering for an API key. However, this tutorial uses an embedding model and LLM from OpenAI, for which you will need an OpenAI API key. To obtain one, you need an OpenAI account and then “Create new secret key” under API keys.

Next, create a local .env file in your root directory and define your API keys in it:

OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"

Afterwards, you can load your API keys with the following code:

# !pip install python-dotenv
import os
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())
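As a quick, optional sanity check (not part of the original snippet), you can confirm that the key was picked up from the .env file:

# Optional: verify that the key is now available in the environment
assert os.getenv("OPENAI_API_KEY") is not None, "OPENAI_API_KEY not found"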

This section discusses how to implement a naive RAG pipeline using LlamaIndex. You can find the entire naive RAG pipeline in this Jupyter Notebook. For the implementation using LangChain, you can continue in this article (naive RAG pipeline using LangChain).

Step 1: Define the embedding model and LLM

First, you can define an embedding model and LLM in a global settings object. Doing this means you don’t have to specify the models explicitly in the code again.

  • Embedding model: used to generate vector embeddings for the document chunks and the query.
  • LLM: used to generate an answer based on the user query and the relevant context.
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.settings import Settings

Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
Settings.embed_model = OpenAIEmbedding()

Step 2: Load data

Next, you will create a local directory named data in your root directory and download some example data from the LlamaIndex GitHub repository (MIT license).

!mkdir -p 'data'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham_essay.txt'

Afterward, you can load the data for further processing:

from llama_index.core import SimpleDirectoryReader

# Load data
documents = SimpleDirectoryReader(
    input_files=["./data/paul_graham_essay.txt"]
).load_data()

Step 3: Chunk documents into Nodes

As the entire document is too large to fit into the context window of the LLM, you will need to partition it into smaller text chunks, which are called Nodes in LlamaIndex. You can parse the loaded documents into nodes using the SimpleNodeParser with a defined chunk size of 1024.

from llama_index.core.node_parser import SimpleNodeParser

node_parser = SimpleNodeParser.from_defaults(chunk_size=1024)

# Extract nodes from documents
nodes = node_parser.get_nodes_from_documents(documents)

Step 4: Build index

Next, you will build the index that stores all the external knowledge in Weaviate, an open source vector database.

First, you will need to connect to a Weaviate instance. In this case, we are using Weaviate Embedded, which allows you to experiment in Notebooks for free without an API key. For a production-ready solution, deploying Weaviate yourself, e.g., via Docker, or using a managed service is recommended.

import weaviate

# Connect to your Weaviate instance
client = weaviate.Client(
    embedded_options=weaviate.embedded.EmbeddedOptions(),
)

Next, you will build a VectorStoreIndex from the Weaviate client to store your data in and interact with.

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.weaviate import WeaviateVectorStore

index_name = "MyExternalContext"

# Construct vector store
vector_store = WeaviateVectorStore(
    weaviate_client=client,
    index_name=index_name,
)

# Set up the storage for the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Set up the index
# build a VectorStoreIndex that encodes the nodes into embeddings
# and stores them for future retrieval
index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
)

Step 5: Set up query engine

Lastly, you will set up the index as the query engine.

# The QueryEngine class is equipped with the generator
# and facilitates the retrieval and generation steps
query_engine = index.as_query_engine()

Step 6: Run a naive RAG query on your data

Now, you can run a naive RAG query on your data, as shown below:

# Run your naive RAG query
response = query_engine.query(
    "What happened at Interleaf?"
)
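To inspect the result, you can print the response and take a look at the retrieved context it was grounded in; a small sketch on top of the query above:

# Print the generated answer
print(str(response))

# Inspect the retrieved chunks and their similarity scores
for source_node in response.source_nodes:
    print(source_node.score, source_node.node.get_content()[:100])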

In this section, we will cover some simple adjustments you can make to turn the above naive RAG pipeline into an advanced one. This walkthrough will cover the following selection of advanced RAG techniques:

As we will only cover the modifications here, you can find the full end-to-end advanced RAG pipeline in this Jupyter Notebook.

For the sentence window retrieval technique, you need to make two adjustments: First, you must adjust how you store and post-process your data. Instead of the SimpleNodeParser, we will use the SentenceWindowNodeParser.

from llama_index.core.node_parser import SentenceWindowNodeParser

# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

The SentenceWindowNodeParser does two things:

  1. It separates the document into single sentences, which will be embedded.
  2. For each sentence, it creates a context window. If you specify a window_size = 3, the resulting window will be three sentences long, starting at the previous sentence of the embedded sentence and spanning the sentence after. The window will be stored as metadata, as you can see in the sketch after this list.
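To get a feel for what the parser produces, a minimal sketch (assuming the documents from Step 2 are already loaded) could look like this:

# Parse the documents into sentence nodes with window metadata
nodes = node_parser.get_nodes_from_documents(documents)

# The embedded text is a single sentence ...
print(nodes[1].text)

# ... while the surrounding window is stored as metadata
print(nodes[1].metadata["window"])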

During retrieval, the sentence that most closely matches the query is returned. After retrieval, you need to replace the sentence with the entire window from the metadata by defining a MetadataReplacementPostProcessor and using it in the list of node_postprocessors.

from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# The target key defaults to `window` to match the node_parser's default
postproc = MetadataReplacementPostProcessor(
    target_metadata_key="window"
)

...

query_engine = index.as_query_engine(
    node_postprocessors=[postproc],
)

Implementing a hybrid search in LlamaIndex is as easy as two parameter changes to the query_engine if the underlying vector database supports hybrid search queries. The alpha parameter specifies the weighting between vector search and keyword-based search, where alpha=0 means keyword-based search and alpha=1 means pure vector search.

query_engine = index.as_query_engine(
    ...,
    vector_store_query_mode="hybrid",
    alpha=0.5,
    ...
)

Adding a reranker to your advanced RAG pipeline only takes three simple steps:

  1. First, define a reranker model. Here, we are using BAAI/bge-reranker-base from Hugging Face.
  2. In the query engine, add the reranker model to the list of node_postprocessors.
  3. Increase the similarity_top_k in the query engine to retrieve more context passages, which can be reduced to top_n after reranking.
# !pip install torch sentence-transformers
from llama_index.core.postprocessor import SentenceTransformerRerank

# Define reranker model
rerank = SentenceTransformerRerank(
    top_n=2,
    model="BAAI/bge-reranker-base",
)

...

# Add reranker to query engine
query_engine = index.as_query_engine(
    similarity_top_k=6,
    ...,
    node_postprocessors=[rerank],
    ...,
)
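Putting the pieces together, a combined query engine for all three techniques could look roughly like the sketch below; the parameter values are taken from the snippets above, so treat it as an illustration rather than the notebook's exact code:

# Combine sentence window retrieval, hybrid search, and reranking
query_engine = index.as_query_engine(
    similarity_top_k=6,
    vector_store_query_mode="hybrid",
    alpha=0.5,
    node_postprocessors=[postproc, rerank],
)

# Run an advanced RAG query on your data
response = query_engine.query(
    "What happened at Interleaf?"
)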
