LangChain has emerged as a powerful framework for developing applications powered by Large Language Models (LLMs). Its modular design and comprehensive toolkit simplify the process of connecting LLMs to various data sources and tools, making it easier to build complex and intelligent systems. Let's dive into a simple example of using LangChain to answer questions based on a document.

Example: Question Answering from a Document

First, make sure you have LangChain installed:
```
pip install langchain langchain-community langchain-openai beautifulsoup4
```
This installs LangChain, the community integrations package (which provides the document loaders), the OpenAI integration (for using OpenAI models), and BeautifulSoup4 (which the web loader uses to parse HTML in the example below). You'll also need an OpenAI API key, which you can set as an environment variable:
```python
import os

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAI
from langchain.chains.qa_with_sources import load_qa_with_sources_chain

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"  # Replace with your actual key

# 1. Load the Document
loader = WebBaseLoader("https://www.example.com/article")  # Replace with a real URL
documents = loader.load()

# 2. Split the Document into Chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# 3. Initialize the Language Model
llm = OpenAI(temperature=0)

# 4. Create the Question Answering Chain
chain = load_qa_with_sources_chain(llm, chain_type="stuff")  # "stuff" is the simplest chain type

# 5. Ask a Question
query = "What is the main topic of this article?"
result = chain.invoke({"input_documents": texts, "question": query})
print(result["output_text"])
```
Explanation:

- Document Loading: We use `WebBaseLoader` to load content from a website. LangChain offers loaders for various data sources like PDFs, text files, databases, and more (see the loader sketch after this list).
- Text Splitting: LLMs have input token limits. `RecursiveCharacterTextSplitter` breaks the document into smaller chunks, ensuring they fit within the model's context window.
- LLM Initialization: We initialize an `OpenAI` language model. The `temperature` parameter controls the randomness of the output (0 for more deterministic results).
- QA Chain: `load_qa_with_sources_chain` sets up a question answering pipeline. The `chain_type="stuff"` setting concatenates all the document chunks and feeds them to the LLM along with the question.
- Question Answering: We provide the document chunks and the question to the chain and print the answer.
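As a quick illustration of swapping loaders, here is a hedged sketch; the file names are placeholders, and `PyPDFLoader` additionally requires the `pypdf` package:

```python
from langchain_community.document_loaders import PyPDFLoader, TextLoader

# Load a local PDF instead of a web page (requires `pip install pypdf`).
pdf_documents = PyPDFLoader("report.pdf").load()  # placeholder path

# Load a plain-text file.
txt_documents = TextLoader("notes.txt").load()  # placeholder path

# Either result is a list of Documents that can be split and passed
# to the chain exactly like the web-loaded `documents` above.
```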
Best Practices:

- Choose the Right Chain Type: LangChain offers other chain types like "map_reduce", "refine", and "map_rerank", which are better suited to larger documents and more complex scenarios; "stuff" is generally good for small documents. Experiment to find the optimal one for your use case (see the first sketch after this list).
- Optimize Chunk Size and Overlap: Adjust `chunk_size` and `chunk_overlap` in the text splitter to balance context and avoid losing information between chunks. Larger chunks preserve more context but increase token count (see the overlap sketch below).
- Leverage Embeddings: For more sophisticated question answering, consider using vector embeddings to represent document chunks. This allows you to retrieve the most relevant chunks based on semantic similarity to the question. LangChain integrates with vector databases like Chroma and Pinecone (see the retrieval sketch below).
- Handle API Rate Limits: LLM APIs have rate limits. Implement retry mechanisms and consider using LangChain's rate limiters to avoid exceeding these limits (see the retry sketch below).
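For example, switching chain types is just a matter of changing the `chain_type` argument; this sketch reuses the `llm`, `texts`, and `query` objects defined above:

```python
# "map_reduce" answers the question against each chunk separately, then
# combines those partial answers. It suits documents whose chunks,
# taken together, exceed the model's context window.
map_reduce_chain = load_qa_with_sources_chain(llm, chain_type="map_reduce")
result = map_reduce_chain.invoke({"input_documents": texts, "question": query})
print(result["output_text"])
```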
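To see how overlap preserves information across chunk boundaries, here is a small self-contained sketch (the sample text and sizes are arbitrary):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

sample = "LangChain splits long documents into smaller chunks for the model. " * 8
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)

for i, chunk in enumerate(splitter.split_text(sample)):
    # Consecutive chunks share up to 20 characters, so a sentence cut
    # at one boundary still appears intact in a neighboring chunk.
    print(f"chunk {i} ({len(chunk)} chars): {chunk!r}")
```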
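A hedged sketch of that retrieval step, using Chroma purely as an example store (assumes `pip install chromadb`) and reusing `texts`, `chain`, and `query` from above:

```python
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Embed each chunk once and index it in a local Chroma store.
db = Chroma.from_documents(texts, OpenAIEmbeddings())

# Retrieve only the chunks most semantically similar to the question,
# rather than stuffing every chunk into the prompt.
relevant_chunks = db.similarity_search(query, k=4)

result = chain.invoke({"input_documents": relevant_chunks, "question": query})
print(result["output_text"])
```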
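One simple retry pattern is exponential backoff around the chain call. The sketch below is illustrative (the attempt count and delays are arbitrary); note that `langchain_openai`'s `OpenAI` also accepts a `max_retries` parameter for its own built-in retries:

```python
import time

def ask_with_retry(docs, question, attempts=5, base_delay=1.0):
    """Retry the chain call with exponential backoff on transient errors."""
    for attempt in range(attempts):
        try:
            return chain.invoke({"input_documents": docs, "question": question})
        except Exception:  # in practice, catch the provider's rate-limit error
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # wait 1s, then 2s, then 4s, ...

result = ask_with_retry(texts, query)
print(result["output_text"])
```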
Conclusion:

This example provides a basic understanding of how to use LangChain for question answering. By exploring the different modules and chain types, you can build more sophisticated applications to automate tasks, extract information, and create personalized experiences. Remember to experiment with different parameters and techniques to optimize your results.

Tags: LangChain, LLM, OpenAI, Python