Building advanced applications leveraging large language models (LLMs) often requires more than just sending a single prompt to an API. Complex workflows involve sequencing calls, interacting with external data sources, and using tools. LangChain is a framework designed to streamline the development of such AI-powered applications by providing abstractions and components that facilitate these interactions using Python.
LangChain simplifies the process of integrating LLMs into various applications, enabling developers to build sophisticated systems ranging from chatbots that can access real-time information to agents that can perform actions based on user instructions. It addresses the challenge of connecting different AI models and external resources in a coherent and manageable way.
Core Concepts in LangChain
Understanding the fundamental building blocks of LangChain is essential before building AI workflows. The framework revolves around several key modules:
- Models: This module provides interfaces to different types of language models, such as standard LLMs (for text completion) and Chat Models (optimized for conversational turns). LangChain supports numerous providers like OpenAI, Hugging Face, Anthropic, and others, allowing for flexibility in choosing the underlying AI model.
- Prompts: Managing and optimizing prompts is crucial for effective LLM interaction. LangChain’s Prompts module includes:
- Prompt Templates: Standardized structures for creating prompts that can be dynamically populated with user input or other data.
- Output Parsers: Structures for extracting information from the LLM’s text output into structured formats (like JSON or lists); a short example follows this list.
- Indexes: This module helps in structuring documents and preparing them for interaction with LLMs. It includes:
- Document Loaders: Components for ingesting data from various sources (files, websites, databases).
- Text Splitters: Utilities for breaking down large documents into smaller, manageable chunks.
- Vector Stores: Integrations with databases that store text data as numerical vector representations (embeddings), enabling semantic search and retrieval.
- Retrievers: Interfaces for searching and retrieving relevant document chunks from a vector store based on a query.
- Chains: Chains combine LLM calls with other steps or components in a sequence. They allow for multi-step workflows, such as passing the output of one LLM call as input to another, or combining LLM calls with data retrieval.
- Agents: Agents use an LLM as a “reasoning engine” to determine which actions to take and in what order. They can use tools (like search engines, calculators, or custom functions) to interact with the external world, enabling dynamic and multi-step problem-solving.
These components can be used independently or combined to create complex, stateful, and data-aware AI applications.
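For example, a built-in parser like `CommaSeparatedListOutputParser` turns a raw comma-separated model response into a Python list. A minimal sketch, assuming `langchain-core` is installed:

```python
from langchain_core.output_parsers import CommaSeparatedListOutputParser

# Parse a raw comma-separated LLM response into a Python list
parser = CommaSeparatedListOutputParser()
print(parser.parse("red, green, blue"))  # ['red', 'green', 'blue']

# Prompts can embed the parser's formatting instructions so the model
# knows to respond in a parseable format
print(parser.get_format_instructions())
```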
Setting Up Your Python Environment for LangChain
Getting started with LangChain requires a Python environment and the installation of the necessary libraries. Python 3.8+ is recommended.
- Create a Virtual Environment: Using a virtual environment is a standard practice to manage project dependencies and avoid conflicts.

  ```bash
  python -m venv .venv
  ```

- Activate the Virtual Environment:

  - On macOS/Linux:

    ```bash
    source .venv/bin/activate
    ```

  - On Windows:

    ```bash
    .venv\Scripts\activate
    ```

- Install LangChain: Install the core LangChain library.

  ```bash
  pip install langchain
  ```

- Install Specific Model Provider: Install the library for the LLM provider being used (e.g., OpenAI).

  ```bash
  pip install langchain-openai
  ```

  Note: If using a different provider like Hugging Face, install `langchain-huggingface` instead.

- Set Up API Keys: Most external LLM providers require an API key for authentication and billing. This key should be stored securely, typically using environment variables.

  ```bash
  export OPENAI_API_KEY="your-api-key"
  ```

  Note: Replace `"your-api-key"` with the actual key. For production systems, more robust secrets management is advised.
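As an alternative to exporting the variable in the shell, the key can be loaded from within Python. A minimal sketch using the standard library’s `getpass`, a common pattern for notebooks and local experiments:

```python
import getpass
import os

# Prompt for the key at runtime if the environment variable is not set
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")
```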
With the environment configured and libraries installed, the foundation for building LangChain applications is ready.
Building Basic AI Workflows with LangChain and Python
Understanding how to combine LangChain components begins with simple examples.
Making a Simple LLM Call
The most basic interaction involves calling an LLM directly. LangChain provides an abstraction for this.
```python
from langchain_openai import OpenAI

# Initialize the LLM - uses the OPENAI_API_KEY environment variable
llm = OpenAI(temperature=0.7)

# Make a prediction
text = "What is the capital of France?"
response = llm.invoke(text)

print(response)
```

This code initializes a connection to the OpenAI text completion model and sends a simple query, printing the model’s response. The `temperature` parameter influences the creativity/randomness of the output (lower values are more deterministic).
Using Prompt Templates
Prompt templates make prompts reusable and dynamic.
```python
from langchain_openai import OpenAI
from langchain_core.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI(temperature=0.7)

# Define a prompt template with an input variable
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Write a short paragraph about {topic}.",
)

# Create a chain that combines the prompt template and the LLM
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain with different inputs
print(chain.invoke({"topic": "artificial intelligence"}))
print(chain.invoke({"topic": "blockchain technology"}))
```

This example demonstrates creating a template with a placeholder `topic`. The `LLMChain` links the template and the LLM, allowing the prompt to be populated and sent to the model in a single step.
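The same template idea carries over to chat models, which the Models module distinguishes from completion-style LLMs. A minimal sketch using `ChatOpenAI` and `ChatPromptTemplate`, assuming the same `OPENAI_API_KEY` setup:

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

chat = ChatOpenAI(temperature=0.7)

# Chat prompts are structured as a list of (role, content) messages
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise technical writer."),
    ("human", "Write a short paragraph about {topic}."),
])

# Format the messages and send them to the chat model
message = chat.invoke(chat_prompt.format_messages(topic="artificial intelligence"))
print(message.content)
```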
Simple Sequence Chain
More complex workflows involve chaining multiple steps. An LLMChain is a single-step chain; LangChain also supports sequencing several chains together.
Consider a workflow that first generates a list of related concepts and then writes a paragraph explaining them.
```python
from langchain_openai import OpenAI
from langchain_core.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm = OpenAI(temperature=0.7)

# Define the first prompt: generate related concepts
prompt1 = PromptTemplate(
    input_variables=["subject"],
    template="List 3 concepts related to {subject}. List them comma-separated.",
)
chain1 = LLMChain(llm=llm, prompt=prompt1)

# Define the second prompt: write a paragraph about the concepts
prompt2 = PromptTemplate(
    input_variables=["concepts"],
    template="Write a brief paragraph explaining {concepts}.",
)
chain2 = LLMChain(llm=llm, prompt=prompt2)

# Combine the chains in a sequence
overall_chain = SimpleSequentialChain(chains=[chain1, chain2], verbose=True)

# Run the overall chain
print(overall_chain.invoke("machine learning"))
```

The SimpleSequentialChain takes the output of the first chain (a list of concepts generated by the LLM based on the first prompt) and passes it directly as input to the second chain’s prompt. The `verbose=True` flag helps visualize the steps being executed.
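Note that LLMChain and SimpleSequentialChain are marked as legacy in newer LangChain releases, which favor the runnable pipe (LCEL) syntax. A minimal sketch of the same two-step flow, reusing `prompt1`, `prompt2`, and `llm` from above:

```python
from langchain_core.output_parsers import StrOutputParser

# Pipe the first prompt into the LLM, map its output onto the second
# prompt's input variable, then run the second prompt through the LLM
sequence = (
    prompt1
    | llm
    | (lambda concepts: {"concepts": concepts})
    | prompt2
    | llm
    | StrOutputParser()
)

print(sequence.invoke({"subject": "machine learning"}))
```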
Real-World Application: Building a Basic Q&A System
A common and practical AI-powered workflow is building a question-answering (Q&A) system over specific documents. This often involves a pattern called Retrieval Augmented Generation (RAG), which combines information retrieval with LLM generation. LangChain is well-suited for this.
The basic steps involve:
- Loading Documents: Ingest text data from sources.
- Splitting Documents: Break large documents into smaller, overlapping chunks. This is necessary because LLMs have input token limits, and searching smaller chunks is more effective.
- Creating Embeddings: Convert text chunks into numerical vector representations (embeddings); a short sketch of this step follows the list.
- Storing Embeddings: Store the embeddings and their corresponding text chunks in a vector store (a type of database optimized for vector search).
- Retrieving Relevant Chunks: When a query is received, create an embedding for the query and search the vector store for the most semantically similar document chunks.
- Generating Answer: Pass the retrieved chunks and the original query to an LLM, instructing it to generate an answer based only on the provided context.
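To make the embedding step concrete, a quick sketch assuming OpenAI’s embeddings via `langchain-openai` (the vector length shown is just the default for OpenAI’s embedding model and varies by provider):

```python
from langchain_openai import OpenAIEmbeddings

# Each text is mapped to a fixed-length vector; semantically similar
# texts end up close together in this vector space
embeddings = OpenAIEmbeddings()
vector = embeddings.embed_query("What is artificial intelligence?")
print(len(vector))  # e.g., 1536 for OpenAI's default embedding model
```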
Here’s a simplified illustration using LangChain components:
```python
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import CharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_openai import OpenAI, OpenAIEmbeddings

# Assume a file named 'document.txt' exists with some text data.
# Example document.txt content:
# "Artificial intelligence (AI) is intelligence—perceiving, synthesizing,
# and inferring information—demonstrated by machines..."

# 1. Load the document
loader = TextLoader("document.txt")
documents = loader.load()

# 2. Split the document
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# 3. Create embeddings and 4. store them in a vector store
# (using FAISS, a local vector store, for demonstration)
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(texts, embeddings)

# 5. Create a retriever
retriever = vectorstore.as_retriever()

# 6. Create a RetrievalQA chain
# The 'stuff' chain type simply stuffs all retrieved documents into the prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    retriever=retriever,
    chain_type="stuff",
)

# Ask a question
query = "What is Artificial Intelligence?"
response = qa_chain.invoke(query)

print(response)
```

This code snippet outlines the core steps of a RAG system using LangChain. It loads text, splits it, embeds it, stores it in a FAISS vector store, and then sets up a RetrievalQA chain. This chain automatically handles retrieving relevant document chunks based on the user’s query and passing them to the LLM along with the prompt to generate an answer. This pattern is highly effective for building knowledge-based systems that can answer questions about specific, private, or constantly updated data.
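Two knobs worth knowing when tuning a chain like this are the retriever’s `k` (how many chunks to fetch) and `return_source_documents` (to inspect which chunks grounded the answer). A minimal variation, assuming the `vectorstore` from the example above:

```python
# Fetch only the top 2 chunks and return them alongside the answer
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 2}),
    chain_type="stuff",
    return_source_documents=True,
)

result = qa_chain.invoke({"query": "What is Artificial Intelligence?"})
print(result["result"])
print(result["source_documents"])
```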
Key Takeaways for Getting Started
Getting started with LangChain for AI-powered workflows involves understanding its modular structure and how to connect components.
- LangChain provides a framework to orchestrate LLM calls, external data, and tools, moving beyond simple API calls.
- Core modules include Models, Prompts, Indexes, Chains, and Agents, each serving a specific purpose in building AI applications.
- Setting up involves installing the `langchain` library, provider-specific libraries (like `langchain-openai`), and configuring API keys.
- Simple `LLM` and `PromptTemplate` combinations form the basis of many interactions.
- Chains allow sequencing multiple steps or components, creating defined workflows.
- The Retrieval Augmented Generation (RAG) pattern, facilitated by LangChain’s Indexing components (loaders, splitters, embeddings, vector stores, retrievers) and specialized chains (like `RetrievalQA`), is a powerful approach for building Q&A systems over custom data.
- Building complex AI workflows becomes more manageable by breaking them down into smaller, interconnected steps using LangChain’s abstractions.
By understanding these fundamental concepts and practicing with basic examples, one can begin to leverage LangChain and Python to build more sophisticated and data-aware AI applications.