2121 words
11 minutes
Using Python and SQLite to Build a Personal Knowledge Base

Building a Personal Knowledge Base with Python and SQLite#

A personal knowledge base (PKB) serves as a centralized repository for information, ideas, notes, and resources collected over time. It functions as a digital archive, aiding organization, retrieval, and synthesis of knowledge. Building a custom PKB allows tailoring its structure and functionality precisely to individual needs, unlike generic note-taking applications.

This article explores using Python and SQLite to construct a functional PKB. Python, a versatile programming language known for its readability and extensive libraries, provides the logic for interacting with the knowledge base. SQLite, a lightweight, file-based database engine, offers a robust and portable solution for storing structured data without requiring a separate server process. This combination offers flexibility, control, and ease of deployment for a personal system.

Why Python and SQLite?#

The choice of Python and SQLite for a personal knowledge base offers several advantages:

  • Simplicity and Accessibility: SQLite databases are single files, making them easy to manage, back up, and transfer. Python’s standard library includes the sqlite3 module, requiring no external dependencies for basic database interaction.
  • Portability: A SQLite database file works across different operating systems, simplifying access from various machines.
  • Performance: For a personal system with potentially tens of thousands of entries, SQLite provides excellent performance for typical query operations.
  • Customization: Building from scratch with Python allows defining the exact data structure and features required, unlike rigid off-the-shelf applications.
  • Scalability: While not suited for massive, concurrent enterprise applications, SQLite is more than adequate for the scale of a personal knowledge store.

Essential Concepts#

Constructing a PKB involves understanding fundamental database principles and how Python interacts with them.

  • Personal Knowledge Base Structure: At its core, a PKB stores pieces of information (notes, articles, links) and connects them through relationships (tags, categories, links between notes).
  • Relational Database: SQLite is a relational database. Data is organized into tables with defined columns and data types. Relationships are established between tables using common columns (keys).
  • SQL (Structured Query Language): This is the standard language used to interact with relational databases, including SQLite. Commands include CREATE TABLE, INSERT, SELECT, UPDATE, and DELETE.
  • Database Schema: The schema defines the structure of the database: the tables, their columns, data types, and relationships. A well-designed schema is crucial for efficient data storage and retrieval.
  • Python sqlite3 Module: This built-in Python library provides functions to connect to SQLite databases, execute SQL commands, and fetch results.

Designing the SQLite Database Schema#

A fundamental PKB requires tables to store the core information and how it relates. A simple schema might include:

  1. Notes Table: Stores the main content.
  2. Tags Table: Stores keywords or tags.
  3. Note-Tag Relationship Table: Connects notes to tags (a note can have many tags, and a tag can apply to many notes - a many-to-many relationship).

Proposed Schema:#

  • notes table:

    • id: INTEGER PRIMARY KEY (unique identifier for each note)
    • title: TEXT (optional title for the note)
    • content: TEXT (the main text content of the note)
    • created_at: DATETIME (timestamp when the note was created)
    • updated_at: DATETIME (timestamp when the note was last modified)
    • source_url: TEXT (optional URL if the note originates from the web)
  • tags table:

    • id: INTEGER PRIMARY KEY (unique identifier for each tag)
    • name: TEXT UNIQUE (the tag name, enforced unique)
  • note_tags table: (Linking table for many-to-many relationship)

    • note_id: INTEGER (Foreign Key referencing notes.id)
    • tag_id: INTEGER (Foreign Key referencing tags.id)
    • PRIMARY KEY (note_id, tag_id) (Ensures unique combination of note and tag)
    • FOREIGN KEY (note_id) REFERENCES notes(id) ON DELETE CASCADE
    • FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE (If a note or tag is deleted, the corresponding entry in note_tags is removed)

SQL for Creating Tables:#

-- Create the notes table
CREATE TABLE notes (
id INTEGER PRIMARY KEY,
title TEXT,
content TEXT NOT NULL,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
source_url TEXT
);
-- Create the tags table
CREATE TABLE tags (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
-- Create the linking table for notes and tags
CREATE TABLE note_tags (
note_id INTEGER,
tag_id INTEGER,
PRIMARY KEY (note_id, tag_id),
FOREIGN KEY (note_id) REFERENCES notes(id) ON DELETE CASCADE,
FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);

Implementing with Python#

The Python sqlite3 module provides the interface to interact with the database file.

1. Connecting to the Database#

Establish a connection to the SQLite database file. If the file does not exist, it will be created.

import sqlite3
from datetime import datetime
DATABASE_FILE = 'personal_kb.db'
def get_db_connection():
"""Establishes a connection to the SQLite database."""
conn = sqlite3.connect(DATABASE_FILE)
conn.row_factory = sqlite3.Row # Access columns by name
return conn

The conn.row_factory = sqlite3.Row line is useful as it allows accessing query results like dictionaries (e.g., row['title']) instead of just by index.

2. Creating Tables#

Execute the CREATE TABLE SQL statements using Python. This is typically done once when the application is first run or set up.

def create_tables():
"""Creates the necessary tables if they don't exist."""
conn = get_db_connection()
cursor = conn.cursor()
cursor.execute("""
CREATE TABLE IF NOT EXISTS notes (
id INTEGER PRIMARY KEY,
title TEXT,
content TEXT NOT NULL,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
source_url TEXT
)
""")
cursor.execute("""
CREATE TABLE IF NOT EXISTS tags (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE
)
""")
cursor.execute("""
CREATE TABLE IF NOT EXISTS note_tags (
note_id INTEGER,
tag_id INTEGER,
PRIMARY KEY (note_id, tag_id),
FOREIGN KEY (note_id) REFERENCES notes(id) ON DELETE CASCADE,
FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
)
""")
conn.commit()
conn.close()
# Call this function once to set up the database
# create_tables()

The CREATE TABLE IF NOT EXISTS syntax prevents errors if the script is run multiple times.

3. Adding Data (Inserting Notes and Tags)#

Adding a note involves inserting data into the notes table. Adding tags requires potentially adding the tag name to the tags table (if it doesn’t exist) and then linking the note and tag in the note_tags table. Using parameters (?) in SQL queries with the second argument of execute is crucial for preventing SQL injection vulnerabilities.

def add_note(title, content, source_url=None, tags=[]):
"""Adds a new note to the database and links associated tags."""
conn = get_db_connection()
cursor = conn.cursor()
# Insert the note
cursor.execute("""
INSERT INTO notes (title, content, source_url)
VALUES (?, ?, ?)
""", (title, content, source_url))
note_id = cursor.lastrowid # Get the ID of the newly inserted note
# Process and link tags
for tag_name in tags:
# Check if tag exists, if not, insert it
cursor.execute("SELECT id FROM tags WHERE name = ?", (tag_name,))
tag_row = cursor.fetchone()
if tag_row:
tag_id = tag_row['id']
else:
cursor.execute("INSERT INTO tags (name) VALUES (?)", (tag_name,))
tag_id = cursor.lastrowid
# Link note and tag in the note_tags table
try:
cursor.execute("INSERT INTO note_tags (note_id, tag_id) VALUES (?, ?)", (note_id, tag_id))
except sqlite3.IntegrityError:
# This handles cases where the link already exists (e.g., adding the same tag twice)
pass # Or log a warning
conn.commit()
conn.close()
return note_id # Return the ID of the created note
# Example Usage:
# add_note(
# title="Python SQLite Tutorial Notes",
# content="Learned how to connect, create tables, and insert data.",
# tags=["python", "database", "sqlite"]
# )
# add_note(
# title="Interesting Article on PKM",
# content="Article discussing different PKM approaches.",
# source_url="http://example.com/pkm-article",
# tags=["pkm", "knowledge-management"]
# )

4. Retrieving Data (Querying Notes)#

Queries can retrieve notes based on various criteria: keywords in content or title, specific tags, creation date ranges, etc. Joining tables (notes, note_tags, tags) is necessary to retrieve notes associated with specific tags.

def get_note_by_id(note_id):
"""Retrieves a single note by its ID."""
conn = get_db_connection()
cursor = conn.cursor()
cursor.execute("SELECT * FROM notes WHERE id = ?", (note_id,))
note = cursor.fetchone()
conn.close()
return note
def search_notes(keyword=None, tag=None):
"""Searches for notes by keyword (in title or content) or tag."""
conn = get_db_connection()
cursor = conn.cursor()
query = "SELECT DISTINCT n.* FROM notes n"
params = []
joins = []
conditions = []
if tag:
joins.append("JOIN note_tags nt ON n.id = nt.note_id JOIN tags t ON nt.tag_id = t.id")
conditions.append("t.name = ?")
params.append(tag)
if keyword:
# Use LIKE for partial matching (case-insensitive with COLLATE NOCASE)
conditions.append("(n.title LIKE ? COLLATE NOCASE OR n.content LIKE ? COLLATE NOCASE)")
params.extend([f"%{keyword}%", f"%{keyword}%"])
if joins:
query += " " + " ".join(joins)
if conditions:
query += " WHERE " + " AND ".join(conditions)
query += " ORDER BY n.updated_at DESC" # Order by most recently updated
cursor.execute(query, params)
notes = cursor.fetchall()
conn.close()
return notes
def get_notes_with_tags(notes_list):
"""Fetches tags for a list of note objects/dictionaries."""
if not notes_list:
return notes_list # Return empty list if no notes provided
conn = get_db_connection()
cursor = conn.cursor()
notes_with_tags = []
# Fetch tags for each note
for note in notes_list:
cursor.execute("""
SELECT t.name FROM tags t
JOIN note_tags nt ON t.id = nt.tag_id
WHERE nt.note_id = ?
""", (note['id'],))
tags = [row['name'] for row in cursor.fetchall()]
# Create a new dictionary or modify the existing one to include tags
note_dict = dict(note) # Convert sqlite3.Row to dict if needed
note_dict['tags'] = tags
notes_with_tags.append(note_dict)
conn.close()
return notes_with_tags
# Example Usage:
# print("All notes:")
# all_notes = get_notes_with_tags(search_notes()) # Get all notes
# for note in all_notes:
# print(f"- {note['title']} ({', '.join(note['tags'])})")
# print("\nNotes tagged 'python':")
# python_notes = get_notes_with_tags(search_notes(tag='python'))
# for note in python_notes:
# print(f"- {note['title']}")
# print("\nNotes containing 'PKM':")
# pkm_notes = get_notes_with_tags(search_notes(keyword='PKM'))
# for note in pkm_notes:
# print(f"- {note['title']}")

5. Updating and Deleting Data#

Modifying existing notes or removing them (and their tag associations) requires UPDATE and DELETE statements.

def update_note(note_id, title=None, content=None, source_url=None, tags=None):
"""Updates an existing note and optionally its tags."""
conn = get_db_connection()
cursor = conn.cursor()
update_fields = []
params = []
if title is not None:
update_fields.append("title = ?")
params.append(title)
if content is not None:
update_fields.append("content = ?")
params.append(content)
if source_url is not None:
update_fields.append("source_url = ?")
params.append(source_url)
# Add update timestamp
update_fields.append("updated_at = ?")
params.append(datetime.now().strftime('%Y-%m-%d %H:%M:%S')) # SQLite DATETIME format
if update_fields:
query = "UPDATE notes SET " + ", ".join(update_fields) + " WHERE id = ?"
params.append(note_id)
cursor.execute(query, params)
# Update tags: A simple approach is to remove all existing tags and re-add the new list
if tags is not None:
# Remove existing tags for this note
cursor.execute("DELETE FROM note_tags WHERE note_id = ?", (note_id,))
# Add new tags
for tag_name in tags:
# Check if tag exists, if not, insert it
cursor.execute("SELECT id FROM tags WHERE name = ?", (tag_name,))
tag_row = cursor.fetchone()
if tag_row:
tag_id = tag_row['id']
else:
cursor.execute("INSERT INTO tags (name) VALUES (?)", (tag_name,))
tag_id = cursor.lastrowid
# Link note and tag
try:
cursor.execute("INSERT INTO note_tags (note_id, tag_id) VALUES (?, ?)", (note_id, tag_id))
except sqlite3.IntegrityError:
pass # Link already exists
conn.commit()
conn.close()
print(f"Note {note_id} updated.")
def delete_note(note_id):
"""Deletes a note and its associated tag links."""
conn = get_db_connection()
cursor = conn.cursor()
# Due to ON DELETE CASCADE in the schema, deleting the note
# will automatically delete corresponding entries in note_tags.
cursor.execute("DELETE FROM notes WHERE id = ?", (note_id,))
conn.commit()
conn.close()
print(f"Note {note_id} deleted.")
# Example Usage:
# # Assuming note_id 1 exists
# update_note(1, title="Updated Tutorial Notes", tags=["python", "database", "sqlite", "tutorial"])
# # Assuming note_id 2 exists
# delete_note(2)

Real-World Application: Research Notes Manager#

Consider the scenario of a researcher or student collecting information from various sources – articles, books, websites, lecture notes. A Python and SQLite PKB can serve as a custom research notes manager.

Use Case: Storing excerpts from papers, links to useful resources, summaries of concepts, and connecting them via relevant keywords (tags) and perhaps linking notes together.

Implementation:

  1. Schema: The proposed notes, tags, and note_tags schema works well. An optional links table could be added to represent explicit links between notes (e.g., a summary note linking to a note detailing a specific concept).
  2. Python Functions:
    • add_research_note(title, content, source_url, tags, related_note_ids): Extends add_note to handle source_url and tags. Could also add logic to create links if a links table exists.
    • find_notes_by_topic(tag_list): Uses the search_notes function filtered by multiple tags using WHERE t.name IN (...).
    • find_notes_by_keyword(keyword): Uses search_notes filtered by keyword in content/title.
    • get_note_with_context(note_id): Retrieves a note and also fetches its associated tags and potentially notes linked to it.
    • generate_report(tag): Could query notes with a specific tag and output them in a formatted text or markdown file.
  3. User Interface: While not covered in detail here, a simple command-line interface could wrap these Python functions, or a web interface could be built using frameworks like Flask or FastAPI, backed by these same Python functions interacting with the SQLite database.

This demonstrates how the core Python and SQLite components form the engine for a tailored application, providing precise control over data organization and retrieval for specific workflows like managing research.

Key Takeaways#

  • A personal knowledge base organizes information for retrieval and synthesis.
  • Python and SQLite offer a simple, portable, and customizable solution for building a PKB.
  • Designing a relational database schema (notes, tags, note_tags tables) is a foundational step.
  • The Python sqlite3 module facilitates connecting, executing SQL commands (CREATE TABLE, INSERT, SELECT, UPDATE, DELETE), and fetching data.
  • Using parameters in SQL queries is essential for security against SQL injection.
  • Joining tables allows retrieving related information, such as fetching notes along with their associated tags.
  • Basic CRUD (Create, Read, Update, Delete) operations form the core functionality for managing knowledge entries.
  • This setup serves as a flexible backend for various front-end interfaces or automation scripts tailored to personal knowledge management workflows.
Using Python and SQLite to Build a Personal Knowledge Base
https://dev-resources.site/posts/using-python-and-sqlite-to-build-a-personal-knowledge-base/
Author
Dev-Resources
Published at
2025-06-30
License
CC BY-NC-SA 4.0