Using Python and SQLite to Build a Personal Knowledge Base

2025-06-30

Building a Personal Knowledge Base with Python and SQLite#

A personal knowledge base (PKB) serves as a centralized repository for information, ideas, notes, and resources collected over time. It functions as a digital archive, aiding organization, retrieval, and synthesis of knowledge. Building a custom PKB allows tailoring its structure and functionality precisely to individual needs, unlike generic note-taking applications.

This article explores using Python and SQLite to construct a functional PKB. Python, a versatile programming language known for its readability and extensive libraries, provides the logic for interacting with the knowledge base. SQLite, a lightweight, file-based database engine, offers a robust and portable solution for storing structured data without requiring a separate server process. This combination offers flexibility, control, and ease of deployment for a personal system.

Why Python and SQLite?#

The choice of Python and SQLite for a personal knowledge base offers several advantages:

Simplicity and Accessibility: SQLite databases are single files, making them easy to manage, back up, and transfer. Python’s standard library includes the sqlite3 module, requiring no external dependencies for basic database interaction.
Portability: A SQLite database file works across different operating systems, simplifying access from various machines.
Performance: For a personal system with potentially tens of thousands of entries, SQLite provides excellent performance for typical query operations.
Customization: Building from scratch with Python allows defining the exact data structure and features required, unlike rigid off-the-shelf applications.
Scalability: While not suited for massive, concurrent enterprise applications, SQLite is more than adequate for the scale of a personal knowledge store.

Essential Concepts#

Constructing a PKB involves understanding fundamental database principles and how Python interacts with them.

Personal Knowledge Base Structure: At its core, a PKB stores pieces of information (notes, articles, links) and connects them through relationships (tags, categories, links between notes).
Relational Database: SQLite is a relational database. Data is organized into tables with defined columns and data types. Relationships are established between tables using common columns (keys).
SQL (Structured Query Language): This is the standard language used to interact with relational databases, including SQLite. Commands include CREATE TABLE, INSERT, SELECT, UPDATE, and DELETE.
Database Schema: The schema defines the structure of the database: the tables, their columns, data types, and relationships. A well-designed schema is crucial for efficient data storage and retrieval.
Python sqlite3 Module: This built-in Python library provides functions to connect to SQLite databases, execute SQL commands, and fetch results.
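
As a minimal sketch of how these concepts fit together, the snippet below runs CREATE TABLE, INSERT, and SELECT through the sqlite3 module. It assumes an in-memory database, and the `demo` table is a hypothetical stand-in that is not part of the schema designed later.

```python
import sqlite3

# ":memory:" creates a throwaway database that lives only in RAM.
conn = sqlite3.connect(":memory:")

# Hypothetical table used only for this illustration.
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

# Values are passed separately from the SQL text via ? placeholders.
conn.execute("INSERT INTO demo (body) VALUES (?)", ("first entry",))
conn.execute("INSERT INTO demo (body) VALUES (?)", ("second entry",))
conn.commit()

# Each result row comes back as a tuple in column order.
rows = conn.execute("SELECT id, body FROM demo ORDER BY id").fetchall()
for row_id, body in rows:
    print(row_id, body)

conn.close()
```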

Designing the SQLite Database Schema#

A fundamental PKB requires tables to store the core information and how it relates. A simple schema might include:

Notes Table: Stores the main content.
Tags Table: Stores keywords or tags.
Note-Tag Relationship Table: Connects notes to tags (a note can have many tags, and a tag can apply to many notes - a many-to-many relationship).

Proposed Schema:#

notes table:
- id: INTEGER PRIMARY KEY (unique identifier for each note)
- title: TEXT (optional title for the note)
- content: TEXT (the main text content of the note)
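- created_at: DATETIME (timestamp when the note was created; SQLite has no dedicated date type, so CURRENT_TIMESTAMP values are stored as UTC text in the form YYYY-MM-DD HH:MM:SS)
- updated_at: DATETIME (timestamp when the note was last modified, stored the same way)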
- source_url: TEXT (optional URL if the note originates from the web)
tags table:
- id: INTEGER PRIMARY KEY (unique identifier for each tag)
- name: TEXT UNIQUE (the tag name, enforced unique)
note_tags table: (Linking table for many-to-many relationship)
- note_id: INTEGER (Foreign Key referencing notes.id)
- tag_id: INTEGER (Foreign Key referencing tags.id)
- PRIMARY KEY (note_id, tag_id) (Ensures unique combination of note and tag)
- FOREIGN KEY (note_id) REFERENCES notes(id) ON DELETE CASCADE
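- FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE (If a note or tag is deleted, the corresponding entries in note_tags are removed. SQLite applies this only when foreign key enforcement is enabled on the connection with PRAGMA foreign_keys = ON.)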

SQL for Creating Tables:#

-- Create the notes table
CREATE TABLE notes (
    id INTEGER PRIMARY KEY,
    title TEXT,
    content TEXT NOT NULL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    source_url TEXT
);

-- Create the tags table
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);

-- Create the linking table for notes and tags
CREATE TABLE note_tags (
    note_id INTEGER,
    tag_id INTEGER,
    PRIMARY KEY (note_id, tag_id),
    FOREIGN KEY (note_id) REFERENCES notes(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);
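
One way to try this schema out before wiring it into an application is to load it into a scratch database and inspect the result. The sketch below assumes an in-memory database and uses `executescript` to run all three statements at once; the `SCHEMA`, `table_names`, and `note_columns` names are introduced here for illustration only.

```python
import sqlite3

# The three CREATE TABLE statements from the schema above, as one script.
SCHEMA = """
CREATE TABLE notes (
    id INTEGER PRIMARY KEY,
    title TEXT,
    content TEXT NOT NULL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    source_url TEXT
);
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE note_tags (
    note_id INTEGER,
    tag_id INTEGER,
    PRIMARY KEY (note_id, tag_id),
    FOREIGN KEY (note_id) REFERENCES notes(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)  # Runs several SQL statements in one call

# sqlite_master lists every object in the database; keep only tables.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info returns one row per column; index 1 is the column name.
note_columns = [row[1] for row in conn.execute("PRAGMA table_info(notes)")]

print(table_names)
print(note_columns)
conn.close()
```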

Implementing with Python#

The Python sqlite3 module provides the interface to interact with the database file.

1. Connecting to the Database#

Establish a connection to the SQLite database file. If the file does not exist, it will be created.

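import sqlite3
from datetime import datetime

DATABASE_FILE = 'personal_kb.db'

def get_db_connection():
    """Establishes a connection to the SQLite database."""
    conn = sqlite3.connect(DATABASE_FILE)
    conn.row_factory = sqlite3.Row  # Access columns by name
    # SQLite enforces foreign keys (including ON DELETE CASCADE) only when
    # this pragma is enabled, and it must be set on every new connection.
    conn.execute("PRAGMA foreign_keys = ON")
    return conn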

The conn.row_factory = sqlite3.Row line is useful as it allows accessing query results like dictionaries (e.g., row['title']) instead of just by index.
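
To make the difference concrete, here is a small sketch of what a sqlite3.Row offers. It assumes an in-memory database and a trimmed-down `notes` table with a single sample row; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row

# Trimmed-down notes table and one sample row, for illustration only.
conn.execute(
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT, content TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO notes (title, content) VALUES (?, ?)",
    ("Row demo", "Access by name"),
)

row = conn.execute("SELECT id, title, content FROM notes").fetchone()

by_name = row["title"]     # Access by column name
by_index = row[1]          # Positional access still works
column_names = row.keys()  # List of column names in SELECT order
as_dict = dict(row)        # Convert to a plain dict when mutation is needed

print(by_name, by_index, column_names, as_dict)
conn.close()
```

A Row is read-only, which is why converting to a dict is useful when extra keys need to be attached to a result.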

2. Creating Tables#

Execute the CREATE TABLE SQL statements using Python. This is typically done once when the application is first run or set up.

def create_tables():
    """Creates the necessary tables if they don't exist."""
    conn = get_db_connection()
    cursor = conn.cursor()

    cursor.execute("""
        CREATE TABLE IF NOT EXISTS notes (
            id INTEGER PRIMARY KEY,
            title TEXT,
            content TEXT NOT NULL,
            created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
            updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
            source_url TEXT
        )
    """)

    cursor.execute("""
        CREATE TABLE IF NOT EXISTS tags (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        )
    """)

    cursor.execute("""
        CREATE TABLE IF NOT EXISTS note_tags (
            note_id INTEGER,
            tag_id INTEGER,
            PRIMARY KEY (note_id, tag_id),
            FOREIGN KEY (note_id) REFERENCES notes(id) ON DELETE CASCADE,
            FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
        )
    """)

    conn.commit()
    conn.close()

# Call this function once to set up the database
# create_tables()

The CREATE TABLE IF NOT EXISTS syntax prevents errors if the script is run multiple times.
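
The effect can be seen in a short experiment. This sketch assumes an in-memory database and reuses only the `tags` table definition; the `plain_create_failed` flag is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")

# A second plain CREATE TABLE raises because the table already exists.
try:
    conn.execute(
        "CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
    )
    plain_create_failed = False
except sqlite3.OperationalError as exc:
    plain_create_failed = True
    print("Plain CREATE TABLE failed:", exc)

# With IF NOT EXISTS the same statement is silently skipped.
conn.execute(
    "CREATE TABLE IF NOT EXISTS tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)

# The existing table is left exactly as it was.
tag_columns = [row[1] for row in conn.execute("PRAGMA table_info(tags)")]
conn.close()
```

Note that IF NOT EXISTS only checks the table name. If the definition in the script later changes, an existing table is not altered to match it.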

3. Adding Data (Inserting Notes and Tags)#

Adding a note involves inserting data into the notes table. Adding tags means inserting each tag name into the tags table if it doesn’t exist yet, then linking the note and tag in the note_tags table. Values should always be passed through ? placeholders and the second argument of execute, never formatted into the SQL string; this is crucial for preventing SQL injection vulnerabilities.

def add_note(title, content, source_url=None, tags=None):
    """Adds a new note to the database and links associated tags."""
    conn = get_db_connection()
    cursor = conn.cursor()

    # Insert the note
    cursor.execute("""
        INSERT INTO notes (title, content, source_url)
        VALUES (?, ?, ?)
    """, (title, content, source_url))

    note_id = cursor.lastrowid # Get the ID of the newly inserted note

    # Process and link tags (tags defaults to None rather than [] to avoid
    # sharing one mutable default list across calls)
    for tag_name in tags or []:
        # Check if tag exists, if not, insert it
        cursor.execute("SELECT id FROM tags WHERE name = ?", (tag_name,))
        tag_row = cursor.fetchone()

        if tag_row:
            tag_id = tag_row['id']
        else:
            cursor.execute("INSERT INTO tags (name) VALUES (?)", (tag_name,))
            tag_id = cursor.lastrowid

        # Link note and tag in the note_tags table
        try:
            cursor.execute("INSERT INTO note_tags (note_id, tag_id) VALUES (?, ?)", (note_id, tag_id))
        except sqlite3.IntegrityError:
            # This handles cases where the link already exists (e.g., adding the same tag twice)
            pass # Or log a warning

    conn.commit()
    conn.close()
    return note_id # Return the ID of the created note

# Example Usage:
# add_note(
#     title="Python SQLite Tutorial Notes",
#     content="Learned how to connect, create tables, and insert data.",
#     tags=["python", "database", "sqlite"]
# )
# add_note(
#     title="Interesting Article on PKM",
#     content="Article discussing different PKM approaches.",
#     source_url="http://example.com/pkm-article",
#     tags=["pkm", "knowledge-management"]
# )
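
The reason add_note binds every value through a ? placeholder can be shown with a small comparison. The sketch below assumes an in-memory database holding two tags and a made-up malicious input string; it contrasts a query built by string formatting with the parameterized form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
conn.executemany("INSERT INTO tags (name) VALUES (?)", [("python",), ("sqlite",)])

# Made-up input that tries to change the meaning of the query.
malicious = "nothing' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so the OR clause is
# executed as SQL and the filter matches every row.
unsafe_sql = f"SELECT name FROM tags WHERE name = '{malicious}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a single value and compared literally.
safe_rows = conn.execute(
    "SELECT name FROM tags WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
conn.close()
```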

4. Retrieving Data (Querying Notes)#

Queries can retrieve notes based on various criteria: keywords in content or title, specific tags, creation date ranges, etc. Joining tables (notes, note_tags, tags) is necessary to retrieve notes associated with specific tags.

def get_note_by_id(note_id):
    """Retrieves a single note by its ID."""
    conn = get_db_connection()
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM notes WHERE id = ?", (note_id,))
    note = cursor.fetchone()
    conn.close()
    return note

def search_notes(keyword=None, tag=None):
    """Searches for notes by keyword (in title or content) and/or tag."""
    conn = get_db_connection()
    cursor = conn.cursor()

    query = "SELECT DISTINCT n.* FROM notes n"
    params = []
    joins = []
    conditions = []

    if tag:
        joins.append("JOIN note_tags nt ON n.id = nt.note_id JOIN tags t ON nt.tag_id = t.id")
        conditions.append("t.name = ?")
        params.append(tag)

    if keyword:
        # LIKE gives partial matching; in SQLite it is already
        # case-insensitive for ASCII characters by default
        conditions.append("(n.title LIKE ? OR n.content LIKE ?)")
        params.extend([f"%{keyword}%", f"%{keyword}%"])

    if joins:
        query += " " + " ".join(joins)

    if conditions:
        # When both filters are given, a note must satisfy both
        query += " WHERE " + " AND ".join(conditions)

    query += " ORDER BY n.updated_at DESC" # Order by most recently updated

    cursor.execute(query, params)
    notes = cursor.fetchall()
    conn.close()
    return notes

def get_notes_with_tags(notes_list):
    """Fetches tags for a list of note rows and returns them as dictionaries."""
    if not notes_list:
        return notes_list # Return empty list if no notes provided

    conn = get_db_connection()
    cursor = conn.cursor()
    notes_with_tags = []

    # Fetch tags for each note
    for note in notes_list:
        cursor.execute("""
            SELECT t.name FROM tags t
            JOIN note_tags nt ON t.id = nt.tag_id
            WHERE nt.note_id = ?
        """, (note['id'],))
        tags = [row['name'] for row in cursor.fetchall()]
        # sqlite3.Row is read-only, so copy it into a dict before adding tags
        note_dict = dict(note)
        note_dict['tags'] = tags
        notes_with_tags.append(note_dict)

    conn.close()
    return notes_with_tags

# Example Usage:
# print("All notes:")
# all_notes = get_notes_with_tags(search_notes()) # Get all notes
# for note in all_notes:
#     print(f"- {note['title']} ({', '.join(note['tags'])})")

# print("\nNotes tagged 'python':")
# python_notes = get_notes_with_tags(search_notes(tag='python'))
# for note in python_notes:
#     print(f"- {note['title']}")

# print("\nNotes containing 'PKM':")
# pkm_notes = get_notes_with_tags(search_notes(keyword='PKM'))
# for note in pkm_notes:
#     print(f"- {note['title']}")
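
get_notes_with_tags issues one extra query per note. As an alternative sketch (not the approach used above), the joins can be folded into a single query with SQLite's GROUP_CONCAT aggregate. The example assumes an in-memory database with simplified tables and sample rows, and it assumes tag names never contain a comma, since the comma is used as the separator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row

# Simplified tables and sample data, for illustration only.
conn.executescript("""
CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT, content TEXT NOT NULL);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE note_tags (note_id INTEGER, tag_id INTEGER, PRIMARY KEY (note_id, tag_id));
INSERT INTO notes (id, title, content) VALUES (1, 'Tagged', 'a'), (2, 'Untagged', 'b');
INSERT INTO tags (id, name) VALUES (1, 'python'), (2, 'sqlite');
INSERT INTO note_tags (note_id, tag_id) VALUES (1, 1), (1, 2);
""")

# LEFT JOIN keeps notes that have no tags; GROUP_CONCAT collapses the tag
# names of each note into one comma-separated string (NULL if there are none).
rows = conn.execute("""
    SELECT n.id, n.title, GROUP_CONCAT(t.name, ',') AS tag_list
    FROM notes n
    LEFT JOIN note_tags nt ON nt.note_id = n.id
    LEFT JOIN tags t ON t.id = nt.tag_id
    GROUP BY n.id
    ORDER BY n.id
""").fetchall()

# GROUP_CONCAT does not guarantee an order, so sort the names in Python.
notes_with_tags = [
    {
        "id": row["id"],
        "title": row["title"],
        "tags": sorted(row["tag_list"].split(",")) if row["tag_list"] else [],
    }
    for row in rows
]

print(notes_with_tags)
conn.close()
```

For a few hundred notes the per-note loop is unlikely to matter; the single-query form mainly helps when listing large result sets.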

5. Updating and Deleting Data#

Modifying existing notes or removing them (and their tag associations) requires UPDATE and DELETE statements.

def update_note(note_id, title=None, content=None, source_url=None, tags=None):
    """Updates an existing note and optionally its tags."""
    conn = get_db_connection()
    cursor = conn.cursor()
    # Always refresh the timestamp. CURRENT_TIMESTAMP is UTC, which matches
    # the column defaults (datetime.now() would store local time instead).
    update_fields = ["updated_at = CURRENT_TIMESTAMP"]
    params = []

    if title is not None:
        update_fields.append("title = ?")
        params.append(title)
    if content is not None:
        update_fields.append("content = ?")
        params.append(content)
    if source_url is not None:
        update_fields.append("source_url = ?")
        params.append(source_url)

    query = "UPDATE notes SET " + ", ".join(update_fields) + " WHERE id = ?"
    params.append(note_id)
    cursor.execute(query, params)

    # Update tags: A simple approach is to remove all existing tags and re-add the new list
    if tags is not None:
        # Remove existing tags for this note
        cursor.execute("DELETE FROM note_tags WHERE note_id = ?", (note_id,))

        # Add new tags
        for tag_name in tags:
            # Check if tag exists, if not, insert it
            cursor.execute("SELECT id FROM tags WHERE name = ?", (tag_name,))
            tag_row = cursor.fetchone()

            if tag_row:
                tag_id = tag_row['id']
            else:
                cursor.execute("INSERT INTO tags (name) VALUES (?)", (tag_name,))
                tag_id = cursor.lastrowid

            # Link note and tag
            try:
                cursor.execute("INSERT INTO note_tags (note_id, tag_id) VALUES (?, ?)", (note_id, tag_id))
            except sqlite3.IntegrityError:
                pass # Link already exists (same tag listed twice)

    conn.commit()
    conn.close()
    print(f"Note {note_id} updated.")

def delete_note(note_id):
    """Deletes a note and its associated tag links."""
    conn = get_db_connection()
    cursor = conn.cursor()
    # ON DELETE CASCADE removes the matching note_tags rows, but only if
    # foreign key enforcement is enabled on this connection
    # (PRAGMA foreign_keys = ON); otherwise the links are left behind.
    cursor.execute("DELETE FROM notes WHERE id = ?", (note_id,))
    conn.commit()
    conn.close()
    print(f"Note {note_id} deleted.")

# Example Usage:
# # Assuming note_id 1 exists
# update_note(1, title="Updated Tutorial Notes", tags=["python", "database", "sqlite", "tutorial"])
# # Assuming note_id 2 exists
# delete_note(2)
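
Because delete_note relies on ON DELETE CASCADE, it is worth checking how the cascade behaves with and without foreign key enforcement. The following sketch assumes an in-memory database with a reduced schema and one linked note; `links_left_after_delete` is a hypothetical helper written for this comparison.

```python
import sqlite3

# Reduced schema plus one note, one tag, and one link between them.
SCHEMA = """
CREATE TABLE notes (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE note_tags (
    note_id INTEGER,
    tag_id INTEGER,
    PRIMARY KEY (note_id, tag_id),
    FOREIGN KEY (note_id) REFERENCES notes(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);
INSERT INTO notes (id, content) VALUES (1, 'to be deleted');
INSERT INTO tags (id, name) VALUES (1, 'python');
INSERT INTO note_tags (note_id, tag_id) VALUES (1, 1);
"""

def links_left_after_delete(enable_foreign_keys):
    """Deletes note 1 and returns how many note_tags rows remain."""
    conn = sqlite3.connect(":memory:")
    # Set the pragma explicitly both ways so the result does not depend
    # on how the local SQLite library was compiled.
    state = "ON" if enable_foreign_keys else "OFF"
    conn.execute(f"PRAGMA foreign_keys = {state}")
    conn.executescript(SCHEMA)
    conn.execute("DELETE FROM notes WHERE id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM note_tags").fetchone()[0]
    conn.close()
    return remaining

with_enforcement = links_left_after_delete(True)
without_enforcement = links_left_after_delete(False)
print(with_enforcement, without_enforcement)
```

With enforcement on, the link row disappears together with the note; with it off, the orphaned row stays in note_tags.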

Real-World Application: Research Notes Manager#

Consider the scenario of a researcher or student collecting information from various sources – articles, books, websites, lecture notes. A Python and SQLite PKB can serve as a custom research notes manager.

Use Case: Storing excerpts from papers, links to useful resources, summaries of concepts, and connecting them via relevant keywords (tags) and perhaps linking notes together.

Implementation:

Schema: The proposed notes, tags, and note_tags schema works well. An optional links table could be added to represent explicit links between notes (e.g., a summary note linking to a note detailing a specific concept).
Python Functions:
- add_research_note(title, content, source_url, tags, related_note_ids): Wraps add_note, which already handles source_url and tags, and additionally records links to related notes if a links table exists.
- find_notes_by_topic(tag_list): Extends the search_notes query to filter by several tags at once using WHERE t.name IN (...), since search_notes as written accepts a single tag.
- find_notes_by_keyword(keyword): Uses search_notes filtered by keyword in content/title.
- get_note_with_context(note_id): Retrieves a note and also fetches its associated tags and potentially notes linked to it.
- generate_report(tag): Could query notes with a specific tag and output them in a formatted text or markdown file.
User Interface: While not covered in detail here, a simple command-line interface could wrap these Python functions, or a web interface could be built using frameworks like Flask or FastAPI, backed by these same Python functions interacting with the SQLite database.
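
The optional links table mentioned under Schema could take many shapes. The sketch below is one possibility, not a definitive design: the column names `from_note_id` and `to_note_id` and the helpers `link_notes` and `get_linked_notes` are all assumptions made for illustration, and the example runs against an in-memory database with a trimmed-down notes table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT, content TEXT NOT NULL);

-- Hypothetical links table: one row per directed link between two notes.
CREATE TABLE links (
    from_note_id INTEGER NOT NULL REFERENCES notes(id) ON DELETE CASCADE,
    to_note_id INTEGER NOT NULL REFERENCES notes(id) ON DELETE CASCADE,
    PRIMARY KEY (from_note_id, to_note_id)
);

INSERT INTO notes (id, title, content) VALUES
    (1, 'Summary', 'Overview of the topic'),
    (2, 'Concept detail', 'Details of one concept');
""")

def link_notes(conn, from_note_id, to_note_id):
    """Records a link; INSERT OR IGNORE skips a link that already exists."""
    conn.execute(
        "INSERT OR IGNORE INTO links (from_note_id, to_note_id) VALUES (?, ?)",
        (from_note_id, to_note_id),
    )
    conn.commit()

def get_linked_notes(conn, note_id):
    """Returns the notes that the given note links to."""
    return conn.execute("""
        SELECT n.id, n.title
        FROM links l
        JOIN notes n ON n.id = l.to_note_id
        WHERE l.from_note_id = ?
        ORDER BY n.id
    """, (note_id,)).fetchall()

link_notes(conn, 1, 2)
link_notes(conn, 1, 2)  # Duplicate link is ignored
linked_titles = [row["title"] for row in get_linked_notes(conn, 1)]
print(linked_titles)
```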

This demonstrates how the core Python and SQLite components form the engine for a tailored application, providing precise control over data organization and retrieval for specific workflows like managing research.
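
As an illustration of one of the functions listed above, here is a possible sketch of find_notes_by_topic. The `conn` parameter, the `match_all` flag, and the sample data are assumptions added for this example; only the ? placeholders are generated dynamically, while the tag values themselves are still bound as parameters.

```python
import sqlite3

def find_notes_by_topic(conn, tag_list, match_all=False):
    """Returns notes carrying any (or, with match_all, every) tag in tag_list."""
    if not tag_list:
        return []

    # Build "?, ?, ?" with one placeholder per tag; values are bound below.
    placeholders = ", ".join("?" for _ in tag_list)
    query = f"""
        SELECT n.id, n.title
        FROM notes n
        JOIN note_tags nt ON nt.note_id = n.id
        JOIN tags t ON t.id = nt.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY n.id
    """
    params = list(tag_list)

    if match_all:
        # Keep only notes that matched every distinct requested tag.
        query += " HAVING COUNT(DISTINCT t.name) = ?"
        params.append(len(set(tag_list)))

    query += " ORDER BY n.id"
    return conn.execute(query, params).fetchall()

# Sample data in an in-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT, content TEXT NOT NULL);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE note_tags (note_id INTEGER, tag_id INTEGER, PRIMARY KEY (note_id, tag_id));
INSERT INTO notes (id, title, content) VALUES
    (1, 'SQLite from Python', 'a'), (2, 'Python tips', 'b'), (3, 'PKM ideas', 'c');
INSERT INTO tags (id, name) VALUES (1, 'python'), (2, 'sqlite'), (3, 'pkm');
INSERT INTO note_tags (note_id, tag_id) VALUES (1, 1), (1, 2), (2, 1), (3, 3);
""")

any_ids = [row["id"] for row in find_notes_by_topic(conn, ["python", "sqlite"])]
all_ids = [row["id"] for row in find_notes_by_topic(conn, ["python", "sqlite"], match_all=True)]
print(any_ids, all_ids)
```

The default behaves like a plain IN filter (any of the tags), while `match_all=True` narrows the result to notes that carry all of them.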

Key Takeaways#

A personal knowledge base organizes information for retrieval and synthesis.
Python and SQLite offer a simple, portable, and customizable solution for building a PKB.
Designing a relational database schema (notes, tags, note_tags tables) is a foundational step.
The Python sqlite3 module facilitates connecting, executing SQL commands (CREATE TABLE, INSERT, SELECT, UPDATE, DELETE), and fetching data.
Using parameters in SQL queries is essential for security against SQL injection.
Joining tables allows retrieving related information, such as fetching notes along with their associated tags.
Basic CRUD (Create, Read, Update, Delete) operations form the core functionality for managing knowledge entries.
This setup serves as a flexible backend for various front-end interfaces or automation scripts tailored to personal knowledge management workflows.