
Automating GitHub Issue Cleanup with Python and the GitHub API

Repository maintenance is a critical part of software development and project management. Over time, GitHub repositories accumulate issues that have become outdated, were resolved without being explicitly closed, or are no longer relevant. These stale issues obscure important tasks and make the repository harder to navigate. Automating the process of identifying and managing them improves repository health and team focus. This article explores how to use Python and the GitHub API for efficient GitHub issue cleanup.

The core problem addressed is the manual burden of sifting through potentially hundreds or thousands of issues to identify those suitable for closure, relabeling, or commentary. Automation offers a scalable solution, allowing teams to define criteria for “staleness” and apply cleanup actions programmatically.

Essential Concepts for GitHub Issue Automation

Successful automation of GitHub issue cleanup requires understanding several key concepts:

  • GitHub Issues: The fundamental unit for tracking tasks, bugs, features, and other work items within a GitHub repository. Issues have states (open, closed), labels, assignees, comments, and activity timelines.
  • Stale Issues: Issues that meet specific criteria indicating they are no longer being actively worked on or are no longer relevant. Common criteria include:
    • Lack of activity (comments, linked commits) for a defined period (e.g., 6 months).
    • Specific labels (e.g., “question,” “discussion,” “answered”) where closure is appropriate.
    • Association with completed milestones or projects.
  • GitHub API: A RESTful interface that allows external applications to interact with GitHub data programmatically. This includes reading repository information, listing issues, adding comments, applying labels, and closing issues. Access requires authentication.
  • Personal Access Tokens (PATs): Secure credentials used to authenticate with the GitHub API on behalf of a user or organization. PATs grant specific permissions (scopes) to interact with repositories and user data. Using PATs is a standard and recommended authentication method for scripts and tools.
  • Python Libraries: Python is a high-level language well suited to scripting automation tasks. Several Python libraries simplify interaction with the GitHub API, abstracting the complexities of HTTP requests and authentication. PyGithub is a popular example, providing an object-oriented interface to GitHub resources.

The combination of Python’s scripting capabilities and the GitHub API’s programmatic access enables the creation of custom automation workflows tailored to specific repository needs and cleanup criteria.
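The staleness criteria listed above can be written down as a small predicate before any API code exists. The following is a minimal sketch, not a definitive implementation: the StaleCriteria class, the is_stale function, the 30-day threshold for closable labels, and the exempt labels ("pinned", "security") are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StaleCriteria:
    """Hypothetical container for a repository's staleness rules."""
    max_idle_days: int = 180          # general inactivity threshold
    closable_idle_days: int = 30      # shorter threshold for closable labels (assumed)
    closable_labels: frozenset = frozenset({"question", "discussion", "answered"})
    exempt_labels: frozenset = frozenset({"pinned", "security"})  # assumed, never stale


def is_stale(updated_at, labels, criteria, now=None):
    """Return True when an issue matches the staleness rules."""
    now = now or datetime.now(timezone.utc)
    label_set = set(labels)
    # Exempt labels always win, regardless of how old the issue is
    if label_set & criteria.exempt_labels:
        return False
    idle = now - updated_at
    # Issues with a closable label go stale after the shorter threshold
    if label_set & criteria.closable_labels:
        return idle >= timedelta(days=criteria.closable_idle_days)
    return idle >= timedelta(days=criteria.max_idle_days)
```

Keeping the rules in one pure function makes them easy to review with the team and to unit test without touching GitHub.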

Setting Up for Automation: Authentication and Libraries

Interacting with the GitHub API requires authentication to identify the requesting application and control its permissions. The recommended approach for scripts is using a Personal Access Token.

Generating a GitHub Personal Access Token

  1. Navigate to GitHub settings.
  2. Select “Developer settings.”
  3. Choose “Personal access tokens” > “Tokens (classic).”
  4. Generate a new token.
  5. Provide a descriptive name for the token (e.g., “issue-cleanup-script”).
  6. Crucially, assign the necessary scopes. For classic tokens, reading and modifying issues requires the repo scope for private repositories; public_repo is sufficient when the script only touches public repositories. Narrower sub-scopes such as repo:status, repo_deployment, repo:invite, and security_events do not grant issue access. Granting the minimal necessary permissions is a security best practice.
  7. Copy the generated token immediately. It will not be shown again. Store this token securely (e.g., environment variables, secrets management system).

Exposing tokens directly in script code is a security risk. Storing the token in an environment variable is a common, safer practice.
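To make the environment-variable approach concrete, here is a minimal sketch of reading the token and building the headers that a raw REST request to the GitHub API carries. The helper names load_token and build_headers are hypothetical; a library such as PyGithub does this work internally, so the sketch is only meant to show what authentication looks like at the HTTP level.

```python
import os


def load_token(env=None, var_name="GITHUB_TOKEN"):
    """Read the token from the environment; fail loudly if it is missing."""
    env = os.environ if env is None else env
    token = (env.get(var_name) or "").strip()
    if not token:
        raise RuntimeError(f"{var_name} is not set; export it before running the script.")
    return token


def build_headers(token):
    """Headers for an authenticated GitHub REST API request."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
```

Passing env explicitly (for example load_token({"GITHUB_TOKEN": "..."})) keeps the helper testable without modifying the real environment.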

Installing a Python Library

Python’s package manager, pip, simplifies installing third-party libraries. The PyGithub library provides a convenient interface for the GitHub API.

pip install PyGithub

This command downloads and installs the PyGithub package and its dependencies, making it available for use in Python scripts.

Automating Issue Cleanup: A Step-by-Step Approach

The process involves writing a Python script that uses the generated PAT to connect to GitHub, fetches issues from a target repository, applies defined cleanup logic, and performs actions via the API.

Step 1: Connect to GitHub and Get the Repository

The script begins by importing the necessary library and authenticating with the GitHub API using the PAT.

import os
from github import Github, GithubException

# Retrieve the token from an environment variable rather than hard-coding it
github_token = os.environ.get('GITHUB_TOKEN')
if not github_token:
    raise ValueError("GITHUB_TOKEN environment variable not set.")

# Authenticate with GitHub
# (recent PyGithub releases also accept Github(auth=Auth.Token(github_token)))
g = Github(github_token)

# Specify the repository as 'owner/repo_name'
repo_name = "octocat/Spoon-Knife"  # Replace with your repository name
try:
    repo = g.get_repo(repo_name)
    print(f"Successfully connected to repository: {repo_name}")
except GithubException as e:
    print(f"Error connecting to repository {repo_name}: {e}")
    raise SystemExit(1)

This code snippet retrieves the token from the environment variable GITHUB_TOKEN, authenticates with the GitHub API, and fetches the specified repository object, which is needed to interact with its issues.

Step 2: Define Cleanup Criteria and Fetch Issues

Defining clear criteria is crucial. Stale issues can be identified based on factors like age, lack of recent activity, specific labels, or assignees. The GitHub API allows filtering issues based on various parameters.

For example, identifying issues that are open and have not been updated for a specific period (e.g., 180 days):

import datetime
from github import GithubException

days_stale = 180
# Use a UTC timestamp: GitHub interprets the date in the search query as UTC
stale_date = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=days_stale)

# Search for open issues last updated before stale_date.
# The search query filters on the server; iterating repo.get_issues() and
# checking updated_at client-side also works but transfers every open issue.
query = f'repo:{repo_name} is:issue is:open updated:<{stale_date.strftime("%Y-%m-%d")}'
print(f"Searching for issues with query: {query}")
try:
    stale_issues = g.search_issues(query=query)
    print(f"Found {stale_issues.totalCount} potentially stale issues.")
except GithubException as e:
    print(f"Error searching issues: {e}")
    stale_issues = []  # Fall back to an empty result so later steps do nothing

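Using g.search_issues with a well-crafted query string is often more efficient than fetching all issues and filtering client-side, especially for large repositories. The query syntax (repo:owner/repo is:issue is:open updated:<YYYY-MM-DD) filters on the server before any results are returned. Note that GitHub's search API returns at most 1,000 results per query, so a very large backlog may take several runs to work through.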
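When cleanup criteria combine age with labels or a milestone, assembling the query in a helper keeps it readable. The sketch below is illustrative: build_stale_query and its parameters are hypothetical names, while the qualifiers it emits (repo:, is:, updated:, label:, milestone:) are standard GitHub issue search qualifiers.

```python
from datetime import datetime, timedelta, timezone


def build_stale_query(repo_name, days_stale, now=None, labels=(), milestone=None):
    """Build a GitHub issue search query for open issues idle for days_stale days."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days_stale)).strftime("%Y-%m-%d")
    parts = [f"repo:{repo_name}", "is:issue", "is:open", f"updated:<{cutoff}"]
    # Each label qualifier narrows the search further (labels are ANDed together)
    parts += [f'label:"{name}"' for name in labels]
    if milestone:
        parts.append(f'milestone:"{milestone}"')
    return " ".join(parts)
```

The returned string can be passed straight to g.search_issues(query=...). Injecting now as a parameter makes the cutoff date reproducible in tests.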

Step 3: Implement Cleanup Actions

Once stale issues are identified, the script can perform actions. Common actions include:

  • Adding a “stale” label: Marks the issue for review.
  • Adding a comment: Informs the issue participants about potential closure.
  • Closing the issue: The final cleanup action.

A cautious approach involves a multi-stage process: first label/comment, then close after a waiting period if no activity occurs.

from github import UnknownObjectException

stale_label_name = "stale"
days_to_close = 7  # Example: close issues 7 days after initial stale marking
# days_to_close must be defined before the f-string below uses it
closure_comment = (
    "This issue has been automatically marked as stale because it has not had "
    f"recent activity. It will be closed in {days_to_close} days if no further activity occurs. "
    "Thank you for your contributions."
)

try:
    # Ensure the 'stale' label exists
    try:
        stale_label = repo.get_label(stale_label_name)
    except UnknownObjectException:  # Label doesn't exist (404), so create it
        stale_label = repo.create_label(stale_label_name, "f9d0c4", "Indicates inactivity")
        print(f"Created label: {stale_label_name}")

    for issue in stale_issues:
        print(f"Processing issue #{issue.number}: {issue.title}")

        # Action 1: Add 'stale' label if not already present
        if stale_label_name not in [label.name for label in issue.labels]:
            issue.add_to_labels(stale_label)
            print(f" - Added '{stale_label_name}' label.")

        # Action 2: Add the notice unless the script has already posted it
        # (More complex logic might be needed to avoid spamming comments)
        already_warned = any(closure_comment in (c.body or "") for c in issue.get_comments())
        if not already_warned:
            issue.create_comment(closure_comment)
            print(" - Added closure comment.")

        # Action 3 (Optional/Advanced): Close issues labeled 'stale' whose notice is older than X days
        # This would be a separate pass or integrated logic checking comment age
        # Example:
        # cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=days_to_close)
        # if stale_label_name in [label.name for label in issue.labels] and any(
        #         closure_comment in (c.body or "") and c.created_at < cutoff
        #         for c in issue.get_comments()):
        #     issue.edit(state='closed')
        #     print(" - Closed issue.")
except Exception as e:
    print(f"Error processing issues: {e}")

print("Issue cleanup process completed.")

This script demonstrates adding a “stale” label and a comment to issues identified as stale. A more robust solution might involve a second script run later to close issues that received the stale label and comment but remained inactive.
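The second pass mentioned here comes down to a decision per issue: warn, close, remove the stale marking, or leave alone. The following is a sketch under stated assumptions rather than a definitive design. The decide_action function and the "unmark" outcome are hypothetical, and last_human_activity is assumed to be the time of the most recent activity by someone other than the script (the script's own comment also updates the issue, so updated_at alone would not work).

```python
from datetime import datetime, timedelta, timezone

STALE_LABEL = "stale"  # assumed to match the label applied by the first pass


def decide_action(label_names, last_human_activity, warned_at, now=None,
                  days_stale=180, days_to_close=7):
    """Return 'warn', 'close', 'unmark', or 'skip' for a single issue.

    warned_at is the timestamp of the script's stale notice, or None
    if no notice has been posted yet.
    """
    now = now or datetime.now(timezone.utc)
    # Not yet marked: warn only once the inactivity threshold is reached
    if STALE_LABEL not in label_names or warned_at is None:
        idle = now - last_human_activity
        return "warn" if idle >= timedelta(days=days_stale) else "skip"
    # Someone responded after the notice: the issue is active again
    if last_human_activity > warned_at:
        return "unmark"
    # Notice posted and the grace period has elapsed without a response
    if now - warned_at >= timedelta(days=days_to_close):
        return "close"
    return "skip"
```

A caller would map the result onto API calls, for example issue.edit(state='closed') for "close" and issue.remove_from_labels(...) for "unmark".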

Step 4: Schedule the Script

Manual execution defeats the purpose of automation. The script should be scheduled to run periodically. Common methods include:

  • GitHub Actions: A native GitHub feature allowing workflow automation directly within the repository. A workflow file (YAML) can be configured to run the Python script on a schedule (e.g., weekly).
  • Cron Jobs (Linux/macOS): A system utility for scheduling commands or scripts to run periodically.
  • Task Scheduler (Windows): The Windows equivalent of cron.
  • Cloud Functions (AWS Lambda, Azure Functions, Google Cloud Functions): Serverless options for running the script without managing infrastructure.

Using GitHub Actions keeps the automation logic within the repository context, simplifying setup and management.
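Whichever scheduler is used, the script needs a non-interactive entry point. The sketch below shows one possible shape with a dry-run switch, so that a scheduled job can first be observed without modifying any issues. The flag names and the file name cleanup.py are assumptions, and the cleanup logic itself is left as a placeholder.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for a scheduled cleanup run."""
    parser = argparse.ArgumentParser(description="Label and close stale GitHub issues.")
    parser.add_argument("--repo", required=True, help="Repository as owner/name")
    parser.add_argument("--days-stale", type=int, default=180,
                        help="Days without activity before an issue is marked stale")
    parser.add_argument("--days-to-close", type=int, default=7,
                        help="Days to wait after the stale notice before closing")
    parser.add_argument("--dry-run", action="store_true",
                        help="Report what would change without calling write APIs")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    mode = "DRY RUN" if args.dry_run else "LIVE"
    print(f"[{mode}] repo={args.repo} days_stale={args.days_stale} "
          f"days_to_close={args.days_to_close}")
    # Placeholder: connect, search, and apply the cleanup actions here,
    # skipping every write call when args.dry_run is set.
    return 0

# In a script file, finish with an `if __name__ == "__main__":` guard
# that calls raise SystemExit(main()).
```

A cron entry or a GitHub Actions workflow using the schedule trigger could then run something like python cleanup.py --repo owner/name --dry-run, with GITHUB_TOKEN supplied through the scheduler's secret mechanism.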

Real-World Applications and Examples

Automated issue cleanup is valuable for various scenarios:

  • Managing Large, Popular Repositories: Open source projects or large internal projects often receive numerous issues, some of which are low priority or duplicates that never get explicitly closed. Automation helps surface important issues by archiving inactive ones. Example: A widely used library might automatically label issues with no activity for 90 days as “stale” and close them after another 30 days, significantly reducing the manual burden on maintainers.
  • Keeping Track of Feature Requests or Discussions: Issues created for potential features or general discussions can become irrelevant as the project evolves. Automation can help prune these, perhaps by closing discussion issues after a fixed period or labeling them as “archived.” Example: A project uses issues for feature ideas. After a release cycle is completed, issues tagged for that cycle but not implemented and inactive for 6 months are automatically labeled “future consideration” or closed.
  • Ensuring Project Focus: For projects with specific milestones, issues tied to completed milestones that remain open might indicate oversight. An automated script could list or label such issues for final review. Example: At the end of v2.0 development, a script identifies any open issues still linked to the v2.0 milestone that haven’t been updated in 30 days, adding a “review-for-v2.1” label or closing them.

These examples highlight how tailored cleanup criteria and actions, powered by Python and the GitHub API, contribute to a cleaner, more manageable repository state.

Key Takeaways for Automating GitHub Issue Cleanup

  • Automated issue cleanup improves repository navigability and team focus by managing stale issues.
  • Python scripting combined with the GitHub API provides a flexible solution.
  • Authentication using Personal Access Tokens with appropriate scopes is essential and must be handled securely (e.g., environment variables).
  • Libraries like PyGithub simplify interaction with the GitHub API.
  • Defining clear, objective criteria for identifying stale issues (e.g., age, inactivity, labels) is paramount before writing code.
  • Implementing cleanup actions (labeling, commenting, closing) should consider a cautious, multi-stage approach (e.g., label/comment first, close later).
  • Leveraging the GitHub API’s search capabilities is efficient for filtering issues based on complex criteria.
  • Scheduling the script using tools like GitHub Actions ensures regular maintenance without manual intervention.
  • Applying cleanup automation to scenarios like large repositories, feature request tracking, or milestone management yields tangible benefits in project organization.

Source: https://dev-resources.site/posts/automating-github-issue-cleanup-with-python-and-github-api/
Author: Dev-Resources
Published at: 2025-06-30
License: CC BY-NC-SA 4.0