Creating a Python Tool to Monitor GitHub Stars and Repo Changes

Monitoring activity on GitHub repositories offers valuable insights for developers, project managers, and businesses. Tracking metrics like the number of stars indicates project popularity and community interest. Observing repository changes, such as commits and releases, reveals development pace, feature additions, and maintenance activity. A programmatic approach using a Python tool provides an automated and efficient way to collect and analyze this information from multiple repositories.

This article outlines the process of creating a basic Python tool leveraging the GitHub API to monitor specified repositories, focusing on star counts and recent updates.

Why Monitor GitHub Repositories?#

Tracking GitHub repository data provides several strategic advantages:

  • Market and Trend Analysis: Monitoring star counts of projects within a specific domain can highlight emerging technologies, popular frameworks, or trending tools.
  • Competitor Analysis: Observing the activity on competitors’ open-source projects provides insights into their development focus, release cycles, and community engagement.
  • Dependency Management: Keeping track of updates (commits, releases) in libraries or frameworks used in internal projects helps anticipate potential compatibility issues or identify opportunities for upgrades.
  • Project Tracking: For maintainers or contributors, monitoring activity ensures awareness of recent contributions, issues, or discussions.
  • Community Engagement Metrics: While stars are one metric, monitoring issues opened, pull requests, and contributors gives a broader picture of community health.

Automating this process with a Python tool allows for scheduled data collection and comparison over time, providing a historical perspective on repository evolution.

Essential Concepts for Repository Monitoring#

Building a tool to monitor GitHub repositories programmatically requires understanding a few core concepts:

  • GitHub API: GitHub provides a powerful Application Programming Interface (API) that allows external applications to interact with GitHub data. This tool will primarily use the GitHub REST API to fetch repository information.
  • API Authentication: To make requests to the GitHub API, especially to avoid stringent rate limits or access certain data, authentication is necessary. The recommended method for personal tools is using a Personal Access Token (PAT). PATs grant specific permissions (scopes) to interact with the API on behalf of a user.
  • API Rate Limits: GitHub imposes limits on the number of API requests that can be made within a specific timeframe. Authenticated requests have significantly higher limits than unauthenticated ones (typically 5,000 requests per hour for authenticated users vs. 60 for unauthenticated). The tool needs to be mindful of these limits.
  • Repository Data: The API exposes various endpoints to retrieve information about a repository, including the stargazer count, commit history, releases, branches, contributors, and more.
  • Data Storage: To track changes over time, the collected data (like star count, latest commit/release) needs to be stored persistently between monitoring runs. Simple file formats like CSV or JSON, or a lightweight database, can serve this purpose.
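Rate limits deserve special care in a tool that polls many repositories. GitHub reports the current quota in the documented X-RateLimit-Remaining and X-RateLimit-Reset response headers, so a small helper can decide how long to pause before retrying. A minimal sketch (the function name is illustrative, not part of any library):

```python
import time

def seconds_until_reset(headers, now=None):
    """Given GitHub API response headers, return how many seconds to wait
    before retrying once the rate-limit window resets (0 if quota remains)."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0
    reset_at = int(headers.get("X-RateLimit-Reset", 0))  # Unix epoch seconds
    now = time.time() if now is None else now
    return max(0, reset_at - now)
```

Calling `time.sleep(seconds_until_reset(response.headers))` between requests keeps a long monitoring run from failing with 403 responses once the quota is exhausted.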

Building the Python Tool: A Step-by-Step Guide#

Creating a basic Python tool involves setting up API access, defining which repositories to monitor, making API requests, processing the data, and storing it for comparison.

Step 1: Set up GitHub API Access (Personal Access Token)#

To make authenticated API requests, a Personal Access Token (PAT) is required.

  1. Navigate to Settings on GitHub.com (usually under the user profile dropdown).
  2. In the left sidebar, select Developer settings.
  3. Choose Personal access tokens, then Tokens (classic).
  4. Click Generate new token and select Generate new token (classic).
  5. Give the token a descriptive name (e.g., “GitHub Monitor Tool”).
  6. Set an expiration date (recommended for security).
  7. Under Select scopes, choose the necessary permissions. For monitoring public repository stars and basic info, the public_repo scope is usually sufficient. This grants read access to public repository information.
  8. Click Generate token.
  9. Copy the token immediately. It will not be shown again after leaving the page. Store this token securely; treat it like a password.

Security Note: Never hardcode API tokens directly in scripts. Use environment variables or a secure configuration file to store and access the token.

Step 2: Choose Repositories to Monitor#

The tool needs a list of repositories to track. This can be a simple Python list containing strings in the format “owner/repo”.

repositories_to_monitor = [
    "openai/openai-python",
    "facebookresearch/detectron2",
    "google/go-github",
    "microsoft/vscode",
]

Step 3: Make API Calls using Python#

The requests library is standard for making HTTP requests in Python. Authenticated requests are made by including the PAT in the Authorization header.

To fetch repository details (including star count):
Endpoint: GET /repos/{owner}/{repo}
Documentation: https://docs.github.com/en/rest/repos/repos#get-a-repository

To fetch recent commits:
Endpoint: GET /repos/{owner}/{repo}/commits
Documentation: https://docs.github.com/en/rest/commits/commits#list-commits
(Results can be limited using the per_page and page parameters.)

To fetch recent releases:
Endpoint: GET /repos/{owner}/{repo}/releases
Documentation: https://docs.github.com/en/rest/releases/releases#list-releases
(Results can be limited using the per_page and page parameters.)

A helper function can be created to handle authenticated requests:

import datetime  # Needed for timestamping each fetch
import os  # Recommended for reading the PAT from an environment variable

import requests

GITHUB_API_URL = "https://api.github.com"

# Load the token from an environment variable for security
GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")

def make_github_request(endpoint):
    """Makes an authenticated GET request to the GitHub API."""
    headers = {
        "Authorization": f"token {GITHUB_TOKEN}",
        "Accept": "application/vnd.github.v3+json",  # Recommended header
    }
    url = f"{GITHUB_API_URL}{endpoint}"
    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()  # Raise an exception for bad status codes
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error making API request to {url}: {e}")
        return None

def get_repo_info(repo_full_name):
    """Fetches stars, latest commit SHA and date, and latest release tag and date."""
    repo_data = make_github_request(f"/repos/{repo_full_name}")
    if repo_data:
        stars = repo_data.get("stargazers_count")

        # Fetch the latest commit only (per_page=1 returns the most recent)
        commits_data = make_github_request(f"/repos/{repo_full_name}/commits?per_page=1")
        latest_commit_sha = None
        latest_commit_date = None
        if commits_data and isinstance(commits_data, list) and len(commits_data) > 0:
            latest_commit_sha = commits_data[0].get("sha")
            # Use the committer date; the author date is less reliable for "last updated"
            latest_commit_date = commits_data[0].get("commit", {}).get("committer", {}).get("date")

        # Fetch the latest release only
        releases_data = make_github_request(f"/repos/{repo_full_name}/releases?per_page=1")
        latest_release_tag = None
        latest_release_date = None
        if releases_data and isinstance(releases_data, list) and len(releases_data) > 0:
            latest_release_tag = releases_data[0].get("tag_name")
            latest_release_date = releases_data[0].get("published_at")

        return {
            "repo": repo_full_name,
            "stars": stars,
            "latest_commit_sha": latest_commit_sha,
            "latest_commit_date": latest_commit_date,
            "latest_release_tag": latest_release_tag,
            "latest_release_date": latest_release_date,
            "timestamp": datetime.datetime.now().isoformat(),  # Record when the data was fetched
        }
    return None

# Example usage (after defining get_repo_info)
# collected_data = get_repo_info("openai/openai-python")
# if collected_data:
#     print(collected_data)

Step 4: Process and Store Data#

The collected data for each repository needs to be stored. To track changes, the current data is compared against the previously stored data. A simple approach is to use a JSON file to store the monitoring history.

import datetime
import json
import os

import requests  # Assumed already imported alongside the functions above

# ... (make_github_request and get_repo_info functions from Step 3) ...

DATA_FILE = "github_monitoring_data.json"

def load_monitoring_data(filename=DATA_FILE):
    """Loads historical monitoring data from a JSON file."""
    if os.path.exists(filename):
        with open(filename, "r") as f:
            try:
                return json.load(f)
            except json.JSONDecodeError:
                print(f"Error decoding JSON from {filename}. Starting fresh.")
                return {}
    return {}

def save_monitoring_data(data, filename=DATA_FILE):
    """Saves current monitoring data to a JSON file."""
    with open(filename, "w") as f:
        json.dump(data, f, indent=4)

def monitor_repositories(repo_list):
    """Monitors a list of repositories, checks for changes, and stores data."""
    history = load_monitoring_data()
    print("Starting monitoring run...")
    for repo in repo_list:
        print(f"Fetching data for {repo}...")
        repo_info = get_repo_info(repo)
        if repo_info:
            # Compare with the most recent historical entry, if any
            if repo in history and history[repo]:
                previous_info = history[repo][-1]
                if repo_info["stars"] != previous_info.get("stars"):
                    print(f"  🌟 Stars changed for {repo}: {previous_info.get('stars')} -> {repo_info['stars']}")
                if repo_info["latest_commit_sha"] != previous_info.get("latest_commit_sha"):
                    short_sha = (repo_info["latest_commit_sha"] or "")[:7]
                    print(f"  ⬆️ New commit detected for {repo}: {short_sha}...")
                    # Could fetch the commit message here if needed
                if repo_info["latest_release_tag"] != previous_info.get("latest_release_tag"):
                    print(f"  🎉 New release detected for {repo}: {repo_info['latest_release_tag']}")
                    # Could fetch release details (name, body) if needed
            # Append the current data to this repo's history
            history.setdefault(repo, []).append(repo_info)
        else:
            print(f"  Failed to retrieve data for {repo}.")
    save_monitoring_data(history)
    print("Monitoring run finished. Data saved.")

# Main execution block
if __name__ == "__main__":
    # Ensure the GITHUB_TOKEN environment variable is set before running
    if not GITHUB_TOKEN:
        print("Error: GITHUB_TOKEN environment variable not set.")
        print("Please set it to your GitHub Personal Access Token.")
    else:
        repositories_to_monitor = [
            "openai/openai-python",
            "facebookresearch/detectron2",
            "google/go-github",
            "microsoft/vscode",
            "pallets/flask",  # Another example
        ]
        monitor_repositories(repositories_to_monitor)

Note: This simple storage appends data to a list for each repo, which can grow large. For long-term monitoring or many repos, a database would be more efficient.
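As a sketch of the database alternative, the same per-repository snapshots can be stored in SQLite using only the standard library. The table layout and function names below are one possible design, not a prescribed schema:

```python
import sqlite3

def init_db(path="github_monitoring.db"):
    """Open (or create) the SQLite database and ensure the snapshots table exists."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS snapshots (
            repo TEXT NOT NULL,
            stars INTEGER,
            latest_commit_sha TEXT,
            latest_release_tag TEXT,
            fetched_at TEXT NOT NULL
        )
    """)
    return conn

def save_snapshot(conn, info):
    """Insert one monitoring result (the dict produced by get_repo_info)."""
    conn.execute(
        "INSERT INTO snapshots (repo, stars, latest_commit_sha, latest_release_tag, fetched_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (info["repo"], info["stars"], info["latest_commit_sha"],
         info["latest_release_tag"], info["timestamp"]),
    )
    conn.commit()

def latest_snapshot(conn, repo):
    """Return the most recent (stars, commit SHA, release tag) row for a repo, or None."""
    return conn.execute(
        "SELECT stars, latest_commit_sha, latest_release_tag FROM snapshots "
        "WHERE repo = ? ORDER BY fetched_at DESC LIMIT 1",
        (repo,),
    ).fetchone()
```

Unlike the growing JSON file, this keeps change detection at a constant cost per repository: the comparison only needs `latest_snapshot`, while the full history remains queryable with SQL.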

Step 5: Reporting or Notification#

The current tool prints changes to the console. For a more practical application, this could be extended to:

  • Write changes to a log file.
  • Send an email notification.
  • Post a message to a Slack channel or Discord server.
  • Generate a report (e.g., CSV, HTML).

This requires integrating with external services or libraries (e.g., smtplib for email, slack_sdk for Slack). The core logic of detecting a change remains the same; only the action taken after detection differs.
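The simplest of these extensions, writing changes to a log file, can be sketched as follows (the function names and log format are illustrative):

```python
import datetime

def format_change(repo, field, old, new):
    """Build a one-line, human-readable change message."""
    return f"{repo}: {field} changed {old} -> {new}"

def log_change(message, logfile="github_changes.log"):
    """Append a timestamped change message to a log file."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(logfile, "a") as f:
        f.write(f"[{stamp}] {message}\n")
```

Swapping `log_change` for a function that posts the same message to a webhook or sends an email is then a local change; the detection logic in `monitor_repositories` does not need to know which channel is used.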

Concrete Examples and Practical Application#

Consider a software development team that relies heavily on several key open-source libraries. Monitoring these dependencies using the Python tool provides:

  • Proactive Updates: Receive alerts when new releases of critical libraries are published. This allows the team to evaluate and plan upgrades faster, reducing technical debt and ensuring access to the latest features and security patches.
  • Impact Assessment: If a monitored library shows a sudden surge in stars, it might indicate growing popularity or adoption, potentially influencing decisions about its continued use or deeper integration.
  • Risk Management: Observing a sudden decrease in commit activity or lack of response to issues might signal that a library is becoming unmaintained, prompting the team to seek alternatives or consider contributing.

Another example involves businesses tracking the open-source contributions or projects of competitors. If a competitor suddenly open-sources a new tool and it quickly gains stars, it suggests they are entering or investing in a new technology area or strategy, providing valuable market intelligence.

Using the tool to monitor a portfolio of personal projects helps track their individual growth (stars) and ensures consistency in development pace by highlighting periods of inactivity (commits).

Key Takeaways#

Creating a Python tool to monitor GitHub repository stars and changes offers significant benefits for tracking projects, competitors, and dependencies.

  • The GitHub API is the programmatic interface for accessing repository data.
  • Personal Access Tokens (PATs) are essential for authenticated API requests, providing higher rate limits and access to certain data. Handle PATs securely, ideally using environment variables.
  • The requests library simplifies making HTTP requests to the API in Python.
  • Endpoints like /repos/{owner}/{repo}, /repos/{owner}/{repo}/commits, and /repos/{owner}/{repo}/releases provide data on stars, commit history, and releases.
  • Storing historical data (e.g., in a JSON file) is necessary to detect changes over time.
  • Comparing current data with previous records allows for identifying and reporting changes in star count, new commits, or new releases.
  • The basic tool can be extended with notification features (email, messaging apps) for real-time alerts.
  • Monitoring provides actionable insights for dependency management, competitor analysis, and project tracking.

This framework provides a foundation for building a customized monitoring solution tailored to specific needs, enabling automated collection and analysis of valuable GitHub repository data.

Creating a Python Tool to Monitor GitHub Stars and Repo Changes
https://dev-resources.site/posts/creating-a-python-tool-to-monitor-github-stars-and-repo-changes/
Author
Dev-Resources
Published at
2025-06-29
License
CC BY-NC-SA 4.0