Using Python to Monitor Server Health and Send Alerts with Telegram Bots

2025-06-29

Python Server Health Monitoring: Real-Time Alerts with Telegram Bots#

Ensuring the continuous and optimal performance of servers is a fundamental requirement for reliable IT infrastructure. Server health monitoring involves tracking various vital metrics to detect potential issues before they lead to critical failures or performance degradation. While numerous commercial and open-source monitoring systems exist, a lightweight, customizable solution can be rapidly deployed for specific needs using scripting languages like Python and readily available communication platforms such as Telegram.

This article details a practical approach for monitoring core server health indicators—such as CPU usage, memory consumption, and disk space—using Python scripts and dispatching real-time alerts via a Telegram bot. This method offers flexibility, cost-effectiveness for simple setups, and direct control over the monitoring logic and alerting mechanisms.

Essential Concepts in Server Health Monitoring#

Server health monitoring focuses on tracking key performance indicators (KPIs) that reflect the operational status and resource utilization of a server. Anomalies in these metrics often indicate underlying problems.

CPU Usage: Measures the percentage of time the CPU is busy executing processes. High CPU usage can signify heavy load, inefficient applications, or runaway processes, potentially leading to slow response times.
Memory Usage (RAM): Tracks the amount of physical memory currently in use. Excessive memory consumption or swapping (moving data between RAM and disk) can severely impact performance and stability.
Disk Space: Monitors the amount of available storage space on disk partitions. Running out of disk space can halt applications, prevent logging, and cause system instability.
Network Activity: While not the primary focus of this Python script using psutil, network monitoring tracks data throughput and errors, crucial for services relying on network communication. (More advanced monitoring would involve specific network tools or libraries).
Running Processes: Keeping track of essential system processes or application instances ensures that critical services are operational. (This script focuses on resource metrics but can be extended to check specific processes).
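As a small illustration of the disk-space metric and of comparing a metric against a threshold, here is a minimal sketch. It uses the standard library's shutil.disk_usage rather than psutil so that it runs without third-party packages. The function names are illustrative, and the percentage may differ slightly from what psutil reports for the same partition.

```python
import shutil


def disk_used_percent(path="/"):
    """Return the percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    if usage.total == 0:
        return 0.0  # avoid dividing by zero on pseudo-filesystems
    return usage.used / usage.total * 100


def is_over_threshold(value, threshold):
    """True when a metric strictly exceeds its alert threshold."""
    return value > threshold


if __name__ == "__main__":
    percent = disk_used_percent("/")
    print(f"Disk usage for '/': {percent:.1f}%")
    print("Over 90% threshold:", is_over_threshold(percent, 90))
```

The same comparison applies to CPU and memory percentages once they have been collected.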

Python’s rich ecosystem, including libraries like psutil, makes it an excellent choice for accessing these system-level metrics in a cross-platform manner. psutil (process and system utilities) provides an interface to retrieve information on processes and system utilization (CPU, memory, disks, network, sensors) in a portable way by implementing much of the functionality of well-known Unix and Windows command-line tools.

Telegram bots offer a convenient and accessible way to receive alerts. The Telegram Bot API allows applications to interact with Telegram users and groups programmatically, sending messages, notifications, and other content. This enables instant notifications directly to a mobile device or desktop via the Telegram application.
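To make the interaction with the Bot API concrete, the sketch below builds (without sending) the HTTP request for the sendMessage method. It uses the standard library's urllib purely for illustration. The helper name and the placeholder token and chat ID are assumptions, not real credentials.

```python
import json
import urllib.request


def build_send_message_request(token, chat_id, text):
    """Build (but do not send) a POST request for the Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder credentials: substitute real values before sending
    req = build_send_message_request("123456789:EXAMPLE", 123456789, "Disk usage is high")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```

Keeping request construction separate from sending makes the payload easy to inspect before any network traffic happens.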

Implementing Server Health Monitoring and Telegram Alerts with Python#

Setting up this monitoring system involves creating a Telegram bot, writing a Python script to check server metrics and send messages, and scheduling the script to run periodically.

Step 1: Creating a Telegram Bot and Obtaining API Credentials#

A Telegram bot is required to send messages to a user or group.

Find BotFather: Open the Telegram application and search for the user @BotFather. This is the official bot used to create and manage other bots.
Create a New Bot: Start a chat with @BotFather and use the command /newbot. Follow the instructions:
- Choose a name for the bot (e.g., “ServerMonitorBot”).
- Choose a unique username ending in “bot” (e.g., “MyServerMonitor_bot”).
Obtain the API Token: Upon successful creation, @BotFather will provide an HTTP API token. This token is essential for sending messages via the bot. Keep this token secure, as it grants control over the bot. Example token format: 123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEF.
Obtain the Chat ID: The bot needs to know where to send messages. This requires obtaining the chat ID of the conversation with the bot (either a direct message with the bot or a group where the bot has been added and given permission to send messages).
- Start a conversation with your new bot or add it to a group.
- Send a message to the bot (or in the group where the bot is present).
- Access the following URL in a web browser, replacing <YourBOTToken> with your actual bot token: https://api.telegram.org/bot<YourBOTToken>/getUpdates
- This will return a JSON object. Look for the chat object within the message entry. The id field within the chat object is the required chat ID. It will be a large number, possibly negative for group chats. Example: "id": 123456789.
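Reading the chat ID out of the getUpdates response by eye can be error-prone, so the following sketch extracts it programmatically. The sample response is hypothetical and trimmed to the fields used here, and the function name is illustrative. Real responses contain additional fields.

```python
import json

# Hypothetical getUpdates response, trimmed to the fields used in this sketch
SAMPLE_RESPONSE = """
{
  "ok": true,
  "result": [
    {"update_id": 1,
     "message": {"message_id": 10,
                 "chat": {"id": 123456789, "type": "private"},
                 "text": "hello"}},
    {"update_id": 2,
     "message": {"message_id": 11,
                 "chat": {"id": -1001234567890, "type": "supergroup", "title": "Ops"},
                 "text": "hi"}}
  ]
}
"""


def extract_chats(response_text):
    """Return a dict mapping chat ID to chat type for every message update."""
    data = json.loads(response_text)
    chats = {}
    for update in data.get("result", []):
        chat = update.get("message", {}).get("chat")
        if chat:
            chats[chat["id"]] = chat.get("type", "unknown")
    return chats


if __name__ == "__main__":
    for chat_id, chat_type in extract_chats(SAMPLE_RESPONSE).items():
        print(chat_id, chat_type)
```

If the result list is empty, sending a fresh message to the bot and querying again usually produces an update to read.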

Step 2: Installing Required Python Libraries#

The Python script will utilize the psutil library to access server metrics and the requests library to interact with the Telegram Bot API.

Install these libraries using pip:

```
pip install psutil requests
```

Ensure pip is installed and updated if necessary.

Step 3: Writing the Python Monitoring Script#

The core of the system is a Python script that performs the monitoring checks and sends alerts.

```python
import json
import os
import socket
import time

import psutil
import requests

# --- Configuration ---
# Credentials are read from environment variables rather than hard-coded
TELEGRAM_BOT_TOKEN = os.environ.get('TELEGRAM_BOT_TOKEN')
TELEGRAM_CHAT_ID = os.environ.get('TELEGRAM_CHAT_ID')

# Define thresholds (percentages)
CPU_THRESHOLD = 80
RAM_THRESHOLD = 90
DISK_THRESHOLD = 90  # Percentage of disk used

# Define time intervals (seconds) to avoid spamming alerts
ALERT_COOLDOWN_SECONDS = 3600  # 1 hour cooldown per alert type

# Each scheduled run is a new process, so the last alert times are persisted
# in a small JSON file next to the script (change the location if needed)
STATE_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'alert_state.json')

# --- Functions ---

def load_last_alert_times():
    """Loads the last alert time per alert type, defaulting to 0 (never alerted)."""
    state = {'cpu': 0, 'ram': 0, 'disk': 0}
    try:
        with open(STATE_FILE) as f:
            state.update(json.load(f))
    except (OSError, ValueError, TypeError):
        pass  # Missing or unreadable state file: treat as no previous alerts
    return state

def save_last_alert_times(state):
    """Persists the last alert times so the cooldown survives between runs."""
    try:
        with open(STATE_FILE, 'w') as f:
            json.dump(state, f)
    except OSError as e:
        print(f"Could not save alert state: {e}")

# Store the last time an alert was sent for each type
last_alert_time = load_last_alert_times()

def send_telegram_message(message):
    """Sends a message to the configured Telegram chat."""
    if not TELEGRAM_BOT_TOKEN or not TELEGRAM_CHAT_ID:
        print("Telegram token or chat ID not configured.")
        return False

    api_url = f"https://api.telegram.org/bot{TELEGRAM_BOT_TOKEN}/sendMessage"
    try:
        # A timeout keeps a stalled connection from hanging the scheduled run
        response = requests.post(
            api_url,
            json={'chat_id': TELEGRAM_CHAT_ID, 'text': message},
            timeout=10,
        )
        response.raise_for_status()  # Raise an HTTPError for bad responses (4xx or 5xx)
        print(f"Message sent successfully: {message}")
        return True
    except requests.exceptions.RequestException as e:
        print(f"Error sending message: {e}")
        return False

def check_threshold_and_alert(alert_type, current_value, threshold, unit="%", disk_partition="/"):
    """Checks if a value exceeds a threshold and sends an alert if cooldown allows."""
    current_time = time.time()

    if current_value > threshold:
        # Check cooldown
        if current_time - last_alert_time[alert_type] > ALERT_COOLDOWN_SECONDS:
            server_hostname = socket.gethostname()  # Get hostname for context
            # Plain text: no parse_mode is sent, so Markdown markup would appear literally
            alert_message = f"🚨 ALERT: High {alert_type.upper()} Usage on {server_hostname}!"
            if alert_type == 'disk':
                alert_message += f" Partition: {disk_partition}."
            alert_message += f"\nCurrent Value: {current_value:.2f}{unit} (Threshold: {threshold}{unit})."
            alert_message += "\nPlease investigate immediately."

            if send_telegram_message(alert_message):
                last_alert_time[alert_type] = current_time  # Update cooldown timer
                save_last_alert_times(last_alert_time)
                return True  # Alert sent
            return False  # Sending failed; the next run will try again
        else:
            print(f"Threshold for {alert_type} exceeded ({current_value:.2f}{unit}) but still in cooldown.")
            return False  # Threshold exceeded but no alert sent due to cooldown
    else:
        print(f"{alert_type.upper()} usage is normal ({current_value:.2f}{unit}).")
        return False  # Threshold not exceeded

def monitor_server():
    """Monitors server health metrics and sends alerts if thresholds are exceeded."""
    print("Starting server health check...")

    # Check CPU Usage
    # interval=1 takes a 1-second average, instead of returning a potentially misleading instantaneous value
    cpu_percent = psutil.cpu_percent(interval=1)
    check_threshold_and_alert('cpu', cpu_percent, CPU_THRESHOLD)

    # Check RAM Usage
    mem_info = psutil.virtual_memory()
    ram_percent = mem_info.percent
    check_threshold_and_alert('ram', ram_percent, RAM_THRESHOLD)

    # Check Disk Usage (root partition '/')
    try:
        disk_info = psutil.disk_usage('/')
        disk_percent = disk_info.percent
        check_threshold_and_alert('disk', disk_percent, DISK_THRESHOLD, disk_partition="/")
    except Exception as e:
        # Handle cases where '/' might not be a valid partition or other disk errors
        print(f"Error checking disk usage for '/': {e}")
        # Potentially send an alert about monitoring failure itself?

    # You can add checks for other partitions if needed
    # Example: psutil.disk_usage('/var').percent

    print("Server health check finished.")

# --- Main execution ---
if __name__ == "__main__":
    # It's safer to load credentials from environment variables
    # export TELEGRAM_BOT_TOKEN='your_token_here'
    # export TELEGRAM_CHAT_ID='your_chat_id_here'
    if not TELEGRAM_BOT_TOKEN or not TELEGRAM_CHAT_ID:
        # Exit with a non-zero status so the failure is visible to the scheduler
        raise SystemExit("FATAL: TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID environment variables must be set.")
    monitor_server()
```

Code Explanation:

Configuration: Reads the Telegram bot token and chat ID from environment variables for security. Sets the thresholds for CPU, RAM, and disk usage as percentages. Includes an ALERT_COOLDOWN_SECONDS value to prevent sending too many alerts for a persistent issue within a short period, and a STATE_FILE path where the last alert times are stored.
load_last_alert_times() and save_last_alert_times(state): Read and write the last alert time for each alert type as JSON. Because the scheduler starts a new process for every check, an in-memory dictionary alone would be reset on each run and the cooldown would never take effect; persisting the timestamps to a file avoids this.
send_telegram_message(message): A helper function that takes a string message, constructs the Telegram API URL, and sends a POST request using the requests library with a 10-second timeout. Includes basic error handling.
check_threshold_and_alert(...): This function checks if the current_value exceeds the defined threshold. If it does, it checks the last_alert_time for that specific alert_type. If the cooldown period has passed, it formats a plain-text alert message including the server’s hostname (using socket.gethostname(), which works on both Unix and Windows) and sends it via send_telegram_message. The cooldown timer is updated and saved only when the message was actually delivered.
monitor_server(): The main monitoring logic. It uses psutil.cpu_percent(), psutil.virtual_memory().percent, and psutil.disk_usage('/').percent to get the current usage percentages. It then calls check_threshold_and_alert for each metric to determine if an alert is necessary.
Main Execution Block (if __name__ == "__main__":): Ensures the monitor_server() function is called only when the script is executed directly. It first checks that the environment variables are set and exits with an error if they are not.

Security Note: Storing sensitive information like API tokens directly in the script is not recommended, especially if the script might be shared or version-controlled. Using environment variables (as shown) or a configuration file with restricted permissions is a better practice.
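For the configuration-file option, one possible approach is sketched below: prefer environment variables, fall back to a simple KEY=VALUE file, and refuse to read the file if group or other users can access it. The function name, file format, and file name are assumptions made for illustration, and the permission check relies on POSIX permission bits.

```python
import os
import stat
import tempfile


def load_credentials(config_path, environ=None):
    """Return (token, chat_id), preferring environment variables over the file."""
    environ = os.environ if environ is None else environ
    token = environ.get("TELEGRAM_BOT_TOKEN")
    chat_id = environ.get("TELEGRAM_CHAT_ID")
    if token and chat_id:
        return token, chat_id

    # Refuse a file that group or other users can access (POSIX permission bits)
    mode = os.stat(config_path).st_mode
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{config_path} should be accessible by its owner only (chmod 600)")

    values = {}
    with open(config_path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    return (token or values.get("TELEGRAM_BOT_TOKEN"),
            chat_id or values.get("TELEGRAM_CHAT_ID"))


if __name__ == "__main__":
    # Demonstration with a throwaway file and placeholder values
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "monitor.conf")
        with open(path, "w") as f:
            f.write("TELEGRAM_BOT_TOKEN=placeholder-token\nTELEGRAM_CHAT_ID=123456789\n")
        os.chmod(path, 0o600)
        print(load_credentials(path, environ={}))
```

Failing loudly on loose permissions is a deliberate choice here: it surfaces a misconfigured file on the first run instead of silently using a token that other users can read.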

Step 4: Scheduling the Script#

For continuous monitoring, the Python script needs to be executed at regular intervals. On Linux systems, cron is a standard utility for scheduling tasks.

Open crontab: Open the crontab editor for the current user:
```
crontab -e
```
Add a cron job: Add a line to the crontab file to run the script periodically. For example, to run the script every 5 minutes:
```
*/5 * * * * /usr/bin/env python3 /path/to/your/script.py >> /var/log/server_monitor.log 2>&1
```

Explanation of the cron entry:

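*/5 * * * *: This specifies the schedule. The five fields are minute, hour, day of the month, month, and day of the week. */5 in the minute field means every 5 minutes, and * in the remaining fields matches every value.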
/usr/bin/env python3: Ensures the script is executed using the python3 interpreter, relying on the system’s environment to find the correct Python executable.
/path/to/your/script.py: The absolute path to the Python script file.
>> /var/log/server_monitor.log 2>&1: Redirects standard output and standard error to a log file. This is helpful for debugging. Ensure the log file path is writable by the user running the cron job.

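Remember to make the necessary environment variables (TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID) available to the cron job. Cron runs jobs with a minimal environment and does not read the user’s .bashrc or .profile, so variables exported there are not visible to the script. Instead, on most cron implementations they can be defined on their own lines at the top of the crontab (e.g., TELEGRAM_BOT_TOKEN=...), or added directly in the crontab entry itself (though less clean), e.g., TELEGRAM_BOT_TOKEN='...' TELEGRAM_CHAT_ID='...' /usr/bin/env python3 .... In both cases the values are stored in plain text in the crontab, so make sure only the intended user can read it.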
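When a cron job fails silently, missing environment variables are a common cause. The following minimal sketch reports which required variables are absent. The helper name is an assumption, and the variable names are the two used in this article.

```python
import os

REQUIRED_VARS = ("TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running this from the same crontab as the monitoring script, with its output redirected to the log file, shows what the job actually sees.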

On systems using systemd, systemd timers offer a more modern and robust alternative to cron.

Real-World Applications and Considerations#

This Python-based monitoring system provides a flexible foundation applicable in various scenarios:

Small-Scale Deployments: For monitoring a few critical servers in a small business or personal project where setting up a full-fledged monitoring suite is overkill.
Specific Resource Checks: Monitoring only the most crucial metrics for a particular application (e.g., ensuring sufficient disk space for a database log partition).
Custom Alerts: Tailoring alert messages with specific details or triggering different actions based on the type and severity of the issue.
Ephemeral Environments: Quickly deploying basic monitoring in cloud instances or containers that might not persist long-term monitoring agents.

Example: Monitoring a simple web server. A script like the one above can be scheduled to run every 5 minutes. If a sudden traffic surge or misconfiguration causes high CPU usage (above 80%), the script detects this on its next run, so detection can lag the onset by up to 5 minutes. If the cooldown for CPU alerts has passed, an alert message like “🚨 ALERT: High CPU Usage on webserver-01! Current Value: 85.50% (Threshold: 80%). Please investigate immediately.” is sent to the Telegram recipient during that run, prompting action to prevent downtime or performance issues. Similarly, if a log file fills up a disk partition, the disk usage check triggers an alert.
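The interplay between the 5-minute schedule and the 1-hour cooldown in this scenario can be traced with a small simulation. This is a sketch under simplified assumptions: times are seconds since the first run, sending always succeeds, and the function names are illustrative.

```python
def should_alert(value, threshold, last_alert_time, now, cooldown):
    """True when the value exceeds the threshold and the cooldown has elapsed."""
    return value > threshold and (now - last_alert_time) > cooldown


def simulate(readings, threshold=80, cooldown=3600, interval=300):
    """Replay readings taken every `interval` seconds; return the alert times."""
    last_alert_time = float("-inf")  # no alert has been sent yet
    alerts = []
    for i, value in enumerate(readings):
        now = i * interval
        if should_alert(value, threshold, last_alert_time, now, cooldown):
            alerts.append(now)
            last_alert_time = now
    return alerts


if __name__ == "__main__":
    # 14 consecutive runs, 5 minutes apart, with CPU stuck at 85.5%
    print(simulate([85.5] * 14))  # prints [0, 3900]
```

Because the comparison is strict, the run exactly 3600 seconds after the first alert stays silent and the next alert goes out at 3900 seconds, so a persistent problem produces roughly one message per hour rather than one every 5 minutes.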

Limitations: While powerful for targeted checks, this simple approach has limitations compared to dedicated monitoring platforms:

No Historical Data/Graphing: It primarily provides point-in-time checks and alerts, lacking capabilities for collecting, storing, and visualizing historical performance data for trend analysis.
Lack of Centralization: Managing scripts on many servers becomes complex. Centralized solutions allow monitoring from a single dashboard.
Agent Management: Scripts need to be deployed and updated on each server.
Advanced Monitoring: Does not inherently support complex checks like application-specific metrics, log analysis, dependency mapping, or sophisticated anomaly detection.
Alert Routing/Escalation: Lacks built-in features for routing alerts to different teams or escalating issues if unaddressed.

Despite these limitations, the Python + Telegram approach offers a quick, understandable, and effective way to implement custom server health checks and ensure timely notifications for basic resource issues.

Key Takeaways#

Server health monitoring is critical for detecting issues before they cause significant impact.
Python, with libraries like psutil, provides a flexible way to access server resource metrics programmatically.
Telegram bots offer a free and convenient platform for receiving real-time server alerts.
The implementation involves creating a Telegram bot, writing a Python script using psutil and requests, and scheduling the script (e.g., with cron).
Thresholds for CPU, RAM, and Disk usage must be configured to trigger alerts based on acceptable limits.
Implementing a cooldown period is essential to prevent alert spamming for persistent issues.
Securing the Telegram bot token and chat ID, ideally via environment variables, is important.
This method is suitable for simple, custom monitoring tasks on a small scale but lacks features of full monitoring systems like historical data analysis or centralized management.