How to Use Python and the Twitter API to Analyze Tweet Engagement Over Time

Analyzing Tweet Engagement Over Time with Python and the Twitter API

Tweet engagement analysis over time provides valuable insights into content performance, audience behavior, and the effectiveness of social media strategies. Engagement refers to the interactions users have with tweets, including likes, retweets, replies, and quotes. Understanding how these interactions fluctuate over different periods helps identify trends, optimal posting schedules, and successful content types. Analyzing this data programmatically, particularly with Python and the Twitter API, allows for scalable, automated, and in-depth exploration of these patterns.

The Twitter API (Application Programming Interface) is a set of rules and specifications that allow software applications to interact with Twitter’s data and functionality. It provides structured access to tweets, user information, and engagement metrics, making it a powerful tool for researchers, developers, and analysts. Python, with its rich ecosystem of libraries like tweepy for interacting with the Twitter API, pandas for data manipulation, and matplotlib or seaborn for data visualization, offers a flexible environment to perform this analysis.

Essential Concepts in Tweet Engagement Analysis

Analyzing tweet engagement over time relies on understanding key concepts:

  • Tweet Engagement Metrics: The primary data points representing interaction. The Twitter API v2 provides detailed counts for:
    • like_count: Number of times a tweet was liked.
    • retweet_count: Number of times a tweet was retweeted.
    • reply_count: Number of replies to a tweet.
    • quote_count: Number of times a tweet was quoted.
    • impression_count (sometimes referred to as views): Number of times a tweet was seen. This metric is often indicative of reach, while the others measure active interaction.
  • Time Series Data: Tweet data, inherently linked to a created_at timestamp, forms a time series. Analyzing engagement over time involves observing how metrics change based on creation time, day of the week, hour of the day, or specific date ranges.
  • Data Aggregation: Raw tweet data is often too granular. Aggregating engagement metrics by hour, day, week, or month reveals broader trends and reduces noise. For example, summing total likes for all tweets posted on a specific day.
  • Trend Identification: Observing patterns in aggregated engagement data, such as consistent peaks on certain days or a gradual increase in engagement following a campaign launch.
  • Python Libraries:
    • Tweepy: A user-friendly library for accessing the Twitter API. It handles authentication and provides methods for fetching tweets and user data.
    • Pandas: Essential for structuring collected tweet data into DataFrames, enabling efficient cleaning, transformation, and aggregation.
    • Matplotlib/Seaborn: Libraries for creating static visualizations of engagement trends over time.

Prerequisites and Setup

To begin analyzing tweet engagement with Python and the Twitter API, several prerequisites are necessary:

  1. Twitter Developer Account: Obtain access to the Twitter API by applying for a developer account via the Twitter Developer Portal. This process requires agreeing to Twitter’s terms of service.

  2. API Credentials: Within the developer project, create an application to generate API keys and tokens (Consumer Key, Consumer Secret, Access Token, Access Token Secret, or Bearer Token, depending on API version and use case). These credentials authenticate requests to the API. Store these securely and never expose them publicly.

  3. Python Environment: Have Python installed on a system. A virtual environment is recommended to manage project dependencies.

  4. Required Libraries: Install the necessary Python libraries using pip:

    pip install tweepy pandas matplotlib seaborn

Step-by-Step Guide: Using Python and the Twitter API

Analyzing tweet engagement over time involves several distinct steps, from collecting data to visualization.

Step 1: Connecting to the Twitter API using Tweepy

API credentials obtained from the Twitter Developer Portal are used to establish a connection. The Bearer Token is suitable for making read-only requests, such as fetching tweets and their engagement metrics.

import tweepy
import os  # Recommended for securely storing keys

# Read the Bearer Token from an environment variable - never hard-code credentials
bearer_token = os.environ.get("TWITTER_BEARER_TOKEN")
if not bearer_token:
    raise SystemExit("Error: TWITTER_BEARER_TOKEN environment variable not set.")

client = tweepy.Client(bearer_token)

Step 2: Defining the Data Collection Strategy

Determine which tweets to analyze (e.g., tweets from a specific user’s timeline, tweets matching a search query) and the desired time frame. The Twitter API v2 supports filtering by date and time. Specifying the tweet_fields parameter is crucial to ensure engagement metrics are included in the response.

  • User Timeline: Fetch tweets from a specific user ID.
  • Search Tweets: Fetch tweets based on keywords, hashtags, mentions, etc.

Parameters like start_time and end_time (ISO 8601 format) define the analysis window. tweet_fields should include 'public_metrics' to get engagement counts.
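For example, a minimal sketch of a recent-search request using these parameters might look like the following. The query string and date range are placeholders, search_recent_tweets only covers roughly the last seven days, and the snippet reuses the client object created in Step 1.

import datetime

# Placeholder query: recent original tweets mentioning a hashtag
query = "#python -is:retweet"

# The recent search endpoint requires end_time to be slightly before the request time
end_time = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(seconds=30)
start_time = end_time - datetime.timedelta(days=3)

response = client.search_recent_tweets(
    query=query,
    start_time=start_time,
    end_time=end_time,
    tweet_fields=["created_at", "public_metrics"],
    max_results=100,
)

# response.data is None when no tweets match the query
for tweet in response.data or []:
    print(tweet.created_at, tweet.public_metrics["like_count"])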

Step 3: Collecting Tweet Data

Use the Tweepy client to fetch tweets. The get_users_tweets method is used for timelines, and search_recent_tweets or search_all_tweets (full-archive search, which requires a higher access tier such as Academic Research access) for search queries. Pagination is necessary to collect more than a single page of results (10 tweets per request by default, up to 100 via max_results).

This example fetches recent tweets from a user timeline:

import datetime
import pytz  # Use pytz for timezone-aware datetimes

user_id = 2244994945  # Example: TwitterDev user ID

# Define time frame (e.g., last 30 days)
utc = pytz.UTC
end_time = utc.localize(datetime.datetime.utcnow())
start_time = end_time - datetime.timedelta(days=30)

tweet_list = []
try:
    for tweet in tweepy.Paginator(client.get_users_tweets,
                                  id=user_id,
                                  start_time=start_time,
                                  end_time=end_time,
                                  tweet_fields=['created_at', 'public_metrics'],
                                  max_results=100).flatten():  # max_results can be 5-100
        tweet_list.append({
            'id': tweet.id,
            'text': tweet.text,
            'created_at': tweet.created_at,
            'like_count': tweet.public_metrics['like_count'],
            'retweet_count': tweet.public_metrics['retweet_count'],
            'reply_count': tweet.public_metrics['reply_count'],
            'quote_count': tweet.public_metrics['quote_count'],
            # Impressions might not always be available depending on the data
            'impression_count': tweet.public_metrics.get('impression_count', 0)
        })
except Exception as e:
    print(f"Error collecting tweets: {e}")

print(f"Collected {len(tweet_list)} tweets.")

This code iterates through pages of results using tweepy.Paginator and appends relevant data to a list.

Step 4: Structuring the Data with Pandas

Convert the list of dictionaries into a Pandas DataFrame for efficient data manipulation.

import pandas as pd
tweets_df = pd.DataFrame(tweet_list)
print("DataFrame Head:")
print(tweets_df.head())
print("\nDataFrame Info:")
print(tweets_df.info())

Step 5: Data Cleaning and Preprocessing

Ensure the created_at column is in datetime format and set it as the DataFrame index for time-based operations.

# Ensure created_at is a datetime object and set as index
tweets_df['created_at'] = pd.to_datetime(tweets_df['created_at'])
tweets_df = tweets_df.set_index('created_at')
print("\nDataFrame Info after processing timestamp and index:")
print(tweets_df.info())

Step 6: Analyzing Engagement Over Time

Aggregate the data by different time periods. Resampling is a common Pandas operation for this. For example, resampling by day (‘D’) allows analysis of daily engagement.

# Aggregate engagement metrics by day
daily_engagement = tweets_df[['like_count', 'retweet_count', 'reply_count', 'quote_count', 'impression_count']].resample('D').sum()
# Calculate average engagement per tweet per day (handle potential division by zero if no tweets on a day)
tweets_per_day = tweets_df.resample('D').size()
average_daily_engagement = daily_engagement.divide(tweets_per_day, axis=0).fillna(0) # Fill NaN with 0 for days with no tweets
print("\nDaily Total Engagement:")
print(daily_engagement.head())
print("\nDaily Average Engagement per Tweet:")
print(average_daily_engagement.head())
# Analyze engagement by hour of the day (across the entire period)
tweets_df['hour_of_day'] = tweets_df.index.hour
hourly_engagement = tweets_df.groupby('hour_of_day')[['like_count', 'retweet_count', 'reply_count', 'quote_count', 'impression_count']].mean()
print("\nAverage Engagement by Hour of Day:")
print(hourly_engagement)

Visualize the aggregated data to identify trends. Line plots are effective for showing changes over time, while bar plots can compare engagement across categories like hours or days of the week.

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style="whitegrid")
# Plot daily total engagement
plt.figure(figsize=(12, 6))
sns.lineplot(data=daily_engagement[['like_count', 'retweet_count', 'reply_count', 'quote_count']]) # Excluding impressions for clarity on interaction counts
plt.title('Total Tweet Engagement Metrics Over Time (Daily)')
plt.xlabel('Date')
plt.ylabel('Total Count')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
# Plot average engagement by hour of day (e.g., just likes and retweets)
# Reshape to long form so each hour gets a grouped bar per metric
hourly_long = hourly_engagement[['like_count', 'retweet_count']].reset_index().melt(
    id_vars='hour_of_day', var_name='metric', value_name='average_count')
plt.figure(figsize=(10, 5))
sns.barplot(data=hourly_long, x='hour_of_day', y='average_count', hue='metric')
plt.title('Average Tweet Engagement by Hour of Day')
plt.xlabel('Hour of Day (24-hour format)')
plt.ylabel('Average Count per Tweet')
plt.tight_layout()
plt.show()

These plots visually represent the daily volume of interactions and the average engagement received by tweets based on the hour they were posted.
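Day-of-week comparisons work the same way. The following is a minimal sketch that reuses the tweets_df DataFrame built earlier; the day_of_week column is derived here purely for illustration.

# Average engagement by day of week (0 = Monday, 6 = Sunday)
tweets_df['day_of_week'] = tweets_df.index.dayofweek
weekday_engagement = tweets_df.groupby('day_of_week')[['like_count', 'retweet_count']].mean()

# Reshape to long form so each weekday gets a grouped bar per metric
weekday_long = weekday_engagement.reset_index().melt(
    id_vars='day_of_week', var_name='metric', value_name='average_count')
plt.figure(figsize=(10, 5))
sns.barplot(data=weekday_long, x='day_of_week', y='average_count', hue='metric')
plt.title('Average Tweet Engagement by Day of Week')
plt.xlabel('Day of Week (0 = Monday)')
plt.ylabel('Average Count per Tweet')
plt.tight_layout()
plt.show()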

Real-World Example: Analyzing a Brand’s Twitter Performance

Consider a hypothetical company, “Tech Innovations Inc.,” that wants to understand the performance of its tweets over the past quarter (3 months). They primarily use Twitter for product announcements, industry news sharing, and customer engagement.

Using the steps outlined:

  1. Connect to API: Use the company’s Twitter Developer account credentials to connect via Tweepy.
  2. Collect Data: Fetch all tweets from the Tech Innovations Inc. Twitter handle (@TechInnovInc) for the specified 3-month period using get_users_tweets, ensuring public_metrics are included.
  3. Structure Data: Load the collected tweets into a Pandas DataFrame with created_at and engagement counts.
  4. Analyze Over Time:
    • Daily Trends: Resample the data by day to see how total and average likes, retweets, etc., vary throughout the quarter. A spike in retweets on a specific day might correlate with a product launch announcement.
    • Weekly Patterns: Resample by week (‘W’) to see whether engagement is consistently higher or lower in certain weeks (a sketch follows this list).
    • Hourly Patterns: Group tweets by the hour they were posted to find which hours tend to receive the highest average engagement. This could reveal optimal posting times.
  5. Visualize: Generate line plots for daily/weekly trends and bar plots for hourly averages.
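As a minimal sketch of the weekly aggregation mentioned above, reusing the tweets_df DataFrame and column names from the earlier steps:

# Total and average engagement per week ('W' creates weekly bins)
weekly_totals = tweets_df[['like_count', 'retweet_count', 'reply_count', 'quote_count']].resample('W').sum()
tweets_per_week = tweets_df.resample('W').size()
weekly_average = weekly_totals.divide(tweets_per_week, axis=0).fillna(0)

print("Weekly Total Engagement:")
print(weekly_totals.head())
print("Weekly Average Engagement per Tweet:")
print(weekly_average.head())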

Potential Insights from this analysis:

  • A clear peak in retweets and likes aligns with the date of a major product announcement, quantifying its immediate social impact.
  • Average engagement metrics (likes, retweets) are consistently higher for tweets posted between 10 AM and 2 PM local time, suggesting this is when the target audience is most active.
  • Tweets containing links to blog posts receive moderate likes but low retweets, while tweets with embedded videos have higher impressions and likes but fewer replies.
  • Engagement dipped during a specific week, correlating with a known issue or lack of significant content.

This analysis provides evidence-based insights that Tech Innovations Inc. can use to refine its content strategy (e.g., promote videos more, test posting major announcements during peak hours, analyze content types that perform best).

Interpreting Results and Actionable Insights

Interpreting the results of tweet engagement analysis over time requires context and correlation with external events or internal actions.

  • Correlation with Events: Did engagement spike after a specific campaign, announcement, or trending topic? Did it drop during quiet periods or negative events?
  • Content Type Performance: Analyze subsets of tweets (e.g., tweets with images vs. text-only, tweets about product A vs. product B) to see which content formats or topics drive higher engagement over time. This requires adding content analysis (e.g., checking for media or keywords) to the process; a sketch follows this list.
  • Optimal Timing: The analysis of engagement by hour and day provides data to inform posting schedules. Higher average engagement at certain times suggests that posting at those times might reach a more active audience.
  • Audience Behavior: Changes in engagement patterns over longer periods might reflect shifts in audience online habits or evolving interests.
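A minimal sketch of that kind of content tagging, assuming the tweets_df DataFrame from the earlier steps. The text patterns are illustrative; a more robust version might request the API’s entities or attachments fields instead.

# Tag tweets by simple content features and compare average engagement
tweets_df['has_link'] = tweets_df['text'].str.contains(r'https?://', regex=True)
tweets_df['is_reply_style'] = tweets_df['text'].str.startswith('@')

print(tweets_df.groupby('has_link')[['like_count', 'retweet_count', 'reply_count']].mean())
print(tweets_df.groupby('is_reply_style')[['like_count', 'retweet_count', 'reply_count']].mean())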

Actionable Steps Based on Insights:

  • Adjust Posting Schedule: If analysis shows peak engagement at specific hours, test posting critical content during those times.
  • Refine Content Strategy: Focus on content types or topics that historically receive higher engagement. Experiment with formats that underperform to improve them or reduce their frequency.
  • Measure Campaign Impact: Use time-based analysis to quantify the engagement driven by specific marketing campaigns or initiatives.
  • Benchmarking: Compare current engagement trends against historical data or industry averages (where available) to measure growth or identify areas for improvement.

Limitations:

  • API Rate Limits: The Twitter API limits the number of requests per time window, which can restrict the volume of data collected, especially for historical analysis (full-archive search via search_all_tweets has higher caps but requires a higher access tier).
  • Data Availability: The availability of historical data can be limited depending on the API tier.
  • Defining “Engagement”: While metrics are provided, the true value of engagement depends on specific goals (e.g., is a retweet more valuable than a like?). Analysis often involves creating a custom engagement score; a sketch follows this list.
  • Causation vs. Correlation: Identifying that engagement is high at a certain time doesn’t definitively prove that time causes the high engagement; other factors like the content posted at that time are also critical.
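For example, a custom score might weight interaction types differently. This is a minimal sketch with arbitrary, illustrative weights (not a standard formula), reusing the tweets_df DataFrame from the earlier steps:

import numpy as np

# Hypothetical weighted engagement score; the weights are illustrative assumptions
weights = {'like_count': 1, 'retweet_count': 3, 'reply_count': 2, 'quote_count': 3}
tweets_df['engagement_score'] = sum(tweets_df[col] * w for col, w in weights.items())

# Engagement rate per impression; zero impressions are treated as missing to avoid division by zero
impressions = tweets_df['impression_count'].replace(0, np.nan)
tweets_df['engagement_rate'] = tweets_df['engagement_score'] / impressions

print(tweets_df[['engagement_score', 'engagement_rate']].describe())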

Key Takeaways

Analyzing tweet engagement over time with Python and the Twitter API provides data-driven insights into social media performance.

  • API Access: A Twitter Developer account and API credentials are required to collect tweet data programmatically.
  • Python Libraries: Tweepy connects to the API, Pandas structures and manipulates data, and Matplotlib/Seaborn visualize trends.
  • Data Collection: Fetch tweets within a specified date range, ensuring public_metrics are requested via the API.
  • Time-Based Analysis: Convert tweet timestamps to datetime objects and use Pandas resampling or grouping to aggregate engagement metrics by hour, day, or week.
  • Visualization: Plotting aggregated metrics reveals patterns, such as daily fluctuations or peak hours for engagement.
  • Actionable Insights: Interpret trends in conjunction with posting activity and external events to refine content strategy, optimize posting times, and measure the impact of initiatives.