2188 words
11 minutes
Creating a Python Script to Bulk Rename Files Using Regex

Automating File Renaming with Python and Regular Expressions#

Managing large collections of files often requires renaming them systematically. This process can be time-consuming and prone to manual error when performed individually. Automating file renaming, especially for bulk operations, offers significant efficiency gains and ensures consistency. Combining Python, a versatile scripting language, with regular expressions (regex), a powerful tool for pattern matching in text, provides a robust solution for complex renaming tasks. This approach allows for defining precise rules to transform filenames based on their existing structure or content, handling scenarios far beyond simple find-and-replace operations.

Why Use Python and Regex for Bulk Renaming?#

Manually renaming files is feasible for a small number, but becomes impractical for hundreds or thousands. While some operating systems offer limited batch renaming capabilities, they often lack the flexibility required for sophisticated pattern-based changes.

  • Efficiency: Process large volumes of files in seconds or minutes, a task that could take hours manually.
  • Accuracy: Eliminate human error associated with repetitive manual tasks. Once the renaming logic is defined and tested, it applies consistently across all targeted files.
  • Flexibility: Regular expressions enable matching and manipulating complex filename patterns that simple wildcards cannot handle. This includes extracting specific parts of a name, reordering elements, inserting dates, or removing unwanted characters based on intricate rules.
  • Auditability: A script provides a clear record of how files were renamed. Changes can be easily reviewed and modified.

Python’s os module provides the necessary functions to interact with the file system (listing directories, constructing paths, renaming files), while the re module handles the powerful pattern matching and substitution capabilities of regular expressions.

Essential Concepts#

Successfully creating a Python script for bulk renaming using regex requires understanding a few core concepts.

File Paths and the os Module#

Files reside within a file system structure, typically organized into directories. A file’s path indicates its location within this hierarchy. Paths can be absolute (starting from the root directory) or relative (starting from the current working directory).

The os module in Python provides functions for interacting with the operating system, including file system operations. Key functions for renaming tasks include:

  • os.listdir(path): Returns a list of entries (files and directories) in the specified path.
  • os.path.join(path, name): Constructs a full path by joining a directory path and a filename or subdirectory name, handling system-specific path separators (\ on Windows, / on macOS/Linux). This is crucial for building the full paths required by renaming functions.
  • os.path.isfile(path): Checks if a given path points to a file.
  • os.rename(src, dst): Renames the file or directory src to dst. Both arguments must be full paths.

Regular Expressions and the re Module#

Regular expressions are sequences of characters that define a search pattern. They are invaluable for finding, matching, and manipulating text based on these patterns. The re module in Python provides functions for working with regex.

Basic regex concepts relevant to renaming include:

  • Literal Characters: Match themselves (e.g., a, 1, _).
  • Metacharacters: Special characters with specific meanings:
    • .: Matches any single character (except newline).
    • *: Matches the preceding element zero or more times.
    • +: Matches the preceding element one or more times.
    • ?: Matches the preceding element zero or one time (makes it optional).
    • ^: Matches the start of the string.
    • $: Matches the end of the string.
    • \d: Matches any digit (0-9).
    • \w: Matches any word character (alphanumeric + underscore).
    • \s: Matches any whitespace character.
  • Character Sets:
    • [abc]: Matches any single character ‘a’, ‘b’, or ‘c’.
    • [a-z]: Matches any lowercase letter.
    • [^abc]: Matches any character except ‘a’, ‘b’, or ‘c’.
  • Grouping and Capturing:
    • (pattern): Groups parts of the pattern together. Captured groups can be referenced later.
  • Backreferences:
    • \1, \2, etc.: Reference the text matched by the 1st, 2nd, etc., captured group in the replacement string.

The primary re function used for renaming is re.sub(pattern, repl, string). This function finds all occurrences of pattern in string and replaces them with repl. The repl argument can be a string, which can include backreferences like \1 to refer to captured groups from the pattern.

Developing the Python Rename Script#

Creating a script to bulk rename files involves defining the directory to work in, iterating through its contents, applying a regex pattern to each potential filename, constructing a new filename based on the regex match and replacement rules, and finally performing the rename operation.

Step-by-Step Script Construction#

Here is a breakdown of the process and a complete script example.

Step 1: Import Necessary Modules

import os
import re

This line imports the os module for file system interaction and the re module for regular expressions.

Step 2: Define the Target Directory and Patterns

Specify the path to the directory containing the files to be renamed. Define the regular expression pattern to search for within filenames and the string used for replacement.

# Define the directory containing the files
target_directory = './test_files' # Example: Use a relative path
# Define the regex pattern to match
# Example: Find 'copy_' followed by one or more digits (\d+)
# Capture the digits in a group ()
pattern_to_match = r"copy_(\d+)\.txt$"
# Define the replacement string
# Example: Replace with 'renamed_file_' followed by the captured digits (\1)
# and the original extension
replacement_string = r"renamed_file_\1.txt"

Using raw strings (r"...") for regex patterns is good practice to prevent backslashes from being interpreted as escape sequences by Python itself.

Step 3: Implement a Dry Run Mode (Highly Recommended)

Before performing any actual renaming, it is critical to see what changes the script would make. A dry run mode prints the planned renames without executing os.rename().

dry_run = True # Set to False to perform actual renaming
print(f"Processing directory: {target_directory}")
if dry_run:
print("--- Dry Run Mode: No files will be renamed ---")

Step 4: Iterate Through Files and Apply Renaming Logic

Walk through the items in the directory. For each item, check if it’s a file. If it is, attempt to apply the regex pattern. If the pattern matches, construct the new filename and perform the rename (or print the planned rename in dry run mode).

try:
# List all entries in the directory
with os.scandir(target_directory) as entries:
for entry in entries:
# Only process files
if entry.is_file():
original_name = entry.name
original_path = entry.path # Full path
# Attempt to apply the regex pattern
match = re.search(pattern_to_match, original_name)
# If the pattern matches the filename
if match:
# Construct the new filename using the replacement string
# re.sub is convenient as it handles backreferences like \1
new_name = re.sub(pattern_to_match, replacement_string, original_name)
# Construct the full path for the new file name
new_path = os.path.join(target_directory, new_name)
# Avoid renaming if the new name is the same as the original
if original_name == new_name:
print(f"Skipping '{original_name}': New name is the same.")
continue
print(f"Planning to rename '{original_name}' to '{new_name}'")
# Perform the rename if not in dry run mode
if not dry_run:
try:
os.rename(original_path, new_path)
print(f"Renamed '{original_name}' to '{new_name}'")
except OSError as e:
print(f"Error renaming '{original_name}': {e}")
else:
# Optional: Report files that did not match the pattern
# print(f"'{original_name}' did not match the pattern.")
pass # Keep silent for non-matching files by default
except FileNotFoundError:
print(f"Error: Directory not found at '{target_directory}'")
except Exception as e:
print(f"An unexpected error occurred: {e}")
if dry_run:
print("--- Dry Run Complete ---")
else:
print("--- Renaming Complete ---")

Using os.scandir() is generally more efficient for iterating directory contents than os.listdir() followed by os.path.join() and os.path.isfile() in separate steps, especially for large directories.

Complete Script Example#

Here is the complete script incorporating the steps above.

import os
import re
# --- Configuration ---
# Define the directory containing the files to rename
target_directory = './test_files' # Change this to your directory path
# Define the regex pattern to match parts of the filename
# Example: Match files like 'document_part1_v1.txt' and capture 'part1' and '1'
# pattern_to_match = r"^document_(\w+)_v(\d+)\.txt$"
# Example: Match 'IMG_XXXX.jpg' or 'img_XXXX.JPG' and capture the digits
pattern_to_match = r"^(?i)IMG_(\d+)\.(jpg|jpeg|png)$" # (?i) makes it case-insensitive
# Define the replacement string using captured groups (\1, \2, etc.)
# Example for document pattern: Replace with 'report_v\2_\1.txt' -> 'report_v1_part1.txt'
# Example for IMG pattern: Replace with 'Photo_\1_processed.\2' -> 'Photo_XXXX_processed.jpg'
replacement_string = r"Photo_\1_processed.\2"
# Set to True to simulate renaming without actually changing files
# Set to False to perform the actual rename operation
dry_run = True
# --- End Configuration ---
print(f"Processing directory: {target_directory}")
if dry_run:
print("--- Dry Run Mode: No files will be renamed ---")
else:
print("--- Renaming Mode: Files WILL be renamed ---")
try:
# Check if the target directory exists
if not os.path.isdir(target_directory):
print(f"Error: Directory not found or is not a directory at '{target_directory}'")
else:
# List all entries in the directory
with os.scandir(target_directory) as entries:
file_count = 0
renamed_count = 0
for entry in entries:
# Only process files
if entry.is_file():
file_count += 1
original_name = entry.name
original_path = entry.path # Full path
# Attempt to apply the regex pattern
# Using re.search to find the pattern anywhere in the name
# Use re.match if the pattern must match from the start of the name
match = re.search(pattern_to_match, original_name)
# If the pattern matches the filename
if match:
# Construct the new filename using the replacement string and captured groups
# re.sub is ideal as it finds the pattern and replaces based on the template string
new_name = re.sub(pattern_to_match, replacement_string, original_name)
# Construct the full path for the new file name
new_path = os.path.join(target_directory, new_name)
# Add checks before renaming
if original_name == new_name:
# print(f"Skipping '{original_name}': New name is the same.")
continue # Skip if name doesn't change
# Check if a file with the new name already exists (basic check)
# More robust collision handling might be needed for complex cases
if os.path.exists(new_path) and original_path != new_path:
print(f"Skipping '{original_name}': Target file '{new_name}' already exists.")
continue
print(f"'{original_name}' -> '{new_name}'")
# Perform the rename if not in dry run mode
if not dry_run:
try:
os.rename(original_path, new_path)
renamed_count += 1
except OSError as e:
print(f"Error renaming '{original_name}': {e}")
# else:
# Optional: Report files that did not match the pattern
# print(f"'{original_name}' did not match the pattern.")
# pass # Keep silent for non-matching files by default
print(f"\nFinished processing {file_count} files.")
if not dry_run:
print(f"Successfully renamed {renamed_count} files.")
except Exception as e:
print(f"An unexpected error occurred: {e}")
if dry_run:
print("--- Dry Run Complete ---")
else:
print("--- Renaming Complete ---")

Key Safety Considerations:

  • Dry Run: Always run the script in dry run mode first to verify the planned changes are correct before executing the actual rename operations.
  • Backup: Back up files before performing bulk rename operations. Mistakes in the regex or script logic can lead to unintended consequences.
  • Error Handling: Include try...except blocks to catch potential errors during file operations (e.g., file permissions issues, file not found during rename if it was moved or deleted).
  • Path Construction: Always use os.path.join() to construct paths to ensure compatibility across different operating systems.
  • Overwrite Prevention: The example script includes a basic check (os.path.exists) to prevent overwriting an existing file with the target name, unless the original file is the target file itself. More sophisticated collision handling might involve adding suffixes (_1, _2) or timestamps.

Real-World Application Examples#

Python’s flexibility combined with regex power allows handling diverse renaming scenarios.

Example 1: Standardizing Photo Filenames#

Digital cameras often name photos with a prefix followed by a sequence number, like IMG_1234.jpg. A common need is to replace the generic prefix with something more descriptive, possibly including the captured sequence number.

  • Original names: IMG_0001.jpg, IMG_0002.jpg, IMG_0003.png
  • Goal: Vacation_Hawaii_0001.jpg, Vacation_Hawaii_0002.jpg, Vacation_Hawaii_0003.png
  • pattern_to_match: r"^(?i)IMG_(\d+)\.(jpg|jpeg|png)$"
    • ^: Start of the string.
    • (?i): Case-insensitive flag (matches IMG_ or img_).
    • IMG_: Matches the literal string “IMG_”.
    • (\d+): Captures one or more digits (the sequence number) into group 1.
    • \.: Matches a literal dot (needs escaping).
    • (jpg|jpeg|png): Captures ‘jpg’, ‘jpeg’, or ‘png’ into group 2.
    • $: End of the string.
  • replacement_string: r"Vacation_Hawaii_\1.\2"
    • Vacation_Hawaii_: Literal prefix.
    • \1: Inserts the text captured by group 1 (the sequence number).
    • \.: Inserts a literal dot.
    • \2: Inserts the text captured by group 2 (the original extension).

Example 2: Reordering and Simplifying Downloaded Reports#

Suppose reports are downloaded with names like Report_Status_20231027.csv or Report-Summary-2023-11-15.xlsx, and the goal is to standardize the format to YYYY-MM-DD_Report_Type.csv.

  • Original names: Report_Status_20231027.csv, Report-Summary-2023-11-15.xlsx
  • Goal: 2023-10-27_Report_Status.csv, 2023-11-15_Report_Summary.xlsx
  • pattern_to_match: r"^Report[_ -](Status|Summary)[_ -](\d{4})[_ -]?(\d{2})[_ -]?(\d{2})\.(csv|xlsx)$"
    • ^Report: Matches ‘Report’ at the start.
    • [_ -]: Matches a single underscore, space, or hyphen.
    • (Status|Summary): Captures ‘Status’ or ‘Summary’ into group 1.
    • [_ -]: Another separator match.
    • (\d{4}): Captures 4 digits (year) into group 2.
    • [_ -]?: Optional separator.
    • (\d{2}): Captures 2 digits (month) into group 3.
    • [_ -]?: Optional separator.
    • (\d{2}): Captures 2 digits (day) into group 4.
    • \.: Matches a literal dot.
    • (csv|xlsx): Captures ‘csv’ or ‘xlsx’ into group 5.
    • $: End of the string.
  • replacement_string: r"\2-\3-\4_Report_\1.\5"
    • \2-\3-\4: Inserts captured year, month, day with hyphens.
    • _Report_: Literal string.
    • \1: Inserts the captured report type (Status/Summary).
    • \.: Inserts a dot.
    • \5: Inserts the captured extension.

These examples demonstrate how regex allows for complex pattern matching and extraction, while the Python script orchestrates the file system interaction and application of the renaming rule to multiple files.

Key Takeaways#

  • Bulk renaming files with Python and regex is an efficient and accurate alternative to manual methods.
  • The os module in Python provides the necessary functions for interacting with the file system, such as listing directories and renaming files using full paths.
  • Regular expressions, handled by Python’s re module, offer powerful pattern matching capabilities to identify specific parts of filenames.
  • The re.sub() function is particularly useful as it finds a pattern and substitutes it with a replacement string, allowing the use of captured groups from the original match.
  • A dry run mode is essential for testing the renaming logic before making permanent changes to files.
  • Using os.path.join() ensures script compatibility across different operating systems.
  • Implementing error handling and potentially collision detection increases script robustness.
  • Regular expressions provide the flexibility to handle complex renaming rules like reordering filename components, extracting data, or changing formats based on sophisticated patterns.
Creating a Python Script to Bulk Rename Files Using Regex
https://dev-resources.site/posts/creating-a-python-script-to-bulk-rename-files-using-regex/
Author
Dev-Resources
Published at
2025-06-30
License
CC BY-NC-SA 4.0