Automate Data Backups with Python: Never Lose Important Files
You know you should be backing up your files. Your project folders, documents, configuration files—all that work sitting on a single drive. One hardware failure, one accidental deletion, and it's gone.
Let's build a Python backup system that handles this automatically. Set it up once, and your important files stay protected.
What You'll Learn
- Copying and syncing files with Python
- Creating compressed backup archives
- Implementing backup rotation (keeping recent backups, deleting old ones)
- Scheduling automated backups
- Logging and notifications for backup status
Prerequisites
- Python 3.8 or higher
- No external libraries required for basic backups
- Optional: cloud storage SDK for remote backups
The Problem
Manual backups fail because:
- You forget to do them
- They're tedious, so you skip them
- You don't know what's backed up and what isn't
- Old backups pile up and fill your drive
- No verification that backups actually work
The Solution
An automated backup script that:
- Copies important folders to a backup location
- Creates compressed archives with timestamps
- Rotates old backups automatically
- Logs every operation
- Runs on a schedule without intervention
Step 1: Simple File Copy Backup
Let's start with the basics—copying folders to a backup location:
```python
import shutil
from pathlib import Path
from datetime import datetime


def simple_backup(source, destination):
    """
    Create a simple copy backup of a folder.

    Args:
        source: Path to folder to backup
        destination: Path to backup location

    Returns:
        Path to the created backup
    """
    source = Path(source)
    destination = Path(destination)

    if not source.exists():
        raise FileNotFoundError(f"Source not found: {source}")

    # Create timestamped backup folder name
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup_name = f"{source.name}_{timestamp}"
    backup_path = destination / backup_name

    # Copy the folder
    print(f"Backing up: {source}")
    print(f"To: {backup_path}")

    shutil.copytree(source, backup_path)

    print("✅ Backup complete!")
    return backup_path


# Example usage
backup_path = simple_backup(
    source="/home/user/Documents/Projects",
    destination="/media/backup_drive/backups"
)
```
Step 2: Compressed Backups
Save space with compressed archives:
```python
import shutil
from pathlib import Path
from datetime import datetime


def compressed_backup(source, destination, format='zip'):
    """
    Create a compressed backup archive.

    Args:
        source: Path to folder to backup
        destination: Path to backup location
        format: Archive format ('zip', 'tar', 'gztar', 'bztar')

    Returns:
        Path to the created archive
    """
    source = Path(source)
    destination = Path(destination)

    if not source.exists():
        raise FileNotFoundError(f"Source not found: {source}")

    destination.mkdir(parents=True, exist_ok=True)

    # Create archive name with timestamp
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    archive_name = f"{source.name}_{timestamp}"
    archive_path = destination / archive_name

    print(f"Creating {format} archive of: {source}")

    # Create the archive
    result = shutil.make_archive(
        str(archive_path),
        format,
        root_dir=source.parent,
        base_dir=source.name
    )

    # Get file size
    size_mb = Path(result).stat().st_size / (1024 * 1024)
    print(f"✅ Archive created: {result} ({size_mb:.1f} MB)")

    return Path(result)


# Example usage
archive = compressed_backup(
    source="/home/user/Documents",
    destination="/media/backup/archives",
    format='gztar'  # .tar.gz - good compression
)
```
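Before trusting any archive step, it's worth proving the roundtrip works. Here's a self-contained sketch that archives a throwaway folder and restores it with `shutil.unpack_archive` (all paths are temporary, so it's safe to run anywhere):

```python
import shutil
import tempfile
from pathlib import Path

# Build a throwaway source tree
work = Path(tempfile.mkdtemp())
source = work / "photos"
source.mkdir()
(source / "note.txt").write_text("hello")

# Archive it the same way compressed_backup does
archive = shutil.make_archive(
    str(work / "photos_backup"), 'gztar',
    root_dir=source.parent, base_dir=source.name
)

# Restore and compare - a backup you can't restore isn't a backup
restored = work / "restored"
shutil.unpack_archive(archive, restored)
assert (restored / "photos" / "note.txt").read_text() == "hello"
```

`unpack_archive` infers the format from the extension, so the same call restores zip, tar, and gzipped tar archives alike.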
Step 3: Incremental Backup (Sync)
Only copy changed files—much faster for regular backups:
```python
import shutil
from pathlib import Path


def sync_backup(source, destination, delete_extra=False):
    """
    Sync backup - only copy new or modified files.

    Args:
        source: Path to folder to backup
        destination: Path to backup location
        delete_extra: Remove files in destination not in source

    Returns:
        Dictionary with backup statistics
    """
    source = Path(source)
    destination = Path(destination)

    if not source.exists():
        raise FileNotFoundError(f"Source not found: {source}")

    destination.mkdir(parents=True, exist_ok=True)

    stats = {
        "files_copied": 0,
        "files_skipped": 0,
        "files_deleted": 0,
        "bytes_copied": 0,
    }

    # Walk through source directory
    for src_path in source.rglob("*"):
        if src_path.is_dir():
            continue

        # Calculate destination path
        rel_path = src_path.relative_to(source)
        dst_path = destination / rel_path

        # Check if copy is needed
        should_copy = False

        if not dst_path.exists():
            should_copy = True
        else:
            # Compare modification times
            src_mtime = src_path.stat().st_mtime
            dst_mtime = dst_path.stat().st_mtime

            if src_mtime > dst_mtime:
                should_copy = True

        if should_copy:
            # Create parent directories
            dst_path.parent.mkdir(parents=True, exist_ok=True)

            # Copy file
            shutil.copy2(src_path, dst_path)
            stats["files_copied"] += 1
            stats["bytes_copied"] += src_path.stat().st_size
        else:
            stats["files_skipped"] += 1

    # Optionally delete extra files in destination
    if delete_extra:
        for dst_path in destination.rglob("*"):
            if dst_path.is_dir():
                continue

            rel_path = dst_path.relative_to(destination)
            src_path = source / rel_path

            if not src_path.exists():
                dst_path.unlink()
                stats["files_deleted"] += 1

    return stats


# Example usage
stats = sync_backup(
    source="/home/user/Documents",
    destination="/media/backup/Documents_sync"
)
print(f"Files copied: {stats['files_copied']}")
print(f"Files skipped (unchanged): {stats['files_skipped']}")
```
Step 4: Backup Rotation
Keep recent backups, automatically delete old ones:
```python
import shutil
from pathlib import Path
from datetime import datetime, timedelta


def rotate_backups(backup_dir, keep_daily=7, keep_weekly=4, keep_monthly=3):
    """
    Rotate backups - keep recent ones, delete old ones.

    Args:
        backup_dir: Directory containing backup files
        keep_daily: Number of daily backups to keep
        keep_weekly: Number of weekly backups to keep
        keep_monthly: Number of monthly backups to keep

    Returns:
        List of deleted backup paths
    """
    backup_dir = Path(backup_dir)

    if not backup_dir.exists():
        return []

    # Get all backup files/folders with timestamps
    backups = []
    for item in backup_dir.iterdir():
        # Extract timestamp from name (format: name_YYYYMMDD_HHMMSS)
        try:
            parts = item.stem.split('_')
            date_str = parts[-2]
            time_str = parts[-1].split('.')[0]
            timestamp = datetime.strptime(f"{date_str}_{time_str}", "%Y%m%d_%H%M%S")
            backups.append((item, timestamp))
        except (ValueError, IndexError):
            continue

    # Sort by date (newest first)
    backups.sort(key=lambda x: x[1], reverse=True)

    now = datetime.now()
    keep = set()

    # Keep recent daily backups
    daily_kept = 0
    for item, timestamp in backups:
        if daily_kept >= keep_daily:
            break
        if now - timestamp < timedelta(days=keep_daily):
            keep.add(item)
            daily_kept += 1

    # Keep weekly backups (one per week)
    weeks_seen = set()
    for item, timestamp in backups:
        week = timestamp.strftime("%Y-%W")
        if week not in weeks_seen and len(weeks_seen) < keep_weekly:
            keep.add(item)
            weeks_seen.add(week)

    # Keep monthly backups (one per month)
    months_seen = set()
    for item, timestamp in backups:
        month = timestamp.strftime("%Y-%m")
        if month not in months_seen and len(months_seen) < keep_monthly:
            keep.add(item)
            months_seen.add(month)

    # Delete backups not in keep set
    deleted = []
    for item, _ in backups:
        if item not in keep:
            if item.is_dir():
                shutil.rmtree(item)
            else:
                item.unlink()
            deleted.append(item)
            print(f"🗑️ Deleted old backup: {item.name}")

    print(f"Kept {len(keep)} backups, deleted {len(deleted)}")
    return deleted
```
The Complete Backup Script
```python
#!/usr/bin/env python3
"""
Automated Backup System - Protect your important files automatically.
Author: Alex Rodriguez

This script creates compressed backups of specified folders,
manages backup rotation, and logs all operations.
"""

import json
import logging
import os
import shutil
import sys
from datetime import datetime, timedelta
from pathlib import Path


# ========================================
# CONFIGURATION
# ========================================

# Folders to backup
BACKUP_SOURCES = [
    {
        "name": "Documents",
        "path": Path.home() / "Documents",
        "enabled": True,
    },
    {
        "name": "Projects",
        "path": Path.home() / "Projects",
        "enabled": True,
    },
    {
        "name": "Config",
        "path": Path.home() / ".config",
        "enabled": True,
        "exclude": ["cache", "Cache", "tmp"],
    },
]

# Backup destination
BACKUP_DESTINATION = Path("/media/backup/automated_backups")
# Alternative: BACKUP_DESTINATION = Path.home() / "Backups"

# Backup settings
BACKUP_FORMAT = "gztar"  # Options: zip, tar, gztar, bztar
KEEP_DAILY = 7
KEEP_WEEKLY = 4
KEEP_MONTHLY = 6

# Logging
LOG_FILE = Path.home() / ".backup_log.txt"


# ========================================
# LOGGING SETUP
# ========================================

def setup_logging():
    """Configure logging."""
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s | %(levelname)-8s | %(message)s',
        handlers=[
            logging.FileHandler(LOG_FILE),
            logging.StreamHandler(sys.stdout)
        ]
    )
    return logging.getLogger(__name__)


logger = setup_logging()


# ========================================
# BACKUP FUNCTIONS
# ========================================

def get_folder_size(path):
    """Calculate total size of a folder in bytes."""
    total = 0
    for item in Path(path).rglob("*"):
        if item.is_file():
            total += item.stat().st_size
    return total


def format_size(bytes_size):
    """Format bytes as human-readable string."""
    for unit in ['B', 'KB', 'MB', 'GB']:
        if bytes_size < 1024:
            return f"{bytes_size:.1f} {unit}"
        bytes_size /= 1024
    return f"{bytes_size:.1f} TB"


def create_backup(source_config, destination):
    """
    Create a compressed backup of a source folder.

    Args:
        source_config: Dictionary with source configuration
        destination: Base backup destination path

    Returns:
        Path to created backup or None if failed
    """
    name = source_config["name"]
    source_path = Path(source_config["path"])
    exclude = source_config.get("exclude", [])

    if not source_path.exists():
        logger.warning(f"Source not found, skipping: {source_path}")
        return None

    # Create backup destination folder for this source
    backup_dir = destination / name
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Generate backup filename with timestamp
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    archive_name = f"{name}_{timestamp}"
    archive_path = backup_dir / archive_name

    logger.info(f"Backing up: {name}")
    logger.info(f"  Source: {source_path}")

    # Get source size
    source_size = get_folder_size(source_path)
    logger.info(f"  Size: {format_size(source_size)}")

    try:
        # If we have exclusions, we need to copy to temp first
        if exclude:
            import tempfile
            with tempfile.TemporaryDirectory() as temp_dir:
                temp_source = Path(temp_dir) / source_path.name

                # Copy with exclusions
                shutil.copytree(
                    source_path,
                    temp_source,
                    ignore=shutil.ignore_patterns(*exclude)
                )

                # Create archive from temp
                result = shutil.make_archive(
                    str(archive_path),
                    BACKUP_FORMAT,
                    root_dir=temp_dir,
                    base_dir=source_path.name
                )
        else:
            # Create archive directly
            result = shutil.make_archive(
                str(archive_path),
                BACKUP_FORMAT,
                root_dir=source_path.parent,
                base_dir=source_path.name
            )

        result_path = Path(result)
        backup_size = result_path.stat().st_size

        logger.info(f"  ✅ Created: {result_path.name} ({format_size(backup_size)})")

        return result_path

    except Exception as e:
        logger.error(f"  ❌ Backup failed: {e}")
        return None


def rotate_backups(backup_dir):
    """
    Rotate backups - keep recent ones, delete old ones.

    Args:
        backup_dir: Directory containing backup files

    Returns:
        Number of backups deleted
    """
    backup_dir = Path(backup_dir)

    if not backup_dir.exists():
        return 0

    # Collect all backup files with their timestamps
    backups = []
    for item in backup_dir.iterdir():
        if not item.is_file():
            continue

        try:
            # Parse timestamp from filename (.stem only strips the last
            # extension, so drop any leftover '.tar' before parsing)
            parts = item.stem.split('_')
            if len(parts) >= 3:
                date_str = parts[-2]
                time_str = parts[-1].split('.')[0]
                timestamp = datetime.strptime(f"{date_str}_{time_str}", "%Y%m%d_%H%M%S")
                backups.append((item, timestamp))
        except (ValueError, IndexError):
            continue

    if not backups:
        return 0

    # Sort newest first
    backups.sort(key=lambda x: x[1], reverse=True)

    keep = set()

    # Keep daily backups
    for i, (item, timestamp) in enumerate(backups):
        if i < KEEP_DAILY:
            keep.add(item)

    # Keep weekly backups (one per week)
    weeks_seen = set()
    for item, timestamp in backups:
        week_key = timestamp.strftime("%Y-W%W")
        if week_key not in weeks_seen:
            keep.add(item)
            weeks_seen.add(week_key)
            if len(weeks_seen) >= KEEP_WEEKLY:
                break

    # Keep monthly backups (one per month)
    months_seen = set()
    for item, timestamp in backups:
        month_key = timestamp.strftime("%Y-%m")
        if month_key not in months_seen:
            keep.add(item)
            months_seen.add(month_key)
            if len(months_seen) >= KEEP_MONTHLY:
                break

    # Delete old backups
    deleted = 0
    for item, timestamp in backups:
        if item not in keep:
            item.unlink()
            logger.info(f"  🗑️ Deleted old backup: {item.name}")
            deleted += 1

    return deleted


def verify_backup(backup_path):
    """
    Verify a backup archive is readable.

    Args:
        backup_path: Path to backup archive

    Returns:
        True if backup is valid
    """
    import tarfile
    import zipfile

    backup_path = Path(backup_path)

    try:
        if backup_path.suffix == '.zip':
            with zipfile.ZipFile(backup_path, 'r') as zf:
                # Test archive integrity
                bad_file = zf.testzip()
                return bad_file is None
        elif backup_path.suffix in ['.tar', '.gz', '.bz2'] or '.tar' in backup_path.name:
            with tarfile.open(backup_path, 'r:*') as tf:
                # List contents to verify
                tf.getnames()
            return True
    except Exception as e:
        logger.error(f"Backup verification failed: {e}")

    return False


def get_backup_summary(destination):
    """Generate summary of all backups."""
    summary = {
        "total_size": 0,
        "backup_count": 0,
        "sources": {}
    }

    destination = Path(destination)

    if not destination.exists():
        return summary

    for source_dir in destination.iterdir():
        if not source_dir.is_dir():
            continue

        source_summary = {
            "count": 0,
            "size": 0,
            "latest": None,
            "oldest": None,
        }

        backups = list(source_dir.glob("*"))
        source_summary["count"] = len(backups)

        for backup in backups:
            if backup.is_file():
                size = backup.stat().st_size
                source_summary["size"] += size
                summary["total_size"] += size
                summary["backup_count"] += 1

                # Track newest and oldest (as raw mtimes, converted below)
                mtime = backup.stat().st_mtime
                if source_summary["latest"] is None or mtime > source_summary["latest"]:
                    source_summary["latest"] = mtime
                if source_summary["oldest"] is None or mtime < source_summary["oldest"]:
                    source_summary["oldest"] = mtime

        if source_summary["latest"] is not None:
            source_summary["latest"] = datetime.fromtimestamp(source_summary["latest"])
            source_summary["oldest"] = datetime.fromtimestamp(source_summary["oldest"])

        summary["sources"][source_dir.name] = source_summary

    return summary


# ========================================
# MAIN
# ========================================

def run_backup():
    """Run the complete backup process."""
    logger.info("=" * 60)
    logger.info("AUTOMATED BACKUP STARTING")
    logger.info(f"Time: {datetime.now()}")
    logger.info(f"Destination: {BACKUP_DESTINATION}")
    logger.info("=" * 60)

    # Ensure destination exists
    BACKUP_DESTINATION.mkdir(parents=True, exist_ok=True)

    # Track results
    results = {
        "successful": [],
        "failed": [],
        "rotated": 0,
    }

    # Process each source
    for source in BACKUP_SOURCES:
        if not source.get("enabled", True):
            logger.info(f"Skipping disabled source: {source['name']}")
            continue

        logger.info("-" * 40)

        # Create backup
        backup_path = create_backup(source, BACKUP_DESTINATION)

        if backup_path:
            # Verify backup
            if verify_backup(backup_path):
                logger.info("  ✓ Verified")
                results["successful"].append(source["name"])
            else:
                logger.warning("  ⚠️ Verification failed!")
                results["failed"].append(source["name"])
        else:
            results["failed"].append(source["name"])

        # Rotate old backups for this source
        source_backup_dir = BACKUP_DESTINATION / source["name"]
        deleted = rotate_backups(source_backup_dir)
        results["rotated"] += deleted

    # Summary
    logger.info("\n" + "=" * 60)
    logger.info("BACKUP COMPLETE")
    logger.info("=" * 60)
    logger.info(f"Successful: {len(results['successful'])}")
    logger.info(f"Failed: {len(results['failed'])}")
    logger.info(f"Old backups deleted: {results['rotated']}")

    # Show storage summary
    summary = get_backup_summary(BACKUP_DESTINATION)
    logger.info(f"\nTotal backup storage: {format_size(summary['total_size'])}")
    logger.info(f"Total backup files: {summary['backup_count']}")

    return results


def main():
    """Main entry point."""
    try:
        run_backup()
    except KeyboardInterrupt:
        logger.info("\nBackup cancelled by user")
    except Exception as e:
        logger.exception(f"Backup failed: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
```
How to Run This Script
- Save the script as `backup_system.py`
- Configure your backup sources in the `BACKUP_SOURCES` list
- Set your backup destination in `BACKUP_DESTINATION`
- Run manually:

```bash
python backup_system.py
```

- Expected output:

```
============================================================
AUTOMATED BACKUP STARTING
Time: 2025-12-02 10:30:00
Destination: /media/backup/automated_backups
============================================================
----------------------------------------
Backing up: Documents
  Source: /home/user/Documents
  Size: 2.3 GB
  ✅ Created: Documents_20251202_103000.tar.gz (1.8 GB)
  ✓ Verified
----------------------------------------
Backing up: Projects
  Source: /home/user/Projects
  Size: 5.1 GB
  ✅ Created: Projects_20251202_103045.tar.gz (3.2 GB)
  ✓ Verified
============================================================
BACKUP COMPLETE
============================================================
Successful: 2
Failed: 0
Old backups deleted: 3

Total backup storage: 25.6 GB
Total backup files: 15
```

- Schedule automatic backups with cron:

```bash
# Run daily at 2 AM
0 2 * * * /usr/bin/python3 /path/to/backup_system.py
```
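If cron isn't available (on Windows, Task Scheduler is the native option), a long-running stdlib loop is a workable fallback. This is a minimal sketch; it assumes `run_backup` from the script above is importable, shown here commented out:

```python
import time
from datetime import datetime, timedelta


def seconds_until(hour, minute=0):
    """Seconds from now until the next occurrence of hour:minute."""
    now = datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # next run is tomorrow
    return (target - now).total_seconds()


def run_daily(job, hour=2):
    """Sleep until `hour`, run `job`, repeat."""
    while True:
        time.sleep(seconds_until(hour))
        job()

# run_daily(run_backup)  # blocks forever; run it under a service manager
```

This approach loses runs while the machine is off, which is exactly why cron or Task Scheduler is preferable when available.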
Customization Options
Backup to Cloud Storage
```python
# pip install boto3
import boto3


def upload_to_s3(local_path, bucket, s3_key):
    """Upload backup to AWS S3."""
    s3 = boto3.client('s3')
    s3.upload_file(str(local_path), bucket, s3_key)
    logger.info(f"Uploaded to S3: s3://{bucket}/{s3_key}")
```
Email Notification
```python
def send_backup_report(results):
    """Send email with backup results."""
    subject = f"Backup Report - {datetime.now().strftime('%Y-%m-%d')}"

    body = f"""
    Backup completed at {datetime.now()}

    Successful: {len(results['successful'])}
    Failed: {len(results['failed'])}

    {'⚠️ FAILURES: ' + ', '.join(results['failed']) if results['failed'] else '✅ All backups successful'}
    """

    # Use email sending code from our email automation guide
```
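The email guide referenced above isn't included here, so as a stopgap, here's a minimal stdlib `smtplib` sketch. The host, port, and addresses are placeholders — real use typically needs `SMTP_SSL` or `starttls()` plus a login:

```python
import smtplib
from datetime import datetime
from email.message import EmailMessage


def build_report(results, sender, recipient):
    """Build the report message (addresses are placeholders)."""
    msg = EmailMessage()
    msg["Subject"] = f"Backup Report - {datetime.now():%Y-%m-%d}"
    msg["From"] = sender
    msg["To"] = recipient
    status = ("⚠️ FAILURES: " + ", ".join(results["failed"])
              if results["failed"] else "✅ All backups successful")
    msg.set_content(
        f"Successful: {len(results['successful'])}\n"
        f"Failed: {len(results['failed'])}\n"
        f"{status}\n"
    )
    return msg


def send_report(msg, host="localhost", port=25):
    """Hand the message to an SMTP server (host/port are assumptions)."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)
```

Separating message-building from sending keeps the report testable without a mail server.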
Exclude Patterns
```python
BACKUP_SOURCES = [
    {
        "name": "Projects",
        "path": Path.home() / "Projects",
        "enabled": True,
        "exclude": [
            "node_modules",
            "__pycache__",
            ".git",
            "*.pyc",
            ".env",
            "venv",
        ],
    },
]
```
Common Issues & Solutions
| Issue | Solution |
|---|---|
| Permission denied | Run with sudo or check folder permissions |
| Disk full | Reduce retention settings; check available space |
| Backup too slow | Use sync backup instead of full; exclude large folders |
| Archive corrupted | Always verify backups; check disk health |
| Can't find backup drive | Check mount point; use absolute paths |
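For the "disk full" row in particular, you can fail fast instead of discovering the problem mid-archive. A small sketch using the stdlib `shutil.disk_usage`; the 1.2x headroom factor is an assumption, since compressed output is usually smaller than the source:

```python
import shutil


def has_free_space(path, required_bytes, headroom=1.2):
    """True if path's filesystem has at least required_bytes * headroom free."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes * headroom
```

Calling this with the source folder's size before `create_backup` lets the script log a clear error and skip the source rather than die halfway through an archive.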
Taking It Further
Encrypted Backups
```python
# pip install cryptography
from cryptography.fernet import Fernet


def encrypt_backup(backup_path, key):
    """Encrypt a backup file."""
    fernet = Fernet(key)

    with open(backup_path, 'rb') as f:
        data = f.read()

    encrypted = fernet.encrypt(data)

    encrypted_path = backup_path.with_suffix(backup_path.suffix + '.enc')
    with open(encrypted_path, 'wb') as f:
        f.write(encrypted)

    return encrypted_path
```
Backup Integrity Checking
```python
import hashlib
import json
from pathlib import Path


def calculate_checksum(filepath):
    """Calculate SHA-256 checksum of file."""
    sha256 = hashlib.sha256()

    with open(filepath, 'rb') as f:
        while chunk := f.read(8192):
            sha256.update(chunk)

    return sha256.hexdigest()


def save_checksums(backup_dir):
    """Save checksums for all backups."""
    checksums = {}

    for backup in Path(backup_dir).rglob("*"):
        if backup.is_file() and not backup.name.endswith('.checksums'):
            checksums[backup.name] = calculate_checksum(backup)

    checksum_file = Path(backup_dir) / "backups.checksums"
    with open(checksum_file, 'w') as f:
        json.dump(checksums, f, indent=2)
```
Conclusion
You've built a complete backup system. Your important files are now automatically copied, compressed, verified, and rotated—all without any manual intervention.
The key is running this regularly. Set up a scheduled task, and your data is protected. Hardware failures, accidental deletions, ransomware—you're covered.
Start with the basic setup, then customize. Add cloud backup for off-site protection. Add encryption for sensitive data. Add email notifications so you know everything's working.
Your data is too valuable to leave unprotected.
Backup today, sleep well tonight.