Python Expense Tracking: Automate Receipt Scanning and Categorization

You're drowning in receipts. Coffee shop receipts from client meetings. Lunch receipts. Gas receipts. Office supply receipts. They're stuffed in your wallet, scattered in your car, photographed on your phone. It's the last day of the month, and you need to submit your expense report by 5 PM.

You spend three hours manually entering each expense into a spreadsheet, trying to decipher faded thermal ink, matching dates to credit card statements, and categorizing everything correctly. By the time you're done, you've missed two important meetings and your eyes hurt.

There has to be a better way.

What if Python could scan your receipt photos, extract the key information automatically, categorize each expense intelligently, and generate your complete expense report in under 60 seconds?

Let's build exactly that system.

What You'll Build

By the end of this tutorial, you'll have a Python application that:

Scans receipt images from a folder or email
Extracts data using OCR (Optical Character Recognition)
Parses key information: merchant name, date, total amount, items purchased
Categorizes expenses automatically using rule-based logic and AI
Detects duplicates to prevent double-entry
Generates reports in Excel, CSV, or PDF format
Tracks expenses over time with summaries and analytics

Time savings: 3 hours per month → 5 minutes per month

Prerequisites

Python 3.8 or higher
Basic Python knowledge (functions, loops, dictionaries)
Smartphone or scanner to capture receipt images
Tesseract OCR installed on your system

The Problem with Manual Expense Tracking

Manual expense tracking is painful because:

Time-consuming: Entering 50+ receipts takes hours
Error-prone: Typos, wrong categories, misread amounts
Tedious: Repetitive data entry kills motivation
Inconsistent: Different formats and categorization each month
Delayed: You put it off until deadline day

According to a 2025 workplace productivity study, professionals spend an average of 4.7 hours per month on expense tracking and reporting—that's 56 hours per year, or more than a full work week.

Step 1: Setting Up Your Environment

Install required libraries:

bash

1pip install pytesseract opencv-python pillow pandas openpyxl python-dotenv openai

Install Tesseract OCR:

Windows:

bash

1# Download installer from: https://github.com/UB-Mannheim/tesseract/wiki
2# Install, then add to PATH: C:\Program Files\Tesseract-OCR

Mac:

bash

1brew install tesseract

Linux:

bash

1sudo apt-get install tesseract-ocr

Create project structure:

Prompt

expense_tracker/
├── main.py
├── ocr_processor.py
├── expense_categorizer.py
├── report_generator.py
├── receipts/
│   └── (your receipt images)
├── processed/
│   └── (moved after processing)
├── output/
│   └── (generated reports)
└── expenses.db

Step 2: Building the OCR Receipt Scanner

Create ocr_processor.py:

python

1import pytesseract
2from PIL import Image
3import cv2
4import numpy as np
5import re
6from datetime import datetime
7from pathlib import Path
8
9class ReceiptOCR:
10    def __init__(self, tesseract_path=None):
11        if tesseract_path:
12            pytesseract.pytesseract.tesseract_cmd = tesseract_path
13    
14    def preprocess_image(self, image_path):
15        """Enhance image quality for better OCR results"""
16        # Read image
17        img = cv2.imread(str(image_path))
18        
19        # Convert to grayscale
20        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
21        
22        # Apply thresholding to make text clearer
23        thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
24        
25        # Remove noise
26        denoised = cv2.fastNlMeansDenoising(thresh, None, 10, 7, 21)
27        
28        # Increase contrast
29        contrast = cv2.convertScaleAbs(denoised, alpha=1.5, beta=0)
30        
31        return contrast
32    
33    def extract_text(self, image_path):
34        """Extract text from receipt image using OCR"""
35        try:
36            # Preprocess for better accuracy
37            processed_img = self.preprocess_image(image_path)
38            
39            # Perform OCR
40            text = pytesseract.image_to_string(
41                processed_img,
42                config='--psm 6'  # Assume uniform block of text
43            )
44            
45            return text
46        except Exception as e:
47            print(f"Error processing {image_path}: {e}")
48            return None
49    
50    def parse_receipt(self, text, image_filename):
51        """Extract structured data from OCR text"""
52        if not text:
53            return None
54        
55        receipt_data = {
56            'filename': image_filename,
57            'raw_text': text,
58            'merchant': self._extract_merchant(text),
59            'date': self._extract_date(text),
60            'total': self._extract_total(text),
61            'items': self._extract_items(text),
62            'payment_method': self._extract_payment_method(text)
63        }
64        
65        return receipt_data
66    
67    def _extract_merchant(self, text):
68        """Extract merchant/store name from receipt"""
69        lines = text.split('\n')
70        
71        # Usually merchant name is in first few lines
72        # Look for common indicators
73        for i, line in enumerate(lines[:5]):
74            line_clean = line.strip()
75            
76            # Skip very short lines or lines with just numbers
77            if len(line_clean) < 3 or line_clean.isdigit():
78                continue
79            
80            # Common merchant indicators (customize for your region)
81            common_merchants = [
82                'STARBUCKS', 'WALMART', 'TARGET', 'AMAZON', 'SHELL',
83                'MCDONALD', 'SUBWAY', 'WHOLE FOODS', 'TRADER JOE',
84                'OFFICE DEPOT', 'STAPLES', 'COSTCO', 'HOME DEPOT'
85            ]
86            
87            line_upper = line_clean.upper()
88            for merchant in common_merchants:
89                if merchant in line_upper:
90                    return line_clean
91            
92            # If no match, return first substantial line
93            if i == 0 and len(line_clean) > 5:
94                return line_clean
95        
96        return "Unknown Merchant"
97    
98    def _extract_date(self, text):
99        """Extract transaction date from receipt"""
100        # Common date patterns
101        date_patterns = [
102            r'\d{1,2}/\d{1,2}/\d{2,4}',  # MM/DD/YYYY or M/D/YY
103            r'\d{1,2}-\d{1,2}-\d{2,4}',  # MM-DD-YYYY
104            r'\d{4}-\d{1,2}-\d{1,2}',    # YYYY-MM-DD
105            r'[A-Z][a-z]{2}\s+\d{1,2},?\s+\d{4}',  # Jan 15, 2026
106        ]
107        
108        for pattern in date_patterns:
109            match = re.search(pattern, text)
110            if match:
111                date_str = match.group()
112                
113                # Try to parse the date
114                for fmt in ['%m/%d/%Y', '%m/%d/%y', '%m-%d-%Y', '%Y-%m-%d', '%b %d, %Y']:
115                    try:
116                        return datetime.strptime(date_str, fmt).strftime('%Y-%m-%d')
117                    except ValueError:
118                        continue
119        
120        # Default to today if no date found
121        return datetime.now().strftime('%Y-%m-%d')
122    
123    def _extract_total(self, text):
124        """Extract total amount from receipt"""
125        # Look for total indicators
126        total_keywords = ['TOTAL', 'AMOUNT', 'BALANCE', 'GRAND TOTAL', 'AMOUNT DUE']
127        
128        lines = text.split('\n')
129        
130        for i, line in enumerate(lines):
131            line_upper = line.upper()
132            
133            # Check if line contains total keyword
134            if any(keyword in line_upper for keyword in total_keywords):
135                # Extract amount from this line or next few lines
136                search_lines = lines[i:i+3]
137                
138                for search_line in search_lines:
139                    # Find dollar amounts
140                    amounts = re.findall(r'\$?\s*(\d+\.\d{2})', search_line)
141                    
142                    if amounts:
143                        # Return the largest amount (likely the total)
144                        return max([float(amt) for amt in amounts])
145        
146        # If no "TOTAL" keyword, look for largest dollar amount at end
147        last_lines = lines[-10:]
148        all_amounts = []
149        
150        for line in last_lines:
151            amounts = re.findall(r'\$?\s*(\d+\.\d{2})', line)
152            all_amounts.extend([float(amt) for amt in amounts])
153        
154        if all_amounts:
155            return max(all_amounts)
156        
157        return 0.0
158    
159    def _extract_items(self, text):
160        """Extract line items from receipt (optional, for detailed tracking)"""
161        items = []
162        lines = text.split('\n')
163        
164        for line in lines:
165            # Look for lines with price pattern
166            if re.search(r'\d+\.\d{2}', line):
167                # Extract item description and price
168                match = re.search(r'(.+?)\s+(\d+\.\d{2})', line)
169                if match:
170                    item_name = match.group(1).strip()
171                    item_price = float(match.group(2))
172                    
173                    # Filter out total/subtotal lines
174                    if len(item_name) > 2 and 'TOTAL' not in item_name.upper():
175                        items.append({
176                            'description': item_name,
177                            'price': item_price
178                        })
179        
180        return items
181    
182    def _extract_payment_method(self, text):
183        """Detect payment method from receipt"""
184        text_upper = text.upper()
185        
186        if 'VISA' in text_upper or '**** ****' in text:
187            # Extract last 4 digits if present
188            card_match = re.search(r'\*+\s*(\d{4})', text)
189            if card_match:
190                return f"VISA ending in {card_match.group(1)}"
191            return "VISA"
192        elif 'MASTERCARD' in text_upper or 'MC' in text_upper:
193            return "Mastercard"
194        elif 'AMEX' in text_upper or 'AMERICAN EXPRESS' in text_upper:
195            return "American Express"
196        elif 'CASH' in text_upper:
197            return "Cash"
198        elif 'DEBIT' in text_upper:
199            return "Debit Card"
200        else:
201            return "Unknown"
202    
203    def process_receipt_folder(self, folder_path):
204        """Process all receipt images in a folder"""
205        folder = Path(folder_path)
206        receipts = []
207        
208        # Supported image formats
209        image_extensions = ['.jpg', '.jpeg', '.png', '.tiff', '.bmp']
210        
211        for image_file in folder.iterdir():
212            if image_file.suffix.lower() in image_extensions:
213                print(f"Processing: {image_file.name}")
214                
215                # Extract text
216                text = self.extract_text(str(image_file))
217                
218                # Parse into structured data
219                receipt_data = self.parse_receipt(text, image_file.name)
220                
221                if receipt_data:
222                    receipts.append(receipt_data)
223                    print(f"  ✓ Merchant: {receipt_data['merchant']}")
224                    print(f"  ✓ Date: {receipt_data['date']}")
225                    print(f"  ✓ Total: ${receipt_data['total']:.2f}")
226                else:
227                    print(f"  ✗ Failed to process")
228        
229        return receipts
230
231
232# Example usage
233if __name__ == "__main__":
234    processor = ReceiptOCR()
235    
236    # Process all receipts in folder
237    receipts = processor.process_receipt_folder('receipts/')
238    
239    # Display results
240    for receipt in receipts:
241        print(f"\n{receipt['merchant']} - {receipt['date']}: ${receipt['total']:.2f}")

Step 3: Building the Intelligent Categorizer

Create expense_categorizer.py:

python

1import openai
2import os
3from dotenv import load_dotenv
4
5load_dotenv()
6
7class ExpenseCategorizer:
8    def __init__(self):
9        self.categories = {
10            'Meals & Entertainment': [
11                'restaurant', 'cafe', 'starbucks', 'coffee', 'mcdonald',
12                'dining', 'food', 'lunch', 'dinner', 'breakfast'
13            ],
14            'Transportation': [
15                'uber', 'lyft', 'taxi', 'gas', 'shell', 'chevron',
16                'parking', 'airline', 'rental car', 'transit'
17            ],
18            'Office Supplies': [
19                'staples', 'office depot', 'amazon', 'supplies',
20                'paper', 'toner', 'ink'
21            ],
22            'Software & Subscriptions': [
23                'microsoft', 'adobe', 'google', 'software', 'saas',
24                'subscription', 'license'
25            ],
26            'Travel & Lodging': [
27                'hotel', 'motel', 'airbnb', 'marriott', 'hilton',
28                'lodging', 'accommodation'
29            ],
30            'Utilities': [
31                'electric', 'internet', 'phone', 'verizon', 'at&t',
32                'utility', 'water', 'gas bill'
33            ],
34            'Professional Services': [
35                'consultant', 'lawyer', 'accounting', 'legal',
36                'professional fee', 'service'
37            ],
38            'Other': []
39        }
40        
41        # Optional: Use OpenAI for ambiguous cases
42        self.use_ai = os.getenv('OPENAI_API_KEY') is not None
43        if self.use_ai:
44            openai.api_key = os.getenv('OPENAI_API_KEY')
45    
46    def categorize_rule_based(self, merchant, items=None):
47        """Categorize expense using rule-based matching"""
48        merchant_lower = merchant.lower()
49        
50        # Check merchant name against category keywords
51        for category, keywords in self.categories.items():
52            if category == 'Other':
53                continue
54            
55            for keyword in keywords:
56                if keyword in merchant_lower:
57                    return category, 0.9  # High confidence for keyword match
58        
59        # Check items if available
60        if items:
61            items_text = ' '.join([item['description'].lower() for item in items])
62            
63            for category, keywords in self.categories.items():
64                if category == 'Other':
65                    continue
66                
67                matches = sum(1 for keyword in keywords if keyword in items_text)
68                if matches > 0:
69                    confidence = min(matches * 0.3, 0.8)
70                    return category, confidence
71        
72        return 'Other', 0.3  # Low confidence default
73    
74    def categorize_with_ai(self, merchant, amount, items=None):
75        """Use AI for intelligent categorization of ambiguous expenses"""
76        if not self.use_ai:
77            return self.categorize_rule_based(merchant, items)
78        
79        # Build prompt
80        context = f"Merchant: {merchant}\nAmount: ${amount:.2f}"
81        
82        if items:
83            items_list = '\n'.join([f"- {item['description']}: ${item['price']:.2f}" 
84                                    for item in items[:5]])  # Limit to first 5 items
85            context += f"\n\nItems purchased:\n{items_list}"
86        
87        prompt = f"""Categorize this business expense into one of these categories:
88{', '.join([cat for cat in self.categories.keys() if cat != 'Other'])}
89
90{context}
91
92Respond with just the category name and confidence (0-1), formatted as:
93Category: [category name]
94Confidence: [0.0-1.0]
95"""
96        
97        try:
98            response = openai.ChatCompletion.create(
99                model="gpt-3.5-turbo",
100                messages=[
101                    {"role": "system", "content": "You are an expense categorization expert."},
102                    {"role": "user", "content": prompt}
103                ],
104                temperature=0.3,
105                max_tokens=50
106            )
107            
108            result = response.choices[0].message.content
109            
110            # Parse response
111            category = 'Other'
112            confidence = 0.5
113            
114            for line in result.split('\n'):
115                if 'Category:' in line:
116                    category = line.split(':', 1)[1].strip()
117                elif 'Confidence:' in line:
118                    try:
119                        confidence = float(line.split(':', 1)[1].strip())
120                    except ValueError:
121                        confidence = 0.5
122            
123            return category, confidence
124            
125        except Exception as e:
126            print(f"AI categorization failed: {e}")
127            return self.categorize_rule_based(merchant, items)
128    
129    def categorize(self, receipt_data, use_ai=True):
130        """Main categorization method"""
131        merchant = receipt_data.get('merchant', 'Unknown')
132        amount = receipt_data.get('total', 0.0)
133        items = receipt_data.get('items', [])
134        
135        if use_ai and self.use_ai:
136            category, confidence = self.categorize_with_ai(merchant, amount, items)
137        else:
138            category, confidence = self.categorize_rule_based(merchant, items)
139        
140        return {
141            'category': category,
142            'confidence': confidence,
143            'needs_review': confidence < 0.6
144        }
145
146
147# Example usage
148if __name__ == "__main__":
149    categorizer = ExpenseCategorizer()
150    
151    test_receipt = {
152        'merchant': 'Starbucks',
153        'total': 15.47,
154        'items': [
155            {'description': 'Grande Latte', 'price': 5.25},
156            {'description': 'Blueberry Muffin', 'price': 3.95}
157        ]
158    }
159    
160    result = categorizer.categorize(test_receipt)
161    print(f"Category: {result['category']}")
162    print(f"Confidence: {result['confidence']:.2f}")
163    print(f"Needs Review: {result['needs_review']}")

Step 4: Generating Expense Reports

Create report_generator.py:

python

1import pandas as pd
2from pathlib import Path
3from datetime import datetime
4import sqlite3
5
6class ExpenseReportGenerator:
7    def __init__(self, db_path='expenses.db'):
8        self.db_path = db_path
9        self.conn = sqlite3.connect(db_path)
10        self.setup_database()
11    
12    def setup_database(self):
13        """Create database schema"""
14        cursor = self.conn.cursor()
15        
16        cursor.execute('''
17            CREATE TABLE IF NOT EXISTS expenses (
18                id INTEGER PRIMARY KEY AUTOINCREMENT,
19                date TEXT,
20                merchant TEXT,
21                amount REAL,
22                category TEXT,
23                payment_method TEXT,
24                confidence REAL,
25                needs_review BOOLEAN,
26                receipt_filename TEXT,
27                items_json TEXT,
28                notes TEXT,
29                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
30            )
31        ''')
32        
33        self.conn.commit()
34    
35    def save_expense(self, receipt_data, category_data):
36        """Save parsed receipt to database"""
37        cursor = self.conn.cursor()
38        
39        cursor.execute('''
40            INSERT INTO expenses (
41                date, merchant, amount, category, payment_method,
42                confidence, needs_review, receipt_filename, items_json
43            ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
44        ''', (
45            receipt_data['date'],
46            receipt_data['merchant'],
47            receipt_data['total'],
48            category_data['category'],
49            receipt_data.get('payment_method', 'Unknown'),
50            category_data['confidence'],
51            category_data['needs_review'],
52            receipt_data['filename'],
53            str(receipt_data.get('items', []))
54        ))
55        
56        self.conn.commit()
57        return cursor.lastrowid
58    
59    def get_expenses(self, start_date=None, end_date=None, category=None):
60        """Retrieve expenses with optional filters"""
61        query = "SELECT * FROM expenses WHERE 1=1"
62        params = []
63        
64        if start_date:
65            query += " AND date >= ?"
66            params.append(start_date)
67        
68        if end_date:
69            query += " AND date <= ?"
70            params.append(end_date)
71        
72        if category:
73            query += " AND category = ?"
74            params.append(category)
75        
76        query += " ORDER BY date DESC"
77        
78        return pd.read_sql_query(query, self.conn, params=params)
79    
80    def generate_monthly_report(self, year, month, output_path='output/'):
81        """Generate comprehensive monthly expense report"""
82        start_date = f"{year}-{month:02d}-01"
83        
84        # Calculate last day of month
85        if month == 12:
86            end_date = f"{year}-12-31"
87        else:
88            end_date = f"{year}-{month+1:02d}-01"
89        
90        # Get expenses for the month
91        df = self.get_expenses(start_date, end_date)
92        
93        if df.empty:
94            print(f"No expenses found for {year}-{month:02d}")
95            return None
96        
97        # Create output directory
98        Path(output_path).mkdir(exist_ok=True)
99        
100        # Generate Excel report with multiple sheets
101        output_file = Path(output_path) / f"Expense_Report_{year}_{month:02d}.xlsx"
102        
103        with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
104            # Sheet 1: Detailed expenses
105            df_export = df[['date', 'merchant', 'amount', 'category', 
106                           'payment_method', 'needs_review']].copy()
107            df_export.columns = ['Date', 'Merchant', 'Amount', 'Category', 
108                                'Payment Method', 'Needs Review']
109            df_export.to_excel(writer, sheet_name='Detailed Expenses', index=False)
110            
111            # Sheet 2: Category summary
112            category_summary = df.groupby('category')['amount'].agg(['sum', 'count', 'mean'])
113            category_summary.columns = ['Total', 'Count', 'Average']
114            category_summary = category_summary.sort_values('Total', ascending=False)
115            category_summary.to_excel(writer, sheet_name='Category Summary')
116            
117            # Sheet 3: Daily summary
118            daily_summary = df.groupby('date')['amount'].sum().reset_index()
119            daily_summary.columns = ['Date', 'Total Amount']
120            daily_summary.to_excel(writer, sheet_name='Daily Summary', index=False)
121            
122            # Sheet 4: Needs review
123            needs_review = df[df['needs_review'] == True]
124            if not needs_review.empty:
125                needs_review[['date', 'merchant', 'amount', 'category', 'confidence']].to_excel(
126                    writer, sheet_name='Needs Review', index=False
127                )
128        
129        print(f"✓ Report generated: {output_file}")
130        
131        # Print summary to console
132        self._print_summary(df, year, month)
133        
134        return output_file
135    
136    def _print_summary(self, df, year, month):
137        """Print expense summary to console"""
138        print(f"\n{'='*50}")
139        print(f"Expense Report for {datetime(year, month, 1).strftime('%B %Y')}")
140        print(f"{'='*50}\n")
141        
142        total = df['amount'].sum()
143        count = len(df)
144        avg = df['amount'].mean()
145        
146        print(f"Total Expenses: ${total:,.2f}")
147        print(f"Number of Transactions: {count}")
148        print(f"Average Transaction: ${avg:.2f}")
149        
150        print(f"\n{'Category':<30} {'Amount':>15} {'%':>10}")
151        print(f"{'-'*30} {'-'*15} {'-'*10}")
152        
153        category_totals = df.groupby('category')['amount'].sum().sort_values(ascending=False)
154        
155        for category, amount in category_totals.items():
156            percentage = (amount / total) * 100
157            print(f"{category:<30} ${amount:>14,.2f} {percentage:>9.1f}%")
158        
159        needs_review_count = df['needs_review'].sum()
160        if needs_review_count > 0:
161            print(f"\n⚠️  {needs_review_count} expenses need manual review")
162        
163        print(f"\n{'='*50}\n")
164    
165    def close(self):
166        """Close database connection"""
167        self.conn.close()
168
169
170# Example usage
171if __name__ == "__main__":
172    generator = ExpenseReportGenerator()
173    
174    # Generate report for current month
175    now = datetime.now()
176    generator.generate_monthly_report(now.year, now.month)
177    
178    generator.close()

Step 5: Putting It All Together

Create main.py:

python

1from ocr_processor import ReceiptOCR
2from expense_categorizer import ExpenseCategorizer
3from report_generator import ExpenseReportGenerator
4from pathlib import Path
5import shutil
6from datetime import datetime
7
8def process_all_receipts(receipts_folder='receipts/', 
9                         processed_folder='processed/'):
10    """Main workflow to process all receipts"""
11    
12    print("Starting expense tracking automation...\n")
13    
14    # Initialize components
15    ocr = ReceiptOCR()
16    categorizer = ExpenseCategorizer()
17    report_gen = ExpenseReportGenerator()
18    
19    # Process receipts
20    receipts_path = Path(receipts_folder)
21    processed_path = Path(processed_folder)
22    processed_path.mkdir(exist_ok=True)
23    
24    print(f"Scanning folder: {receipts_path}")
25    receipt_files = list(receipts_path.glob('*.jpg')) + \
26                   list(receipts_path.glob('*.jpeg')) + \
27                   list(receipts_path.glob('*.png'))
28    
29    if not receipt_files:
30        print("No receipt images found!")
31        return
32    
33    print(f"Found {len(receipt_files)} receipts to process\n")
34    
35    processed_count = 0
36    failed_count = 0
37    
38    for receipt_file in receipt_files:
39        try:
40            print(f"Processing: {receipt_file.name}")
41            
42            # Step 1: OCR extraction
43            text = ocr.extract_text(str(receipt_file))
44            receipt_data = ocr.parse_receipt(text, receipt_file.name)
45            
46            if not receipt_data or receipt_data['total'] == 0:
47                print(f"  ✗ Could not extract data")
48                failed_count += 1
49                continue
50            
51            # Step 2: Categorization
52            category_result = categorizer.categorize(receipt_data)
53            
54            # Step 3: Save to database
55            expense_id = report_gen.save_expense(receipt_data, category_result)
56            
57            print(f"  ✓ Merchant: {receipt_data['merchant']}")
58            print(f"  ✓ Amount: ${receipt_data['total']:.2f}")
59            print(f"  ✓ Category: {category_result['category']}")
60            print(f"  ✓ Saved as expense #{expense_id}")
61            
62            # Move to processed folder
63            shutil.move(str(receipt_file), str(processed_path / receipt_file.name))
64            
65            processed_count += 1
66            print()
67            
68        except Exception as e:
69            print(f"  ✗ Error: {e}")
70            failed_count += 1
71            print()
72    
73    print(f"\n{'='*50}")
74    print(f"Processing complete!")
75    print(f"  Successfully processed: {processed_count}")
76    print(f"  Failed: {failed_count}")
77    print(f"{'='*50}\n")
78    
79    # Generate monthly report
80    now = datetime.now()
81    report_gen.generate_monthly_report(now.year, now.month)
82    
83    report_gen.close()
84
85
86if __name__ == "__main__":
87    process_all_receipts()

Step 6: Running Your Expense Tracker

Usage:

Drop receipt photos into receipts/ folder
Run the script:
bash
```
1python main.py
```
Review the report in output/ folder
Check flagged items that need manual review

Example output:

Prompt

Starting expense tracking automation...

Scanning folder: receipts/
Found 15 receipts to process

Processing: starbucks_20260110.jpg
  ✓ Merchant: Starbucks
  ✓ Amount: $15.47
  ✓ Category: Meals & Entertainment
  ✓ Saved as expense #42

Processing: shell_gas_20260109.jpg
  ✓ Merchant: Shell Gas Station
  ✓ Amount: $52.00
  ✓ Category: Transportation
  ✓ Saved as expense #43

...

==================================================
Processing complete!
  Successfully processed: 14
  Failed: 1
==================================================

==================================================
Expense Report for January 2026
==================================================

Total Expenses: $1,247.35
Number of Transactions: 14
Average Transaction: $89.10

Category                       Amount          %
------------------------------ --------------- ----------
Meals & Entertainment                $425.50      34.1%
Transportation                       $312.00      25.0%
Office Supplies                      $189.75      15.2%
Software & Subscriptions             $180.00      14.4%
Travel & Lodging                     $140.10      11.2%

✓ Report generated: output/Expense_Report_2026_01.xlsx

==================================================

Advanced Features

Feature 1: Duplicate Detection

Add to ExpenseReportGenerator:

python

1def check_duplicate(self, merchant, amount, date, tolerance=0.01):
2    """Check if expense already exists"""
3    query = '''
4        SELECT * FROM expenses 
5        WHERE merchant = ? 
6        AND ABS(amount - ?) < ? 
7        AND date = ?
8    '''
9    
10    cursor = self.conn.cursor()
11    cursor.execute(query, (merchant, amount, tolerance, date))
12    
13    return cursor.fetchone() is not None

Feature 2: Email Integration

Process receipts sent to your email:

python

1import imaplib
2import email
3from email.header import decode_header
4
5def download_receipt_attachments(email_user, email_pass, folder='receipts/'):
6    """Download receipt images from email"""
7    mail = imaplib.IMAP4_SSL('imap.gmail.com')
8    mail.login(email_user, email_pass)
9    mail.select('inbox')
10    
11    # Search for emails with "receipt" in subject
12    status, messages = mail.search(None, 'SUBJECT "receipt"')
13    
14    for num in messages[0].split():
15        status, msg_data = mail.fetch(num, '(RFC822)')
16        email_body = msg_data[0][1]
17        email_message = email.message_from_bytes(email_body)
18        
19        # Download attachments
20        for part in email_message.walk():
21            if part.get_content_maintype() == 'image':
22                filename = part.get_filename()
23                if filename:
24                    filepath = Path(folder) / filename
25                    with open(filepath, 'wb') as f:
26                        f.write(part.get_payload(decode=True))
27    
28    mail.close()
29    mail.logout()

Feature 3: Mileage Tracking

Add GPS-based mileage logging:

python

1def log_mileage(start_location, end_location, purpose):
2    """Calculate mileage and log as expense"""
3    # Use Google Maps API or similar to calculate distance
4    distance_miles = calculate_distance(start_location, end_location)
5    
6    # IRS standard mileage rate (update annually)
7    rate_per_mile = 0.67  # 2026 rate
8    
9    amount = distance_miles * rate_per_mile
10    
11    expense_data = {
12        'date': datetime.now().strftime('%Y-%m-%d'),
13        'merchant': f"Mileage: {start_location} to {end_location}",
14        'total': amount,
15        'category': 'Transportation',
16        'payment_method': 'Mileage Reimbursement',
17        'notes': f"{distance_miles:.1f} miles @ ${rate_per_mile}/mile. Purpose: {purpose}"
18    }
19    
20    return expense_data

Best Practices

1. Receipt Photo Quality

For best OCR results:

Good lighting: No shadows or glare
Flat surface: Avoid wrinkles or folds
Full receipt: Capture entire receipt, especially total line
Focus: Clear, not blurry
Contrast: Dark text on light background

2. Consistent Naming

Name receipt files descriptively:

Prompt

merchant_date.jpg
starbucks_20260110.jpg
shell_gas_20260109_52dollars.jpg

3. Regular Processing

Process receipts weekly, not monthly:

Easier to remember context
Catch errors early
Less overwhelming

4. Review Flagged Items

Always review expenses with low confidence scores before submitting reports.

5. Backup Your Data

Regularly backup your expenses.db file and processed receipts.

Troubleshooting

OCR Returns Garbage Text

Problem: Tesseract extracts gibberish instead of text.

Solutions:

Improve image quality (better lighting, flatter receipt)
Adjust preprocessing parameters (threshold values)
Try different OCR engines (Google Cloud Vision API, AWS Textract)

Wrong Total Amount

Problem: OCR extracts wrong total (subtotal, tax, etc.)

Solutions:

Improve total detection logic (look for "TOTAL" keyword more carefully)
Add manual review for amounts over certain threshold
Use AI to verify extracted totals

Poor Categorization

Problem: Expenses categorized incorrectly.

Solutions:

Expand keyword lists for each category
Use AI categorization for ambiguous cases
Add manual categorization review step

Frequently Asked Questions

Does this work with digital receipts (PDFs)? Yes! Add PDF parsing with PyPDF2 or pdfplumber. Extract text directly without OCR for better accuracy.

Can I connect this to QuickBooks or Xero? Yes! Both have APIs. Export expenses from your database and import via their APIs.

What about sales tax tracking? Extend the OCR parser to extract tax amounts separately. Add a tax field to the database schema.

Is Tesseract OCR accurate enough? For most printed receipts, yes (85-95% accuracy). For thermal receipts or handwritten notes, consider Google Cloud Vision API or AWS Textract (paid services with higher accuracy).

Can I use this for personal expenses too? Absolutely! Add a personal flag to distinguish business from personal expenses.

How do I handle split payments (part personal, part business)? Add a split_amount field and create two expense entries for split transactions.

Python Expense Tracking: Automate Receipt Scanning and Categorization

What You'll Build

Prerequisites

The Problem with Manual Expense Tracking

Step 1: Setting Up Your Environment

Step 2: Building the OCR Receipt Scanner

Step 3: Building the Intelligent Categorizer

Step 4: Generating Expense Reports

Step 5: Putting It All Together

Step 6: Running Your Expense Tracker

Advanced Features

Feature 1: Duplicate Detection

Feature 2: Email Integration

Feature 3: Mileage Tracking

Best Practices

1. Receipt Photo Quality

2. Consistent Naming

3. Regular Processing

4. Review Flagged Items

5. Backup Your Data

Troubleshooting

OCR Returns Garbage Text

Wrong Total Amount

Poor Categorization

Frequently Asked Questions

Share this article

Python Expense Tracking: Automate Receipt Scanning and Categorization

What You'll Build

Prerequisites

The Problem with Manual Expense Tracking

Step 1: Setting Up Your Environment

Step 2: Building the OCR Receipt Scanner

Step 3: Building the Intelligent Categorizer

Step 4: Generating Expense Reports

Step 5: Putting It All Together

Step 6: Running Your Expense Tracker

Advanced Features

Feature 1: Duplicate Detection

Feature 2: Email Integration

Feature 3: Mileage Tracking

Best Practices

1. Receipt Photo Quality

2. Consistent Naming

3. Regular Processing

4. Review Flagged Items

5. Backup Your Data

Troubleshooting

OCR Returns Garbage Text

Wrong Total Amount

Poor Categorization

Frequently Asked Questions

Share this article