Automate Customer Feedback Analysis with Python and NLP

You have 847 customer survey responses sitting in a spreadsheet. Your boss wants insights by tomorrow morning: What are customers happy about? What's frustrating them? What themes keep appearing? Which issues are urgent?

You could read all 847 responses manually. That's 6-8 hours of mind-numbing work, and you'll inevitably miss patterns. Or you could skim them and rely on gut feeling, risking confirmation bias and missed insights.

There's a better way.

What if Python could read all 847 responses in 60 seconds, automatically categorize them by sentiment and theme, identify your top 10 issues, and generate an executive summary with quotes and recommendations?

That's exactly what we're building today.

What You'll Build

By the end of this tutorial, you'll have a Python system that:

Imports feedback from CSV or Excel files, including CSV exports from Google Forms and other survey tools
Analyzes sentiment (positive, negative, neutral) for each response
Extracts key themes using NLP clustering
Identifies urgent issues based on language patterns
Generates word clouds showing common topics
Creates executive reports with insights and recommendations
Tracks trends over time to see if issues are improving

Time savings: 6-8 hours of manual analysis → 5 minutes automated

Why Automate Feedback Analysis?

Manual feedback analysis has serious problems:

Scale issues:

Reading 100+ responses is exhausting
Patterns across thousands of responses are invisible
Quarterly feedback takes weeks to analyze

Bias problems:

You remember negative feedback disproportionately
Confirmation bias: You see what you expect
Recency bias: Recent feedback overshadows old

Consistency issues:

Different people categorize differently
Subjective interpretation varies
No standardized metrics

According to a 2025 customer experience study, companies that analyze feedback within 48 hours have 3.2x higher customer retention. But 68% of companies take over 2 weeks to analyze survey results—by then, the insights are stale and opportunities are lost.

Prerequisites

Python 3.8 or higher
Basic Python knowledge
Customer feedback data (CSV, Excel, or API)

Step 1: Set Up Your Environment

Install required libraries:

bash

pip install pandas numpy matplotlib seaborn wordcloud textblob spacy scikit-learn openai python-dotenv openpyxl

Install spaCy language model:

bash

python -m spacy download en_core_web_sm
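Before moving on, it can help to confirm that the installs succeeded. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is made up for this example, and the list uses import names, which differ from some pip names (`scikit-learn` imports as `sklearn`, `python-dotenv` as `dotenv`).

```python
from importlib.util import find_spec

# Import names for the libraries this tutorial uses.
# seaborn is included because report_generator.py imports it in Step 5.
REQUIRED_MODULES = [
    "pandas", "numpy", "matplotlib", "seaborn", "wordcloud", "textblob",
    "spacy", "sklearn", "openai", "dotenv", "openpyxl",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing packages (import names):", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This only checks that the packages can be located; it does not verify that the spaCy language model has been downloaded.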

Create project structure:

feedback_analyzer/
├── main.py
├── feedback_loader.py
├── sentiment_analyzer.py
├── theme_extractor.py
├── report_generator.py
├── data/
│   └── feedback.csv
├── output/
│   ├── reports/
│   └── visualizations/
└── .env
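If you prefer to create this layout from Python rather than by hand, a short sketch with `pathlib` could look like the following. The function name `create_project` is hypothetical, and `feedback_loader.py` is included because `main.py` imports it in Step 6.

```python
from pathlib import Path


def create_project(root="feedback_analyzer"):
    """Create the tutorial's folders and empty placeholder files."""
    root = Path(root)
    for folder in ("data", "output/reports", "output/visualizations"):
        (root / folder).mkdir(parents=True, exist_ok=True)
    files = (
        "main.py", "feedback_loader.py", "sentiment_analyzer.py",
        "theme_extractor.py", "report_generator.py",
        "data/feedback.csv", ".env",
    )
    for filename in files:
        # touch() creates missing files and leaves existing content alone
        (root / filename).touch(exist_ok=True)
    return root
```

Calling `create_project()` from the directory where the project should live creates any missing folders and files; running it again is harmless.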

Step 2: Import and Clean Feedback Data

Create feedback_loader.py:

python

import re
from datetime import datetime

import pandas as pd


class FeedbackLoader:
    def __init__(self, data_path):
        self.data_path = data_path

    def load_data(self, source_type='csv'):
        """Load feedback from various sources"""
        if source_type == 'csv':
            return self._load_csv()
        elif source_type == 'excel':
            return self._load_excel()
        elif source_type == 'google_forms':
            return self._load_google_forms()
        else:
            raise ValueError(f"Unsupported source type: {source_type}")

    def _load_csv(self):
        """Load from CSV file"""
        df = pd.read_csv(self.data_path)
        return self._clean_data(df)

    def _load_excel(self):
        """Load from Excel file"""
        df = pd.read_excel(self.data_path)
        return self._clean_data(df)

    def _load_google_forms(self):
        """Load from Google Forms (requires API setup)"""
        # A Google Sheets API implementation would go here.
        # For this tutorial, export the form responses to CSV and load with 'csv'.
        raise NotImplementedError(
            "Google Forms loading is not implemented; export the responses to CSV."
        )

    def _clean_data(self, df):
        """Clean and standardize feedback data"""
        # Try to identify feedback column (flexible naming)
        feedback_col = None
        for col in df.columns:
            col_lower = col.lower()
            if any(term in col_lower for term in ['feedback', 'comment', 'response', 'review', 'text']):
                feedback_col = col
                break

        if not feedback_col:
            raise ValueError("No feedback column found. Expected a column with 'feedback', 'comment', 'response', 'review', or 'text' in its name.")

        # Rename to standard name
        df = df.rename(columns={feedback_col: 'feedback'})

        # Handle date column
        date_col = None
        for col in df.columns:
            if any(term in col.lower() for term in ['date', 'timestamp', 'time', 'submitted']):
                date_col = col
                break

        if date_col and date_col != 'date':
            df = df.rename(columns={date_col: 'date'})
        elif 'date' not in df.columns:
            # Add current date if no date column
            df['date'] = datetime.now().strftime('%Y-%m-%d')

        # Clean feedback text (missing values become the string 'nan', handled below)
        df['feedback'] = df['feedback'].astype(str)
        df['feedback'] = df['feedback'].apply(self._clean_text)

        # Remove empty or very short feedback (keep responses longer than 10 characters).
        # .copy() avoids pandas' SettingWithCopyWarning when adding the id column.
        df = df[df['feedback'].str.len() > 10].copy()

        # Add unique ID
        df['id'] = range(1, len(df) + 1)

        return df

    def _clean_text(self, text):
        """Clean individual feedback text"""
        if pd.isna(text) or text == 'nan':
            return ""

        # Remove URLs
        text = re.sub(r'http\S+|www\S+', '', text)

        # Remove email addresses
        text = re.sub(r'\S+@\S+', '', text)

        # Remove special characters (keep punctuation for sentiment)
        text = re.sub(r'[^\w\s.,!?-]', '', text)

        # Collapse extra whitespace last, so removed URLs and emails leave no gaps
        text = ' '.join(text.split())

        return text.strip()


# Example usage
if __name__ == "__main__":
    loader = FeedbackLoader('data/feedback.csv')
    df = loader.load_data('csv')

    print(f"Loaded {len(df)} feedback responses")
    print("\nSample feedback:")
    print(df[['id', 'date', 'feedback']].head())

Sample feedback data structure (feedback.csv):

csv

date,customer_name,feedback,rating
2026-01-01,John Doe,"Love the product! Fast shipping and great quality.",5
2026-01-02,Jane Smith,"Disappointed with customer service. Had to wait 2 hours on hold.",2
2026-01-03,Bob Johnson,"Good product but setup instructions were confusing.",3
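
To see what the text cleaning in Step 2 does to a single response, the regular expressions from `_clean_text` can be tried in isolation. This is a standalone sketch, not part of the loader class: the function name `clean_text` and the sample string are invented for the demo, and whitespace is collapsed as the final step so that removed URLs and emails do not leave double spaces behind.

```python
import re


def clean_text(text):
    """Apply the same regular expressions as _clean_text to one string."""
    text = re.sub(r'http\S+|www\S+', '', text)   # drop URLs
    text = re.sub(r'\S+@\S+', '', text)          # drop email addresses
    text = re.sub(r'[^\w\s.,!?-]', '', text)     # drop other special characters
    return ' '.join(text.split())                # collapse runs of whitespace


raw = "Great   app!! See https://example.com or mail me@example.com :) #happy"
print(clean_text(raw))
# Great app!! See or mail happy
```

Note that the character class also strips apostrophes, so "don't" becomes "dont". Whether that matters depends on how sensitive your sentiment scoring is to contractions.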

Step 3: Build the Sentiment Analyzer

Create sentiment_analyzer.py:

python

from typing import Dict, List

import spacy
from textblob import TextBlob

# Load the spaCy model once at import time; loading it per call would be slow
nlp = spacy.load('en_core_web_sm')


class SentimentAnalyzer:
    def __init__(self):
        # Polarity ranges share their endpoints. Dict order decides ties:
        # exactly 0.1 is 'positive', exactly -0.1 is 'neutral'.
        self.sentiment_labels = {
            'positive': (0.1, 1.0),
            'neutral': (-0.1, 0.1),
            'negative': (-1.0, -0.1)
        }

    def analyze_sentiment(self, text: str) -> Dict:
        """Analyze sentiment of feedback text"""
        # Use TextBlob for basic sentiment
        blob = TextBlob(text)
        polarity = blob.sentiment.polarity  # -1 (negative) to 1 (positive)
        subjectivity = blob.sentiment.subjectivity  # 0 (objective) to 1 (subjective)

        # Classify sentiment
        sentiment_label = self._classify_sentiment(polarity)

        # Extract entities (people, products, organizations mentioned)
        doc = nlp(text)
        entities = [ent.text for ent in doc.ents if ent.label_ in ['PRODUCT', 'ORG', 'PERSON']]

        return {
            'polarity': polarity,
            'subjectivity': subjectivity,
            'sentiment': sentiment_label,
            'entities': entities,
            'is_urgent': self._check_urgency(text, polarity)
        }

    def _classify_sentiment(self, polarity: float) -> str:
        """Classify polarity score into sentiment label"""
        for label, (min_val, max_val) in self.sentiment_labels.items():
            if min_val <= polarity <= max_val:
                return label
        return 'neutral'

    def _check_urgency(self, text: str, polarity: float) -> bool:
        """Detect if feedback requires urgent attention"""
        urgent_keywords = [
            'angry', 'furious', 'terrible', 'horrible', 'worst',
            'lawsuit', 'lawyer', 'refund', 'cancel', 'unacceptable',
            'disappointed', 'frustrated', 'failed', 'broken', 'dangerous'
        ]

        text_lower = text.lower()

        # Check for urgent keywords (substring match, so 'cancel' also hits 'cancelled')
        has_urgent_keyword = any(keyword in text_lower for keyword in urgent_keywords)

        # Check for extreme negative sentiment
        is_very_negative = polarity < -0.5

        return has_urgent_keyword or is_very_negative

    def analyze_batch(self, feedback_list: List[str]) -> List[Dict]:
        """Analyze sentiment for multiple feedback entries"""
        results = []

        for i, text in enumerate(feedback_list):
            try:
                result = self.analyze_sentiment(text)
                result['index'] = i
                results.append(result)
            except Exception as e:
                # Keep one result per input so the list stays aligned with the data
                print(f"Error analyzing feedback {i}: {e}")
                results.append({
                    'index': i,
                    'polarity': 0,
                    'subjectivity': 0,
                    'sentiment': 'error',
                    'entities': [],
                    'is_urgent': False
                })

        return results


# Example usage
if __name__ == "__main__":
    analyzer = SentimentAnalyzer()

    test_feedback = [
        "I absolutely love this product! Best purchase ever.",
        "The service was okay, nothing special.",
        "Terrible experience. I want my money back immediately."
    ]

    results = analyzer.analyze_batch(test_feedback)

    for i, result in enumerate(results):
        print(f"\nFeedback {i+1}:")
        print(f"  Sentiment: {result['sentiment']}")
        print(f"  Polarity: {result['polarity']:.2f}")
        print(f"  Urgent: {result['is_urgent']}")
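
The two decision rules in the analyzer (polarity thresholds and the urgency check) can be examined without TextBlob or spaCy. The sketch below reimplements them as plain functions; the polarity values are hand-picked for illustration rather than produced by TextBlob, and the keyword list is shortened for the demo.

```python
SENTIMENT_RANGES = {
    'positive': (0.1, 1.0),
    'neutral': (-0.1, 0.1),
    'negative': (-1.0, -0.1),
}
URGENT_KEYWORDS = ('refund', 'cancel', 'broken')  # shortened list for the demo


def classify(polarity):
    """Return the first label whose range contains the polarity score."""
    for label, (low, high) in SENTIMENT_RANGES.items():
        if low <= polarity <= high:
            return label
    return 'neutral'


def is_urgent(text, polarity):
    """Urgent if an urgent keyword appears or the polarity is below -0.5."""
    has_keyword = any(keyword in text.lower() for keyword in URGENT_KEYWORDS)
    return has_keyword or polarity < -0.5


for score in (0.8, 0.1, 0.05, -0.1, -0.6):
    print(score, classify(score))
# 0.8 positive
# 0.1 positive
# 0.05 neutral
# -0.1 neutral
# -0.6 negative

print(is_urgent("Please cancel my order", 0.0))   # True
print(is_urgent("It was fine", -0.2))             # False
```

Because the ranges share their endpoints, a score of exactly 0.1 is labeled positive while exactly -0.1 is labeled neutral. A neutral-sounding request such as "Please cancel my order" is still flagged as urgent because of the keyword match.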

Step 4: Extract Themes and Topics

Create theme_extractor.py:

python

from typing import Dict, List

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer


class ThemeExtractor:
    def __init__(self, n_themes=5):
        self.n_themes = n_themes
        self.vectorizer = TfidfVectorizer(
            max_features=100,
            stop_words='english',
            ngram_range=(1, 2)  # Include bigrams
        )

    def extract_themes(self, feedback_list: List[str]) -> Dict:
        """Extract main themes from feedback using topic modeling"""

        if len(feedback_list) < self.n_themes:
            return {'error': 'Not enough feedback for theme extraction'}

        # Create document-term matrix
        doc_term_matrix = self.vectorizer.fit_transform(feedback_list)

        # Topic modeling with LDA.
        # Note: LDA is usually fit on raw counts; with TF-IDF input it still runs,
        # but treat the resulting weights as relative scores, not response counts.
        lda = LatentDirichletAllocation(
            n_components=self.n_themes,
            random_state=42,
            max_iter=20
        )
        lda.fit(doc_term_matrix)

        # Extract top words for each theme
        feature_names = self.vectorizer.get_feature_names_out()
        themes = []

        for topic_idx, topic in enumerate(lda.components_):
            top_words_idx = topic.argsort()[-10:][::-1]
            top_words = [feature_names[i] for i in top_words_idx]

            # Generate theme name from top words
            theme_name = self._generate_theme_name(top_words)

            themes.append({
                'id': topic_idx + 1,
                'name': theme_name,
                'keywords': top_words[:5],
                'weight': float(topic.sum())
            })

        # Sort themes by weight
        themes.sort(key=lambda x: x['weight'], reverse=True)

        return {
            'themes': themes,
            'n_feedback': len(feedback_list)
        }

    def _generate_theme_name(self, top_words: List[str]) -> str:
        """Generate human-readable theme name from top words"""
        # Simple heuristic: the first keyword group that matches wins
        theme_keywords = {
            ('product', 'quality', 'item'): 'Product Quality',
            ('shipping', 'delivery', 'arrived'): 'Shipping & Delivery',
            ('service', 'support', 'help'): 'Customer Service',
            ('price', 'cost', 'expensive', 'cheap'): 'Pricing',
            ('website', 'app', 'interface'): 'User Experience',
            ('easy', 'simple', 'complicated'): 'Ease of Use',
            ('problem', 'issue', 'bug', 'error'): 'Technical Issues',
            ('refund', 'return', 'exchange'): 'Returns & Refunds',
            ('recommend', 'love', 'amazing'): 'Recommendations',
            ('disappointed', 'frustrated', 'angry'): 'Complaints'
        }

        top_words_set = set(word.lower() for word in top_words)

        for keywords, name in theme_keywords.items():
            if any(keyword in top_words_set for keyword in keywords):
                return name

        # Fallback: capitalize first two words
        return ' & '.join(top_words[:2]).title()

    def extract_keywords(self, feedback_list: List[str], top_n=20) -> List[tuple]:
        """Extract most common keywords/phrases"""
        # Extract n-grams (unigrams through trigrams)
        vectorizer = CountVectorizer(
            ngram_range=(1, 3),
            stop_words='english',
            max_features=top_n
        )

        word_count_matrix = vectorizer.fit_transform(feedback_list)
        word_counts = word_count_matrix.sum(axis=0)

        # Get keywords and counts
        keywords = vectorizer.get_feature_names_out()
        counts = np.asarray(word_counts).flatten()

        # Sort by frequency
        keyword_freq = sorted(zip(keywords, counts), key=lambda x: x[1], reverse=True)

        return keyword_freq[:top_n]

    def categorize_feedback(self, feedback_list: List[str], categories: Dict[str, List[str]]) -> Dict:
        """Categorize feedback based on predefined categories and keywords"""

        results = {category: [] for category in categories.keys()}
        results['uncategorized'] = []

        for idx, text in enumerate(feedback_list):
            text_lower = text.lower()
            categorized = False

            # Each response goes to the first matching category only
            for category, keywords in categories.items():
                if any(keyword in text_lower for keyword in keywords):
                    results[category].append(idx)
                    categorized = True
                    break

            if not categorized:
                results['uncategorized'].append(idx)

        # Calculate percentages
        total = len(feedback_list)
        summary = {
            category: {
                'count': len(indices),
                'percentage': (len(indices) / total * 100) if total > 0 else 0,
                'indices': indices
            }
            for category, indices in results.items()
        }

        return summary


# Example usage
if __name__ == "__main__":
    extractor = ThemeExtractor(n_themes=5)

    sample_feedback = [
        "Great product, fast shipping!",
        "Customer service was helpful and responsive",
        "The item arrived damaged, very disappointed",
        "Easy to use, works as expected",
        "Website is confusing, hard to find what I need",
        "Love the quality, will buy again",
        "Shipping took too long, 3 weeks delay",
        "Support team solved my issue quickly",
        "Expensive but worth the price",
        "Product broke after one week"
    ]

    # Extract themes
    themes_result = extractor.extract_themes(sample_feedback)
    print("Themes found:")
    for theme in themes_result['themes']:
        print(f"  {theme['name']}: {', '.join(theme['keywords'])}")

    # Extract keywords
    keywords = extractor.extract_keywords(sample_feedback, top_n=10)
    print("\nTop keywords:")
    for keyword, count in keywords:
        print(f"  {keyword}: {count}")

    # Categorize feedback
    categories = {
        'Product Quality': ['product', 'quality', 'broke', 'damaged'],
        'Shipping': ['shipping', 'delivery', 'arrived'],
        'Customer Service': ['service', 'support', 'help'],
        'Price': ['expensive', 'cheap', 'price', 'cost']
    }

    categorization = extractor.categorize_feedback(sample_feedback, categories)
    print("\nCategorization:")
    for category, data in categorization.items():
        print(f"  {category}: {data['count']} ({data['percentage']:.1f}%)")
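
One property of `categorize_feedback` worth knowing: it stops at the first matching category, so "Great product, fast shipping!" is counted under Product Quality only. If you would rather count a response under every category it mentions, a multi-label variant might look like this sketch. The function name `categorize_multi` is an assumption for this example, not part of the class above.

```python
from typing import Dict, List


def categorize_multi(feedback_list: List[str],
                     categories: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """Assign each response to every category whose keywords it mentions."""
    results: Dict[str, List[int]] = {name: [] for name in categories}
    results['uncategorized'] = []
    for idx, text in enumerate(feedback_list):
        text_lower = text.lower()
        matched = [name for name, keywords in categories.items()
                   if any(keyword in text_lower for keyword in keywords)]
        for name in matched:
            results[name].append(idx)
        if not matched:
            results['uncategorized'].append(idx)
    return results


feedback = [
    "Great product, fast shipping!",
    "Expensive but worth the price",
    "Website is confusing",
]
categories = {
    'Product Quality': ['product', 'quality'],
    'Shipping': ['shipping', 'delivery'],
    'Price': ['expensive', 'price'],
}
print(categorize_multi(feedback, categories))
# {'Product Quality': [0], 'Shipping': [0], 'Price': [1], 'uncategorized': [2]}
```

With multi-label counting, category percentages can add up to more than 100%, so label charts accordingly.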

Step 5: Generate Visual Reports

Create report_generator.py:

python

from datetime import datetime
from pathlib import Path
from typing import Dict, List

import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud


class ReportGenerator:
    def __init__(self, output_dir='output/'):
        self.output_dir = Path(output_dir)
        # Create the output folders up front so every chart and report can be saved
        (self.output_dir / 'visualizations').mkdir(parents=True, exist_ok=True)
        (self.output_dir / 'reports').mkdir(parents=True, exist_ok=True)

        # Set style
        sns.set_style("whitegrid")
        plt.rcParams['figure.figsize'] = (12, 6)

    def generate_word_cloud(self, text: str, title: str, output_filename: str):
        """Generate word cloud visualization"""
        wordcloud = WordCloud(
            width=800,
            height=400,
            background_color='white',
            colormap='viridis',
            max_words=100
        ).generate(text)

        plt.figure(figsize=(12, 6))
        plt.imshow(wordcloud, interpolation='bilinear')
        plt.axis('off')
        plt.title(title, fontsize=16, fontweight='bold')
        plt.tight_layout()

        output_path = self.output_dir / 'visualizations' / output_filename
        plt.savefig(output_path, dpi=300, bbox_inches='tight')
        plt.close()

        return output_path

    def generate_sentiment_distribution(self, sentiment_counts: Dict, output_filename: str):
        """Generate sentiment distribution chart"""
        sentiments = list(sentiment_counts.keys())
        counts = list(sentiment_counts.values())

        colors = {
            'positive': '#2ecc71',
            'neutral': '#f39c12',
            'negative': '#e74c3c'
        }

        # Unknown labels (such as 'error') fall back to grey
        bar_colors = [colors.get(s, '#95a5a6') for s in sentiments]

        plt.figure(figsize=(10, 6))
        bars = plt.bar(sentiments, counts, color=bar_colors, alpha=0.8)

        # Add value labels on bars
        for bar in bars:
            height = bar.get_height()
            plt.text(bar.get_x() + bar.get_width()/2., height,
                    f'{int(height)}',
                    ha='center', va='bottom', fontsize=12, fontweight='bold')

        plt.title('Sentiment Distribution', fontsize=16, fontweight='bold')
        plt.xlabel('Sentiment', fontsize=12)
        plt.ylabel('Number of Responses', fontsize=12)
        plt.tight_layout()

        output_path = self.output_dir / 'visualizations' / output_filename
        plt.savefig(output_path, dpi=300, bbox_inches='tight')
        plt.close()

        return output_path

    def generate_theme_breakdown(self, themes: List[Dict], output_filename: str):
        """Generate theme breakdown chart"""
        theme_names = [theme['name'] for theme in themes]
        weights = [theme['weight'] for theme in themes]

        plt.figure(figsize=(12, 6))
        bars = plt.barh(theme_names, weights, color='#3498db', alpha=0.8)

        # Add value labels
        for i, bar in enumerate(bars):
            width = bar.get_width()
            plt.text(width, bar.get_y() + bar.get_height()/2.,
                    f'{weights[i]:.1f}',
                    ha='left', va='center', fontsize=10)

        plt.title('Top Themes in Customer Feedback', fontsize=16, fontweight='bold')
        plt.xlabel('Theme Weight', fontsize=12)
        plt.tight_layout()

        output_path = self.output_dir / 'visualizations' / output_filename
        plt.savefig(output_path, dpi=300, bbox_inches='tight')
        plt.close()

        return output_path

    def generate_executive_summary(self, analysis_results: Dict, output_filename: str):
        """Generate executive summary report"""
        report_path = self.output_dir / 'reports' / output_filename

        with open(report_path, 'w', encoding='utf-8') as f:
            f.write("=" * 80 + "\n")
            f.write(" " * 20 + "CUSTOMER FEEDBACK ANALYSIS REPORT\n")
            f.write(" " * 25 + f"{datetime.now().strftime('%B %d, %Y')}\n")
            f.write("=" * 80 + "\n\n")

            # Overview
            f.write("EXECUTIVE SUMMARY\n")
            f.write("-" * 80 + "\n")
            f.write(f"Total Responses Analyzed: {analysis_results['total_responses']}\n")
            f.write(f"Analysis Period: {analysis_results.get('date_range', 'All time')}\n\n")

            # Sentiment breakdown
            f.write("SENTIMENT BREAKDOWN\n")
            f.write("-" * 80 + "\n")
            sentiment_data = analysis_results['sentiment']
            for sentiment, count in sentiment_data.items():
                percentage = (count / analysis_results['total_responses'] * 100)
                f.write(f"  {sentiment.capitalize()}: {count} ({percentage:.1f}%)\n")
            f.write("\n")

            # Urgent issues
            urgent_count = analysis_results.get('urgent_count', 0)
            if urgent_count > 0:
                f.write("⚠️  URGENT ATTENTION REQUIRED\n")
                f.write("-" * 80 + "\n")
                f.write(f"{urgent_count} responses require immediate attention.\n\n")

                # Show at most five examples, each truncated to 200 characters
                urgent_feedback = analysis_results.get('urgent_feedback', [])
                for i, feedback in enumerate(urgent_feedback[:5], 1):
                    f.write(f"{i}. \"{feedback['text'][:200]}...\"\n")
                    f.write(f"   Sentiment: {feedback['sentiment']} | Urgency Score: {feedback['urgency']:.2f}\n\n")

            # Top themes
            f.write("TOP THEMES\n")
            f.write("-" * 80 + "\n")
            themes = analysis_results.get('themes', [])
            for i, theme in enumerate(themes[:5], 1):
                f.write(f"{i}. {theme['name']}\n")
                f.write(f"   Keywords: {', '.join(theme['keywords'])}\n")
                f.write(f"   Weight: {theme['weight']:.1f}\n\n")

            # Key insights
            f.write("KEY INSIGHTS\n")
            f.write("-" * 80 + "\n")
            insights = analysis_results.get('insights', [])
            for i, insight in enumerate(insights, 1):
                f.write(f"{i}. {insight}\n")
            f.write("\n")

            # Recommendations
            f.write("RECOMMENDATIONS\n")
            f.write("-" * 80 + "\n")
            recommendations = analysis_results.get('recommendations', [])
            for i, rec in enumerate(recommendations, 1):
                f.write(f"{i}. {rec}\n")

            f.write("\n" + "=" * 80 + "\n")
            f.write("End of Report\n")

        return report_path


# Example usage
if __name__ == "__main__":
    generator = ReportGenerator()

    # Sample data
    sentiment_counts = {
        'positive': 450,
        'neutral': 250,
        'negative': 147
    }

    themes = [
        {'name': 'Product Quality', 'keywords': ['quality', 'durable', 'well-made'], 'weight': 45.2},
        {'name': 'Customer Service', 'keywords': ['support', 'helpful', 'responsive'], 'weight': 38.7},
        {'name': 'Shipping', 'keywords': ['delivery', 'fast', 'arrived'], 'weight': 32.1},
    ]

    # Generate visualizations
    generator.generate_sentiment_distribution(sentiment_counts, 'sentiment_dist.png')
    generator.generate_theme_breakdown(themes, 'themes.png')
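
The report's sentiment breakdown divides each count by the total number of responses. As a small worked example of that arithmetic, here is a hedged sketch of a helper that returns percentages and avoids dividing by zero when there are no responses. The name `sentiment_percentages` is hypothetical, and unlike the report it uses the sum of the counts as the denominator.

```python
def sentiment_percentages(sentiment_counts):
    """Convert raw counts to percentages; return zeros if there are no responses."""
    total = sum(sentiment_counts.values())
    if total == 0:
        return {label: 0.0 for label in sentiment_counts}
    return {label: round(count / total * 100, 1)
            for label, count in sentiment_counts.items()}


counts = {'positive': 450, 'neutral': 250, 'negative': 147}
print(sentiment_percentages(counts))
# {'positive': 53.1, 'neutral': 29.5, 'negative': 17.4}
```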

Step 6: Putting It All Together

Put the complete workflow in main.py:

python

from datetime import datetime
from pathlib import Path

import pandas as pd

from feedback_loader import FeedbackLoader
from sentiment_analyzer import SentimentAnalyzer
from theme_extractor import ThemeExtractor
from report_generator import ReportGenerator


def analyze_customer_feedback(data_path, output_dir='output/'):
    """Complete feedback analysis workflow"""

    print("=" * 60)
    print("CUSTOMER FEEDBACK ANALYSIS SYSTEM")
    print("=" * 60)
    print()

    # Step 1: Load data
    print("[1/5] Loading feedback data...")
    loader = FeedbackLoader(data_path)
    df = loader.load_data('csv')
    if df.empty:
        raise ValueError("No usable feedback found in the input file.")
    print(f"✓ Loaded {len(df)} responses\n")

    # Step 2: Analyze sentiment
    print("[2/5] Analyzing sentiment...")
    analyzer = SentimentAnalyzer()
    sentiment_results = analyzer.analyze_batch(df['feedback'].tolist())

    # Add results to dataframe
    df['polarity'] = [r['polarity'] for r in sentiment_results]
    df['sentiment'] = [r['sentiment'] for r in sentiment_results]
    df['is_urgent'] = [r['is_urgent'] for r in sentiment_results]

    sentiment_counts = df['sentiment'].value_counts().to_dict()
    urgent_count = int(df['is_urgent'].sum())  # plain int rather than a NumPy integer

    print("✓ Sentiment breakdown:")
    for sentiment, count in sentiment_counts.items():
        print(f"  {sentiment.capitalize()}: {count}")
    print(f"✓ Urgent responses: {urgent_count}\n")

    # Step 3: Extract themes
    print("[3/5] Extracting themes...")
    extractor = ThemeExtractor(n_themes=5)
    themes_result = extractor.extract_themes(df['feedback'].tolist())
    themes = themes_result.get('themes', [])  # empty if there was too little feedback

    print(f"✓ Found {len(themes)} main themes:")
    for theme in themes[:3]:
        print(f"  - {theme['name']}")
    print()

    # Step 4: Generate visualizations
    print("[4/5] Generating visualizations...")
    generator = ReportGenerator(output_dir)

    # Word cloud
    all_feedback_text = ' '.join(df['feedback'].tolist())
    generator.generate_word_cloud(
        all_feedback_text,
        'Customer Feedback Word Cloud',
        'wordcloud.png'
    )

    # Sentiment distribution
    generator.generate_sentiment_distribution(
        sentiment_counts,
        'sentiment_distribution.png'
    )

    # Theme breakdown
    generator.generate_theme_breakdown(
        themes,
        'themes_breakdown.png'
    )

    print("✓ Visualizations created\n")

    # Step 5: Generate report
    print("[5/5] Generating executive summary...")

    # Get urgent feedback examples
    urgent_feedback = df[df['is_urgent']][['feedback', 'sentiment', 'polarity']].head(10)
    urgent_examples = [
        {
            'text': row['feedback'],
            'sentiment': row['sentiment'],
            'urgency': abs(row['polarity'])  # polarity magnitude as a rough urgency score
        }
        for _, row in urgent_feedback.iterrows()
    ]

    # Generate insights
    insights = generate_insights(df, themes, sentiment_counts)

    # Generate recommendations
    recommendations = generate_recommendations(df, themes, sentiment_counts, urgent_count)

    analysis_results = {
        'total_responses': len(df),
        'sentiment': sentiment_counts,
        'urgent_count': urgent_count,
        'urgent_feedback': urgent_examples,
        'themes': themes,
        'insights': insights,
        'recommendations': recommendations,
        'date_range': f"{df['date'].min()} to {df['date'].max()}"
    }

    report_path = generator.generate_executive_summary(
        analysis_results,
        f'feedback_report_{datetime.now().strftime("%Y%m%d")}.txt'
    )

    print(f"✓ Report generated: {report_path}\n")

    # Save detailed data
    output_csv = Path(output_dir) / 'reports' / f'detailed_analysis_{datetime.now().strftime("%Y%m%d")}.csv'
    df.to_csv(output_csv, index=False)
    print(f"✓ Detailed data exported: {output_csv}\n")

    print("=" * 60)
    print("ANALYSIS COMPLETE")
    print("=" * 60)

    return analysis_results


def generate_insights(df, themes, sentiment_counts):
    """Generate key insights from analysis"""
    insights = []

    total = len(df)
    positive_pct = (sentiment_counts.get('positive', 0) / total * 100)
    negative_pct = (sentiment_counts.get('negative', 0) / total * 100)

    # Sentiment insights
    if positive_pct > 70:
        insights.append(f"Overall sentiment is highly positive ({positive_pct:.1f}%), indicating strong customer satisfaction.")
    elif negative_pct > 30:
        insights.append(f"Significant negative sentiment detected ({negative_pct:.1f}%), requiring immediate attention.")

    # Theme insights (the weight is a relative topic score, not a response count)
    if themes:
        top_theme = themes[0]
        insights.append(f"'{top_theme['name']}' is the most prominent theme (relative weight {top_theme['weight']:.1f}).")

    # Urgency insight
    urgent_count = int(df['is_urgent'].sum())
    if urgent_count > 0:
        insights.append(f"{urgent_count} responses contain urgent language requiring immediate follow-up.")

    return insights


def generate_recommendations(df, themes, sentiment_counts, urgent_count):
    """Generate actionable recommendations"""
    recommendations = []

    # Urgent issues
    if urgent_count > 0:
        recommendations.append(f"PRIORITY: Address {urgent_count} urgent customer issues within 24 hours.")

    # Negative sentiment
    negative_pct = (sentiment_counts.get('negative', 0) / len(df) * 100)
    if negative_pct > 20:
        recommendations.append("Implement customer retention campaign targeting dissatisfied customers.")

    # Theme-based recommendations
    for theme in themes[:3]:
        if 'service' in theme['name'].lower():
            recommendations.append("Review customer service processes and consider additional training for support team.")
        elif 'shipping' in theme['name'].lower():
            recommendations.append("Evaluate shipping partnerships and delivery times to improve logistics.")
        elif 'product' in theme['name'].lower():
            recommendations.append("Conduct product quality review and gather detailed feedback on specific issues.")

    return recommendations


# Run analysis
if __name__ == "__main__":
    results = analyze_customer_feedback('data/feedback.csv')

Step 7: Advanced Features

Feature 1: Trend Analysis Over Time

Track how sentiment changes month-over-month:

python

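import pandas as pd


def analyze_trends(df):
    """Analyze sentiment trends over time"""
    df = df.copy()  # avoid adding a 'month' column to the caller's DataFrame
    df['month'] = pd.to_datetime(df['date']).dt.to_period('M')

    monthly_sentiment = df.groupby(['month', 'sentiment']).size().unstack(fill_value=0)
    # Make sure every sentiment column exists, even if a label never occurs
    monthly_sentiment = monthly_sentiment.reindex(
        columns=['positive', 'neutral', 'negative'], fill_value=0
    )

    # Calculate percentage positive
    monthly_sentiment['positive_pct'] = (
        monthly_sentiment['positive'] / monthly_sentiment.sum(axis=1) * 100
    )

    return monthly_sentiment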
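
To make the month-over-month change explicit, the sketch below takes a table shaped like the one `analyze_trends` returns and adds the change in `positive_pct` from the previous month. The function name `add_month_over_month_change`, the `drop_threshold` parameter, and the sample numbers are illustrative assumptions, not part of the original pipeline.

```python
import pandas as pd


def add_month_over_month_change(monthly_sentiment, drop_threshold=10.0):
    """Add the change in positive_pct versus the previous month.

    drop_threshold is a hypothetical cutoff (in percentage points) for
    flagging a month as a notable decline.
    """
    result = monthly_sentiment.sort_index().copy()
    # diff() subtracts the previous row, so the first month is NaN
    result['positive_pct_change'] = result['positive_pct'].diff()
    # NaN compares as False, so the first month is never flagged
    result['notable_drop'] = result['positive_pct_change'] <= -drop_threshold
    return result


# Made-up sample data shaped like the output of analyze_trends
sample = pd.DataFrame(
    {'positive_pct': [62.0, 65.0, 50.0]},
    index=pd.PeriodIndex(['2024-01', '2024-02', '2024-03'], freq='M'),
)
trend = add_month_over_month_change(sample)
print(trend)
```

A flagged month is only a prompt to look closer; with few responses per month, large swings can be noise.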

Feature 2: AI-Powered Insights with GPT

Use OpenAI to generate human-like insights:

python

import os

from openai import OpenAI


def generate_ai_insights(feedback_sample, themes):
    """Use GPT to generate insights"""
    # Requires openai>=1.0; the older openai.ChatCompletion interface was removed in that release
    client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

    prompt = f"""
    Analyze this customer feedback and provide 3 key insights:

    Sample feedback:
    {feedback_sample[:1000]}

    Main themes:
    {', '.join([t['name'] for t in themes])}

    Provide actionable business insights in bullet points.
    """

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a customer experience analyst."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=300
    )

    return response.choices[0].message.content
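
The prompt above truncates `feedback_sample` to 1,000 characters, which can cut a response in half. One way to prepare the sample, sketched here under the assumption that feedback is available as a list of strings, is to add whole responses until a character budget is reached. The helper name `build_feedback_sample` and its parameters are hypothetical.

```python
def build_feedback_sample(responses, max_chars=1000):
    """Join whole responses into one string without exceeding max_chars."""
    lines = []
    used = 0
    for text in responses:
        text = ' '.join(str(text).split())  # collapse whitespace and newlines
        if not text:
            continue
        line = f"- {text}"
        cost = len(line) + (1 if lines else 0)  # +1 for the joining newline
        if used + cost > max_chars:
            break
        lines.append(line)
        used += cost
    return '\n'.join(lines)


responses = ["Great service!", "Shipping was   slow.", "", "x" * 2000]
sample = build_feedback_sample(responses, max_chars=1000)
print(sample)
```

With a DataFrame, something like `df['feedback'].dropna().tolist()` would supply the list, assuming the text column is named `feedback`.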

Feature 3: Alert System

Send alerts for urgent issues:

python

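import os
import smtplib
from email.mime.text import MIMEText


def send_urgent_alert(urgent_feedback, to_addr='support@company.com'):
    """Send an email alert for urgent issues"""
    if len(urgent_feedback) == 0:
        return

    body = (
        f"URGENT: {len(urgent_feedback)} critical customer feedback items detected.\n\n"
        "Immediate action required for:\n"
        f"{urgent_feedback[['feedback', 'sentiment']].to_string()}\n\n"
        "Please review and respond within 24 hours.\n"
    )

    message = MIMEText(body)
    message['Subject'] = 'Urgent Customer Feedback'
    message['From'] = os.getenv('ALERT_FROM', to_addr)
    message['To'] = to_addr

    # SMTP_HOST, SMTP_PORT and ALERT_FROM are placeholder variable names;
    # point them at your own mail server (add starttls()/login() if it requires them)
    host = os.getenv('SMTP_HOST', 'localhost')
    port = int(os.getenv('SMTP_PORT', '25'))
    with smtplib.SMTP(host, port) as server:
        server.send_message(message)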
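
The same alert can also go to Slack. A minimal sketch, assuming a Slack incoming webhook whose URL is stored in a hypothetical `SLACK_WEBHOOK_URL` environment variable, could look like this. The function names and the record format (a list of dicts, for example from `urgent_feedback.to_dict('records')`) are illustrative choices.

```python
import json
import os
import urllib.request


def build_slack_payload(records, max_items=5):
    """Build the JSON-serializable body for a Slack incoming webhook."""
    lines = [f"URGENT: {len(records)} critical customer feedback items detected."]
    for record in records[:max_items]:
        lines.append(f"- [{record['sentiment']}] {record['feedback']}")
    if len(records) > max_items:
        lines.append(f"...and {len(records) - max_items} more.")
    return {"text": "\n".join(lines)}


def post_to_slack(payload, webhook_url=None):
    """POST the payload to the webhook and return the HTTP status code."""
    webhook_url = webhook_url or os.environ["SLACK_WEBHOOK_URL"]  # assumed variable name
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


records = [
    {"feedback": "Order never arrived, need a refund now", "sentiment": "negative"},
    {"feedback": "Charged twice, please fix immediately", "sentiment": "negative"},
]
payload = build_slack_payload(records)
print(payload["text"])
# post_to_slack(payload)  # uncomment once the webhook URL is configured
```

Keeping payload construction separate from the network call makes the message format easy to check without sending anything.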

Best Practices

1. Regular Analysis Schedule

Run analysis:

Daily: For high-volume businesses with urgent issues
Weekly: For moderate feedback volume
Monthly: For comprehensive trend analysis

2. Feedback Quality

Ensure feedback is actionable:

Ask specific questions in surveys
Provide open-ended text fields
Include context (product, transaction date, etc.)

3. Act on Insights

Analysis is worthless without action:

Assign urgent issues to support team immediately
Track response time and resolution rate
Follow up with customers who had negative experiences

4. Iterate on Categories

Refine your theme categories over time:

Add new categories as products/services evolve
Update keywords based on actual feedback language
Remove irrelevant categories
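
One way to find candidate keywords, sketched below, is to look at the responses that match no category and count their most frequent words. The `THEME_KEYWORDS` dictionary, the stopword list, and the function names are assumptions for illustration; substitute the categories your pipeline actually uses.

```python
import re
from collections import Counter

# Hypothetical category definitions; replace with your own
THEME_KEYWORDS = {
    "shipping": {"shipping", "delivery", "late"},
    "service": {"support", "service", "agent"},
}
STOPWORDS = {"the", "a", "is", "was", "and", "to", "my", "it", "i", "very"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def matched_themes(text, theme_keywords=THEME_KEYWORDS):
    """Return the themes whose keyword set overlaps the words in text."""
    words = set(tokenize(text))
    return [theme for theme, keywords in theme_keywords.items() if words & keywords]


def suggest_keywords(responses, top_n=5):
    """Count frequent non-stopwords in responses that matched no theme."""
    counts = Counter()
    for text in responses:
        if not matched_themes(text):
            counts.update(w for w in tokenize(text) if w not in STOPWORDS)
    return counts.most_common(top_n)


responses = [
    "Delivery was late",
    "The app keeps crashing",
    "App login is broken",
    "Support agent was helpful",
]
print(suggest_keywords(responses))
```

Here "app" surfaces as the most frequent unmatched word, which hints that an app-related category may be missing.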

Frequently Asked Questions

Can this analyze feedback in languages other than English? Yes. Replace the spaCy model with a language-specific one (e.g., es_core_news_sm for Spanish). TextBlob's default sentiment analyzer is built for English, so translate other languages to English first or use a language-specific sentiment model.

How accurate is sentiment analysis? Typically 70-85% accurate. For critical decisions, review flagged items manually. Sarcasm and context can confuse algorithms.

What if I have 100,000+ responses? Use data sampling or distributed processing (Apache Spark, Dask). Analyze the most recent 10,000 responses for trends, and run the full dataset quarterly for deeper insights.
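
Before reaching for a cluster, a single machine can often get through a large file by streaming it in chunks. The sketch below uses the `chunksize` parameter of `pandas.read_csv` to tally sentiment labels without loading everything at once. The `classify` function is a toy stand-in for the tutorial's sentiment analyzer, and the `feedback` column name is an assumption.

```python
import io
from collections import Counter

import pandas as pd


def classify(text):
    """Toy stand-in for a real sentiment analyzer."""
    text = str(text).lower()
    if any(word in text for word in ("great", "love", "excellent")):
        return "positive"
    if any(word in text for word in ("bad", "slow", "broken")):
        return "negative"
    return "neutral"


def count_sentiment_in_chunks(csv_source, chunksize=10_000):
    """Stream a CSV in chunks and accumulate sentiment counts."""
    totals = Counter()
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        labels = chunk["feedback"].dropna().map(classify)
        totals.update(labels.value_counts().to_dict())
    return totals


# In-memory CSV so the example runs without a file on disk
csv_text = "feedback\nGreat product\nShipping was slow\nIt arrived\nLove it\n"
totals = count_sentiment_in_chunks(io.StringIO(csv_text), chunksize=2)
print(dict(totals))
```

Only one chunk is held in memory at a time, so the chunk size can be tuned to the machine rather than to the dataset.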

Can I integrate this with Zendesk/Salesforce? Yes! Both have APIs. Replace CSV import with API calls to fetch tickets/cases automatically.

How do I handle multilingual feedback? Detect the language first (using the langdetect library), then route each response to the appropriate language model or translate it to English before analysis.

What about emojis and internet slang? Preprocess to convert emojis to text (😊 → "happy") and expand slang ("gr8" → "great") using libraries like emoji and custom dictionaries.
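
A custom-dictionary version of that preprocessing step might look like the following. The two mapping tables are tiny, made-up examples; a real deployment would need much larger ones, and the function name `normalize_feedback` is hypothetical.

```python
import re

# Illustrative mappings only; extend with terms from your own feedback
EMOJI_TO_TEXT = {"😊": "happy", "😡": "angry", "👍": "good"}
SLANG_TO_TEXT = {"gr8": "great", "thx": "thanks", "u": "you"}

# Match whole tokens so "u" is not replaced inside other words
SLANG_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SLANG_TO_TEXT)) + r")\b", re.IGNORECASE
)


def normalize_feedback(text):
    """Replace known emojis and slang tokens with plain words."""
    for symbol, word in EMOJI_TO_TEXT.items():
        text = text.replace(symbol, f" {word} ")
    text = SLANG_PATTERN.sub(lambda m: SLANG_TO_TEXT[m.group(1).lower()], text)
    return " ".join(text.split())  # collapse extra spaces


print(normalize_feedback("Service was gr8 😊 thx"))
```

Running this step before sentiment scoring gives lexicon-based analyzers plain words they are more likely to recognize.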
