Working with AI Responses
This section covers what to do after you receive a response—refining, testing, and iterating to get the best results.
🔄 Iterate and Refine
LLMs generate different outputs each time. You’ll often get the best results by iterating—treat the process like a conversation:
- Reword your prompts
- Provide feedback on what worked or didn’t
- Ask the model to fix errors or improve clarity; it can be valuable to include a few rows of `df.head()` output in your prompts (see the snippet after this list)
- Review the code and suggest changes—remember, you can keep going back and forth
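If you want the model to see some of your data, a quick way is to print a compact preview and paste it into the prompt. A minimal sketch, assuming a pandas DataFrame loaded from a hypothetical file:

```python
import pandas as pd

# Hypothetical file name - swap in your own dataset
df = pd.read_csv("sales.csv")

# A compact, paste-friendly preview of the first few rows
print(df.head().to_string())

# Column names and dtypes give the model useful context too
print(df.dtypes)
```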
In long conversations, LLMs can sometimes lose track of instructions or hit message limits.
If the model seems confused or the outputs start to drift, just start a new conversation. Pick up from the last working code snippet and continue refining from there.
✅ Run and Validate Test Cases
When the AI provides code with test cases (as covered in Guiding AI Behavior), make sure to actually run them. This helps you verify the code works correctly—not just that it runs without errors.
```python
def calculate_percentage_change(old_value, new_value):
    """Calculate percentage change between two values."""
    if old_value == 0:
        if new_value == 0:
            return 0.0
        return float('inf')  # or raise an error
    return ((new_value - old_value) / old_value) * 100

# Test cases provided by AI - RUN THESE!
assert calculate_percentage_change(100, 150) == 50.0
assert calculate_percentage_change(200, 150) == -25.0
assert calculate_percentage_change(50, 50) == 0.0
print("All tests passed!")
```
What to do when running tests:
- Run each test individually first to see what happens
- Check if the results make sense - does a 50% increase from 100 to 150 seem right?
- Test with your own data - try values from your actual dataset
- Look for edge cases the AI might have missed - a few examples follow below
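For example, building on `calculate_percentage_change` above, here are a few extra checks you might add yourself (the values are hypothetical, not part of the AI's tests):

```python
# Edge cases the AI-provided tests didn't cover
assert calculate_percentage_change(100, 50) == -50.0       # a decrease
assert calculate_percentage_change(0, 10) == float('inf')  # zero baseline
# Negative baseline: the value rose from -100 to -50, yet the formula reports -50%
assert calculate_percentage_change(-100, -50) == -50.0
print("Edge case tests passed!")
```

Running checks like these often surfaces behavior worth raising with the AI, such as how percentage change should be defined for negative baselines.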
🔍 Validate the Output
Remember: LLMs predict what code looks correct—they don’t test it or understand your data. Always:
- Read the code – Does it do what you asked?
- Run the code – Does it execute without errors?
- Check the results – Do the outputs make sense?
- Test edge cases – What happens with missing values or unexpected data? (a few quick checks are sketched below)
This connects back to Principle 2: Always Validate AI Outputs.
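As a starting point, a few quick sanity checks might look like this (a minimal sketch; `df` stands in for whatever DataFrame your analysis produced):

```python
# Quick sanity checks before trusting downstream results
print(df.shape)               # did a merge or filter change the row count unexpectedly?
print(df.isnull().sum())      # missing values per column
print(df.duplicated().sum())  # unexpected duplicate rows?
print(df.describe())          # do min, max, and mean look plausible for your domain?
```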
🐛 Debug When Things Go Wrong
Debugging is another skill you need to learn—we’ll dive deeper into debugging strategies later on in Debugging Code.
🎯 Improve Code Quality
Once your analysis works, you can ask the AI to improve it:
Ask for Better Error Handling
When you have working code but realize it could break with unexpected inputs, ask the AI to add proper error handling. This makes your code more robust and prevents crashes:
```python
def calculate_correlation(df, col1, col2):
    return df[col1].corr(df[col2])
```
“Add error handling for missing columns and invalid data”
```python
def calculate_correlation(df, col1, col2):
    if col1 not in df.columns:
        raise ValueError(f"Column '{col1}' not found in dataframe")
    if col2 not in df.columns:
        raise ValueError(f"Column '{col2}' not found in dataframe")

    # Check for sufficient non-null values
    valid_pairs = df[[col1, col2]].dropna()
    if len(valid_pairs) < 2:
        raise ValueError("Need at least 2 valid data points for correlation")

    return valid_pairs[col1].corr(valid_pairs[col2])
```
Request Documentation and Type Hints
Good code should be self-documenting. When you have working functions but they lack documentation, ask the AI to add proper docstrings and type hints:
```python
def summarize_sales_by_region(df):
    return df.groupby('region')['sales'].sum().to_dict()
```
“Add docstrings and type hints to this analysis function”
```python
import pandas as pd
from typing import Dict

def summarize_sales_by_region(df: pd.DataFrame) -> Dict[str, float]:
    """Calculate total sales by region.

    Args:
        df: DataFrame with 'region' and 'sales' columns

    Returns:
        Dictionary mapping region names to total sales

    Raises:
        KeyError: If required columns are missing
        ValueError: If sales data contains non-numeric values
    """
    required_cols = ['region', 'sales']
    missing_cols = [col for col in required_cols if col not in df.columns]
    if missing_cols:
        raise KeyError(f"Missing required columns: {missing_cols}")

    return df.groupby('region')['sales'].sum().to_dict()
```
Optimize for Large Datasets
When your code works with small datasets but might struggle with larger ones, ask the AI to optimize for memory efficiency and performance:
```python
def process_large_dataset(filename):
    df = pd.read_csv(filename)  # Could be huge!
    return df.groupby('category').sum()
```
“Make this more memory-efficient for large datasets”
```python
def process_large_dataset(filename, chunksize=10000):
    """Process large CSV file in chunks to save memory."""
    result = None
    for chunk in pd.read_csv(filename, chunksize=chunksize):
        chunk_result = chunk.groupby('category').sum()
        if result is None:
            result = chunk_result
        else:
            result = result.add(chunk_result, fill_value=0)
    return result
```
📊 Enhance Data Analysis Quality
Once your data analysis works, you can ask the AI to make it more robust and insightful:
Add Data Validation and Cleaning
When you have working analysis code but realize it could break with messy real-world data, ask the AI to add proper data validation. This makes your analysis more reliable and prevents unexpected results:
```python
def analyze_sales_trends(df):
    monthly_sales = df.groupby('month')['sales'].sum()
    return monthly_sales.plot()
```
“Add data validation and cleaning for missing values and outliers”
```python
def analyze_sales_trends(df):
    # Validate required columns
    if 'month' not in df.columns or 'sales' not in df.columns:
        raise ValueError("DataFrame must contain 'month' and 'sales' columns")

    # Clean the data
    df_clean = df.copy()
    df_clean = df_clean.dropna(subset=['month', 'sales'])

    # Remove outliers (sales > 3 standard deviations from mean)
    sales_mean = df_clean['sales'].mean()
    sales_std = df_clean['sales'].std()
    df_clean = df_clean[abs(df_clean['sales'] - sales_mean) <= 3 * sales_std]

    monthly_sales = df_clean.groupby('month')['sales'].sum()
    return monthly_sales.plot()
```
Request Statistical Context
Good data analysis should include statistical context. When you have basic calculations but want to understand their significance, ask the AI to add statistical measures and interpretation:
```python
def compare_groups(df, group_col, value_col):
    return df.groupby(group_col)[value_col].mean()
```
“Add statistical significance testing and confidence intervals”
```python
import scipy.stats as stats
import numpy as np

def compare_groups(df, group_col, value_col):
    """Compare groups with statistical context.

    Returns:
        dict: Group means, confidence intervals, and statistical tests
    """
    groups = df.groupby(group_col)[value_col]
    results = {}

    for name, group in groups:
        mean_val = group.mean()
        std_err = stats.sem(group)  # Standard error of the mean
        ci = stats.t.interval(0.95, len(group) - 1, mean_val, std_err)
        results[name] = {
            'mean': mean_val,
            'std': group.std(),
            'count': len(group),
            'confidence_interval_95': ci
        }

    # Add statistical test if comparing two groups
    group_names = list(results.keys())
    if len(group_names) == 2:
        group1_data = df[df[group_col] == group_names[0]][value_col]
        group2_data = df[df[group_col] == group_names[1]][value_col]
        t_stat, p_value = stats.ttest_ind(group1_data, group2_data)
        results['statistical_test'] = {
            't_statistic': t_stat,
            'p_value': p_value,
            'significant': p_value < 0.05
        }

    return results
```
Optimize for Exploratory Analysis
When your analysis works but you want to explore the data more thoroughly, ask the AI to add exploratory features that help you understand patterns and relationships:
```python
def basic_summary(df):
    return df.describe()
```
“Create a comprehensive exploratory analysis with visualizations and correlation insights”
```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

def comprehensive_summary(df):
    """Generate comprehensive exploratory analysis."""
    print("=== DATASET OVERVIEW ===")
    print(f"Shape: {df.shape}")
    print(f"Memory usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")

    print("\n=== MISSING VALUES ===")
    missing = df.isnull().sum()
    print(missing[missing > 0])

    print("\n=== NUMERIC SUMMARY ===")
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    print(df[numeric_cols].describe())

    # Correlation heatmap for numeric columns
    if len(numeric_cols) > 1:
        plt.figure(figsize=(10, 8))
        correlation_matrix = df[numeric_cols].corr()
        sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
        plt.title('Correlation Matrix')
        plt.tight_layout()
        plt.show()

        # Highlight strong correlations
        strong_corr = []
        for i in range(len(correlation_matrix.columns)):
            for j in range(i + 1, len(correlation_matrix.columns)):
                corr_val = correlation_matrix.iloc[i, j]
                if abs(corr_val) > 0.7:
                    strong_corr.append((
                        correlation_matrix.columns[i],
                        correlation_matrix.columns[j],
                        corr_val
                    ))

        if strong_corr:
            print("\n=== STRONG CORRELATIONS (|r| > 0.7) ===")
            for col1, col2, corr in strong_corr:
                print(f"{col1} ↔ {col2}: {corr:.3f}")

    # df.info() prints column dtypes and non-null counts (it returns None)
    df.info()
```
Pro tip: When working with real datasets, always ask the AI to help you understand what the data is telling you, not just how to manipulate it.
Questions like “What patterns do you see?” or “What should I investigate further?” can lead to valuable insights, especially when combined with your analysis.