Working with AI Responses

This section covers what to do after you receive a response—refining, testing, and iterating to get the best results.


🔄 Iterate and Refine

LLMs can generate different outputs each time. You’ll often get the best results by iterating—treat the process like a conversation:

  • Reword your prompts
  • Provide feedback on what worked or didn’t
  • Ask the model to fix errors or improve clarity; it can help to paste a few rows of df.head() output into your prompts (see the sketch after this list)
  • Review the code and suggest changes—remember, you can keep going back and forth
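
For example, here is a minimal sketch of how to grab a few rows as plain text you can paste into a prompt. The DataFrame below is a made-up example; use your own df:

import pandas as pd

# Hypothetical example data - replace with your own DataFrame
df = pd.DataFrame({'region': ['North', 'South', 'East'],
                   'sales': [1200, 950, 1430]})

# Copy this text into your prompt so the model sees real column names and values
print(df.head(3).to_string())
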
Tip

In long conversations, LLMs can sometimes lose track of instructions or hit message limits.

If the model seems confused or the outputs start to drift, just start a new conversation. Pick up from the last working code snippet and continue refining from there.


✅ Run and Validate Test Cases

When the AI provides code with test cases (as covered in Guiding AI Behavior), make sure to actually run them. This helps you verify the code works correctly—not just that it runs without errors.

def calculate_percentage_change(old_value, new_value):
    """Calculate percentage change between two values."""
    if old_value == 0:
        if new_value == 0:
            return 0.0
        return float('inf')  # or raise an error
    return ((new_value - old_value) / old_value) * 100

# Test cases provided by AI - RUN THESE!
assert calculate_percentage_change(100, 150) == 50.0
assert calculate_percentage_change(200, 150) == -25.0
assert calculate_percentage_change(50, 50) == 0.0
print("All tests passed!")

What to do when running tests:

  1. Run each test individually first to see what happens
  2. Check if results make sense - does a 50% increase from 100 to 150 seem right?
  3. Test with your own data - try values from your actual dataset
  4. Look for edge cases the AI might have missed
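
For instance, here are a few extra checks you might add yourself for step 4. These asserts are illustrative additions, not part of the AI-provided tests:

# Extra edge cases to probe yourself
assert calculate_percentage_change(100, 50) == -50.0       # a decrease
assert calculate_percentage_change(0, 10) == float('inf')  # zero starting value
assert calculate_percentage_change(0, 0) == 0.0            # both zero
# Negative values: moving from -100 to -50 reports -50% - does that make sense for your data?
assert calculate_percentage_change(-100, -50) == -50.0
print("Edge-case checks passed!")
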
Tip

As you grow more confident, consider learning about testing frameworks like pytest or unittest. These tools make testing faster and more reliable.
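
As a small taste, here is what the earlier tests could look like as a pytest file. The module name my_analysis is just a placeholder for wherever your function actually lives:

# test_percentage.py - run from the terminal with: pytest test_percentage.py
import pytest
from my_analysis import calculate_percentage_change  # placeholder module name

def test_basic_increase():
    assert calculate_percentage_change(100, 150) == pytest.approx(50.0)

def test_decrease():
    assert calculate_percentage_change(200, 150) == pytest.approx(-25.0)

def test_no_change():
    assert calculate_percentage_change(50, 50) == 0.0

def test_zero_start():
    assert calculate_percentage_change(0, 10) == float('inf')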


🔍 Validate the Output

Remember: LLMs predict what code looks correct—they don’t test it or understand your data. Always:

  1. Read the code – Does it do what you asked?
  2. Run the code – Does it execute without errors?
  3. Check the results – Do the outputs make sense?
  4. Test edge cases – What happens with missing values or unexpected data?

This connects back to Principle 2: Always Validate AI Outputs.
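
For step 4, one quick approach is to feed the problem inputs in directly and see what comes back. A small sketch using the percentage-change function from above:

import numpy as np

# Missing value: NaN silently propagates instead of raising an error - is that what you want?
print(calculate_percentage_change(np.nan, 150))  # prints nan

# Unexpected data: a string raises a TypeError you may want to handle explicitly
try:
    calculate_percentage_change("100", 150)
except TypeError as e:
    print(f"TypeError: {e}")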


🐛 Debug When Things Go Wrong

Debugging is another skill you need to learn—we’ll dive deeper into debugging strategies later, in Debugging Code.


🎯 Improve Code Quality

Once your analysis works, you can ask the AI to improve it:

Ask for Better Error Handling

When you have working code but realize it could break with unexpected inputs, ask the AI to add proper error handling. This makes your code more robust and prevents crashes:

def calculate_correlation(df, col1, col2):
    return df[col1].corr(df[col2])

“Add error handling for missing columns and invalid data”

def calculate_correlation(df, col1, col2):
    if col1 not in df.columns:
        raise ValueError(f"Column '{col1}' not found in dataframe")
    if col2 not in df.columns:
        raise ValueError(f"Column '{col2}' not found in dataframe")
    
    # Check for sufficient non-null values
    valid_pairs = df[[col1, col2]].dropna()
    if len(valid_pairs) < 2:
        raise ValueError("Need at least 2 valid data points for correlation")
    
    return valid_pairs[col1].corr(valid_pairs[col2])
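
You can confirm the new checks fire as intended by triggering them on purpose. A quick sanity check; the column names here are made up, so use one that is genuinely missing from your own df:

# Deliberately request a column that doesn't exist
try:
    calculate_correlation(df, 'sales', 'no_such_column')
except ValueError as e:
    print(f"Caught expected error: {e}")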

Request Documentation and Type Hints

Good code should be self-documenting. When you have working functions but they lack documentation, ask the AI to add proper docstrings and type hints:

def summarize_sales_by_region(df):
    return df.groupby('region')['sales'].sum().to_dict()

“Add docstrings and type hints to this analysis function”

import pandas as pd
from typing import Dict

def summarize_sales_by_region(df: pd.DataFrame) -> Dict[str, float]:
    """Calculate total sales by region.
    
    Args:
        df: DataFrame with 'region' and 'sales' columns
        
    Returns:
        Dictionary mapping region names to total sales
        
    Raises:
        KeyError: If required columns are missing
        ValueError: If sales data contains non-numeric values
    """
    required_cols = ['region', 'sales']
    missing_cols = [col for col in required_cols if col not in df.columns]
    if missing_cols:
        raise KeyError(f"Missing required columns: {missing_cols}")
    
    return df.groupby('region')['sales'].sum().to_dict()

Optimize for Large Datasets

When your code works with small datasets but might struggle with larger ones, ask the AI to optimize for memory efficiency and performance:

def process_large_dataset(filename):
    df = pd.read_csv(filename)  # Could be huge!
    return df.groupby('category').sum()

“Make this more memory-efficient for large datasets”

import pandas as pd

def process_large_dataset(filename, chunksize=10000):
    """Process large CSV file in chunks to save memory."""
    result = None
    for chunk in pd.read_csv(filename, chunksize=chunksize):
        chunk_result = chunk.groupby('category').sum()
        if result is None:
            result = chunk_result
        else:
            result = result.add(chunk_result, fill_value=0)
    return result

📊 Enhance Data Analysis Quality

Once your data analysis works, you can ask the AI to make it more robust and insightful:

Add Data Validation and Cleaning

When you have working analysis code but realize it could break with messy real-world data, ask the AI to add proper data validation. This makes your analysis more reliable and prevents unexpected results:

def analyze_sales_trends(df):
    monthly_sales = df.groupby('month')['sales'].sum()
    return monthly_sales.plot()

“Add data validation and cleaning for missing values and outliers”

def analyze_sales_trends(df):
    # Validate required columns
    if 'month' not in df.columns or 'sales' not in df.columns:
        raise ValueError("DataFrame must contain 'month' and 'sales' columns")
    
    # Clean the data
    df_clean = df.copy()
    df_clean = df_clean.dropna(subset=['month', 'sales'])
    
    # Remove outliers (sales > 3 standard deviations from mean)
    sales_mean = df_clean['sales'].mean()
    sales_std = df_clean['sales'].std()
    df_clean = df_clean[abs(df_clean['sales'] - sales_mean) <= 3 * sales_std]
    
    monthly_sales = df_clean.groupby('month')['sales'].sum()
    return monthly_sales.plot()

Request Statistical Context

Good data analysis should include statistical context. When you have basic calculations but want to understand their significance, ask the AI to add statistical measures and interpretation:

def compare_groups(df, group_col, value_col):
    return df.groupby(group_col)[value_col].mean()

“Add statistical significance testing and confidence intervals”

import scipy.stats as stats
import numpy as np

def compare_groups(df, group_col, value_col):
    """Compare groups with statistical context.
    
    Returns:
        dict: Group means, confidence intervals, and statistical tests
    """
    groups = df.groupby(group_col)[value_col]
    results = {}
    
    for name, group in groups:
        mean_val = group.mean()
        std_err = stats.sem(group)  # Standard error of mean
        ci = stats.t.interval(0.95, len(group)-1, mean_val, std_err)
        
        results[name] = {
            'mean': mean_val,
            'std': group.std(),
            'count': len(group),
            'confidence_interval_95': ci
        }
    
    # Add statistical test if comparing two groups
    group_names = list(results.keys())
    if len(group_names) == 2:
        group1_data = df[df[group_col] == group_names[0]][value_col]
        group2_data = df[df[group_col] == group_names[1]][value_col]
        t_stat, p_value = stats.ttest_ind(group1_data, group2_data)
        results['statistical_test'] = {
            't_statistic': t_stat,
            'p_value': p_value,
            'significant': p_value < 0.05
        }
    
    return results

Optimize for Exploratory Analysis

When your analysis works but you want to explore the data more thoroughly, ask the AI to add exploratory features that help you understand patterns and relationships:

def basic_summary(df):
    return df.describe()

“Create a comprehensive exploratory analysis with visualizations and correlation insights”

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

def comprehensive_summary(df):
    """Generate comprehensive exploratory analysis."""
    print("=== DATASET OVERVIEW ===")
    print(f"Shape: {df.shape}")
    print(f"Memory usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")
    print("\n=== MISSING VALUES ===")
    missing = df.isnull().sum()
    print(missing[missing > 0])
    
    print("\n=== NUMERIC SUMMARY ===")
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    print(df[numeric_cols].describe())
    
    # Correlation heatmap for numeric columns
    if len(numeric_cols) > 1:
        plt.figure(figsize=(10, 8))
        correlation_matrix = df[numeric_cols].corr()
        sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
        plt.title('Correlation Matrix')
        plt.tight_layout()
        plt.show()
        
        # Highlight strong correlations
        strong_corr = []
        for i in range(len(correlation_matrix.columns)):
            for j in range(i+1, len(correlation_matrix.columns)):
                corr_val = correlation_matrix.iloc[i, j]
                if abs(corr_val) > 0.7:
                    strong_corr.append((
                        correlation_matrix.columns[i],
                        correlation_matrix.columns[j],
                        corr_val
                    ))
        
        if strong_corr:
            print("\n=== STRONG CORRELATIONS (|r| > 0.7) ===")
            for col1, col2, corr in strong_corr:
                print(f"{col1}{col2}: {corr:.3f}")
    
    return df.info()
Tip

Pro tip: When working with real datasets, always ask the AI to help you understand what the data is telling you, not just how to manipulate it.

Questions like “What patterns do you see?” or “What should I investigate further?” can lead to valuable insights, especially when combined with your analysis.