Guiding AI Behavior

This section covers how to frame your prompt—setting context, guiding the AI’s behavior, and structuring your request for better results.


🎭 Use a Persona or Role

Assigning a role to the LLM helps shape its tone, level of detail, and the kind of answer you receive. By telling the model who it should be, you guide how it explains concepts and structures the code.

“Write code to calculate average sales by region.”
# Generic response - minimal structure
import pandas as pd
df = pd.read_csv('sales.csv')
print(df.groupby('region')['sales'].mean())

Why this is limited:

  1. No context: The AI doesn’t know what level of detail you need
  2. Minimal code: Just the bare minimum to answer the question
  3. No best practices: Missing error handling, comments, or structure
  4. No explanation: You don’t learn anything from this response
“You are an expert data analyst specializing in Python and Pandas. Write code to calculate average sales by region. Use best practices for code readability and include comments explaining each step.”
# Expert-level response with best practices
import pandas as pd

def calculate_regional_averages(filepath: str) -> pd.Series:
    """
    Calculate average sales by region from a CSV file.
    
    Args:
        filepath: Path to the sales data CSV file
        
    Returns:
        Series containing average sales indexed by region
    """
    # Load the sales data
    df = pd.read_csv(filepath)
    
    # Validate required columns exist
    required_cols = ['region', 'sales']
    missing = [col for col in required_cols if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    
    # Calculate and return average sales by region
    regional_averages = df.groupby('region')['sales'].mean()
    
    return regional_averages.sort_values(ascending=False)

# Example usage
averages = calculate_regional_averages('data/sales.csv')
print(averages)

Why this is better:

  1. Professional structure: Function with docstring and type hints
  2. Error handling: Validates data before processing
  3. Comments: Explains each step for learning
  4. Best practices: Sorted output, clean variable names

🎯 Adapt Code to Your Skill Level

If you’re new to Python, let the model know to keep things simple. If you’re more experienced, ask for advanced techniques.
The AI will match its response to your level, so you are neither bored by basics you already know nor overwhelmed by unfamiliar concepts.

“Filter the dataframe for high sales and calculate statistics.”
# AI assumes you know advanced techniques
result = (df.query('sales > 1000')
            .groupby('category')
            .agg({'sales': ['mean', 'std', 'count']})
            .round(2))

Why this can be problematic for beginners:

  1. Method chaining: Multiple operations in one line can be confusing
  2. Advanced syntax: .query() and .agg() with dictionaries
  3. No comments: Hard to understand what each part does
  4. Assumed knowledge: Expects familiarity with Pandas patterns
“I am new to Python and Pandas. Please write simple code that filters a dataframe for sales over 1000 and calculates the mean. Use basic methods and add comments explaining each step. Avoid method chaining.”
import pandas as pd

# Load the data
df = pd.read_csv('data/sales.csv')

# Step 1: Filter for high sales (over 1000)
# This creates a new dataframe with only matching rows
high_sales = df[df['sales'] > 1000]

# Step 2: Calculate the mean of the filtered data
# .mean() calculates the average of the 'sales' column
average_high_sales = high_sales['sales'].mean()

# Step 3: Display the result
print(f"Average of high sales: ${average_high_sales:.2f}")
print(f"Number of high-sales transactions: {len(high_sales)}")

Why this is better for learning:

  1. Step-by-step: Each operation is on its own line
  2. Clear comments: Explains what each line does and why
  3. Basic methods: Uses simple filtering and .mean()
  4. Readable output: Shows what the result means

📝 Use Inline Comments or Pseudocode

When asking an LLM to generate code, you can guide it with comments or pseudocode. This keeps you in control of the architecture while the AI handles the syntax.

Why this matters: The AI doesn’t know your long-term goals, your data pipeline, or how this code fits into a larger project. By providing the structure, you remain the architect—the AI is just helping with implementation details.

“Analyze the sales data and create a report.”
# AI decides the structure - may not match your needs
import pandas as pd
df = pd.read_csv('sales.csv')
print("Sales Report")
print(f"Total: {df['sales'].sum()}")
print(f"Average: {df['sales'].mean()}")
# ... AI continues with whatever it thinks is relevant

Why this is risky:

  1. AI decides structure: It might not include what you actually need
  2. No control: The AI chooses which metrics matter
  3. Hard to extend: If you need to add to this later, you’re starting over
  4. Misaligned goals: The AI doesn’t know this feeds into a quarterly report
“Write Python code that does the following:
# Load sales_data.csv from the data/ folder
# Filter for Q4 2024 only (Oct-Dec)
# Group by product category
# Calculate: total sales, average order value, transaction count
# Sort by total sales descending
# Return as a DataFrame for further processing”
import pandas as pd

# Load sales_data.csv from the data/ folder
df = pd.read_csv('data/sales_data.csv')
df['date'] = pd.to_datetime(df['date'])

# Filter for Q4 2024 only (Oct-Dec)
q4_mask = (df['date'] >= '2024-10-01') & (df['date'] <= '2024-12-31')
q4_sales = df[q4_mask]

# Group by product category
# Calculate: total sales, average order value, transaction count
summary = q4_sales.groupby('category').agg(
    total_sales=('sales', 'sum'),
    avg_order_value=('sales', 'mean'),
    transaction_count=('sales', 'count')
)

# Sort by total sales descending
summary = summary.sort_values('total_sales', ascending=False)

# Return as a DataFrame for further processing
print(summary)

Why this keeps you in control:

  1. You define the structure: Each step matches your requirements
  2. Specific metrics: Only calculates what you actually need
  3. Easy to extend: Clear structure makes modifications simple
  4. Aligned with goals: The output format matches your downstream needs
Tip

Prompting this way helps you stay focused on problem solving and analytical thinking, rather than getting stuck on syntax details. You’re the architect; the AI is the assistant.


🧪 Request Test Cases in Your Prompt

When asking for Python code, include test cases or expected outputs in your prompt. This ensures the AI generates verifiable, working code—not just code that looks correct.

“Write a function to calculate percentage change.”

Why this is incomplete:

  1. No validation: You can’t verify if it works correctly
  2. Missing edge cases: What happens with zero values?
  3. No examples: Unclear what the expected behavior should be
  4. Untestable: You have no way to confirm correctness
“Write a Python function that calculates percentage change between two values. Include test cases that verify:
- calculate_percentage_change(100, 150) → 50.0
- calculate_percentage_change(200, 150) → -25.0
- calculate_percentage_change(50, 50) → 0.0
- calculate_percentage_change(0, 100) → should handle gracefully”
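A response to this prompt might look like the sketch below. Note that raising ValueError for a zero baseline is just one reasonable reading of "handle gracefully"; if you want a different behavior (returning None, for instance), say so in the prompt.

```python
def calculate_percentage_change(old_value: float, new_value: float) -> float:
    """Return the percentage change from old_value to new_value."""
    if old_value == 0:
        # One way to "handle gracefully": fail with a clear message
        raise ValueError("Cannot compute percentage change from a zero baseline")
    return (new_value - old_value) / old_value * 100

# Test cases from the prompt
assert calculate_percentage_change(100, 150) == 50.0
assert calculate_percentage_change(200, 150) == -25.0
assert calculate_percentage_change(50, 50) == 0.0

# Zero baseline is handled explicitly rather than crashing with ZeroDivisionError
try:
    calculate_percentage_change(0, 100)
except ValueError as exc:
    print(f"Handled gracefully: {exc}")
```

Because the test cases are part of the prompt, you can paste the response into an interpreter and immediately confirm it behaves as specified.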

Why this works better:

  1. Clear expectations: Specific examples show desired behavior
  2. Edge case coverage: Includes zero values and negative changes
  3. Verifiable output: You can immediately test if it works
  4. Self-documenting: Test cases explain the function’s purpose
Tip

Ask for test cases even for data analysis tasks. For example: “Load sales.csv and calculate monthly totals. The result should show 12 months with January having the highest sales.”
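To make this tip concrete, the sketch below shows the kind of verification such a prompt invites. Synthetic data stands in for sales.csv, and the column names (date, sales) are assumptions about your file:

```python
import pandas as pd

# Synthetic stand-in for sales.csv: one row per day for 2024
df = pd.DataFrame({"date": pd.date_range("2024-01-01", "2024-12-31", freq="D")})
df["sales"] = 100.0
df.loc[df["date"].dt.month == 1, "sales"] = 500.0  # make January the peak

# Monthly totals, grouped by month number (1-12)
monthly_totals = df.groupby(df["date"].dt.month)["sales"].sum()

# The checks the prompt asked for
assert len(monthly_totals) == 12
assert monthly_totals.idxmax() == 1  # January has the highest total
print(monthly_totals)
```

Stating expected properties of the result ("12 months, January highest") turns a vague analysis request into one whose output you can check mechanically.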