Guiding AI Behavior
This section covers how to frame your prompt—setting context, guiding the AI’s behavior, and structuring your request for better results.
🎭 Use a Persona or Role
Assigning a role to the LLM helps shape its tone, level of detail, and the kind of answer you receive. By telling the model who it should be, you guide how it explains concepts and structures the code.
# Generic response - minimal structure
import pandas as pd
df = pd.read_csv('sales.csv')
print(df.groupby('region')['sales'].mean())
Why this is limited:
- No context: The AI doesn’t know what level of detail you need
- Minimal code: Just the bare minimum to answer the question
- No best practices: Missing error handling, comments, or structure
- No explanation: You don’t learn anything from this response
# Expert-level response with best practices
import pandas as pd
def calculate_regional_averages(filepath: str) -> pd.Series:
    """
    Calculate average sales by region from a CSV file.

    Args:
        filepath: Path to the sales data CSV file

    Returns:
        Series containing average sales indexed by region
    """
    # Load the sales data
    df = pd.read_csv(filepath)
    # Validate required columns exist
    required_cols = ['region', 'sales']
    if not all(col in df.columns for col in required_cols):
        raise ValueError(f"Missing required columns: {required_cols}")
    # Calculate and return average sales by region
    regional_averages = df.groupby('region')['sales'].mean()
    return regional_averages.sort_values(ascending=False)

# Example usage
averages = calculate_regional_averages('data/sales.csv')
print(averages)
Why this is better:
- Professional structure: Function with docstring and type hints
- Error handling: Validates data before processing
- Comments: Explains each step for learning
- Best practices: Sorted output, clean variable names
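As a minimal sketch of how a role can be attached to a request, the snippet below builds the two kinds of prompt compared above. The helper name `build_prompt`, the role wording, and the task text are hypothetical and chosen only for illustration; how you actually send the prompt depends on the tool you use.

```python
def build_prompt(role: str, task: str) -> str:
    """Prepend a persona to a task description (hypothetical helper)."""
    return f"You are {role}. {task}"


# Illustrative task text, not taken from any real project
task = "Write Python code that calculates average sales by region from sales.csv."

# No persona: the model has to guess the level of detail you want
generic_prompt = task

# With a persona: the role signals the tone and code quality you expect
expert_prompt = build_prompt(
    role="a senior data engineer who writes documented, validated Python",
    task=task,
)

print(expert_prompt)
```

The only difference between the two prompts is the leading role sentence, which is what nudges the model toward the more structured style of answer.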
🎯 Adapt Code to Your Skill Level
If you’re new to Python, let the model know to keep things simple. If you’re more experienced, ask for advanced techniques.
The AI will usually match its response to the level you state, so you are less likely to be bored or overwhelmed by the concepts.
# AI assumes you know advanced techniques
result = (df.query('sales > 1000')
          .groupby('category')
          .agg({'sales': ['mean', 'std', 'count']})
          .round(2))
Why this can be problematic for beginners:
- Method chaining: Multiple operations in one statement can be confusing
- Advanced syntax: .query() and .agg() with dictionaries
- No comments: Hard to understand what each part does
- Assumed knowledge: Expects familiarity with Pandas patterns
import pandas as pd
# Load the data
df = pd.read_csv('data/sales.csv')
# Step 1: Filter for high sales (over 1000)
# This creates a new dataframe with only matching rows
high_sales = df[df['sales'] > 1000]
# Step 2: Calculate the mean of the filtered data
# .mean() calculates the average of the 'sales' column
average_high_sales = high_sales['sales'].mean()
# Step 3: Display the result
print(f"Average of high sales: ${average_high_sales:.2f}")
print(f"Number of high-sales transactions: {len(high_sales)}")
Why this is better for learning:
- Step-by-step: Each operation is on its own line
- Clear comments: Explains what each line does and why
- Basic methods: Uses simple filtering and .mean()
- Readable output: Shows what the result means
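To show that the beginner and advanced styles differ in presentation rather than in result, here is a small sketch that applies the same "sales over 1000" filter both ways. The DataFrame contents are made up for illustration; only the column names follow the examples above.

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({
    "category": ["A", "A", "B", "B"],
    "sales": [500, 1500, 2000, 800],
})

# Beginner style: boolean mask, one step at a time
high_sales = df[df["sales"] > 1000]
step_by_step_mean = high_sales["sales"].mean()

# Advanced style: the same filter written with .query() and chained
chained_mean = df.query("sales > 1000")["sales"].mean()

# Both approaches select the same rows, so the averages match
print(step_by_step_mean == chained_mean)
```

Once the step-by-step version feels comfortable, you can ask the AI to show the chained equivalent and compare the two.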
📝 Use Inline Comments or Pseudocode
When asking an LLM to generate code, you can guide it with comments or pseudocode. This keeps you in control of the architecture while the AI handles the syntax.
Why this matters: The AI doesn’t know your long-term goals, your data pipeline, or how this code fits into a larger project. By providing the structure, you remain the architect—the AI is just helping with implementation details.
# AI decides the structure - may not match your needs
import pandas as pd
df = pd.read_csv('sales.csv')
print("Sales Report")
print(f"Total: {df['sales'].sum()}")
print(f"Average: {df['sales'].mean()}")
# ... AI continues with whatever it thinks is relevant
Why this is risky:
- AI decides structure: It might not include what you actually need
- No control: The AI chooses which metrics matter
- Hard to extend: If you need to add to this later, you’re starting over
- Misaligned goals: The AI doesn’t know this feeds into a quarterly report
# Load sales_data.csv from the data/ folder
# Filter for Q4 2024 only (Oct-Dec)
# Group by product category
# Calculate: total sales, average order value, transaction count
# Sort by total sales descending
# Return as a DataFrame for further processing
import pandas as pd
# Load sales_data.csv from the data/ folder
df = pd.read_csv('data/sales_data.csv')
df['date'] = pd.to_datetime(df['date'])
# Filter for Q4 2024 only (Oct-Dec)
q4_mask = (df['date'] >= '2024-10-01') & (df['date'] < '2025-01-01')  # exclusive end keeps all of Dec 31
q4_sales = df[q4_mask]
# Group by product category
# Calculate: total sales, average order value, transaction count
summary = q4_sales.groupby('category').agg(
    total_sales=('sales', 'sum'),
    avg_order_value=('sales', 'mean'),
    transaction_count=('sales', 'count')
)
# Sort by total sales descending
summary = summary.sort_values('total_sales', ascending=False)
# Return as a DataFrame for further processing
print(summary)
Why this keeps you in control:
- You define the structure: Each step matches your requirements
- Specific metrics: Only calculates what you actually need
- Easy to extend: Clear structure makes modifications simple
- Aligned with goals: The output format matches your downstream needs
Prompting this way helps you stay focused on problem solving and analytical thinking, rather than getting stuck on syntax details. You’re the architect; the AI is the assistant.
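If the result really does feed a later step, one option is to ask for the same pseudocode steps wrapped in a function that returns the DataFrame. The sketch below is one possible shape, not a definitive implementation: the function name `summarize_q4_2024` and the sample rows are hypothetical, and the column names `date`, `category`, and `sales` are assumed from the example above.

```python
import pandas as pd


def summarize_q4_2024(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize Q4 2024 sales by category (hypothetical function name)."""
    df = df.copy()  # avoid modifying the caller's DataFrame
    df["date"] = pd.to_datetime(df["date"])

    # Filter for Q4 2024 only (Oct-Dec)
    q4 = df[(df["date"] >= "2024-10-01") & (df["date"] < "2025-01-01")]

    # Group by category and calculate the three requested metrics
    summary = q4.groupby("category").agg(
        total_sales=("sales", "sum"),
        avg_order_value=("sales", "mean"),
        transaction_count=("sales", "count"),
    )

    # Sort by total sales descending and return the DataFrame
    return summary.sort_values("total_sales", ascending=False)


# Made-up rows for illustration only
sample = pd.DataFrame({
    "date": ["2024-09-30", "2024-10-15", "2024-11-02", "2024-12-31"],
    "category": ["Toys", "Toys", "Books", "Toys"],
    "sales": [100.0, 200.0, 50.0, 300.0],
})
summary = summarize_q4_2024(sample)
print(summary)
```

Returning the DataFrame instead of printing it lets the next stage of a pipeline reuse the summary directly.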
🧪 Request Test Cases in Your Prompt
When asking for Python code, include test cases or expected outputs in your prompt. This gives you a concrete way to check that the generated code works, rather than accepting code that merely looks correct.
Why this is incomplete:
- No validation: You can’t verify if it works correctly
- Missing edge cases: What happens with zero values?
- No examples: Unclear what the expected behavior should be
- Untestable: You have no way to confirm correctness
- calculate_percentage_change(50, 50) → 0.0
- calculate_percentage_change(0, 100) → should handle gracefully
Why this works better:
- Clear expectations: Specific examples show desired behavior
- Edge case coverage: Includes zero values and negative changes
- Verifiable output: You can immediately test if it works
- Self-documenting: Test cases explain the function’s purpose
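A response to a prompt with test cases might look like the sketch below. Two things here are assumptions rather than part of the original prompt: the argument order (old value first, new value second) and the choice to return `None` when the old value is zero as one way to "handle gracefully". The last two checks are extra illustrative cases.

```python
from typing import Optional


def calculate_percentage_change(old_value: float, new_value: float) -> Optional[float]:
    """Return the percentage change from old_value to new_value.

    Returns None when old_value is 0, because the change is undefined
    (an assumed interpretation of "handle gracefully").
    """
    if old_value == 0:
        return None
    return (new_value - old_value) / old_value * 100


# Test cases taken from the prompt
assert calculate_percentage_change(50, 50) == 0.0
assert calculate_percentage_change(0, 100) is None

# Extra illustrative cases (not from the prompt): an increase and a decrease
assert calculate_percentage_change(100, 150) == 50.0
assert calculate_percentage_change(200, 100) == -50.0
```

Because the test cases run as plain assertions, a silent run is your confirmation that the function matches the behavior you asked for.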
Pro tip: Ask for test cases even for data analysis tasks. For example: “Load sales.csv and calculate monthly totals. The result should show 12 months with January having the highest sales.”
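The expectations in that example prompt can be turned into checks. The sketch below uses made-up in-memory data instead of loading sales.csv, and it assumes columns named `date` and `sales`; with real data you would replace the DataFrame construction with `pd.read_csv`.

```python
import pandas as pd

# Made-up data for illustration: one sale per month in 2024, January highest
dates = pd.date_range("2024-01-01", periods=12, freq="MS")
sales = pd.DataFrame({"date": dates, "sales": [900] + [100] * 11})

# Calculate monthly totals, indexed by month number (1 = January)
monthly_totals = sales.groupby(sales["date"].dt.month)["sales"].sum()

# Checks derived from the expectations stated in the prompt
assert len(monthly_totals) == 12
assert monthly_totals.idxmax() == 1  # month 1 is January
```

If either assertion fails on your real data, that is a signal to inspect the loading or grouping step before trusting the numbers.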