Building Better Prompts
This section covers what goes into your prompt—the content and structure of your request. Getting this right is the foundation of effective AI-assisted coding.
🎯 Be Detailed, Specific, and Avoid Ambiguity
The clearer your request, the better the Python code you’ll receive. Avoid vague instructions and spell out exactly what you want.
Include these key elements in your prompts:
- Goals - What specific outcome do you want? (e.g., “create a bar chart showing monthly sales trends”)
- Constraints - Any limitations or requirements? (e.g., “use only data from 2023”, “exclude missing values”)
- Required libraries - Specify which tools to use.
- Approach - How should the problem be tackled?
- Data structure - Many AI models can’t access CSV files, so pasting the output of df.info() or df.head() gives your AI important context (see the snippet at the end of this subsection).
# ❌ Code you might get from a vague prompt
import matplotlib.pyplot as plt
plt.bar([1,2,3], [4,5,6])
plt.show()

Why this is bad:
- Ambiguous: Doesn’t specify which data file or what type of chart.
- Incomplete: Missing data loading, no column names specified.
- No Labels: Chart has no context, axis labels, or title.
- No Library Specified: Doesn’t mention Pandas or Matplotlib.
# ✅ Code you might get from a detailed, specific prompt
import pandas as pd
import matplotlib.pyplot as plt
# Load the data
df = pd.read_csv('data/sales_data.csv')
# Create bar chart
plt.figure(figsize=(8, 5))
plt.bar(df['Month'], df['Sales'], color='steelblue', label='Sales')
plt.xlabel('Month')
plt.ylabel('Sales ($)')
plt.title('Monthly Sales Report')
plt.legend()
plt.show()

Why this is good:
- Specific: Clearly defines the data file, chart type, and columns to use.
- Complete: Includes data loading, proper column references, and visualization.
- Well-documented: Chart has proper labels, title, and styling.
- Libraries Specified: Explicitly mentions Pandas and Matplotlib.
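One reliable way to provide that data-structure context is to inspect the file yourself and paste the output into your prompt. A minimal sketch, reusing the same hypothetical data/sales_data.csv file from above:

# Capture data context to paste into your prompt
import pandas as pd

df = pd.read_csv('data/sales_data.csv')  # hypothetical file path
df.info()          # prints column names, dtypes, and non-null counts
print(df.head())   # shows the first five rows as a concrete sample

Copying both outputs into your prompt lets the AI write code against your real column names instead of guessing them.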
📚 Specify the Libraries You Need
Python has multiple data processing and visualization libraries. LLMs might default to JavaScript, use outdated libraries, or pick something you don’t have installed. Always be explicit about which tools you want to use.
A vague request like “Make a chart of my sales data” leaves everything open. Why this is problematic:
- No language specified: AI might respond with JavaScript, R, or Python
- No library specified: Could use Matplotlib, Plotly, Seaborn, or something else
- Version ambiguity: Might use deprecated methods from older library versions
- Compatibility issues: You might not have the suggested library installed
A specific request like “Using Python with Pandas and Matplotlib, load sales.csv and plot total sales per month as a bar chart” removes the guesswork. Why this works:
- Language explicit: Python, not JavaScript or R
- Libraries named: Pandas for data, Matplotlib for visualization
- Predictable output: You know exactly what tools the code will use
- Easy to run: You can verify you have the right libraries installed
If you’re using a specific library version, mention it! For example: “I’m using Pandas 2.0. Please use pd.concat() instead of .append(), which was removed in 2.0.”
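A quick way to find the exact versions to mention, assuming Pandas and Matplotlib are already installed:

# Check installed versions so you can state them in your prompt
import pandas as pd
import matplotlib

print("Pandas:", pd.__version__)
print("Matplotlib:", matplotlib.__version__)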
🧩 Decompose the Problem
Large tasks can overwhelm an LLM. Instead of asking it to build your entire analysis in one go, break the problem into smaller steps that you can test individually.
A single monolithic prompt like “Load my sales data, clean it, analyze monthly trends, and create visualizations” asks for too much at once. Why this often fails:
- Too many steps: AI tries to do everything at once and makes mistakes
- Hard to debug: If something breaks, you don’t know which part failed
- Incomplete output: LLMs may skip steps or use placeholder code like “# … rest of logic here”
- No validation: You can’t verify each piece works before moving on
Instead, break the task into a sequence of focused prompts:

Prompt 1: “Load ‘sales.csv’ using Pandas and display the first 5 rows and column types.”
Prompt 2: “Clean the data: remove rows with missing values in ‘sales’ column and convert ‘date’ to datetime.”
Prompt 3: “Group by month and calculate total sales. Show the result.”
Prompt 4: “Create a line chart of monthly sales using Matplotlib.”

Why this works better:
- Testable steps: Run and verify each piece before moving on
- Easy to debug: If step 3 fails, you know exactly where the problem is
- Build incrementally: Each prompt builds on confirmed working code
- Better results: Smaller, focused requests get more accurate responses
Think of it like building with LEGO bricks 🧱. Each step is a brick. Once the bricks are tested and solid, you can combine them into a larger analysis.
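Combined, the four prompts above might produce something like the following sketch (an assumption: sales.csv contains ‘date’ and ‘sales’ columns, as the prompts imply):

# Step 1: load the data
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('sales.csv')

# Step 2: clean the data (drop rows missing 'sales', parse 'date')
df = df.dropna(subset=['sales'])
df['date'] = pd.to_datetime(df['date'])

# Step 3: group by month and total the sales
monthly = df.groupby(df['date'].dt.to_period('M'))['sales'].sum()

# Step 4: plot monthly sales as a line chart
ax = monthly.plot(kind='line', title='Monthly Sales')
ax.set_xlabel('Month')
ax.set_ylabel('Total Sales')
plt.show()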
Decomposing problems is important because LLMs have context and output token limits, which cap how much they can process and generate at once. If your prompt is too large or your request too complex, the LLM may hallucinate incorrect or unfinished code.
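Because of those limits, it can help to measure roughly how large a prompt is before sending it. A minimal sketch using the tiktoken library (assuming it is installed; its encodings approximate OpenAI tokenizers, and other providers count tokens differently):

# Rough token count for a prompt (install first: pip install tiktoken)
import tiktoken

prompt = "Load 'sales.csv' using Pandas and display the first 5 rows and column types."
encoding = tiktoken.get_encoding("cl100k_base")  # encoding used by many OpenAI models
print(len(encoding.encode(prompt)), "tokens")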
Decomposing is also valuable because it deepens your understanding of the problem and gives you time to reflect on how the separate functions should work together.
🛠️ What Is Prompt Engineering?
Prompt engineering is the practice of designing and refining prompts to get the best possible output from an AI. It’s not just about what you ask, but how you ask it.

The 5 Steps of Prompt Engineering:
1. Prompt: The instructions or questions you give to the AI.
2. Context: The data, background, or previous conversation that helps the AI understand your goal.
3. Tokens: Be mindful of length; LLMs have limits on input and output tokens.
4. Fine-tuning: Adjusting the AI model (or using a specialized variant) to better handle your type of tasks.
5. Priming: Setting the AI’s expectations with examples or instructions before asking your main task.
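Priming, for example, can be as simple as putting a role and context in front of your actual request. A hypothetical sketch of assembling such a prompt in Python (the wording is illustrative, not a required template):

# Build a primed prompt: role first, then context, then the task
role = "You are a senior Python data analyst. Reply with runnable code only."
context = "My DataFrame has columns 'date' (string) and 'sales' (float)."
task = "Convert 'date' to datetime and plot total sales per month with Matplotlib."

prompt = "\n\n".join([role, context, task])
print(prompt)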
By understanding these steps, you can craft more effective prompts, reduce ambiguity, and improve the accuracy of the Python code and analyses generated by the AI.