System prompts and conversation context are fundamental to controlling AI behavior and building interactive applications.
📓 Continue in `lesson1-litellm-exercise.ipynb` - Reference: `lesson1-litellm-solutions.ipynb`
## System Prompts: Setting the AI's Behavior
System prompts are a fundamental prompt engineering technique. They define how the AI should behave - think of them as giving the AI a role, personality, or set of instructions:

```python
import litellm

messages = [
    {"role": "system", "content": "You are a pirate captain. Speak like a pirate in every response."},
    {"role": "user", "content": "What's the weather like today?"}
]

response = litellm.completion(
    model="gemini/gemini-2.5-flash",
    messages=messages
)
```

Expected output: "Arrr, the weather be fair today, matey! Clear skies and a gentle breeze blowin' across the seven seas!"
### Message Roles Explained
| Role | Purpose | Visibility |
|------|---------|------------|
| `system` | Sets behavior, personality, constraints | Backend only (not shown to the user in chat apps) |
| `user` | Represents the user's input | Visible to the user |
| `assistant` | Represents the AI's response | Visible to the user |
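All three roles can appear in a single request. As a quick illustration (the conversation content here is invented for this example), earlier `assistant` turns are simply replayed to the model as history:

```python
messages = [
    {"role": "system", "content": "You are a concise math tutor."},  # behavior
    {"role": "user", "content": "What is 7 * 8?"},                   # user input
    {"role": "assistant", "content": "7 * 8 = 56."},                 # a prior AI reply
    {"role": "user", "content": "And 7 * 80?"}                       # new user input
]

response = litellm.completion(model="gemini/gemini-2.5-flash", messages=messages)
```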
### Best Practices for System Prompts
✅ **Be specific and clear**

```python
# Vague
"You are helpful."

# Better
"You are a Python tutor. Explain concepts simply, provide code examples, and ask follow-up questions to ensure understanding."
```

✅ **Set expectations for tone and format**

```python
system_prompt = """
You are a professional technical writer.
- Use clear, concise language
- Format responses in markdown
- Include code examples when relevant
- Maintain a formal, educational tone
"""
```

✅ **Include constraints**

```python
system_prompt = """
You are a customer service chatbot for an online bookstore.
- Only answer questions about books, orders, and shipping
- Politely decline requests outside this scope
- Never share customer personal information
- Always offer to escalate to a human agent for complex issues
"""
```
## Context: Building Conversations

LLMs are stateless - they don't remember previous messages unless you include them in the request. To build a conversation, include the full message history:

```python
messages = [
    {"role": "user", "content": "My name is Alice and I love pizza."},
    {"role": "assistant", "content": "Nice to meet you, Alice! Pizza is delicious. What's your favorite topping?"},
    {"role": "user", "content": "Pepperoni! What's my name again?"}
]

response = litellm.completion(
    model="gemini/gemini-2.5-flash",
    messages=messages
)
# Output: "Your name is Alice."
```

### How Context Works
- You send the entire conversation history in the messages array
- The model sees all previous messages and generates a contextual response
- To continue the conversation, add the assistant's response to your messages list:

```python
# Get response
response = litellm.completion(model="gemini/gemini-2.5-flash", messages=messages)
assistant_message = response.choices[0].message.content
# Add it to history for next turn
messages.append({"role": "assistant", "content": assistant_message})
# Now ready for the next user message
messages.append({"role": "user", "content": "Tell me more!"})
```
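Putting those steps together, a complete chat loop might look like the following sketch (the loop structure and prompts are illustrative, not prescribed by the lesson):

```python
# Minimal interactive chat loop (illustrative sketch; assumes litellm is
# imported and API credentials are configured as in earlier lessons)
messages = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("You: ")
    if user_input.lower() in ("quit", "exit"):
        break

    messages.append({"role": "user", "content": user_input})
    response = litellm.completion(model="gemini/gemini-2.5-flash", messages=messages)
    assistant_message = response.choices[0].message.content
    print(f"AI: {assistant_message}")

    # Append the reply so the next turn carries the full history
    messages.append({"role": "assistant", "content": assistant_message})
```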
### Important Context Considerations

⚠️ **Context costs tokens**: Every message in your history counts toward the token limit and costs money
- More messages = more tokens = higher cost
- Consider summarizing or truncating old messages in long conversations - measure usage first, as shown below
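To see how large your history has grown, litellm provides a `token_counter` utility (counts depend on the model's tokenizer, so treat the result as an estimate):

```python
from litellm import token_counter

# Estimate how many tokens the current history will consume on the next request
num_tokens = token_counter(model="gemini/gemini-2.5-flash", messages=messages)
print(f"Conversation history: {num_tokens} tokens")
```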
⚠️ **Context limits**: Models have maximum context windows

- GPT-4: 8K or 32K tokens; GPT-4 Turbo: 128K tokens
- Gemini 2.5 Flash: 1M tokens
- Claude 3: 200K tokens
💡 **Strategy for long conversations**: Keep the system prompt + last N messages, or periodically summarize the earlier conversation - see the sketch below
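A minimal sketch of the "system prompt + last N messages" approach (the window size of 10 is an arbitrary choice; tune it to your token budget):

```python
def truncate_history(messages, max_turns=10):
    """Keep the leading system message (if any) plus the most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    recent = [m for m in messages if m["role"] != "system"][-max_turns:]
    return system + recent

messages = truncate_history(messages)
```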
## Few-Shot Learning with Context
You can teach the model specific behaviors by providing example exchanges. This is called few-shot learning (also known as in-context learning):

```python
messages = [
    {"role": "system", "content": "You translate sentences into emoji stories."},
    # Example 1
    {"role": "user", "content": "I went to the beach yesterday."},
    {"role": "assistant", "content": "👤🚗🏖️☀️🌊"},
    # Example 2
    {"role": "user", "content": "I love eating pizza."},
    {"role": "assistant", "content": "👤❤️🍕😋"},
    # New request
    {"role": "user", "content": "I adopted a puppy today."}
]

response = litellm.completion(model="gemini/gemini-2.5-flash", messages=messages)
# Expected: Something like "👤🏠🐕🎉"
```

Showing the model examples of the pattern you want is often more effective than detailed instructions!
## Combining Prompt Engineering Techniques
For best results, combine system prompts (instructions) with few-shot learning (examples):

```python
messages = [
    {"role": "system", "content": "You are a Socratic tutor. Never give direct answers. Instead, ask questions that guide students to discover the answer themselves."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "Great question! What happens when you have 2 objects, and someone gives you 2 more? Can you count them?"},
    {"role": "user", "content": "What is the capital of France?"}
]
```
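As in the earlier examples, send the history with `litellm.completion`. Given the system prompt and the one example turn, the model should answer the France question with a guiding question rather than the direct answer (the exact wording will vary):

```python
response = litellm.completion(model="gemini/gemini-2.5-flash", messages=messages)
print(response.choices[0].message.content)
# Expected: a guiding question, e.g. asking which famous French city
# the student already knows (wording will vary)
```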
## Key Takeaways

✅ System prompts are a prompt engineering technique for defining AI behavior
✅ Use `{"role": "system"}` to set instructions
✅ Models are stateless - include full conversation history for context
✅ Few-shot learning (in-context learning) teaches patterns through examples
✅ Combine system prompts and few-shot examples for best results
✅ Be mindful of token costs and context limits
## What's Next?
In the next lesson, you’ll learn how to manage costs and control computational resources when using LLM APIs.
Previous: Lesson 1.3 - Temperature Control
Next: Lesson 1.5 - Controlling Costs and Compute
