Design a prompt engineering playground similar to ChatGPT Playground or Anthropic Console. This is a product-focused system design question from a client engineer's perspective, emphasizing UX flows, client state management, streaming, performance, and how the frontend collaborates with backend services.
What is Prompt Engineering?
Prompt engineering is the process of iterating on input prompts to make a model better at performing specific tasks. Unlike conversational chat, playgrounds focus on stateless runs: each run is independent, but a single run can include multi-message context (system + user + assistant examples).
Example Workflow
A user wants to generate a recipe. They might:
Start simple: "Write me a recipe"
Iterate to improve: "You are a great chef who has written many cookbooks. Write me a recipe for..."
Add examples: Include sample good/bad recipes for the model to learn from
Keep refining until they get the desired output
Save the successful prompt for future use
Disclaimer: This is a sample solution to help you get started. To better prepare for the interview, you should think through the question yourself and try to come up with your own solution. System design questions are open-ended and have multiple valid approaches.
Phase 1: Requirements (~5-7 minutes)
Functional Requirements
Users should be able to:
Create and edit prompts with system instructions, user messages, and assistant examples
Configure model parameters (model version, temperature, max tokens, top-p, stop sequences)
Execute prompts and see streaming responses in real-time
Save and organize prompts into projects/folders for reuse
View execution history to compare outputs across different configurations
Share prompts with team members (for enterprise users)
For a 45-minute interview, focus on 3-5 core flows: prompt editing, execution with streaming, and saving/organizing prompts. Mention sharing and history as stretch goals.
Product Scope
Clarify the boundaries:
MVP Focus: Web-based prompt editor with real-time execution
Platforms: Desktop-first web app (power users prefer larger screens for prompt iteration)
User Types: Individual developers and enterprise teams
Integrations: API key management, usage tracking, billing integration
Non-Functional Requirements (Client-Led)
Latency: First token p50 < 1s, p95 < 3s (LLM dependent), show progress if slower
Streaming: Responses must stream token-by-token (not wait for complete response)
Reliability: Executions should gracefully handle timeouts, disconnects, and model errors
Autosave: Prompt changes persist automatically (no lost work)
Concurrent editing: Enterprise users may need collaborative editing
Privacy: Users control history retention; clear UI on what is stored
Capacity: Quick Sanity Check
For a platform like Anthropic Console, we might have 10K daily active users, each running 20-50 prompt iterations per session. During peak hours (business hours across time zones), we could see 1000+ concurrent executions. The LLM backend is the bottleneck, not our system.
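As a sanity check on the 1000+ figure, Little's law (average concurrency = arrival rate × average time in system) links daily users, runs, and run duration. The sketch below assumes several inputs that do not come from the estimate above:

- one session per user per day, at the upper end of 50 runs
- a 20-second average run, from request to last streamed token
- a peak hour ten times busier than the daily average

```typescript
// Back-of-envelope concurrency estimate via Little's law: L = λ × W.
interface CapacityInputs {
  dailyActiveUsers: number;
  runsPerUserPerDay: number; // assumption: one session per user per day
  avgRunSeconds: number; // assumption: request to last streamed token
  peakToAverage: number; // assumption: peak-hour load relative to the daily average
}

function peakConcurrentExecutions(c: CapacityInputs): number {
  const runsPerDay = c.dailyActiveUsers * c.runsPerUserPerDay;
  const avgRunsPerSecond = runsPerDay / 86_400; // λ, averaged over a full day
  const avgConcurrent = avgRunsPerSecond * c.avgRunSeconds; // L = λ × W
  return avgConcurrent * c.peakToAverage;
}

const estimate = peakConcurrentExecutions({
  dailyActiveUsers: 10_000,
  runsPerUserPerDay: 50,
  avgRunSeconds: 20,
  peakToAverage: 10,
});
console.log(Math.round(estimate)); // prints 1157
```

Under these assumptions the estimate is consistent with 1000+. Halving either the run time or the peak factor halves it. Each concurrent execution is mostly an open stream waiting on the model, which is why the LLM backend, not the playground's own servers, sets the limit.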
{
  "prompt": {
    "system_prompt": "You are a helpful recipe chef...",
    "messages": [
      { "role": "user", "content": "Write a recipe for..." },
      { "role": "assistant", "content": "Here's a recipe..." },
      { "role": "user", "content": "{{user_input}}" }
    ],
    "model_config": {
      "model": "claude-3-5-sonnet-20241022",
      "temperature": 0.7,
      "max_tokens": 1024,
      "top_p": 1.0,
      "stop_sequences": []
    },
    "variables": ["user_input"]
  }
}
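The prompt object above stores `{{user_input}}` as a placeholder plus a `variables` list, so something has to detect placeholders and fill them in. The sketch below shows one way to do it. The type names and helper functions are hypothetical. So is the exact placeholder syntax allowed (letters, digits, underscores, optional spaces inside the braces). The same helpers could run server-side at execute time or client-side for a live preview.

```typescript
// Types mirroring the prompt object above (names are illustrative).
type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

interface ModelConfig {
  model: string;
  temperature: number;
  max_tokens: number;
  top_p: number;
  stop_sequences: string[];
}

interface Prompt {
  system_prompt: string;
  messages: Message[];
  model_config: ModelConfig;
  variables: string[];
}

// Matches {{name}}, also allowing spaces inside the braces: {{ name }}.
const PLACEHOLDER = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;

// Collect placeholder names in order of first appearance, e.g. to keep `variables` in sync.
function findVariables(texts: string[]): string[] {
  const names = new Set<string>();
  for (const text of texts) {
    for (const match of text.matchAll(PLACEHOLDER)) names.add(match[1]);
  }
  return Array.from(names);
}

// Replace each {{name}} with its value; a missing value is an error, not a silent blank.
function renderTemplate(text: string, values: Record<string, string>): string {
  return text.replace(PLACEHOLDER, (_match, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      throw new Error(`missing value for variable "${name}"`);
    }
    return values[name];
  });
}

// Render the system prompt and every message with the same variable values.
function renderPrompt(prompt: Prompt, values: Record<string, string>): { system: string; messages: Message[] } {
  return {
    system: renderTemplate(prompt.system_prompt, values),
    messages: prompt.messages.map((m) => ({ role: m.role, content: renderTemplate(m.content, values) })),
  };
}
```

Failing loudly on a missing variable matters in a playground. Otherwise a run silently sends an empty slot to the model, and the user ends up debugging the output instead of the form.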
Client-First Thinking: What Data Does Each Screen Need?
Project List: Project names, prompt counts, last modified
Prompt Editor: Full prompt content, model config, variable definitions
Execution Panel: Streaming response text, token counts, latency
History View: Past executions with inputs, outputs, timestamps, configs
Settings: API keys, usage stats, billing info
The prompt editor screen needs everything in one load—users shouldn't wait for multiple requests while iterating. Design your API to return the complete prompt state in a single call.
Phase 3: API Design (~15-20 minutes)
This is the core of the interview. Design APIs that feel intuitive to developers using the playground.
# List executions for a prompt
GET /prompts/:prompt_id/executions?cursor=...&limit=20
Response: {
"executions": [
{ "id": "...", "status": "completed", "input_tokens": 150, "output_tokens": 523, "created_at": "..." }
],
"next_cursor": "..."
}
# Compare two executions
GET /executions/compare?ids=exec_1,exec_2
Response: {
"executions": [
{ "id": "exec_1", "config": {...}, "response_text": "..." },
{ "id": "exec_2", "config": {...}, "response_text": "..." }
]
}
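The main question in the compare view is which setting changed between two runs. A small client-side diff over the two `config` objects answers it, so the UI can label each column. The sketch below is hypothetical. It assumes each `config` in the compare response has the same shape as `model_config` in the prompt object, which the response above leaves elided.

```typescript
// Assumed shape of each execution's "config" (mirrors model_config).
interface RunConfig {
  model: string;
  temperature: number;
  max_tokens: number;
  top_p: number;
  stop_sequences: string[];
}

interface ConfigDiff {
  field: keyof RunConfig;
  left: unknown;
  right: unknown;
}

// List the config fields that differ between two runs, e.g. "temperature: 0.7 vs 0.9".
function diffConfigs(left: RunConfig, right: RunConfig): ConfigDiff[] {
  const fields: Array<keyof RunConfig> = ["model", "temperature", "max_tokens", "top_p", "stop_sequences"];
  const diffs: ConfigDiff[] = [];
  for (const field of fields) {
    // JSON.stringify compares arrays such as stop_sequences by value, not by reference.
    if (JSON.stringify(left[field]) !== JSON.stringify(right[field])) {
      diffs.push({ field, left: left[field], right: right[field] });
    }
  }
  return diffs;
}
```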
Sharing & Collaboration
# Share a prompt
POST /prompts/:prompt_id/shares
Request: { "email": "colleague@company.com", "permission": "edit" }
Response: { "share": { "id": "...", "user": {...}, "permission": "edit" } }
# List shared prompts
GET /prompts/shared-with-me
Response: { "prompts": [...] }
# Fork a shared prompt (create your own copy)
POST /prompts/:prompt_id/fork
Request: { "project_id": "my_project_id" }
Response: { "prompt": { "id": "new_prompt_id", ... } }
Key API Design Decisions
1. Streaming via SSE (Server-Sent Events)
SSE is simpler than WebSockets for unidirectional streaming
Works with standard HTTP infrastructure (load balancers, CDNs)
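Native EventSource only issues GET requests and cannot set custom headers such as Authorization. For POST /prompts/:id/execute, the client therefore uses fetch, reads the response body with a stream reader, and parses the SSE frames itself.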
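Because execution is a POST, reading the stream means combining fetch with a small SSE parser. The sketch below is a minimal version of that. Several details are assumptions for illustration, not part of the API defined here:

- the Bearer auth header
- the `{ variables }` request body
- whatever event names the server emits (the parser simply passes them through)

The parser follows the SSE event-stream rules for `data:`, `event:`, and comment lines. It skips `id`/`retry` and bare-`\r` line endings.

```typescript
interface SseEvent {
  event: string; // "message" unless the server sent an `event:` line
  data: string;
}

// Incremental SSE parser: feed decoded text in arbitrary chunks, get complete events back.
// Handles \n and \r\n line endings; bare \r endings (also legal SSE) are not handled here.
class SseParser {
  private buffer = "";
  private eventType = "";
  private dataLines: string[] = [];

  push(chunk: string): SseEvent[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // the last piece may be an incomplete line
    const events: SseEvent[] = [];
    for (const raw of lines) {
      const line = raw.endsWith("\r") ? raw.slice(0, -1) : raw;
      if (line === "") {
        // A blank line ends the current event.
        if (this.dataLines.length > 0) {
          events.push({ event: this.eventType || "message", data: this.dataLines.join("\n") });
        }
        this.eventType = "";
        this.dataLines = [];
      } else if (!line.startsWith(":")) {
        // Lines starting with ":" are comments (often keep-alives) and are skipped.
        const colon = line.indexOf(":");
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? "" : line.slice(colon + 1);
        if (value.startsWith(" ")) value = value.slice(1);
        if (field === "data") this.dataLines.push(value);
        else if (field === "event") this.eventType = value;
        // `id` and `retry` fields are ignored in this sketch.
      }
    }
    return events;
  }
}

// POST the run and stream the SSE response body (auth scheme and body shape are assumptions).
async function runPrompt(
  promptId: string,
  variables: Record<string, string>,
  apiKey: string,
  onEvent: (e: SseEvent) => void,
  signal?: AbortSignal,
): Promise<void> {
  const res = await fetch(`/prompts/${encodeURIComponent(promptId)}/execute`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "text/event-stream",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ variables }),
    signal,
  });
  if (!res.ok || res.body === null) throw new Error(`execute failed: HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  const parser = new SseParser();
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    // stream: true keeps multi-byte UTF-8 characters that straddle chunks intact.
    parser.push(decoder.decode(result.value, { stream: true })).forEach(onEvent);
  }
  parser.push(decoder.decode()).forEach(onEvent); // flush any bytes the decoder held back
}
```

Passing an AbortController's `signal` gives the UI a Stop button. Aborting closes the connection, and the pending `read()` rejects.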
2. Execution is Tied to Prompt, Not Ad-Hoc
POST /prompts/:id/execute instead of POST /execute with full prompt
Benefits: Automatic history tracking, easier analytics, prompt versioning
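Trade-off: Requires saving the prompt first. Autosave handles this, but because autosave is debounced, the client must flush any pending PATCH before calling execute so the run uses the latest edits.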
3. Partial Updates with PATCH
Users constantly tweak prompts—don't resend entire prompt on each keystroke
Send only changed fields, nested as in the prompt object: { "model_config": { "temperature": 0.9 } }
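Partial updates pair naturally with debouncing on the client. Edits accumulate into one merged patch, and a PATCH is sent once the user pauses. `flush()` sends whatever is pending immediately, which is what the Run button should await before calling execute. This is a minimal sketch under stated assumptions:

- The 800 ms delay and the injected `send` function (which would wrap PATCH /prompts/:id) are hypothetical.
- The merge rule is similar in spirit to JSON Merge Patch (RFC 7396), minus its null-means-delete rule.

```typescript
type Patch = { [key: string]: unknown };

function isPlainObject(v: unknown): v is Patch {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Merge a newer edit into the pending patch: nested objects such as model_config merge
// key by key; arrays (messages, stop_sequences) and scalars are replaced wholesale.
function mergePatch(base: Patch, edit: Patch): Patch {
  const out: Patch = { ...base };
  for (const [key, value] of Object.entries(edit)) {
    const prev = out[key];
    out[key] = isPlainObject(prev) && isPlainObject(value) ? mergePatch(prev, value) : value;
  }
  return out;
}

class DebouncedAutosave {
  private pending: Patch = {};
  private timer: ReturnType<typeof setTimeout> | undefined;
  private inFlight: Promise<void> = Promise.resolve();
  private readonly send: (patch: Patch) => Promise<void>; // e.g. wraps PATCH /prompts/:id
  private readonly delayMs: number; // assumed debounce window

  constructor(send: (patch: Patch) => Promise<void>, delayMs = 800) {
    this.send = send;
    this.delayMs = delayMs;
  }

  // Record an edit; the PATCH goes out once the user pauses for delayMs.
  edit(change: Patch): void {
    this.pending = mergePatch(this.pending, change);
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => {
      this.flush().catch(() => {
        /* unsent fields stay pending; the next edit or flush retries them */
      });
    }, this.delayMs);
  }

  // Save everything edited so far; await this before POST /prompts/:id/execute.
  flush(): Promise<void> {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    // Chain after the previous send so PATCHes arrive in order and any earlier
    // in-flight save has finished by the time this promise resolves.
    this.inFlight = this.inFlight.catch(() => undefined).then(() => this.sendPending());
    return this.inFlight;
  }

  private async sendPending(): Promise<void> {
    if (Object.keys(this.pending).length === 0) return;
    const patch = this.pending;
    this.pending = {};
    try {
      await this.send(patch);
    } catch (err) {
      this.pending = mergePatch(patch, this.pending); // newer edits win over the failed batch
      throw err;
    }
  }
}
```

Chaining sends matters for the Run flow. A timer-triggered PATCH may still be in flight when the user clicks Run, and `flush()` should not resolve until that earlier save has landed.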
4. Cursor-Based Pagination
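For execution history, a cursor that encodes the last item's (created_at, id) works well
Handles concurrent writes better than offset pagination: new runs inserted at the head of the list shift offsets, causing duplicated or skipped items, while a cursor keeps its position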
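Keyset pagination makes the cursor argument concrete. The server returns items strictly older than the last item the client has seen, ordered newest first, with `id` breaking ties between runs created in the same instant. The sketch below is in-memory for illustration. The cursor encoding (URI-encoded JSON) and the helper names are hypothetical. A database version would push the same comparison into the query.

```typescript
interface ExecutionSummary {
  id: string;
  status: string;
  created_at: string; // ISO-8601 UTC, so string order matches time order
}

interface Page {
  executions: ExecutionSummary[];
  next_cursor: string | null;
}

type Key = Pick<ExecutionSummary, "created_at" | "id">;

// Order by created_at, then id to break ties between runs created in the same instant.
function compareKey(a: Key, b: Key): number {
  if (a.created_at !== b.created_at) return a.created_at < b.created_at ? -1 : 1;
  if (a.id !== b.id) return a.id < b.id ? -1 : 1;
  return 0;
}

// Clients treat the cursor as opaque and just echo next_cursor back.
function encodeCursor(k: Key): string {
  return encodeURIComponent(JSON.stringify([k.created_at, k.id]));
}

function decodeCursor(cursor: string): Key {
  const [created_at, id] = JSON.parse(decodeURIComponent(cursor)) as [string, string];
  return { created_at, id };
}

// In PostgreSQL, which supports row-value comparison, this is roughly:
//   WHERE (created_at, id) < ($1, $2) ORDER BY created_at DESC, id DESC LIMIT $3
function listExecutions(all: ExecutionSummary[], limit: number, cursor?: string): Page {
  const newestFirst = [...all].sort((a, b) => compareKey(b, a));
  const after = cursor === undefined ? null : decodeCursor(cursor);
  const remaining = after === null ? newestFirst : newestFirst.filter((e) => compareKey(e, after) < 0);
  const page = remaining.slice(0, limit);
  const last = page[page.length - 1];
  return {
    executions: page,
    next_cursor: remaining.length > limit && last !== undefined ? encodeCursor(last) : null,
  };
}
```

Consider exec_1 to exec_5 paged two at a time, with exec_6 created after page 1 loads. With offset pagination, page 2 (offset 2) would return exec_4 and exec_3, showing exec_4 twice. The cursor version returns exec_3 and exec_2.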
5. Idempotency for Executions
Optional Idempotency-Key header for POST /prompts/:id/execute prevents duplicate runs on network retry
Less critical than payments, but useful for expensive model calls
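On the client, the rule is one key per Run click, reused on every retry of that click (for example, a UUID generated when the button is pressed). On the server, a repeated key returns the run already started instead of paying for a second model call. The server-side sketch below is hypothetical. It keeps keys in an in-memory map, whereas a real service would use a shared store with a TTL. Rejecting a reused key whose request body differs is a common convention, not something the API above specifies.

```typescript
// Hypothetical in-memory idempotency layer in front of execution.
class IdempotentExecutor {
  private readonly byKey = new Map<string, { fingerprint: string; executionId: string }>();
  private nextId = 1;
  private calls = 0;

  get modelCalls(): number {
    return this.calls;
  }

  // Returns the execution id for this request, starting a new run only when needed.
  execute(idempotencyKey: string | undefined, promptId: string, variables: Record<string, string>): string {
    // Identifies the request body; key order in `variables` matters in this sketch.
    const fingerprint = JSON.stringify([promptId, variables]);
    if (idempotencyKey !== undefined) {
      const seen = this.byKey.get(idempotencyKey);
      if (seen !== undefined) {
        // Same key, different request: almost certainly a client bug, so reject it.
        if (seen.fingerprint !== fingerprint) throw new Error("Idempotency-Key reused for a different request");
        return seen.executionId; // a network retry: hand back the existing run
      }
    }
    const executionId = `exec_${this.nextId++}`;
    this.calls++; // stand-in for starting the expensive model call
    if (idempotencyKey !== undefined) this.byKey.set(idempotencyKey, { fingerprint, executionId });
    return executionId;
  }
}
```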
Error Handling
{
  "error": {
    "code": "CONTEXT_LENGTH_EXCEEDED",
    "message": "Prompt exceeds maximum context length of 200k tokens",
    "details": { "prompt_tokens": 215000, "max_context_tokens": 200000 }
  }
}
Common error codes:
RATE_LIMITED: Too many requests; the response includes a Retry-After header
CONTEXT_LENGTH_EXCEEDED: Prompt too long
MODEL_UNAVAILABLE: Model temporarily unavailable
INVALID_API_KEY: Authentication failed
QUOTA_EXCEEDED: Monthly usage limit hit
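On the client, these codes split into two groups. Transient failures can be retried automatically; the rest need the user to do something. The sketch below maps each code to a UI action. The action names are hypothetical, as are the backoff constants (500 ms doubling, capped at 8 s).

```typescript
type ErrorCode =
  | "RATE_LIMITED"
  | "CONTEXT_LENGTH_EXCEEDED"
  | "MODEL_UNAVAILABLE"
  | "INVALID_API_KEY"
  | "QUOTA_EXCEEDED";

// What the playground should do next (hypothetical action names).
type ClientAction =
  | { kind: "retry"; delayMs: number } // transient: try again automatically
  | { kind: "edit_prompt" } // user must shorten the prompt
  | { kind: "update_api_key" } // send the user to API key settings
  | { kind: "show_usage" }; // quota hit: link to usage and billing

// Exponential backoff: 500 ms, 1 s, 2 s, ... capped at 8 s. Production clients usually add random jitter.
function backoffMs(attempt: number, baseMs = 500, capMs = 8_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

function nextAction(code: ErrorCode, attempt: number, retryAfterSeconds?: number): ClientAction {
  switch (code) {
    case "RATE_LIMITED":
      // Prefer the server's Retry-After hint; fall back to backoff if it is absent.
      return { kind: "retry", delayMs: retryAfterSeconds !== undefined ? retryAfterSeconds * 1_000 : backoffMs(attempt) };
    case "MODEL_UNAVAILABLE":
      return { kind: "retry", delayMs: backoffMs(attempt) };
    case "CONTEXT_LENGTH_EXCEEDED":
      return { kind: "edit_prompt" };
    case "INVALID_API_KEY":
      return { kind: "update_api_key" };
    case "QUOTA_EXCEEDED":
      return { kind: "show_usage" };
  }
}
```

A caller would still stop after a few attempts and surface the error. Because each retry re-sends POST /prompts/:id/execute, it should carry the same Idempotency-Key as the original click.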
Walk Through a User Flow
A user opens the prompt playground and wants to iterate on a recipe generator:
Call GET /projects to see their projects
Select a project, triggering GET /projects/:id to load prompts
Click 'New Prompt', which sends POST /prompts with initial content
As they type in the editor, debounced PATCH /prompts/:id calls autosave changes
They click 'Run', which calls POST /prompts/:id/execute with variable values
The response to that POST is an SSE stream; the client reads it and renders tokens as they arrive
When done, they tweak the temperature and run again
They open history with GET /prompts/:id/executions and pick two runs to compare side by side with GET /executions/compare
Happy with the result, they share it via POST /prompts/:id/shares
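The run, tweak, run-again loop in this flow means the execution panel moves through a handful of states. Modeling them explicitly keeps late tokens from reappearing after a Stop, and it makes first-token latency easy to show. The sketch below uses a framework-free reducer with the `(state, action) => state` shape that React's `useReducer` also accepts. The state and action names are hypothetical; each action would be dispatched from the HTTP response or an SSE event.

```typescript
type ExecState =
  | { status: "idle" }
  | { status: "streaming"; executionId: string; text: string; startedAt: number; firstTokenAt?: number }
  | { status: "completed"; executionId: string; text: string; outputTokens: number; firstTokenMs?: number }
  | { status: "cancelled"; executionId: string; text: string }
  | { status: "error"; text: string; message: string };

type ExecAction =
  | { type: "started"; executionId: string; at: number } // POST accepted, stream open
  | { type: "token"; text: string; at: number } // one streamed text delta
  | { type: "done"; outputTokens: number } // final event with usage
  | { type: "failed"; message: string } // HTTP error or error event
  | { type: "cancelled" }; // user pressed Stop

function execReducer(state: ExecState, action: ExecAction): ExecState {
  switch (action.type) {
    case "started":
      // A new Run always resets the panel, even if a previous result is still shown.
      return { status: "streaming", executionId: action.executionId, text: "", startedAt: action.at };
    case "token":
      if (state.status !== "streaming") return state; // drop late tokens after Stop or an error
      return { ...state, text: state.text + action.text, firstTokenAt: state.firstTokenAt ?? action.at };
    case "done":
      if (state.status !== "streaming") return state;
      return {
        status: "completed",
        executionId: state.executionId,
        text: state.text,
        outputTokens: action.outputTokens,
        firstTokenMs: state.firstTokenAt === undefined ? undefined : state.firstTokenAt - state.startedAt,
      };
    case "failed":
      if (state.status !== "streaming" && state.status !== "idle") return state;
      // Keep any partial output so the user can see how far the run got.
      return { status: "error", text: state.status === "streaming" ? state.text : "", message: action.message };
    case "cancelled":
      if (state.status !== "streaming") return state;
      return { status: "cancelled", executionId: state.executionId, text: state.text };
  }
}
```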