You ask a question, the model replies. Simple, stateless (or with short memory). Best for answering questions, brainstorming, summarizing, writing drafts, or explaining concepts.
User message→Model→Response
Low complexityInstant resultsNo setup needed
Complexity
When to use: Quick lookups, creative writing, explaining topics, coding help, translations, summarization.
🛠️
Skills / Tools
Model with access to specific capabilities
▼
The model is given tools it can call — like web search, a calculator, a database, or a code executor. It decides when to invoke them. Results come back and inform the final answer.
User message→Model→Tool call→Result → Model→Response
Medium complexityNeeds tool configReal-world data
Complexity
When to use: „Search the web for this“, run code, query a database, look up today’s weather, perform calculations on live data.
🤖
Agent
Multi-step autonomous task executor
▼
Given a goal, the agent plans and executes multiple steps on its own — using tools, memory, and reasoning — until the task is done. You set the objective; the AI figures out the steps.
Goal→Plan→Step 1→Step 2…→Final output
High complexityUses more tokensHandles ambiguity
Complexity
When to use: „Research this topic and write a report“, „Fix the bug in my codebase“, „Fill out this form by looking up the right data“.
🔗
Subagent / Multi-agent
Agents orchestrating other agents
▼
An orchestrator agent breaks a big task into subtasks and delegates them to specialised subagents running in parallel or in sequence. Each subagent focuses on one thing. Results are merged.
When to use: Very large, parallelisable tasks — e.g. „Analyse 200 customer feedback responses“, „Build a full app with separate agents for frontend, backend, and tests“.
Current top models & pricing (per 1M tokens)
GPT-4o
OpenAI
In $2.50 / Out $10
Best all-rounder for chat, vision, coding. Fast and multimodal.
o3 / o4-mini
OpenAI
o3: $10/$40 · o4-mini: $1.10/$4.40
Reasoning models. Best for maths, science, complex logic. Use for agents needing deep thinking.
Claude Sonnet 4.5
Anthropic
In $3 / Out $15
Strong at coding, reasoning, long context. Great for agentic workflows and nuanced writing.
Claude Haiku 4.5
Anthropic
In $0.80 / Out $4
Fast and cheap. Best for high-volume tasks, subagent steps, classification, short answers.
Gemini 2.5 Pro
Google
In $1.25 / Out $10
Huge 1M context window. Best for analysing very long documents, codebases, videos.
Gemini 2.0 Flash
Google
In $0.10 / Out $0.40
Extremely cheap and fast. Good for real-time chat, streaming apps, or massive-scale pipelines.
Llama 3.3 70B
Meta (open)
~$0.20 / $0.40 (hosted)
Free to run locally. Strong general model. Best when data privacy or cost at scale matters.
Mistral Large
Mistral
In $2 / Out $6
Strong at multilingual tasks and European languages. Good for EU-regulated environments.
Quick decision guide
Need a fast cheap answer?
→ Gemini Flash or Haiku 4.5
Complex reasoning or maths?
→ o3 / o4-mini or Claude Sonnet
Very long document / codebase?
→ Gemini 2.5 Pro (1M context)
Build an autonomous agent?
→ Claude Sonnet 4.5 or GPT-4o
Data privacy / on-premise?
→ Llama 3.3 70B (self-hosted)
Multimodal (images/audio)?
→ GPT-4o or Gemini 2.5 Pro
Jan D.
"The only real security that a man will have in this world is a reserve of knowledge, experience, and ability."