LLM showdown

Compare

One prompt, five top language models — pick the best answer

Stop opening five tabs to ask the same question

5-column Gab AI Deck recipe for direct LLM comparison

LLM Showdown puts the same prompt in front of GPT-5.5, Claude 3.5 Sonnet, Gemini 2.5 Pro, o3, and Llama 3.3 simultaneously. You see the answers side by side; you make the call with evidence. Use it to pick a default LLM for an app, decide which model to ship on for an API, or simply settle the "which one is best" debate for a specific kind of work.

How to use this recipe

  1. Click "Use this recipe" to clone the 5-column deck.
  2. Paste the exact same prompt into all five columns; each column is bound to a different model.
  3. Run all five columns in parallel; the deck queues independently so you do not wait on one for the next.
  4. Compare the outputs side by side — score them on accuracy, voice, latency, and cost.
  5. Pin the winner and save the deck as a recipe so you can re-run the comparison on future prompts.

Best for

LLM Showdown FAQ

Why these five models?

They span the major frontier families (OpenAI, Anthropic, Google, OpenAI reasoning, open-weights) and are the most-asked-about by users. Swap any column to a different model — Mistral, Command, Cohere, Grok — via the column header.

Will it score the answers automatically?

No — auto-scoring an LLM's output is hard to do reliably. The deck surfaces all five answers; you make the qualitative call. Add a sixth chat column to ask one model to evaluate the others if you want a meta-take.

How do I compare cost and latency?

Each column shows runtime in its header; cost depends on token count + per-model pricing. For systematic benchmarking, log prompts and outputs to your own analytics.

Can I run a multi-turn conversation?

Yes — every column is a full conversation, not a one-shot. Continue the dialogue independently in each column to see how each model handles follow-ups.

Can I include vision-capable models?

Yes — paste an image into any vision-capable column (GPT-5.5, Gemini 2.5 Pro, Claude 3.5 Sonnet) and they will reason over it directly.

What about open-weights models?

Llama 3.3 is included by default. Swap to Mistral, DeepSeek, or any other open-weights model the catalog supports via the column header.

Workflow columns