Pick the right model

Honest advice — premium isn't always better. The right answer is often 'the cheap one, but check the results.'

The two tiers

Tier | Models | Best for | Cost relative
Standard | Mistral Small · GPT-4o mini · Claude Haiku · Gemini Flash | Bulk Q&A. Questionnaire row-by-row. Handbook chat. Short summaries. |
Premium | Mistral Large · GPT-4o · Claude Sonnet · Gemini Pro | High-stakes drafting. Long-context reasoning. Strict instruction-following. Anything customer-facing. | 5-15×

Try the cheap one first. For most RAG tasks (the answer is in the corpus, and the model just needs to find and rephrase it), standard-tier models usually perform on par with premium ones. Start cheap, measure quality on 20-30 real questions, and upgrade only if there's a measurable gap.

When each tier is the right choice

Use a standard-tier model when

Use a premium model when

A simple A/B procedure

Picking a model based on intuition is fine. Picking one based on 20-30 real questions is better:

  1. Make two copies of your agent — one with each model. Same system prompt, same scope, same top_k, same temperature.
  2. Run the same 20-30 questions through both. Use realistic questions, not softballs.
  3. Score each answer 1-5 on the two dimensions you care about — usually accuracy (does it match the source?) and style (is it usable as-is, or does it need editing?).
  4. If premium isn't materially better, stay with standard. Most of the time it isn't.
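
The comparison in steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration: the scores are made-up placeholders for a reviewer's 1-5 ratings, and the `min_gap` threshold of 0.5 is an assumption standing in for whatever you consider "materially better".

```python
from statistics import mean

# Hypothetical reviewer scores (1-5) for the same four questions per model.
# Placeholder values for illustration, not measured results.
scores = {
    "standard": {"accuracy": [5, 4, 5, 4], "style": [4, 4, 3, 4]},
    "premium": {"accuracy": [5, 5, 5, 4], "style": [4, 4, 4, 4]},
}


def mean_scores(model_scores):
    """Average each scoring dimension for one model."""
    return {dim: mean(vals) for dim, vals in model_scores.items()}


def should_upgrade(all_scores, min_gap=0.5):
    """Return (decision, gaps): upgrade only if premium beats standard
    by at least min_gap on some dimension. min_gap is an assumed threshold."""
    std = mean_scores(all_scores["standard"])
    prem = mean_scores(all_scores["premium"])
    gaps = {dim: prem[dim] - std[dim] for dim in std}
    return any(gap >= min_gap for gap in gaps.values()), gaps


decision, gaps = should_upgrade(scores)
print("upgrade to premium" if decision else "stay with standard", gaps)
```

With these placeholder scores the gap is 0.25 on both dimensions, so the sketch recommends staying with standard. In practice, use the full 20-30 questions, since averages over a handful of answers are noisy.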

Provider differences (rough characterisations)

All four providers are good. They tend to be different in flavour:

Changing the model on an existing agent

Open the agent → Edit → Model dropdown → save. Existing keys keep working — the very next call uses the new model. Useful for A/B tests: clone the agent, change only the model, run both for a week, kill the loser.

Cost & quota visibility

Every generation (browser, Word, Excel, API) counts against your monthly cap. See Settings → Plan & billing for current usage. Each agent's call log shows the token count and provider, so you can see which agents are eating the budget.

If you're approaching the cap, look first at high-temperature, long-output agents: lower max_tokens or temperature before upgrading your plan.
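
If you copy call-log rows out into a script, a short aggregation like the one below shows which agents use the most tokens. This is a sketch under assumptions: the row structure, the field names (`agent`, `provider`, `tokens`), and the agent names are hypothetical, chosen only to mirror the token count and provider that the call log displays.

```python
from collections import defaultdict

# Hypothetical call-log rows; field names and values are assumptions,
# not the product's actual export format.
call_log = [
    {"agent": "handbook-chat", "provider": "Mistral", "tokens": 1200},
    {"agent": "questionnaire-fill", "provider": "OpenAI", "tokens": 3400},
    {"agent": "handbook-chat", "provider": "Mistral", "tokens": 800},
    {"agent": "questionnaire-fill", "provider": "OpenAI", "tokens": 2000},
]


def tokens_by_agent(rows):
    """Sum token counts per agent, largest consumer first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["agent"]] += row["tokens"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for agent, total in tokens_by_agent(call_log):
    print(f"{agent}: {total} tokens")
```

The agents at the top of the list are the first candidates for a lower max_tokens setting.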