Benchmarks are not reliable

7 Jan · 3 min read

According to Artificial Analysis, the most intelligent model on the Intelligence Index is GPT-5.2 (xhigh) and the top model on the Coding Index is Gemini 3 Pro Preview (high). Claude Opus 4.5 is ranked 13th, yet most people use Opus for most of their coding work because Opus gets the job done better than other models. That doesn't mean every model Anthropic makes is built for coding; they just know how to make their models good at real-life engineering scenarios. Google models are good at UI/UX and frontend tasks, but I can't use them for backend or complex logic because they have that "Google thingy" logic.

As developers, we should not rely on benchmarks for coding tasks, but on our own experiments with different models and our own money.

Doing AI coding without a coding agent is bizarre; it's like not using an IDE for coding. Using ChatGPT or any other AI web app is like using Notepad for coding. The LLM doesn't have any context about your codebase, your tools, or anything else. It just gives you an answer based on its training data.

I know some people use a web app as if it knew everything about their codebase: they copy-paste errors, put the directly related file in context, and ask it to fix the error. Sometimes the AI fixes the error in a single run, but most of the time it takes multiple runs with more context. If they used a coding agent like OpenCode, even with the free models, their bug fixes would be faster and they would be using it to add new features too.

If you are using a web app, then picking models based on benchmarks is acceptable. But if you are using LLMs with a coding agent, the benchmark score doesn't hold much value.

Good Context = Good Results

Manually adding context to a chat session is not good practice, because we sometimes make mistakes that lead to missing context. But if the LLM has the right tools to add context itself, or to search the whole codebase if it wants to (it won't, unless you write shit code), it will output good code.
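To make "the right tool" concrete, here is a minimal sketch of the kind of codebase search a coding agent could expose to the model. This is not how OpenCode or any specific agent implements it; the function name `search_codebase`, its parameters, and the default file extensions are all assumptions made for illustration.

```python
import re
from pathlib import Path


def search_codebase(root, pattern, extensions=(".py", ".ts", ".go"), max_hits=20):
    """Hypothetical agent tool: find lines matching a regex under `root`.

    Returns up to `max_hits` tuples of (file path, line number, stripped line),
    so the model can pull in only the context it needs.
    """
    regex = re.compile(pattern)
    hits = []
    # Sort paths so results are stable between runs.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path), number, line.strip()))
                if len(hits) >= max_hits:
                    return hits
    return hits
```

The point of a tool like this is that the model decides what to look up, instead of you guessing which files to paste into the chat.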

Always pick a model based on what you need, not on its intelligence or benchmark score.

How I use AI to write code

I use OpenCode as my coding agent.

Workflow	Model
Searching docs or codebase	Claude Haiku 4.5
Planning	Claude Opus 4.5
Writing Code	Claude Haiku 4.5 / (rarely) Claude Opus 4.5
UI/UX	GPT-5.2 / GPT-5.1 Codex max / Google Gemini 3 Pro
Learning new things	btca with Claude Haiku 4.5
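
As a rough sketch, the table above can be written down as a simple lookup that picks a model per kind of task. The workflow keys, the `pick_model` helper, and the fallback default are hypothetical names chosen for illustration, and the model names are plain labels copied from the table, not real API identifiers.

```python
# Workflow -> model label, taken from the table above.
# Only the first option of each row is used; keys are made-up names.
MODEL_BY_WORKFLOW = {
    "search": "Claude Haiku 4.5",
    "planning": "Claude Opus 4.5",
    "writing_code": "Claude Haiku 4.5",
    "ui_ux": "GPT-5.2",
    "learning": "Claude Haiku 4.5",
}


def pick_model(workflow, default="Claude Haiku 4.5"):
    """Return the model label for a workflow, or `default` (an assumption) if unknown."""
    return MODEL_BY_WORKFLOW.get(workflow, default)


if __name__ == "__main__":
    for task in ("planning", "writing_code", "ui_ux"):
        print(task, "->", pick_model(task))
```

The choice is driven by the kind of task, not by a leaderboard position.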