We Built the Best Coin Identification and Grading Model in the World. Here Are the Numbers.
Vardera identifies US and World coins with 98% accuracy and grades them with 96% accuracy, in under two seconds.

Written by
Ben Parfitt, Chief AI Officer

Vardera builds computer-vision infrastructure for collectibles and condition-dependent goods. Today we are publicly sharing benchmark results for our coin identification and grading model, tested head-to-head against six of the most capable frontier AI models in the world.
Vardera identifies US and World coins with 98% accuracy and grades them with 96% accuracy, in under two seconds. No frontier model we tested came within an order of magnitude of Vardera's error rate on either identification or grading, and even the fastest was more than 3x slower.
This post explains why that gap exists, presents the full benchmark data, and describes what it means for the future of coin commerce.
Why general-purpose AI falls short on coins
Frontier multimodal models are remarkable at general reasoning. They can write code, summarize legal documents, and recognize that something in a photo is a Morgan Dollar. But consistent, in-depth coin identification and grading is a fundamentally different problem.
Distinguishing two 1878-CC Morgan Dollars, one an MS-67 and the other an MS-63, requires detecting subtle differences in luster, contact marks, and strike quality. The visual gap between those two grades represents tens of thousands of dollars in value. A general vision model can tell you something is a Morgan Dollar. It cannot reliably tell you which year and mint, in what condition, at what grade.
This is not a prompting problem or a model-size problem. General-purpose models optimize for breadth across millions of tasks. Coin grading requires purpose-built depth: domain-specific training data, specialized feature extraction, and models designed from the ground up for fine-grained visual discrimination. That's a different kind of system, not a better prompt.
Our approach: custom models, not LLM wrappers
Vardera trains category-specific computer vision models for each domain we cover. For coins, that means models trained on large-scale datasets of expert-graded imagery, built to do two things: identify the coin — often down to the specific die variety — and grade its condition on the Sheldon scale, the same standard used by NGC and PCGS.
These custom machine learning models are the backbone of our product. This is where the performance advantage comes from: 98% identification accuracy, 96% grading accuracy, and sub-2-second response times. No general-purpose model we tested matched any of those numbers. Meanwhile, our models keep learning from their mistakes and improving week over week.
The benchmark
We evaluated Vardera against six frontier AI models on a standardized test set of coin images, measuring three things: identification error rate, grading error rate, and average inference time per image. Every model received the same images and the same task.
The dataset contains 1,188 coins, spanning 69 different coin types, 191 years, seven mints, and all possible numeric grade values on the Sheldon scale.
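For readers who want to run a similar comparison, here is a minimal sketch of how the three metrics could be computed. It is not our actual harness: the `Label` type, the `evaluate` function, and the exact-match scoring rule (a prediction counts as an error unless the label or the numeric grade matches exactly) are assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Label:
    coin_id: str  # hypothetical identifier, e.g. "Morgan Dollar 1878-CC"
    grade: int    # numeric Sheldon grade


def evaluate(predict: Callable[[bytes], Label],
             dataset: Iterable[Tuple[bytes, Label]]) -> dict:
    """Return ID error rate, grade error rate, and mean seconds per image."""
    n = id_errors = grade_errors = 0
    total_seconds = 0.0
    for image, truth in dataset:
        start = time.perf_counter()
        guess = predict(image)
        total_seconds += time.perf_counter() - start
        n += 1
        # Assumed scoring rule: exact match only, no partial credit.
        id_errors += guess.coin_id != truth.coin_id
        grade_errors += guess.grade != truth.grade
    if n == 0:
        raise ValueError("dataset is empty")
    return {
        "id_error": id_errors / n,
        "grade_error": grade_errors / n,
        "avg_time_s": total_seconds / n,
    }
```

Any model can be plugged in by wrapping it in a `predict` callable that takes image bytes and returns a `Label`, so every model sees the same images and is scored the same way.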

Raw error rates and latency
Model | ID Error | Grade Error | Avg Time |
Vardera | 2% | 4% | 2.00s |
Opus 4.7 (Thinking) | 40.32% | 42.51% | 9.64s |
Gemini 3.1 Pro (Thinking) | 40.99% | 62.37% | 86.39s |
GPT 5.5 (Thinking) | 42.17% | 49.49% | 19.46s |
GPT 4.1 | 42.42% | 57.07% | 6.85s |
Kimi K2.6 (Thinking) | 36.45% | 68.27% | 70.26s |
Qwen 3.5 397b (Thinking) | 38.30% | 75.34% | 83.79s |
Vardera's error rates are 2% on identification and 4% on grading. The closest frontier model on identification (Kimi K2.6 at 36.45%) still fails on more than a third of coins. On grading, Opus 4.7 is the closest at 42.51%, meaning it gets the grade wrong on more than two in five coins. On speed, even the fastest frontier model (GPT 4.1 at 6.85 seconds) is more than 3x slower than Vardera's 2-second average.
The relative comparison makes the gap clearer. When Vardera is the baseline (1x), the frontier models look like this:

Frontier models relative to Vardera
Model | ID Error | Grade Error | Avg Time |
Vardera | 1x | 1x | 1x |
Opus 4.7 (Thinking) | 20x | 11x | 5x |
Gemini 3.1 Pro (Thinking) | 20x | 16x | 43x |
GPT 5.5 (Thinking) | 21x | 12x | 10x |
GPT 4.1 | 21x | 14x | 3x |
Kimi K2.6 (Thinking) | 18x | 17x | 35x |
Qwen 3.5 397b (Thinking) | 19x | 19x | 42x |
All values relative to Vardera (1x = baseline). Lower is better.
Every frontier model produces at least 18x more identification errors and at least 11x more grading errors than Vardera. Several models (Gemini, Kimi, Qwen) take over a minute per coin, making them impractical for any real-time application.
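The multipliers can be reproduced from the raw table with a few lines of arithmetic. The sketch below divides each model's figures by Vardera's and rounds to the nearest whole number. The `RAW` dictionary and the `relative_to` helper are illustrative names, and the Vardera row uses the rounded values shown in the table (2%, 4%, 2.00s), so the ratios are approximate.

```python
# (ID error %, grade error %, average seconds per image), copied from the raw table.
RAW = {
    "Vardera": (2.0, 4.0, 2.00),
    "Opus 4.7 (Thinking)": (40.32, 42.51, 9.64),
    "Gemini 3.1 Pro (Thinking)": (40.99, 62.37, 86.39),
    "GPT 5.5 (Thinking)": (42.17, 49.49, 19.46),
    "GPT 4.1": (42.42, 57.07, 6.85),
    "Kimi K2.6 (Thinking)": (36.45, 68.27, 70.26),
    "Qwen 3.5 397b (Thinking)": (38.30, 75.34, 83.79),
}


def relative_to(baseline: str, raw: dict) -> dict:
    """Express every row as a whole-number multiple of the baseline row."""
    base = raw[baseline]
    return {
        name: tuple(round(value / b) for value, b in zip(values, base))
        for name, values in raw.items()
    }


if __name__ == "__main__":
    for name, (id_x, grade_x, time_x) in relative_to("Vardera", RAW).items():
        print(f"{name} | {id_x}x | {grade_x}x | {time_x}x |")
```

For example, Opus 4.7's grading error of 42.51% divided by Vardera's 4% is about 10.6, which rounds to the 11x shown in the table.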
Why the gap exists
The performance difference is not surprising if you understand how these systems are built. Frontier LLMs are trained on internet-scale text and image data to perform well across an enormous range of tasks. They know what a Morgan Dollar is, but have no dedicated feature extraction for surface condition, and no calibration against professional grading standards.
Vardera's coin models, by contrast, do nothing but coins. From the architecture, to training data, to evaluation metrics, everything is optimized for one job: looking at a coin image and producing an accurate identification and grade. The result is not incremental improvement. It is a category difference in capability.
We expect frontier models to continue improving on general vision tasks. However, the structural gap on fine-grained visual assessment is not something that closes with the next model release. It requires purpose-built training data, domain-specific architectures, and calibration against the standards the industry already trusts. That is what Vardera has built.
We'll continue to publish how our category intelligence performs against industry standards as we expand into new verticals. If you're running a marketplace or platform where coins, trading cards, or other collectibles need to be accurately identified, graded, or valued at scale, we'd like to talk. Reach out at info@vardera.com or visit vardera.com.
