#2: The 10,000 Hour Rule for Agents

Doom Thesis, Healthcare AI, and Robot Soccer

Are agents overhyped?

Unequivocally yes (for now)! The vast majority of agent products today aren’t actually agentic, i.e., they don’t possess the agency and autonomy to execute against ambiguity. For example, a recent experiment showed that top-performing agents failed to successfully complete real-world office tasks 70% of the time. What gives?

Writ large, agents in 2025 are effectively wrappers on top of LLMs. While SoTA LLMs house incredible amounts of raw knowledge, they lack two key ingredients that agents need to “make it”:

  1. Fluid (generalized) intelligence to adapt to new surroundings or tasks outside of their training set

  2. The ability to continually self-learn from personal experience

The reason humans are so useful is not mainly their raw intelligence. It’s their ability to build up context, interrogate their own failures, and pick up small improvements and efficiencies as they practice a task.

- Dwarkesh Patel

Let’s take driving as an example. Humans don’t become good at driving by watching countless hours of other people drive and then one-shotting it; rather, they drive hundreds of hours and iterate on their spatial awareness, intuition, and reflexes over time. Notably, even the most experienced drivers need a bit of time to re-acclimate when driving a new car.

Human brains use a variety of techniques to get better at driving. To use model parlance, humans continually adjust their neural weights as they learn, becoming good at driving their specific car while staying general enough to drive similar cars without much of an adjustment period (no overfitting). They also lean on both short- and long-term memory, and they do all of this over a long enough learning period that these techniques compound.

But agents today aren’t designed this way. While test-time compute and larger context windows help, they simply aren’t powerful enough primitives for agents to self-learn, because the underlying model weights never change.

The 10,000 Hour Rule

So are agents washed? Hardly, but it will take both real innovation and time for them to become truly productive.

Malcolm Gladwell popularized the “10,000 Hour Rule” in his book, Outliers. The basic premise is that it takes roughly 10,000 hours to demonstrate mastery of a complex skill. Furthermore, the rule emphasizes the importance of focused, structured practice with clear feedback and goals, rather than simply accumulating hours of repetition.

I propose that agents will adhere to the same rule: an agent must practice a specific skill for 10,000 hours (~1.15 years if done 24/7) to become world-class at it, with mechanisms to meaningfully and continually incorporate those learnings (like a human would). To accomplish this, agent builders will need a broader toolset and set of approaches, including:

  • Continual RL per Deployment - The agent continually fine-tunes its weights using RL based on where it’s deployed (toy sketch after this list)

  • LoRA / Adapter-based Personalization - This approach keeps the base model weights frozen and uses lightweight, modular adapters that adjust model behavior based on learnings (sketch after this list). It’s similar to learning a new skill: you aren’t rewiring your brain when you learn to drive, you’re layering a new capability on top.

  • Contextual Policies - Instead of training a new model per deployment, this approach conditions the model on a deployment profile to guide behavior

  • Meta-Learning (e.g. PEFT + Caching) - This approach trains a model to rapidly adapt from a few examples via few-shot fine-tuning or weight caching

  • Long-term Memory - Use vector databases, RAG, and structured memory to enable adaptation via external knowledge + instructions
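
To make the first bullet concrete, here’s a toy sketch of what continual RL per deployment looks like mechanically. This is not any particular product’s implementation; the environment, the reward, and the tiny policy network are all illustrative stand-ins (a real agent would apply a method like PPO to an LLM):

```python
# Toy sketch of "continual RL per deployment": a small policy network whose
# weights keep updating (REINFORCE-style) from live reward signals.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def run_episode():
    """Pretend deployment episode: observe state, act, get a task reward."""
    state = torch.randn(8)                        # stand-in for real observations
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()
    reward = 1.0 if action.item() == 0 else -0.1  # stand-in for task success
    return dist.log_prob(action), reward

# The key property: the loop never stops. Every deployment interaction
# nudges the weights, so the agent keeps learning where it's deployed.
for step in range(10_000):
    log_prob, reward = run_episode()
    loss = -log_prob * reward                     # REINFORCE policy gradient
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```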

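And here’s a minimal sketch of the adapter bullet, using Hugging Face’s peft library. The model name and adapter path are placeholders, and the fine-tuning step itself is elided:

```python
# Adapter-based personalization: the base model stays frozen; only a small
# LoRA adapter is trained and swapped per deployment.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, PeftModel

# Load the frozen base model once; it is shared across all deployments.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# Wrap it with a LoRA adapter: only the low-rank matrices (a tiny fraction
# of total parameters) are trainable.
config = LoraConfig(r=16, lora_alpha=32,
                    target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of weights

# ...fine-tune `model` on deployment-specific traces, then save the adapter:
model.save_pretrained("adapters/customer-acme")

# At serving time, load the right adapter per customer; the base is untouched.
personalized = PeftModel.from_pretrained(base, "adapters/customer-acme")
```
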
Agents will use a combination of these approaches based on use case, type of deployment, and business model. For example, continual RL on a per-deployment basis might make sense for a large enterprise customer or a very custom deployment, but it is probably too expensive, and risks overfitting, for more standard use cases. I suspect most use cases will benefit from a mix of adapters, contextual policies, and memory, sketched together below.
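
Here’s a minimal sketch of that combination, pairing a contextual policy (a per-deployment profile injected into every prompt) with long-term memory (vector retrieval over lessons from past runs). Everything here is illustrative: the toy embedding stands in for a real embedding model, and the profile contents are made up:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy deterministic embedding; swap in a real embedding model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

class AgentMemory:
    """Tiny in-memory vector store; a real deployment would use a vector DB."""
    def __init__(self):
        self.texts: list[str] = []
        self.vecs: list[np.ndarray] = []

    def add(self, lesson: str):
        self.texts.append(lesson)
        self.vecs.append(embed(lesson))

    def recall(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        scores = np.array([v @ q for v in self.vecs])
        return [self.texts[i] for i in scores.argsort()[::-1][:k]]

# The "contextual policy": a per-deployment profile that steers behavior
# without touching model weights.
DEPLOYMENT_PROFILE = ("You are deployed at Acme Corp. Tone: formal. "
                      "Escalate refunds over $500 to a human.")

def build_prompt(task: str, memory: AgentMemory) -> str:
    lessons = memory.recall(task)
    return (f"{DEPLOYMENT_PROFILE}\n\n"
            "Lessons from past runs:\n- " + "\n- ".join(lessons) +
            f"\n\nTask: {task}")

memory = AgentMemory()
memory.add("Customer invoices live in the billing shared drive.")
memory.add("Always confirm the PO number before issuing a refund.")
print(build_prompt("Process a refund request from Acme", memory))
```

The design point: neither mechanism touches the base model’s weights, which is what makes this mix cheap enough for standard deployments.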

We’re still incredibly early in the agent story and currently in the hype cycle phase of the journey (see below). The good news: the most exciting and interesting times are ahead. Buckle up!

Interesting stuff this week

  1. Certainly an interesting watch. TLDR: AI actually won’t make us more productive; it’ll just make us succumb to our dopamine-oriented vices (watching content, gambling, status comparison), similar to how social media likely caused more harm than good. The trade is to invest in this playing out vs. extreme productivity.

  2. Microsoft launched MAI-DxO (MAI Diagnostic Orchestrator), which hit 86% accuracy on 304 complex medical cases (compared to 21% from solo human doctors). Healthcare is an obvious, massive area for agents to help with once continual self-learning techniques are implemented. Pretty crazy!

  3. Ok, so maybe robots aren’t quite coming for our jobs yet: