A third-year associate at a mid-sized US firm billed 1,640 hours in 2025, well under the firm’s 1,950 target. On the old math, that is an underperforming associate heading for a difficult review. On the actual facts, she was the most valuable junior in her practice group: she had become the firm’s most fluent user of its AI drafting and review tools, was completing in three hours what her peers took twelve to do, and was quietly training two first-years on how to validate AI output. Her hours were down precisely because she was good.
Her review form had no place to capture any of that. It measured hours, realization, and supervising-partner narrative comments — a framework built for a world where time spent was a reasonable proxy for value delivered. In 2026, at a growing number of US firms, it no longer is.
This is the measurement problem legal AI has created, and it is more urgent than the question of which AI platform to buy. The billable hour has been the bedrock of associate evaluation for a century. AI is dismantling the assumption underneath it, that time spent equals value created, faster than most firms are updating their review systems. Firms that keep measuring associates primarily on hours are now actively penalizing the associates who use AI best, which is the opposite of what they intend.
The Thomson Reuters Institute’s 2026 Report on the State of the US Legal Market documented the tension directly: AI efficiency gains are colliding with hours-based billing and evaluation models, and firms are struggling to identify where competitive advantage actually lives when the same work takes a fraction of the time. In a Law360 Pulse survey from March 2026, roughly 70 percent of attorneys reported using AI weekly. The work has changed. Most performance measurement has not.
This guide covers why the billable hour is failing as an associate performance metric, what US firms should measure instead, and how to build a defensible, AI-era evaluation framework. SRA has designed and run confidential performance review and evaluation programs exclusively for US law firms since 1987 — which means we have spent decades on the exact question AI has now made unavoidable: how do you measure a lawyer’s value when the easy proxy stops working?
Why the billable hour is failing as a performance metric
The billable hour was never a measure of quality. It was a measure of time, used as a convenient proxy for effort, contribution, and value because those things are harder to measure directly. For most of legal history the proxy held up reasonably well: a diligent associate billed more hours, and more hours roughly tracked more contribution. AI breaks the proxy in three ways.
It decouples time from output. When an AI tool drafts a first-pass contract in minutes that previously took hours, the associate who uses it well bills fewer hours for the same or better output. Hours go down as performance goes up. A metric that moves in the opposite direction from the thing it is supposed to measure is worse than no metric at all.
It removes the work that used to fill junior hours. First-pass document review, legal research, and routine drafting — the work that historically filled a junior associate’s timesheet — is exactly the work AI handles fastest. Goldman Sachs estimated AI could automate roughly 44 percent of US legal tasks. As that work disappears from the timesheet, the hours metric measures less and less of what associates actually do.
It penalizes the behavior firms most want. A firm that rewards high hours is, in an AI-enabled environment, rewarding the associate who uses AI least — the one still doing manually what could be automated. The associate who has become genuinely efficient looks worse on the hours metric. Firms end up sending exactly the wrong signal about exactly the behavior that determines their competitive future.
The core problem in one sentence: in an AI-enabled firm, billable hours have shifted from being an imperfect proxy for associate value to being inversely correlated with it — and most review systems have not noticed.
None of this means hours disappear entirely. Hours still matter for client billing, capacity planning, and matters where time genuinely tracks effort. The point is narrower and sharper: hours can no longer be the primary axis of associate performance evaluation, because the better associates get at using AI, the less their hours describe their value.
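To make the decoupling concrete, here is a minimal Python sketch using the figures from the opening example (1,640 hours billed against a 1,950 target, and three hours per task against a peer's twelve). The `Associate` class, the "deliverable" unit, and the peer who bills exactly at target are illustrative assumptions, as is the simplification that every billed hour goes to comparable tasks.

```python
from dataclasses import dataclass

HOURS_TARGET = 1950  # firm target from the opening example


@dataclass
class Associate:
    name: str
    hours_billed: float           # annual billable hours
    hours_per_deliverable: float  # average time to finish one comparable task


def deliverables_completed(a: Associate) -> float:
    """Comparable tasks finished in a year, assuming all billed hours go to such tasks."""
    return a.hours_billed / a.hours_per_deliverable


# Hypothetical pair: the AI-fluent associate and a peer billing exactly at target.
ai_fluent = Associate("AI-fluent associate", hours_billed=1640, hours_per_deliverable=3)
peer = Associate("Peer at target", hours_billed=1950, hours_per_deliverable=12)

for a in (ai_fluent, peer):
    pct_of_target = a.hours_billed / HOURS_TARGET
    print(f"{a.name}: {pct_of_target:.0%} of hours target, "
          f"{deliverables_completed(a):.1f} deliverables")
```

Under these assumptions the AI-fluent associate lands at roughly 84 percent of the hours target while finishing more than three times as many deliverables as the peer. An hours-only review ranks the two in the wrong order.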
What US law firms should measure instead
If not hours, then what? The dimensions that matter are judgment, validation quality, depth, client ownership, and AI fluency. These are the ones AI makes more important, not less: the human contributions that survive automation and increasingly determine which associates become valuable senior lawyers. A defensible AI-era evaluation framework measures these directly rather than assuming they will show up in a timesheet.
Notice what these have in common: none of them is captured by an hours total, and most of them were previously assumed to develop on their own through years of routine work. That assumption is now broken. The routine work that used to build judgment and depth is automated, which means firms have to measure and develop these dimensions deliberately rather than waiting for them to emerge from the grind.
Rebuilding evaluation for the AI era
Shifting associate evaluation away from hours and toward judgment, validation, depth, and AI fluency is not a matter of adding a few questions to the existing review form. It requires rethinking what the firm measures, how it collects multi-source input on dimensions that are harder to quantify than hours, and how it trains partners to assess them consistently.
This is the work SRA does. We design and run structured, multi-source performance evaluation programs for US law firms — the kind of evaluation architecture that can actually measure judgment and contribution rather than defaulting to hours because hours are easy to count. As AI makes the easy proxy obsolete, the structured evaluation becomes the thing that holds associate development together.
Building a defensible AI-era evaluation framework
Moving from an hours-centric to a contribution-centric evaluation framework is a design project. Five principles distinguish the firms doing it well.
1. Demote hours from primary metric to context. Hours still appear in the evaluation, but as one data point among many, framed as context rather than as the headline. The evaluation explicitly notes that lower hours can reflect strong AI leverage, so partners do not unconsciously penalize efficiency.
2. Make validation quality a scored dimension. If AI fluency and output validation are now core competencies, they need to be evaluated explicitly, with their own criteria and their own place on the review form. “How reliably does this associate catch problems in AI-generated work?” is a question worth asking directly.
3. Use multi-source input, because judgment is hard to see from one angle. Hours are easy to measure from one source (the timesheet). Judgment, depth, and client ownership require multiple observers — supervising partners, peers, sometimes clients — to assess reliably. The shift away from hours necessarily means a shift toward multi-source evaluation.
4. Evaluate trajectory, not just a snapshot. In a fast-changing environment, whether an associate is developing AI fluency and judgment over time matters more than their absolute level at one point. Year-over-year trajectory on the new dimensions is more diagnostic than a single year’s score.
5. Connect the evaluation to development, not just compensation. Because the new dimensions are developable skills rather than fixed traits, the evaluation should feed a development plan. An associate weak on validation quality can be trained; the evaluation should surface that and route it to coaching, not just record it for the comp committee.
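The five principles can be sketched as a small data model. The Python below is a sketch under stated assumptions, not a description of any firm's or SRA's actual system: the dimension names come from this guide's FAQ, while the weights, the 1-to-5 rating scale, the coaching threshold, and all sample ratings are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

# Hypothetical weights for illustration; hours are deliberately absent (principle 1).
WEIGHTS: Dict[str, float] = {
    "judgment": 0.30,
    "validation_quality": 0.25,  # scored explicitly (principle 2)
    "depth": 0.20,
    "client_ownership": 0.15,
    "ai_fluency": 0.10,
}


@dataclass
class AnnualReview:
    year: int
    hours_billed: float               # reported as context only, never scored
    ratings: Dict[str, List[float]]   # dimension -> one 1-5 rating per evaluator

    def dimension_scores(self) -> Dict[str, float]:
        """Average each dimension across all evaluators (principle 3)."""
        return {dim: mean(self.ratings[dim]) for dim in WEIGHTS}

    def overall(self) -> float:
        """Weighted score on the 1-5 scale; hours do not enter the calculation."""
        scores = self.dimension_scores()
        return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


def trajectory(previous: AnnualReview, current: AnnualReview) -> Dict[str, float]:
    """Year-over-year change per dimension; positive means improvement (principle 4)."""
    prev, curr = previous.dimension_scores(), current.dimension_scores()
    return {dim: curr[dim] - prev[dim] for dim in WEIGHTS}


def development_flags(review: AnnualReview, threshold: float = 3.0) -> List[str]:
    """Dimensions scoring below the threshold, to route to coaching (principle 5)."""
    return [dim for dim, score in review.dimension_scores().items() if score < threshold]


# Hypothetical ratings from three evaluators per dimension, for one associate.
y2025 = AnnualReview(2025, 1820, {
    "judgment": [3, 3, 4], "validation_quality": [2, 3, 2], "depth": [3, 3, 3],
    "client_ownership": [3, 4, 3], "ai_fluency": [3, 3, 3],
})
y2026 = AnnualReview(2026, 1640, {
    "judgment": [4, 4, 5], "validation_quality": [4, 5, 4], "depth": [3, 4, 4],
    "client_ownership": [4, 4, 3], "ai_fluency": [5, 5, 5],
})

print(f"2026 overall: {y2026.overall():.2f} (hours as context: {y2026.hours_billed:.0f})")
print("Year-over-year change:", {d: round(v, 2) for d, v in trajectory(y2025, y2026).items()})
print("2025 coaching flags:", development_flags(y2025))
```

In this made-up example the associate's hours fall from 1,820 to 1,640 while every scored dimension holds or improves, and the 2025 weakness in validation quality is surfaced as a coaching item rather than buried in a total. A real framework would also need calibration across evaluators, which this sketch leaves out.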
The structured, multi-source approach this requires is the same architecture we describe in What Is the Difference Between a Performance Evaluation and a Performance Review at a US Law Firm? and Attorney Performance Review: A Complete Law Firm Guide (2026). AI does not replace that architecture; it makes it essential.
The old model versus the AI-era model, side by side
The shift is easiest to see as a direct comparison. The left column is how most US firms evaluated associates in 2020. The right column is where the firms adapting well are heading in 2026.
The bottom-right cell is the one that should concern firm leadership most. In the pre-AI world, a flawed evaluation metric was mostly a fairness and morale problem. In the AI era, it is a competitive problem: a firm still rewarding hours is identifying its most AI-fluent, highest-leverage associates as underperformers — and those are precisely the associates competitors most want to hire.
Four mistakes US firms are making in the transition
1. Bolting AI questions onto an hours-centric form. Adding “rate this associate’s AI usage” to a review form still built around hours does not fix the underlying problem. If hours remain the headline metric, efficient associates still look worse overall. The fix is structural, not additive.
2. Measuring AI usage instead of AI judgment. Some firms now track how much associates use AI tools, as if usage itself were the goal. It is not. A junior who uses AI constantly but cannot tell when it is wrong is more dangerous than one who uses it rarely. Measure validation quality and judgment, not raw usage volume.
3. Assuming judgment still develops on its own. The routine work that used to build judgment is automated. Firms that assume associates will still develop judgment through osmosis — without deliberately designing for it — will find their mid-levels thinner on judgment than previous cohorts. Evaluation has to surface this gap so development can address it.
4. Letting hours quietly drive comp while claiming to value contribution. The most corrosive version of the problem: a firm says it values judgment and AI fluency in reviews, but compensation still tracks hours. Associates are not fooled. They optimize for what actually drives their comp, which means they optimize for hours, which means they use AI less. The stated metric and the real metric have to match.
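One way a firm might check whether its stated metric and its real metric match is to compare how closely compensation tracks hours against how closely it tracks contribution scores. The Python below is a rough sketch of that audit. The records are invented for illustration, the "contribution score" is a hypothetical 1-to-5 composite, and a handful of data points is far too few for a real conclusion.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical records, one per associate:
# (hours billed, contribution score on a 1-5 scale, bonus in USD)
records = [
    (2100, 2.8, 30000),
    (1950, 3.4, 25000),
    (1640, 4.6, 12000),
    (1800, 3.9, 18000),
    (2000, 3.0, 27000),
]
hours = [r[0] for r in records]
contribution = [r[1] for r in records]
bonus = [r[2] for r in records]

hours_r = pearson(hours, bonus)
contribution_r = pearson(contribution, bonus)

print(f"bonus vs hours:        r = {hours_r:+.2f}")
print(f"bonus vs contribution: r = {contribution_r:+.2f}")
if hours_r > contribution_r:
    print("Compensation tracks hours more closely than contribution.")
```

In this invented data set, bonuses rise almost in lockstep with hours and fall as contribution scores rise, which is the pattern associates would read as the real metric regardless of what the review form says.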
Frequently asked questions
Should US law firms stop tracking billable hours entirely? No. Hours still matter for client billing, capacity planning, and matters where time genuinely reflects effort. The change is that hours should no longer be the primary axis of associate performance evaluation. Demote hours to context and elevate judgment, validation quality, depth, client ownership, and AI fluency as the dimensions that actually predict which associates become valuable senior lawyers.
How do you measure something as subjective as ‘judgment’? Through structured, multi-source evaluation. Judgment is harder to measure than hours, but not impossible: specific, behaviorally anchored questions (“How reliably does this associate identify issues the AI missed?”), input from multiple supervising partners, and trajectory over time produce a defensible assessment. The difficulty of measuring judgment is exactly why firms defaulted to hours, and exactly why that default no longer works.
Won’t associates game whatever new metric we use? Associates optimize for whatever drives their evaluation and compensation, which is precisely why the metric has to reward the right things. If you reward validation quality and judgment, associates will invest in developing validation quality and judgment. The risk is not that associates respond to incentives; it is that firms keep incentivizing hours while claiming to value contribution.
Does this apply to small firms or only Am Law firms? It applies to any US firm where associates use AI tools, which by 2026 is most of them — roughly 70 percent of attorneys report weekly AI use. The infrastructure differs by firm size (a large firm needs more formal multi-source systems), but the core problem — hours no longer tracking value — is the same at a 12-attorney firm and a 1,000-attorney firm.
How does this connect to associate retention? Directly. Associates who are penalized by an outdated hours metric for being efficient and AI-fluent are exactly the associates competitors will recruit. An evaluation framework that recognizes and rewards their actual contribution is a retention tool. Measuring people well is one of the most underrated retention levers a firm has.
What about partners: does AI change how they’re evaluated too? Yes, though differently. Partner evaluation already centers on origination, client relationships, leadership, and contribution rather than raw hours, so it is less disrupted. But AI changes partner leverage and the economics of their practices, which feeds into evaluation. We cover partner evaluation in Partner Performance Review: How US Law Firms Evaluate Equity Partners in 2026.
Where does the legal AI software choice fit into this? The platform decision (Harvey, Legora, and others) is separate from and secondary to the measurement question. Whichever tool a firm adopts, the performance-measurement problem is the same. We covered the platform landscape and the broader people-side implications in Legora vs. Harvey AI: What US Law Firms Should Actually Know in 2026.
Sources
- Thomson Reuters Institute & Georgetown Law (January 2026). 2026 Report on the State of the US Legal Market. abovethelaw.com
- The Agency Recruiting (May 2026). 2026 Legal Hiring Trends: AI Impact on Law Firm Staffing (citing Law360 Pulse March 2026 survey — 70% weekly AI use). theagencyrecruiting.com
- Clio (2026). How Lawyers Use AI to Boost Billable Hours and Improve Work-Life Balance. clio.com
- Barker Gilmore (January 2026). Adapt or Be Automated: AI’s Impact on Junior Lawyers. barkergilmore.com
- SignalFire (March 2026). Beyond the Billable Hour: How AI Is Reshaping Margins and Models at Law Firms. signalfire.com
- NALP Foundation (2025). Update on Associate Attrition and Hiring (CY 2025). nalpfoundation.org
Related reading on srahq.com
- Legora vs. Harvey AI: What US Law Firms Should Actually Know in 2026
- What Is the Difference Between a Performance Evaluation and a Performance Review at a US Law Firm?
- Attorney Performance Review: A Complete Law Firm Guide (2026)
- How Should US Law Firms Separate the Coaching Conversation from the Performance Review Record?
- Partner Performance Review: How US Law Firms Evaluate Equity Partners in 2026
- Which Employee Engagement Software Should US Law Firms Actually Use in 2026?
The billable hour is not dead, but its century-long reign as the primary measure of associate value is ending. The firms that adapt their evaluation frameworks first will identify, develop, and retain their best people. The firms that keep measuring hours will keep penalizing the associates who use AI best — and watch them leave.
SRA designs and runs confidential, structured, multi-source performance review and evaluation programs exclusively for US law firms. This is the architecture that measures judgment, development, and contribution rather than defaulting to hours. It has been built for US law firms since 1987, and it is more necessary than ever as AI reshapes the work.


