SRA

A third-year associate at a mid-sized US firm billed 1,640 hours in 2025, well under the firm’s 1,950 target. On the old math, that is an underperforming associate heading for a difficult review. On the actual facts, she was the most valuable junior in her practice group: she had become the firm’s most fluent user of its AI drafting and review tools, was completing in three hours what her peers took twelve to do, and was quietly training two first-years on how to validate AI output. Her hours were down precisely because she was good.

Her review form had no place to capture any of that. It measured hours, realization, and supervising-partner narrative comments — a framework built for a world where time spent was a reasonable proxy for value delivered. In 2026, at a growing number of US firms, it no longer is.

This is the measurement problem legal AI has created, and it is more urgent than the question of which AI platform to buy. The billable hour has been the bedrock of associate evaluation for a century. AI is dismantling the assumption underneath it — that time spent equals value created — faster than most firms have updated their review systems to keep up. Firms that keep measuring associates primarily on hours are now actively penalizing the associates who use AI best, which is the opposite of what they intend.

The Thomson Reuters Institute’s 2026 Report on the State of the US Legal Market documented the tension directly: AI efficiency gains are colliding with hours-based billing and evaluation models, and firms are struggling to identify where competitive advantage actually lives when the same work takes a fraction of the time. Separately, a Law360 Pulse survey from March 2026 found that roughly 70 percent of attorneys use AI weekly. The work has changed. Most performance measurement has not.

This guide covers why the billable hour is failing as an associate performance metric, what US firms should measure instead, and how to build a defensible, AI-era evaluation framework. SRA has designed and run confidential performance review and evaluation programs exclusively for US law firms since 1987 — which means we have spent decades on the exact question AI has now made unavoidable: how do you measure a lawyer’s value when the easy proxy stops working?

Why the billable hour is failing as a performance metric

The billable hour was never a measure of quality. It was a measure of time, used as a convenient proxy for effort, contribution, and value because those things are harder to measure directly. For most of legal history the proxy held up reasonably well: a diligent associate billed more hours, and more hours roughly tracked more contribution. AI breaks the proxy in three ways.

It decouples time from output. When an AI tool drafts a first-pass contract in minutes that previously took hours, the associate who uses it well bills fewer hours for the same or better output. Hours go down as performance goes up. A metric that moves in the opposite direction from the thing it is supposed to measure is worse than no metric at all.

It removes the work that used to fill junior hours. First-pass document review, legal research, and routine drafting, the tasks that historically filled a junior associate’s timesheet, are exactly the tasks AI handles fastest. Goldman Sachs estimated in 2023 that AI could automate roughly 44 percent of US legal tasks. As that work disappears from the timesheet, the hours metric measures less and less of what associates actually do.

It penalizes the behavior firms most want. A firm that rewards high hours is, in an AI-enabled environment, rewarding the associate who uses AI least — the one still doing manually what could be automated. The associate who has become genuinely efficient looks worse on the hours metric. Firms end up sending exactly the wrong signal about exactly the behavior that determines their competitive future.

The core problem in one sentence: in an AI-enabled firm, billable hours have shifted from being an imperfect proxy for associate value to being inversely correlated with it — and most review systems have not noticed.

None of this means hours disappear entirely. Hours still matter for client billing, capacity planning, and matters where time genuinely tracks effort. The point is narrower and sharper: hours can no longer be the primary axis of associate performance evaluation, because the better associates get at using AI, the less their hours describe their value.
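The decoupling of time from output can be made concrete with a small calculation. The Python sketch below estimates how many hours a peer working without AI would need to match an associate's output. The function name peer_equivalent_hours, the ai_share parameter, and the 25 percent share are assumptions made for illustration; only the 1,640 and 1,950 hour figures and the three-versus-twelve ratio come from the example that opens this guide.

```python
def peer_equivalent_hours(billed_hours: float, ai_share: float, speedup: float) -> float:
    """Hours a non-AI peer would need to produce the same output.

    billed_hours: hours the associate actually recorded
    ai_share:     fraction of those hours spent on AI-assisted tasks (0 to 1)
    speedup:      how many peer hours one AI-assisted hour replaces
    """
    if not 0.0 <= ai_share <= 1.0:
        raise ValueError("ai_share must be between 0 and 1")
    assisted = billed_hours * ai_share
    unassisted = billed_hours - assisted
    return unassisted + assisted * speedup


billed = 1640        # hours billed in the opening example
target = 1950        # firm target in the opening example
speedup = 12 / 3     # three hours versus twelve, from the opening example

# Hypothetical assumption: a quarter of her recorded time was AI-assisted.
equivalent = peer_equivalent_hours(billed, ai_share=0.25, speedup=speedup)
print(f"billed: {billed}, peer-equivalent: {equivalent:.0f}, target: {target}")
# prints: billed: 1640, peer-equivalent: 2870, target: 1950
```

Under that assumed share, an associate who looks 310 hours short of target has produced what would take a peer roughly 2,870 hours. The exact figure depends entirely on the assumed share and speed-up; the direction of the gap is the point.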

What US law firms should measure instead

If not hours, then what? The dimensions below are the ones AI makes more important, not less — the human contributions that survive automation and increasingly determine which associates become valuable senior lawyers. A defensible AI-era evaluation framework measures these directly rather than assuming they will show up in a timesheet.

Dimension	What it measures	Why it matters more in the AI era
Judgment & AI-output validation	Can the associate catch AI errors, hallucinations, and confident-but-wrong output?	The single most important new competency; AI makes validation skill the bottleneck
Substantive depth	Real understanding of the law and the matter, not just passing AI output through	With routine work automated, depth is expected sooner and is harder to fake
Client & matter ownership	Relationship management, responsiveness, ownership of outcomes	Distinctively human; the basis for advancement once execution is automated
Judgment under ambiguity	Decisions where there is no clear answer and AI cannot help	The work that remains irreducibly human and partner-track relevant
AI fluency & leverage	How well the associate uses AI tools to increase their own and the team’s output	Directly tied to the firm’s competitive position; now explicitly evaluable
Development of others	Helping juniors learn to use and validate AI	Becomes visible earlier as cohorts shrink and leverage matters more
Work quality & reliability	Accuracy, polish, and dependability of final output	Always mattered; now the differentiator once volume is automated

Notice what these have in common: none of them is captured by an hours total, and most of them were previously assumed to develop on their own through years of routine work. That assumption is now broken. The routine work that used to build judgment and depth is automated, which means firms have to measure and develop these dimensions deliberately rather than waiting for them to emerge from the grind.

Rebuilding evaluation for the AI era

Shifting associate evaluation away from hours and toward judgment, validation, depth, and AI fluency is not a matter of adding a few questions to the existing review form. It requires rethinking what the firm measures, how it collects multi-source input on dimensions that are harder to quantify than hours, and how it trains partners to assess them consistently.

This is the work SRA does. We design and run structured, multi-source performance evaluation programs for US law firms — the kind of evaluation architecture that can actually measure judgment and contribution rather than defaulting to hours because hours are easy to count. As AI makes the easy proxy obsolete, the structured evaluation becomes the thing that holds associate development together.

Building a defensible AI-era evaluation framework

Moving from an hours-centric to a contribution-centric evaluation framework is a design project. Five principles distinguish the firms doing it well.

1. Demote hours from primary metric to context. Hours still appear in the evaluation, but as one data point among many, framed as context rather than as the headline. The evaluation explicitly notes that lower hours can reflect strong AI leverage, so partners do not unconsciously penalize efficiency.

2. Make validation quality a scored dimension. If AI fluency and output validation are now core competencies, they need to be evaluated explicitly, with their own criteria and their own place on the review form. “How reliably does this associate catch problems in AI-generated work?” is a question worth asking directly.

3. Use multi-source input, because judgment is hard to see from one angle. Hours are easy to measure from one source (the timesheet). Judgment, depth, and client ownership require multiple observers — supervising partners, peers, sometimes clients — to assess reliably. The shift away from hours necessarily means a shift toward multi-source evaluation.

4. Evaluate trajectory, not just snapshot. In a fast-changing environment, whether an associate is developing AI fluency and judgment over time matters more than their absolute level at one point. Year-over-year trajectory on the new dimensions is more diagnostic than a single year’s score.

5. Connect the evaluation to development, not just compensation. Because the new dimensions are developable skills rather than fixed traits, the evaluation should feed a development plan. An associate weak on validation quality can be trained; the evaluation should surface that and route it to coaching, not just record it for the comp committee.
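The five principles above can be sketched as a small data model. This is a minimal Python illustration, not SRA's scoring method or any firm's actual form: the 1-to-5 scale, the rater roles, the field names, and every sample number are hypothetical. Dimension names are shortened from the table earlier in this guide. Hours travel with the review as context and never enter a score.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean

# Shortened names for the dimensions in the table above (hypothetical keys).
DIMENSIONS = (
    "validation", "depth", "ownership", "ambiguity",
    "ai_fluency", "development_of_others", "quality",
)


@dataclass
class Rating:
    rater_role: str           # e.g. "partner", "peer", "client"
    scores: dict[str, int]    # dimension -> score on an assumed 1-5 scale


def dimension_averages(ratings: list[Rating]) -> dict[str, float]:
    """Average each dimension across every rater who scored it."""
    averages = {}
    for dim in DIMENSIONS:
        values = [r.scores[dim] for r in ratings if dim in r.scores]
        if values:
            averages[dim] = round(mean(values), 2)
    return averages


def trajectory(this_year: dict[str, float], last_year: dict[str, float]) -> dict[str, float]:
    """Year-over-year change for dimensions scored in both years."""
    return {d: round(this_year[d] - last_year[d], 2) for d in this_year if d in last_year}


def build_review(ratings, last_year, billed_hours, target_hours) -> dict:
    current = dimension_averages(ratings)
    return {
        "scores": current,
        "trajectory": trajectory(current, last_year),
        # Principle 1: hours are reported as context, not scored.
        "context": {
            "billed_hours": billed_hours,
            "target_hours": target_hours,
            "note": "Lower hours can reflect strong AI leverage.",
        },
        # Principle 5: the two lowest-scoring dimensions feed the development plan.
        "development_focus": sorted(current, key=current.get)[:2],
    }


ratings = [
    Rating("partner", {"validation": 5, "depth": 4, "ai_fluency": 5, "quality": 4}),
    Rating("partner", {"validation": 4, "depth": 3, "ownership": 3, "quality": 4}),
    Rating("peer", {"ai_fluency": 5, "development_of_others": 5}),
]
last_year = {"validation": 3.5, "depth": 3.0, "ai_fluency": 4.0}

review = build_review(ratings, last_year, billed_hours=1640, target_hours=1950)
print(review["scores"])
print(review["trajectory"])
print(review["development_focus"])
```

The design choice worth noting is structural: because hours sit in a separate context field, a low total cannot pull down any scored dimension, and the reviewer sees the reminder about AI leverage next to the number.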

The structured, multi-source approach this requires is the same architecture we describe in “What Is the Difference Between a Performance Evaluation and a Performance Review at a US Law Firm?” and “Attorney Performance Review: A Complete Law Firm Guide (2026).” AI does not replace that architecture; it makes it essential.

The old model versus the AI-era model, side by side

The shift is easiest to see as a direct comparison. The pre-AI column is how most US firms evaluated associates in 2020. The AI-era column is where the firms adapting well are heading in 2026.

Element	Pre-AI model (2020)	AI-era model (2026)
Primary metric	Billable hours	Judgment, contribution, and validation quality
What hours signal	Effort and value, roughly	Context only; low hours may mean high AI leverage
Core competency	Volume of competent output	Validating AI output and exercising judgment
How juniors develop	Years of routine work build skill	Supervised AI use + earlier substantive responsibility
Evaluation inputs	Hours + supervising partner narrative	Multi-source: partners, peers, sometimes clients
Efficiency	Implicitly penalized, because it shows up as fewer hours	Explicitly rewarded as AI leverage
What ‘good’ looks like	High hours, clean work product	Strong judgment, reliable validation, AI fluency, client ownership
Risk of getting it wrong	Mostly fairness/morale	Penalizing your best people and losing them to competitors

The bottom-right cell is the one that should concern firm leadership most. In the pre-AI world, a flawed evaluation metric was mostly a fairness and morale problem. In the AI era, it is a competitive problem: a firm still rewarding hours is identifying its most AI-fluent, highest-leverage associates as underperformers — and those are precisely the associates competitors most want to hire.

Four mistakes US firms are making in the transition

1. Bolting AI questions onto an hours-centric form. Adding “rate this associate’s AI usage” to a review form still built around hours does not fix the underlying problem. If hours remain the headline metric, efficient associates still look worse overall. The fix is structural, not additive.

2. Measuring AI usage instead of AI judgment. Some firms now track how much associates use AI tools, as if usage itself were the goal. It is not. A junior who uses AI constantly but cannot tell when it is wrong is more dangerous than one who uses it rarely. Measure validation quality and judgment, not raw usage volume.

3. Assuming judgment still develops on its own. The routine work that used to build judgment is automated. Firms that assume associates will still develop judgment through osmosis — without deliberately designing for it — will find their mid-levels thinner on judgment than previous cohorts. Evaluation has to surface this gap so development can address it.

4. Letting hours quietly drive comp while claiming to value contribution. The most corrosive version of the problem: a firm says it values judgment and AI fluency in reviews, but compensation still tracks hours. Associates are not fooled. They optimize for what actually drives their comp, which means they optimize for hours, which means they use AI less. The stated metric and the real metric have to match.
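One rough way to test whether the stated metric and the real metric match is to compare how closely compensation tracks hours with how closely it tracks the contribution score. The Python sketch below does that with a plain Pearson correlation. It is an assumption-laden illustration: the five associates, their hours, scores, and bonus figures are invented, and a real audit would need far more data and care about confounders such as class year.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical data for five associates in one class year.
hours = [1640, 1800, 1950, 2050, 2200]
contribution = [4.6, 3.9, 3.8, 3.1, 3.0]   # averaged review score, assumed 1-5 scale
bonus = [10, 14, 20, 24, 30]               # thousands of dollars

r_hours = pearson(hours, bonus)
r_contribution = pearson(contribution, bonus)

print(f"bonus vs hours:        {r_hours:+.2f}")
print(f"bonus vs contribution: {r_contribution:+.2f}")

if r_hours > r_contribution:
    print("Compensation tracks hours more closely than contribution.")
```

In this invented sample, bonuses rise with hours and fall with contribution scores, which is the pattern associates notice long before a comp committee does.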

Frequently asked questions

Should US law firms stop tracking billable hours entirely? No. Hours still matter for client billing, capacity planning, and matters where time genuinely reflects effort. The change is that hours should no longer be the primary axis of associate performance evaluation. Demote hours to context and elevate judgment, validation quality, depth, client ownership, and AI fluency as the dimensions that actually predict which associates become valuable senior lawyers.

How do you measure something as subjective as ‘judgment’? Through structured, multi-source evaluation. Judgment is harder to measure than hours, but not impossible: specific behaviorally-anchored questions (“How reliably does this associate identify issues the AI missed?”), input from multiple supervising partners, and trajectory over time produce a defensible assessment. The difficulty of measuring judgment is exactly why firms defaulted to hours — and exactly why that default no longer works.
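Multi-source input is only defensible if raters broadly agree, so it helps to check where they do not. The Python sketch below flags dimensions on which raters' scores for one associate differ by more than a chosen gap. The function name, the one-point threshold, and the sample scores are hypothetical; a firm would set its own threshold and follow a flag with a calibration conversation rather than an automatic adjustment.

```python
def flag_disagreement(scores: dict[str, list[int]], max_gap: int = 1) -> list[str]:
    """Return dimensions where the highest and lowest rater scores
    differ by more than max_gap. Dimensions with one rater are skipped."""
    return sorted(
        dim
        for dim, values in scores.items()
        if len(values) >= 2 and max(values) - min(values) > max_gap
    )


# Hypothetical scores for one associate, on an assumed 1-5 scale.
scores_by_dimension = {
    "validation": [5, 4, 5],
    "ambiguity": [2, 5, 3],
    "ownership": [4],
}

flagged = flag_disagreement(scores_by_dimension)
print(flagged)   # prints: ['ambiguity']
```

A flagged dimension usually means the raters saw different work or read the behavioral anchor differently. Either is worth knowing before the score reaches a review.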

Won’t associates game whatever new metric we use? Associates optimize for whatever drives their evaluation and compensation, which is precisely why the metric has to reward the right things. If you reward validation quality and judgment, associates will invest in developing validation quality and judgment. The risk is not that associates respond to incentives; it is that firms keep incentivizing hours while claiming to value contribution.

Does this apply to small firms or only Am Law firms? It applies to any US firm where associates use AI tools, which by 2026 is most of them — roughly 70 percent of attorneys report weekly AI use. The infrastructure differs by firm size (a large firm needs more formal multi-source systems), but the core problem — hours no longer tracking value — is the same at a 12-attorney firm and a 1,000-attorney firm.

How does this connect to associate retention? Directly. Associates who are penalized by an outdated hours metric for being efficient and AI-fluent are exactly the associates competitors will recruit. An evaluation framework that recognizes and rewards their actual contribution is a retention tool. Measuring people well is one of the most underrated retention levers a firm has.

What about partners: does AI change how they’re evaluated too? Yes, though differently. Partner evaluation already centers on origination, client relationships, leadership, and contribution rather than raw hours, so it is less disrupted. But AI changes partner leverage and the economics of their practices, which feeds into evaluation. We cover partner evaluation in “Partner Performance Review: How US Law Firms Evaluate Equity Partners in 2026.”

Where does the legal AI software choice fit into this? The platform decision (Harvey, Legora, and others) is separate from and secondary to the measurement question. Whichever tool a firm adopts, the performance-measurement problem is the same. We covered the platform landscape and the broader people-side implications in “Legora vs. Harvey AI: What US Law Firms Should Actually Know in 2026.”

Sources

Thomson Reuters Institute & Georgetown Law (January 2026). 2026 Report on the State of the US Legal Market. abovethelaw.com
The Agency Recruiting (May 2026). 2026 Legal Hiring Trends: AI Impact on Law Firm Staffing (citing Law360 Pulse March 2026 survey — 70% weekly AI use). theagencyrecruiting.com
Clio (2026). How Lawyers Use AI to Boost Billable Hours and Improve Work-Life Balance. clio.com
Barker Gilmore (January 2026). Adapt or Be Automated: AI’s Impact on Junior Lawyers. barkergilmore.com
SignalFire (March 2026). Beyond the Billable Hour: How AI Is Reshaping Margins and Models at Law Firms. signalfire.com
NALP Foundation (2025). Update on Associate Attrition and Hiring (CY 2025). nalpfoundation.org

Measuring Associate Performance After AI: A US Law Firm Guide

Survey Research Associates, Inc. is a 30-year-old company specializing in Performance Review and Employee Engagement services for Law Firms.