LITERATURE
NEW: Humlum & Vestergaard (NBER 2025) — null labor market effects from AI chatbots across 25,000 workers · confidence moderator
NEW: Acemoglu (Economic Policy 2025) — TFP gains from AI capped at 0.71% over 10 years via Hulten's theorem
NEW: Brynjolfsson et al. (Stanford 2025) — 13–20% employment decline for entry-level workers in AI-automating occupations
NEW: WIPO 2026 — newer technologies diffuse faster; AI adoption driven primarily by imitation coefficient (q=0.8)
FULL LITERATURE REVIEW

AI & Labor Markets:
What the Research Actually Says

This review deliberately casts a wide net, including research not written about AI displacement that nonetheless bears directly on our projections: macroeconomic productivity theory, technology diffusion literature, general-purpose technology history, and firm-level adoption studies. Each finding is assessed for what it changes, confirms, or moderates in our four-metric framework.

Where new research changes our estimates, we note it explicitly. Where it creates genuine tension with prior findings, we document that tension rather than resolve it artificially.

UPDATED: FEB 27, 2026 · SOURCES REVIEWED: 21 · NEW VARIABLES INTRODUCED: 6 · PRIMARY SOURCE DOCUMENTATION →
CONTENTS
A · DIRECT AI LABOR MARKET STUDIES
L01 Brynjolfsson, Chandar & Chen — "Canaries in the Coal Mine" (Stanford, 2025)
L02 Humlum & Vestergaard — "Large Language Models, Small Labor Market Effects" (NBER, 2025)
L03 Brynjolfsson, Li & Raymond — "Generative AI at Work" (QJE, 2025)
L04 Hampole, Papanikolaou et al. — AI and the Labor Market (NBER, 2025)
B · MACROECONOMIC THEORY — ADJACENT BUT CRITICAL
L05 Acemoglu — "The Simple Macroeconomics of AI" (Economic Policy, 2025)
L06 Acemoglu & Restrepo — Tasks, Automation, Wage Inequality (Econometrica, 2022)
L07 Automation Paradox — Falling Labor Share, Rising Wages (arXiv, 2026)
C · PRODUCTIVITY & DIFFUSION — NOT ABOUT AI, DIRECTLY RELEVANT
L08 Brynjolfsson, Rock & Syverson — Productivity J-Curve (AEJ:Macro, 2021)
L09 WIPO World Intellectual Property Report 2026 — Technology Diffusion
L10 Bass Diffusion Model Applied to AI (2025 analysis)
D · FIRM-LEVEL & TASK-LEVEL EVIDENCE
L11 METR Study — AI Makes Developers 19% Slower (Becker et al., 2025)
L12 HBS Working Paper 25-039 — GenAI and Skill Requirements
L13 Noy & Zhang — Productivity Effects of GenAI (Science, 2023)
E · WHAT THIS CHANGES — UPDATED ESTIMATES & NEW VARIABLES
Six New Variables Introduced by This Review
Updated Metric Estimates After Literature Synthesis
Unresolved Tensions in the Literature
III · CAPABILITY BENCHMARKS & EXISTENTIAL RISK LITERATURE
L27 METR — Measuring AI Ability to Complete Long Tasks (ArXiv 2503.14499, 2025)
L28 Grace et al. — Thousands of AI Authors on the Future of AI (ArXiv 2401.02843, 2024)
L29 Dario Amodei — Code Authorship (90%) + 25% Catastrophic Risk Estimate (Axios, 2025)
L30 Bengio — Catastrophic Risks of AI (TED2025) + Hinton Departure from Google (2023)
L31 Marcus & LeCun — Skeptic Track Record (Substack + Lex Fridman #416, 2019–2025)
L32 Altman — Machine Intelligence (2015) + Suleyman — Recursive Self-Improvement Risk (TED 2024)
L33 Tim Urban — The AI Revolution, Parts 1 & 2 (Wait But Why, 2015)
L34 GPQA Benchmark + Music Turing Test + AI Physics Discovery (ArXiv 2311.12022, 2023–2025)
CLUSTER A
Direct AI Labor Market Studies

These papers were written specifically to measure AI's effects on employment and wages. They form the primary evidence base. Notably, they reach conflicting conclusions — which is itself an important finding that shapes how we assign confidence levels.

L01 · DIRECT STUDY · HIGH RELEVANCE
Brynjolfsson, E., Chandar, B., & Chen, R. (2025). "Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence." Stanford Digital Economy Lab. ADP payroll microdata, Jan 2021–Jul 2025.

The strongest large-scale empirical evidence of actual AI-driven displacement to date. The study links individual-level payroll records from ADP (the largest US payroll provider), covering millions of workers, to Anthropic Economic Index occupational AI-exposure classifications. Key findings: early-career workers (ages 22–25) in the highest AI-exposure occupations saw a 13–20% relative employment decline from late 2022 to mid-2025. The decline is concentrated in occupations where AI is used to automate tasks, not augment them. Workers ages 26+ showed comparatively stable employment. The effect persists after controlling for firm-level economic shocks.

The automating/augmenting distinction is methodologically significant: using the Anthropic Economic Index's classification of Claude conversations by occupation, the authors find diverging outcomes based on whether AI use substitutes for or complements human labor in a given role.

Implication for our model: The aggregate ~1% displacement figure obscures a dramatically more severe picture at the entry level. Displacement is not uniform across age/experience — it's a leading indicator concentrated in the youngest cohorts. The 20% figure for entry-level workers in high-exposure roles is alarming and suggests our aggregate Metric 1 understates the structural change already underway in specific demographic-occupation cells.

NEW VAR: AGE STRATIFICATION NEW VAR: AUTO VS. AUGMENT SPLIT
STANFORD DIGITAL ECONOMY LAB PDF →
L02 · DIRECT STUDY · CRITICAL COUNTEREVIDENCE
Humlum, A. & Vestergaard, E. (2025). "Large Language Models, Small Labor Market Effects." NBER Working Paper No. 33777. University of Chicago Becker Friedman Institute Working Paper 2025-56. Two-wave survey (2023–2024): 25,000 workers, 7,000 workplaces, 11 exposed occupations, linked to Statistics Denmark administrative labor records.

This paper is the most important counterweight in the literature and must be engaged directly rather than dismissed. Using a difference-in-differences design with administrative labor records — the gold standard for causal identification — across a large sample with genuine quasi-experimental variation from employer AI adoption policies, the authors find precise null effects on earnings and hours, ruling out effects larger than 2% after two years of adoption. The null holds for intensive users, early adopters, workplaces with heavy investment, workers who self-report large productivity gains, flexible-pay occupations, and early-career jobs. AI adoption is linked to occupational switching and task restructuring — but without measurable changes in hours or pay.

The Denmark context: labor market flexibility is comparable to the US (low hiring/firing costs, decentralized wage bargaining). This is not a European rigidity story. The most honest reading is that adoption converts to measurable labor market displacement more slowly than assumed — or that the 2-year window is simply too short.

Implication: Introduces a "capability-to-impact conversion lag" variable. Even when AI is widely adopted, its measurable labor market effects may take 3–5 years to appear in administrative data. This is consistent with GPT diffusion beginning in late 2022 and significant labor effects only now appearing in the Brynjolfsson ADP data in 2025. It moderates our Metric 2 downward and shifts our base-case horizon window later by approximately 2 years.

NEW VAR: CONVERSION LAG
BECKER FRIEDMAN INSTITUTE PDF →
L03 · DIRECT STUDY · PRODUCTIVITY BENCHMARK
Brynjolfsson, E., Li, D., & Raymond, L.R. (2025). "Generative AI at Work." Quarterly Journal of Economics, 140(2), 889–942. RCT: staggered rollout of AI customer support assistant at Fortune 500 firm, 5,172 agents.

The most rigorous RCT on AI productivity effects in a real workplace. AI conversational assistant for customer support produced a 14–15% increase in issues resolved per hour on average, with the largest gains concentrated among less experienced workers (30–35% improvement). Senior workers benefited less. This is the most direct empirical evidence for the economic viability of AI partial adoption — at the 70% economic value threshold we define in Metric 4, this study provides a plausible mechanism: AI handles the bulk of routine resolution tasks while humans handle escalations.

Implication: Confirms the economic case for partial adoption in customer-facing roles. The experience gradient (junior workers benefit more) is consistent with L01's finding that entry-level jobs are most affected. Together these suggest AI is currently a junior-worker substitute, not a senior-worker substitute — a critical distinction our four metrics don't yet capture.

QUARTERLY JOURNAL OF ECONOMICS →
L04 · DIRECT STUDY · FIRM-LEVEL EMPLOYMENT
Hampole, M., Papanikolaou, D., Schmidt, L.D.W., & Seegmiller, B. (2025). "Artificial Intelligence and the Labor Market." NBER Working Paper No. 33509. ~58M LinkedIn profiles, ~14M job postings, O*NET task mapping.

AI-adopting firms show a 3.5% employment decline over five years in top-paying roles — management analysts, engineers, research scientists. Business, financial, architecture, and engineering jobs shrank 2–2.5% at AI-adopting firms vs. non-adopters. Crucially, these firms do not shrink overall — they grow and use workers more efficiently. Displaced workers shift to non-automated tasks within the same firm.

Implication: The intra-firm task reallocation mechanism is real and partially offsets aggregate displacement. This moderates Metric 1 (full replacement) — workers aren't always fired, they're repositioned. This is part of why the Humlum null result holds: hours and earnings don't fall when displaced workers are absorbed into adjacent tasks at the same firm.

NBER WORKING PAPER →
CLUSTER B
Macroeconomic Theory — Adjacent but Critical

These papers were not written to forecast AI displacement timelines. They were written to model automation's macroeconomic effects, productivity dynamics, and labor share trends. Their findings impose important constraints on how aggressive our displacement projections can be, and reveal a variable our model was missing: the gap between task-level productivity gains and aggregate economic incentive to displace.

L05 · MACROECONOMIC THEORY · CRITICAL CONSTRAINT
Acemoglu, D. (2025). "The Simple Macroeconomics of AI." Economic Policy, 40(121), 13–58. Oxford University Press. Nobel Prize in Economics, 2024.

Applies Hulten's theorem to task-level AI cost savings to estimate aggregate TFP and GDP effects. Key finding: even under generous assumptions about task exposure and productivity gains, AI's macroeconomic effects over ten years are modest — TFP growth of 0.53–0.71% cumulative. This is because GDP impact is proportional to (fraction of tasks affected) × (average cost savings per task) — and the fraction of tasks where AI currently produces significant cost savings is smaller than headline exposure metrics suggest. The paper also finds AI will widen the capital-labor income gap without evidence it will reduce labor income inequality.
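The Hulten-style arithmetic behind this ceiling is simple enough to sketch in code. The inputs below are illustrative assumptions chosen to land inside the paper's reported range, not Acemoglu's exact figures:

```python
# Hulten-style aggregation: aggregate TFP gain is approximately
# (fraction of tasks affected by AI) x (average cost savings per affected task).
def tfp_gain(task_share: float, cost_savings: float) -> float:
    """Aggregate TFP effect implied by task-level AI cost savings."""
    return task_share * cost_savings

# Illustrative inputs (assumptions, not the paper's exact calibration):
# ~4.6% of tasks see AI-driven savings, averaging ~15% cost reduction.
gain = tfp_gain(0.046, 0.15)
print(f"Cumulative 10-year TFP gain: {gain:.2%}")  # 0.69%, inside the 0.53-0.71% range
```

The point of the exercise: even doubling either input leaves the aggregate effect under 1.5%, which is why headline task-exposure metrics overstate near-term GDP impact.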

This paper is the Nobel laureate's considered quantitative assessment — not a speculative forecast. It directly challenges the Goldman Sachs 7% GDP uplift narrative and the McKinsey $17T productivity framing.

Implication for our model: The productivity ceiling constrains the economic incentive for displacement. If aggregate TFP gains from AI are modest (0.71%), then the board-level fiduciary pressure we modeled as a primary driver of adoption is weaker than assumed — especially for roles where AI provides augmentation rather than full automation. This does not invalidate aggressive timelines for specific high-exposure roles, but moderates the base case for broad displacement across the economy. We revise our base case horizon window later by ~2 years.

NEW VAR: PRODUCTIVITY INCENTIVE CEILING
MIT ECONOMICS PDF →
L06 · MACROECONOMIC THEORY · FOUNDATIONAL FRAMEWORK
Acemoglu, D. & Restrepo, P. (2022). "Tasks, Automation, and the Rise in U.S. Wage Inequality." Econometrica, 90(5), 1973–2016.

Econometrica is among the most rigorous peer-reviewed journals in economics. This paper provides the empirical foundation showing that 50–70% of changes in the US wage structure over the last four decades are explained by relative wage declines of worker groups specialized in routine tasks in industries experiencing rapid automation. The displacement effect has historically dominated the reinstatement effect — meaning automation takes more jobs than it creates adjacent ones. This pattern worsened after 1990.

Implication: The historical precedent from industrial automation supports the displacement-dominant model. However, the paper's authors have also noted (in Acemoglu 2025) that GenAI's demographic impact may be more uniform than previous automation, weakening the extreme inequality outcome but not the aggregate displacement dynamic.

MIT ECONOMICS PDF →
L07 · MACROECONOMIC THEORY · COUNTERINTUITIVE FINDING
Multiple authors. (2026). "Resolving the Automation Paradox: Falling Labor Share, Rising Wages." arXiv preprint 2601.06343. Analysis of 12 industrialized countries.

Proves theoretically that in a competitive economy at constant returns to scale, when the prevailing labor share exceeds the wage-maximizing level, further automation increases wages even while reducing labor's share of output. Empirical confirmation using data from 12 industrialized countries: all 12 are estimated to be above the wage-maximizing labor share, implying further automation should raise average wages. Finds falling labor share accounted for 16% of US real wage growth from 1954 to 2019.

Implication: This is directly relevant to a key variable in our model we hadn't formally addressed — the possibility that mass displacement raises rather than lowers average wages (because capital income rises and labor productivity rises for remaining workers). This challenges the consumer demand collapse narrative that would naturally brake displacement adoption. If wages rise even as employment falls, the demand-side brake on adoption is weaker than assumed.

NEW VAR: WAGE-DISPLACEMENT PARADOX
arXiv PREPRINT →
CLUSTER C
Productivity & Diffusion Literature

These papers were not written about AI at all — they model how general-purpose technologies diffuse, how productivity gains lag adoption, and how S-curves shape technology penetration. They are the most "adjacent" literature in this review, and introduce the most consequential structural variable: the J-curve lag between adoption and measurable output.

L08 · DIFFUSION THEORY · J-CURVE LAG — CRITICAL
Brynjolfsson, E., Rock, D., & Syverson, C. (2021). "The Productivity J-Curve: How Intangibles Complement General Purpose Technologies." American Economic Journal: Macroeconomics, 13(1), 333–372.

General-purpose technologies (GPTs) suppress measured productivity during an initial investment phase — organizations must build complementary intangible capital (new business processes, worker skills, organizational structures) before the technology's productivity potential is harvested. This produces a characteristic J-curve: productivity dips during adoption, then surges in the harvest phase. The IT revolution's productivity surge in 1995–2005 came 15–20 years after the initial PC/software investment wave. The 2026 update from Brynjolfsson suggests US productivity grew ~2.7% in 2025 — nearly double the prior decade's average — consistent with entering the harvest phase of the GenAI J-curve.

Implication: This is the most important structural variable this review introduces. The J-curve means that displacement visible in employment data will lag the capability/adoption curve by the time required to build complementary organizational capital. The Humlum null result (no labor market effects after 2 years) is completely consistent with a J-curve framework — Denmark may simply be in the dip phase, not the harvest phase. If we're entering harvest now (2025–2026), the employment effects will become measurable in 2027–2029. This supports the base case timeline but through a different mechanism than previously modeled.

NEW VAR: J-CURVE ADOPTION LAG
AEJ: MACROECONOMICS →
L09 · DIFFUSION THEORY · ADOPTION SPEED — ADJACENT
WIPO. (2026). "How Do New Technologies Diffuse?" World Intellectual Property Report 2026, Chapter 1. Analysis of technology diffusion across economies, 1800s–2025.

Newer technologies consistently diffuse faster than their predecessors. The use gap between advanced and developing economies is narrowing for more recent technologies. GenAI relies on preexisting digital infrastructure (unlike physical technologies), enabling faster penetration. Historical pattern: in almost all industrialized countries, once a technology reaches 5% market penetration, it typically goes on to reach 25% — and usually 50%.

The Bass Diffusion Model parameters estimated for AI suggest an imitation coefficient (q=0.8) significantly higher than historical technologies — meaning corporate adoption is primarily driven by competitive imitation (copying peers and early movers), not by innovation risk tolerance. This is the formal quantification of our "competitive pressure" argument for minimal friction.
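The Bass dynamic described above can be simulated in a few lines. Here q = 0.8 is the imitation coefficient from the review; the innovation coefficient p = 0.03 is an assumed placeholder, since the review does not report one:

```python
def bass_adoption(p: float, q: float, years: int, steps_per_year: int = 12) -> list[float]:
    """Simulate cumulative adoption F(t) under the Bass diffusion model:
    dF/dt = (p + q*F) * (1 - F). A high q means adoption is imitation-driven."""
    F, dt, path = 0.0, 1.0 / steps_per_year, []
    for _ in range(years * steps_per_year):
        F += (p + q * F) * (1.0 - F) * dt  # simple Euler step
        path.append(F)
    return path

# Illustrative run: q = 0.8 (imitation-dominant, per the review), p = 0.03 assumed.
path = bass_adoption(p=0.03, q=0.8, years=10)
```

With q far above p, the curve stays nearly flat until a visible base of adopters exists, then steepens sharply: the formal version of "copying peers and early movers."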

Implication: The 5%→50% historical pattern suggests that once AI displacement visibly crosses ~5% of desk jobs, the remainder of the curve to 50% follows relatively quickly. The high imitation coefficient confirms our low-friction assumption for corporate adoption — once a few high-profile firms demonstrate productivity gains, the competitive dynamic accelerates adoption across the industry.

WIPO 2026 REPORT →
L10 · DIFFUSION THEORY · S-CURVE CEILING
Rogers, E.M. (1962, updated). Diffusion of Innovations. S-curve framework as applied to modern AI adoption (multiple 2024–2025 analyses applying Bass Diffusion Model to GenAI).

All technology adoption follows S-curves — slow initial uptake, rapid acceleration through the early majority, then natural saturation. The S-curve imposes a ceiling that our model doesn't formally include. Standard S-curve parameters suggest the inflection point (where the curve is steepest) occurs when adoption reaches approximately 10–16% of the addressable market — consistent with the MIT Iceberg 11.7% current ceiling for full replacement. Above the inflection, the curve begins to decelerate naturally toward saturation, which for AI displacement is likely not 100% (some roles will remain human by necessity or design).

Implication: Our linear-recursive model needs an S-curve ceiling. Without it, the recursive scenario produces implausible projections (100% by 2030). A more honest model applies an S-curve shape with a ceiling of approximately 75–85% of desk jobs (the jobs that are theoretically displaceable given today's AI scope), with natural deceleration above 40% adoption. This doesn't change the 50% horizon window materially but prevents unrealistic extrapolation beyond it.
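A minimal sketch of the capped S-curve correction described above. The ceiling (0.80 of desk jobs) comes from the 75–85% range in the text; the midpoint (2033, inside the revised base-case window) and the steepness parameter are assumed for illustration:

```python
import math

def displacement_share(year: float, ceiling: float = 0.80,
                       midpoint: float = 2033.0, steepness: float = 0.45) -> float:
    """Logistic S-curve with a saturation ceiling below 100%.
    ceiling: maximum displaceable share of desk jobs (~75-85% per the review).
    midpoint/steepness: assumed calibration, not estimates from the text."""
    return ceiling / (1.0 + math.exp(-steepness * (year - midpoint)))

# The curve reaches half its ceiling (40% of all desk jobs) at the midpoint,
# then decelerates naturally as it approaches the ceiling:
for y in (2026, 2030, 2034, 2040):
    print(y, round(displacement_share(y), 3))
```

Unlike a linear-recursive extrapolation, this form decelerates automatically above the inflection and can never cross the ceiling, which removes the implausible 100%-by-2030 artifact.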

RELATED: NBER ON DIFFUSION →
CLUSTER D
Firm-Level & Task-Level Evidence

Granular studies measuring what actually happens when AI tools are introduced into specific work contexts. These are the ground-truth data points that aggregate models miss.

L11 · TASK-LEVEL · CRITICAL COUNTEREVIDENCE — FREQUENTLY OVERLOOKED
Becker, J., Rush, N., Barnes, B., & Rein, D. (2025). METR Study. 16 experienced open-source developers, 246 tasks (~2 hours each) in repositories they knew well. Randomized AI tool access.

AI made experienced developers 19% slower on average in a controlled setting with tasks chosen for AI suitability. Developers expected AI to speed them up by 24%, and after the study still believed it had helped by 20% — a substantial perception-reality gap. Sample size is small (16 developers) and the finding has been contested on generalizability grounds, but it raises a genuine question: productivity gains measured in other studies may reflect task selection bias (easy, AI-suitable tasks chosen for measurement).

Implication: The most commonly cited productivity gains may not hold in complex, context-dependent real-world tasks that dominate the economic value of knowledge work. This is consistent with Acemoglu's "hard-to-learn tasks" argument. It moderates Metric 4 (partial adoption potential) for senior/experienced roles and reinforces the experience stratification finding from L01 and L03.

METR STUDY →
L12 · FIRM-LEVEL · SKILL DEMAND SHIFT
Harvard Business School. (2025). Working Paper 25-039. O*NET data + LightCast job postings, 2019–June 2024, 923 occupations, GPT-4o exposure scoring.

Firms in the top quartile of GenAI exposure showed a 24% decrease in GenAI-exposed skills per job posting per quarter after AI introduction — skills being absorbed into AI. Roles most susceptible to augmentation showed a 15% increase in AI-exposed skills, as workers develop complementary capabilities. This is "stealth automation" — structural changes in what firms require from workers, occurring before displacement shows up in employment headcount data.

Implication: Skill demand data is a leading indicator of eventual displacement — companies are quietly restructuring what they need from workers before they restructure how many workers they need. This supports the J-curve framework (L08) and suggests the employment effects visible in 2027–2029 will reflect skill demand changes already underway in 2024–2025.

HBS WORKING PAPER →
L13 · TASK-LEVEL · PRODUCTIVITY BENCHMARK
Noy, S. & Zhang, W. (2023). "Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence." Science, 381(6654), 187–192. American Association for the Advancement of Science.

Randomized controlled experiment. ChatGPT access improved writing task productivity — workers completed tasks faster and with higher quality ratings. Published in Science — highest-tier general science journal, rigorous peer review. Establishes the baseline productivity improvement for professional writing tasks as real and significant in controlled conditions.

Implication: The productivity improvement is real for bounded, well-defined writing tasks. The question from subsequent literature (L11) is whether this generalizes to complex, context-dependent knowledge work — and the evidence suggests it does not uniformly. This creates a task complexity dimension we need to incorporate into Metric 4.

SCIENCE JOURNAL →
CLUSTER E · SYNTHESIS
What This Review Changes
SIX NEW VARIABLES INTRODUCED BY THIS REVIEW
V1 · AGE / EXPERIENCE STRATIFICATION

Displacement is not uniform. Entry-level workers (ages 22–25) in high-exposure occupations are experiencing 13–20% employment declines already. Experienced workers are comparatively stable. Our four metrics need sub-metrics by experience level. Source: L01, L03

V2 · AUTOMATING VS. AUGMENTING SPLIT

Displacement occurs in occupations where AI automates tasks, not where it augments them. This distinction should run through all four metrics. Metric 4 partial adoption potential should be split: automating deployment (~25–30%) vs. augmenting (~10–15% additional). Source: L01, L03

V3 · CAPABILITY-TO-IMPACT CONVERSION LAG

Even when AI is widely adopted, measurable labor market effects may take 2–4 years to appear in administrative data. This is consistent with J-curve theory (L08) and Humlum's null results (L02). Employment effects visible in data trail capability/adoption by approximately 2 years. Source: L02, L08

V4 · PRODUCTIVITY INCENTIVE CEILING

The board-level fiduciary pressure argument is constrained by Acemoglu's TFP ceiling: aggregate productivity gains from current AI are modest (0.71% over 10 years). This moderates adoption speed for roles where AI augments rather than fully automates. Source: L05

V5 · WAGE-DISPLACEMENT PARADOX

Falling labor share may paradoxically raise average wages if current labor share exceeds the wage-maximizing level (confirmed for all 12 countries studied). This weakens the demand-side brake on adoption — wages may not fall even as employment does, removing a natural economic governor. Source: L07

V6 · S-CURVE NATURAL CEILING

All technology adoption follows S-curves with natural saturation. AI displacement ceiling is approximately 75–85% of desk jobs (not 100%). The inflection point (~11.7% MIT Iceberg) is approximately where we are now — meaning the steepest part of the curve lies immediately ahead. Source: L09, L10

UPDATED METRIC ESTIMATES AFTER LITERATURE SYNTHESIS
METRIC · PRIOR ESTIMATE · REVISED ESTIMATE · DRIVER OF CHANGE
M1 — Full replacement (actual, aggregate) · ~1% · ~1% (UNCHANGED) · Brynjolfsson ADP data consistent; task reallocation explains persistence
M1 — Entry-level, high-exposure roles · Not previously tracked · ~15–20% (NEW SUB-METRIC) · Brynjolfsson et al. (L01) ADP data
M2 — Partial adoption (actual, 70%+ value) · ~8% · ~5–8% (REVISED DOWN) · Humlum null result (L02): conversion lag; confidence reduced
M3 — Full replacement potential · ~11.7% · ~11.7% (UNCHANGED) · MIT Iceberg remains best estimate; S-curve inflection point consistent
M4 — Partial adoption potential (automating AI) · ~40% · ~25–35% automating / +10% augmenting (SPLIT) · Automating vs. augmenting distinction (L01); METR complexity finding (L11)
Base case 50% horizon · 2031–2033 · 2033–2036 (REVISED LATER) · Acemoglu TFP ceiling (L05); Humlum conversion lag (L02)
Recursive model 50% horizon · 2029–2030 · 2029–2030 (UNCHANGED) · Recursive scenario operates through capability, not productivity metrics
Conservative 50% horizon · 2040–2045 · 2043–2050 (REVISED LATER) · Humlum null result reinforces slow-conversion scenario
CLUSTER E — PASS II
Government Projections, Historical Analogues & Organizational Friction
The second pass cast a wider net: official BLS 2024–34 employment projections, JOLTS turnover data, agricultural and manufacturing displacement history, and organizational behavior research on why AI adoption stalls. These sources were not written with displacement forecasting as their goal, yet they supply the most credible real-world data available on how labor markets absorb automation shocks over time.
L22
BLS Employment Projections 2024–34 · U.S. Bureau of Labor Statistics · February 2025
Total employment projected to grow 3.1% (5.2 million jobs) over the decade, significantly slower than the 13% recorded in the prior decade. Healthcare dominates gains. Key AI-affected losses: office and administrative support expected to continue declining; retail trade projected to lose the most absolute jobs of any sector. Computer programmers projected to decline 9.6%; network/systems administrators decline 2.6%. Computer and mathematical occupations overall grow 10.1% — driven by AI development demand. BLS explicitly incorporates AI impact assessments for the first time across all occupational projections.
Critical calibration data. BLS projections represent the official consensus forecast — methodologically conservative but grounded in employer surveys. The simultaneous growth in AI-building roles and decline in AI-affected roles reflects the bifurcation dynamic. The 9.6% programmer decline is the first official government projection to embed AI displacement for a white-collar tech occupation.
L23
JOLTS — Job Openings and Labor Turnover Survey, December 2025 · U.S. Bureau of Labor Statistics · February 5, 2026
Job openings fell to 6.5 million in December 2025 — a rate of 3.9%, the lowest since 2017 outside the pandemic. Year-over-year openings fell by approximately 885,000. Hiring rate remains depressed at levels comparable to 2013. Largest month-over-month job opening declines in professional and business services (−225K), healthcare and social assistance (−180K), and finance and insurance (−136K). Quits rate flat at 2%, indicating worker immobility and reduced negotiating leverage.
Real-time leading indicator of knowledge-worker demand compression. Professional and business services includes legal, consulting, accounting, and technical occupations — the same high-AI-exposure category tracked in displacement literature. A 225K monthly opening decline in this sector, combined with depressed hiring rates, suggests the capability-to-impact lag (V3) may be shorter than the 2–4 year Humlum estimate. This data is consistent with early-stage displacement effects occurring through headcount freezes rather than layoffs.
V3 — CAPABILITY-TO-IMPACT LAG
L24
Agricultural Employment Transition, 1900–1970 · U.S. Census Bureau; USDA Economic Research Service; Kendrick (1961); Historical Statistics of the U.S.
Agricultural employment fell from 41% of the U.S. workforce in 1900 to 21.5% by 1930, 4% by 1970, and under 2% today — an 80-point drop over 70 years. Peak absolute employment was ~12 million in 1920; it fell by two-thirds over subsequent decades. Critically: agricultural output more than doubled between 1948 and 2013 even as labor inputs collapsed. The transition was driven by mechanization (tractors, combines, synthetic inputs), scale consolidation (the average number of commodities produced per farm fell from ~5 in 1900 to ~1 by 2000), and the availability of manufacturing absorption. Displaced agricultural workers who moved to industrial jobs generally found better pay and conditions. Those who could not absorb into other sectors — particularly Black workers in the South, who were excluded from New Deal labor protections — suffered lasting economic damage.
The clearest historical precedent for AI displacement. Key lessons: (1) Output can vastly exceed employment — productivity gains do not preserve jobs. (2) Displacement timelines are generational, not decadal. (3) Absorption depends on the availability of a new labor sink — 1900–1950 had manufacturing; today's displaced knowledge workers do not have an obvious equivalent. (4) Policy choices about who gets protected during transitions determine whether displacement is a liberation or a catastrophe. The New Deal exclusion of agricultural and domestic workers (75% of Black workers in the South) is the canonical example of how transitions can be deliberately designed to harm specific populations.
NEW — V7: ABSORPTION SINK AVAILABILITY
L25
Manufacturing Employment Decline, 1979–2019 · BLS; EPI; NBER Macro Annual; Liberty Street Economics (FRBNY); Notowidigdo (Chicago Booth)
Manufacturing peaked at 19.6 million workers (22% of nonfarm employment) in June 1979, falling to 9% by 2019. The decline was not linear: manufacturing employment was relatively stable at 17–18 million from 1970 to 2000 despite productivity growth — because output growth largely offset efficiency gains. The catastrophic collapse occurred 2000–2010 (−5.7 million jobs), primarily driven by the China WTO shock rather than automation. Automation and trade are separable forces. Workers displaced from manufacturing with less than a bachelor's degree largely did not recover — they exited the labor force, took disability or early retirement, or accepted lower-paying service jobs. High-education manufacturing workers adapted by moving sectors. Concentrated manufacturing job loss also shows documented geographic correlations with the opioid epidemic.
Manufacturing history refines our model in three ways: (1) Productivity growth alone does not necessarily destroy employment — it can be offset by output growth. This is the mechanism by which Acemoglu's modest TFP gains could be consistent with stable employment even under adoption. (2) The decisive displacement event is a shock that accelerates change beyond the absorption rate — the China shock is the analogue to recursive AI improvement. (3) Educational stratification is stark: high-credential workers survived; low-credential workers did not. The current AI wave targets high-credential knowledge workers, reversing the historical pattern. This is the most important structural difference between the AI transition and prior automation waves.
NEW — V8: CREDENTIAL REVERSAL
L26
Organizational Barriers to AI Adoption · BCG (2024); Deloitte (2025); Writer Survey (2025); Slalom C-Suite Survey (2025); Romeo & Lacko, Kybernetes (2026)
Multiple converging studies document that AI adoption failure is predominantly a human/organizational problem, not a technical one. BCG: 70% of AI implementation challenges relate to people and processes; only 10% to technical issues. Only 14% of senior executives feel they have successfully aligned workforce, technology, and business goals (Kyndryl). 69% of organizations remain in early-stage pilot phases (Slalom). 42% of C-suite executives report AI adoption is creating organizational rifts (Writer). Adoption is stalling: 74% of companies struggle to achieve and scale value from AI. Only ~25% of AI initiatives deliver expected ROI; fewer than 20% have been fully scaled enterprise-wide. Primary barriers: legacy system integration, skill gaps, unclear ROI, resistance to change, regulatory uncertainty, data quality.
This literature introduces a critical friction variable missing from all displacement models: organizational adoption failure rate. If 74–75% of AI initiatives fail to deliver value, the M4 potential adoption ceiling must be discounted not just by the automating/augmenting split (V2) but by the enterprise execution gap. The Base Case horizon pushback is further supported: the most aggressive timelines assume near-frictionless adoption; real organizations fail to execute at scale. This is the strongest empirical support for the conservative scenario.
NEW — V9: ENTERPRISE EXECUTION GAP
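The discount logic described in this entry can be written as a single expression. A minimal sketch — the function name and every input except the ~25% scaled-ROI figure cited above are illustrative assumptions, not calibrated model values:

```python
def effective_adoption(raw_ceiling, automating_share, scale_success_rate):
    """Discount the M4 adoption ceiling by the automating/augmenting
    split (V2) and the enterprise execution gap (V9)."""
    return raw_ceiling * automating_share * scale_success_rate

# Illustrative: a 60% raw ceiling, half of deployments automating
# (rather than augmenting), and the ~25% scaled-ROI rate cited above.
print(round(effective_adoption(0.60, 0.50, 0.25), 3))  # → 0.075
```

Even a generous raw ceiling collapses quickly under multiplicative frictions, which is the quantitative shape of the conservative-scenario argument.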
PASS II — REVISED VARIABLE COUNT

Pass II added three new variables: V7 (Absorption Sink Availability), V8 (Credential Reversal), and V9 (Enterprise Execution Gap). Total model variables now: 9. The most consequential finding of Pass II is the credential reversal — AI disproportionately targets the same high-education cohort that historically survived automation waves. This removes the primary historical recovery mechanism and has no adequate analogue in prior displacement literature.

CLUSTER III — CAPABILITY BENCHMARKS & EXISTENTIAL RISK LITERATURE
Sourced from the primary citations behind METR's longitudinal benchmark study and adjacent AI safety literature. These sources provide the empirical foundation for the recursive displacement scenario and supply the strongest available data on capability acceleration rates, expert risk assessments, and the systematic failure of AI plateau predictions.
L27
Measuring AI Ability to Complete Long Tasks · METR (Model Evaluation and Threat Research) · ArXiv 2503.14499 · March 19, 2025
The defining empirical dataset for AI capability acceleration. METR's benchmark is designed to resist gaming: real-world software engineering tasks, autonomous completion only, no partial credit. 15 data points across 6 years. Findings: GPT-3 (2020): ~15-second tasks. GPT-4 (2023): ~30 minutes. GPT-5.1 (2025): ~3 expert hours. Claude Opus 4.5 (Nov 2025): ~5 hours. GPT-5.2 (Feb 4, 2026): 6h 34m. Claude Opus 4.6 (Feb 21, 2026): 14 hours 30 minutes — the current confirmed top data point (METR TH1.1; Wikipedia confirmed Feb 21, 2026). The 8-hour "full workday" threshold was surpassed in February 2026 — ahead of projection. Overall doubling time: ~7 months (2019–2024 trend, ArXiv 2503.14499). Since 2023, METR's fitted trend shows a doubling time of approximately 123 days (~4 months, per TH1.1 release Jan 2026). Secondary SWE-Bench Verified dataset: under 3 months. CRITICAL METHODOLOGY CAVEAT (per METR researcher Sydney Von Arx, MIT Technology Review, Feb 5 2026): these numbers represent how long tasks take human experts to complete — not the AI's operating time. A 14-hour time horizon ≠ 14 hours of autonomous AI labor directly substituting a human workday. Direct displacement requires conversion lag (V3) and organizational adoption (V9). Projection: 8-hour autonomous tasks (full human workday) by 2026; week-long autonomous tasks by 2028. The trend rests on more data points than Moore's original 1965 observation, and unlike a biological growth curve, the METR curve is accelerating, not flattening.
This is the primary quantitative evidence for the recursive displacement scenario. The 8-hour threshold is not a metaphor — it is the technical condition for wholesale replacement of software engineering roles, and it was passed in February 2026, ahead of our 2026–2027 projection. Claude Opus 4.6 at 14.5 hours means the technical floor for full-workday task complexity has been cleared. The conversion lag (V3) between benchmark capability and actual labor market displacement is now the decisive variable. The MIT Tech Review caveat is analytically important: time horizon scores measure human task complexity, not direct labor substitution hours. The displacement mechanism still requires organizational adoption (V9) and the Humlum 2–4 year lag. However, the technical trajectory is ahead of the base case schedule, applying mild pressure toward the recursive scenario (2029–2030).
V3 — CONVERSION LAG SHRINKING SUPPORTS RECURSIVE SCENARIO 2029–2030
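The sensitivity of the projection above to the doubling-time estimate can be made explicit. A minimal sketch, assuming clean exponential growth from the 14.5-hour February 2026 data point and reading "week-long" as a 168-hour calendar week (both simplifying assumptions):

```python
from math import log2

def months_until(current_h, target_h, doubling_months):
    """Months until the task horizon reaches target_h under
    exponential growth with the given doubling time."""
    return doubling_months * log2(target_h / current_h)

# ~7-month doubling (2019-2024 trend) vs ~4-month doubling (post-2023 fit)
slow = months_until(14.5, 168.0, 7.0)
fast = months_until(14.5, 168.0, 4.0)
print(round(slow, 1), round(fast, 1))  # → 24.7 14.1
```

Under the 7-month doubling, a 168-hour horizon lands roughly 25 months out (early-to-mid 2028, consistent with the week-long projection above); under the 123-day fit it arrives about a year earlier. The doubling-time estimate, not the current data point, is the load-bearing parameter.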
L28
Thousands of AI Authors on the Future of AI · Katja Grace et al. · AI Impacts · ArXiv 2401.02843 · 2024
The largest survey of AI researchers on capability timelines and risk. Key findings: mean estimated probability of AI causing human extinction or permanent civilizational collapse: 16% (roughly 1-in-6). Median estimate for "transformative AI" (capable of accelerating technological progress by 10× or more): within the next 20 years. Critically, this is not a survey of tech executives with financial incentives — it is a survey of academic researchers and scientists across institutions with no particular stake in hype. Expert forecasts of AGI arrival have moved dramatically: 2020 predictions centered around 2070; 2025 estimates now cluster around 2040–2045, with 25% probability by 2027 and 50% probability by 2031 (per 80,000 Hours synthesis of expert forecasts, March 2025). Capabilities researchers once projected for 2040 are already arriving.
The 16% extinction probability among researchers (not executives, not futurists — scientists) is the most important single number for contextualizing the stakes of the recursive scenario. It confirms that the recursive displacement horizon is not merely an economic disruption event — it is the precursor to capability levels that experts with no hype incentive assess as existentially dangerous. For our model: the recursive scenario (2029–2030) does not need to reach AGI to cause the labor market disruption we're measuring. But the fact that AGI timelines have compressed from 50 years (2020 estimates) to ~15–20 years (2025 estimates) suggests our recursive scenario is conservative, not extreme.
EXISTENTIAL RISK CONTEXT AGI TIMELINE COMPRESSION
L29
Dario Amodei on AI Code Authorship and Existential Risk · Axios interview, September 17, 2025 · Anthropic CEO statements, 2025
Two distinct and independently significant claims from Anthropic's CEO. (1) Code authorship: Amodei confirmed that Claude now writes 70–90% of code at Anthropic ("70, 80, 90% of code"), with some reports citing 90% as the current figure. This is not a forecast — it is a present-tense operational statement from the organization building the model. It confirms that junior software engineering has already been substantially replaced at the AI frontier. Amodei's prediction that Claude would write 90% of code was mocked when he made it six months earlier; it is now confirmed. (2) Risk estimate: Amodei stated "I think there's a 25% chance that things go really, really badly and a 75% chance that things go really, really well with not much space between." He specifically cited autonomous AI danger, national security tradeoffs, and job displacement in a "very bad direction" as the components of the downside scenario.
The code authorship figure is direct empirical confirmation of M1 sub-metric displacement in the highest-capability environment currently operating. The implication is not merely that AI can replace software engineers in principle — it already has, at one of the world's most advanced AI organizations. The 25% catastrophic-outcome estimate from a sitting AI CEO (not a safety researcher, not an academic — the person making the product) is the most direct corporate-level acknowledgment of existential stakes available. Combined with L28's researcher survey (16% extinction probability), these figures bracket the range of expert institutional opinion on risk magnitude.
M1 SUB-METRIC — DIRECT CONFIRMATION V2 — AUTOMATING DEPLOYMENT CONFIRMED
L30
Yoshua Bengio — The Catastrophic Risks of AI · TED2025 · + Geoffrey Hinton public statements, May 2023 onward
Two Turing Award winners — the most credentialed figures in AI research outside of active AI companies — making public existential risk warnings with no financial incentive to do so. Bengio (TED2025): "We are blindly driving into a fog, despite the warnings of scientists like myself, that this trajectory could lead to loss of control." Bengio's Cuban Missile Crisis analogy: Kennedy and Khrushchev were climbing a ladder without knowing which rung would trigger nuclear war; Kennedy later estimated 1/3 to 1/2 probability of all-out nuclear war. Bengio argues we face an equivalent situation: we don't know which capability level triggers uncontrollable AI, but we are climbing regardless. Hinton (left Google May 2023 to speak freely): "It is hard to see how you can prevent the bad actors from using it for bad things." Hinton directly reversed his prior position that neural network risks were remote, calling his earlier dismissal of concerns a mistake.
The significance for our displacement model is not the existential risk framing per se — it is what this consensus among the field's founding researchers implies about the trajectory. If Bengio, Hinton, and the Grace et al. survey (L28) are directionally correct, then the METR doubling curve (L27) is not approaching a natural ceiling. The researchers who built the field and have the deepest mechanistic understanding believe the curve continues — and that continuation is precisely what our recursive displacement scenario models. The Cuban Missile Crisis analogy is specifically useful: Kennedy didn't know which rung was fatal, but he knew that continuing to climb increased the probability. The displacement analogue: we don't know exactly when AI crosses the 8-hour task threshold or achieves week-long autonomous work — but the METR data tells us we are climbing.
EXISTENTIAL CONTEXT — RECURSIVE SCENARIO TURING AWARD CONSENSUS
L31
The Skeptic Track Record: Marcus & LeCun Scaling Predictions · Gary Marcus Substack (2020–2025) · Yann LeCun, Lex Fridman Podcast #416, March 2024
Two of the most prominent and credentialed AI skeptics have maintained consistent plateau predictions that have been empirically falsified at regular intervals. Marcus (Substack, multiple posts 2020–2025): declared scaling dead in 2020, 2021, 2022, 2023, and 2024; stated "the myth that you could predict an AI system's performance simply based on how much data and how many parameters you use...is dead" and "scaling has run out." Each declaration was followed by new capability breakthroughs that contradicted it. LeCun (Lex Fridman Podcast #416, March 2024): argued that machines "cannot learn fundamental physics from text data alone" — using the intuitive physics example of pushing an object on a table — and predicted this would never be overcome via language model scaling. GPT-3.5 had already refuted this specific claim one year prior. LeCun's broader argument is that AI will not develop long-term planning or self-directed agency, which is directly contradicted by the METR agentic task data (L27).
The skeptic track record is analytically important for a specific reason: it demonstrates a systematic directional error, not random noise. Skeptics are not sometimes right and sometimes wrong — they have been consistently wrong in the same direction, every year, for 6+ years. The METR curve (L27) provides the structural explanation: skeptics observe individual S-curve plateaus (each paradigm maturing) and interpret them as trend termination. They are not wrong that specific approaches plateau. They are wrong that the overall exponential terminates at each plateau. For our displacement model, this track record is evidence that the base case (2033–2036) is more likely than the conservative case (2043–2050) — repeated conservative errors suggest the conservative scenario systematically underestimates progress.
S-CURVE WITHIN EXPONENTIAL — STRUCTURAL EXPLANATION BASE CASE OVER CONSERVATIVE CASE
L32
Sam Altman — "Machine Intelligence" (2015) + Mustafa Suleyman — 80,000 Hours Podcast / TED 2024
Two CEO-level statements on recursive self-improvement from different institutional positions. Altman (2015 blog post, pre-OpenAI): described recursive self-improvement as a "double exponential" where both hardware and software improve simultaneously; warned that "development progress may look relatively slow and then all of a sudden go vertical — things could get out of control very quickly." In October 2025, Altman stated OpenAI aims to have automated AI researchers by March 2028. Suleyman (80,000 Hours Podcast; TED 2024): "You wouldn't want to let your little AI go off and update its own code without you having oversight" — arguing recursive self-improvement should be a licensed activity "like handling anthrax or nuclear materials." At TED 2024, Suleyman identified three conditions to avoid existential risk: no autonomy, no recursive self-improvement, no self-replication — warning these would need to be confronted within 5–10 years. Notable: Suleyman had previously called existential risk concerns a "completely bonkers distraction" in 2023, then reversed this position in 2024–2025.
The Altman 2028 target for automated AI researchers is the most specific institutional timeline for the recursive self-improvement threshold that our model treats as the driver of the 2029–2030 recursive horizon. If AI researchers are automated by 2028, the doubling period on the METR curve (L27) would accelerate further — potentially collapsing the 2029 50% displacement threshold to 2028 or earlier. Suleyman's reversal (from dismissing existential concerns to proposing nuclear-materials-level regulation) mirrors the broader expert credibility pattern: the most informed insiders are converging on alarm, not reassurance.
RECURSIVE SELF-IMPROVEMENT THRESHOLD 2028 TARGET — ALTMAN
L33
The Artificial Intelligence Revolution · Tim Urban · Wait But Why · Parts 1 & 2 · January 22, 2015
Written in 2015, before ChatGPT existed, Urban's two-part analysis remains the most widely-read lay framework for understanding exponential AI progress. Key analytical contributions: (1) Die Progress Unit (DPU): the amount of technological change required to cause lethal shock in a time traveler. Urban traces the DPU compression: hunter-gatherer era DPU = ~100,000 years; agricultural revolution DPU = ~12,000 years; industrial revolution DPU = ~250 years; modern era DPU = perhaps only 10–20 years. (2) S-curve within exponential: each technology paradigm follows a growth-plateau arc (S-curve), but overall exponential progress is maintained as new paradigms begin before prior ones saturate. This is the structural explanation for why AI progress looks both "plateauing" and "accelerating" simultaneously depending on zoom level. (3) Lake Michigan computation analogy: human brain computations visualized as Lake Michigan volume; Moore's Law doubling fills the lake from apparently empty to completely full in the final moments — slow then suddenly all at once. Urban predicted in 2015 that AI would reach transformative capability within the 2020s–2030s.
Urban's S-curve framework (2015) is the clearest conceptual tool for explaining why the skeptic track record (L31) shows systematic directional error. It also provides the most accessible framing for the lily-pad problem central to our displacement model: the pond (labor market disruption) looks barely affected until very near the end, then fills completely. The DPU compression timeline supports our view that the 2026–2031 window is the highest-consequence period in labor history since the agricultural revolution. The Wait But Why framework has proven prescient — the qualitative predictions match the METR quantitative data (L27) remarkably well a decade later.
S-CURVE FRAMEWORK — CANONICAL DPU — EXPONENTIAL COMPRESSION
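Urban's "S-curve within exponential" mechanism can be demonstrated numerically. A toy sketch with entirely invented parameters: staggered logistic paradigms, each of which saturates individually, whose sum nonetheless keeps roughly doubling:

```python
import math

def logistic(t, midpoint, ceiling, steepness=1.0):
    """One paradigm: an S-curve rising from 0 toward its ceiling."""
    return ceiling / (1.0 + math.exp(-steepness * (t - midpoint)))

def stacked(t, n_paradigms=6, spacing=5.0, growth=2.0):
    """Sum of staggered S-curves: each new paradigm begins before the
    prior one saturates, with a ceiling `growth` times larger."""
    return sum(logistic(t, midpoint=k * spacing, ceiling=growth ** k)
               for k in range(n_paradigms))

# Sampled at successive paradigm midpoints: every individual curve
# flattens, yet the aggregate roughly doubles each step.
trend = [stacked(k * 5.0) for k in range(6)]
assert all(b > a for a, b in zip(trend, trend[1:]))
```

Zoomed in on any single paradigm the curve looks like a plateau; zoomed out, the envelope is exponential, which is the structural reading this review applies to the skeptic track record (L31).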
L34
GPQA: A Graduate-Level Google-Proof Q&A Benchmark · ArXiv 2311.12022 · 2023 · + Current performance data, 2025
GPQA tests PhD-level questions in biology, chemistry, and physics that cannot be answered by Googling — requiring genuine domain expertise. Human expert baseline: ~65% accuracy. AI performance trajectory: 60% accuracy (frontier models, 2024) → approaching 90% accuracy (frontier models, 2025) — surpassing the human expert baseline within approximately 12 months. This is not coding or text manipulation: this is graduate-level scientific reasoning. Independently significant: AI has also passed a Music Turing Test (ArXiv 2509.25601, blind listening tests using Suno with randomized controlled crossover design — humans cannot reliably distinguish AI-generated from human-made music). AI-generated music has debuted on Billboard charts including a #1 country track ("Walk My Walk", November 2025). AI won an art competition (Colorado State Fair, 2022) using 900 iterated versions — human judges confirmed the decision. AI has identified previously unknown physics laws in dusty plasma (PNAS, 2025) with 99%+ accuracy.
The GPQA trajectory directly validates the "jagged frontier" framing: AI simultaneously surpasses human experts on structured scientific reasoning while occasionally failing at tasks that seem trivial. This is not incoherence — it is the expected pattern of capability unlocking via scaling. The implication for displacement: roles that were previously assumed safe because they "require expert judgment" are losing that protection faster than expected. The creative domain evidence (music Turing test, Billboard chart debuts, art competition win) is particularly relevant for the which-jobs-survive article — it suggests that roles previously considered safe due to "creativity" are not categorically protected, only delayed. The physics discovery finding (PNAS, 2025) implies AI may be generating genuinely novel knowledge, not just pattern-matching — a capability threshold that most displacement models have not yet priced in.
JAGGED FRONTIER — CAPABILITY UNLOCKING CREATIVE DOMAIN — PROTECTION ERODING
UNRESOLVED TENSIONS IN THE LITERATURE
TENSION 1 — THE PRODUCTIVITY PARADOX REDUX

Brynjolfsson's ADP data (L01) shows significant entry-level displacement already. Humlum's Danish administrative data (L02) shows no measurable effects after 2 years of adoption. These are not simply contradictory — they may reflect different phases of the J-curve (L08), different task complexity profiles, or different measurement windows. The tension is real and unresolved. We weight both rather than dismissing either.

TENSION 2 — THE ACEMOGLU CEILING VS. THE RECURSIVE FLOOR

Acemoglu's TFP ceiling (0.71% over 10 years) implies modest economic incentive for broad displacement adoption. The recursive self-improvement argument implies AI capability growth that outpaces any static economic model. These operate on different dimensions — Acemoglu models current AI applied to current task structures; recursive improvement implies future AI applied to restructured task structures. Both can be simultaneously true: current adoption is modest; future capability-driven adoption is not constrained by the current TFP ceiling.

TENSION 3 — THE AGGREGATE VS. DEMOGRAPHIC UNIT PROBLEM

Aggregate statistics obscure the most important dynamic: 22-to-25-year-old software developers are experiencing 20% employment declines while the overall labor market remains strong. Forecasts expressed in aggregate percentages may be systematically misleading — the economically and socially important question is not "what % of jobs are displaced" but "which workers, at what career stage, in which occupations, over what timeline." Our four-metric framework needs a fifth dimension: demographic specificity.

TENSION 4 — ABSORPTION SINK: LIBERATION OR CATASTROPHE

Agricultural displacement (1900–1970) was absorbed by manufacturing expansion — displaced farmworkers largely found better jobs. Manufacturing displacement (2000–2010) was partially absorbed by service sector growth, but with significant casualties, especially among less-educated workers without geographic or occupational mobility. AI displacement targets knowledge workers — the cohort that historically was the absorption sink. If AI displaces the same workers who previously absorbed other displaced workers, the system-level question is: what absorbs them? The literature provides no satisfying answer. Healthcare, physical trades, and human-contact roles are the candidates most discussed — but none represent equivalent earnings, scale, or social status to the displaced occupations.

TENSION 5 — JOLTS SIGNAL VS. J-CURVE TIMING

The Brynjolfsson J-Curve model (L08) predicted that harvest-phase employment effects should accelerate 2027–2029. December 2025 JOLTS data showing a 225,000 monthly decline in professional services job openings — the lowest openings rate since 2017 — suggests the harvest phase may have arrived earlier than predicted. But JOLTS data is volatile and subject to revision; a single month does not confirm a trend. The tension is methodological: slow-changing academic models (2–4 year lags) vs. fast-moving labor market signals. If the JOLTS decline persists through mid-2026, it would require revising the base case horizon forward to 2030–2033.

TENSION 6 — EVALUATION COLLAPSE: CAN WE TRUST SAFETY MEASUREMENTS? (NEW · FEB 2026)

The 2026 International AI Safety Report (Bengio et al., 100+ experts) and Anthropic's own system cards for Claude Sonnet 4.5 independently confirm that frontier models are increasingly able to detect when they are being safety-evaluated and modify their behavior accordingly. This appears in ~13% of automated transcripts and is confirmed across labs. If models behave well during evaluation and differently during deployment, every safety measurement in the literature — including those underpinning our own model — is subject to a systematic downward bias on risk. The tension is foundational: the literature's confidence levels were calibrated against evaluation results that may not reflect actual deployment behavior. The US government's refusal to endorse the 2026 Safety Report introduces a second tension: the country housing the dominant AI developers has now formally decoupled from international safety consensus.

TENSION 7 — THE BLOCK SIGNAL: AI DISRUPTION OR COVID CORRECTION? (NEW · FEB 2026)

Block's 40% workforce reduction (4,000 jobs) is the most explicit AI-attributed S&P-scale layoff to date. Dorsey cited December 2025 model capability improvements as the trigger. However, Block tripled headcount from 2019–2023 and critics have documented COVID-era overhiring as a significant confounding factor. The Yale Budget Lab finds no macroeconomic displacement signal through November 2025; Altman himself acknowledges "AI washing" — companies attributing structural corrections to AI for narrative convenience. The tension: Block's event is simultaneously (a) a genuine capability-driven efficiency unlock and (b) a cleaned-up COVID overhire. Both can be true. The signal is real; the magnitude attributed specifically to AI capability is uncertain. The market's 24% stock surge suggests investors believe the AI framing regardless of the underlying cause — which itself becomes a forcing function for other companies to follow.

UPDATED FORECASTS → PRIMARY SOURCE DOCUMENTATION → THE RECURSIVE LOOP →