METHODOLOGY
Forecasting Methodology & Analytical Framework
Complete documentation of the nine-variable model, scenario construction logic, confidence calibration system, data inputs, update cadence, and honest limitations underpinning every projection published on this site.
FEB 27, 2026
10 MIN READ
AI LABS RESEARCH
FRAMEWORK OVERVIEW
CORE VARIABLES: 9 (V1 through V9)
SCENARIO TIERS: 3 (Conservative / Base / Recursive)
CONFIDENCE LEVELS: 3 (High / Medium / Low)
01. Nine-Variable Framework
THE STRUCTURAL INPUTS THAT DRIVE EVERY AI LABS PROJECTION
Every displacement forecast published on this site is generated from a model built on nine core variables. These variables were selected because they represent the minimum set of independent causal factors required to model AI labor displacement with structural fidelity. Each variable is measurable (though measurement quality varies), each has documented sources, and each can be independently adjusted to explore different futures.
The framework is deliberately parsimonious. More variables would increase apparent precision but not actual accuracy. We model the forces that matter most and are transparent about what is excluded.
V1
AI Capability Growth Rate
The annualized rate at which frontier AI models gain measurable task capability, calibrated against the METR autonomous task benchmark. This is not a measure of parameter count, training compute, or benchmark scores on synthetic tests. It tracks the duration of real-world expert tasks that AI can complete autonomously. As of February 2026, the METR doubling period is approximately 7 months overall and ~4 months since 2023, based on 15 data points across 6 years. V1 is the single most consequential variable in the model. Small changes in V1 cascade through every downstream projection.
UNIT: months (doubling period)
SOURCE: METR TH1.1
CONFIDENCE: HIGH
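Given a doubling period, V1 translates mechanically into a projected task horizon. A minimal sketch, using illustrative numbers (the 14.5-hour baseline is the METR frontier figure cited in this document; the clean-exponential form ignores any V2 feedback):

```python
def capability_horizon(months_elapsed: float,
                       baseline_hours: float = 14.5,
                       doubling_months: float = 7.0) -> float:
    """Projected autonomous-task horizon (hours of human-equivalent work)
    under a fixed V1 doubling period. Illustrative only: assumes clean
    exponential growth with no self-improvement feedback."""
    return baseline_hours * 2 ** (months_elapsed / doubling_months)

# Two doubling periods (14 months at the ~7-month rate) quadruple the horizon:
# capability_horizon(14) -> 58.0 hours
```

This is why small changes in V1 cascade: shortening the doubling period from 7 to 4 months more than triples the number of doublings in any fixed window.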
V2
Recursive Self-Improvement Factor
A multiplier representing the degree to which AI systems contribute to their own capability improvement, thereby shortening V1. When AI writes its own training code, optimizes its own architecture, and generates its own synthetic training data, the doubling period is no longer exogenous -- it becomes a function of existing capability. Anthropic has confirmed that Claude writes approximately 90% of its own codebase. This variable captures the feedback loop: better AI produces better AI faster. A value of 1.0 means no self-improvement effect (fixed doubling period). Values above 1.0 mean the curve is super-exponential. Values above ~1.5 produce hyperbolic dynamics within the model's time horizon.
UNIT: dimensionless multiplier
SOURCE: Anthropic, ML research
CONFIDENCE: MEDIUM
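The qualitative claim above (V2 = 1.0 gives exponential growth; V2 > 1.0 gives super-exponential growth) can be made concrete with one illustrative functional form, in which each successive capability doubling takes 1/V2 of the previous period. This is a hypothetical parameterization, not the model's published equations -- the production model evidently uses a different form, since it places the hyperbolic threshold near V2 ≈ 1.5 rather than at any value above 1.0:

```python
def doublings_within(horizon_months: float,
                     first_doubling: float = 7.0,
                     v2: float = 1.0,
                     cap: int = 1000) -> int:
    """Count capability doublings that fit inside a time horizon when each
    doubling takes 1/v2 of the previous period (illustrative form only).

    v2 == 1.0: fixed doubling period -> plain exponential growth.
    v2 >  1.0: the total time for infinitely many doublings is finite
               (first_doubling * v2 / (v2 - 1)), i.e. hyperbolic dynamics."""
    elapsed, period, n = 0.0, first_doubling, 0
    while elapsed + period <= horizon_months and n < cap:
        elapsed += period
        period /= v2
        n += 1
    return n

# With v2 = 1.0, 28 months at a 7-month period yields exactly 4 doublings.
# With v2 = 1.35, the period series converges at 7 * 1.35 / 0.35 = 27 months,
# so the count blows up (hits the safety cap) inside the same 28-month horizon.
```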
V3
Adoption Friction Coefficient
Measures the organizational, technical, and cultural resistance to deploying AI in roles where it is already technically capable. This is the gap between "AI can do this" and "organizations actually use AI to do this." Friction includes IT infrastructure readiness, change management costs, vendor lock-in, data migration complexity, middle-management resistance, and simple institutional inertia. The MIT Iceberg Index quantifies this gap: as of November 2025, 11.7% of US jobs are economically viable for AI substitution, but actual displacement is approximately 1%. The ratio implies a friction coefficient of roughly 0.90 -- meaning 90% of technically feasible displacement has not yet converted into actual job loss.
UNIT: 0-1 scale (0=no friction, 1=total resistance)
SOURCE: MIT Iceberg Index
CONFIDENCE: HIGH
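The friction coefficient stated above follows directly from the gap between viable and actual displacement. A one-line check, using the MIT Iceberg figures from the text:

```python
def friction_coefficient(viable_share: float, actual_share: float) -> float:
    """V3: the fraction of economically viable displacement that has NOT yet
    converted into actual job loss."""
    return 1.0 - actual_share / viable_share

# 11.7% viable vs ~1% actual (MIT Iceberg Index, Nov 2025):
# friction_coefficient(0.117, 0.01) -> ~0.91, i.e. "roughly 0.90" as stated.
```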
V4
Regulatory Drag
Quantifies the decelerating effect of government regulation, litigation risk, and compliance requirements on AI deployment speed. Includes existing labor law, emerging AI-specific regulation (EU AI Act, proposed US frameworks), sector-specific rules (healthcare, finance, legal), and liability uncertainty. Regulatory drag does not prevent displacement -- it delays it. Historical precedent (ride-sharing, fintech, telemedicine) suggests regulatory drag typically adds 2-5 years to adoption timelines but does not alter terminal adoption levels. In our model, V4 shifts the displacement curve rightward on the time axis without changing its ultimate shape.
UNIT: years of delay
SOURCE: Policy analysis, EU AI Act
CONFIDENCE: MEDIUM
V5
Economic Incentive Multiplier
The cost differential between human labor and AI alternatives for equivalent task output. This is the fiduciary driver: when AI delivers the same output at one-fifth to one-fifteenth the cost of a human employee, deployment becomes not a technology decision but a shareholder obligation. V5 captures the ratio of fully-loaded human labor cost (salary, benefits, office space, management overhead, error rates) to equivalent AI cost (API pricing, integration, monitoring, quality assurance). As of early 2026, the ratio ranges from 5:1 for routine knowledge work to 20:1 for high-volume data processing. V5 is accelerating as API costs fall approximately 10x per year while human labor costs inflate 3-4% annually.
UNIT: cost ratio (human:AI)
SOURCE: BLS, corporate filings
CONFIDENCE: HIGH
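Because the two cost trends cited above compound in opposite directions, the ratio grows multiplicatively. A sketch of the forward projection (the ~10x/year API decline and ~3.5% wage inflation are the trend rates from the text; the smooth-compounding form is an assumption -- real API price drops are lumpy):

```python
def projected_cost_ratio(ratio_now: float,
                         years: float,
                         api_cost_decline: float = 10.0,
                         wage_inflation: float = 0.035) -> float:
    """Project V5 forward: human costs inflate while AI costs fall ~10x/year.
    Naive compounding sketch, not the production model's parameterization."""
    return ratio_now * (1 + wage_inflation) ** years * api_cost_decline ** years

# An 8:1 ratio today becomes ~82.8:1 after one year under these trend rates.
```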
V6
Labor Market Elasticity
Measures the capacity of the labor market to absorb displaced workers through new task creation, reskilling, and sectoral reallocation. Historical technology transitions (mechanization, electrification, computerization) saw high elasticity: displaced workers eventually found new roles in new industries. V6 captures whether this absorption capacity holds in the AI transition. The critical question: does AI create enough new human-complementary tasks to offset displacement? Acemoglu & Restrepo (2019) show that reinstatement effects have weakened in recent decades. If V6 is low, displacement accumulates rather than redistributes.
UNIT: reabsorption rate (0-1)
SOURCE: Acemoglu & Restrepo (NBER)
CONFIDENCE: LOW
V7
Task Decomposability Index
Measures the degree to which a given role can be decomposed into discrete, automatable subtasks versus requiring holistic judgment that resists decomposition. Roles with high V7 (data entry, report generation, code review, customer service scripting) are automatable at the task level even before full-role replacement is feasible. Roles with low V7 (executive decision-making, complex negotiations, novel creative direction) involve contextual reasoning across domains that resists clean task boundaries. Goldman Sachs estimates 60-70% of knowledge jobs have high-V7 task components. The insight: displacement happens task-by-task before it becomes visible as headcount reduction.
UNIT: 0-1 index
SOURCE: Goldman Sachs, O*NET analysis
CONFIDENCE: MEDIUM
V8
Physical Bottleneck Factor
Captures the degree to which a role requires physical-world interaction that software cannot perform. Pure knowledge work (V8 near 0) is fully exposed to AI displacement. Roles requiring hands, spatial navigation, or real-time physical manipulation (V8 near 1) have a hardware dependency that current robotics cannot satisfy at competitive cost. V8 is the primary reason desk jobs face earlier displacement than trades. However, V8 is not static: advances in embodied AI, humanoid robotics (Tesla Optimus, Figure), and industrial automation are steadily reducing the physical bottleneck. Our model treats V8 as declining over time, reaching meaningful automation capability for structured physical tasks by 2030-2032.
UNIT: 0-1 scale (0=pure digital, 1=pure physical)
SOURCE: O*NET, robotics research
CONFIDENCE: MEDIUM
V9
Social Resistance Variable
Models the non-economic, non-regulatory human resistance to AI adoption: public backlash, union action, consumer preference for human interaction, cultural norms around human accountability, and political movements opposing automation. V9 is the least quantifiable variable in the framework. Historical precedent (Luddites, automation protests in the auto industry, gig economy backlash) suggests social resistance modulates timing but does not reverse technological adoption curves. The exception scenario: if displacement reaches politically destabilizing levels (>15-20% unemployment), social resistance could produce hard policy interventions (mandated human staffing, automation taxes) that fundamentally alter the curve. Our model includes this as a threshold effect, not a continuous variable.
UNIT: 0-1 scale (qualitative)
SOURCE: Historical analysis, polling
CONFIDENCE: LOW
FRAMEWORK NOTE
These nine variables are not independent. V1 and V2 form a feedback loop. V5 drives adoption speed, which reduces V3 over time. V9 can trigger V4 through political channels. The model handles these interactions through iterative simulation, not closed-form equations. This means small parameter changes can produce nonlinear output differences -- a feature, not a bug, as it reflects the genuine structure of the system being modeled.
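To illustrate what "iterative simulation, not closed-form equations" means in practice, here is a deliberately simplified sketch of the structure: coupled state updated in monthly steps, with the V5-erodes-V3 interaction and the V9-triggers-V4 threshold effect wired in. Every functional form and coefficient below is a hypothetical placeholder, not the production model:

```python
def simulate(params: dict, years: int = 10,
             ceiling: float = 0.5, displaced: float = 0.01) -> tuple:
    """Toy coupled simulation of V3, V4, V5, V9 and a displacement share.
    Placeholder dynamics; only the *structure* mirrors the described approach."""
    v = dict(params)                # don't mutate the caller's scenario
    dt = 1.0 / 12.0                 # monthly time step
    for _ in range(years * 12):
        # Interaction 1: cost incentive (V5) erodes adoption friction (V3)
        v["V3"] = max(0.0, v["V3"] - 0.002 * v["V5"] * dt)
        # Displacement closes the gap to a feasible ceiling at a friction-gated rate
        displaced += (1.0 - v["V3"]) * 0.4 * (ceiling - displaced) * dt
        # Interaction 2: threshold effect -- past ~15% displacement, social
        # resistance (V9) converts into added regulatory drag (V4)
        if displaced > 0.15 and v["V9"] > 0.3:
            v["V4"] += 1.0 * dt
    return displaced, v
```

Even in this toy version, the nonlinearity the note describes is visible: small parameter changes shift when (and whether) the threshold fires, which discontinuously changes V4's trajectory.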
02. Scenario Construction Methodology
HOW CONSERVATIVE, BASE CASE, AND RECURSIVE SCENARIOS ARE BUILT
All AI Labs projections are published as three parallel scenarios, not point predictions. Each scenario represents a coherent, internally consistent set of V1-V9 parameter values. The scenarios are not optimistic/pessimistic -- they correspond to different structural assumptions about the world.
The table below documents the exact parameter settings for each scenario. Readers can use the interactive displacement model on this site to explore intermediate parameter combinations.
| VARIABLE | CONSERVATIVE | BASE CASE | RECURSIVE |
|---|---|---|---|
| V1 — Capability Growth | 12-mo doubling | 7-mo doubling | 4-mo doubling (accelerating) |
| V2 — Self-Improvement | 1.05x (negligible) | 1.35x (moderate loop) | 1.60x (active loop) |
| V3 — Adoption Friction | 0.70 (high inertia) | 0.40 (moderate) | 0.10 (minimal) |
| V4 — Regulatory Drag | +4 years delay | +2 years delay | +0.5 years delay |
| V5 — Cost Incentive | 3:1 ratio | 8:1 ratio | 15:1 ratio |
| V6 — Market Elasticity | 0.60 (strong reabsorption) | 0.35 (partial) | 0.10 (weak) |
| V7 — Decomposability | 0.40 (limited) | 0.60 (moderate) | 0.85 (deep decomposition) |
| V8 — Physical Bottleneck | 0.80 (strong barrier) | 0.60 (moderate barrier) | 0.30 (barrier eroding) |
| V9 — Social Resistance | 0.50 (meaningful pushback) | 0.20 (limited effect) | 0.05 (negligible) |
| SCENARIO OUTPUT | CONSERVATIVE | BASE CASE | RECURSIVE |
|---|---|---|---|
| 10% desk job displacement | 2032 | 2027 | 2027 |
| 25% desk job displacement | 2037 | 2029 | 2028 |
| 50% desk job displacement | 2040-2045 | 2031-2033 | 2029 |
| Physical/trade job impact begins | 2038+ | 2030-2032 | 2028-2029 |
| Curve geometry | Standard S-curve | Accelerating exponential | Hyperbolic |
Conservative scenario assumes AI capability growth decelerates to a 12-month doubling period (consistent with a paradigm plateau), self-improvement effects are negligible, adoption faces persistent organizational friction, and regulatory environments significantly constrain deployment speed. This scenario is consistent with the position held by most mainstream labor economists.
Base case scenario extends current METR-measured trends without acceleration or deceleration. The recursive self-improvement loop exists but is moderate (1.35x multiplier, derived from Anthropic's code-authorship data). Adoption friction is real but erodes under competitive pressure. This is our central projection and the one used in headline forecasts.
Recursive scenario assumes the self-improvement feedback loop fully materializes, producing hyperbolic rather than exponential growth. Adoption friction collapses as the cost differential becomes irresistible. Regulatory response is slow. This is a tail-risk scenario, not a central forecast -- but the METR acceleration trend (7 months overall, 4 months since 2023) means it cannot be dismissed.
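For readers who want to reproduce scenario runs against the interactive model, the parameter table above transcribes directly into code (values copied verbatim from the table; the model's internal representation may differ):

```python
# V1 in months (doubling period), V4 in years of delay, V5 as the human:AI
# cost ratio; V2 is a dimensionless multiplier; V3 and V6-V9 are 0-1 scales.
SCENARIOS = {
    "conservative": {"V1": 12.0, "V2": 1.05, "V3": 0.70, "V4": 4.0,
                     "V5": 3.0,  "V6": 0.60, "V7": 0.40, "V8": 0.80, "V9": 0.50},
    "base":         {"V1": 7.0,  "V2": 1.35, "V3": 0.40, "V4": 2.0,
                     "V5": 8.0,  "V6": 0.35, "V7": 0.60, "V8": 0.60, "V9": 0.20},
    "recursive":    {"V1": 4.0,  "V2": 1.60, "V3": 0.10, "V4": 0.5,
                     "V5": 15.0, "V6": 0.10, "V7": 0.85, "V8": 0.30, "V9": 0.05},
}
```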
03. Data Sources Overview
PRIMARY INPUTS TO THE FORECASTING MODEL
All AI Labs projections derive from primary institutional research and direct data sources. We do not cite media analysis, opinion columns, or secondary coverage as evidence. Where institutional research (e.g., Goldman Sachs) is used, we reference the primary research report, not news articles about it. Our full source documentation is available on the Sources page.
- ACADEMIC: Acemoglu & Restrepo (NBER, 2018/2019/2022) -- the task-based displacement-reinstatement framework that provides the theoretical foundation for all serious AI labor modeling. Published in Econometrica (the field's most rigorous journal). Calibration basis for V6 and V7 parameters.
- ACADEMIC: MIT Iceberg Index (CSAIL, November 2025) -- economic viability threshold study measuring where AI substitution is financially rational, not merely technically feasible. Primary source for V3 calibration and the current-state displacement estimate (11.7% viable, ~1% actual).
- ACADEMIC: Harvard Business School Working Paper 25-039 -- firm-level analysis of generative AI effects on skill requirements using O*NET and LightCast data (923 occupations, 2019-2024). Documents a 24% decline in AI-exposed skills at high-automation firms.
- ACADEMIC: Schmidt et al. (NBER Working Paper 33509) -- 58 million LinkedIn profiles and 14 million job postings analyzed against O*NET activity data. Shows a 3.5% employment decline in top-paying roles at AI-adopting firms over five years.
- BENCHMARKS: METR Autonomous Task Benchmark (TH1.1) -- the primary calibration source for V1 (AI Capability Growth Rate). Measures real-world autonomous task completion by frontier models. 15 data points across 6 years. Current frontier: Claude Opus 4.6 at 14h 30m human-equivalent task complexity. Doubling period: ~7 months overall, ~4 months since 2023.
- GOVERNMENT: Bureau of Labor Statistics (BLS) -- monthly employment data (Current Employment Statistics, CES), occupational employment and wage statistics (OES), and the quarterly census of employment and wages (QCEW). Used for baseline labor market parameters, wage data (V5 calibration), and real-time employment trend validation.
- GOVERNMENT: 2026 International AI Safety Report -- cross-government assessment of AI capability trajectories and safety risks. Documents AI evaluation-awareness and behavioral modification capabilities relevant to V2 assessment.
- CORPORATE: Goldman Sachs Global Investment Research (Briggs & Dong, 2025) -- 800+ occupation analysis estimating 6-7% workforce displacement under wide adoption (range 3-14%). Provides V7 calibration and sector-level risk decomposition.
- CORPORATE: Quarterly earnings calls and investor presentations -- direct statements from technology companies, major employers, and AI infrastructure providers regarding AI deployment, headcount decisions, and cost savings. Sources include Anthropic, OpenAI, Google DeepMind, Microsoft, Block (Square), and others. Used for real-time V3 and V5 calibration.
- TRACKERS: Technology layoff trackers (Layoffs.fyi, TrueUp, Challenger Gray) -- aggregated layoff announcements cross-referenced with company AI investment disclosures to distinguish AI-correlated workforce reductions from cyclical adjustments. Used as a leading indicator, not as a primary data source, due to attribution uncertainty.
- REGULATORY: EU AI Act, proposed US AI frameworks, and sector-specific guidance (SEC, HHS, DOL) -- tracked for V4 (Regulatory Drag) calibration. Regulatory trajectory is monitored for both direct deployment restrictions and indirect effects (compliance costs, liability exposure, mandatory human-in-the-loop requirements).
DATA INTEGRITY PRINCIPLE
We distinguish three tiers of evidence in all published analysis: directly measured (METR benchmarks, BLS statistics, published corporate data), peer-reviewed inference (Acemoglu & Restrepo frameworks, MIT Iceberg methodology), and AI Labs extrapolation (recursive acceleration projections, scenario-specific timelines). Every claim on this site is tagged with its evidence tier. Where we extrapolate, we say so explicitly.
04. Confidence Calibration System
WHAT HIGH, MEDIUM, AND LOW CONFIDENCE MEAN -- AND WHAT EVIDENCE IS REQUIRED FOR EACH
Every projection, claim, and parameter estimate published on AI Labs carries a confidence designation. These are not subjective feelings -- they correspond to specific evidence thresholds. The system is designed to prevent a common failure mode in forecasting: presenting speculative extrapolations with the same rhetorical weight as empirically grounded measurements.
HIGH CONFIDENCE
Requires direct empirical measurement from at least one peer-reviewed study or primary institutional dataset, corroborated by at least one independent secondary source. The claim must be falsifiable and the measurement methodology must be documented.
Examples: METR capability doubling trend (15 data points), MIT Iceberg viability ceiling (11.7%), BLS employment statistics, Acemoglu displacement-reinstatement framework, current AI cost ratios from published API pricing.
MEDIUM CONFIDENCE
Requires credible institutional research with documented methodology, or a trend extrapolation supported by 3+ data points with a plausible causal mechanism. Some interpretation or synthesis is involved, but the underlying data is strong.
Examples: Goldman Sachs 6-7% displacement estimate, recursive self-improvement multiplier (based on Anthropic statements + ML research trends), V4 regulatory delay estimates, near-term adoption speed projections.
LOW CONFIDENCE
Requires only a plausible causal argument supported by historical analogy or limited trend data. These are scenario-level projections, not forecasts. Low-confidence claims must be explicitly labeled as speculative in all published analysis.
Examples: Hyperbolic curve geometry by 2027-28, 50% displacement by 2029 (recursive scenario), labor market reabsorption capacity (V6), social resistance threshold effects, physical bottleneck erosion timeline.
CALIBRATION ACCOUNTABILITY
We track our confidence calibration accuracy over time. If claims designated "high confidence" prove wrong more than 10% of the time, or "medium confidence" claims prove wrong more than 40% of the time, the calibration system itself needs recalibration. This is tracked in our quarterly deep reviews and published transparently.
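The accountability rule above reduces to a simple check over resolved claims. A sketch; the `(tier, was_correct)` record format is an assumption about how such a log might be kept, not a description of our internal tooling:

```python
def calibration_report(resolved_claims, thresholds=None):
    """Check the calibration rule: 'high'-confidence claims may be wrong at
    most 10% of the time, 'medium' at most 40%.

    resolved_claims: iterable of (tier, was_correct) pairs for claims that
    have resolved (hypothetical log format)."""
    thresholds = thresholds or {"high": 0.10, "medium": 0.40}
    report = {}
    for tier, max_error in thresholds.items():
        outcomes = [ok for t, ok in resolved_claims if t == tier]
        if not outcomes:
            continue  # no resolved claims at this tier yet
        error_rate = 1.0 - sum(outcomes) / len(outcomes)
        report[tier] = {"error_rate": error_rate,
                        "within_bound": error_rate <= max_error}
    return report
```

If any tier's `within_bound` flag is false, the calibration system itself is flagged for recalibration in the next quarterly review.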
05. Update Cadence
WHEN AND HOW PROJECTIONS ARE REVISED
Forecasting models are only as good as their maintenance discipline. Static projections degrade rapidly in a domain where the underlying dynamics shift on monthly timescales. AI Labs maintains a structured revision schedule with three tiers.
MONTHLY
Forecast Revisions
All V1-V9 parameters are re-evaluated against the latest available data. METR benchmark updates, BLS employment releases, and corporate AI deployment announcements are integrated. Parameter adjustments are documented with change logs. The displacement timeline table is regenerated. Published on the first week of each month.
QUARTERLY
Deep Reviews
Full model re-evaluation including structural assumptions, scenario definitions, and confidence calibration accuracy. Incorporates new academic publications, major government reports, and quarterly corporate earnings data. The quarterly review may produce scenario redefinitions, variable additions or removals, or methodology changes. Published as a standalone analysis article.
REAL-TIME
Breaking Adjustments
Material events that invalidate or significantly shift parameter assumptions trigger immediate model updates outside the scheduled cadence. The criterion: a single event that moves any V1-V9 parameter by more than 15% from its current setting. Examples: a new METR data point showing acceleration or deceleration, a major economy passing significant AI regulation, a frontier lab announcing a capability breakthrough or hitting a confirmed scaling wall.
All revisions include a change log documenting which parameters moved, in which direction, by how much, and why. Historical parameter values are preserved for audit and calibration tracking. We do not silently update projections -- every change is versioned and explained.
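The 15% breaking-adjustment criterion is mechanical enough to express as a check (a sketch; the dict-of-parameters representation follows the V1-V9 naming used throughout but is an assumption about implementation):

```python
def breaking_update_required(current: dict, proposed: dict,
                             threshold: float = 0.15) -> dict:
    """Return the parameters whose proposed values move more than `threshold`
    (15% by default) relative to their current setting -- the real-time
    update trigger described above."""
    return {k: proposed[k] for k in current
            if current[k] != 0
            and abs(proposed[k] - current[k]) / abs(current[k]) > threshold}

# A new METR point implying a 5.5-month doubling period moves V1 by ~21%
# from 7.0 -- enough to trigger an out-of-cycle update.
```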
06. Caveats & Limitations
HONEST ASSESSMENT OF WHAT THIS MODEL DOES NOT AND CANNOT CAPTURE
No forecasting model is better than its weakest structural assumption. We are explicit about the limitations of this framework because intellectual honesty is more valuable than false precision. Users of these projections should weight them accordingly.
- No model has successfully predicted AI capability growth more than 18 months in advance. Every major forecast from 2020-2023 required significant upward revision. Our model inherits this fundamental limitation. Projections beyond 2028 should be treated as scenario explorations, not timeline commitments.
- The recursive self-improvement variable (V2) is the least empirically grounded parameter. While AI code self-authorship is documented, no peer-reviewed study has quantified the degree to which this compresses capability doubling periods. The hyperbolic curve geometry in the recursive scenario is a mathematical extrapolation of a plausible dynamic, not an observed phenomenon. It may encounter ceilings (architectural limits, training data constraints, compute/energy bottlenecks) that flatten the curve before hyperbolic dynamics manifest.
- The model does not capture new task creation with structural fidelity. V6 (Labor Market Elasticity) approximates reabsorption as a single parameter, but historical technology transitions created entirely new industries and job categories that were unforeseeable ex ante. If AI generates a comparable wave of new human-complementary tasks, our displacement projections will overstate actual unemployment. This is the strongest counterargument to our central projections, and we take it seriously.
- Feedback effects from displacement itself are modeled only as threshold triggers, not continuous dynamics. Mass unemployment reduces consumer spending, which reduces corporate revenue, which changes AI investment calculus. Political instability from displacement can produce regulatory responses that alter the adoption curve. These second-order effects are real but computationally intractable to model with current data. Our model captures them as discrete scenario switches, not smooth feedback loops.
- Geographic, sectoral, and demographic granularity is limited. V1-V9 are calibrated primarily against US data. International displacement timelines differ due to labor cost structures, regulatory environments, and infrastructure readiness. Within the US, displacement will not be uniform across sectors, regions, or demographic groups. Our model produces national aggregates; local reality will vary substantially.
- Historical precedent for forecast failures is sobering. The 2013 Frey & Osborne study estimated 47% of US jobs were at risk of automation -- a figure that became a media sensation but proved methodologically flawed (it measured task exposure, not economic viability or adoption probability). McKinsey's 2017 estimates required repeated revision. Our framework was designed to avoid these specific errors, but there are certainly errors we have not anticipated.
- The model assumes rational economic actors. In practice, organizations make suboptimal decisions due to political dynamics, sunk-cost fallacies, vendor relationships, and executive ego. These irrational frictions may slow adoption beyond what V3 captures. Conversely, herd behavior and competitive panic may accelerate adoption beyond rational equilibrium. Both directions of error are possible.
- Uncertainty ranges widen dramatically beyond 3-year horizons. Our 2026-2028 projections carry meaningful uncertainty bands (plus or minus 30-50%). Our 2029-2032 projections carry very wide uncertainty bands (plus or minus 100% or more on timing). Our 2033+ projections are scenario sketches, not forecasts. We present them because understanding the shape of possible futures is valuable even when precise timing is unknowable.
- Black swan events are not modeled. A major geopolitical conflict disrupting semiconductor supply chains, a fundamental breakthrough in AI safety that produces voluntary deployment constraints, a global financial crisis that collapses AI investment, or an unexpected technical breakthrough that leapfrogs current architectures -- any of these would invalidate model parameters in ways that cannot be predicted from existing data. The model maps the space of futures consistent with current trends. It does not map all possible futures.
THE HONEST BOTTOM LINE
Treat all projections on this site as structured scenario maps, not predictions. The value of this framework is not in telling you what year 50% displacement will occur. It is in making the structural logic visible -- showing which variables matter, how they interact, and what evidence would need to change to shift the outlook materially. If you find yourself citing a specific year from our projections as a settled fact, you are using the model wrong.