Lecture Twelve · 14 June 2026

AI Topic I: Estimation & Productivity Disruption

What changes about software economics when machines write much of the code — and what stubbornly doesn't.

Instructor: Dr. Zhijiang Chen
Session: No. 12 of 16
Date: 14 June 2026
Room: SY109
Duration: 110 minutes
Format: Lecture + Empirics
First of two AI lectures. Today: estimation. Tomorrow: token economics & ROI.
Lecture XII · Agenda
Today's Plan

Re-calibrating SEE for the AI era.

Lectures 1–11 were the canon — written before LLMs. Today we ask what survives, what breaks, and what's new.

§    | Topic | Minutes
I.   | Why classical estimation breaks under AI assistance | 15
II.  | The productivity paradox: junior +10–30%, senior −19% | 20
III. | New productivity metrics for AI-augmented teams | 20
IV.  | Adapting COCOMO II for AI assistance | 15
     | Discussion: stress-test productivity claims | 10
V.   | Real case study: AI coding tool ROI in a 50-person org | 20
     | HW12, questions | 10
Part One
I. Why classical estimation breaks.

§ I · What breaks
Three classical assumptions, three failures

Three pillars, all wobbling.

Classical assumption | Why it breaks under AI assistance
LOC ≈ effort | AI generates working code in minutes that would have taken hours. The LOC-to-PM curve is no longer stable.
FP ≈ scope | Function Points still measure scope correctly, but the conversion from FP to KSLOC, and from KSLOC to PM, has changed.
COCOMO calibration | The model was calibrated against 161 projects from a non-AI era. Effort multipliers (TOOL, APEX, PLEX) no longer span the right range.

Function Points still count the system's specification correctly. The downstream conversion to effort is what needs recalibration.
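
For reference, the conversion chain described above can be written out in the standard COCOMO II post-architecture form. This is a sketch, not a re-calibration: A = 2.94 and B = 0.91 are the published COCOMO II.2000 constants, and the symbol \rho (SLOC per function point, language-dependent) is introduced here only for illustration.

```latex
\begin{aligned}
\text{Size}_{\text{KSLOC}} &= \frac{\text{FP} \times \rho}{1000} \\
E &= B + 0.01 \sum_{j=1}^{5} SF_j \\
PM &= A \times \text{Size}_{\text{KSLOC}}^{\,E} \times \prod_{i} EM_i
\end{aligned}
```

Read this way, the FP count is unaffected by AI assistance; the quantities in question are \rho and the EM values.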

§ I · Hidden costs
The new effort categories AI introduces

What AI assistance moves, not removes.

AI doesn't eliminate effort — it relocates it. Classical models miss these new categories:

  • Prompt engineering — crafting the prompts that drive code generation.
  • Verification — reading, testing, and validating generated code.
  • Integration overhead — fitting generated snippets into real architectures.
  • Drift management — re-validating outputs as model versions change.
  • Hallucination triage — catching plausible-looking but incorrect AI output.

A senior engineer can spend more time verifying AI output than they would have spent writing the code by hand. This is the verification overhead — the central puzzle of today's lecture.

Part Two
II. The productivity paradox: different effects for different engineers.

§ II · Empirical data
2026 productivity benchmarks

One tool, three populations, three answers.

Population | Productivity delta | Why
Junior developers (0–2 yrs) | +10 to +30% | AI fills knowledge gaps; reduces idle research time.
Mid-level (3–6 yrs) | +5 to +15% | AI accelerates boilerplate; modest verification cost.
Senior (7+ yrs) | −19% | Verification overhead exceeds time saved on routine code.

Source: 2026 enterprise studies (Larridin Benchmarks; Exceeds.ai productivity paradox report). The senior slowdown is reproducible — not a statistical fluke.
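
To see how the three deltas in the table combine for a given team, a minimal sketch of a headcount-weighted blend follows. The junior and mid-level values are assumed midpoints of the quoted ranges, and both team mixes are hypothetical, not data from the cited studies.

```python
# Productivity deltas by seniority band. The senior figure is the table's
# -19%; the junior and mid values are ASSUMED midpoints of the quoted ranges.
DELTAS = {"junior": 0.20, "mid": 0.10, "senior": -0.19}


def blended_delta(mix: dict[str, float]) -> float:
    """Headcount-weighted productivity delta for a team.

    `mix` maps each band to its share of the team; shares must sum to 1.
    """
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"shares must sum to 1, got {total}")
    return sum(share * DELTAS[band] for band, share in mix.items())


# Hypothetical team compositions, chosen only for illustration.
junior_heavy = {"junior": 0.60, "mid": 0.30, "senior": 0.10}
half_senior = {"junior": 0.25, "mid": 0.25, "senior": 0.50}

print(f"junior-heavy: {blended_delta(junior_heavy):+.1%}")
print(f"half senior:  {blended_delta(half_senior):+.1%}")
```

Under these assumptions the junior-heavy team comes out at about +13% and the half-senior team at about −2%, so the same tool gives opposite headline results depending on who is on the team.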

"AI raises team productivity" is a true headline that hides a within-team rearrangement: AI lifts your junior staff and slows your senior staff. If your team is half senior, your headline gain is much smaller than the vendor claims.

§ II · Why seniors slow down
Verification overhead in detail

The senior engineer's tax.

A senior engineer doesn't accept AI output the way a junior does. They:

  • Read every generated line to confirm correctness.
  • Mentally check for security and performance issues the AI may have missed.
  • Verify the suggested approach matches the system's architecture.
  • Run extra tests because trust is not yet earned.

For a routine task, this verification can take longer than typing the code from memory would have. The net effect is negative — until the engineer either skips verification (risky) or learns to use AI on tasks where it has the most leverage (rarely the routine ones).
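
The trade-off above can be sketched as a simple time budget per task. All timings below are hypothetical, and the field names are illustrative, not taken from any study.

```python
from dataclasses import dataclass


@dataclass
class TaskTiming:
    """Minutes spent on one task; all values used below are hypothetical."""
    manual: float      # writing the code by hand
    prompting: float   # crafting and iterating on prompts
    verifying: float   # reading, testing, checking architecture fit
    fixing: float      # correcting what verification turned up

    def ai_total(self) -> float:
        """Total minutes when the task is done with AI assistance."""
        return self.prompting + self.verifying + self.fixing

    def net_saving(self) -> float:
        """Positive means AI assistance saved time on this task."""
        return self.manual - self.ai_total()


routine_for_senior = TaskTiming(manual=20, prompting=3, verifying=18, fixing=4)
unfamiliar_api = TaskTiming(manual=90, prompting=10, verifying=25, fixing=10)

for name, t in [("routine", routine_for_senior), ("unfamiliar", unfamiliar_api)]:
    print(f"{name}: net saving {t.net_saving():+.0f} min")
```

With these made-up numbers the routine task loses 5 minutes and the unfamiliar one gains 45, which is the pattern the slide describes: the leverage sits away from the routine work.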

The economic question is not "should we adopt AI tools?" but "for which tasks, by which engineers, with what verification protocol?"

Part Three
III. New metrics: what to measure when LOC is no longer trustworthy.

§ III · Outcome metrics
Five replacement candidates

From LOC to outcomes.

Metric | What it captures | Risk
PR-to-production cycle time | Speed from pull request to deployment. | Cycle-time gaming.
Defect-free deploy frequency | Reliable changes per period. | Encourages only small, safe changes.
Customer-visible features shipped | Value delivered. | Defining "feature" is hard.
Tickets resolved per engineer | Concrete progress on backlog. | Penalises hard problems.
Engineer-reported flow time | Subjective effectiveness. | Self-report bias.

No single replacement for LOC works alone. Use 2–3 metrics together, and re-evaluate quarterly — Goodhart's law applies fast.
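
As an illustration of combining two of these metrics, the sketch below computes median PR-to-production cycle time and defect-free deploy frequency from a small deployment log. The log is hypothetical, and the record layout (opened, deployed, defect flag) is an assumption made for this example.

```python
from datetime import datetime
from statistics import median

# Hypothetical log: (PR opened, deployed to production, caused a defect?)
DEPLOYS = [
    (datetime(2026, 6, 1, 9, 0), datetime(2026, 6, 1, 15, 0), False),
    (datetime(2026, 6, 2, 10, 0), datetime(2026, 6, 3, 10, 0), True),
    (datetime(2026, 6, 3, 11, 0), datetime(2026, 6, 3, 13, 0), False),
    (datetime(2026, 6, 4, 9, 0), datetime(2026, 6, 5, 21, 0), False),
]


def median_cycle_hours(deploys) -> float:
    """Median PR-to-production time in hours."""
    return median(
        (done - opened).total_seconds() / 3600 for opened, done, _ in deploys
    )


def defect_free_per_week(deploys, weeks: float) -> float:
    """Deploys that caused no defect, per week of observation."""
    clean = sum(1 for _, _, defect in deploys if not defect)
    return clean / weeks


print(f"median cycle time: {median_cycle_hours(DEPLOYS):.1f} h")
print(f"defect-free deploys per week: {defect_free_per_week(DEPLOYS, weeks=1):.1f}")
```

Reporting the two side by side makes gaming harder: rushing changes through shortens cycle time but shows up as fewer defect-free deploys.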

Part Four
IV. Adapting COCOMO II: which knobs to turn.

§ IV · Effort multiplier adjustment
A proposed re-calibration

The three EMs to modify.

Effort multiplier | Classical range | Proposed AI-era range
TOOL (tool support) | 0.78 – 1.17 | 0.65 – 1.10 (better tools shift the range lower)
APEX (application experience) | 0.81 – 1.22 | 0.70 – 1.30 (AI fills gaps; widens range)
PLEX (platform experience) | 0.85 – 1.19 | 0.75 – 1.20 (same direction as APEX)

One candidate for a new EM is AIUSE (effective use of AI assistance), rated by team practice. Calibration data is not yet sufficient; this is a research direction, not a recommendation.

Adjust EMs, but keep the model's structure. The power-law in size and the scale factors still describe reality. The AI era changes coefficients, not equations.
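
A minimal sketch of "same equation, different coefficients": the function below is the standard COCOMO II effort formula, evaluated once with the classical best-case EM values and once with the proposed AI-era ones. A = 2.94, B = 0.91 and the nominal scale-factor sum of 18.97 are the published COCOMO II.2000 values; the 50 KSLOC project size is hypothetical, and every EM other than these three is assumed nominal (1.0).

```python
from math import prod

A, B = 2.94, 0.91          # COCOMO II.2000 calibration constants
NOMINAL_SF_SUM = 18.97     # sum of the five scale factors at nominal ratings


def cocomo_pm(ksloc: float, ems: dict[str, float],
              sf_sum: float = NOMINAL_SF_SUM) -> float:
    """Effort in person-months: PM = A * Size^E * product(EM)."""
    exponent = B + 0.01 * sf_sum
    return A * ksloc ** exponent * prod(ems.values())


# Best-case (lowest) end of each range; all other EMs assumed nominal (1.0).
classical_best = {"TOOL": 0.78, "APEX": 0.81, "PLEX": 0.85}
ai_era_best = {"TOOL": 0.65, "APEX": 0.70, "PLEX": 0.75}

size = 50.0  # KSLOC, hypothetical project
pm_classical = cocomo_pm(size, classical_best)
pm_ai = cocomo_pm(size, ai_era_best)

print(f"classical best case: {pm_classical:.0f} PM")
print(f"AI-era best case:    {pm_ai:.0f} PM")
print(f"ratio: {pm_ai / pm_classical:.2f}")
```

Because size and the exponent are identical in both calls, the ratio depends only on the three multipliers (about 0.64 here). The proposed ranges change the best case by roughly a third without touching the model's structure.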

Discussion · 10 minutes
A vendor claim, stress-tested

Vendor X claims their AI tool delivers "55% productivity gains". Where would you push back?

In pairs (4 min), interrogate the claim. What are the five questions you must ask before believing it?

  • Sample. Junior-heavy team? Tasks selected for AI strengths?
  • Metric. LOC/hour? PRs merged? Self-reported?
  • Counterfactual. Compared to what — last quarter, without AI, or a control group?
  • Duration. One-week pilot or a full quarter? Was the novelty boost still in play?
  • Hidden costs. Verification time, integration overhead, model-drift re-work — included?

Part Five
V. A case study: real adoption, real numbers.

§ V · Case study
A 50-engineer SaaS company adopts AI coding tools (2026 data)

Mid-market SaaS — full ROI analysis.

Item | Per year
Licences (50 × $25/mo) | −$15,000
Token spend (50 × $80/mo avg) | −$48,000
Adoption / training (one-time, year 1) | −$24,000
Productivity gain, juniors (15 × +20% × $90K) | +$270,000
Productivity gain, mid (20 × +10% × $130K) | +$260,000
Productivity LOSS, seniors (15 × −19% × $180K) | −$513,000
Net Year 1 | −$70,000

A naive headline of "55% productivity" would have projected $500K+ in gains. Disaggregated by seniority, the project is negative in year 1. Year 2 may turn positive once seniors adapt their workflow.
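
The table's arithmetic can be reproduced in a few lines. This is a sketch that only re-adds the figures given above; valuing a productivity delta at salary cost is the table's own simplification, and the variable names are illustrative.

```python
# Line items from the case-study table (USD per year).
HEADCOUNT = 50
licences = -HEADCOUNT * 25 * 12        # $25 per seat per month
tokens = -HEADCOUNT * 80 * 12          # $80 average per seat per month
training = -24_000                     # one-time, year 1

# (engineers, productivity delta, salary) per seniority band
BANDS = {
    "junior": (15, 0.20, 90_000),
    "mid": (20, 0.10, 130_000),
    "senior": (15, -0.19, 180_000),
}


def band_value(n: int, delta: float, salary: int) -> float:
    """Productivity change for one band, valued at salary cost."""
    return n * delta * salary


gains = {name: band_value(*row) for name, row in BANDS.items()}
net_year1 = licences + tokens + training + sum(gains.values())

for name, value in gains.items():
    print(f"{name:>6}: {value:+,.0f}")
print(f"net year 1: {net_year1:+,.0f}")
```

The total matches the table's −$70,000. Keeping the bands in one dictionary makes it easy to rerun the model with a different seniority mix.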

§ V · Mitigations
How to make the senior tax shrink

Closing the senior productivity gap.

  • Restrict AI to leveraged tasks — code review preparation, doc generation, exploratory prototyping. Avoid on critical-path security or performance code.
  • Build verification habits — automated tests of AI output, prompt templates with built-in invariant checks.
  • Pair seniors with juniors — junior uses AI, senior verifies. Captures both populations' strengths.
  • Track outcomes, not lines — measure features shipped and defects avoided, not LOC.

Year-2 ROI typically turns positive once teams build deliberate AI-usage patterns. Year-1 ROI is mostly a measure of change-management quality, not tool quality.
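
One way to quantify how far the senior tax has to shrink is to solve the case-study model for the senior delta at which year 1 breaks even. The sketch below uses only the figures from the case-study table; the −10% scenario is a hypothetical, not a measured result.

```python
# Figures from the case-study table (USD, year 1).
COSTS = 15_000 + 48_000 + 24_000            # licences + tokens + training
NON_SENIOR_GAINS = 270_000 + 260_000        # juniors + mid-level
SENIOR_PAYROLL = 15 * 180_000


def net_year1(senior_delta: float) -> float:
    """Year-1 net for a given senior productivity delta (e.g. -0.19)."""
    return NON_SENIOR_GAINS - COSTS + senior_delta * SENIOR_PAYROLL


def breakeven_senior_delta() -> float:
    """Senior delta at which the year-1 net is exactly zero."""
    return -(NON_SENIOR_GAINS - COSTS) / SENIOR_PAYROLL


be = breakeven_senior_delta()
print(f"breakeven senior delta: {be:+.1%}")
print(f"net at -19%: {net_year1(-0.19):+,.0f}")
print(f"net at -10%: {net_year1(-0.10):+,.0f}")   # hypothetical scenario
```

In this model the breakeven sits near −16.4%, so trimming the senior slowdown by under three percentage points is enough to flip the sign, with nothing else changed.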

Bridge · To Lecture 13

Tomorrow: the cost side of AI products.

Today we looked at how AI changes the cost of building software. Tomorrow we look at how AI changes the cost of running it — tokens, caching, agent multipliers, payback periods.

Homework · Due Lecture 14
Homework 12 — due Tuesday 16 June

An honest AI ROI memo.

  1. Build a year-1 ROI model for adopting an AI coding tool at a hypothetical 30-person company. Disaggregate by seniority — at least three buckets.
  2. Identify the breakeven assumption (the one input that, if it changed, would flip the sign).
  3. One paragraph: what change-management investment would you propose alongside the tool, and what's its NPV?
Recap · What to remember

What today bought you.

  1. Function Points still measure scope correctly. Downstream conversions need re-calibration.
  2. The productivity paradox: juniors faster, seniors slower. Headline gains hide redistribution.
  3. The right question is not "should we adopt AI?" but "for which tasks, by which engineers, with what verification protocol?"
End · Lecture Twelve

Questions & conversation.

Dr. Zhijiang Chen
Software Engineering Economics · Summer 2026
frostburg-state-university.github.io/bju