Lecture Twelve · 14 June 2026

12 · Lecture Twelve

AI Topic I —
Estimation &
Productivity Disruption

What changes about software economics when machines write much of the code — and what stubbornly doesn't.

Instructor

Dr. Zhijiang Chen

Session

No. 12 of 16

Date

14 June 2026

Room

SY109

Duration

110 minutes

Format

Lecture + Empirics

First of two AI lectures. Today: estimation. Tomorrow: token economics & ROI.

Lecture XII · Agenda · 02 / 20

Today's Plan

Re-calibrating SEE for the AI era.

Lectures 1–11 were the canon — written before LLMs. Today we ask what survives, what breaks, and what's new.

§	Topic	Minutes
I.	Why classical estimation breaks under AI assistance	15
II.	The productivity paradox: junior +10–30%, senior −19%	20
III.	New productivity metrics for AI-augmented teams	20
IV.	Adapting COCOMO II for AI assistance	15
—	Discussion: stress-test productivity claims	10
V.	Real case study: AI coding tool ROI in a 50-person org	20
	HW12, questions	10

Part — One

Why classical
estimation breaks.

§ I · What breaks · 04 / 20

Three classical assumptions, three failures

Three pillars, all wobbling.

Classical assumption	Why it breaks under AI assistance
LOC ≈ effort	AI generates working code in minutes that would have taken hours. The LOC-to-PM curve is no longer stable.
FP ≈ scope	Function Points still measure scope correctly, but the conversion from FP to KSLOC, and from KSLOC to PM, has changed.
COCOMO calibration	The model was calibrated against 161 projects from a non-AI era. Effort multipliers (TOOL, APEX, PLEX) no longer span the right range.

Function Points still count the system's specification correctly. The downstream conversion to effort is what needs recalibration.

§ I · Hidden costs · 05 / 20

The new effort categories AI introduces

What AI assistance moves, not removes.

AI doesn't eliminate effort — it relocates it. Classical models miss these new categories:

Prompt engineering — crafting the prompts that drive code generation.
Verification — reading, testing, and validating generated code.
Integration overhead — fitting generated snippets into real architectures.
Drift management — re-validating outputs as model versions change.
Hallucination triage — catching plausible-looking but incorrect AI output.

A senior engineer can spend more time verifying AI output than they would have spent writing the code by hand. This is the verification overhead — the central puzzle of today's lecture.

Part — Two

The productivity paradox —
different effects for
different engineers.

§ II · Empirical data · 07 / 20

2026 productivity benchmarks

One tool, three populations, three answers.

Population	Productivity delta	Why
Junior developers (0–2 yrs)	+10 to +30%	AI fills knowledge gaps; reduces idle research time.
Mid-level (3–6 yrs)	+5 to +15%	AI accelerates boilerplate; modest verification cost.
Senior (7+ yrs)	−19%	Verification overhead exceeds time saved on routine code.

Source: 2026 enterprise studies (Larridin Benchmarks; Exceeds.ai productivity paradox report). The senior slowdown is reproducible — not a statistical fluke.

"AI raises team productivity" is a true headline that hides a within-team rearrangement: AI lifts your junior staff and slows your senior staff. If your team is half senior, your headline gain is much smaller than the vendor claims.
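To see how much the population mix matters, a minimal Python sketch can weight each population's delta by its share of headcount. It assumes the midpoints of the ranges in the benchmark table (+20% junior, +10% mid-level, −19% senior) and a hypothetical team that is half senior; the name `blended_delta` is illustrative, and headcount weighting ignores salary differences (the case study in Part Five weights by salary instead).

```python
# Minimal sketch: headcount-weighted productivity delta for a mixed team.
# Assumes the midpoint of each range in the benchmark table; these are
# illustrative inputs, not measured values for any particular team.

DELTA_BY_LEVEL = {
    "junior": 0.20,   # midpoint of +10% to +30%
    "mid": 0.10,      # midpoint of +5% to +15%
    "senior": -0.19,  # reported senior slowdown
}


def blended_delta(mix: dict[str, float]) -> float:
    """Return the team-level delta, weighting each level by its headcount share."""
    total = sum(mix.values())
    if total <= 0:
        raise ValueError("mix must contain at least one engineer")
    return sum(share / total * DELTA_BY_LEVEL[level] for level, share in mix.items())


if __name__ == "__main__":
    # Hypothetical team: half senior, the rest split evenly.
    half_senior = {"junior": 0.25, "mid": 0.25, "senior": 0.50}
    print(f"{blended_delta(half_senior):+.1%}")  # -2.0%
```

Under these assumptions a half-senior team sees a small net loss, even though two of its three populations get faster.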

§ II · Why seniors slow down · 08 / 20

Verification overhead in detail

The senior engineer's tax.

A senior engineer doesn't accept AI output the way a junior does. They:

Read every generated line to confirm correctness.
Mentally check for security and performance issues the AI may have missed.
Verify the suggested approach matches the system's architecture.
Run extra tests because trust is not yet earned.

For a routine task, this verification can take longer than typing the code from memory would have. The net effect is negative — until the engineer either skips verification (risky) or learns to use AI on tasks where it has the most leverage (rarely the routine ones).

The economic question is not "should we adopt AI tools?" but "for which tasks, by which engineers, with what verification protocol?"
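The verification tax can be framed as a per-task time comparison: AI assistance pays off only when prompting plus verification takes less time than writing the code by hand. The Python sketch below encodes that inequality; the `TaskEstimate` class, the task names, and the minute values are hypothetical, chosen to illustrate the routine-versus-leveraged contrast rather than drawn from measurements.

```python
# Minimal sketch: does AI assistance save time on a given task?
# All minute values are hypothetical illustrations, not benchmark data.
from dataclasses import dataclass


@dataclass
class TaskEstimate:
    name: str
    hand_minutes: float    # time to write the code without AI
    prompt_minutes: float  # time spent crafting and iterating on prompts
    verify_minutes: float  # time reading, testing, and fixing generated code

    def ai_minutes(self) -> float:
        return self.prompt_minutes + self.verify_minutes

    def net_saving(self) -> float:
        """Positive means AI saves time; negative means the verification tax wins."""
        return self.hand_minutes - self.ai_minutes()


if __name__ == "__main__":
    tasks = [
        # Routine code a senior can type from memory: verification dominates.
        TaskEstimate("routine CRUD endpoint", hand_minutes=20, prompt_minutes=5, verify_minutes=25),
        # Unfamiliar, exploratory work: generated code replaces research time.
        TaskEstimate("prototype in new library", hand_minutes=180, prompt_minutes=20, verify_minutes=60),
    ]
    for t in tasks:
        print(f"{t.name}: net saving {t.net_saving():+.0f} min")
```

The other relocated categories from § I (integration overhead, drift management, hallucination triage) would be further terms in the same sum, which only makes the routine case worse.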

Part — Three

New metrics —
what to measure
when LOC is no longer trustworthy.

§ III · Outcome metrics · 10 / 20

Five replacement candidates

From LOC to outcomes.

Metric	What it captures	Risk
PR-to-production cycle time	Speed from commit to deployment.	Cycle-time gaming.
Defect-free deploy frequency	Reliable changes per period.	Encourages small, safe changes.
Customer-visible features shipped	Value delivered.	Defining "feature" is hard.
Tickets resolved per engineer	Concrete progress on backlog.	Penalises hard problems.
Engineer-reported flow time	Subjective effectiveness.	Self-report bias.

No single replacement for LOC works alone. Use 2–3 metrics together, and re-evaluate quarterly — Goodhart's law applies fast.
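As an illustration of tracking two of these metrics side by side, the Python sketch below computes median PR-to-production cycle time and defect-free deploy frequency from a list of deploy records. The `Deploy` record and its fields (`committed_at`, `deployed_at`, `caused_incident`) and the sample data are hypothetical; a real team would pull them from its own CI/CD and incident tooling.

```python
# Minimal sketch: two outcome metrics computed from hypothetical deploy records.
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deploy:
    committed_at: datetime   # first commit in the change
    deployed_at: datetime    # when the change reached production
    caused_incident: bool    # True if the deploy was rolled back or caused a defect


def median_cycle_time(deploys: list[Deploy]) -> timedelta:
    """Median time from commit to production across all deploys."""
    return median(d.deployed_at - d.committed_at for d in deploys)


def defect_free_deploys_per_week(deploys: list[Deploy], weeks: float) -> float:
    """Deploys that caused no incident, normalised to a weekly rate."""
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    return sum(not d.caused_incident for d in deploys) / weeks


if __name__ == "__main__":
    t0 = datetime(2026, 6, 1, 9, 0)
    sample = [
        Deploy(t0, t0 + timedelta(hours=4), caused_incident=False),
        Deploy(t0, t0 + timedelta(hours=30), caused_incident=True),
        Deploy(t0, t0 + timedelta(hours=8), caused_incident=False),
    ]
    print(median_cycle_time(sample))                        # 8:00:00
    print(defect_free_deploys_per_week(sample, weeks=1.0))  # 2.0
```

Pairing a speed metric with a quality metric means gaming one tends to show up in the other, which is the point of using 2–3 metrics together.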

Part — Four

Adapting COCOMO II —
which knobs to turn.

§ IV · Effort multiplier adjustment · 12 / 20

A proposed re-calibration

The three EMs to modify.

Effort Multiplier	Classical range	Proposed AI-era range
TOOL — Tool support	0.78 – 1.17	0.65 – 1.10 (better tools shift the range down and widen it)
APEX — Application experience	0.81 – 1.22	0.70 – 1.30 (AI fills gaps; widens range)
PLEX — Platform experience	0.85 – 1.19	0.75 – 1.20 (same direction)

A new EM could be added: AIUSE (effective use of AI assistance), rated by team practice. Calibration data are not yet sufficient; this is a research direction, not a recommendation.

Adjust EMs, but keep the model's structure. The power-law in size and the scale factors still describe reality. The AI era changes coefficients, not equations.
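To make "coefficients, not equations" concrete, the Python sketch below keeps the COCOMO II post-architecture form, PM = A × Size^E × ∏EM with E = B + 0.01 × ΣSF, and only swaps the multiplier values. It uses the COCOMO II.2000 constants (A = 2.94, B = 0.91) and nominal scale factors; the 50 KSLOC size, the assumption that the low end of each range corresponds to a Very High rating, and the function name `cocomo_effort` are hypothetical, and the AI-era values are the uncalibrated proposals from the table.

```python
# Minimal sketch: COCOMO II post-architecture effort with swapped effort multipliers.
# A, B and the nominal scale factors are the COCOMO II.2000 values; the "ai_era"
# multipliers are the proposed (uncalibrated) ranges from the slide.
from math import prod

A, B = 2.94, 0.91
NOMINAL_SCALE_FACTORS = {"PREC": 3.72, "FLEX": 3.04, "RESL": 4.24, "TEAM": 3.29, "PMAT": 4.68}

# Assumption: the low end of each range is the Very High (most favourable) rating.
VERY_HIGH_EMS = {
    "classical": {"TOOL": 0.78, "APEX": 0.81, "PLEX": 0.85},
    "ai_era": {"TOOL": 0.65, "APEX": 0.70, "PLEX": 0.75},
}


def cocomo_effort(ksloc: float, scale_factors: dict[str, float], ems: dict[str, float]) -> float:
    """Person-months = A * Size^E * product(EMs), with E = B + 0.01 * sum(SF).

    Effort multipliers not listed are treated as Nominal (1.0).
    """
    exponent = B + 0.01 * sum(scale_factors.values())
    return A * ksloc ** exponent * prod(ems.values())


if __name__ == "__main__":
    size = 50.0  # hypothetical 50 KSLOC system
    for label, ems in VERY_HIGH_EMS.items():
        pm = cocomo_effort(size, NOMINAL_SCALE_FACTORS, ems)
        print(f"{label:>9}: {pm:6.1f} person-months")
```

Only the multiplier dictionaries differ between the two runs; the size exponent and scale factors are untouched. A hypothetical AIUSE multiplier would be one more dictionary entry once calibration data exist.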

Discussion · 10 minutes · 13 / 20

A vendor claim, stress-tested

⚖

Vendor X claims their AI tool delivers "55% productivity gains". Where would you push back?

In pairs (4 min), interrogate the claim. What are the five questions you must ask before believing it?

Sample. Junior-heavy team? Tasks selected for AI strengths?
Metric. LOC/hour? PRs merged? Self-reported?
Counterfactual. Compared to what — last quarter, without AI, or a control group?
Duration. One-week pilot or a full quarter? Was the novelty boost still in play?
Hidden costs. Verification time, integration overhead, model-drift re-work — included?

Part — Five

A case study —
real adoption,
real numbers.

§ V · Case study · 15 / 20

A 50-engineer SaaS company adopts AI coding tools (2026 data)

Mid-market SaaS — full ROI analysis.

Item	Per year
Licences (50 × $25/mo)	−$15,000
Token spend (50 × $80/mo avg)	−$48,000
Adoption / training (one-time, year 1)	−$24,000
Productivity gain — juniors (15 × +20% × $90K)	+$270,000
Productivity gain — mid (20 × +10% × $130K)	+$260,000
Productivity LOSS — seniors (15 × −19% × $180K)	−$513,000
Net Year 1	−$70,000

A naive headline of "55% productivity" would have projected well over $500K in gains (applied to the $6.65M engineering payroll, 55% is about $3.7M). Disaggregated by seniority, the project is negative in year 1. Year 2 may turn positive once seniors adapt their workflow.
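The table's arithmetic, and the kind of breakeven question HW12 asks about, can be reproduced with a small Python model. It uses the case-study inputs; treating year 2 as year 1 minus the one-time adoption cost with unchanged productivity deltas is a simplifying assumption, and the names `Bucket`, `net_value`, and `breakeven_senior_delta` are illustrative.

```python
# Minimal sketch of the case-study ROI model, disaggregated by seniority.
# Inputs mirror the slide; year 2 simply drops the one-time adoption cost.
from dataclasses import dataclass


@dataclass
class Bucket:
    name: str
    headcount: int
    delta: float   # productivity change as a fraction, e.g. 0.20 for +20%
    salary: float  # annual salary per engineer in USD, as used on the slide

    def value(self) -> float:
        return self.headcount * self.delta * self.salary


HEADCOUNT = 50
LICENCE_PER_MONTH = 25.0
TOKENS_PER_MONTH = 80.0
ADOPTION_ONE_TIME = 24_000.0

BUCKETS = [
    Bucket("junior", 15, 0.20, 90_000),
    Bucket("mid", 20, 0.10, 130_000),
    Bucket("senior", 15, -0.19, 180_000),
]


def annual_tool_cost() -> float:
    return HEADCOUNT * 12 * (LICENCE_PER_MONTH + TOKENS_PER_MONTH)


def net_value(buckets: list[Bucket], year: int) -> float:
    """Net annual value; the one-time adoption cost is charged only in year 1."""
    cost = annual_tool_cost() + (ADOPTION_ONE_TIME if year == 1 else 0.0)
    return sum(b.value() for b in buckets) - cost


def breakeven_senior_delta(buckets: list[Bucket], year: int) -> float:
    """Senior delta at which net value is exactly zero, all else held fixed."""
    seniors = next(b for b in buckets if b.name == "senior")
    others = [b for b in buckets if b is not seniors]
    # net_value(others, year) is the non-senior value minus all costs.
    return -net_value(others, year) / (seniors.headcount * seniors.salary)


if __name__ == "__main__":
    for year in (1, 2):
        print(f"Year {year}: net {net_value(BUCKETS, year):+,.0f} USD, "
              f"breakeven senior delta {breakeven_senior_delta(BUCKETS, year):+.1%}")
    # Year 1: net -70,000 USD, breakeven senior delta -16.4%
    # Year 2: net -46,000 USD, breakeven senior delta -17.3%
```

Under these assumptions the breakeven senior delta sits within three percentage points of the assumed −19%, so that single input decides the sign of the result.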

§ V · Mitigations · 16 / 20

How to make the senior tax shrink

Closing the senior productivity gap.

Restrict AI to leveraged tasks — code review preparation, doc generation, exploratory prototyping. Avoid it on critical-path security or performance code.
Build verification habits — automated tests of AI output, prompt templates with built-in invariant checks.
Pair seniors with juniors — junior uses AI, senior verifies. Captures both populations' strengths.
Track outcomes, not lines — measure features shipped and defects avoided, not LOC.

Year-2 ROI typically turns positive once teams build deliberate AI-usage patterns. Year-1 ROI is mostly a measure of change-management quality, not tool quality.

Bridge · To Lecture 13 · 17 / 20

Tomorrow: the cost side of AI products.

Today we looked at how AI changes the cost of building software. Tomorrow we look at how AI changes the cost of running it — tokens, caching, agent multipliers, payback periods.

Homework · Due Lecture 14 · 18 / 20

Homework 12 — due Tuesday 16 June

An honest AI ROI memo.

Build a year-1 ROI model for adopting an AI coding tool at a hypothetical 30-person company. Disaggregate by seniority — at least three buckets.
Identify the breakeven assumption (the one input that, if it changed, would flip the sign).
One paragraph: what change-management investment would you propose alongside the tool, and what's its NPV?

Recap · What to remember · 19 / 20

What today bought you.

Function Points still measure scope correctly. Downstream conversions need re-calibration.
The productivity paradox: juniors faster, seniors slower. Headline gains hide redistribution.
The right question is not "should we adopt AI?" but "for which tasks, by which engineers, with what verification protocol?"

End · Lecture Twelve · 20 / 20

Questions & conversation.

Dr. Zhijiang Chen
Software Engineering Economics · Summer 2026
frostburg-state-university.github.io/bju
