Lecture 8 — Cost Estimation I: Size Metrics & Function Points¶
Session Info
Date: Wednesday, 2026-06-10 · Time: 16:20–18:10 · Room: YF302 Instructor: Dr. Zhijiang Chen
▶ Open the lecture slides — 110-minute web deck
Using the slide deck
The slides are a self-contained web presentation (Reveal.js, no install). Keyboard shortcuts: F for full-screen, S for speaker view, Esc for slide overview, arrow keys to navigate. Designed for a single 110-minute session.
Learning Objectives¶
By the end of this lecture, you should be able to:
- Explain why size estimation is unavoidable for any project that has a budget, a schedule, or a price — and why it is empirically hard, with industry error ranges of 2× to 4×.
- State the strengths and weaknesses of Lines of Code (LOC) as a size metric, and explain why the rise of AI-assisted code generation in 2025–2026 has broken the LOC ↔ effort link that 1980s estimation relied on.
- Count Unadjusted Function Points (UFP) for a small specification by identifying the five IFPUG function types — EI, EO, EQ, ILF, EIF — and applying the standard low/average/high complexity weights.
- Apply the Value Adjustment Factor (VAF) using the 14 General System Characteristics to convert UFP into Adjusted Function Points (AFP).
- Compare Use-Case Points (Karner 1993) to Function Points and choose one consistently across a project's alternatives.
- Recognise and counter the three dominant estimation biases — optimism, anchoring, and the planning fallacy — and explain why a "method without a bias filter" is still a wrong number.
- Read the Group Project brief released today and assemble a 3–4 person team with a candidate product by Lecture 11.
Lecture Map — Six Movements, One Number¶
By the end of today you should be able to look at any software specification — yours or someone else's — and produce a defensible number for its size, knowing exactly which assumptions you have made and where the number is fragile.
| § | Topic | Minutes |
|---|---|---|
| I | Why estimate? Why is it hard? | 10 |
| II | LOC — uses and abuses | 10 |
| III | IFPUG Function Points — the five types and the worked count | 30 |
| IV | Value Adjustment Factor (VAF) and the 14 GSC | 10 |
| V | Use-Case Points and modern variants | 10 |
| — | Discussion — "How many Function Points is WeChat?" | 10 |
| VI | Estimation biases — and the group project | 25 |
| — | Homework 8, Q&A | 5 |
Prerequisite from Lectures 2–4
This lecture turns the cost vocabulary of Lecture 2 and the cash-flow shapes of Lecture 4 into a number you can feed into a spreadsheet. From Lecture 9 onward (COCOMO II, NPV/IRR comparisons), every model assumes you have a size estimate. Today is where that number first appears.
§ I. Why Estimate — and Why We Are So Bad At It¶
Every economic model in this course begins with a number that someone made up. Net present value, internal rate of return, payback, profitability index — all of them take a cost stream as input. Without estimation, there is no economics. And yet, estimation is one of the few engineering activities where being wrong by a factor of two is considered routine.
Four reasons estimation is unavoidable¶
| Reason | What we need a number for | Failure mode if we skip it |
|---|---|---|
| Plan the work | Schedule, staffing, dependency graph, critical path | "When will it ship?" → an honest answer is "I have no idea" |
| Budget the work | Funding requests, capital allocation, runway planning | Out-of-money before out-of-features |
| Price the work | Fixed-price contracts, internal chargeback, customer quotes | Sign a contract that loses money on every unit |
| Manage the work | Track progress vs plan; signal when to escalate | No baseline → no variance → no early warning |
Even a wrong estimate, used honestly, is more useful than no estimate. The estimate is the baseline; the deviation from the baseline is the signal that lets a manager act. A team with no estimate has no signal — they only know they are late after the deadline has already passed.
Why it is so hard — Brooks's verdict¶
"More software projects have gone awry for lack of calendar time than for all other causes combined." — Fred Brooks, The Mythical Man-Month (1975)
Brooks identified five reasons software estimation is structurally harder than estimating, say, a bridge:
- Software is invented, not assembled. Every project does something at least slightly novel; calibration data is thinner than in civil engineering.
- Effort is non-linear in size. Doubling the size more than doubles the effort, because communication paths grow as \(n(n-1)/2\).
- Productivity varies 10×–25× between engineers — yet estimators routinely assume an "average" rate.
- Requirements churn during the build. The thing you are estimating is changing while you estimate it.
- Optimism is endemic. The person making the estimate is usually the person who wants the project to happen.
Industry studies (Standish Group, McConnell, Jørgensen) consistently report estimation errors of 2× to 4× as typical, and 10× as not unheard of. The honest position: an estimate is a probability distribution, not a single number — and our job today is to produce a distribution we can defend.
§ II. Lines of Code — Uses and Abuses¶
The oldest size metric in software is Lines of Code (LOC), usually counted as non-comment, non-blank source lines, sometimes called SLOC or KSLOC (thousands of source lines). It is also the metric that has caused the most harm.
Strengths and weaknesses¶
| Strengths | Weaknesses |
|---|---|
Trivial to count after the fact with a wc -l and a comment filter. |
Rewards verbosity. A 20-line solution scores worse than a 200-line equivalent. |
| Decades of calibration data. COCOMO, COCOMO II, SLIM, and most pre-2000 cost models speak LOC fluently. | Language-dependent. 100 lines of Rust does very different work from 100 lines of Python or 100 lines of COBOL. |
| Useful as the input to COCOMO II (Lecture 9). | Says nothing about feature value, complexity, or user-visible behaviour. |
| Easy to communicate to non-technical stakeholders. | You cannot count LOC of code that has not been written. As an upstream estimate, it is guessing. |
The AI-era complication — the LOC ↔ effort link is broken¶
For forty years, COCOMO-style models assumed a roughly stable conversion: so many engineer-months produce so many KSLOC, and so many KSLOC require so many engineer-months. That relationship was always noisy, but it was usable.
In 2026, the relationship has been broken by AI-assisted code generation. A pair of typical observations on real teams:
- A developer using a competent coding assistant can produce 3× to 10× more LOC per hour on routine code — boilerplate, CRUD endpoints, test scaffolding.
- The same developer produces roughly the same amount of LOC per hour on novel, architectural, or debugging-heavy work.
- A concise human author may now ship fewer LOC than ever to deliver a given feature, because the assistant suggests refactors that reduce duplication.
The result: KSLOC has become an even worse proxy for effort than it was. A team that ships 50 KSLOC in 2026 might have done two months of work, or eight months of work, or six weeks of work with extensive AI generation. The number alone tells you nothing.
Working rule for this course
Use LOC as a downstream output that feeds COCOMO II — never as an upstream estimate of unwritten code. Estimating in LOC before any code exists is guessing dressed up as engineering.
This is the deep reason Function Points exist: they measure what the system does, not how many lines it takes to do it.
§ III. IFPUG Function Points — Size From the Specification¶
In 1979, Allan Albrecht at IBM proposed a size metric that is language-neutral and computable from a specification before any code is written. The International Function Point Users Group (IFPUG) standardised it, and the IFPUG counting practices manual is still maintained today.
What you count: five function types¶
The IFPUG count enumerates five categories of system function, then weights each by complexity (low / average / high).
| Type | What it is | Examples | Weight (L/A/H) |
|---|---|---|---|
| EI — External Input | Data crossing the system boundary into the application, maintaining an internal file. | "Add book", "submit order", "update profile" | 3 / 4 / 6 |
| EO — External Output | Data leaving the system that involves derivation (calculations, summaries, formatting). | Monthly sales report, computed analytics dashboard | 4 / 5 / 7 |
| EQ — External Inquiry | A direct read that returns data without derivation. | "Search catalogue", "look up customer by ID" | 3 / 4 / 6 |
| ILF — Internal Logical File | A logical group of data the application maintains itself. | Customer master, order table, audit log | 7 / 10 / 15 |
| EIF — External Interface File | A logical group of data the application reads but does not maintain — owned by another system. | Currency-rate feed, partner catalogue, third-party identity provider | 5 / 7 / 10 |
Complexity (L/A/H) is determined by data element types (DETs) and record/file types (RETs/FTRs) referenced — the IFPUG manual has lookup tables, but for course purposes the average weight is the safe default unless a function is clearly trivial or clearly elaborate.
The Unadjusted Function Point (UFP) total is simply:
Worked count — a small library-management module¶
We illustrate with an eight-function library module — small enough to read in one screen, complete enough to exercise every function type.
| # | Function | Type | Complexity | Weight | UFP |
|---|---|---|---|---|---|
| 1 | Add book | EI | average | 4 | 4 |
| 2 | Search catalogue | EQ | average | 4 | 4 |
| 3 | Borrow book | EI | average | 4 | 4 |
| 4 | Generate monthly report | EO | high | 7 | 7 |
| 5 | Catalogue (book master) | ILF | average | 10 | 10 |
| 6 | Member roster | ILF | low | 7 | 7 |
| 7 | Borrowing log | ILF | average | 10 | 10 |
| 8 | External vendor catalogue feed | EIF | average | 7 | 7 |
| Total UFP | 53 |
How to read the count.
- Add book and Borrow book are EI — data flows in across the boundary and maintains an internal file (the catalogue and the borrowing log). Average DET/FTR counts → weight 4.
- Search catalogue is EQ — direct read, no derivation, no calculation. Weight 4.
- Monthly report is EO, not EQ, because it derives new data: totals, averages, top-borrower lists. Elaborate → weight 7 (high).
- Catalogue, Member roster, Borrowing log are ILF — the system owns and updates them. The catalogue has many DETs (title, author, ISBN, year, publisher, shelf …) so it earns the average weight 10; the member roster is leaner → low, weight 7.
- Vendor catalogue feed is EIF, not ILF, because the library only reads it; the vendor owns it. Weight 7 (average).
The total is 53 UFP — a small module. For perspective, a typical departmental web app runs 200–500 UFP; a serious enterprise system runs 2,000–10,000 UFP.
Why this works as an estimate¶
The genius of Function Points is that every quantity in the count comes from the specification — no code required. If you have a use-case list, a screen mockup, a data dictionary, and an interface inventory, you have everything you need. That is why FP estimates can be made at the proposal stage, when LOC estimates are pure fantasy.
§ IV. Value Adjustment Factor — Tuning the Count for Context¶
The UFP captures what the system does. It does not capture how the system has to do it — whether it must be real-time, distributed, highly available, easy to install, or change-friendly. The Value Adjustment Factor (VAF) is IFPUG's correction for these context factors.
The formula¶
Each of the 14 General System Characteristics is rated 0 (no influence) through 5 (essential). The sum ranges 0–70, so VAF ranges from 0.65 (no characteristic matters) to 1.35 (every characteristic is essential) — a ±35% adjustment.
The 14 General System Characteristics¶
| # | GSC | What it asks |
|---|---|---|
| 1 | Data communications | How much data crosses networks? |
| 2 | Distributed data processing | Is processing split across nodes? |
| 3 | Performance | Are response-time targets stringent? |
| 4 | Heavily used configuration | Does the system run hot — saturated infra? |
| 5 | Transaction rate | High volumes per second/day? |
| 6 | Online data entry | Interactive forms vs batch? |
| 7 | End-user efficiency | Is UX optimisation a priority? |
| 8 | Online update | Real-time maintenance of ILFs? |
| 9 | Complex processing | Heavy logic, math, state machines? |
| 10 | Reusability | Is code intended for reuse elsewhere? |
| 11 | Installation ease | Must installation be turnkey? |
| 12 | Operational ease | Easy to operate, backup, recover? |
| 13 | Multiple sites | Many deployments, many environments? |
| 14 | Facilitate change | Is the system designed for evolution? |
Worked AFP for the library module¶
Suppose for our library module the GSC ratings sum to 35 (a fairly average system: moderate performance, some online efficiency, some reusability, mild change pressure). Then:
By construction, VAF = 1.00 means a "typical" system. A safety-critical, high-throughput, multi-site rewrite might score VAF = 1.30, pushing 53 UFP to 69 AFP. A simple internal tool might score VAF = 0.80, pulling 53 UFP down to 42 AFP.
Modern IFPUG counts the GSCs as optional
Newer IFPUG releases (4.3+) treat VAF as informational rather than mandatory — many counters report UFP and leave context adjustment to the downstream effort model. For this course, keep VAF in the workflow: it forces you to name the context factors, which is itself a useful discipline.
§ V. Use-Case Points — A Brief Comparison¶
In 1993, Gustav Karner at Objectory AB (later part of Rational) proposed Use-Case Points (UCP) as an FP analogue for OO projects whose specification lives in use cases and class diagrams rather than data-flow diagrams. UCP has the same shape as FP — count something, weight it, then adjust.
How UCP differs from FP¶
| Function Points (IFPUG) | Use-Case Points (Karner) | |
|---|---|---|
| What you count | Functions: EI, EO, EQ, ILF, EIF | Actors (simple/average/complex by interaction style) + use cases (by number of transactions/steps) |
| Specification artefact assumed | Data-oriented spec, screen + file inventory | Use-case diagram + textual scenarios |
| Adjustment | VAF from 14 GSCs | TCF (Technical Complexity Factor, 13 items) × ECF (Environmental Complexity Factor, 8 items) |
| Final number | AFP | UUCP → UCP |
| Best fit | Information systems, CRUD-heavy, line-of-business apps | OO-modelled projects, customer-facing workflows, agile teams |
Empirically, FP and UCP correlate well within a single domain but are not directly interchangeable across domains. Different teams will get systematically different numbers using each method — that is fine as long as your alternatives are counted consistently.
For the group project
Choose FP or UCP, not both. Whichever you pick, use the same method on every alternative your team analyses. Consistency across alternatives matters far more than which method you chose.
§ VI. Estimation Biases — Why Method Alone Is Not Enough¶
A correct counting method, applied through an uncorrected optimism filter, is still a wrong number. Three biases reliably eat estimation accuracy.
1. Optimism bias¶
The estimator imagines the happy path — every API works, the team is fully staffed, requirements are stable, no one quits, the cloud bill is what the calculator says. Reality is the sum of small disasters: an integration that takes two weeks instead of two days, a security review that adds a sprint, a key engineer on parental leave. The output is consistently too low.
Counter: Reference-class forecasting (Kahneman, Flyvbjerg). Find five comparable past projects from your organisation or the literature. Use their actual durations as the base rate; adjust only modestly from there.
2. Anchoring bias¶
The first number spoken in the room — usually by the most senior person, often before any analysis — becomes the anchor. Subsequent analysis only nudges the estimate by 10–20%. If the CEO said "we ship by July", every estimate that follows is mysteriously close to "we ship by July."
Counter: Estimate before discussing. Each team member writes a number on an index card before hearing anyone else's. Reveal simultaneously. Discuss the spread, not the leader's number.
3. Planning fallacy¶
A specific case of optimism: the belief that "this time will be different" — this project won't suffer the integration delays, the scope creep, the requirements churn that every previous project suffered. It almost always does.
Counter: Add a buffer of 25%–50% to the bottom-up estimate and refuse to compress it without a structural reason. The buffer is not a fudge; it is the empirical adjustment for the planning fallacy. A team that delivers a "25%-buffered" estimate on time will be trusted next quarter; a team that hits the unbuffered estimate is just lucky.
The combined effect¶
In practice, these biases stack. A team that picks the happy-path number, anchored to a leadership target, with no buffer for "this time will be different", produces an estimate that is 2× to 4× too low — the canonical industry error range. The method (FP, UCP, COCOMO) explains some of the gap, but never all of it. Bias correction is half the work of estimation.
Group Project Brief — Released Today¶
Today, Wednesday 2026-06-10, your group-project assignment goes live. The project carries 30% of your course grade and is the place you will put every concept from Lectures 1–13 to work on a problem of your choice.
Team formation¶
- Teams of 3 to 4 students. Solo work and pairs are not accepted; teams of 5 require instructor approval.
- Teams self-organise. A signup sheet is on the course site and in this room today.
- Teams must be confirmed by Lecture 11 (Saturday 2026-06-13). Late team registration delays you from feedback in the Lec 11 office hours.
Step 1 — Choose a software product¶
The product can be real or hypothetical:
- An app, service, or platform you would build (e.g. a campus food-delivery app, an AI tutor, an open-source dev tool).
- An existing product you would re-engineer (e.g. modernise a 1990s ERP, port a desktop app to web).
- A specific internal tool for a real organisation (with the org's consent).
The product must be complex enough to require ≥ 200 Function Points and simple enough to specify in 2 pages. Avoid "build the next Facebook" — pick something narrower and analysable.
Step 2 — Required deliverables (all submitted as one PDF report)¶
| # | Deliverable | Lecture connection |
|---|---|---|
| 1 | Project description and scope (1–2 pages) — what it does, who uses it, what's out of scope. | Lec 1, Lec 8 |
| 2 | Size estimate using FP or COCOMO II (your choice). Show every function, weight, GSC. | Lec 8, Lec 9 |
| 3 | Cash-flow forecast for ≥ 2 alternatives. E.g. build-vs-buy, self-host-vs-SaaS, with-AI vs without-AI. | Lec 4 |
| 4 | NPV, IRR, payback, and PI for each alternative. State your MARR and justify it. | Lec 5, Lec 6 |
| 5 | Sensitivity or risk analysis. Tornado diagram, decision tree, or Monte Carlo on the two largest cost drivers. | Lec 7 |
| 6 | AI-cost bonus (optional, +10%) — model token spend, productivity uplift, and human-verification cost in your alternatives. | Lec 12, Lec 13 |
| 7 | Final recommendation with stakeholder justification (1 page). Who decides, on what basis, with what residual risk. | All |
Step 3 — Presentation¶
- Dates: Lectures 14 (Tuesday 2026-06-16) and 15 (Wednesday 2026-06-17). Half the teams present each day; slot allocation by random draw at Lec 11.
- Format per team: 12 minutes of presentation + 5 minutes of Q&A + 3 minutes of written peer evaluation by the class. Hard stop at 20 minutes total.
- Slides: any tool, but submit as PDF the morning of your slot.
Rubric¶
The project is graded on four criteria, each weighted 25%:
| Criterion | What earns full marks |
|---|---|
| Clarity of scope and stakeholders | Every reader can answer "what is this, for whom, and against what alternative?" within 60 seconds of reading. |
| Methodology rigour | FP/COCOMO count is reproducible. NPV/IRR computation is correct. MARR is stated and justified. Unit consistency is maintained. |
| Sensitivity and risk | At least two cost drivers tested. The team can articulate the break-even values where the decision flips. |
| Recommendation quality | The recommendation is defensible to a real stakeholder — not a hedge, not "it depends." Residual risks are named and accepted. |
Schedule at a glance¶
| Date | Milestone |
|---|---|
| Wed 2026-06-10 (today, Lec 8) | Brief released, teams begin forming. |
| Sat 2026-06-13 (Lec 11) | Teams confirmed. Office hours for product-scope check. |
| Mon 2026-06-15 (Lec 13) | Last in-class checkpoint; draft alternatives reviewed. |
| Tue 2026-06-16 (Lec 14) | First half of presentations. |
| Wed 2026-06-17 (Lec 15) | Second half of presentations. |
| Thu 2026-06-18 (Lec 16 — final) | Final report PDF due 09:00, before the final exam. |
The brief PDF, rubric, and signup form are at projects/group/brief.pdf on the course site.
Class Discussion — How Many Function Points Is WeChat?¶
Open floor, 10 minutes, in pairs.
WeChat is a familiar product to most of you and ridiculously over-scoped for a classroom exercise. Pick a target between you of 15 representative functions (you will miss thousands — that is the point). Tag each with one of the five IFPUG types and an honest complexity guess. Sum the UFP.
Suggested starting points (don't restrict yourselves to these):
| Function | Possible classification |
|---|---|
| Send text message | EI |
| View chat history | EQ |
| Make voice call | EI / EO |
| Pay via WeChat Pay | EI (high) |
| Friend list | ILF |
| Moments feed | EO |
| Mini-program launcher | EI + EQ |
| Currency rate (for cross-border pay) | EIF |
After 4 minutes, compare with the team next to you. Are your totals within 30%? If yes, your team's instincts are calibrated; if no, which functions did you classify differently? The exercise is not about the right answer — it is about discovering where your estimating intuition diverges from a peer's, so that you can negotiate a shared baseline before any money is at stake.
Homework 8 — Count Your Group Project¶
Due: Friday 2026-06-12, before Lecture 10. Late: −10% per day.
- Choose your group-project product (or a candidate, if your team is not finalised). Write a one-page scope — what it does, who uses it, what is out of scope.
- Enumerate the functions. Identify every EI, EO, EQ, ILF, EIF you can. Aim for at least 20 functions.
- Assign weights (L/A/H) for each function. Justify each non-average choice in one sentence.
- Compute UFP as a table; sum to a total.
- Apply VAF. Score the 14 GSCs for your system. Show the sum, the VAF, and the resulting AFP.
- Reflection (half page): which three functions in your count are you least confident about, and what evidence would resolve them?
Bring your AFP number to Lecture 9. We will convert AFP → KSLOC using language-conversion ratios, then run it through COCOMO II for an effort and schedule estimate.
Submission. A single PDF (spreadsheet export + the paragraph) committed to submissions/HW8/<your-name>.pdf.
Office hours. Tomorrow, 10:00–11:30, before Lecture 9.
Key Vocabulary — Quick Reference¶
| Term | Plain-language meaning |
|---|---|
| LOC / SLOC / KSLOC | Lines of source code, in ones or thousands; a lagging size measure. |
| Function Points (FP) | A language-neutral size measure computable from a specification. |
| UFP | Unadjusted Function Points — sum of weighted function counts. |
| AFP | Adjusted Function Points — UFP × VAF. |
| EI / EO / EQ | External Input / Output / Inquiry — three transaction function types. |
| ILF / EIF | Internal Logical File / External Interface File — two data function types. |
| VAF | Value Adjustment Factor; ranges 0.65–1.35; depends on 14 GSCs. |
| GSC | One of fourteen General System Characteristics, each rated 0–5. |
| UCP | Use-Case Points (Karner 1993); FP analogue for OO/use-case projects. |
| Optimism bias | The estimator imagines the happy path; the cure is reference-class forecasting. |
| Anchoring bias | The first number spoken sticks; the cure is silent simultaneous estimation. |
| Planning fallacy | "This time will be different"; the cure is a 25%–50% buffer. |
What Today Bought You¶
- LOC is a lagging measure, not a forward-looking one — and the AI-coding era has broken the link between LOC and effort that 1980s estimation depended on.
- Function Points let you size a system from its specification, before any code is written, in a language-neutral way. The five types — EI, EO, EQ, ILF, EIF — and the L/A/H weights are the entire vocabulary.
- VAF turns "what it does" into "what it does in this context." The 14 GSCs force you to name the non-functional pressures that an unadjusted count ignores.
- Bias is the dominant error. A correct method applied through an optimism filter is still a wrong number; reference-class forecasting and a buffer are part of the method.
- You now have the input to COCOMO II. Tomorrow's lecture turns your AFP into an effort estimate (person-months), a schedule, and a staffing curve.
Further Reading¶
- Albrecht, A. J. (1979). "Measuring application development productivity." IBM Conference on Application Development — the original Function Points paper.
- IFPUG (2010). Function Point Counting Practices Manual, Release 4.3.1 — the authoritative reference; available through IFPUG membership and major university libraries.
- Karner, G. (1993). "Resource estimation for objectory projects." Objectory Systems internal report — the original Use-Case Points proposal.
- Brooks, F. P. (1975, 1995 anniversary ed.). The Mythical Man-Month. Addison-Wesley — Chapter 2 on the optimism of programmers and Brooks's Law on adding people to a late project.
- Jørgensen, M., & Shepperd, M. (2007). "A systematic review of software development cost estimation studies." IEEE Trans. Software Engineering — meta-analysis of estimation accuracy across 304 studies.
- Boehm, B. W. (1981). Software Engineering Economics. Prentice Hall — Chapter 22 on size estimation.
- McConnell, S. (2006). Software Estimation: Demystifying the Black Art. Microsoft Press — practitioner-friendly, with the most usable treatment of bias correction.
- Kahneman, D. (2011). Thinking, Fast and Slow — Chapter 23 on the planning fallacy and reference-class forecasting.
- (Forward pointer) Lecture 9 — COCOMO II uses your AFP (converted to KSLOC) to compute effort, schedule, and team size.
Prepared by Dr. Zhijiang Chen — Frostburg State University, Summer 2026.