Evaluation & Systems Thinking

POSC 315 — Lecture 13.1

What Is Policy Evaluation?

  • Evaluation: systematic inquiry into a public program’s merit, worth, or significance.
  • Core aim → determine whether observed changes can credibly be attributed to the policy, not to chance or outside shocks.
  • It’s research in service of learning, accountability, and improvement.

The Big Question

Did the program generate a significant and positive impact on its target population that would not have occurred otherwise?

Requires counterfactual thinking — what would the world look like in the program’s absence?
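
One common way to formalize the counterfactual is potential-outcomes notation; the notation below is standard in evaluation research but is an added illustration, not part of the original slide:

```latex
% Potential-outcomes sketch (standard notation, added for illustration).
% Y_i(1): unit i's outcome with the program; Y_i(0): the same unit's outcome without it.
% Only one of the two is ever observed, which is why a comparison group must
% stand in for the missing counterfactual.
\tau_i = Y_i(1) - Y_i(0)
```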

Why It Matters — Stakeholders

  • Taxpayers: want value for money.
  • Program Beneficiaries: care about real outcomes.
  • Program Managers: need feedback to refine operations.
  • Elected Officials: must justify budgets and defend choices.
  • Analysts & Scholars: build evidence and theory.

A Systems‑Thinking Lens

Component  | Guiding Question          | Example: SNAP
Inputs     | What resources flow in?   | Federal $$, state admin staff
Activities | What does the program do? | Issue EBT cards, certify eligibility
Outputs    | Immediate products        | Number of households served
Outcomes   | Short-term results        | Reduced food insecurity
Impacts    | Long-run effects          | Improved child health metrics
Feedback   | Signals for adaptation    | Error-rate audits feed rule tweaks

Evaluation Designs — Overview

The choice of design depends on data availability, ethics, and practicality.

  1. Before‑and‑After
  2. After‑Only
  3. With‑and‑Without (Quasi‑Experiment)
  4. Time‑Series

Before‑and‑After

  • Observe outcome O₀ pre‑program.
  • Implement policy.
  • Observe outcome O₁ post‑program.
  • Attribute Δ = O₁ − O₀ to the policy only if rival explanations can be ruled out (rare in practice).
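
A minimal numeric sketch of the before-and-after logic; the outcome name and values are illustrative assumptions, not program data:

```python
# Before-and-after sketch with illustrative numbers (not real program data).
baseline_rate = 0.18   # O0: food-insecurity rate measured before the program (assumed value)
followup_rate = 0.14   # O1: the same rate measured after the program (assumed value)

delta = followup_rate - baseline_rate   # naive estimate of the program's effect
print(f"Observed change: {delta:+.2%}")

# Design caveat: delta also absorbs any secular trend, seasonality, or outside
# shock between the two measurements, so it equals the program's effect only
# if those rival explanations can be ruled out.
```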

After‑Only

  • Snapshot taken after implementation; no baseline measurement.
  • Cheap and fast, but supports only the weakest inference.

With‑and‑Without

  • Compare a treatment group with a comparable control group.
  • Approximates the counterfactual when random assignment isn't feasible.
  • Methods: matching, difference‑in‑differences, synthetic controls.
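
A minimal difference-in-differences sketch, one of the methods listed above; the group means below are made-up numbers for illustration only:

```python
# Difference-in-differences on illustrative group means (synthetic numbers).
treated_pre, treated_post = 0.20, 0.15   # outcome in the "with" (treatment) group
control_pre, control_post = 0.19, 0.18   # outcome in the "without" (comparison) group

treated_change = treated_post - treated_pre
control_change = control_post - control_pre

# The comparison group's change stands in for the trend the treated group
# would have followed without the program (the counterfactual).
did_estimate = treated_change - control_change
print(f"DiD estimate of the program effect: {did_estimate:+.2%}")
```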

Time‑Series

  • A long series of observations before and after the policy.
  • Detects trends, seasonality, structural breaks.
  • Gold standard when randomization is impossible and the data are rich (e.g., crime rates after a ban).
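
A sketch of an interrupted (segmented) time-series regression on simulated data; the variable names, simulated values, and statsmodels specification are assumptions added for illustration, not the lecture's example:

```python
# Interrupted time-series sketch on simulated monthly data (all values synthetic).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
months = np.arange(48)                         # 24 months before, 24 months after the policy
post = (months >= 24).astype(int)              # indicator: policy in effect
t_after = np.where(post == 1, months - 24, 0)  # months elapsed since the policy started

# Simulated outcome: mild downward trend, a level drop at the policy date, plus noise.
outcome = 100 - 0.2 * months - 4 * post + rng.normal(0, 1.5, size=months.size)
df = pd.DataFrame({"outcome": outcome, "t": months, "post": post, "t_after": t_after})

# Segmented regression: pre-existing trend, level change at adoption, and slope change.
model = smf.ols("outcome ~ t + post + t_after", data=df).fit()
print(model.params)  # 'post' ~ level shift; 't_after' ~ change in trend after the policy
```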

Quick Design Comparison

Design       | Strength                 | Key Limitation
Before-After | Simple                   | Confounding trends
After-Only   | Low cost                 | No baseline
With-Without | Stronger causal claim    | Needs a comparable control group
Time-Series  | Detects gradual effects  | Data intensive

Takeaways for Practitioners

  • Pick the strongest feasible design given constraints.
  • Plan the evaluation before rollout; retrofitting one afterward is expensive.
  • Combine quantitative and qualitative evidence to capture the mechanism at work.

Questions?

Next deck: digging into the questions evaluators ask and the philosophical roots of policy science.