The Phillips Curve: From Fixed Pipes and Automatons to a New Notion of Economic Equilibrium¶
Time-Varying Parameters, Stochastic Volatility, and Information-Theoretic Diagnostics¶
In 1958, A.W. Phillips — a New Zealand-born engineer who survived Japanese captivity as a prisoner of war and built a hydraulic computer from tanks, pipes, and colored water to simulate the British economy — published a paper showing that low unemployment correlated with high wage inflation. For a generation, policymakers treated it as a dial they could turn. Then came the 1970s, and the dial broke.
This notebook fits a time-varying parameter Phillips Curve with stochastic volatility to 64 years of US data using PyJAGS (Gibbs sampling), then applies the information-theoretic diagnostics from the divergence package to measure what the data taught us, decompose the uncertainty, and flag the years when the model was most surprised.
Why Gibbs sampling? This model class — state-space models with random-walk parameters — is the home turf of Gibbs sampling. The conditional distributions are conjugate: given the parameters, the states are linear-Gaussian (exact forward-filtering backward-sampling). Given the states, the parameters have closed-form conditionals. Gibbs exploits this structure directly. HMC/NUTS treats the same problem as a 195-dimensional black box, struggles with the funnel geometry between scale parameters and increments, and requires non-centered parameterizations and careful tuning. On this problem, PyJAGS runs in 2 seconds. NumPyro NUTS on GPU didn't finish in 30 minutes.
This is not a universal statement — HMC excels for non-conjugate models and can detect multimodality that Gibbs misses (Farkas & Tatar, 2020). But for TVP state-space models, the Bayesian macroeconometrics literature (Primiceri 2005, Cogley & Sargent 2005, Nakajima 2011) uses Gibbs for good reason.
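To make the conjugacy concrete: conditional on a sampled state path, the Gamma prior on an innovation precision updates in closed form, so the Gibbs step is a single Gamma draw. The sketch below is an illustration in plain NumPy (the function name sample_tau_alpha and the simulated path are ours, not PyJAGS internals), showing the update for the precision of the alpha-innovations given the increments of the path.
import numpy as np
# Illustration only: one Gibbs step for an innovation precision.
# Given the state path, the Gamma(a0, b0) prior is conjugate and the full
# conditional is Gamma(a0 + (T-1)/2, b0 + 0.5 * sum of squared increments).
def sample_tau_alpha(alpha_path, a0=0.01, b0=0.01, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    increments = np.diff(alpha_path)              # alpha_t - alpha_{t-1}
    shape = a0 + len(increments) / 2.0
    rate = b0 + 0.5 * np.sum(increments ** 2)
    return rng.gamma(shape, 1.0 / rate)           # NumPy's Gamma takes scale = 1 / rate

# Quick check on a simulated random walk with known innovation sd
demo_rng = np.random.default_rng(0)
demo_path = np.cumsum(demo_rng.normal(0, 0.3, size=64))
demo_draws = np.array([sample_tau_alpha(demo_path, rng=demo_rng) for _ in range(2000)])
print(f'Posterior mean of sigma_alpha: {np.mean(1 / np.sqrt(demo_draws)):.2f} (true 0.3)')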
Prerequisites: Bayesian Diagnostics (Notebook 5) for information gain and uncertainty decomposition. Real-World Applications (Notebook 6) for the static Phillips Curve analysis.
import numpy as np
import matplotlib.pyplot as plt
import pyjags
import arviz as az
import time
from pathlib import Path
plt.rcParams.update({
'figure.figsize': (12, 5),
'axes.spines.top': False,
'axes.spines.right': False,
'font.size': 12,
})
np.random.seed(42)
FIGURES_DIR = Path('figures/phillips_curve_tvp')
FIGURES_DIR.mkdir(parents=True, exist_ok=True)
1. The Data: 64 Years of Unemployment and Inflation¶
Annual US data from 1960 to 2023. Unemployment is the civilian unemployment rate; inflation is CPI-based. Both from FRED, hardcoded here to avoid API dependencies.
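If you would rather pull the series directly, the sketch below shows one way to do it with pandas_datareader (not executed here, so the notebook stays dependency-free). The series IDs UNRATE and CPIAUCSL and the annual-averaging choices are our assumptions about how the hardcoded arrays were built; small discrepancies from rounding or data revisions are possible.
# Optional sketch: fetch the raw series from FRED instead of hardcoding (not run).
# Assumes the standard FRED series UNRATE (unemployment rate, monthly) and
# CPIAUCSL (CPI, all urban consumers, monthly); both are averaged to annual.
from pandas_datareader import data as pdr

def fetch_annual_us_data(start='1959-01-01', end='2023-12-31'):
    unrate = pdr.DataReader('UNRATE', 'fred', start, end)
    cpi = pdr.DataReader('CPIAUCSL', 'fred', start, end)
    unemp_annual = unrate['UNRATE'].resample('YE').mean()      # 'A' in older pandas
    cpi_annual = cpi['CPIAUCSL'].resample('YE').mean()
    infl_annual = cpi_annual.pct_change() * 100                # year-over-year % change
    return unemp_annual.loc['1960':'2023'], infl_annual.loc['1960':'2023']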
years = np.arange(1960, 2024)
T = len(years)
unemployment = np.array([
5.5, 6.7, 5.5, 5.7, 5.2, 4.5, 3.8, 3.8, 3.6, 3.5, # 1960-69
4.9, 5.9, 5.6, 4.9, 5.6, 8.5, 7.7, 7.1, 6.1, 5.8, # 1970-79
7.1, 7.6, 9.7, 9.6, 7.5, 7.2, 7.0, 6.2, 5.5, 5.3, # 1980-89
5.6, 6.8, 7.5, 6.9, 6.1, 5.6, 5.4, 4.9, 4.5, 4.2, # 1990-99
4.0, 4.7, 5.8, 6.0, 5.5, 5.1, 4.6, 4.6, 5.8, 9.3, # 2000-09
9.6, 8.9, 8.1, 7.4, 6.2, 5.3, 4.9, 4.4, 3.9, 3.7, # 2010-19
8.1, 5.4, 3.6, 3.6, # 2020-23
])
inflation = np.array([
1.7, 1.0, 1.2, 1.3, 1.3, 1.6, 2.9, 3.1, 4.2, 5.5, # 1960-69
5.7, 4.4, 3.2, 6.2, 11.0, 9.1, 5.8, 6.5, 7.6, 11.3, # 1970-79
13.5, 10.3, 6.2, 3.2, 4.3, 3.6, 1.9, 3.6, 4.1, 4.8, # 1980-89
5.4, 4.2, 3.0, 3.0, 2.6, 2.8, 3.0, 2.3, 1.6, 2.2, # 1990-99
3.4, 2.8, 1.6, 2.3, 2.7, 3.4, 3.2, 2.8, 3.8, -0.4, # 2000-09
1.6, 3.2, 2.1, 1.5, 1.6, 0.1, 1.3, 2.1, 2.4, 1.8, # 2010-19
1.2, 4.7, 8.0, 4.1, # 2020-23
])
print(f'Data: {T} annual observations, {years[0]}-{years[-1]}')
print(f'Unemployment: {unemployment.min():.1f}% to {unemployment.max():.1f}%')
print(f'Inflation: {inflation.min():.1f}% to {inflation.max():.1f}%')
# Plot the raw data
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 6), sharex=True)
ax1.plot(years, inflation, 'o-', color='coral', markersize=4, lw=1.5)
ax1.set_ylabel('Inflation (%)')
ax1.set_title('US Annual Data, 1960-2023', fontsize=14)
ax1.axvspan(1970, 1982, alpha=0.1, color='coral', label='Stagflation era')
ax1.legend(fontsize=10)
ax2.plot(years, unemployment, 'o-', color='steelblue', markersize=4, lw=1.5)
ax2.set_ylabel('Unemployment (%)')
ax2.set_xlabel('Year')
plt.tight_layout()
plt.savefig(FIGURES_DIR / 'data_timeseries.png', dpi=300, bbox_inches='tight')
plt.show()
Data: 64 annual observations, 1960-2023
Unemployment: 3.5% to 9.7%
Inflation: -0.4% to 13.5%
2. The Model: Time-Varying Parameters with Stochastic Volatility¶
A static Phillips Curve assumes the relationship between unemployment and inflation is fixed. It isn't. The slope changes, the intercept drifts, and the noise level varies over time.
Our model lets all three evolve as random walks:
$$\text{inflation}_t = \alpha_t + \beta_t \cdot \text{unemployment}_t + \varepsilon_t$$
$$\alpha_t = \alpha_{t-1} + \eta_t^\alpha, \qquad \eta_t^\alpha \sim \mathcal{N}(0, \sigma_\alpha^2)$$
$$\beta_t = \beta_{t-1} + \eta_t^\beta, \qquad \eta_t^\beta \sim \mathcal{N}(0, \sigma_\beta^2)$$
$$h_t = h_{t-1} + \eta_t^h, \qquad \eta_t^h \sim \mathcal{N}(0, \sigma_h^2)$$
where $\alpha_t$ is the trend inflation, $\beta_t$ is the Phillips slope, and $h_t$ is the log-volatility so that $\varepsilon_t \sim \mathcal{N}(0, e^{h_t})$. The innovation variances $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_h^2$ control how fast each parameter can drift — the data determines how much time variation is needed.
This is the model class that Primiceri (2005), Cogley and Sargent (2005), and the TVP-VAR literature estimate with Gibbs sampling. We use a single-equation version for transparency. Like all TVP regressions, this is a reduced-form object — it summarizes how inflation and unemployment co-move over time, not why. The Lucas Critique reminds us that these co-movements can shift precisely when policy changes. Our model tracks those shifts but doesn't claim to explain them structurally.
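Before handing the model to JAGS, a quick prior-predictive simulation can build intuition for what the random walks imply. The sketch below uses plain NumPy with illustrative innovation standard deviations (in the fitted model these are estimated, not fixed):
# Prior-predictive sketch: simulate one path of the three random walks and
# the implied inflation series. The innovation sds here are illustrative only.
def simulate_tvp(unemp, sigma_a=0.3, sigma_b=0.05, sigma_h=0.2, seed=0):
    rng = np.random.default_rng(seed)
    n = len(unemp)
    alpha = np.empty(n); beta = np.empty(n); h = np.empty(n)
    alpha[0], beta[0], h[0] = 3.0, -0.5, 0.0          # initial-state prior means
    for t in range(1, n):
        alpha[t] = alpha[t-1] + rng.normal(0, sigma_a)
        beta[t] = beta[t-1] + rng.normal(0, sigma_b)
        h[t] = h[t-1] + rng.normal(0, sigma_h)
    eps = rng.normal(0, np.exp(h / 2))                # stochastic-volatility noise
    return alpha + beta * unemp + eps

sim_inflation = simulate_tvp(unemployment)
print(f'One simulated inflation path: {sim_inflation.min():.1f}% to {sim_inflation.max():.1f}%')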
# JAGS model specification
model_code = '''
model {
# Priors on innovation precisions (JAGS uses precision, not variance)
tau_alpha ~ dgamma(0.01, 0.01)
tau_beta ~ dgamma(0.01, 0.01)
tau_h ~ dgamma(0.01, 0.01)
# Initial states
alpha[1] ~ dnorm(3.0, 0.04) # mean 3, sd 5
beta[1] ~ dnorm(-0.5, 0.25) # mean -0.5, sd 2
h[1] ~ dnorm(0.0, 0.25) # log-volatility, sd 2
# Random walks
for (t in 2:T) {
alpha[t] ~ dnorm(alpha[t-1], tau_alpha)
beta[t] ~ dnorm(beta[t-1], tau_beta)
h[t] ~ dnorm(h[t-1], tau_h)
}
# Observation model with stochastic volatility
for (t in 1:T) {
mu[t] <- alpha[t] + beta[t] * x[t]
tau_obs[t] <- exp(-h[t])
y[t] ~ dnorm(mu[t], tau_obs[t])
}
# Derived: observation standard deviation
for (t in 1:T) {
sigma[t] <- exp(h[t] / 2.0)
}
}
'''
# Run Gibbs sampler
print('Compiling model and running Gibbs sampler...')
t0 = time.perf_counter()
model = pyjags.Model(
code=model_code,
data=dict(y=inflation, x=unemployment, T=T),
chains=4,
adapt=2000,
)
# Burn-in
model.sample(1000, vars=['alpha', 'beta', 'sigma', 'tau_alpha', 'tau_beta', 'tau_h'])
# Posterior samples
samples = model.sample(5000, vars=['alpha', 'beta', 'sigma', 'tau_alpha', 'tau_beta', 'tau_h'])
elapsed = time.perf_counter() - t0
print(f'Gibbs sampling complete: 4 chains x 5,000 draws in {elapsed:.1f}s')
Compiling model and running Gibbs sampler...
adapting: iterations 8000 of 8000, elapsed 0:00:01, remaining 0:00:00
sampling: iterations 4000 of 4000, elapsed 0:00:00, remaining 0:00:00
sampling: iterations 20000 of 20000, elapsed 0:00:01, remaining 0:00:00
Gibbs sampling complete: 4 chains x 5,000 draws in 2.3s
# PyJAGS returns shape (dim, iterations, chains)
alpha_samples = samples['alpha'] # (T, 5000, 4)
beta_samples = samples['beta'] # (T, 5000, 4)
sigma_samples = samples['sigma'] # (T, 5000, 4)
# Posterior means and 90% credible intervals
alpha_mean = alpha_samples.mean(axis=(1, 2))
beta_mean = beta_samples.mean(axis=(1, 2))
sigma_mean = sigma_samples.mean(axis=(1, 2))
def credible_interval(arr, prob=0.9):
flat = arr.reshape(arr.shape[0], -1)
lo = np.percentile(flat, (1 - prob) / 2 * 100, axis=1)
hi = np.percentile(flat, (1 + prob) / 2 * 100, axis=1)
return lo, hi
alpha_lo, alpha_hi = credible_interval(alpha_samples)
beta_lo, beta_hi = credible_interval(beta_samples)
sigma_lo, sigma_hi = credible_interval(sigma_samples)
print('Posterior summaries computed.')
Posterior summaries computed.
3. The Phillips Curve in Motion¶
The static Phillips Curve is a single line. The time-varying version is a movie — the slope, intercept, and noise level all evolve year by year.
fig, axes = plt.subplots(3, 1, figsize=(12, 10), sharex=True)
# Trend inflation (alpha)
axes[0].plot(years, alpha_mean, color='coral', lw=2)
axes[0].fill_between(years, alpha_lo, alpha_hi, color='coral', alpha=0.2)
axes[0].set_ylabel('Trend inflation $\\alpha_t$ (%)')
axes[0].set_title('Time-Varying Phillips Curve Parameters', fontsize=14)
axes[0].axvspan(1970, 1982, alpha=0.08, color='gray')
axes[0].axhline(0, color='gray', lw=0.5, ls='--')
# Phillips slope (beta)
axes[1].plot(years, beta_mean, color='steelblue', lw=2)
axes[1].fill_between(years, beta_lo, beta_hi, color='steelblue', alpha=0.2)
axes[1].set_ylabel('Phillips slope $\\beta_t$')
axes[1].axhline(0, color='gray', lw=0.5, ls='--')
axes[1].axvspan(1970, 1982, alpha=0.08, color='gray')
axes[1].annotate('Slope near zero\n(stagflation)', xy=(1975, 0), fontsize=10,
ha='center', color='gray')
# Stochastic volatility (sigma)
axes[2].plot(years, sigma_mean, color='seagreen', lw=2)
axes[2].fill_between(years, sigma_lo, sigma_hi, color='seagreen', alpha=0.2)
axes[2].set_ylabel('Volatility $\\sigma_t$ (%)')
axes[2].set_xlabel('Year')
axes[2].axvspan(1970, 1982, alpha=0.08, color='gray')
plt.tight_layout()
plt.savefig(FIGURES_DIR / 'tvp_parameters.png', dpi=300, bbox_inches='tight')
plt.show()
How well does the model fit?¶
fitted = alpha_mean + beta_mean * unemployment
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(years, inflation, 'o', color='coral', markersize=5, label='Observed inflation')
ax.plot(years, fitted, '-', color='steelblue', lw=2, label='TVP model fit')
# 90% posterior predictive band
fitted_all = alpha_samples.reshape(T, -1) + beta_samples.reshape(T, -1) * unemployment[:, None]
noise = sigma_samples.reshape(T, -1) * np.random.randn(T, fitted_all.shape[1])
pred = fitted_all + noise
pred_lo = np.percentile(pred, 5, axis=1)
pred_hi = np.percentile(pred, 95, axis=1)
ax.fill_between(years, pred_lo, pred_hi, alpha=0.15, color='steelblue',
label='90% predictive interval')
ax.set_xlabel('Year')
ax.set_ylabel('Inflation (%)')
ax.set_title('Time-Varying Phillips Curve: Fit vs Data', fontsize=13)
ax.legend(fontsize=10)
ax.axvspan(1970, 1982, alpha=0.05, color='gray')
plt.tight_layout()
plt.savefig(FIGURES_DIR / 'model_fit.png', dpi=300, bbox_inches='tight')
plt.show()
4. Information-Theoretic Diagnostics¶
Now we apply the divergence package's diagnostic toolkit to the TVP model. We need to convert the PyJAGS output to an ArviZ DataTree, then compute information gain, uncertainty decomposition, and Bayesian surprise.
from scipy.stats import norm as normal_dist
import xarray as xr
# Build ArviZ DataTree from PyJAGS samples
# PyJAGS shape: (param_dim, iterations, chains) -> ArviZ needs (chains, draws, ...)
# Posterior: rearrange to (chains, draws, T) for vector params
post_dict = {}
for name, arr in [('alpha', alpha_samples), ('beta', beta_samples), ('sigma', sigma_samples)]:
# (T, iter, chains) -> (chains, iter, T)
post_dict[name] = np.transpose(arr, (2, 1, 0))
# Scalar hyperparameters
for name in ['tau_alpha', 'tau_beta', 'tau_h']:
# (1, iter, chains) -> (chains, iter)
post_dict[name] = samples[name].squeeze().T
# Prior samples (generate from priors for information_gain)
rng = np.random.default_rng(42)
n_prior = post_dict['alpha'].shape[1]
n_chains = 4
prior_dict = {
'alpha': rng.normal(3.0, 5.0, (n_chains, n_prior, T)),
'beta': rng.normal(-0.5, 2.0, (n_chains, n_prior, T)),
}
# Log-likelihood for Bayesian surprise
# shape: (chains, draws, T)
log_lik = np.zeros((n_chains, n_prior, T))
for c in range(n_chains):
for d in range(n_prior):
mu = post_dict['alpha'][c, d] + post_dict['beta'][c, d] * unemployment
sig = post_dict['sigma'][c, d]
log_lik[c, d] = normal_dist.logpdf(inflation, loc=mu, scale=sig)
# Posterior predictive for uncertainty decomposition
pred_samples = np.zeros((n_chains, n_prior, T))
for c in range(n_chains):
for d in range(n_prior):
mu = post_dict['alpha'][c, d] + post_dict['beta'][c, d] * unemployment
sig = post_dict['sigma'][c, d]
pred_samples[c, d] = rng.normal(mu, sig)
# Build the DataTree
idata = az.from_dict({
'posterior': {k: v for k, v in post_dict.items()},
'prior': prior_dict,
'log_likelihood': {'inflation': log_lik},
'posterior_predictive': {'inflation': pred_samples},
})
print(f'ArviZ DataTree created with groups: {list(idata.children)}')
ArviZ DataTree created with groups: ['posterior', 'prior', 'log_likelihood', 'posterior_predictive']
What did the data teach us?¶
Information gain (KL divergence from prior to posterior) measures how much the 64 years of data updated our beliefs about each parameter.
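Before calling the package function, a back-of-the-envelope check helps calibrate expectations: approximate the prior and posterior marginals of a single coordinate (say the slope in 1960) by Gaussians and use the closed-form Gaussian KL. The divergence package uses its own sample-based estimator, so treat this only as a rough cross-check.
# Rough Gaussian cross-check of information gain for one coordinate.
# KL( N(m_p, s_p^2) || N(m_q, s_q^2) ) in nats, posterior relative to prior.
def gaussian_kl(m_p, s_p, m_q, s_q):
    return np.log(s_q / s_p) + (s_p**2 + (m_p - m_q)**2) / (2 * s_q**2) - 0.5

beta_1960 = post_dict['beta'][:, :, 0].ravel()       # posterior draws for beta in 1960
prior_mean, prior_sd = -0.5, 2.0                     # prior on the initial slope
kl_check = gaussian_kl(beta_1960.mean(), beta_1960.std(), prior_mean, prior_sd)
print(f'Gaussian-approximation information gain for beta_1960: {kl_check:.2f} nats')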
from divergence import information_gain
ig = information_gain(idata, var_names=['alpha', 'beta'])
print('Information gain (nats):')
for name, val in ig.items():
if hasattr(val, '__len__') and len(val) > 1:
print(f' {name}: mean={np.mean(val):.2f}, total={np.sum(val):.2f} nats across {len(val)} time points')
else:
print(f' {name}: {float(val):.2f} nats')
Information gain (nats):
  alpha: mean=1.07, total=68.68 nats across 64 time points
  beta: mean=1.64, total=104.67 nats across 64 time points
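The introduction also promised an uncertainty decomposition. A minimal hand-rolled version (plain NumPy, not the divergence package's own routine) applies the law of total variance to the posterior draws: the predictive variance at each year splits into an aleatoric part (the average observation variance) and an epistemic part (the variance of the conditional mean across draws).
# Hand-rolled uncertainty decomposition via the law of total variance.
# Aleatoric: expected observation variance; epistemic: variance of the fitted mean.
mu_draws = (post_dict['alpha'] + post_dict['beta'] * unemployment).reshape(-1, T)
sig_draws = post_dict['sigma'].reshape(-1, T)
aleatoric = (sig_draws ** 2).mean(axis=0)
epistemic = mu_draws.var(axis=0)
total_var = aleatoric + epistemic

print('Year  aleatoric  epistemic  epistemic share')
for idx in [0, 15, 40, 62]:                          # 1960, 1975, 2000, 2022
    print(f'{years[idx]}  {aleatoric[idx]:9.2f}  {epistemic[idx]:9.2f}  '
          f'{epistemic[idx] / total_var[idx]:15.1%}')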
Which years surprised the model?¶
Bayesian surprise measures how much each observation deviates from the model's expectations. High surprise means the model didn't see it coming.
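For intuition about the units, one common operationalization of per-observation surprise is the negative log posterior-predictive density of each data point, computed by averaging the likelihood over posterior draws. Whether divergence.bayesian_surprise uses exactly this definition is an assumption on our part; the sketch below is meant only as a point of comparison.
# Manual surprisal as a point of comparison (may differ from the package's
# definition): negative log of the posterior-averaged likelihood, in nats.
from scipy.special import logsumexp

ll_flat = log_lik.reshape(-1, T)                     # (chains * draws, T)
log_pred_density = logsumexp(ll_flat, axis=0) - np.log(ll_flat.shape[0])
manual_surprisal = -log_pred_density

for idx in np.argsort(manual_surprisal)[::-1][:5]:
    print(f'{years[idx]}: {manual_surprisal[idx]:.2f} nats (manual surprisal)')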
from divergence import bayesian_surprise
bs = bayesian_surprise(idata)
surprise = bs['inflation']
# Plot
fig, ax = plt.subplots(figsize=(12, 4))
colors = ['coral' if 1970 <= y <= 1982 else
'goldenrod' if y >= 2020 else
'steelblue' for y in years]
ax.bar(years, surprise, color=colors, edgecolor='white', linewidth=0.5)
ax.set_xlabel('Year')
ax.set_ylabel('Bayesian surprise (nats)')
ax.set_title('Which years surprised the TVP model most?', fontsize=13)
ax.axvspan(1970, 1982, alpha=0.08, color='coral', label='Stagflation era')
ax.axvspan(2020, 2023, alpha=0.08, color='goldenrod', label='Post-COVID')
ax.legend(fontsize=10)
plt.tight_layout()
plt.savefig(FIGURES_DIR / 'bayesian_surprise.png', dpi=300, bbox_inches='tight')
plt.show()
# Top 10 most surprising years
top_idx = np.argsort(surprise)[::-1][:10]
print('Top 10 most surprising years:')
for idx in top_idx:
print(f' {years[idx]}: {surprise[idx]:.2f} nats')
Top 10 most surprising years:
  1974: 3.01 nats
  2022: 2.95 nats
  1975: 1.86 nats
  2021: 1.57 nats
  2023: 1.40 nats
  1973: 1.37 nats
  1976: 1.29 nats
  1979: 1.28 nats
  1972: 1.26 nats
  1978: 0.96 nats
Interpretation¶
The TVP model can adapt — its parameters drift to track the changing economy. So the years that still surprise it after adaptation are the ones where the change was too sudden or too large for a random walk to follow. These are genuine structural breaks, not just noise.
Compare this to the static model in Notebook 6, where the 1970s dominate the surprise chart because the fixed model has no way to accommodate them. The TVP model partially absorbs the 1970s shift through its drifting parameters, so whatever surprise remains represents residual anomaly — the part of stagflation that even a flexible model struggles to explain.
5. The Story the Parameters Tell¶
Look at the three panels of time-varying parameters:
Trend inflation ($\alpha_t$) peaked in the late 1970s and has been declining since — the Great Moderation, anchored inflation expectations, and central bank credibility.
The Phillips slope ($\beta_t$): during stagflation, the credible interval includes zero — the model can no longer confidently detect the tradeoff. This may partly reflect omitted supply shocks (oil, food, price controls) that a bivariate model can't separate from a genuine structural change. But the pattern is consistent with what Friedman and Phelps argued theoretically: the short-run tradeoff is unstable when expectations de-anchor.
Stochastic volatility ($\sigma_t$) peaked in the 1970s and again in the 2020s — the two major inflationary episodes. The Great Moderation (1990s-2000s) shows minimal volatility. The model learned that uncertainty itself is time-varying.
These three paths tell the macroeconomic history of the United States more clearly than any single number. A correlation coefficient can't show you a slope that weakens. A fixed-variance model can't show you volatility that changes. The time-varying Bayesian model, equipped with information-theoretic diagnostics, reveals the structure that mechanical thinking misses.
The Divergence Notebook Series¶
| # | Notebook | What it covers |
|---|---|---|
| 1 | Divergence | Shannon's foundations: entropy, cross entropy, KL divergence, Jensen-Shannon, mutual information, joint and conditional entropy |
| 2 | Beyond KL | f-divergences (TV, Hellinger, chi-squared, Jeffreys, Cressie-Read) and the Rényi family |
| 3 | Distances and Testing | Sample-based methods: Wasserstein, Sinkhorn, energy distance, MMD, kNN estimators, two-sample permutation tests |
| 4 | Dependence and Causality | Multivariate dependence (TC, NMI, VI) and directed information flow (transfer entropy) |
| 5 | Bayesian Diagnostics | End-to-end MCMC with emcee on the Nile change-point — convergence diagnostics, information gain, Bayesian surprise |
| 6 | Real-World Applications | Stock market contagion, crop yields, Phillips Curve — real data, real stakes |
| 7 | Score-Based Divergences: Fisher and Stein | Fisher divergence and kernel Stein discrepancy |
| 8 | Did My Sampler Find the Truth? | KSD as convergence diagnostic with NumPyro: NUTS vs VI, the 250-year journey from Bayes to Stein |
| 9 | Phillips Curve TVP (this notebook) | Time-varying Phillips Curve with PyJAGS Gibbs sampling — stagflation as a structural break |