The Phillips Curve: From Fixed Pipes and Automatons to a New Notion of Economic Equilibrium¶
Time-Varying Parameters, Stochastic Volatility, and Information-Theoretic Diagnostics¶
In 1958, A.W. Phillips — a New Zealand-born engineer who survived Japanese captivity as a prisoner of war and built a hydraulic computer from tanks, pipes, and colored water to simulate the British economy — published a paper showing that low unemployment correlated with high wage inflation. For a generation, policymakers treated it as a dial they could turn. Then came the 1970s, and the dial broke.
This notebook fits a time-varying parameter Phillips Curve with stochastic volatility to 64 years of US data using PyJAGS (Gibbs sampling), then applies the information-theoretic diagnostics from the divergence package to measure what the data taught us, decompose the uncertainty, and flag the years when the model was most surprised.
Why Gibbs sampling? This model class — state-space models with random-walk parameters — is the home turf of Gibbs sampling. The conditional distributions are conjugate: given the parameters, the states are linear-Gaussian (exact forward-filtering backward-sampling). Given the states, the parameters have closed-form conditionals. Gibbs exploits this structure directly. HMC/NUTS treats the same problem as a 195-dimensional black box, struggles with the funnel geometry between scale parameters and increments, and requires non-centered parameterizations and careful tuning. On this problem, PyJAGS runs in 2 seconds. NumPyro NUTS on GPU didn't finish in 30 minutes.
This is not a universal statement — HMC excels for non-conjugate models and can detect multimodality that Gibbs misses (Farkas & Tatar, 2020). But for TVP state-space models, the Bayesian macroeconometrics literature (Primiceri 2005, Cogley & Sargent 2005, Nakajima 2011) uses Gibbs for good reason.
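To make the conjugacy concrete: conditional on a sampled state path, the Gamma prior on an innovation precision updates in closed form, so the Gibbs step is a single Gamma draw. The sketch below is an illustration in plain NumPy (the function name sample_tau_alpha and the simulated path are ours, not PyJAGS internals), showing the update for the precision of the alpha-innovations given the increments of the path.
import numpy as np
# Illustration only: one Gibbs step for an innovation precision.
# Given the state path, the Gamma(a0, b0) prior is conjugate and the full
# conditional is Gamma(a0 + (T-1)/2, b0 + 0.5 * sum of squared increments).
def sample_tau_alpha(alpha_path, a0=0.01, b0=0.01, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    increments = np.diff(alpha_path)              # alpha_t - alpha_{t-1}
    shape = a0 + len(increments) / 2.0
    rate = b0 + 0.5 * np.sum(increments ** 2)
    return rng.gamma(shape, 1.0 / rate)           # NumPy's Gamma takes scale = 1 / rate

# Quick check on a simulated random walk with known innovation sd
demo_rng = np.random.default_rng(0)
demo_path = np.cumsum(demo_rng.normal(0, 0.3, size=64))
demo_draws = np.array([sample_tau_alpha(demo_path, rng=demo_rng) for _ in range(2000)])
print(f'Posterior mean of sigma_alpha: {np.mean(1 / np.sqrt(demo_draws)):.2f} (true 0.3)')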
Prerequisites: Bayesian Diagnostics (Notebook 5) for information gain and uncertainty decomposition. Real-World Applications (Notebook 6) for the static Phillips Curve analysis.
import numpy as np
import matplotlib.pyplot as plt
import pyjags
import arviz as az
import time
from pathlib import Path
plt.rcParams.update({
'figure.figsize': (12, 5),
'axes.spines.top': False,
'axes.spines.right': False,
'font.size': 12,
})
np.random.seed(42)
FIGURES_DIR = Path('figures/phillips_curve_tvp')
FIGURES_DIR.mkdir(parents=True, exist_ok=True)
1. The Data: 64 Years of Unemployment and Inflation¶
Annual US data from 1960 to 2023. Unemployment is the civilian unemployment rate; inflation is CPI-based. Both from FRED, hardcoded here to avoid API dependencies.
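If you would rather pull the series directly, the sketch below shows one way to do it with pandas_datareader (not executed here, so the notebook stays dependency-free). The series IDs UNRATE and CPIAUCSL and the annual-averaging choices are our assumptions about how the hardcoded arrays were built; small discrepancies from rounding or data revisions are possible.
# Optional sketch: fetch the raw series from FRED instead of hardcoding (not run).
# Assumes the standard FRED series UNRATE (unemployment rate, monthly) and
# CPIAUCSL (CPI, all urban consumers, monthly); both are averaged to annual.
from pandas_datareader import data as pdr

def fetch_annual_us_data(start='1959-01-01', end='2023-12-31'):
    unrate = pdr.DataReader('UNRATE', 'fred', start, end)
    cpi = pdr.DataReader('CPIAUCSL', 'fred', start, end)
    unemp_annual = unrate['UNRATE'].resample('YE').mean()      # 'A' in older pandas
    cpi_annual = cpi['CPIAUCSL'].resample('YE').mean()
    infl_annual = cpi_annual.pct_change() * 100                # year-over-year % change
    return unemp_annual.loc['1960':'2023'], infl_annual.loc['1960':'2023']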
years = np.arange(1960, 2024)
T = len(years)
unemployment = np.array([
5.5, 6.7, 5.5, 5.7, 5.2, 4.5, 3.8, 3.8, 3.6, 3.5, # 1960-69
4.9, 5.9, 5.6, 4.9, 5.6, 8.5, 7.7, 7.1, 6.1, 5.8, # 1970-79
7.1, 7.6, 9.7, 9.6, 7.5, 7.2, 7.0, 6.2, 5.5, 5.3, # 1980-89
5.6, 6.8, 7.5, 6.9, 6.1, 5.6, 5.4, 4.9, 4.5, 4.2, # 1990-99
4.0, 4.7, 5.8, 6.0, 5.5, 5.1, 4.6, 4.6, 5.8, 9.3, # 2000-09
9.6, 8.9, 8.1, 7.4, 6.2, 5.3, 4.9, 4.4, 3.9, 3.7, # 2010-19
8.1, 5.4, 3.6, 3.6, # 2020-23
])
inflation = np.array([
1.7, 1.0, 1.2, 1.3, 1.3, 1.6, 2.9, 3.1, 4.2, 5.5, # 1960-69
5.7, 4.4, 3.2, 6.2, 11.0, 9.1, 5.8, 6.5, 7.6, 11.3, # 1970-79
13.5, 10.3, 6.2, 3.2, 4.3, 3.6, 1.9, 3.6, 4.1, 4.8, # 1980-89
5.4, 4.2, 3.0, 3.0, 2.6, 2.8, 3.0, 2.3, 1.6, 2.2, # 1990-99
3.4, 2.8, 1.6, 2.3, 2.7, 3.4, 3.2, 2.8, 3.8, -0.4, # 2000-09
1.6, 3.2, 2.1, 1.5, 1.6, 0.1, 1.3, 2.1, 2.4, 1.8, # 2010-19
1.2, 4.7, 8.0, 4.1, # 2020-23
])
print(f'Data: {T} annual observations, {years[0]}-{years[-1]}')
print(f'Unemployment: {unemployment.min():.1f}% to {unemployment.max():.1f}%')
print(f'Inflation: {inflation.min():.1f}% to {inflation.max():.1f}%')
# Plot the raw data
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 6), sharex=True)
ax1.plot(years, inflation, 'o-', color='coral', markersize=4, lw=1.5)
ax1.set_ylabel('Inflation (%)')
ax1.set_title('US Annual Data, 1960-2023', fontsize=14)
ax1.axvspan(1970, 1982, alpha=0.1, color='coral', label='Stagflation era')
ax1.legend(fontsize=10)
ax2.plot(years, unemployment, 'o-', color='steelblue', markersize=4, lw=1.5)
ax2.set_ylabel('Unemployment (%)')
ax2.set_xlabel('Year')
plt.tight_layout()
plt.savefig(FIGURES_DIR / 'data_timeseries.png', dpi=300, bbox_inches='tight')
plt.show()
Data: 64 annual observations, 1960-2023
Unemployment: 3.5% to 9.7%
Inflation: -0.4% to 13.5%
2. The Model: Time-Varying Parameters with Stochastic Volatility¶
A static Phillips Curve assumes the relationship between unemployment and inflation is fixed. It isn't. The slope changes, the intercept drifts, and the noise level varies over time.
Our model lets all three evolve as random walks:
$$\text{inflation}_t = \alpha_t + \beta_t \cdot \text{unemployment}_t + \varepsilon_t$$
$$\alpha_t = \alpha_{t-1} + \eta_t^\alpha, \qquad \eta_t^\alpha \sim \mathcal{N}(0, \sigma_\alpha^2)$$
$$\beta_t = \beta_{t-1} + \eta_t^\beta, \qquad \eta_t^\beta \sim \mathcal{N}(0, \sigma_\beta^2)$$
$$h_t = h_{t-1} + \eta_t^h, \qquad \eta_t^h \sim \mathcal{N}(0, \sigma_h^2)$$
where $\alpha_t$ is the trend inflation, $\beta_t$ is the Phillips slope, and $h_t$ is the log-volatility so that $\varepsilon_t \sim \mathcal{N}(0, e^{h_t})$. The innovation variances $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_h^2$ control how fast each parameter can drift — the data determines how much time variation is needed.
This is the model class that Primiceri (2005), Cogley and Sargent (2005), and the TVP-VAR literature estimate with Gibbs sampling. We use a single-equation version for transparency. Like all TVP regressions, this is a reduced-form object — it summarizes how inflation and unemployment co-move over time, not why. The Lucas Critique reminds us that these co-movements can shift precisely when policy changes. Our model tracks those shifts but doesn't claim to explain them structurally.
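Before handing the model to JAGS, a quick prior-predictive simulation can build intuition for what the random walks imply. The sketch below uses plain NumPy with illustrative innovation standard deviations (in the fitted model these are estimated, not fixed):
# Prior-predictive sketch: simulate one path of the three random walks and
# the implied inflation series. The innovation sds here are illustrative only.
def simulate_tvp(unemp, sigma_a=0.3, sigma_b=0.05, sigma_h=0.2, seed=0):
    rng = np.random.default_rng(seed)
    n = len(unemp)
    alpha = np.empty(n); beta = np.empty(n); h = np.empty(n)
    alpha[0], beta[0], h[0] = 3.0, -0.5, 0.0          # initial-state prior means
    for t in range(1, n):
        alpha[t] = alpha[t-1] + rng.normal(0, sigma_a)
        beta[t] = beta[t-1] + rng.normal(0, sigma_b)
        h[t] = h[t-1] + rng.normal(0, sigma_h)
    eps = rng.normal(0, np.exp(h / 2))                # stochastic-volatility noise
    return alpha + beta * unemp + eps

sim_inflation = simulate_tvp(unemployment)
print(f'One simulated inflation path: {sim_inflation.min():.1f}% to {sim_inflation.max():.1f}%')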
# JAGS model specification
model_code = '''
model {
# Priors on innovation precisions (JAGS uses precision, not variance)
tau_alpha ~ dgamma(0.01, 0.01)
tau_beta ~ dgamma(0.01, 0.01)
tau_h ~ dgamma(0.01, 0.01)
# Initial states
alpha[1] ~ dnorm(3.0, 0.04) # mean 3, sd 5
beta[1] ~ dnorm(-0.5, 0.25) # mean -0.5, sd 2
h[1] ~ dnorm(0.0, 0.25) # log-volatility, sd 2
# Random walks
for (t in 2:T) {
alpha[t] ~ dnorm(alpha[t-1], tau_alpha)
beta[t] ~ dnorm(beta[t-1], tau_beta)
h[t] ~ dnorm(h[t-1], tau_h)
}
# Observation model with stochastic volatility
for (t in 1:T) {
mu[t] <- alpha[t] + beta[t] * x[t]
tau_obs[t] <- exp(-h[t])
y[t] ~ dnorm(mu[t], tau_obs[t])
}
# Derived: observation standard deviation
for (t in 1:T) {
sigma[t] <- exp(h[t] / 2.0)
}
}
'''
# Run Gibbs sampler
print('Compiling model and running Gibbs sampler...')
t0 = time.perf_counter()
model = pyjags.Model(
code=model_code,
data=dict(y=inflation, x=unemployment, T=T),
chains=4,
adapt=2000,
)
# Burn-in
model.sample(1000, vars=['alpha', 'beta', 'sigma', 'tau_alpha', 'tau_beta', 'tau_h'])
# Posterior samples
samples = model.sample(5000, vars=['alpha', 'beta', 'sigma', 'tau_alpha', 'tau_beta', 'tau_h'])
elapsed = time.perf_counter() - t0
print(f'Gibbs sampling complete: 4 chains x 5,000 draws in {elapsed:.1f}s')
Compiling model and running Gibbs sampler...
adapting: iterations 8000 of 8000, elapsed 0:00:01, remaining 0:00:00
sampling: iterations 4000 of 4000, elapsed 0:00:00, remaining 0:00:00
sampling: iterations 20000 of 20000, elapsed 0:00:01, remaining 0:00:00
Gibbs sampling complete: 4 chains x 5,000 draws in 2.3s
# PyJAGS returns shape (dim, iterations, chains)
alpha_samples = samples['alpha'] # (T, 5000, 4)
beta_samples = samples['beta'] # (T, 5000, 4)
sigma_samples = samples['sigma'] # (T, 5000, 4)
# Posterior means and 90% credible intervals
alpha_mean = alpha_samples.mean(axis=(1, 2))
beta_mean = beta_samples.mean(axis=(1, 2))
sigma_mean = sigma_samples.mean(axis=(1, 2))
def credible_interval(arr, prob=0.9):
flat = arr.reshape(arr.shape[0], -1)
lo = np.percentile(flat, (1 - prob) / 2 * 100, axis=1)
hi = np.percentile(flat, (1 + prob) / 2 * 100, axis=1)
return lo, hi
alpha_lo, alpha_hi = credible_interval(alpha_samples)
beta_lo, beta_hi = credible_interval(beta_samples)
sigma_lo, sigma_hi = credible_interval(sigma_samples)
print('Posterior summaries computed.')
Posterior summaries computed.
3. The Phillips Curve in Motion¶
The static Phillips Curve is a single line. The time-varying version is a movie — the slope, intercept, and noise level all evolve year by year.
fig, axes = plt.subplots(3, 1, figsize=(12, 10), sharex=True)
# Trend inflation (alpha)
axes[0].plot(years, alpha_mean, color='coral', lw=2)
axes[0].fill_between(years, alpha_lo, alpha_hi, color='coral', alpha=0.2)
axes[0].set_ylabel('Trend inflation $\\alpha_t$ (%)')
axes[0].set_title('Time-Varying Phillips Curve Parameters', fontsize=14)
axes[0].axvspan(1970, 1982, alpha=0.08, color='gray')
axes[0].axhline(0, color='gray', lw=0.5, ls='--')
# Phillips slope (beta)
axes[1].plot(years, beta_mean, color='steelblue', lw=2)
axes[1].fill_between(years, beta_lo, beta_hi, color='steelblue', alpha=0.2)
axes[1].set_ylabel('Phillips slope $\\beta_t$')
axes[1].axhline(0, color='gray', lw=0.5, ls='--')
axes[1].axvspan(1970, 1982, alpha=0.08, color='gray')
axes[1].annotate('Slope near zero\n(stagflation)', xy=(1975, 0), fontsize=10,
ha='center', color='gray')
# Stochastic volatility (sigma)
axes[2].plot(years, sigma_mean, color='seagreen', lw=2)
axes[2].fill_between(years, sigma_lo, sigma_hi, color='seagreen', alpha=0.2)
axes[2].set_ylabel('Volatility $\\sigma_t$ (%)')
axes[2].set_xlabel('Year')
axes[2].axvspan(1970, 1982, alpha=0.08, color='gray')
plt.tight_layout()
plt.savefig(FIGURES_DIR / 'tvp_parameters.png', dpi=300, bbox_inches='tight')
plt.show()
How well does the model fit?¶
fitted = alpha_mean + beta_mean * unemployment
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(years, inflation, 'o', color='coral', markersize=5, label='Observed inflation')
ax.plot(years, fitted, '-', color='steelblue', lw=2, label='TVP model fit')
# 90% posterior predictive band
fitted_all = alpha_samples.reshape(T, -1) + beta_samples.reshape(T, -1) * unemployment[:, None]
noise = sigma_samples.reshape(T, -1) * np.random.randn(T, fitted_all.shape[1])
pred = fitted_all + noise
pred_lo = np.percentile(pred, 5, axis=1)
pred_hi = np.percentile(pred, 95, axis=1)
ax.fill_between(years, pred_lo, pred_hi, alpha=0.15, color='steelblue',
label='90% predictive interval')
ax.set_xlabel('Year')
ax.set_ylabel('Inflation (%)')
ax.set_title('Time-Varying Phillips Curve: Fit vs Data', fontsize=13)
ax.legend(fontsize=10)
ax.axvspan(1970, 1982, alpha=0.05, color='gray')
plt.tight_layout()
plt.savefig(FIGURES_DIR / 'model_fit.png', dpi=300, bbox_inches='tight')
plt.show()
4. Information-Theoretic Diagnostics¶
Now we apply the divergence package's diagnostic toolkit to the TVP model. We need to convert the PyJAGS output to an ArviZ DataTree, then compute information gain, uncertainty decomposition, and Bayesian surprise.
from scipy.stats import norm as normal_dist
import xarray as xr
# Build ArviZ DataTree from PyJAGS samples
# PyJAGS shape: (param_dim, iterations, chains) -> ArviZ needs (chains, draws, ...)
# Posterior: rearrange to (chains, draws, T) for vector params
post_dict = {}
for name, arr in [('alpha', alpha_samples), ('beta', beta_samples), ('sigma', sigma_samples)]:
# (T, iter, chains) -> (chains, iter, T)
post_dict[name] = np.transpose(arr, (2, 1, 0))
# Scalar hyperparameters
for name in ['tau_alpha', 'tau_beta', 'tau_h']:
# (1, iter, chains) -> (chains, iter)
post_dict[name] = samples[name].squeeze().T
# Prior samples (generate from priors for information_gain)
rng = np.random.default_rng(42)
n_prior = post_dict['alpha'].shape[1]
n_chains = 4
prior_dict = {
'alpha': rng.normal(3.0, 5.0, (n_chains, n_prior, T)),
'beta': rng.normal(-0.5, 2.0, (n_chains, n_prior, T)),
}
# Log-likelihood for Bayesian surprise
# shape: (chains, draws, T)
log_lik = np.zeros((n_chains, n_prior, T))
for c in range(n_chains):
for d in range(n_prior):
mu = post_dict['alpha'][c, d] + post_dict['beta'][c, d] * unemployment
sig = post_dict['sigma'][c, d]
log_lik[c, d] = normal_dist.logpdf(inflation, loc=mu, scale=sig)
# Posterior predictive for uncertainty decomposition
pred_samples = np.zeros((n_chains, n_prior, T))
for c in range(n_chains):
for d in range(n_prior):
mu = post_dict['alpha'][c, d] + post_dict['beta'][c, d] * unemployment
sig = post_dict['sigma'][c, d]
pred_samples[c, d] = rng.normal(mu, sig)
# Build the DataTree
idata = az.from_dict({
'posterior': {k: v for k, v in post_dict.items()},
'prior': prior_dict,
'log_likelihood': {'inflation': log_lik},
'posterior_predictive': {'inflation': pred_samples},
})
print(f'ArviZ DataTree created with groups: {list(idata.children)}')
ArviZ DataTree created with groups: ['posterior', 'prior', 'log_likelihood', 'posterior_predictive']
What did the data teach us?¶
Information gain (KL divergence from prior to posterior) measures how much the 64 years of data updated our beliefs about each parameter.
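Before calling the package function, a back-of-the-envelope check helps calibrate expectations: approximate the prior and posterior marginals of a single coordinate (say the slope in 1960) by Gaussians and use the closed-form Gaussian KL. The divergence package uses its own sample-based estimator, so treat this only as a rough cross-check.
# Rough Gaussian cross-check of information gain for one coordinate.
# KL( N(m_p, s_p^2) || N(m_q, s_q^2) ) in nats, posterior relative to prior.
def gaussian_kl(m_p, s_p, m_q, s_q):
    return np.log(s_q / s_p) + (s_p**2 + (m_p - m_q)**2) / (2 * s_q**2) - 0.5

beta_1960 = post_dict['beta'][:, :, 0].ravel()       # posterior draws for beta in 1960
prior_mean, prior_sd = -0.5, 2.0                     # prior on the initial slope
kl_check = gaussian_kl(beta_1960.mean(), beta_1960.std(), prior_mean, prior_sd)
print(f'Gaussian-approximation information gain for beta_1960: {kl_check:.2f} nats')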
from divergence import information_gain
ig = information_gain(idata, var_names=['alpha', 'beta'])
print('Information gain (nats):')
for name, val in ig.items():
if hasattr(val, '__len__') and len(val) > 1:
print(f' {name}: mean={np.mean(val):.2f}, total={np.sum(val):.2f} nats across {len(val)} time points')
else:
print(f' {name}: {float(val):.2f} nats')
Information gain (nats):
  alpha: mean=1.07, total=68.68 nats across 64 time points
  beta: mean=1.64, total=104.67 nats across 64 time points
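The introduction also promised an uncertainty decomposition. A minimal hand-rolled version (plain NumPy, not the divergence package's own routine) applies the law of total variance to the posterior draws: the predictive variance at each year splits into an aleatoric part (the average observation variance) and an epistemic part (the variance of the conditional mean across draws).
# Hand-rolled uncertainty decomposition via the law of total variance.
# Aleatoric: expected observation variance; epistemic: variance of the fitted mean.
mu_draws = (post_dict['alpha'] + post_dict['beta'] * unemployment).reshape(-1, T)
sig_draws = post_dict['sigma'].reshape(-1, T)
aleatoric = (sig_draws ** 2).mean(axis=0)
epistemic = mu_draws.var(axis=0)
total_var = aleatoric + epistemic

print('Year  aleatoric  epistemic  epistemic share')
for idx in [0, 15, 40, 62]:                          # 1960, 1975, 2000, 2022
    print(f'{years[idx]}  {aleatoric[idx]:9.2f}  {epistemic[idx]:9.2f}  '
          f'{epistemic[idx] / total_var[idx]:15.1%}')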
Which years surprised the model?¶
Bayesian surprise measures how much each observation deviates from the model's expectations. High surprise means the model didn't see it coming.
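For intuition about the units, one common operationalization of per-observation surprise is the negative log posterior-predictive density of each data point, computed by averaging the likelihood over posterior draws. Whether divergence.bayesian_surprise uses exactly this definition is an assumption on our part; the sketch below is meant only as a point of comparison.
# Manual surprisal as a point of comparison (may differ from the package's
# definition): negative log of the posterior-averaged likelihood, in nats.
from scipy.special import logsumexp

ll_flat = log_lik.reshape(-1, T)                     # (chains * draws, T)
log_pred_density = logsumexp(ll_flat, axis=0) - np.log(ll_flat.shape[0])
manual_surprisal = -log_pred_density

for idx in np.argsort(manual_surprisal)[::-1][:5]:
    print(f'{years[idx]}: {manual_surprisal[idx]:.2f} nats (manual surprisal)')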
from divergence import bayesian_surprise
bs = bayesian_surprise(idata)
surprise = bs['inflation']
# Plot
fig, ax = plt.subplots(figsize=(12, 4))
colors = ['coral' if 1970 <= y <= 1982 else
'goldenrod' if y >= 2020 else
'steelblue' for y in years]
ax.bar(years, surprise, color=colors, edgecolor='white', linewidth=0.5)
ax.set_xlabel('Year')
ax.set_ylabel('Bayesian surprise (nats)')
ax.set_title('Which years surprised the TVP model most?', fontsize=13)
ax.axvspan(1970, 1982, alpha=0.08, color='coral', label='Stagflation era')
ax.axvspan(2020, 2023, alpha=0.08, color='goldenrod', label='Post-COVID')
ax.legend(fontsize=10)
plt.tight_layout()
plt.savefig(FIGURES_DIR / 'bayesian_surprise.png', dpi=300, bbox_inches='tight')
plt.show()
# Top 10 most surprising years
top_idx = np.argsort(surprise)[::-1][:10]
print('Top 10 most surprising years:')
for idx in top_idx:
print(f' {years[idx]}: {surprise[idx]:.2f} nats')
Top 10 most surprising years:
  1974: 3.01 nats
  2022: 2.95 nats
  1975: 1.86 nats
  2021: 1.57 nats
  2023: 1.40 nats
  1973: 1.37 nats
  1976: 1.29 nats
  1979: 1.28 nats
  1972: 1.26 nats
  1978: 0.96 nats
Interpretation¶
The TVP model can adapt — its parameters drift to track the changing economy. So the years that still surprise it after adaptation are the ones where the change was too sudden or too large for a random walk to follow. These are genuine structural breaks, not just noise.
Compare this to the static model in Notebook 6, where the 1970s dominate the surprise chart because the fixed model has no way to accommodate them. The TVP model partially absorbs the 1970s shift through its drifting parameters, so whatever surprise remains represents residual anomaly — the part of stagflation that even a flexible model struggles to explain.
5. The Story the Parameters Tell¶
Look at the three panels of time-varying parameters:
Trend inflation ($\alpha_t$) peaked in the late 1970s and has been declining since — the Great Moderation, anchored inflation expectations, and central bank credibility.
The Phillips slope ($\beta_t$): during stagflation, the credible interval includes zero — the model can no longer confidently detect the tradeoff. This may partly reflect omitted supply shocks (oil, food, price controls) that a bivariate model can't separate from a genuine structural change. But the pattern is consistent with what Friedman and Phelps argued theoretically: the short-run tradeoff is unstable when expectations de-anchor.
Stochastic volatility ($\sigma_t$) peaked in the 1970s and again in the 2020s — the two major inflationary episodes. The Great Moderation (1990s-2000s) shows minimal volatility. The model learned that uncertainty itself is time-varying.
These three paths tell the macroeconomic history of the United States more clearly than any single number. A correlation coefficient can't show you a slope that weakens. A fixed-variance model can't show you volatility that changes. The time-varying Bayesian model, equipped with information-theoretic diagnostics, reveals the structure that mechanical thinking misses.
The Divergence Notebook Series¶
| # | Notebook | What it covers |
|---|---|---|
| 1 | Divergence | Shannon's foundations: entropy, cross entropy, KL divergence, Jensen-Shannon, mutual information, joint and conditional entropy |
| 2 | Beyond KL | f-divergences (TV, Hellinger, chi-squared, Jeffreys, Cressie-Read) and the Rényi family |
| 3 | Distances and Testing | Sample-based methods: Wasserstein, Sinkhorn, energy distance, MMD, kNN estimators, two-sample permutation tests |
| 4 | Dependence and Causality | Multivariate dependence (TC, NMI, VI) and directed information flow (transfer entropy) |
| 5 | Bayesian Diagnostics | End-to-end MCMC with emcee on the Nile change-point — convergence diagnostics, information gain, Bayesian surprise |
| 6 | Real-World Applications | Stock market contagion, crop yields, Phillips Curve — real data, real stakes |
| 7 | Score-Based Divergences: Fisher and Stein | Fisher divergence and kernel Stein discrepancy |
| 8 | Did My Sampler Find the Truth? | KSD as convergence diagnostic with NumPyro: NUTS vs VI, the 250-year journey from Bayes to Stein |
| 9 | Phillips Curve TVP (this notebook) | Time-varying Phillips Curve with PyJAGS Gibbs sampling — stagflation as a structural break |