
Introduction

The random coefficients (RC) logit – also known as the mixed logit or BLP model – is the workhorse of modern demand estimation for differentiated products. Unlike the plain logit, it generates flexible substitution patterns that are not subject to the IIA property.

This vignette covers:

  1. The model and its parameters (σ, π)
  2. Integration methods for simulating the mixing distribution
  3. Estimation with and without demographics
  4. GMM methods and convergence diagnostics

For the plain logit and nested logit, see vignette("logit-nested-logit"). For post-estimation analysis, see vignette("post-estimation-mergers").

The Random Coefficients Model

Utility Specification

Consumer i in market t receives utility from product j:

u_{ijt} = \delta_{jt} + \mu_{ijt} + \varepsilon_{ijt}

where:

  • \delta_{jt} = x_{jt}'\beta + \alpha p_{jt} + \xi_{jt} is the mean utility (common to all consumers)
  • \mu_{ijt} = \sum_k x_{jkt}(\sigma_k \nu_{ik} + \sum_d \pi_{kd} D_{id}) is the individual deviation from the mean
  • \varepsilon_{ijt} is a Type I extreme value idiosyncratic shock

The individual-specific component \mu_{ijt} has two sources of heterogeneity:

  • Unobserved heterogeneity (σ): \nu_{ik} \sim N(0, 1) are standard normal draws, and \sigma_k captures the dispersion of tastes for characteristic k
  • Observed heterogeneity (π): D_{id} are observed demographics (income, age, etc.), and \pi_{kd} captures how demographic d shifts preferences for characteristic k
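To make the formula concrete, here is a toy base-R computation of \mu_{ijt} from the draws and parameters above. Every number is made up for illustration; none of these values are estimates:

```r
# Toy computation of mu_ijt = sum_k x_jkt * (sigma_k * nu_ik + sum_d pi_kd * D_id)
# All numbers below are illustrative, not estimates
x_jt   <- c(1, 2.5, 0.3)            # characteristics of product j (K2 = 3)
sigma  <- c(0.5, 1.0, 0.2)          # diagonal sigma: taste dispersions
pi_mat <- matrix(c(0.8,  0,
                   2.0, -0.5,
                   0,    0.1),
                 nrow = 3, byrow = TRUE)  # K2 x D interactions (D = 2)
nu_i <- c(0.3, -1.1, 0.6)           # consumer i's standard normal draws
D_i  <- c(1.2, 0.4)                 # consumer i's demographics

mu_ijt <- sum(x_jt * (sigma * nu_i + drop(pi_mat %*% D_i)))
mu_ijt  # 3.908
```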

The Sigma Matrix

The σ parameter is a K_2 \times K_2 lower-triangular matrix (a Cholesky root). In many applications, σ is restricted to be diagonal, meaning that unobserved taste shocks for different characteristics are independent.

Convention: when specifying starting values for sigma:

  • Zero on the diagonal means that parameter is fixed at zero (not estimated)
  • Non-zero values are starting values for optimization

For example, with K_2 = 4 characteristics (intercept, prices, sugar, mushy):

# Diagonal sigma: all four RC variances are free
sigma <- diag(c(0.5, 3.0, 0.01, 0.1))

# Fix sugar and mushy RC at zero, estimate only intercept and price RC
sigma <- diag(c(0.5, 3.0, 0, 0))

The Pi Matrix

When demographics are available, the π matrix is K_2 \times D, where D is the number of demographic variables. Element \pi_{kd} captures how demographic d shifts the taste coefficient for characteristic k.

Integration Methods

The RC logit requires numerical integration over the distribution of unobserved heterogeneity ν. The choice of integration method affects both accuracy and computation time.

Product Rule (Gauss-Hermite)

The product rule uses Gauss-Hermite quadrature nodes, taking the tensor product across dimensions. With size nodes per dimension and K_2 random coefficients, this yields size^{K_2} integration points per market.

# 5 nodes per dimension, K2=4 -> 5^4 = 625 points per market
int_product <- blp_integration("product", size = 5)

# 3 nodes per dimension (faster, less accurate)
int_small <- blp_integration("product", size = 3)  # 3^4 = 81 points

The product rule is deterministic (no simulation noise) and provides exact integration for polynomials up to degree 2 × size − 1. However, it suffers from the curse of dimensionality: the number of points grows exponentially with K_2.
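The exponential growth is easy to see by building the tensor-product grid directly in base R. The 1-D nodes below are approximate 5-node Gauss-Hermite abscissae, shown only to make the point-count arithmetic tangible:

```r
# Tensor product of 1-D quadrature nodes: point count grows as size^K2
nodes_1d <- c(-2.0202, -0.9586, 0, 0.9586, 2.0202)  # approx. 5-node Gauss-Hermite
K2 <- 4
grid <- expand.grid(rep(list(nodes_1d), K2))  # one row per integration point
nrow(grid)  # 5^4 = 625 points per market

# The curse of dimensionality: points needed as K2 grows
sapply(1:8, function(k) 5^k)
```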

Monte Carlo

Standard Monte Carlo draws random points from the mixing distribution. With size draws:

# 1000 Monte Carlo draws per market
int_mc <- blp_integration("monte_carlo", size = 1000, seed = 42)

Monte Carlo integration is flexible but introduces simulation noise that can cause roughness in the GMM objective function, making optimization difficult.
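A quick base-R sketch illustrates this noise: evaluating the same one-dimensional share integral with different seeds moves the estimate around noticeably at small draw counts. The parameters here are toy values, not tied to any dataset:

```r
# Toy: one product with delta = 1 and one random coefficient with sigma = 2,
# so u_i = delta + sigma * nu_i; the share averages the logit over draws of nu
share_mc <- function(n_draws, seed) {
  set.seed(seed)
  nu <- rnorm(n_draws)
  u  <- 1 + 2 * nu
  mean(exp(u) / (1 + exp(u)))
}

round(sapply(1:5, function(s) share_mc(100, s)), 3)    # visibly seed-dependent
round(sapply(1:5, function(s) share_mc(10000, s)), 3)  # much more stable
```

This seed dependence is exactly the roughness that can trip up gradient-based optimizers.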

Halton Sequences

Halton sequences are quasi-random low-discrepancy sequences that provide more uniform coverage than pseudo-random draws:

# 100 Halton draws per market (requires randtoolbox package)
int_halton <- blp_integration("halton", size = 100, seed = 42)
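For intuition, a one-dimensional Halton sequence in prime base b is just the radical inverse of the integers 1, 2, 3, …. A minimal base-R sketch (illustrative only; in practice the package relies on randtoolbox, as noted above):

```r
# Radical-inverse Halton draws in a given prime base (illustration only)
halton_1d <- function(n, base) {
  sapply(seq_len(n), function(i) {
    h <- 0
    f <- 1 / base
    while (i > 0) {
      h <- h + f * (i %% base)   # next base-b digit, reflected about the point
      i <- i %/% base
      f <- f / base
    }
    h
  })
}
halton_1d(4, 2)  # 0.5, 0.25, 0.75, 0.125
```

Draws on (0, 1) are then mapped into the normal mixing distribution, e.g. with qnorm().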

Recommendation

For most applications:

  • Small K_2 (2–4): product rule with size = 5 or size = 7
  • Moderate K_2 (5–8): Halton sequences with 100–500 points
  • Large K_2 (9+): Monte Carlo with 1000+ draws

RC Logit without Demographics

The simplest RC logit has only unobserved heterogeneity (no demographics). This requires two formulations: one for the linear component (X_1) and one for the nonlinear component (X_2).

Problem Setup

library(rblp)
products <- load_nevo_products()

# X1: linear demand (intercept + prices + sugar + mushy)
f1 <- blp_formulation(~ prices + sugar + mushy)

# X2: characteristics with random coefficients (same as X1 here)
f2 <- blp_formulation(~ prices + sugar + mushy)

# Create the problem with Gauss-Hermite integration
rc_problem <- blp_problem(
  product_formulations = list(f1, f2),
  product_data = products,
  integration = blp_integration("product", size = 3)  # 3^4 = 81 points
)

cat(sprintf("K1 (linear params): %d\n", rc_problem$K1))
#> K1 (linear params): 4
cat(sprintf("K2 (random coefficients): %d\n", rc_problem$K2))
#> K2 (random coefficients): 4
cat(sprintf("Total integration nodes: %d (%d per market)\n",
            rc_problem$I, rc_problem$I / rc_problem$T))
#> Total integration nodes: 7614 (81 per market)
cat(sprintf("Total observations: %d\n", rc_problem$N))
#> Total observations: 2256
cat(sprintf("Markets: %d\n", rc_problem$T))
#> Markets: 94

The problem has 4 random coefficients (intercept, prices, sugar, mushy), so with size = 3 Gauss-Hermite nodes per dimension, we get 3^4 = 81 integration points per market.

Estimation

# Sigma starting values (4x4 diagonal)
initial_sigma <- diag(c(0.5, 0.5, 0.5, 0.5))

# Solve with one-step GMM
rc_results <- rc_problem$solve(
  sigma = initial_sigma,
  method = "1s",
  optimization = blp_optimization("l-bfgs-b",
    method_options = list(maxit = 200, factr = 1e7))
)

print(rc_results)

Interpretation:

  • The estimated \hat{\sigma}_k values capture the standard deviation of taste heterogeneity for each characteristic
  • A large \hat{\sigma}_{\text{price}} means consumers differ substantially in price sensitivity
  • These taste differences generate realistic substitution: consumers who are less price-sensitive substitute toward premium products
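One way to read the price RC is through the implied distribution of individual price coefficients, \alpha_i = \alpha + \sigma_{\text{price}} \nu_i. A base-R sketch with illustrative values (not estimates from this vignette):

```r
# Implied distribution of individual price coefficients (illustrative values)
set.seed(42)
alpha       <- -30   # mean price coefficient
sigma_price <- 3     # taste dispersion on price
alpha_i     <- alpha + sigma_price * rnorm(10000)

quantile(alpha_i, c(0.05, 0.5, 0.95))  # spread of price sensitivities
mean(alpha_i > 0)                      # fraction with a "wrong-signed" coefficient: 0 here
```

A non-trivial mass of positive \alpha_i after estimation usually signals an implausibly large dispersion estimate.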

RC Logit with Demographics (Full Nevo Specification)

The full Nevo (2000) specification includes observed demographics (income, age, child presence) interacting with product characteristics. This is the canonical BLP specification that matches the pyblp Nevo tutorial.

Problem Setup

agents <- load_nevo_agents()

cat("Agent data columns:\n")
#> Agent data columns:
cat(paste(names(agents), collapse = ", "), "\n\n")
#> market_ids, city_ids, quarter, weights, nodes0, nodes1, nodes2, nodes3, income, income_squared, age, child
cat(sprintf("Agents: %d (%d per market)\n", nrow(agents),
            nrow(agents) / length(unique(agents$market_ids))))
#> Agents: 1880 (20 per market)

# Inspect the agent data
head(agents[, c("market_ids", "weights", "income", "income_squared", "age", "child")])
#>   market_ids weights     income income_squared          age      child
#> 1      C01Q1    0.05  0.4951235       8.331304 -0.230109009 -0.2308511
#> 2      C01Q1    0.05  0.3787622       6.121865 -2.532694102  0.7691489
#> 3      C01Q1    0.05  0.1050146       1.030803 -0.006965458 -0.2308511
#> 4      C01Q1    0.05 -1.4854809     -25.583605 -0.827946010  0.7691489
#> 5      C01Q1    0.05 -0.3165969      -6.517009 -0.230109009 -0.2308511
#> 6      C01Q1    0.05 -0.3372762      -6.878070  0.851696161 -0.2308511
# X1: linear demand with product fixed effects (absorb product dummies)
# Only price is estimated; intercept, sugar, mushy are absorbed
f1 <- blp_formulation(~ 0 + prices, absorb = ~ product_ids)

# X2: nonlinear demand (intercept + prices + sugar + mushy get RC)
f2 <- blp_formulation(~ prices + sugar + mushy)

# Agent formulation: demographics that interact with X2
f_demo <- blp_formulation(~ 0 + income + income_squared + age + child)

# Create the problem (agent data provides nodes, weights, and demographics)
demo_problem <- blp_problem(
  product_formulations = list(f1, f2),
  product_data = products,
  agent_formulation = f_demo,
  agent_data = agents
)

cat(sprintf("K1 (linear params): %d (only prices, FE absorbed)\n", demo_problem$K1))
#> K1 (linear params): 1 (only prices, FE absorbed)
cat(sprintf("K2 (random coefficients): %d\n", demo_problem$K2))
#> K2 (random coefficients): 4
cat(sprintf("D (demographics): %d\n", demo_problem$D))
#> D (demographics): 4
cat(sprintf("Total agents (from data): %d\n", demo_problem$I))
#> Total agents (from data): 1880
cat(sprintf("Sigma matrix: %d x %d\n", demo_problem$K2, demo_problem$K2))
#> Sigma matrix: 4 x 4
cat(sprintf("Pi matrix: %d x %d\n", demo_problem$K2, demo_problem$D))
#> Pi matrix: 4 x 4

The sigma and pi matrices encode the parameter structure:

# Sigma (4x4 diagonal): RC standard deviations
# Starting values near pyblp's converged estimates
sigma0 <- diag(c(0.558, 3.312, -0.006, 0.093))

# Pi (4x4): interactions between X2 characteristics and demographics
# Rows = X2 vars (intercept, prices, sugar, mushy)
# Cols = demographics (income, income_squared, age, child)
pi0 <- matrix(c(
  2.292, 0, 1.284, 0,        # intercept interactions
  588.3, -30.19, 0, 11.05,   # price interactions
  -0.384, 0, 0.0524, 0,      # sugar interactions
  0.748, 0, -1.354, 0        # mushy interactions
), nrow = 4, ncol = 4, byrow = TRUE)

# Solve
demo_results <- demo_problem$solve(
  sigma = sigma0,
  pi = pi0,
  method = "1s",
  optimization = blp_optimization("l-bfgs-b",
    method_options = list(maxit = 1000, factr = 1e7))
)

print(demo_results)
# Price coefficient ~ -63, matching pyblp's Nevo tutorial

Key points about the Nevo specification:

  • The agent data includes 20 agents per market with pre-computed nodes (columns nodes0–nodes3), integration weights, and demographic variables
  • Product fixed effects in X_1 absorb brand-level unobserved quality, so only the price coefficient is estimated in the linear part
  • The large number of moment conditions (20 excluded instruments plus the exogenous characteristics) helps identify the many nonlinear parameters
  • The zeros in the Pi matrix encode economic restrictions (e.g., income_squared interacts only with the price coefficient)

Two-Step vs. One-Step GMM

One-Step GMM (method = "1s")

Uses the initial weighting matrix W = (Z'Z/N)^{-1} (the 2SLS weight) throughout. This produces consistent estimates but is not asymptotically efficient.

Two-Step GMM (method = "2s")

  1. Step 1: Estimate with the 2SLS weight W_1 = (Z'Z/N)^{-1}
  2. Update: Compute the optimal weight W^* = \hat{S}^{-1} from the Step 1 residuals
  3. Step 2: Re-estimate with W^*

Two-step GMM is efficient (achieves the smallest asymptotic variance among GMM estimators) but can exhibit finite-sample bias, especially with many moment conditions.
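The weight-matrix update in Step 2 can be sketched in a few lines of base R. Here simulated Z and ξ stand in for the instrument matrix and the Step 1 residuals:

```r
# Two-step GMM weight update (sketch with simulated instruments and residuals)
set.seed(1)
N  <- 500
Z  <- matrix(rnorm(N * 3), N, 3)   # stand-in instrument matrix (3 moments)
xi <- rnorm(N)                     # stand-in step-1 residuals

W1 <- solve(crossprod(Z) / N)      # step 1: 2SLS weight (Z'Z/N)^{-1}
g  <- crossprod(Z, xi) / N         # sample moments g = Z'xi / N
Q1 <- drop(t(g) %*% W1 %*% g)      # step-1 GMM objective

S_hat <- crossprod(Z * xi) / N     # moment covariance sum(z_i z_i' xi_i^2) / N
W2    <- solve(S_hat)              # step 2: optimal weight S^{-1}
```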

# One-step (faster, consistent)
results_1s <- problem$solve(sigma = sigma0, method = "1s")

# Two-step (efficient, slower)
results_2s <- problem$solve(sigma = sigma0, method = "2s")

Convergence Diagnostics

The RC logit objective function can be non-convex, so checking convergence is important.

Optimizer Convergence

# Check if the optimizer converged
results$optimization_converged

# Number of iterations and function evaluations
results$optimization_iterations
results$optimization_evaluations

# Final objective value
results$objective

# Gradient at solution (should be near zero)
max(abs(results$gradient))

Fixed-Point Convergence

The BLP contraction mapping must converge in every market at every evaluation:

# Did all fixed points converge?
results$fp_converged

# Total contraction iterations (across all markets and evaluations)
results$fp_iterations
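For intuition, the contraction \delta^{r+1} = \delta^r + \log s^{\text{obs}} - \log s(\delta^r, \theta) can be run by hand on a toy plain-logit market, where the fixed point has the closed form \delta_j = \log s_j - \log s_0:

```r
# BLP contraction on a toy single market with plain-logit shares
s_obs  <- c(0.2, 0.3, 0.1)   # observed shares; outside share s_0 = 0.4
shares <- function(delta) exp(delta) / (1 + sum(exp(delta)))

delta <- rep(0, length(s_obs))
for (r in 1:500) {
  delta_new <- delta + log(s_obs) - log(shares(delta))
  converged <- max(abs(delta_new - delta)) < 1e-13
  delta     <- delta_new
  if (converged) break
}
all.equal(delta, log(s_obs) - log(1 - sum(s_obs)))  # TRUE
```

With random coefficients the mapping is the same, except that s(\delta, \theta) averages the individual shares over the integration nodes.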

Best Practices

  1. Try multiple starting values: the GMM objective may have local minima
  2. Start from the logit solution: plain logit estimates make good starting values and a useful sanity check
  3. Use tight contraction tolerances: a loose inner loop makes the GMM objective and its gradient inaccurate; blp_iteration("squarem"), the default, converges faster than the simple contraction, so tight tolerances stay affordable
  4. Increase integration accuracy: more nodes reduce simulation error
  5. Check the gradient norm: it should be small at convergence

Optimization Options

rblp supports several optimizers:

# L-BFGS-B (default): quasi-Newton with bounds, uses gradient
opt_lbfgsb <- blp_optimization("l-bfgs-b",
  method_options = list(maxit = 1000, factr = 1e7))

# Nelder-Mead: derivative-free simplex method
opt_nm <- blp_optimization("nelder-mead",
  method_options = list(maxit = 5000))

# nlminb: PORT optimization, good with bounds
opt_nlminb <- blp_optimization("nlminb",
  method_options = list(iter.max = 1000, eval.max = 2000))

For most applications, L-BFGS-B is recommended because it uses analytical gradients (computed via the implicit function theorem) and respects parameter bounds.

Iteration Options

The BLP contraction mapping can be accelerated with SQUAREM:

# SQUAREM (default): accelerated fixed-point iteration
iter_squarem <- blp_iteration("squarem")

# Simple contraction: safer but slower
iter_simple <- blp_iteration("simple",
  list(atol = 1e-14, max_evaluations = 5000))

SQUAREM typically converges in 10–50% fewer iterations than the simple contraction, with no loss of accuracy.

Summary of Key Formulas

Component            Formula
Mean utility         \delta_{jt} = x_{jt}'\beta + \alpha p_{jt} + \xi_{jt}
Individual deviation \mu_{ijt} = \sum_k x_{jkt}(\sigma_k \nu_{ik} + \sum_d \pi_{kd} D_{id})
Choice probability   s_{ijt} = \frac{\exp(\delta_{jt} + \mu_{ijt})}{1 + \sum_l \exp(\delta_{lt} + \mu_{ilt})}
Predicted share      s_{jt}(\delta, \theta) = \int s_{ijt} \, dF(\nu, D)
Contraction          \delta^{r+1} = \delta^r + \log s^{\text{obs}} - \log s(\delta^r, \theta)
Moment condition     E[Z_{jt}' \xi_{jt}(\theta)] = 0
GMM objective        Q(\theta) = g(\theta)' W g(\theta), where g = Z'\xi / N

References

  • Berry, S., Levinsohn, J., & Pakes, A. (1995). Automobile Prices in Market Equilibrium. Econometrica, 63(4), 841-890.
  • Nevo, A. (2000). A Practitioner’s Guide to Estimation of Random-Coefficients Logit Models of Demand. Journal of Economics & Management Strategy, 9(4), 513-548.
  • Conlon, C. & Gortmaker, J. (2020). Best Practices for Differentiated Products Demand Estimation with pyblp. RAND Journal of Economics, 51(4), 1108-1161.
  • Varadhan, R. & Roland, C. (2008). Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm. Scandinavian Journal of Statistics, 35(2), 335-353.