Skip to contents

Overview

rblp is an R implementation of the BLP (Berry, Levinsohn, Pakes 1995) framework for estimating demand models for differentiated products. It is translated from the pyblp Python package by Conlon and Gortmaker (2020).

The package supports:

  • Plain logit and nested logit demand estimation
  • Random coefficients logit (mixed logit) with agent-level heterogeneity
  • Demographics via Pi matrix interactions
  • Product fixed effects via the absorb parameter
  • GMM estimation (1-step and 2-step)
  • Post-estimation: elasticities, consumer surplus, HHI, diversion ratios
  • Simulation: generate equilibrium data and test estimation
  • Merger simulation: compute post-merger prices and welfare effects

Quick Start

library(rblp)

# Load the Nevo (2000) cereal data
products <- load_nevo_products()
head(products[, c("market_ids", "product_ids", "shares", "prices", "sugar", "mushy")])
#>   market_ids product_ids      shares     prices sugar mushy
#> 1      C01Q1       F1B04 0.012417212 0.07208794     2     1
#> 2      C01Q1       F1B06 0.007809387 0.11417849    18     1
#> 3      C01Q1       F1B07 0.012994511 0.13239066     4     1
#> 4      C01Q1       F1B09 0.005769961 0.13034408     3     0
#> 5      C01Q1       F1B11 0.017934141 0.15482331    12     0
#> 6      C01Q1       F1B13 0.026601892 0.13704921    14     0

The data has 2256 product-market observations across 94 markets. The demand_instruments0 through demand_instruments19 columns contain 20 pre-computed BLP-style excluded instruments.

Example 1: Plain Logit

The simplest demand model. We regress the log odds ratio δjt=log(sjt)log(s0t)\delta_{jt} = \log(s_{jt}) - \log(s_{0t}) on product characteristics using IV-GMM.

# Define the linear model: intercept + prices + sugar + mushy
f1 <- blp_formulation(~ prices + sugar + mushy)

# Create and solve the problem
logit_problem <- blp_problem(list(f1), products)
logit_results <- logit_problem$solve(method = "1s")

print(logit_results)
#> BLP Estimation Results
#>   Method: 1S GMM
#>   Objective: 2.047932e+02
#>   Optimization converged: TRUE
#>   FP converged: TRUE (94 total iterations)
#> 
#> Parameter Estimates:
#>  parameter   estimate   se       t_stat 
#>  (Intercept) -2.810674  0.109432 -25.684
#>  prices      -11.699731 0.858444 -13.629
#>  sugar       0.048381   0.004208 11.498 
#>  mushy       0.043127   0.052717 0.818

The price coefficient is negative as expected – higher prices reduce utility. The exogenous characteristics (sugar, mushy) serve as instruments for prices.

Example 2: Logit with Product Fixed Effects

This matches the pyblp Nevo tutorial specification. Product fixed effects absorb all time-invariant product characteristics (brand, sugar, mushy, etc.), so we only estimate the price coefficient.

# Absorb product-level fixed effects (Frisch-Waugh-Lovell demeaning)
f1_fe <- blp_formulation(~ prices, absorb = ~ product_ids)

fe_problem <- blp_problem(list(f1_fe), products)
fe_results <- fe_problem$solve(method = "1s")

print(fe_results)
#> BLP Estimation Results
#>   Method: 1S GMM
#>   Objective: 1.797148e+02
#>   Optimization converged: TRUE
#>   FP converged: TRUE (664 total iterations)
#> 
#> Parameter Estimates:
#>  parameter estimate   se       t_stat 
#>  prices    -30.420493 1.030311 -29.526

The price coefficient of -30.42 matches pyblp’s reported value of approximately -30.

Example 3: Random Coefficients Logit

The random coefficients (RC) logit allows consumer heterogeneity in preferences. Each consumer ii has individual-specific taste coefficients:

uijt=δjt+kxjkσkνik+εijtu_{ijt} = \delta_{jt} + \sum_k x_{jk} \sigma_k \nu_{ik} + \varepsilon_{ijt}

where νik\nu_{ik} are standard normal draws and σk\sigma_k captures the dispersion of tastes.

# X1: linear demand characteristics
f1_rc <- blp_formulation(~ prices + sugar + mushy)
# X2: random coefficient characteristics (same as X1 here)
f2_rc <- blp_formulation(~ prices + sugar + mushy)

rc_problem <- blp_problem(
  product_formulations = list(f1_rc, f2_rc),
  product_data = products,
  integration = blp_integration("product", size = 3)  # Gauss-Hermite quadrature
)

# Sigma: diagonal matrix with starting values (0 = fixed, non-zero = free)
initial_sigma <- diag(c(0.5, 0.5, 0.5, 0.5))

rc_results <- rc_problem$solve(
  sigma = initial_sigma,
  method = "1s",
  optimization = blp_optimization("l-bfgs-b",
    method_options = list(maxit = 200, factr = 1e7))
)

print(rc_results)

Key parameters:

  • integration = blp_integration("product", size = 3) uses Gauss-Hermite product rule with 3 nodes per dimension (34=813^4 = 81 integration points per market)
  • sigma is the K2 x K2 Cholesky root. Zeros on the diagonal fix that parameter; non-zero values are starting values for optimization
  • method = "2s" runs two-step efficient GMM (default); "1s" runs one-step

Example 4: RC Logit with Demographics

Demographics allow observed consumer characteristics to interact with product characteristics via the Pi matrix:

uijt=δjt+kxjk(σkνik+dπkdDid)+εijtu_{ijt} = \delta_{jt} + \sum_k x_{jk} (\sigma_k \nu_{ik} + \sum_d \pi_{kd} D_{id}) + \varepsilon_{ijt}

# Load agent (consumer) data with demographics
agents <- load_nevo_agents()
head(agents[, c("market_ids", "weights", "income", "age", "child")])

# Demographics formulation (no intercept)
demo_form <- blp_formulation(~ 0 + income + income_squared + age + child)

# pyblp specification: X1 = '0 + prices' with product FE absorbed
# X2 = '1 + prices + sugar + mushy' (with intercept)
demo_problem <- blp_problem(
  product_formulations = list(
    blp_formulation(~ 0 + prices, absorb = ~ product_ids),  # X1
    blp_formulation(~ prices + sugar + mushy)                # X2
  ),
  product_data = products,
  agent_formulation = demo_form,
  agent_data = agents
)

# Sigma (4x4 diagonal) and Pi (4x4, K2 x D)
# Starting values near pyblp's converged estimates
sigma0 <- diag(c(0.558, 3.312, -0.006, 0.093))
pi0 <- matrix(c(
  2.292, 0, 1.284, 0,
  588.3, -30.19, 0, 11.05,
  -0.384, 0, 0.0524, 0,
  0.748, 0, -1.354, 0
), nrow = 4, ncol = 4, byrow = TRUE)

demo_results <- demo_problem$solve(
  sigma = sigma0, pi = pi0,
  method = "1s",
  optimization = blp_optimization("l-bfgs-b",
    method_options = list(maxit = 1000, factr = 1e7))
)

print(demo_results)
# Price coefficient ~ -63, matching pyblp's Nevo tutorial

Post-Estimation

After estimation, compute policy-relevant quantities:

Elasticities

# Compute elasticities for the first market
first_market <- logit_results$problem$unique_market_ids[1]
E <- logit_results$compute_elasticities(first_market)

# Own-price elasticities (diagonal)
cat("Own-price elasticities (first 5 products):\n")
#> Own-price elasticities (first 5 products):
print(round(diag(E)[1:5], 3))
#> [1] -0.833 -1.325 -1.529 -1.516 -1.779

Consumer Surplus

cs <- logit_results$compute_consumer_surplus()
cat("Consumer surplus (first 5 markets):\n")
#> Consumer surplus (first 5 markets):
print(round(cs[1:5], 4))
#>  C01Q1  C03Q1  C04Q1  C05Q1  C07Q1 
#> 0.0503 0.0458 0.0944 0.0472 0.0772

HHI

hhi <- logit_results$compute_hhi()
cat("HHI (first 5 markets):\n")
#> HHI (first 5 markets):
print(round(hhi[1:5], 0))
#> C01Q1 C03Q1 C04Q1 C05Q1 C07Q1 
#>   711   643  1435   566  1048

Simulation

Generate synthetic equilibrium data to test the estimator:

# Create a balanced panel: 50 markets, 20 products/market, 4 firms
id_data <- build_id_data(T = 50, J = 20, F = 4)
set.seed(42)
id_data$x <- runif(nrow(id_data), 0, 1)

# True parameters: beta = (intercept=0.5, price=-2, x=0.8)
sim <- blp_simulation(
  product_formulations = list(blp_formulation(~ prices + x)),
  product_data = id_data,
  beta = c(0.5, -2, 0.8),
  xi_variance = 0.3,
  seed = 42
)

# Solve for equilibrium prices and shares
sim_results <- sim$replace_endogenous(
  iteration = blp_iteration("simple", list(atol = 1e-12, max_evaluations = 5000))
)

cat(sprintf("Price range: [%.3f, %.3f]\n", min(sim_results$prices), max(sim_results$prices)))
#> Price range: [1.550, 1.715]
cat(sprintf("Share range: [%.5f, %.5f]\n", min(sim_results$shares), max(sim_results$shares)))
#> Share range: [0.00351, 0.15537]

Now estimate back the true parameters:

sim_problem <- sim_results$to_problem()
sim_est <- sim_problem$solve(method = "1s")

cat("True vs Estimated:\n")
#> True vs Estimated:
cat(sprintf("  intercept: true=0.50, est=%.4f\n", sim_est$beta[1]))
#>   intercept: true=0.50, est=-1.1803
cat(sprintf("  price:     true=-2.00, est=%.4f\n", sim_est$beta[2]))
#>   price:     true=-2.00, est=-0.9726
cat(sprintf("  x:         true=0.80, est=%.4f\n", sim_est$beta[3]))
#>   x:         true=0.80, est=0.8334

Merger Simulation

Simulate the effect of a merger between firms 2 and 3:

new_firm_ids <- sim_results$product_data$firm_ids
new_firm_ids[new_firm_ids == 3] <- 2  # merge firm 3 into firm 2

merger <- sim_est$compute_merger(
  new_firm_ids = new_firm_ids,
  iteration = blp_iteration("simple", list(atol = 1e-12, max_evaluations = 5000))
)

cat(sprintf("Mean price change: %.2f%%\n", mean(merger$price_change_pct)))
#> Mean price change: 7.61%
cat(sprintf("Max price change:  %.2f%%\n", max(merger$price_change_pct)))
#> Max price change:  25.48%
cat(sprintf("Mean CS change:    %.4f\n", mean(merger$delta_cs)))
#> Mean CS change:    -0.0786

Key Classes and Functions

Function Purpose
blp_formulation() Define model formulas (with optional absorb for FE)
blp_problem() Create an estimation problem from data and formulations
blp_simulation() Create a simulation with known parameters
blp_integration() Configure integration (product rule, Monte Carlo, etc.)
blp_iteration() Configure fixed-point iteration (SQUAREM, simple, etc.)
blp_optimization() Configure optimization (L-BFGS-B, Nelder-Mead, etc.)
build_blp_instruments() Construct BLP-style instruments
build_differentiation_instruments() Gandhi-Houde differentiation instruments
build_id_data() Create balanced panel identifiers

Comparison with pyblp

rblp aims to replicate pyblp’s functionality in R. Key correspondence:

pyblp rblp
Formulation('prices', absorb='C(product_ids)') blp_formulation(~ prices, absorb = ~ product_ids)
Problem(formulation, product_data) blp_problem(list(f1), product_data)
problem.solve() problem$solve()
results.compute_elasticities() results$compute_elasticities()
Integration('product', size=5) blp_integration("product", size = 5)
Simulation(...) blp_simulation(...)

Further Reading

The following vignettes cover specific topics in greater depth:

  • vignette("logit-nested-logit"): Derivation and estimation of the logit and nested logit demand models, including the logit inversion, price endogeneity, and the role of instruments.
  • vignette("random-coefficients"): The random coefficients (mixed) logit model, integration methods, the sigma/pi parameterization, and convergence diagnostics.
  • vignette("post-estimation-mergers"): Post-estimation analysis including elasticities, diversion ratios, consumer surplus, markups, merger simulation, and specification tests.
  • vignette("mixtape-exercises"): Exercises inspired by the Mixtape Sessions demand estimation course, progressing from OLS logit through IV logit to random coefficients logit.
  • vignette("simulation-monte-carlo"): Using the BLPSimulation class for roundtrip validation, instrument construction, and Monte Carlo study design.

References

  • Berry, S., Levinsohn, J., & Pakes, A. (1995). Automobile Prices in Market Equilibrium. Econometrica, 63(4), 841-890.
  • Nevo, A. (2000). A Practitioner’s Guide to Estimation of Random-Coefficients Logit Models of Demand. Journal of Economics & Management Strategy, 9(4), 513-548.
  • Conlon, C. & Gortmaker, J. (2020). Best Practices for Differentiated Products Demand Estimation with pyblp. RAND Journal of Economics, 51(4), 1108-1161.