BLP Demystified: From Basics to Brain-Busting Models!

Author

Hyoungchul Kim

Install the necessary Julia packages for later computation:

# You might need this commented-out code to install the packages
# using Pkg
# Pkg.add(["DataFrames", "CSV", "GLM", "Statistics", "LinearAlgebra", "Distributions", "NLopt", "FixedEffectModels", "RegressionTables"])

1 BLP

This exercise estimates the demand-side BLP model.

1.1 Motivation

Why do this? Demand estimation is central to the IO literature because measuring market power is one of its core concerns. How do we quantify market power? Usually we use the markup as the measure. But it is hard to calculate markups directly because they depend on the firm's cost function, which is not observed. IO theory, however, shows that we can recover the markup from the demand elasticity. Thus estimating demand is important.

1.2 Basic: McFadden (1974) style logit model

1.2.1 Model setup

We will first estimate a basic logit model with no unobserved demand shifters and no random coefficients. But let’s first talk a bit about the background of this discrete choice model. Note that most of it is from Train (2009).

Even before McFadden (1974), the logit model had a long history of development. But McFadden (1974) provides a complete, well-defined econometric model that is consistent with the utility-maximizing behavior of individuals.

Individual \(i\)'s utility-maximizing behavior (indirect utility) can be specified as follows:

\[ u_{ij} = \underbrace{x_j \beta + \alpha p_j}_{\delta_j} + \varepsilon_{ij} \]

where the mean utility of the outside option is normalized to zero. Also, the idiosyncratic shock is i.i.d. and follows the Type 1 Extreme Value (T1EV) distribution. We assume there are products \(0, \ldots, J\), where \(0\) denotes the outside option. We can think of \(\delta_j\) as the mean utility from product \(j\); in this parameterization, \(\delta_j\) does not depend on \(i\).

Now let’s do some math to derive the logit choice probabilities. One benefit of the logit model is that we can get a closed-form solution. We are going to compute the probability that individual \(i\) chooses product \(j\), given \(p_j\) and \(x_j\).

\[\begin{align} P (u_{ij} \geq u_{ij'} \;\; \forall j' \neq j \mid x_j, p_j) &= P (x_j \beta + \alpha p_j + \varepsilon_{ij} \geq x_{j'}\beta + \alpha p_{j'} + \varepsilon_{ij'} \;\; \forall j' \neq j \mid x_j, p_j) \\ &= P ( \varepsilon_{ij'} \leq \varepsilon_{ij} + \delta_j - \delta_{j'} \;\; \forall j' \neq j). \end{align}\]

If we treat \(\varepsilon_{ij}\) as given, the last term is a statement about the T1EV cumulative distribution, \(F(\varepsilon_{ij}) = e^{-e^{- \varepsilon_{ij}}}\). Since we assumed the shocks are i.i.d., we can express it as the product of the individual cumulative distributions (for brevity, we will denote the logit choice probability by \(P_{ij}\)):

\[ P_{ij} \mid \varepsilon_{ij} = \prod_{j' \neq j} e^{ - e^{-(\varepsilon_{ij} + \delta_j - \delta_{j'})}}. \]

Since \(\varepsilon_{ij}\) is not actually observed, we need to integrate over the density of \(\varepsilon_{ij}\):

\[ P_{ij} = \int \left(\prod_{j' \neq j} e^{ - e^{-(\varepsilon_{ij} + \delta_j - \delta_{j'})}} \right) e^{- \varepsilon_{ij}} e^{-e^{-\varepsilon_{ij}}} d \varepsilon_{ij}. \]

Now let’s get this into a closed-form expression:
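A sketch of the standard derivation (as in Train (2009)): the T1EV density factor \(e^{-\varepsilon_{ij}} e^{-e^{-\varepsilon_{ij}}}\) supplies exactly the missing \(j' = j\) term of the product, and the substitution \(t = e^{-\varepsilon_{ij}}\) (so \(dt = -e^{-\varepsilon_{ij}} d\varepsilon_{ij}\)) turns the integral into an elementary one:

\[\begin{align} P_{ij} &= \int \left(\prod_{j'} e^{-e^{-(\varepsilon_{ij} + \delta_j - \delta_{j'})}}\right) e^{-\varepsilon_{ij}} d\varepsilon_{ij} \\ &= \int \exp\left(-e^{-\varepsilon_{ij}} \sum_{j'} e^{-(\delta_j - \delta_{j'})}\right) e^{-\varepsilon_{ij}} d\varepsilon_{ij} \\ &= \int_0^{\infty} \exp\left(-t \sum_{j'} e^{\delta_{j'} - \delta_j}\right) dt = \frac{1}{\sum_{j'} e^{\delta_{j'} - \delta_j}}. \end{align}\]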

As a result, we can get the closed-form expression:

\[ P_{ij} = \frac{e^{\delta_{j}}}{\sum_{j'} e^{\delta_{j'}}} \]

This could be understood as the predicted share function given the fixed values of the parameters.

Note that this is a very simple model because we are not including any unobserved product demand shifters that could affect the utility gained from the product. In fact, we are assuming that econometricians fully observe all the variables that construct the mean utility. Thus there is not much econometrics involved. You can get the parameters as follows:

  1. Assuming you have data on market shares, you can match them to \(P_{ij} \cdot M\), where \(M\) is the total market size (a minimal Julia share function is sketched after this list).

  2. Then, since we get \(J\) equations from the \(J\) market shares, we can do simple algebra to get the mean utilities \(\delta_j\).

  3. Then you can run (nonlinear) least squares that minimizes the sum of squared differences between observed and predicted shares across all products. This gives you the parameters that best fit the data.
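As a quick illustration of step 1, here is a minimal Julia sketch of the predicted share function implied by the closed form above (the function name and the example values are just for illustration):

# Predicted logit shares for products 1,...,J given their mean utilities delta,
# with the outside option's mean utility normalized to zero.
predicted_shares(delta) = exp.(delta) ./ (1 + sum(exp.(delta)))

delta = [0.5, -0.2, 1.0]     # hypothetical mean utilities for J = 3 products
s = predicted_shares(delta)  # inside shares; the outside share is 1 - sum(s)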

1.2.2 Adding unobserved demand shifters

We can add an additional unobserved variable \(\xi_j\), which can be thought of as a demand shifter for product \(j\). This makes the model flexible enough to incorporate the realistic situation where the econometrician cannot observe some components that affect the utility from a product. Most of what we did above does not change much. The only new issue is how this unobserved term relates to the other main variables of interest: if there is endogeneity, we need an IV to estimate the parameters. In this section, we will do both cases (OLS and IV).

1.2.3 Computation (Following Berry (1994))

So how can we retrieve the parameters of interest? A naive way to think about it would be to do some nonlinear least squares, minimizing the sum of squared differences between predicted and observed shares across all products. The problem is that this direct approach is infeasible: you would need to know \(\xi_j\), which is unobservable.

This is where Berry (1994) comes in. He introduces a clever two-step estimation procedure.

Step 1: Inversion

Notation: Let \(\hat{s}_j (\delta)\) be predicted shares and let \(s_j\) be observed shares.1

Then you can take the system of equations that matches observed to predicted shares and invert it to get the mean utilities. For this simple case, we get:

\[ \delta_j = \log s_j - \log s_0, \quad j = 1, \ldots, J. \]

So this inversion gets us the value of the mean utility. Then we have the second step.
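A minimal Julia sketch of this inversion (the share column names are my assumption about the otc data used in the next subsection, where ln_mkt_share_diff appears as the dependent variable):

using CSV, DataFrames

otc = CSV.read("data/otc.csv", DataFrame)

# Berry (1994) inversion for the simple logit: delta_j = log(s_j) - log(s_0).
# mkt_share and outside_share are assumed column names for the observed inside/outside shares.
otc.ln_mkt_share_diff = log.(otc.mkt_share) .- log.(otc.outside_share)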

Step 2: IV estimation

By definition, we have \(\delta_j = x_j \beta + \alpha p_j + \xi_j\), so we can run this regression to retrieve the parameters. I wrote IV, but it could just be OLS if you can assume the unobserved term is uncorrelated with price.

1.2.4 Coding (with Julia)

Finally we will do some coding to get the result we just talked about.

using FixedEffectModels, DataFrames, CSV, RegressionTables 

# read in the data
otc = CSV.read("data/otc.csv", DataFrame)

# Run regressions
ols1 = reg(otc, @formula(ln_mkt_share_diff ~ price + promotion)) 
ols2 = reg(otc, @formula(ln_mkt_share_diff ~ price + promotion + fe(product)))
ols3 = reg(otc, @formula(ln_mkt_share_diff ~ price + promotion + fe(product) + fe(store)))
iv1 = reg(otc, @formula(ln_mkt_share_diff ~ (price ~ cost) + promotion))
iv2 = reg(otc, @formula(ln_mkt_share_diff ~ (price ~ cost) + promotion + fe(product)))

regtable(ols1, ols2, ols3, iv1, iv2, order = ["price"], drop = ["(Intercept)"], regression_statistics = [FStatIV, Nobs, R2],
  labels = Dict(
    "price" => "Price",
    "promotion" => "Promotion",
    "ln_mkt_share_diff" => "Log Mkt share difference"
  ))
## Some R codes that I followed

# m1 <- lm(ln_mkt_share_diff ~ price + promotion , data = otc)
# m2 <- lm(ln_mkt_share_diff ~ price + promotion + factor(product), data = otc)
# m3 <- lm(ln_mkt_share_diff ~ price + promotion + factor(product) + factor(store), data = otc)
# m4 <- ivreg(ln_mkt_share_diff ~ price + promotion | . - price + cost, data = otc)
# m5 <- ivreg(ln_mkt_share_diff ~ price + promotion + factor(product) | . - price + cost, data = otc)
# stargazer(m1, m2, m3, m4, m5, 
#           omit = c("product", "store"),
#           type = "text")

-----------------------------------------------------------------------------
                                        Log Mkt share difference             
                          ---------------------------------------------------
                              (1)        (2)       (3)         (4)        (5)
-----------------------------------------------------------------------------
Price                       0.020   -0.189**   -0.145*    0.069***      0.169
                          (0.014)    (0.059)   (0.059)     (0.015)    (0.115)
Promotion                   0.121     0.187*   0.201**       0.149   0.308***
                          (0.093)    (0.074)   (0.073)     (0.093)    (0.082)
-----------------------------------------------------------------------------
product Fixed Effects                    Yes       Yes                    Yes
store Fixed Effects                                Yes                       
-----------------------------------------------------------------------------
Estimator                     OLS        OLS       OLS          IV         IV
-----------------------------------------------------------------------------
Controls                      Yes                              Yes           
-----------------------------------------------------------------------------
First-stage F statistic                                  8,147.921    394.113
N                           1,056      1,056     1,056       1,056      1,056
R2                          0.003      0.440     0.456      -0.008      0.420
-----------------------------------------------------------------------------

1.2.5 Caveats

But we don’t usually use this basic setup in IO. This is because the model is a bit too simple to fully capture reality. One of the well-known problems is the independence of irrelevant alternatives (IIA) property: basically, it means we don’t get realistic demand elasticities. If you want to know more about it, google the famous red bus/blue bus story.

1.2.6 Solutions?

There are some ways to alleviate this problem. One of them (which we will not discuss) is the nested logit. Basically, we define groups of products such that IIA holds within a group but may not hold across groups. In the red bus/blue bus case, the two buses would be in the same group.

Another way is to extend the random utility model to a logit model with random coefficients. In essence, this introduces consumer preference heterogeneity into the model, by interacting consumer preferences with product characteristics. The nuisance is that a closed-form expression for the choice probability is no longer obtainable; we need to do some numerical computation.

1.3 Advanced: Berry, Levinsohn, and Pakes (1995) (Random coefficients logit model)

We again start with the individual utility function. But now something is added (we will now also explicitly denote markets as \(t\)):

\[ u_{ijt} = x_{jt} \beta_{it} + \alpha p_{jt} + \xi_{jt} + \varepsilon_{ijt} \]

The difference is that the slope coefficients can now vary across individuals \(i\). For now, we will assume \(\beta_{it}^k = \beta_0^k + \sigma_{k} v_{it}^k\), where \(k = 1, \ldots, K\) indexes the product characteristics. The \(\beta_0^k\) are fixed tastes for characteristic \(k\) and the \(v_{it}^k\) are random tastes that follow a standard normal distribution.

Now we can expand the model:

\[\begin{align} u_{ijt} &= (x_{j1t}, \ldots, x_{jKt}) \cdot (\beta_{0}^1 + \sigma_1 v_{it}^1, \ldots, \beta_{0}^K + \sigma_K v_{it}^K)^T + \alpha p_{jt} + \xi_{jt} + \varepsilon_{ijt}\\ &= x_{jt}\beta_{0} + \sum_k x_{jkt} \sigma_{k}v_{it}^k + \alpha p_{jt} + \xi_{jt} + \varepsilon_{ijt}\\ &= \underbrace{x_{jt}\beta_{0} + \alpha p_{jt} + \xi_{jt}}_{\delta_{jt}} + \underbrace{\sum_k x_{jkt} \sigma_{k}v_{it}^k}_{\mu_{ijt}} + \varepsilon_{ijt}. \end{align}\]

We can easily see that this is just an extension of the basic random utility model: indirect utility is made up of the mean utility \(\delta_{jt}\) and the individual-specific terms \(\mu_{ijt} + \varepsilon_{ijt}\).

Now we will make a simplification: assume the product-characteristic dimension is one, \(K = 1\). Using this simplification and the assumption that the idiosyncratic shock follows the T1EV distribution, we get the aggregate share:

\[ s_{jt} = \int \frac{\exp(\delta_{jt} + x_{jt} \sigma v_{it})}{1 + \sum_{j'} \exp(\delta_{j't} + x_{j't} \sigma v_{it})} f(v_i)dv_i \]

The integral has no analytical solution in the random coefficient model, so we need to compute the integral by simulation. One way to do it is as follows:

\[ \hat{s}_{jt} = \frac{1}{ns} \sum_{i=1}^{ns} \frac{\exp(\delta_{jt} + x_{jt} \sigma v_{it})}{1 + \sum_{j'} \exp(\delta_{j't} + x_{j't} \sigma v_{it})} \]

where \(ns\) is number of random draws from \(v_i\).

Now we can see that the simple inversion we did before is no longer easy to implement, because we have additional parameters whose values we do not know.

So in BLP, we need a nested estimation algorithm.

  1. In the outer loop, we iterate over different values of the parameters.

  2. In the inner loop, for a given parameter value, we invert the shares (via a contraction mapping) to get the mean utilities and evaluate the GMM objective function.

  3. We keep iterating until we find the parameters that minimize the GMM objective.

Now let’s do some coding!

1.3.1 Another coding (with Julia)

This portion of the code is from here

###############################################################################
####  BLP fixed-point algorithm, inverting mkt shares to get mean utility  ####
###############################################################################
using CSV
using DataFrames
using GLM
using Statistics
using LinearAlgebra
using Distributions
using NLopt
otc = CSV.read("data/otc.csv", DataFrame)

ns = 500;
nmkt = maximum(otc.mkt);
mkt = unique(otc.mkt);
nprod = maximum(otc.product);

vi = quantile.(Normal(), collect(range(0.5/ns, step = 1/ns, length = ns)));
sigma = 1;

function calc_mkt_share_t(delta_t, sigma_t, x_t, vi_t)
    # Dimensions: delta_t nprod*1 (11*1), sigma_t 1*1, x_t nprod*1 (11*1)
    delta_t = delta_t .* ones(nprod, ns)
    mu_t = x_t*sigma_t*vi_t'
    numerator = exp.(delta_t .+ mu_t)
    denominator = ones(nprod, ns) .+ sum(numerator, dims = 1)
    mkt_share_t = mean(numerator./denominator, dims = 2)
end

function contraction_t(d0, sigma_t, x_t, vi_t, mkt_t, tol = 1e-5, maxiter = 1e5)
    obs_mkt_share_t = mkt_t.mkt_share
    d_old = d0
    normdiff = Inf
    iter = 0
    while normdiff > tol && iter <= maxiter
        model_mkt_share_t = calc_mkt_share_t(d_old, sigma_t, x_t, vi_t)
        d_new = d_old .+ log.(obs_mkt_share_t) .- log.(model_mkt_share_t)
        normdiff = maximum(norm.(d_new .- d_old))
        d_old = d_new
        iter += 1
    end
    return d_old
end

function calc_delta(sigma)
    delta_fp = zeros(nprod, nmkt);
    for t in mkt
        mkt_t = otc[otc.mkt .== t, :];
        x_t = ones(nprod, 1);
        delta_t = zeros(nprod, 1);
        sigma_t = sigma;
        vi_t = vi;
        delta_fp[:, t] = contraction_t(delta_t, sigma_t, x_t, vi_t, mkt_t);
    end
    return vec(delta_fp);
end

@time delta_fp = calc_delta(sigma);
mean(delta_fp)
std(delta_fp)
  0.104584 seconds (68.19 k allocations: 174.658 MiB, 8.87% gc time, 31.12% compilation time)
0.8537059827877238
################################################################
#### Estimate beta and sigma using GMM (cost as instrument) ####
################################################################
X = hcat(ones(nprod*nmkt, 1),
         otc.price, otc.promotion,
         otc.product_2, otc.product_3, otc.product_4, otc.product_5,
         otc.product_6, otc.product_7, otc.product_8, otc.product_9,
         otc.product_10, otc.product_11);
z = hcat(X, otc.cost);
Phi = z'*z/1056;   # 1056 = nprod * nmkt, the total number of product-market observations
inv_Phi = inv(Phi);
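# (Outer loop) GMM objective for a candidate sigma: invert shares to recover delta
# (inner fixed point), concentrate out the linear parameters theta1 by linear IV-GMM,
# and return xi' Z inv(Phi) Z' xi where xi = delta - X*theta1.
# grad is unused because COBYLA is derivative-free.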

function GMMObjFunc(theta2::Vector, grad::Vector)
    sigma = theta2[1]
    delta = calc_delta(sigma)
    theta1 = inv(X'*z*inv_Phi*z'*X)*X'*z*inv_Phi*z'*delta
    error = delta - X*theta1
    obj = error'*z*inv_Phi*z'*error
    return obj
end

opt = Opt(:LN_COBYLA, 1)
opt.xtol_rel = 1e-4
opt.lower_bounds = [0.00001]
opt.min_objective = GMMObjFunc
@time (minf,minx,ret) = optimize(opt, [1])

@show sigma = minx[1]
delta = calc_delta(sigma[1]);
theta1 = inv(X'*z*inv_Phi*z'*X)*X'*z*inv_Phi*z'*delta
 21.530331 seconds (19.91 M allocations: 109.851 GiB, 14.45% gc time, 0.19% compilation time)
sigma = minx[1] = 28.720209709193362
13-element Vector{Float64}:
 -71.52085329511755
  -1.0852292863458644
   0.22206483339059624
   1.903952993172239
   3.990433860485858
  -0.6194995526125144
   1.2498418962356002
   3.9120756908464607
  -2.12500554825884
  -1.1737916667530266
   0.2462099284926299
  -2.7352849052497947
   0.011924728064080292

1.3.2 My personal coding (again Julia)

I will try to write my own code in Julia, following the exercises in the Mixtape session on Demand Estimation. Note that I will just be copying the exercise questions from that site below.

2 Exercise 1

2.1 Introduction

This is the first of three exercises that will give you a solid foundation for doing BLP-style estimation. The running example is the same as in lecture: what if we halved an important product’s price?

2.2 Setup

Most of the computational heavy-lifting in these exercises will be done by the open-source Python package PyBLP. It is easiest to use PyBLP in Python, and the hints/solutions for the exercises will be given in Python. But for those who are more familiar with R, it is straightforward to call PyBLP from R with the reticulate package. It is technically possible to call PyBLP from other languages like Julia and MATLAB, but most users either use Python or R.

You should install PyBLP on top of the Anaconda Distribution. Anaconda comes pre-packaged with all of PyBLP’s dependencies and many more Python packages that are useful for statistical computing. Steps:

  1. Install Anaconda if you haven’t already. You may wish to create a new environment for just these exercises, but this isn’t strictly necessary.
  2. Install PyBLP. On the Anaconda command line, you can run the command pip install pyblp.

If you’re using Python, you have two broad options for how to do the coding exercises.

  • Use a Jupyter Notebook. The solutions to each exercise will be in a notebook. In general, notebooks are a good way to weave text and code for short exercises, and to distribute quick snippets of code with others.
  • Use an integrated development environment (IDE). Once you get beyond a few hundred lines of code, I strongly recommend using an IDE and not notebooks. For Python, I recommend VS Code or PyCharm. The former is free and the latter has a free community edition with all the features you’ll need for standard Python development. Both integrate well with Anaconda.

If using a notebook, you can right click and save the following notebook template: notebook.ipynb. If using an IDE, you can right click and save the following script template: script.py. Both import various packages used throughout the exercise.

#| eval: false
 
import pyblp
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

The notebook additionally configures these packages to reduce the amount of information printed to the screen.

#| eval: false
 
pyblp.options.digits = 3
pyblp.options.verbose = False
pd.options.display.precision = 3
pd.options.display.max_columns = 50

import IPython.display
IPython.display.display(IPython.display.HTML('<style>pre { white-space: pre !important; }</style>'))

Finally, both show how to load the data that we’ll be using today.

2.3 Data

Today, you’ll use the products.csv dataset, which is a simplified version of Nevo’s (2000) fake cereal data with less information and fewer derived columns. The data were motivated by real grocery store scanner data, but due to the proprietary nature of this type of data, the provided data are not entirely real. This dataset has been used as a standard example in much of the literature on BLP estimation.

Compared to typical datasets you might use in your own work, the number of observations in this example dataset is quite small. This helps with making these exercises run very fast, but in practice one would want more data than just a couple thousand data points to estimate a flexible model of demand. Typical datasets will also include many more product characteristics. This one only includes a couple to keep the length of the exercises manageable.

The data contains information about 24 breakfast cereals across 94 markets. Each row is a product-market pair. Each market has the same set of breakfast cereals, although with different prices and quantities. The columns in the data are as follows.

  • market (String): The city-quarter pair that defines markets \(t\) used in these exercises. The data were motivated by real cereal purchase data across 47 US cities in the first 2 quarters of 1988.
  • product (String): The firm-brand pair that defines products \(j\) used in these exercises. Each of 5 firms produces between 1 and 9 brands of cereal.
  • mushy (Binary): A dummy product characteristic equal to one if the product gets soggy in milk.
  • servings_sold (Float): Total quantity \(q_{jt}\) of servings of the product sold in a market, which will be used to compute market shares.
  • city_population (Float): Total population of the city, which will be used to define a market size.
  • price_per_serving (Float): The product’s price \(p_{jt}\) used in these exercises.
  • price_instrument (Float): An instrument to handle price endogeneity in these exercises. Think of it as a cost-shifter, a Hausman instrument, or any other valid IV that we discussed in class.

Throughout the exercises, we use these data to estimate an increasingly flexible BLP-style model of demand for cereal. We will use predictions from this model to see how our running example, cutting the price of one cereal, affects demand for that cereal and for its substitutes.

2.4 Questions

2.4.1 1. Describe the data

You can download products.csv from this link. To load it, you can use pd.read_csv. To look at a random sample of its rows, you can use .sample. To compute summary statistics for different columns, you can use .describe. Throughout these exercises, you’ll be given links to functions and methods that can be used to answer the questions. If you’re unsure about how to use them, you should click on the link, where there is typically example code lower down on the page.

2.4.2 2. Compute market shares

To transform observed quantities \(q_{jt}\) into market shares \(s_{jt} = q_{jt} / M_t\), we first need to define a market size \(M_t\). We’ll assume that the potential number of servings sold in a market is the city’s total population multiplied by 90 days in the quarter. Create a new column called market_size equal to city_population times 90. Note that this assumption is somewhat reasonable but also somewhat arbitrary. Perhaps a sizable portion of the population in a city would never even consider purchasing cereal. Or perhaps those who do tend to want more than one serving per day. In the third exercise, we’ll think more about how to discipline our market size assumption with data.

Next, compute a new column market_share equal to servings_sold divided by market_size. This gives our market shares \(s_{jt}\). We’ll also need the outside share \(s_{0t} = 1 - \sum_{j \in J_t} s_{jt}\). Create a new column outside_share equal to this expression. You can use .groupby to group by market and .transform('sum') to compute the within-market sum of inside shares. Compute summary statistics for your inside and outside shares. If you computed market shares correctly, the smallest outside share should be \(s_{0t} \approx 0.305\) and the largest should be \(s_{0t} \approx 0.815\).
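Since I plan to work in Julia, here is a minimal DataFrames.jl sketch of these steps (column names come from the data description above; treat it as a sketch, not the official solution):

using CSV, DataFrames

products = CSV.read("data/products.csv", DataFrame)

# Market size: city population times 90 days in the quarter (the exercise's assumption)
products.market_size = products.city_population .* 90
products.market_share = products.servings_sold ./ products.market_size

# Within-market sum of inside shares, then the outside share
transform!(groupby(products, :market), :market_share => sum => :inside_share_sum)
products.outside_share = 1 .- products.inside_share_sum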

2.4.3 3. Estimate the pure logit model with OLS

Recall the pure logit estimating equation: \(\log(s_{jt} / s_{0t}) = \delta_{jt} = \alpha p_{jt} + x_{jt}' \beta + \xi_{jt}\). First, create a new column logit_delta equal to the left-hand side of this expression. You can use np.log to compute the log.

Then, use the package of your choice to run an OLS regression of logit_delta on a constant, mushy, and price_per_serving. There are many packages for running OLS regressions in Python. One option is to use the formula interface for statsmodels. To use robust standard errors, you can specify cov_type='HC0' in OLS.fit.
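A Julia analogue of this regression, continuing from the sketch above (FixedEffectModels.jl's reg with Vcov.robust() plays the role of statsmodels with HC0 errors):

using FixedEffectModels

# Pure logit dependent variable and OLS with robust standard errors
products.logit_delta = log.(products.market_share ./ products.outside_share)
ols = reg(products, @formula(logit_delta ~ mushy + price_per_serving), Vcov.robust())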

Interpret your estimates. Your coefficient on price_per_serving should be around -7.48. In particular, can you re-express your estimate on mushy in terms of how much consumers are willing to pay for mushy, using your estimated price coefficient?

2.4.4 4. Run the same regression with PyBLP

For the rest of the exercises, we’ll use PyBLP to do our demand estimation. This isn’t necessary for estimating the pure logit model, which can be done with linear regressions, but using PyBLP allows us to easily run our price cut counterfactual and make the model more flexible in subsequent days’ exercises.

PyBLP requires that some key columns have specific names. You can use .rename to rename the following columns so that they can be understood by PyBLP.

  • market –> market_ids
  • product –> product_ids
  • market_share –> shares
  • price_per_serving –> prices

By default, PyBLP treats prices as endogenous, so it won’t include them in its matrix of instruments. But the “instruments” for running an OLS regression are the same as the full set of regressors. So when running an OLS regression and not accounting for price endogeneity, we’ll “instrument” for prices with prices themselves. We can do this by creating a new column demand_instruments0 equal to prices. PyBLP will recognize all columns that start with demand_instruments and end with 0, 1, 2, etc., as “excluded” instruments to be stacked with the exogenous characteristics to create the full set of instruments.

With the correct columns in hand, we can initialize our pyblp.Problem. To specify the same R-style formula for our regressors, use pyblp.Formulation. The full code should look like the following.

#| eval: false
 
ols_problem = pyblp.Problem(pyblp.Formulation('1 + mushy + prices'), product_data)

If you print(ols_problem), you’ll get information about the configured problem. There should be 94 markets (T), 2256 observations (N), 3 product characteristics (K1), and 3 total instruments (MD). You can verify that these instruments are simply the regressors by looking at ols_problem.products.X1 and ols_problem.products.ZD, comparing these with mushy and prices in your dataframe. For the full set of notation used by PyBLP, which is very close to the notation used in the lectures, see this page.

To estimate the configured problem, use .solve. Use method='1s' to just do 1-step GMM instead of the default 2-step GMM. In this case, this will just run a simple linear OLS regression. The full code should look like the following.

#| eval: false
 
ols_results = ols_problem.solve(method='1s')

Again, if you print(ols_results), you’ll get estimates from the logit model. Make sure that your estimates are the same as those you got from your OLS regression. If you used 'HC0' standard errors as suggested above, your standard errors should also be the same.

2.4.5 5. Add market and product fixed effects

Since we expect price \(p_{jt}\) to be correlated with unobserved product quality \(\xi_{jt}\), we should be worried that our estimated \(\hat{\alpha}\) on price is biased. Since we have multiple observations per market and product, and prices vary both across and within markets, it is feasible for us to add both market and product fixed effects. If \(\xi_{jt} = \xi_j + \xi_t + \Delta\xi_{jt}\) and most of the correlation between \(p_{jt}\) and \(\xi_{jt}\) is due to correlation between \(p_{jt}\) and either \(\xi_j\) (product fixed effects) or \(\xi_t\) (market fixed effects), then explicitly accounting for these fixed effects during estimation should help reduce the bias of our \(\hat{\alpha}\).

The simplest way to add fixed effects is as dummy variables. We won’t do this today, but for your own reference, you could do this either by making a separate column for each possible market and product fixed effect and adding these to your formulation, or you could use the shorthand mushy + prices + C(market_ids) + C(product_ids). See pyblp.Formulation for different shorthands you can use. Since there are only 24 products and 94 markets for a total of 118 fixed effects, this approach is actually feasible in this case. But in a more realistic dataset with hundreds or thousands of products and markets, running an OLS regression with this many dummy variables starts to become computationally infeasible.

The alternative, which we’ll do today, is to “absorb” the fixed effects. For a single fixed effect, we could just de-mean our outcome variable and each of our regressors within the fixed effects levels, and then run our regression. For multiple fixed effects, we need to iteratively de-mean. PyBLP does this automatically if you specify absorb='C(market_ids) + C(product_ids)' in your formulation instead of adding these as dummy variables.

Since mushy is always either 1 or 0 for the same product across different markets, it’s collinear with product fixed effects, and you can drop it from your formula. Similarly, you can drop the constant. After dropping these, re-create your problem with absorbed fixed effects and re-solve it. Compare the new \(\hat{\alpha}\) estimate with the last one. You should now be getting a coefficient on price of around -28.6. Does its change suggest that price was positively or negatively correlated with unobserved product-level/market-level quality?
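In Julia, absorbing both sets of fixed effects is a one-liner with fe() terms (a sketch using the original market/product column names rather than PyBLP's renamed ones):

# Absorb market and product fixed effects; mushy and the constant drop out
fe_ols = reg(products, @formula(logit_delta ~ price_per_serving + fe(market) + fe(product)),
             Vcov.robust())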

2.4.6 6. Add an instrument for price

Adding market and product fixed effects can be helpful, but since unobserved quality typically varies by both product and market, we really want to instrument for prices. The data comes with a column price_instrument that we should interpret as a valid instrument for price that satisfies the needed exclusion restriction. It could be a cost-shifter, a valid Hausman instrument, or similar.

Before using it, we should first run a first-stage regression to make sure that it’s a relevant instrument for price. To do so, use the same package you used above for OLS to run a second regression of prices on price_instrument and your market and product fixed effects. If using the formula interface for statsmodels, you can use the same fixed effect shorthand as in PyBLP, with your full formula looking like prices ~ price_instrument + C(market_ids) + C(product_ids). Does price_instrument seem like a relevant instrument for prices?

Now that we’ve checked relevance, we can set our demand_instruments0 column equal to price_instrument, re-create the problem, and re-solve it. You should get a new coefficient on price of around -30.6. Does the change in \(\hat{\alpha}\) suggest that price was positively or negatively correlated with \(\Delta\xi_{jt}\) in \(\xi_{jt} = \xi_j + \xi_t + \Delta\xi_{jt}\)?
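A Julia sketch of the first-stage check and of the IV regression with absorbed fixed effects (same caveats as above):

# First stage: is price_instrument relevant for prices?
first_stage = reg(products, @formula(price_per_serving ~ price_instrument + fe(market) + fe(product)),
                  Vcov.robust())

# IV: instrument price_per_serving with price_instrument, absorbing both fixed effects
iv = reg(products, @formula(logit_delta ~ (price_per_serving ~ price_instrument) +
                            fe(market) + fe(product)), Vcov.robust())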

2.4.7 7. Cut a price in half and see what happens

Now that we have our pure logit model estimated, we can run our counterfactual of interest: what if we halved an important product’s price? We’ll select a single market, the most recent quarter in the first city: C01Q2. Create a new dataframe called counterfactual_data by selecting data for just that market and inspect the data. We’ll pretend that we’re firm one, deciding whether we want to cut the price of our brand four’s product F1B04. In particular, we might be worried about cannibalization, i.e. how much this price cut will result in consumers of our other 8 brands of cereal in this market just substituting from their old choice to the new, cheaper cereal. Alternatively, we could be a regulator or academic interested in how taxing that product would affect demand in the market.

In your new dataframe with just data from C01Q2, create a new_prices column that is the same as prices but with the price of F1B04 cut in half. To do this, you could use DataFrame.loc. Then, use .compute_shares on your results from the last question, passing market_id='C01Q2' to only compute new market shares for our market of interest, and passing prices=counterfactual_data['new_prices'] to specify that prices should be set to the new prices. This function will re-compute market shares at the changed prices implied by the model’s estimates. Store them in a new_shares column.

Compute the percent change in shares for each product in the market. From firm one’s perspective, do the estimates of cannibalization make sense? That is, do the signs on the percent changes for product F1B04 and for other products make sense? Would you normally expect percent changes for other products to be different depending on how other products compare to the one whose price is being changed?
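For intuition about what .compute_shares is doing here, under the pure logit estimates the new shares are just the logit formula re-evaluated at the new prices (a hand-derived expression, with \(\hat{\delta}_{jt}\) the estimated mean utilities):

\[ s_{jt}^{\text{new}} = \frac{\exp\left(\hat{\delta}_{jt} + \hat{\alpha}\,(p_{jt}^{\text{new}} - p_{jt})\right)}{1 + \sum_{k \in J_t} \exp\left(\hat{\delta}_{kt} + \hat{\alpha}\,(p_{kt}^{\text{new}} - p_{kt})\right)} \]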

2.4.8 8. Compute demand elasticities

To better understand what’s going on, use .compute_elasticities, again specifying market_id='C01Q2', to compute price elasticities for our market of interest. These measure what the model predicts will happen to demand in percentage terms when there’s a 1% change in price of a product. The diagonal elements are own-price elasticities and the off-diagonal elements are cross-price elasticities. Does demand seem very elastic? Do the cross-price elasticities seem particularly reasonable?
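Recall the elasticities implied by the pure logit model, which drive the answer here:

\[ \frac{\partial s_{jt}}{\partial p_{kt}} \frac{p_{kt}}{s_{jt}} = \begin{cases} \alpha p_{jt} (1 - s_{jt}) & \text{if } j = k, \\ -\alpha p_{kt} s_{kt} & \text{if } j \neq k. \end{cases} \]

The cross-price elasticity does not depend on \(j\) at all, so every other product's demand changes by the same percentage when \(p_{kt}\) changes; this is the IIA problem showing up again.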

2.5 Supplemental Questions

These questions will not be directly covered in lecture, but will be useful to think about when doing BLP-style estimation in your own work.

2.5.1 1. Try different standard errors

By default, PyBLP computed standard errors that are robust to heteroskedasticity. But we may be concerned that unobserved quality \(\xi_{jt}\) is systematically correlated across markets for a given product \(j\), or across products for a given market \(t\). Choose which one you think is more likely and try clustering your standard errors by that dimension. You can do this with se_type='clustered' in .solve, for which you’ll need a clustering_ids column in your product data. See how your standard error for \(\alpha\) changes.

2.5.2 2. Compute confidence intervals for your counterfactual

Your estimate of \(\hat{\alpha}\) comes with a standard error, but your counterfactual demand predictions don’t. Ideally we’d like to not only have a point estimate for a counterfactual prediction, but also a measure (up to model misspecification) of how confident we are in these predictions. The easiest way to do this is with a “parametric bootstrap.” The intuition is that we can draw from the estimated asymptotic distribution of our \(\hat{\alpha}\), and for each draw, re-compute demand and see how it responds to the same price cut.

You can do a parametric bootstrap with the .bootstrap method. Start with just a few draws (e.g., draws=100) and remember to set your seed so that you get the same draws every time you run the code. When new parameters are drawn, you get new .bootstrapped_shares, which take the place of your old shares. You can use the same .compute_shares method on the BootstrappedResults class, although you’ll have to pass a prices argument with prices replicated along a new axis by as many draws as you have.

Once you have some bootstrapped shares, compute the same percent changes, and compute the 2.5th and 97.5th percentiles of these changes for each product. Are these 95% confidence intervals for your predictions particularly wide?

2.5.3 3. Impute marginal costs from pricing optimality

The canonical supply side of the BLP model assumes firms set prices in static Bertrand-Nash equilibrium. See this section for a quick summary using PyBLP notation. Given an estimated demand model and such assumptions about pricing, we can impute marginal costs \(c_{jt}\).

To do so, you first need to tell PyBLP what firms own what products. Create a new firm_ids column in your data, re-initialize your problem, and re-solve it. Then, you should be able to run the .compute_costs method to impute firms’ marginal cost of producing each cereal. Do these marginal costs look particularly reasonable? How might limitations of your demand model and supply model bias them? What would they and observed prices imply about firms’ markups and economic profits?

2.5.4 4. Check your code by simulating data

Even experienced software developers make a lot of mistakes when writing code. Writing “unit tests” or “integration tests” that check whether the code you’ve written seems to be working properly is incredibly important when writing complicated code to estimate demand. Perhaps the most useful test you can write when doing demand estimation (or most other types of structural estimation) is the following.

  1. Simulate fake data under some true parameters.
  2. Estimate your model on the simulated data and make sure that you can recover the true parameters, up to sampling error.

If you do these steps many times, the resulting Monte Carlo experiment will also give you a good sense for the finite sample statistical properties of your estimator.

PyBLP’s Simulation class makes simulating data fairly straightforward. Its interface is similar to Problem, but you also specify your parameter estimates and structural errors. In addition to checking your code, you can also use this class for more complicated counterfactuals. After initializing your simulation, you can use .replace_endogenous to have PyBLP replace the prices \(p_{jt}\) and market shares \(s_{jt}\) with those that rationalize the chosen true parameters and other parts of the simulation. It does so by solving the pricing equilibrium. You’ll have to pass your imputed marginal costs via the costs argument.

Initialize a simulation of the pure logit model with the same product_data and the same estimated xi but with an \(\alpha\) somewhat different than the one you estimated. Make sure your chosen \(\alpha\) is negative, otherwise demand will be upward sloping and PyBLP will have trouble solving for equilibrium prices. To the estimated .xi you can add the estimated fixed effects .xi_fe, since the simulation class does not support fixed effects absorption.

Have PyBLP solve for prices and market shares, and use the resulting data to re-estimate your pure logit regression. See if you can get an estimated \(\hat{\alpha}\) close to the true one.

3 Exercise 2

3.1 Introduction

This is the second of three exercises that will give you a solid foundation for doing BLP-style estimation. We’ll continue with the same running example: what if we halved an important product’s price? Our goal today is to relax some of the unrealistic substitution patterns implied by the pure logit model by incorporating preference heterogeneity. To do so, we will use cross-market variation in our product and some new demographic data to estimate parameters that govern preference heterogeneity.

3.2 Setup

We’ll be continuing where we left off after the first exercise. You should just keep adding to your code, using its solutions if you had trouble with some parts.

3.3 Data

Today, you’ll incorporate demographics.csv into estimation, which again is a simplified version of Nevo’s (2000) demographic data with less information and fewer derived columns. The data were originally draws from the Current Population Survey.

In your own work, when incorporating demographic data into estimation, you will want to sample from the whole Current Population Survey (or whatever survey/census data you are using), not just from a subset of it. The small size of today’s demographic data helps with distributing the data, but in practice you should ideally be sampling from a much larger dataset of demographic information. In your own work you will also want to incorporate more demographic variables than the one included in this dataset. Like the product data, in these exercises we only consider a few columns to keep the exercises a manageable length.

The demographic dataset contains information about 20 individuals drawn from the Current Population Survey for each of the 94 markets in the product data. Each row is a different individual. The columns in the data are as follows.

  • market (String): The city-quarter pair that defines markets \(t\) used in these exercises. The data were motivated by real cereal purchase data across 47 US cities in the first 2 quarters of 1988.
  • quarterly_income (Float): The quarterly income of the individual in dollars.

In today and tomorrow’s exercises, we will use these demographic data to introduce income-specific preference heterogeneity into our BLP model of demand for cereal and see how our counterfactual changes. By incorporating income, we will also be able to speak to distributional concerns: how will counterfactual changes in the market differentially affect high- and low-income consumers?

3.4 Questions

3.4.1 1. Describe cross-market variation

To get a sense for what types of preference heterogeneity we can feasibly estimate with variation in our product and demographic data, recall the linear regression intuition about identification from the lecture. To add parameters in \(\Sigma\) we want cross-market choice set variation. To add parameters in \(\Pi\) we want cross-market demographic variation.

You should already have products.csv from the last exercise. You can download demographics.csv from this link. Quarterly income has a long right tail that makes summarizing it difficult, so you should create a new column log_income equal to the log of quarterly_income.

Across both datasets, describe the amount of cross-market variation in the number of products, mushy, prices, and log_income. You can use .groupby and .agg to compute market-level statistics for these variables (e.g., counts, means, and standard deviations), and you can then use .describe to compute summary statistics for these market-level statistics.
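A Julia sketch of these market-level summaries, continuing with the products data frame from Exercise 1 (groupby/combine in DataFrames.jl play the role of pandas' .groupby/.agg):

using CSV, DataFrames, Statistics

demographics = CSV.read("data/demographics.csv", DataFrame)
demographics.log_income = log.(demographics.quarterly_income)

# Market-level statistics, then summary statistics across markets
product_stats = combine(groupby(products, :market), nrow => :n_products,
                        :mushy => mean, :price_per_serving => mean, :price_per_serving => std)
income_stats = combine(groupby(demographics, :market), :log_income => mean, :log_income => std)
describe(product_stats)
describe(income_stats)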

Which variables have any cross-market variation? In other words, referring back to the linear regression intuition about identification, which preference heterogeneity parameters do we have any hope of credibly estimating? Recall that we have both product and market fixed effects, which will be collinear with some regressors on potential parameters in \(\Sigma\) and \(\Pi\) in the approximate linear regression.

3.4.2 2. Estimate a parameter on mushy \(\times\) log income

To estimate a parameter in \(\Pi\) on the interaction between mushy and log_income, we first need to define a dataset of consumer types that PyBLP can use for preference heterogeneity. From each market \(t\), take \(I_t = 1,000\) draws with replacement from the demographic data, and call the resulting dataframe agent_data. Each of its rows will be a consumer type \(i \in I_t\). You can do this with .groupby and .sample. You can use as_index=False when grouping to keep your market column. Remember to set your seed with the random_state argument when sampling from each market.

Like with product data, to get PyBLP to recognize the market column as markets \(t\), you’ll need to rename it to market_ids. You’ll also need to specify consumer type shares / sampling weights \(w_{it}\). Since we took each draw from the demographic data with equal probability, these should be uniform weights \(w_{it} = 1 / 1,000\). Make a new column weights equal to 1 / 1000.

Finally, later we’ll be adding some dimensions of unobserved preference heterogeneity in \(\nu_{it}\). PyBLP recognizes these with the column names nodes0, nodes1, nodes2, etc. Make three columns, nodes0, nodes1, and nodes2, each with a different set of draws from the standard normal distribution. You can do this with np.random.default_rng, remembering to set your seed, and .normal, using size=(len(agent_data), 3). Print some rows from your agent_data to make sure it looks like you’d expect.
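A Julia sketch of building agent_data: 1,000 draws with replacement per market, uniform weights, and three standard normal columns (the seed value is arbitrary; column names follow the exercise):

using Random, DataFrames

rng = MersenneTwister(1234)   # arbitrary seed, fixed for reproducibility

# 1,000 draws with replacement from each market's demographic rows
agent_data = reduce(vcat, [g[rand(rng, 1:nrow(g), 1000), :] for g in groupby(demographics, :market)])
rename!(agent_data, :market => :market_ids)

# Uniform sampling weights and three N(0,1) columns for unobserved heterogeneity
agent_data.weights = fill(1/1000, nrow(agent_data))
agent_data.nodes0 = randn(rng, nrow(agent_data))
agent_data.nodes1 = randn(rng, nrow(agent_data))
agent_data.nodes2 = randn(rng, nrow(agent_data))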

To identify our new parameter, we need a new instrument. We’ll use the recommendation from the lecture to use mean income \(m_t^y\) interacted with mushy in \(x_{jt}\). To merge in the mean market-level income from the first question into the product data, you can use .merge. Recall that you already have one column demand_instruments0 equal to your price instrument. To add a second instrument, create a new column demand_instruments1 equal to the interaction between your merged-in market-level mean income and mushy.

When initializing your new pyblp.Problem, you’ll need two new pyblp.Formulation instances to model the interaction between mushy and log_income. First, replace your old formulation with a tuple

#| eval: false
 
product_formulations = (pyblp.Formulation('0 + prices', absorb='C(market_ids) + C(product_ids)'), pyblp.Formulation('0 + mushy'))

This defines, in PyBLP lingo, the formulations for the X1 and X2 matrices. The X1 matrix is the one with beta coefficients. The X2 matrix is interacted with consumer type-specific variables like your demographics \(y_{it}\). The 0 values in the formulations guarantee that they won’t have constant terms (by default a constant is added, unless there are absorbed fixed effects). You’ll also need a new formulation for consumer demographics.

#| eval: false
 
agent_formulation = pyblp.Formulation('0 + log_income')

With these in hand, we can define the new problem. See the pyblp.Problem documentation for the ordering and names of arguments.

#| eval: false
 
pyblp.Problem(product_formulations, product_data, agent_formulation, agent_data)

Now that we want to set up a nonlinear GMM problem, we need to configure some nonlinear optimization arguments in .solve. First, we should configure our optimizer. We’ll use the lecture’s recommendation to use SciPy’s trust region algorithm trust-constr. And we’ll be explicit about using its default termination tolerances of 1e-8. You can do this by setting optimization=pyblp.Optimization('trust-constr', method_options={'gtol': 1e-8, 'xtol': 1e-8}). In pyblp.Optimization, method_options is a dictionary mapping trust-constr configuration options to their values.

Second, we need to specify which parameters in \(\Sigma\) and \(\Pi\) to optimize over, and what their initial values will be. Since we’re starting without any parameters in \(\Sigma\), we’ll set sigma=0. Zeros in PyBLP mean that PyBLP should have the parameter always be set to zero, i.e. not there. There’s only one new parameter in \(\Pi\) (there’s only one column in your X2 formulation, mushy, and only one demographic in your agent formulation, log_income), and we’ll just set it to an arbitrary nonzero value for now (indicating that it should be optimized over), say pi=1.

Before and after you solve the problem, you can set pyblp.options.verbose = True and pyblp.options.verbose = False. Make sure that the gradient norm (the optimization problem’s first-order conditions) is getting iteratively closer to zero. We call this “marching down the gradient” and not seeing it near the end of optimization indicates a problem. Since we have exactly as many parameters as moments/instruments, we’re just identified and should also have an approximately zero objective at the optimum, so make sure you see that as well. At the optimum, in addition to a near-zero objective and gradient norm, the Hessian (in this case just a single value) should be positive, indicating that the optimization’s second-order conditions are satisfied. The other columns outputted in the optimization progress have information about the inner loop, e.g. how many iterations it takes to solve it and how long this takes.

Once you’re satisfied that optimization seems to have worked well, interpret the sign of your estimated parameter in \(\Pi\). If you’ve done all of the above correctly, you should get an estimate of around 0.251. Can you use \(\hat{\alpha}\) to interpret it in dollar terms, i.e. how much more a person with 1% higher income is willing to pay for mushyness?

3.4.3 3. Make sure you get the same estimate with random starting values

We can be fairly confident with our objective/gradient/Hessian checks that we’re at the global optimum with just one parameter in a just identified model. But with more complicated optimization problems in more realistic problems, we really want to try different starting values to make sure we always get the same estimates. Let’s try doing that here.

First, configure some bounds for our new \(\Pi\) parameter, say pi_bounds = (-10, +10). Then in a for loop over different random number generator seeds, randomly draw an initial starting value from within these bounds (you can use .uniform) and re-optimize. You may want to actually impose your bounds during optimization using the pi_bounds argument to .solve.

Do you get the same estimate for each random starting value? If not, there may be an issue with optimization or your problem configuration. If you have a more complicated model with many parameters and many instruments, you may often get a global minimum, and sometimes get a local minimum. Optimizers aren’t perfect, and sometimes terminate prematurely, even with tight termination conditions. You should select the global one for your final estimates.

3.4.4 4. Evaluate changes to the price cut counterfactual

Using the new estimates, re-run the same price cut counterfactual from last exercise. Re-compute percent changes and compare with those from the pure logit model. Are there any (small) differences, particularly for substitution from other products? Explain how the introduction of the new parameter induced these changes. Do cannibalization estimates seem more reasonable than before?

3.4.5 5. Estimate parameters on price \(\times\) log income and unobserved preferences

Like for mushy, our cross-market income variation allows us to estimate a second parameter in \(\Pi\) on prices \(\times\) log_income. Unlike mushy, which doesn’t vary across markets, we actually have cross-market variation in prices, which will allow us to potentially estimate a parameter in \(\Sigma\) on prices.

To add these two new parameters, we’ll need two new instruments. Since we can’t have endogenous prices in our instruments, we’ll first set a new column predicted_prices equal to the fitted values from the price IV first stage regression that we ran yesterday. If you used statsmodels, you can just get these from .fittedvalues of your regression results object. Verify that prices and predicted_prices are strongly correlated, for example with .corr.

To target the new parameter in \(\Pi\), we’ll follow the lecture’s recommendation and add a new demand_instruments2 equal to the interaction between the log income mean and predicted_prices. To target the new parameter in \(\Sigma\), we’ll also follow the lecture’s recommendation and add a new demand_instruments3 equal to the sum of squared distances between predicted_prices and all other predicted_prices in the same market. You could construct this market-by-market by using .groupby and .transform with a custom function that accepts a market’s predicted_prices as an argument, computes a matrix of all pairwise differences between these values (e.g., with x[:, None] - x[None, :]), squares them, and sums them across columns.
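A Julia sketch of constructing this differentiation-style instrument market by market (I keep the original market column name; swap in market_ids if you have already renamed columns for PyBLP):

using DataFrames

# For each product, sum of squared differences between its predicted price and every
# predicted price in the same market (the own term contributes zero)
squared_price_distances(x) = [sum((xj .- x) .^ 2) for xj in x]

transform!(groupby(products, :market),
           :predicted_prices => squared_price_distances => :demand_instruments3)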

When initializing the problem, we’ll need to add a new prices term in the X2 formulation: 0 + mushy + prices. Otherwise, with the updated product data, initializing the problem is the same. When solving the problem, however, the extra column in the X2 formulation means that we need an extra row in our configuration for sigma and pi. First, sigma should be a \(2 \times 2\) matrix corresponding to the two columns in X2. All elements will be zero (indicating that the corresponding elements in \(\Sigma\) will be fixed to zero), except for the bottom-right value, which we’ll set to some arbitrary non-zero starting values, say 1. Similarly, pi should be a \(2 \times 1\) matrix corresponding to the two columns in X2 and the one column in the agent formulation. We’ll set the top element equal to some starting value that’s close to our last estimate, say 0.2, and the bottom element equal to something arbitrary for the new parameter, say 1 again. Your sigma and pi arguments should look like

#| eval: false
 
sigma=[
    [0, 0],
    [0, 1],
], 
pi=[
    [0.2],
    [1],
]

When you solve the problem, there are two more parameters, so it will take a bit longer. As the optimizer nears convergence, we should again see “marching down the gradient” and, since we’re still just identified, an objective approaching zero. At the optimum, verify that the gradient norm is again near zero and that the Hessian’s eigenvalues are all positive. Usually, we would again draw multiple starting values to be sure everything’s working fine, but from here on out, to save time we’ll just use one set of starting values. Your sigma estimate should be around 6.02.

Note that your new random coefficient on price is \(\alpha_{it} = \alpha + \sigma_p \nu_{1it} + \pi_p y_{it}\) where \(\nu_{1it}\) is nodes0 in your agent_data and \(y_{it}\) is log_income. Compute the average log income \(\bar{y}\) in your demographic data to verify that your estimate of the average price coefficient \(\alpha + \sigma_p \times 0 + \pi_p \times \bar{y}\) is close to your old \(\hat{\alpha}\). Using this average price sensitivity, interpret the two new parameters. Does price sensitivity vary a lot or a little with income? Compared to the heterogeneity induced by income, is there a lot or a little unobserved heterogeneity in price sensitivity?

3.4.6 6. Evaluate changes to the price counterfactual

Re-run the price counterfactual and discuss new differences. Do substitution and cannibalization seem more reasonable than before? Does the price change seem to differentially affect high- vs. low-income consumers? Is there anything that we were unable to estimate with cross-market product-level variation that you think could have helped further improve substitution patterns?

3.5 Supplemental Questions

These questions will not be directly covered in lecture, but will be useful to think about when doing BLP-style estimation in your own work.

3.5.1 1. Try using different numbers of Monte Carlo draws

Above, you took \(I_t = 1,000\) draws per market when constructing your agent data. Particularly for simple models like this, this is usually a sufficient number of draws. In practice, however, when there are many more markets and dimensions of heterogeneity, you may want to start with a smaller number like \(I_t = 100\) to get started, and then increase this number until your estimates stop changing and optimization stops having any issues. Try re-estimating the model with 10, 100, 500, and 2,000 draws, and see how your estimates change, if at all.

3.5.2 2. Try using scrambled Halton sequences

Using a random number generator like np.random.default_rng is perhaps the simplest approach to approximate an integral over a distribution like standard normals. However, using quasi-Monte Carlo sequences can do better with fewer draws. Instead of using \(N(0, 1)\) draws from a random number generator, try using scrambled Halton draws.

You can do so with scipy.stats.qmc.Halton or pyblp.build_integration with specification='halton' in pyblp.Integration. Again, estimate the model with 10, 100, 500, 1,000, and 2,000 quasi-Halton numbers per market, and see how the stability of your estimates compares to when you use simple Monte Carlo draws.

3.5.3 3. Try using quadrature

A particularly computationally-efficient approach to approximating a Gaussian distribution is with quadrature. If you’re working with a model where consumer preference heterogeneity is not particularly complicated, you may want to try to replace Monte Carlo or quasi-Monte Carlo draws with many fewer nodes/weights that very well approximate the distribution.

You can construct quadrature nodes and weights with pyblp.build_integration with specification='product' in pyblp.Integration. The 'product' specification means that PyBLP will take the cross-product of the quadrature nodes and weights for each univariate \(N(0, 1)\). If you have, say, size=7 nodes/weights per market for each \(N(0, 1)\) but 5 dimensions of heterogeneity (i.e., nodes0, nodes1, nodes2, nodes3, and nodes4), this will give \(7^5 = 16,807\) consumer types per market. This is known as the curse of dimensionality, which Monte Carlo integration does not suffer from as much. With this many dimensions of heterogeneity, you could either switch back to Monte Carlo integration or try using sparse grid integration, which more carefully chooses quadrature nodes/weights in higher dimensions. You can do this with specification='grid'.

In our setting, we are only using one set of \(N(0, 1)\) draws for unobserved preference heterogeneity for price. We can use specification='product', size=7, and dimensions=2 to construct two \(N(0, 1)\) draws, the second of which we’ll want to convert into income draws. We can do this by estimating a lognormal distribution for income in each market (you’ll need to compute market-specific means and standard deviations of log income), and transforming the second column of \(N(0, 1)\) draws into log income draws. Try doing this and see if your estimates (and compute time) differ much from before. If they do, this indicates that a lognormal distribution for income may not be a great parametric assumption, and a Monte Carlo approach may have made more sense than quadrature.

3.5.4 4. Approximate the optimal instruments

After estimating your model, let PyBLP approximate the optimal instruments with .compute_optimal_instruments. You can then use the method .to_problem to automatically update your problem to include the optimal instruments. Then you can re-solve it like before. Do your estimates change much? If they don’t, what does this suggest?

3.5.5 5. Add another instrument and compute the optimal weighting matrix

So far, we have been working with just-identified models with exactly as many moments as parameters. This means that in theory, the weighting matrix \(W\) shouldn’t matter because we should always be able to find parameter values that set the objective exactly equal to zero, regardless of how the different components of the objective are weighted.

But in some cases, you may want to have an over-identified model. Multiple instruments can increase the precision of estimates, and can also allow for overidentification tests. Try adding an additional instrument demand_instruments4 equal to the interaction between the log income standard deviation and your differentiation instrument constructed from predicted_prices. Recall the linear regression intuition: this leverages cross-market variation in the standard deviation of income to target the parameter in \(\Pi\) on prices and log income.
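For concreteness, a sketch of constructing this column, assuming your market-level log income standard deviation is stored in a log_income_sd column and your differentiation instrument built from predicted_prices lives in demand_instruments3 (adjust these names to wherever yours actually live):

#| eval: false
 
product_data['demand_instruments4'] = (
    product_data['log_income_sd'] * product_data['demand_instruments3']
)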

First re-estimate the first step of your model. At the optimum, your objective will now be nontrivially far from zero, but you should still verify that the other convergence checks pan out. Do your first-stage estimates look any different?

Then re-run estimation but pass .updated_W to the W argument of .solve. You should still be using method='1s'. Technically, PyBLP will do the two steps for you with method='2s'. However, if this was a real problem and not just an exercise, you would want to again try optimization with multiple starting values, which is something that PyBLP doesn’t automatically do.
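A sketch of the second step, assuming results1 holds your first-step results and problem is your configured problem, starting the optimizer from the first-step estimates:

#| eval: false
 
results2 = problem.solve(
    sigma=results1.sigma,
    pi=results1.pi,
    method='1s',
    W=results1.updated_W,
)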

Compare your first- and second-stage estimates. Do they look particularly different? What about your standard errors?

3.5.6 6. Incorporate supply-side restrictions into estimation

In the supplemental exercises of the first exercise, we used a canonical assumption about how firms set prices to impute marginal costs of producing each cereal from pricing optimality conditions. If we are further willing to model the functional form of marginal costs, for example as \(c_{jt} = x_{jt}'\gamma + \omega_{jt}\), we can stack our current moment conditions \(\mathbb{E}[\xi_{jt} z_{jt}^D]\) with additional supply-side moment conditions \(\mathbb{E}[\omega_{jt} z_{jt}^S]\). These will allow us to efficiently estimate \(\gamma\), and in some cases can also help to add precision to our demand-side estimates.

Like in the first exercise, you will need a column of firm_ids to tell PyBLP which firms produce which cereals. To impose the assumption that \(c_{jt} = x_{jt}'\gamma + \omega_{jt}\) for some characteristics \(x_{jt}\), you’ll need to specify a third formulation in your product_formulations tuple. For simplicity, try assuming that the only observed characteristics that affect marginal costs are a constant and the mushy dummy.

#| eval: false
 
product_formulations = (
    pyblp.Formulation('0 + prices', absorb='C(market_ids) + C(product_ids)'),  # X1: linear demand
    pyblp.Formulation('0 + mushy + prices'),                                    # X2: nonlinear demand
    pyblp.Formulation('1 + mushy'),                                             # X3: marginal costs
)

By default, PyBLP knows that a constant and mushy are exogenous variables, so it will add them to the set of supply-side instruments \(z_{jt}^S\). If you wanted to model non-constant returns to scale, for example by including I(market_size * shares) to have a coefficient on quantities \(q_{jt} = M_t \cdot s_{jt}\), PyBLP would recognize that this term included endogenous market shares and not include it in \(z_{jt}^S\). Instead, you would have to specify a supply_instruments0 column in your product data to give a valid demand-shifter for market shares. Ideally, you would want something that affects demand but that is uncorrelated with the unobserved portion of marginal costs \(\omega_{jt}\).

Try estimating the model with supply-side restrictions. When calling .solve, you’ll need to specify an initial value for \(\alpha\) with the beta argument because after adding a supply side, we can no longer “concentrate out” \(\alpha\) as it’s needed to impute marginal costs. PyBLP will automatically concentrate out \(\gamma\).
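A sketch of the call, assuming problem was re-created with the three formulations above and a firm_ids column, and using placeholder starting values (because X1 is just prices here, beta is a single element):

#| eval: false
 
results_supply = problem.solve(
    sigma=sigma0,    # starting values carried over from before (placeholders)
    pi=pi0,
    beta=[-30],      # initial value for the price coefficient alpha (placeholder)
    method='1s',
)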

Interpret your new supply-side estimates. Your demand-side estimates should be essentially the same because you’re using the same moments to estimate them as before. When adding a supply side, demand-side estimates tend to become more precise when there are supply-side instruments that are relevant for demand-side parameters (e.g., optimal instruments).

4 Exercise 3

4.1 Introduction

This is the last of three exercises that will give you a solid foundation for doing BLP-style estimation. We’ll continue with the same running example: what if we halved an important product’s price? Our goal today is to further relax some of the unrealistic substitution patterns that we were forced to bake into our model because of limited cross-market variation. To do so, we will use results from consumer surveys to introduce within-market variation.

4.2 Setup

We’ll be continuing where we left off after the second exercise. You should just keep adding to your code, using its solutions if you had trouble with some parts.

4.3 Data

Today, you won’t be including any new datasets in estimation, just a few carefully-chosen summary statistics from a couple of imaginary consumer surveys. These could have come from industry reports, from surveys administered by you, the researcher, or from any number of other places.

The first survey randomly sampled consumers who purchased breakfast cereal in markets C01Q1 and C01Q2, i.e. during the first two quarters in the first city covered by the product data. It elicited information about income. The second survey was another random sample of the same consumers, but it asked questions about second choice diversion.

| Survey Name | Observations | Statistics |
|---|---|---|
| “Income” | 100 | Estimated mean log of quarterly income was 7.9. |
| “Diversion” | 200 | When asked what they would have done had their chosen cereal not been available, 28% said they would have purchased no cereal in the covered product data. Out of all respondents, 31% both purchased a mushy cereal and would have purchased another mushy cereal had it not been available. |

In today’s exercise, we will match these three statistics to introduce some additional dimensions of heterogeneity into our BLP model of demand for cereal and see how our counterfactual changes.

4.4 Questions

4.4.1 1. Use the income statistic to match a parameter on log income

In the last exercise, we were unable to estimate a parameter on log income \(y_{it}\) alone because market fixed effects eliminate needed cross-market income variation. Instead, we simply assumed that this parameter is zero, i.e. that income does not shift individuals’ preference for cereal in general one way or another. Today, we’ll incorporate a micro moment \(\mathbb{E}[y_{it} | j > 0] = 7.9\) to estimate this parameter and see whether this assumption was reasonable.

To do so, we’ll first need to re-create our problem with a constant in our formulation for X2. This is because the parameter we want to add will be on the interaction between a constant from the product data and log income in the agent data. Your X2 formulation should now be 1 + mushy + prices.

After re-initializing our problem with the extra constant, we need to configure our micro moment. To do so, you’ll have to configure three objects in the following order.

  1. Define a pyblp.MicroDataset \(d\).
  2. Define a pyblp.MicroPart \(p\).
  3. Define a pyblp.MicroMoment \(m\).

First, define a pyblp.MicroDataset \(d\) to represent the “income” survey. Choose a useful name, set the number of observations=100 from the survey, and set the market_ids to the list of market IDs that the survey covers. You also need to specify a function compute_weights, which defines how to compute sampling weights \(w_{dijt}\). You should look at the pyblp.MicroDataset documentation to understand how this function works. Within the two surveyed markets, the survey did not differentially select consumers except to only select those who purchased a cereal, \(j \neq 0\). So our sampling weights should be \(w_{dijt} = 1\{j \neq 0\}\). To implement this, you should define a function that for market \(t\) returns an \(I_t \times J_t\) matrix of ones. Equivalently, your function could return an \(I_t \times (1 + J_t)\) matrix with the first column being zeros, corresponding to \(j = 0\), and the final \(J_t\) columns being ones, corresponding to \(j \neq 0\). Not having a first column is just convenient PyBLP shorthand for setting it to zeros. Your function should look like

#| eval: false
 
compute_weights=lambda t, p, a: np.ones((a.size, p.size))

The arguments required for your function are t, the market ID, p, the Products subsetted to this market, and a, the Agents subsetted to this market. We’ll just return a matrix of the size required by the compute_weights argument.
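Putting these pieces together, a sketch of the full dataset configuration; the name and variable are arbitrary, and the market IDs come from the survey description:

#| eval: false
 
income_dataset = pyblp.MicroDataset(
    name="Income survey",
    observations=100,
    compute_weights=lambda t, p, a: np.ones((a.size, p.size)),
    market_ids=['C01Q1', 'C01Q2'],
)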

Next, define a pyblp.MicroPart \(p\) to represent the expectation \(\mathbb{E}[y_{it} | j > 0]\). Choose a useful name, specify the configured dataset, and also specify a second function compute_values, which defines how to compute micro values \(v_{pijt}\). The function has the same arguments and output size as compute_weights above, except here we want a matrix in which each row repeats the same \(y_{it}\) value across the \(J_t\) columns. One convenient way to do this is with the np.einsum function, although there are many other ways, such as using np.repeat or np.tile.

#| eval: false
 
compute_values=lambda t, p, a: np.einsum('i,j', a.demographics[:, 0], np.ones(p.size))

Here, we’re just selecting the first (and only) column of demographics, log income, and repeating it for as many products as there are in the market.

Finally, define a pyblp.MicroMoment \(m\). Choose a useful name, specify the observed value=7.9 that we want to match, and for the parts argument, you can just pass the above configured micro part. This will specify the identity function \(f_m(v) = v\). You would use the other arguments compute_value and compute_gradient if you wanted to specify a more complicated function \(f_m(\cdot)\) and its gradient. In that case, you would specify a list of micro parts, and these additional functions would operate on their values.
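A sketch of the part and moment together, reusing income_dataset from above; the names are arbitrary:

#| eval: false
 
income_part = pyblp.MicroPart(
    name="E[log income | j > 0]",
    dataset=income_dataset,
    compute_values=lambda t, p, a: np.einsum('i,j', a.demographics[:, 0], np.ones(p.size)),
)
income_moment = pyblp.MicroMoment(
    name="E[log income | j > 0]",
    value=7.9,
    parts=income_part,
)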

Given our micro moment, say income_moment, we can just pass a list with it as the only element to micro_moments=[income_moment] in .solve. Since our X2 configuration has an additional column for the constant, our sigma matrix needs another column/row, and our pi matrix needs another row. Set the new elements in both equal to zeros, except for the one in \(\Pi\) corresponding to the interaction between the new constant in X2 and log income, which you can set to some nonzero initial value, say 1. In practice, you’ll want to try out multiple random starting values.
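A sketch of the expanded starting values and the solve call; the layout must match your X2 formulation (1 + mushy + prices) and your demographics (log income), and the nonzero entries carried over from the last exercise are placeholders here:

#| eval: false
 
sigma0 = np.array([
    [0, 0, 0],    # constant
    [0, 0, 0],    # mushy
    [0, 0, 0.5],  # prices: unobserved heterogeneity carried over from before (placeholder)
])
pi0 = np.array([
    [1.0],   # constant x log income: the new parameter, nonzero starting value
    [0.0],   # mushy x log income
    [-5.0],  # prices x log income: carried over from before (placeholder)
])
results = problem.solve(
    sigma=sigma0,
    pi=pi0,
    micro_moments=[income_moment],
    method='1s',
)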

Again, we’re just identified, so we should get an approximately zero objective at the optimum. If we saw “marching down the gradient” and have a near-zero gradient norm and positive Hessian eigenvalues at the optimum, we can look at our estimates. You should get a new parameter estimate of around -0.331. Does the new parameter estimate suggest that the original assumption of it being zero was fairly okay, or not?

4.4.2 2. Use the diversion statistics to estimate unobserved preference heterogeneity for a constant and mushy

In the last exercise, we were also unable to estimate parameters in \(\Sigma\) on a constant and mushy because there was no cross-market variation in the number of products or mushy. Instead, we again simply assumed these parameters were zero, and all the preference heterogeneity was from income. We’ll incorporate the second choice moments \(\mathbb{P}(k = 0 | j > 0) = 0.28\) and \(\mathbb{P}(\text{mushy}_j \text{ and } \text{mushy}_k | j > 0) = 0.31\) to estimate these parameters.

Since these numbers come from a different survey, you should define a new pyblp.MicroDataset. (If the latter two statistics had instead been based on the same responses as the income statistic, we would define all our micro moments on the same dataset.) When defining compute_weights, the one difference from the income micro moment is that to include information about second choices, your function now needs to return an array with three dimensions, not a two-dimensional matrix, in order to define weights \(w_{dijkt}\), which now have an additional index \(k\). The last dimension is what defines \(k\). To allow for selecting individuals who select \(k = 0\), the final dimension should be of size \(1 + J_t\), for an array of dimensions \(I_t \times J_t \times (1 + J_t)\). Again, it should just be an array of all ones, since beyond the two markets, there was no differential selection of consumers, except that they chose cereal.
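A sketch of this second dataset; the name and variable are arbitrary, and the markets are the same two covered by the surveys:

#| eval: false
 
diversion_dataset = pyblp.MicroDataset(
    name="Diversion survey",
    observations=200,
    compute_weights=lambda t, p, a: np.ones((a.size, p.size, 1 + p.size)),
    market_ids=['C01Q1', 'C01Q2'],
)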

Then, you should define two new pyblp.MicroParts, one for each of the statistics, and two new pyblp.MicroMoments in the same way as before. Both are also just averages on the micro dataset, not complicated functions, so these should look similar to the last micro moment. The main difference is that compute_values will now return an \(I_t \times J_t \times (1 + J_t)\) array, with ones and zeros to choose micro values \(v_{pijkt}\) that implement the desired statistics.
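A sketch of what the two parts and moments might look like. It assumes the first index of the second-choice dimension corresponds to \(k = 0\) and that the mushy dummy is the second column of p.X2 under the 1 + mushy + prices formulation; check your own column ordering before using anything like this:

#| eval: false
 
def outside_second_choice(t, p, a):
    # v = 1 when the second choice k is the outside option (first index of the k axis)
    values = np.zeros((a.size, p.size, 1 + p.size))
    values[:, :, 0] = 1
    return values

def mushy_to_mushy(t, p, a):
    # v = 1 when both the first choice j and the second choice k are mushy
    mushy = p.X2[:, 1]                      # mushy dummy for each product
    mushy_k = np.concatenate([[0], mushy])  # prepend 0 for k = 0
    return np.einsum('i,j,k->ijk', np.ones(a.size), mushy, mushy_k)

outside_part = pyblp.MicroPart(
    name="P(k = 0 | j > 0)", dataset=diversion_dataset, compute_values=outside_second_choice,
)
mushy_part = pyblp.MicroPart(
    name="P(mushy j and mushy k | j > 0)", dataset=diversion_dataset, compute_values=mushy_to_mushy,
)
outside_moment = pyblp.MicroMoment(name="P(k = 0 | j > 0)", value=0.28, parts=outside_part)
mushy_moment = pyblp.MicroMoment(
    name="P(mushy j and mushy k | j > 0)", value=0.31, parts=mushy_part,
)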

When solving the problem, we just append the two new micro moments to our micro_moments list, set the new parameters in sigma to nonzero initial values, and re-optimize. Second choice computations can take some time, up to a few minutes.

After confirming that optimization seems to have been successful, interpret the new parameters. How large is the unobserved preference heterogeneity for the inside (or equivalently, the outside) option? How large is it for the mushy characteristic?

4.4.3 3. Evaluate changes to the price cut counterfactual

Using the new estimates, re-run the same price cut counterfactual from the last two exercises. Re-compute percent changes and compare with those from day 2. Do substitution patterns and cannibalization estimates now look more reasonable?

4.5 Supplemental Questions

These additional questions will go beyond just defining micro moments, and will be useful to think about when doing BLP-style estimation in your own work.

4.5.1 1. See how your market size assumption affects results

In the first exercise, we made a somewhat arbitrary assumption about the size of the market. Vary this assumption, for example by assuming that the potential market size is two servings per day per individual in the city, instead of just one. Re-compute your market shares and re-estimate the model. See how the results of your price counterfactual change when you have a parameter in \(\Sigma\) on the constant, and when you assume that parameter is zero. In particular, compute percent changes to the counterfactual outside share \(s_{0t}\) and see how that changes.
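A sketch of the kind of adjustment involved, with hypothetical column names (city_population for the number of people in the city and servings_sold for observed quantities; substitute whatever your own product data call them), assuming a 90-day quarter:

#| eval: false
 
product_data['market_size'] = 2 * 90 * product_data['city_population']  # two servings per day
product_data['shares'] = product_data['servings_sold'] / product_data['market_size']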

In general, outside diversion will scale with the assumed potential market size, unless we include sufficient preference heterogeneity, particularly for the outside option. Directly matching a moment to pin down diversion to the outside option is a fairly clear way to estimate what diversion to the outside option should look like in a counterfactual.

4.5.2 2. Simulate some micro data and use it to match optimal micro moments

This exercise doesn’t come with a full dataset of consumer demographics and their choices, only with a few summary statistics, but we can simulate some to get a sense for how one might use all the information in a full micro dataset via optimal micro moments. To simulate some micro data, take one of your estimated models and use the .simulate_micro_data method, using your configured “Income” dataset and setting a seed. You may want to use pyblp.data_to_dict to get the simulated data into a format that you can pass to create a more easily-usable pd.DataFrame.
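A sketch of this step, assuming results holds your estimated model and income_dataset is the configured “Income” MicroDataset:

#| eval: false
 
simulated = results.simulate_micro_data(dataset=income_dataset, seed=0)
micro_data = pd.DataFrame(pyblp.data_to_dict(simulated))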

The simulated data should look like the assumed micro data in the “Income” survey, with one exception. The agent_indices column corresponds to the within-market row index of individual types \(i_n\) in your agent_data. This includes information about unobserved preference heterogeneity, which the real micro dataset wouldn’t have. Compute the same agent_indices in your agent data using .groupby and .cumcount, and merge log_income into your simulated micro dataset. You can then drop the agent_indices column. The choice_indices column just represents the within-market row index of the respondent’s choice \(j_{nt}\), which is presumably observed in the “Income” dataset.
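A sketch of recovering log income for the simulated respondents, assuming agent_data is a DataFrame with market_ids and log_income columns:

#| eval: false
 
agent_data['agent_indices'] = agent_data.groupby('market_ids').cumcount()
micro_data = micro_data.merge(
    agent_data[['market_ids', 'agent_indices', 'log_income']],
    on=['market_ids', 'agent_indices'],
)
micro_data = micro_data.drop(columns='agent_indices')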

The resulting data should be in the same format as needed by .compute_micro_scores. You’ll just need to tell PyBLP how to integrate over the unobserved preference heterogeneity you just dropped. One option is to use the integration argument, but you can also replicate each row in your micro data for as many draws as you want, add a weights column equal to one over the number of draws, and add nodes0, nodes1, and nodes2 columns with standard normal draws. These two options will do the same thing if you use the monte_carlo specification when configuring your integration.

Given an estimated model and some micro data, this function computes the score (the derivative of the log likelihood of each micro observation with respect to the model’s nonlinear parameters) of each micro data observation. Specifically, the scores are returned as a list, one element for each parameter in .theta. For each element of theta, create a new micro moment that matches the mean score in the micro data.

In order to do so, you’ll have to specify compute_values in the corresponding pyblp.MicroPart. Each \(v_{pijt}\) should equal the score of a consumer of type \(i\) who chooses \(j\) in market \(t\). PyBLP has another convenient function for computing these: .compute_agent_scores. After passing your micro dataset to this function (and also configuring integration), it also returns a list, one element for each element of theta. Each element in the list is a dictionary mapping market IDs to the matrix that you need to pass to compute_values. In this function, you can use the t argument to directly select the right matrix.

One approach would be to replace the single \(\mathbb{E}[y_{it} | j > 0]\) micro moment from this dataset with all the optimal micro moments from this dataset. But to keep results comparable with before (and to maintain a just-identified model), try replacing this sub-optimal micro moment with the optimal micro moment based on scores for the parameter in \(\Pi\) that this original micro moment was supposed to target. Do results change much? What does this indicate?
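A sketch of building one optimal micro moment from the scores, targeting the parameter in \(\Pi\) on the constant and log income. The index k into .theta is a placeholder you would need to look up, the names are arbitrary, and the integration configuration is one of the two options described above:

#| eval: false
 
integration = pyblp.Integration('monte_carlo', size=100, specification_options={'seed': 0})
observed_scores = results.compute_micro_scores(income_dataset, micro_data, integration=integration)
agent_scores = results.compute_agent_scores(income_dataset, micro_data, integration=integration)

k = 0  # placeholder: the index in .theta of the Pi parameter on constant x log income
optimal_moment = pyblp.MicroMoment(
    name="Optimal moment for constant x log income",
    value=np.mean(observed_scores[k]),
    parts=pyblp.MicroPart(
        name="Mean score for constant x log income",
        dataset=income_dataset,
        compute_values=lambda t, p, a: agent_scores[k][t],
    ),
)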

4.5.3 3. Use a within-firm diversion ratio to estimate a nesting parameter

One dimension of preference heterogeneity that we have not modelled is within firm. In our price cut counterfactual, beyond mushiness and prices, we do not see more substitution within firm than across firms. However, there are good reasons to think that we might see more substitution within firm in reality. Consumers tend to be loyal to firms, and may prefer some firms to others for reasons that aren’t captured by our other observed characteristics.

Typically, a good way to estimate preference heterogeneity for a categorical variable is to have a separate random coefficient on a dummy variable for each category. We have already done this for the binary mushy characteristic. However, for some categorical variables with many different categories, adding this many random coefficients would be computationally prohibitive, and there may not be enough variation in the data to do so. In our data, a dummy on each firm would mean a lot of additional random coefficients.

Instead, a common choice is to estimate a parameter that governs within category correlation of the idiosyncratic preferences \(\varepsilon_{ijt}\). These categories are called nests, \(h\) in PyBLP notation, and the correlation parameter is called a nesting parameter, \(\rho\) in PyBLP notation. Without any random coefficients, we have the nested logit model. With random coefficients, we have the random coefficients nested logit (RCNL) model. See the RCNL part of the PyBLP documentation for more details.

Using aggregate variation, it is common to target a nesting parameter with an instrument that, for each product \(j\), counts the number of other products in the same nest in the same market \(t\). However, since we have no cross-market variation in this instrument, this is not a particularly credible way to identify \(\rho\), much in the same way that without cross-market variation, we have little hope of credibly identifying the parameters in \(\Sigma\). Indeed, a nesting structure is just a very particular type of random coefficient!

Instead, we will match a within-firm diversion ratio. Assume that in our diversion survey, we have a third statistic: \(\mathbb{P}(f(j) = f(k)) = 0.35\). That is, in the survey, 35% of respondents said they would select a product made by the same firm had their first choice cereal been unavailable. When setting up your problem, create a new nesting_ids column in your product data equal to your firm IDs. Then when solving the problem, add an additional micro moment that matches this share, and set some nonzero initial value for the rho parameter (it needs to be between 0 and 1). Optimization may take a while because many of the numerical tricks PyBLP uses to make estimation fast don’t work when there’s a nesting parameter (particularly with second choices). Re-run the counterfactual and see whether it seems more reasonable, paying close attention to changes in within-firm cannibalization vs. cross-firm substitution.
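A sketch of the pieces involved, reusing diversion_dataset from before and assuming firm IDs are stored in a firm_ids column; sigma0, pi0, and micro_moments carry over from the earlier questions, and the 0.5 starting value for rho is arbitrary:

#| eval: false
 
product_data['nesting_ids'] = product_data['firm_ids']

def same_firm_second_choice(t, p, a):
    # v = 1 when the first choice j and second choice k are made by the same firm
    ids = p.nesting_ids.flatten()
    same_firm = (ids[:, None] == ids[None, :]).astype(float)         # J_t x J_t
    jk = np.concatenate([np.zeros((p.size, 1)), same_firm], axis=1)  # prepend a k = 0 column
    return np.broadcast_to(jk, (a.size, p.size, 1 + p.size))

within_firm_part = pyblp.MicroPart(
    name="P(f(j) = f(k) | j > 0)",
    dataset=diversion_dataset,
    compute_values=same_firm_second_choice,
)
within_firm_moment = pyblp.MicroMoment(
    name="P(f(j) = f(k) | j > 0)", value=0.35, parts=within_firm_part,
)

results_rcnl = problem.solve(
    sigma=sigma0,
    pi=pi0,
    rho=0.5,
    micro_moments=micro_moments + [within_firm_moment],
    method='1s',
)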

References

Berry, Steven. 1994. “Estimating Discrete-Choice Models of Product Differentiation.” The RAND Journal of Economics 25 (2): 242–62. http://www.jstor.org/stable/2555829.
Berry, Steven, James Levinsohn, and Ariel Pakes. 1995. “Automobile Prices in Market Equilibrium.” Econometrica 63 (4): 841–90. http://www.jstor.org/stable/2171802.
McFadden, Daniel. 1974. “Conditional Logit Analysis of Qualitative Choice Behavior.” In Frontiers in Econometrics, edited by Paul Zarembka, 105–42. New York: Academic Press.
Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. 2nd ed. Cambridge University Press.

Footnotes

  1. You might have already noticed, but I kind of use variables without a subscript as the vector of the variables. For example, \(\delta\) is just \((\delta_1, \ldots, \delta_J)\).