# Basic Regression Analysis Explained for Attorneys New to Antitrust

Regression analysis is a statistical tool used by economists, statisticians, and others to “understand the relationship between or among two or more variables.”[1] It is perhaps one of the tools most widely used by economic experts in antitrust analysis, and it is central to debates over the estimation of damages in price-fixing cases, the assessment of market power in monopolization cases, and the definition of relevant markets in merger reviews.[2] Attorneys new to antitrust will encounter regression analysis frequently. In this article, we provide a brief introduction to the regression methodology.

A regression is a statistical tool that combines economic theory with empirical data and methods to answer questions rigorously. A regression analysis includes two types of variables: “dependent variables” and “independent variables” (also called “explanatory variables” or “covariates”). A “dependent variable” is the outcome you are trying to explain, which depends on other factors. “Independent variables” are factors that could potentially change the “dependent variable” but should not themselves be determined by it; they may be related to one another, provided that no independent variable simply duplicates the information in the others. By estimating the statistical relationship between the observed outcomes (“dependent variable”) and “independent variables” that potentially affect the “dependent variable,” we can analyze diverse data to “evaluate competing theories about the relationships that may exist among a number of explanatory facts.”[3]

To better see how a regression model works, take a hypothetical example—the pricing of avocados. Suppose we are interested in understanding what determines the prices of avocados sold in the United States. In this case, the “dependent variable” is the price of avocados. To estimate a regression, one would need data on the prices of avocados, potentially over time, across geographies, and even by individual customers. The economist relies on economic theory and their research and understanding of the avocado industry to then ask: *What factors likely determine the prices of avocados sold to consumers?*

Economists generally think of two broad categories of price determinants: supply factors and demand factors. Supply factors are variables that can affect “the quantity of a good that producers are willing to sell at a given price.”[4] For example, one supply factor that could change avocado prices is the cost of producing avocados in the United States, including raw material input costs, labor costs, and utilities. You would expect prices to be lower if the cost of producing avocados decreased, as the seller would require less money to produce the same amount of avocados. Demand factors are variables that can affect “how much of a good consumers are willing to buy.”[5] On the demand side, factors like increased income or the promotion of avocado recipes on social media could motivate people to consume more avocados, driving up avocado prices. Therefore, with available data on our dependent variable, avocado prices, and on independent variables such as labor costs, raw material costs, utilities, and per capita income, we could use a regression model to quantify how each of those supply and demand factors affects the price of avocados. The relationship between the dependent variable and each independent variable is quantified by a number, referred to as the “coefficient.” By studying the direction and magnitude of the coefficient, we could estimate the extent of the change in avocado price associated with a given change in per capita income, holding all other independent variables constant.[6] In this case, a *positive* coefficient estimate for per capita income would indicate that avocado prices in the US tend to *increase* as income increases, holding the other factors constant.
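As a sketch, the avocado model described above could be written as a single linear equation. The notation is an illustrative assumption rather than something taken from the cited sources: `P_i` is the avocado price in observation `i` (for example, a region in a given month), each `\beta` is a coefficient to be estimated, and `\varepsilon_i` collects everything the model does not capture.

```latex
P_i = \beta_0 + \beta_1 \,\mathrm{Labor}_i + \beta_2 \,\mathrm{Materials}_i
      + \beta_3 \,\mathrm{Utilities}_i + \beta_4 \,\mathrm{Income}_i + \varepsilon_i
```

Under this specification, `\beta_4` is the change in price associated with a one-unit increase in per capita income when labor, material, and utility costs are held constant. A positive estimate of `\beta_4` corresponds to prices that rise with income.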

A regression like the one described above, which accounts for multiple factors at the same time, is referred to as “multivariate.” In the avocado example, if we included per capita income as the only factor in the regression, the model would be incomplete, because we have already identified other factors that affect avocado prices. It could be the case that avocado prices are higher in areas that have both higher labor costs *and* higher per capita income. Without controlling for labor costs, the estimated coefficient on per capita income could absorb effects that actually come from labor costs (this issue is referred to as “omitted variable bias”). By controlling for all relevant factors, the regression is able to produce more accurate and informative results.
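The omitted variable problem can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the six regions, the cost and income figures, and the assumed pricing rule (each dollar of labor cost adds $0.03 to the price and each $1,000 of income adds $0.01) are invented for illustration, and the small `ols` helper is a teaching aid rather than a substitute for a statistics package.

```python
def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*columns)]  # prepend a constant term
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical data for six regions; none of these figures come from real markets.
labor_cost = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]  # hourly labor cost, in dollars
income = [30.0, 36.0, 38.0, 46.0, 50.0, 58.0]      # per capita income, in $ thousands

# Assumed pricing rule used to generate the prices (no random noise added).
TRUE_LABOR_EFFECT = 0.03
TRUE_INCOME_EFFECT = 0.01
price = [0.50 + TRUE_LABOR_EFFECT * lab + TRUE_INCOME_EFFECT * inc
         for lab, inc in zip(labor_cost, income)]

short = ols([income], price)             # labor cost omitted
full = ols([labor_cost, income], price)  # labor cost controlled for

print(f"Income coefficient, labor cost omitted:  {short[1]:.4f}")
print(f"Income coefficient, labor cost included: {full[2]:.4f}")
```

Because labor costs and income rise together in this made-up data, the income-only regression attributes part of the labor cost effect to income and reports a coefficient roughly twice the 0.01 used to generate the prices. Once labor cost is included, the income coefficient returns to 0.01.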

Once the regression has been run and the resulting coefficients have been interpreted, another important step of the regression analysis is to test how meaningful the results are, that is, whether the estimated effect of the independent variable on the outcome could merely be *due to chance*. The common standard is that the coefficient is statistically significant at the five percent level, meaning that, if the independent variable truly had no relationship with the dependent variable, the probability of observing an estimated relationship at least as strong as the one found would be *less* than five percent.[7]
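To make the significance test concrete, the sketch below regresses hypothetical avocado prices on one variable at a time and compares each coefficient to its standard error. The data are invented, `store_digit` is a deliberately irrelevant variable (the last digit of a made-up store ID), and the p-value uses a normal approximation. That approximation is an assumption made to keep the example within the Python standard library; with only ten observations, a statistics package would use Student's t distribution instead.

```python
from math import erfc, sqrt


def slope_test(x, y):
    """Return (slope, standard error, t-statistic, approximate p-value)
    for a one-variable least-squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance divides by n - 2 because two parameters were estimated.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = sqrt(sse / (n - 2) / sxx)
    t_stat = slope / std_err
    # Two-sided p-value under a normal approximation (an assumption, see above).
    p_value = erfc(abs(t_stat) / sqrt(2))
    return slope, std_err, t_stat, p_value


# Hypothetical observations for ten stores.
income = [30.0, 34.0, 38.0, 42.0, 46.0, 50.0, 54.0, 58.0, 62.0, 66.0]  # $ thousands
price = [1.33, 1.32, 1.39, 1.38, 1.48, 1.50, 1.53, 1.62, 1.59, 1.66]   # $ per avocado
store_digit = [3.0, 7.0, 5.0, 2.0, 6.0, 4.0, 7.0, 3.0, 5.0, 8.0]       # irrelevant by design

income_result = slope_test(income, price)
digit_result = slope_test(store_digit, price)

for name, (slope, std_err, t_stat, p_value) in [("income", income_result),
                                                ("store_digit", digit_result)]:
    verdict = "significant" if p_value < 0.05 else "not significant"
    print(f"{name}: slope={slope:.4f}, t={t_stat:.2f}, p={p_value:.3g} ({verdict} at 5%)")
```

In this made-up data the income coefficient is many times larger than its standard error, so it passes the five percent threshold, while the coefficient on the irrelevant variable does not.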

To summarize, performing a regression analysis is more than just running a program. It involves a thoughtful process of assessing all factors that could be of interest, gathering available data, carefully designing a model, and interpreting and testing the results to ensure that they are sensible and informative.

[1] Rubinfeld, Daniel L. “Reference Guide on Multiple Regression,” Reference Manual on Scientific Evidence: Third Edition, Ch. 8, 2011 (“Reference Guide on Multiple Regression”), p. 305.

[2] For example, regression analysis was used in Winn Dixie Stores, et al v. Eastern Mushroom Marketing Cooperative Inc, et al (2022), Piggly Wiggly Clarksville, Inc. v. Interstate Brands Corp. (2004), and In re: Pool Products Distribution Market Antitrust Litigation, Case No. 12-md-2328-SSV (E.D. La. 2012).

[3] American Bar Association, Econometrics 1–2 (2005), p. 4.

[4] Pindyck, Robert S. and Rubinfeld, Daniel L., “Microeconomics,” 8th ed., 2013 (“Microeconomics”), p. 22.

[5] Microeconomics, p. 23.

[6] *See* Gujarati, Damodar N., Basic Econometrics, 4th ed. McGraw Hill 2003, p. 18.

[7] Reference Guide on Multiple Regression, p. 194.

## Experts

- Monica (Sidong) Zhong, Managing Consultant