# Basics of Regression Analysis for Pay Equity Studies

Companies may be interested in assessing their compensation policies for a number of reasons. In a survey by the Society for Human Resource Management, three-quarters of firms responded that they regularly audit for pay equity across a variety of characteristics such as gender, race or ethnicity, and age.^{[1]}

Before undertaking this type of analysis, it is helpful to understand the different tools available to assess pay equity. For example, firms may compare the pay of an individual employee to that of employees they have identified as performing similar work at a similar level. This type of comparison may be very individualized as employees can be paid differently for several reasons. To assess larger groups of employees—and account for multiple different factors that impact pay at once—firms may engage economists to perform pay equity studies using regression models.

**What is a regression model?**

Regression models are used to measure the relationship between one outcome (e.g., annual rate of pay) and one or more other factors. When modeling pay, these would include the multiple factors that impact an employee’s pay such as their position, level of seniority, or professional background. As discussed in more detail below, these included factors are key to a model’s reliability—if important factors are omitted (or measured improperly), the results of the analysis will not be reliable.

Below is a stylized example of a regression model that could be used to assess pay equity across male and female employees:
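One common way to write such a model is sketched here. The notation is assumed for illustration: the subscript i indexes employees, the β terms are the estimated "effects," and ε is the error term discussed below. A real model would typically contain several variables for the job and individual characteristics rather than one term each.

```latex
\text{Pay}_i = \beta_0
  + \beta_1\,\text{Gender}_i
  + \beta_2\,\text{JobCharacteristics}_i
  + \beta_3\,\text{IndividualCharacteristics}_i
  + \varepsilon_i
```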

In this example, the regression estimates the relationship between pay and:

- Gender;
- Characteristics of each employee’s job such as the subject matter or level of responsibility for that job; and
- Individual characteristics of each employee such as performance, educational attainment, or previous work experience.

For each of the factors included, the regression model estimates the average relationship—also referred to as the average “effect”—of that specific factor on pay across employees. Generally, an employee’s pay cannot be explained fully by the factors included in a model—the difference between what can be explained and their actual pay is captured by ε, also referred to as the “error term.” Although this is referred to as the “error term,” it is not a mistake made by the regression. This term captures the effect of any factors that impact pay not included within the model.

In this example, the estimated “effect” of gender is the measure of interest. This “effect” estimates the difference, on average, in pay between male and female employees after accounting for the other factors included within the model.^{[2]} In addition to estimating the size of the “effect,” regression models also estimate whether this relationship is statistically significant (i.e., unlikely to be due to random chance alone).^{[3]}

If a regression model accounting for all available relevant factors finds a negative and statistically significant effect for the group of interest, this indicates that these employees are paid less, on average, than other employees included in the analysis, holding constant the included factors. Such a result would indicate that further investigation may be needed to determine whether this average difference reflects, for example, a relevant factor that impacted pay but was not available within a firm’s HR database, or an actual difference in pay between groups.
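To make the estimated "effect" and its statistical significance concrete, the following is a minimal sketch that fits an ordinary least squares regression using only the Python standard library. The twelve employees, the variable names, and the choice of controls are all hypothetical. The p-value uses a normal approximation for simplicity; an actual study would typically rely on a statistical package and the t distribution.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Return (coefficients, standard errors) for the model y = X b + e."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        # i-th diagonal entry of (X'X)^-1, scaled by the residual variance
        se.append(math.sqrt(sigma2 * solve(xtx, unit)[i]))
    return beta, se

# Hypothetical data: (female, senior_level, years_experience, pay in $ thousands)
rows = [
    (1, 0, 2, 61), (1, 0, 4, 66), (1, 0, 6, 68),
    (1, 1, 8, 88), (1, 1, 10, 93), (1, 1, 5, 83),
    (0, 0, 3, 64), (0, 0, 5, 67), (0, 0, 1, 60),
    (0, 1, 7, 87), (0, 1, 9, 90), (0, 1, 12, 97),
]
X = [[1.0, f, lvl, yrs] for f, lvl, yrs, _ in rows]  # 1.0 is the intercept
pay = [p for *_, p in rows]

beta, se = ols(X, pay)
t_stat = beta[1] / se[1]  # the "standard deviations" measure for gender
p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # two-sided, normal approximation

print(f"gender effect: {beta[1]:.2f}, t-statistic: {t_stat:.2f}, p-value: {p_value:.3f}")
```

Here `beta[1]` plays the role of the gender "effect": the average pay difference between female and male employees, holding level and experience constant. A t-statistic of roughly two or more in absolute value corresponds to the 5 percent threshold described in the citations.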

**What questions should I ask before undertaking a pay equity regression analysis?**

One question to ask is: which employees should be included in the analysis? It is important to keep in mind that regression models compare compensation across all of the employees included in the analysis. If, for example, one were to apply the illustrative model above to employees in two different business units, the regression would assume that the included factors have exactly the same impact on pay for employees in both business units. However, it may be the case that these two business units determine pay using entirely different processes.^{[4]}
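For reference, the Chow test mentioned in the citations can be written in its standard textbook form for two groups, as sketched here. The symbols are assumptions introduced for illustration: SSR_P is the sum of squared residuals from one pooled model, SSR_1 and SSR_2 come from fitting the same model separately to each business unit, k is the number of explanatory factors (excluding the intercept), and n is the total number of employees. This form assumes the error variance is the same in both groups.

```latex
F = \frac{\left[\mathrm{SSR}_P - (\mathrm{SSR}_1 + \mathrm{SSR}_2)\right] / (k + 1)}
         {(\mathrm{SSR}_1 + \mathrm{SSR}_2) / \left[\,n - 2(k + 1)\,\right]}
```

A large value of F suggests that the factors affect pay differently in the two groups, so a single pooled model may not be appropriate.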

The results of a regression model can only tell you the average relationship between a given factor and pay. It is important to understand that averages may obscure underlying variation across employees being studied such as differences in the value of advanced degrees for specific types of jobs or additional years of experience for more senior positions.

The next question to ask is: what are the factors that should be accounted for in this regression analysis? That is, what are the business-related determinants of pay for the included employees? The specific determinants of pay will depend on the firm itself as well as the type of job.^{[5]} For example, firms may pay a premium for employees with certain certifications, which would need to be accounted for in a regression model.

The factors included within a regression model are key, as unexplained differences in pay may be incorrectly attributed to the characteristic you are studying (e.g., gender, age, race or ethnicity) if relevant factors are omitted. This is particularly problematic if the factor omitted is correlated with the characteristic you are studying within the included employees. To illustrate this issue, assume that you are studying a group of ten employees with the same job title. This group includes five women and five men who have a range of salaries, shown in the example below. This chart shows each employee’s salary ordered from lowest to highest, with female employees shown in blue and male employees in green. In this hypothetical example, the female employees have lower salaries, on average, than their male counterparts.

Only reviewing the average pay of employees within the same job title may miss key differences across these employees. The chart below looks at these same employees, but now groups them by highest degree attained, which shows that employees with bachelor’s degrees have lower salaries than employees with master’s degrees. In fact, once this difference is accounted for, we see that male and female employees with the same type of degree have similar salaries.

Like this simple comparison of average salaries across employees, if a regression model omits key factors, it too may find an average “effect” of gender on pay even where no difference exists. This is why it is important to identify the determinants of pay before undertaking a pay equity analysis.
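The ten-employee illustration can be sketched in a few lines of Python. The salaries below are hypothetical numbers chosen to reproduce the pattern described, not the values behind the original charts: a gap appears in the overall averages but disappears once employees are compared within the same degree.

```python
from statistics import mean

# Hypothetical data: (gender, highest degree, salary in $ thousands)
employees = [
    ("F", "bachelor", 60), ("F", "bachelor", 62), ("F", "bachelor", 64),
    ("F", "bachelor", 66), ("F", "master", 80),
    ("M", "bachelor", 63), ("M", "master", 78), ("M", "master", 79),
    ("M", "master", 81), ("M", "master", 82),
]

def gap(rows):
    """Average female salary minus average male salary."""
    women = [salary for gender, _, salary in rows if gender == "F"]
    men = [salary for gender, _, salary in rows if gender == "M"]
    return mean(women) - mean(men)

# Comparison that ignores education
raw_gap = gap(employees)

# Same comparison, made separately within each degree
gap_by_degree = {
    degree: gap([r for r in employees if r[1] == degree])
    for degree in ("bachelor", "master")
}

print(f"raw gap: {raw_gap:.1f}")
for degree, value in gap_by_degree.items():
    print(f"gap among {degree} holders: {value:.1f}")
```

In this made-up data, most of the women hold bachelor’s degrees and most of the men hold master’s degrees, so the omitted factor (degree) is correlated with gender. That correlation is what produces the apparent gap in the overall averages.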

**What are some limitations of regression analysis?**

In interpreting the results of a regression model, it is important to consider potential limitations of the analysis. Often, robust data that includes all of the relevant factors that determine pay is not available to be analyzed. For a pay equity study, many of the relevant data points are tracked by firms themselves within their employment and payroll database systems, but some information such as an employee’s prior experience (and the relevance of that experience to their current job) or performance metrics may not be consistently tracked.^{[6]}

If data is not available for a specific factor, there may be other available data points that can be used as “proxies” for the missing information. For example, age at hire can be used as a proxy for prior experience as these two measures are generally correlated (i.e., employees who are hired at older ages are more likely to have more prior experience). However, using age at hire would not account for the type of prior experience, the relevance of that experience to the employee’s current role, or any time spent out of the workforce.
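As a small sketch of how such a proxy might be built, the function below computes age at hire in whole years from a birth date and a hire date. The function name and the two inputs are hypothetical; actual HR systems may store these values under different names or formats.

```python
from datetime import date

def age_at_hire(birth_date: date, hire_date: date) -> int:
    """Age in whole years on the hire date."""
    # Has the employee already had a birthday in the hire year?
    had_birthday = (hire_date.month, hire_date.day) >= (birth_date.month, birth_date.day)
    return hire_date.year - birth_date.year - (0 if had_birthday else 1)

# Hypothetical employee born 15 June 1985 and hired 1 March 2015
print(age_at_hire(date(1985, 6, 15), date(2015, 3, 1)))
```

The resulting value could then be included as a factor in the regression, subject to the caveats noted above.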

It is important to consider what factors not included within the model may be key determinants of pay. A regression may find a significant relationship between pay and the included factors even if pay varies widely among employees in the same job and demographic group with similar characteristics. In this case, the regression model may be missing key elements that explain these differences in pay, but because these factors are omitted, the differences are included in the unexplained “error term.”

It is also important to remember that regression models are useful tools to determine average relationships. However, an average finding does not necessarily indicate that the same is true across different groups included in the analysis, e.g., different business units or different types of jobs.

In undertaking a regression analysis, economists may also perform “sensitivity testing,” where, for example, the same model is applied to different groups of employees or a different set of factors is included. Finding different results across different groups (or using a different set of relevant factors) indicates that the average relationship the regression found may not be applicable across employees. For example, if the results change when additional factors are included, this may indicate that the original model omitted relevant factors.
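One simple form of sensitivity testing can be sketched as follows: fit the same regression under two sets of factors and compare the estimated gender coefficient. The data, variable names, and specification labels are hypothetical, and the small least-squares helper stands in for what a statistical package would normally provide.

```python
def fit(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    m = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]

# Hypothetical data: pay in $ thousands, indicators for gender and degree
employees = [
    {"female": 1, "master": 0, "pay": 60}, {"female": 1, "master": 0, "pay": 62},
    {"female": 1, "master": 0, "pay": 64}, {"female": 1, "master": 0, "pay": 66},
    {"female": 1, "master": 1, "pay": 80}, {"female": 0, "master": 0, "pay": 63},
    {"female": 0, "master": 1, "pay": 78}, {"female": 0, "master": 1, "pay": 79},
    {"female": 0, "master": 1, "pay": 81}, {"female": 0, "master": 1, "pay": 82},
]

# Two alternative specifications; "female" is listed first in both
specs = {
    "gender only": ["female"],
    "gender + degree": ["female", "master"],
}

results = {}
for name, columns in specs.items():
    X = [[1.0] + [e[c] for c in columns] for e in employees]  # 1.0 = intercept
    beta = fit(X, [e["pay"] for e in employees])
    results[name] = beta[1]  # coefficient on the female indicator
    print(f"{name}: gender coefficient = {results[name]:.2f}")
```

In this made-up data the gender coefficient is clearly negative when degree is left out and essentially zero once it is included, which is the kind of change that would suggest the first specification omitted a relevant factor.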

Ultimately, the reliability of the results of a regression model depends on the specification and the available data to be tested. It is important to assess these factors when interpreting the results of a regression analysis.

### CITATIONS

[1] https://www.shrm.org/topics-tools/news/inclusion-equity-diversity/research-pay-equity-audits.

[2] The same would apply if the pay equity analysis were studying some other characteristic, e.g., employees’ age. In that case, the estimated “effect” for a variable indicating employees 40 or older would estimate the difference, on average, in pay between employees under 40 compared to those 40 and over.

[3] Generally, results are deemed to be “statistically significant” if the identified result would occur by chance only 5 percent or less of the time. Statistical significance may also be described in terms of “standard deviations,” with approximately two standard deviations or higher indicating statistically significant results.

[4] Certain types of statistical tests, such as a Chow test, can be used to determine whether it is appropriate to include multiple groups in one model. See, for example, Jeffrey M. Wooldridge, *Introductory Econometrics*, 5th ed. South-Western Cengage Learning, 2013, pp. 245–248.

[5] For example, the California Equal Pay Act lists education, training, or experience as examples of “bona fide” factors that may explain differences in pay between employees performing “substantially similar work,” see https://www.dir.ca.gov/dlse/california_equal_pay_act.htm.

[6] Even if prior experience is tracked within firms’ HR data systems, one would need to compare experience across many different prior firms, some of which may be more relevant for an employee’s current job. Simply looking at the number of years of prior experience would not account for these differences.
