Using Stata for a memory-saving fixed-effects estimation of the three-way error-components model
Researchers trying to estimate tens or hundreds of thousands of fixed effects for two or more groups (workers and firms; pupils, teachers and schools; etc.) in datasets with large numbers of observations are often limited by the amount of computer memory available. Such a model is commonly estimated by sweeping out one of the effects with the fixed-effects transformation (time-demeaning) and including the remaining effects as dummy variables. If K is the number of fixed effects to be included as dummy variables and N is the number of observations, then the design matrix is of dimension N x K (neglecting any remaining right-hand-side regressors). The time-demeaned dummies have to be stored as “double” variables, which consume 8 bytes per cell in Stata. For example, with 2 million observations (N) and 10 thousand fixed effects (K), the memory requirement would be 160 gigabytes. This paper describes how the memory requirement can be reduced to that of storing only a K x K matrix, which in the given example brings it below 1 gigabyte. The paper also describes the Stata program felsdvreg.ado, which implements the method in Mata. In addition to the memory-saving estimation method, the program checks whether the effects are identified and provides useful summary statistics.
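The memory figures above, and the idea of holding only a K x K matrix, can be illustrated with a short sketch. It is written in Python for illustration only and is not the code of felsdvreg.ado. The function names and the assumption that firm identifiers are integers from 0 to K-1 are hypothetical choices made for this example. The sketch relies on the fact that, for one person with T observations and firm counts c, the cross product of the person-demeaned firm dummies equals diag(c) - c c'/T, so the K x K matrix can be accumulated person by person without ever forming the N x K design matrix. Whether felsdvreg builds its matrix in exactly this way is not stated here.

```python
from collections import Counter, defaultdict

BYTES_PER_CELL = 8  # assumption: every cell is stored in 8 bytes, as in the text


def dense_design_bytes(n_obs, n_effects):
    """Bytes needed to hold the full N x K matrix of demeaned dummies."""
    return n_obs * n_effects * BYTES_PER_CELL


def cross_product_bytes(n_effects):
    """Bytes needed to hold only the K x K cross-product matrix."""
    return n_effects * n_effects * BYTES_PER_CELL


def demeaned_dummy_cross_product(person_ids, firm_ids, n_firms):
    """Accumulate D'D, where D holds the person-demeaned firm dummies.

    Hypothetical helper: firm_ids are assumed to be integers 0..n_firms-1.
    Only a K x K list of lists is held; the N x K matrix D is never built.
    """
    # Count how often each person is observed in each firm.
    counts_by_person = defaultdict(Counter)
    for person, firm in zip(person_ids, firm_ids):
        counts_by_person[person][firm] += 1

    a = [[0.0] * n_firms for _ in range(n_firms)]
    for counts in counts_by_person.values():
        t_i = sum(counts.values())  # number of observations of this person
        for j, c_j in counts.items():
            a[j][j] += c_j  # diag(c) term
            for k, c_k in counts.items():
                a[j][k] -= c_j * c_k / t_i  # minus c c' / T term
    return a


if __name__ == "__main__":
    print(dense_design_bytes(2_000_000, 10_000) / 1e9, "GB for the N x K matrix")
    print(cross_product_bytes(10_000) / 1e9, "GB for the K x K matrix")
```

With the figures from the text, the first function gives 160 gigabytes and the second 0.8 gigabytes. A person observed in only one firm contributes nothing to the accumulated matrix in this sketch, since the demeaned dummies for that person are all zero.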