Asymptotic results are presented in Section 3. The asymptotic framework is motivated by the application to macroeconomic forecasting. Because the number of time series (N) far exceeds the number of observation dates (T), N and T are modeled as tending to infinity, but with T/N → 0. Because macroeconomic theory does not clearly suggest finitely many factors, the number of factors (r) is treated as tending to infinity, but much more slowly than T. Because r is not known, the number of estimated factors (k) is not assumed to equal the number of true factors. In this framework, it is shown that, if k>r, the estimated factors are uniformly consistent (they span the space of the true factors, uniformly in the time index). Given this result and some additional conditions, it is then shown that, if k>r, an information criterion will consistently estimate the number of factors entering the forecasting equation for the variable of interest, and the resulting forecasts are asymptotically as efficient as if the true factors were observed. These theoretical predictions are examined in, and supported by, a Monte Carlo experiment reported in Section 4.
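The two steps described above (estimate k factors, then let an information criterion choose how many enter the forecasting equation) can be illustrated with a minimal sketch. It rests on assumptions not stated in this passage: the factors are estimated by principal components of the standardized panel, the criterion is a BIC, and the regression is contemporaneous rather than h steps ahead. The function names `estimate_factors` and `select_num_factors_bic` and all simulation settings are hypothetical.

```python
import numpy as np


def estimate_factors(X, k):
    """Estimate k factors from a T x N panel X by principal components."""
    # Standardize each series so no single series dominates the components.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Left singular vectors of the T x N panel span the estimated factor space.
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    T = Z.shape[0]
    return np.sqrt(T) * U[:, :k]  # scaled so that F'F / T = I_k


def select_num_factors_bic(y, F):
    """Pick how many leading estimated factors enter the regression of y on F."""
    T, k = F.shape
    best_q, best_bic = 1, np.inf
    for q in range(1, k + 1):
        W = np.column_stack([np.ones(T), F[:, :q]])
        beta, *_ = np.linalg.lstsq(W, y, rcond=None)
        resid = y - W @ beta
        # BIC: log residual variance plus a penalty on the parameter count.
        bic = np.log(resid @ resid / T) + (q + 1) * np.log(T) / T
        if bic < best_bic:
            best_q, best_bic = q, bic
    return best_q


# Simulated panel with N > T and r = 2 true factors; k = 4 factors are estimated.
rng = np.random.default_rng(0)
T, N, r, k = 100, 200, 2, 4
F_true = rng.standard_normal((T, r))
loadings = rng.standard_normal((N, r))
X = F_true @ loadings.T + rng.standard_normal((T, N))
y = F_true @ np.array([1.0, -0.5]) + 0.5 * rng.standard_normal(T)

F_hat = estimate_factors(X, k)
q_hat = select_num_factors_bic(y, F_hat)
print("factors selected for the forecasting equation:", q_hat)
```

The estimated factors are identified only up to a rotation, so the useful check is whether they span the true factor space (for example, the R² from regressing each true factor on the estimated ones), not whether individual columns match.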

In Section 5, these methods are used to produce monthly forecasts of the twelve-month growth of industrial production (IP) and the twelve-month growth of the consumer price index (CPI) in the United States. The full data set spans 1959:1-1997:9. Factors are extracted and forecasts are made for a balanced panel of 170 time series and an unbalanced panel of 224 time series. These diffusion index forecasts perform well in a simulated real-time forecasting comparison with several state-of-the-art benchmark multivariate models.
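A simulated real-time comparison of this kind can be sketched as a recursive loop: at each forecast origin, the factors and the forecasting regression are re-estimated using only data dated at or before that origin. The sketch below is illustrative and makes several assumptions not taken from this passage: principal-components factors, a fixed number of factors, simulated data, and a recursive-mean benchmark standing in for the multivariate benchmark models. The names `pc_factors` and `recursive_forecasts` are hypothetical.

```python
import numpy as np


def pc_factors(X, k):
    """First k principal-component factors of a T x N panel (assumed estimator)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return np.sqrt(Z.shape[0]) * U[:, :k]


def recursive_forecasts(X, target, h, k, first_origin):
    """Simulated real-time forecasts; target[s] is realized at date s + h."""
    T = X.shape[0]
    origins = np.arange(first_origin, T - h)
    preds = np.empty(len(origins))
    for i, t in enumerate(origins):
        # Re-estimate the factors using only data available at date t.
        F = pc_factors(X[: t + 1], k)
        W = np.column_stack([np.ones(t + 1), F])
        # Fit on pairs (F_s, target[s]) whose outcome is known by date t.
        n_fit = t - h + 1
        beta, *_ = np.linalg.lstsq(W[:n_fit], target[:n_fit], rcond=None)
        preds[i] = W[t] @ beta
    return origins, preds


# Simulated monthly data: persistent factors drive a log-level series.
rng = np.random.default_rng(0)
T, N, r, h = 240, 100, 2, 12
f = np.zeros((T, r))
for t in range(1, T):
    f[t] = 0.9 * f[t - 1] + rng.standard_normal(r)
X = f @ rng.standard_normal((N, r)).T + rng.standard_normal((T, N))
level = np.cumsum(0.1 * f[:, 0] + 0.05 * rng.standard_normal(T))
target = np.full(T, np.nan)
target[: T - h] = level[h:] - level[:-h]  # h-period growth from s to s + h

origins, preds = recursive_forecasts(X, target, h, k=2, first_origin=120)
errors = target[origins] - preds
# Stand-in benchmark: mean of the outcomes already realized at each origin.
bench = np.array([target[: t - h + 1].mean() for t in origins])
rel_mse = np.mean(errors ** 2) / np.mean((target[origins] - bench) ** 2)
print("MSE relative to recursive-mean benchmark:", rel_mse)
```

The key design point is the regression sample: with a twelve-month horizon, the most recent outcome usable at origin t is the one dated t - h, so the last h factor observations enter only through the forecast itself.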

This research is related to two bodies of literature. The first is a relatively small literature in which dynamic factor models have been applied to macroeconomic data. Geweke (1977) and Sargent and Sims (1977) analyzed these models in the frequency domain for a small number of variables. Engle and Watson (1981), Sargent (1989), and Stock and Watson (1991) estimated small-N parametric time-domain dynamic factor models by maximum likelihood. Quah and Sargent (1993) used the EM algorithm to extend this approach to a moderate number of series (N = 60).