Recent advances in information technology now make it possible to access in real time, at a reasonable cost, literally thousands of economic time series for major developed economies.
This raises the prospect of a new frontier in macroeconomic forecasting, in which a very large number of time series are used to forecast a few key economic quantities such as output or inflation. Time series models currently used for macroeconomic forecasting, however, incorporate only a handful of series: vector autoregressions, for example, typically contain a half-dozen to one dozen variables, rarely more. Although thousands of time series are available in real time, a theoretical framework for using these data for time series forecasting remains undeveloped.

This paper addresses the problem of forecasting a single time series using a very large number of predictors, potentially many more predictors than dates at which the time series are observed. Our approach is motivated by the diffusion indexes developed by business cycle analysts at the National Bureau of Economic Research (NBER). These indexes are averages of contemporaneous values of a large number of time series; a classic use of a diffusion index is to measure whether a recession or expansion is widespread throughout the economy. Because it is an average of many variables, a diffusion index summarizes the information in a large number of economic time series. In constructing diffusion indexes, NBER business cycle analysts exercised expert judgment to identify the series and the weight placed on each series in the index.
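To make the averaging idea concrete, the following is a minimal illustrative sketch rather than the procedure used by NBER analysts or in this paper. It assumes simulated data driven by a single common factor, equal weights in place of expert-chosen weights, and a hypothetical helper named `diffusion_index`; the sample sizes, loadings, and noise level are arbitrary choices for illustration.

```python
import numpy as np

def diffusion_index(X, weights=None):
    """Weighted cross-sectional average of standardized series.

    X is a (T, N) array: T dates in rows, N series in columns.
    Equal weights are an assumption; analysts would choose them by judgment.
    Assumes no column of X is constant (nonzero standard deviation).
    """
    X = np.asarray(X, dtype=float)
    # Standardize each series so that no single series dominates the average.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    if weights is None:
        weights = np.full(X.shape[1], 1.0 / X.shape[1])
    return Z @ weights

# Simulated panel with more series than dates (N > T), as in the setting above.
rng = np.random.default_rng(0)
T, N = 40, 200
factor = rng.standard_normal(T)              # unobserved common factor
loadings = rng.uniform(0.5, 1.5, size=N)     # hypothetical positive loadings
X = np.outer(factor, loadings) + rng.standard_normal((T, N))

index = diffusion_index(X)
corr = np.corrcoef(index, factor)[0, 1]      # how closely the index tracks the factor
```

In this simulated setting the idiosyncratic noise largely averages out across series, so the index moves closely with the common factor even though the number of series exceeds the number of dates. That result depends on the assumed positive loadings; with loadings of mixed sign, a simple equal-weighted average would not recover the factor as well.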
Section 2 provides a probability model in which diffusion indexes are interpreted as estimates of the unobserved factors in a dynamic factor model, and discusses the estimation of these factors. This dynamic factor model has several important features.