Prediction Revisited. Mark P. Kritzman

Title: Prediction Revisited

Author: Mark P. Kritzman

Publisher: John Wiley & Sons Limited

Genre: Securities, investing

isbn: 9781119895596

and alignment of their many features, but also the typical variation and covariation of those features across a broader sample. We applied the method first to compare periods in time, each characterized by its economic circumstances or the returns of financial assets, and this led to other uses. We were impressed by the method's potential to tackle familiar problems in new ways, often leading to new paths of understanding. This eventually led to our own discovery that the prediction from a linear regression equation can be equivalently expressed as a weighted average of the values of past outcomes, in which the weights are the sum of two Mahalanobis distances: one that measures unusualness and the other similarity. Although we understood intuitively why unusual observations are more informative than common ones, it was not until we connected our research to information theory that we fully appreciated the nuances of the inverse relationship of information and probability.
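The equivalence between a regression prediction and a weighted average of past outcomes can be checked numerically. The sketch below is one way to do so, under stated assumptions: the data are made up, there are only two attributes so the covariance matrix can be inverted by hand, and the names `similarity`, `unusualness`, and `observation_weights` as well as the 1/2 scaling are illustrative choices that this passage does not spell out. It is not presented as the authors' exact formulation.

```python
# Sketch: a linear regression prediction rewritten as a weighted average of
# past outcomes. Data, names, and the 1/2 scaling are illustrative assumptions.

def mean(values):
    return sum(values) / len(values)

def inv_cov_2x2(X):
    """Inverse of the 2x2 sample covariance matrix (N - 1 divisor)."""
    n = len(X)
    m0, m1 = mean([r[0] for r in X]), mean([r[1] for r in X])
    s00 = sum((r[0] - m0) ** 2 for r in X) / (n - 1)
    s11 = sum((r[1] - m1) ** 2 for r in X) / (n - 1)
    s01 = sum((r[0] - m0) * (r[1] - m1) for r in X) / (n - 1)
    det = s00 * s11 - s01 ** 2
    return [[s11 / det, -s01 / det], [-s01 / det, s00 / det]]

def mahalanobis_sq(a, b, P):
    """Squared Mahalanobis distance between a and b, given inverse covariance P."""
    d0, d1 = a[0] - b[0], a[1] - b[1]
    return d0 * (P[0][0] * d0 + P[0][1] * d1) + d1 * (P[1][0] * d0 + P[1][1] * d1)

def observation_weights(X, x_now):
    """One weight per past observation: 1/N plus a scaled sum of two distance terms."""
    n = len(X)
    P = inv_cov_2x2(X)
    center = (mean([r[0] for r in X]), mean([r[1] for r in X]))
    info_now = mahalanobis_sq(x_now, center, P)
    weights = []
    for x_i in X:
        # Closer to current circumstances -> less negative similarity.
        similarity = -0.5 * mahalanobis_sq(x_i, x_now, P)
        # Farther from the sample average -> more unusual.
        unusualness = 0.5 * (mahalanobis_sq(x_i, center, P) + info_now)
        weights.append(1 / n + (similarity + unusualness) / (n - 1))
    return weights

def ols_predict(X, y, x_now):
    """Ordinary least squares with an intercept, via centered normal equations."""
    n = len(X)
    P = inv_cov_2x2(X)
    m0, m1, my = mean([r[0] for r in X]), mean([r[1] for r in X]), mean(y)
    c0 = sum((r[0] - m0) * (yi - my) for r, yi in zip(X, y)) / (n - 1)
    c1 = sum((r[1] - m1) * (yi - my) for r, yi in zip(X, y)) / (n - 1)
    b0 = P[0][0] * c0 + P[0][1] * c1
    b1 = P[1][0] * c0 + P[1][1] * c1
    return my + b0 * (x_now[0] - m0) + b1 * (x_now[1] - m1)

# Hypothetical sample: six past observations with two attributes each.
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0), (6.0, 4.0)]
y = [1.5, 1.0, 3.5, 3.0, 5.5, 4.0]
x_now = (3.5, 5.0)

weights = observation_weights(X, x_now)
weighted_prediction = sum(w * yi for w, yi in zip(weights, y))
regression_prediction = ols_predict(X, y, x_now)
print(weighted_prediction, regression_prediction)
```

The two printed values should agree up to floating-point error, and the weights sum to one. Individual weights can be negative, so this is a weighted average only in the algebraic sense.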

      Our focus on observations led us to the insight that we can just as well analyze data samples as collections of pairs rather than distributions of observations around their average. This insight enabled us to view variance, correlation, and R-squared through a new lens, which shed light on statistical notions that are commonly accepted but not so well understood. It clarified, for example, why we must divide by N – 1 instead of N to compute a sample variance. It gave us more insight into the bias of R-squared and suggested a new way to address this bias. And it showed why we square distances in so many statistical calculations. (It is not merely because unsquared deviations from the mean sum to zero.)
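A small numerical check illustrates the pairs view of variance. It relies on a standard identity: the sample variance equals the average, over all distinct pairs of observations, of half the squared difference between the two members of each pair. The data values below are made up, and the sketch is offered only as an illustration of that identity, not necessarily as the argument the authors develop later in the book.

```python
from itertools import combinations
from statistics import variance

# Hypothetical sample of five observations.
data = [2.0, 4.0, 4.0, 7.0, 9.0]
n = len(data)

# Classical view: squared deviations from the mean, divided by N - 1.
classical = variance(data)

# Pairs view: average of half the squared difference over all distinct pairs.
# No mean is computed, and no observation is paired with itself.
pairs = list(combinations(data, 2))
pairwise = sum(0.5 * (a - b) ** 2 for a, b in pairs) / len(pairs)

print(classical, pairwise)
print(len(pairs), n * (n - 1) // 2)
```

Because an observation is never paired with itself, there are N(N - 1)/2 pairs rather than N²/2, which is where the N - 1 divisor comes from in this view.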

      But our purpose goes beyond illuminating vague notions of statistics, although we hope that we do this to some extent. Our larger mission is to enable researchers to deploy data more effectively in their prediction models. It is this quest that led us down a different path from the one selected by the founders of classical statistics. Their purpose was to understand the movement of heavenly bodies or games of chance, which obey relatively simple laws of nature. Today's most pressing challenges deal with esoteric social phenomena, which obey a different and more complex set of rules.

      The emergent approach for dealing with this complexity is the field of machine learning, but more powerful algorithms introduce complexities of their own. By reorienting data-driven prediction to focus on observation, we offer a more transparent and intuitive approach to complexity. We propose a simple framework for identifying asymmetries in data and weighting the data accordingly. In some cases, traditional linear regression analysis gives sufficient guidance about the future. In other cases, only sophisticated machine learning algorithms offer any hope of dealing with a system's complexity. However, in many instances the methods described in this book offer the ideal blend of transparency and sophistication for deploying data to guide us into the future.

      Practitioners have difficult problems to solve and often too little time. Those on the front lines may struggle to absorb everything that technical training has to offer. And there are bound to be many useful ideas, often published in academic articles and books, that are widely available yet seldom used, perhaps because they are new, complex, or just hard to find.

      Most of the ideas we present in this book are new to us, meaning that we have never encountered them in school courses or publications. Nor are we aware of their application in practice, even though investors clearly thrive on the quality of their predictions. But we are not so much concerned with precedence as we are with gaining and sharing a better understanding of the process of data-driven prediction. We would, therefore, be pleased to learn of others who have already come to the insights we present here, especially if they have advanced them further than we have.

      We rely on experience to shape our view of the unknown, with the notable exception of religion. But for most practical purposes we lean on experience to guide us through an uncertain world. We process experiences both naturally and statistically; however, the way we naturally process experiences often diverges from the methods that classical statistics prescribes. Our purpose in writing this book is to reorient common statistical thinking to accord with our natural instincts.

      Let us first consider how we naturally process experience. We record experiences as narratives, and we store these narratives in our memory or in written form. Then when we are called upon to decide under uncertainty, we recall past experiences that resemble present circumstances, and we predict that what will happen now will be like what happened following similar past experiences. Moreover, we instinctively focus more on past experiences that were exceptional rather than ordinary because they reside more prominently in our memory.

      Natural Process

       Records experiences as narratives.

       Focuses on experiences that are like current circumstances.

       Focuses on experiences that are unusual.

      Classical Statistics

       Records experiences as data.

       Includes observations irrespective of their similarity to current circumstances.

       Treats unusual observations with skepticism.

      The advantage of the natural process is that it is intuitive and sensible. The advantage of classical statistics is that by recording experiences as data we can analyze experiences more rigorously and efficiently than would be allowed by narratives. Our purpose is to reconcile classical statistics with our natural process in a way that secures the advantages of both approaches.

      We accomplish this reconciliation by shifting the focus of prediction away from the selection of variables to the selection of observations. As part of this shift in focus from variables to observations, we discard the term variable. Instead, we use the word attribute to refer to an independent variable (something we use to predict) and the word outcome to refer to a dependent variable (something we want to predict). Our purpose is to induce you to think foremost of experiences, which we refer to as observations, and less so of the attributes and outcomes we use to measure those experiences. This shift in focus from variables to observations does not mean we undervalue the importance of choosing the right variables. We accept its importance. We contend, however, that the choice of variables has commanded disproportionately more attention than the choice of observations. We hope to show that by choosing observations as carefully as we choose variables, we can use data to greater effect.