"Data snooping occurs when a given set of data is used more than once for purposes of inference or model selection. When such data reuse occurs, there is always the possibility that any satisfactory results obtained may simply be due to chance rather than to any merit inherent in the method yielding the results. This problem is practically unavoidable in the analysis of time-series data, as typically only a single history measuring a given phenomenon of interest is available for analysis. It is widely acknowledged by empirical researchers that data snooping is a dangerous practice to be avoided, but in fact it is endemic. The main problem has been a lack of sufficiently simple practical methods capable of assessing the potential dangers of data snooping in a given situation. Our purpose here is to provide such methods by specifying a straightforward procedure for testing the null hypothesis that the best model encountered in a specification search has no predictive superiority over a given benchmark model. This permits data snooping to be undertaken with some degree of confidence that one will not mistake results that could have been generated by chance for genuinely good results."
"In this paper we utilize White's Reality Check bootstrap methodology (White (1999)) to evaluate simple technical trading rules while quantifying the data-snooping bias and fully adjusting for its effect in the context of the full universe from which the trading rules were drawn. Hence, for the first time, the paper presents a comprehensive test of performance across all technical trading rules examined. We consider the study of Brock, Lakonishok, and LeBaron (1992), expand their universe of 26 trading rules, apply the rules to 100 years of daily data on the Dow Jones Industrial Average, and determine the effects of data-snooping."
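The Reality Check procedure these abstracts describe can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the per-period performance differentials (each rule's return minus the benchmark's) are already computed, the function name `reality_check_pvalue` and the expected block length are invented for the example, and the Politis-Romano stationary bootstrap here is a bare-bones version of the resampling scheme White's paper uses.

```python
import numpy as np

def reality_check_pvalue(diffs, n_boot=1000, mean_block=10, seed=0):
    """Sketch of White's Reality Check p-value for the null that the
    best of K rules has no predictive superiority over the benchmark.

    diffs: (n, K) array of per-period performance differentials
           (rule return minus benchmark return), one column per rule.
    Uses a stationary bootstrap with expected block length mean_block.
    """
    rng = np.random.default_rng(seed)
    n, K = diffs.shape
    fbar = diffs.mean(axis=0)          # mean differential for each rule
    v = np.sqrt(n) * fbar.max()        # observed max statistic over all K rules
    p_new = 1.0 / mean_block           # prob. of starting a fresh block
    count = 0
    for _ in range(n_boot):
        # build one stationary-bootstrap index path over the sample
        idx = np.empty(n, dtype=int)
        idx[0] = rng.integers(n)
        for t in range(1, n):
            if rng.random() < p_new:
                idx[t] = rng.integers(n)          # jump to a random start
            else:
                idx[t] = (idx[t - 1] + 1) % n     # continue the block (wrap)
        fbar_star = diffs[idx].mean(axis=0)
        # recenter at the sample means, then take the max across rules
        v_star = np.sqrt(n) * (fbar_star - fbar).max()
        if v_star >= v:
            count += 1
    return count / n_boot
```

The point of the max-over-rules statistic is that the p-value accounts for the full universe of rules searched: a single lucky rule among many noise rules will no longer look significant, while a rule with a genuine edge still will.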
"Tests of financial asset pricing models may yield misleading inferences when properties of the data are used to construct the test statistics. In particular, such tests are often based on returns to portfolios of common stock, where portfolios are constructed by sorting on some empirically motivated characteristic of the securities such as market value of equity. Analytical calculations, Monte Carlo simulations, and two empirical examples show that the effects of this type of data snooping can be substantial."
"Economics is primarily a non-experimental science. Typically, we cannot generate new data sets on which to test hypotheses independently of the data that may have led to a particular theory. The common practice of using the same data set to formulate and test hypotheses introduces data-snooping biases that, if not accounted for, invalidate the assumptions underlying classical statistical inference. A striking example of a data-driven discovery is the presence of calendar effects in stock returns. There appears to be very substantial evidence of systematic abnormal stock returns related to the day of the week, the week of the month, the month of the year, the turn of the month, holidays, and so forth. However, this evidence has largely been considered without accounting for the intensive search preceding it. In this paper we use 100 years of daily data and a new bootstrap procedure that allows us to explicitly measure the distortions in statistical inference induced by data-snooping. We find that although nominal P-values of individual calendar rules are extremely significant, once evaluated in the context of the full universe from which such rules were drawn, calendar effects no longer remain significant."
"A real-time investor is one who must base his portfolio decisions solely on information available today, not using information from the future. Academic predictability papers almost always violate this principle via exogenous specification of critical portfolio formation parameters used in the backtesting of investment strategies. We show that when the choice of parameters such as predictive variables, traded assets, and estimation periods is endogenized (thus making the tests more real-time), all evidence of predictability vanishes. However, an investor with the correct specific sets of priors on predictive variables, assets, and estimation periods will find evidence of predictability. But since no real theory exists to guide one on the choice of the correct priors, finding this predictability seems unlikely. Our results provide an explanation for the performance gap between mutual funds and the academic market predictability literature, and carry important implications for asset pricing models, cost-of-capital calculations, and portfolio management."
"Data-snooping arises when the properties of a data series influence the researcher's choice of model specification. When data have been snooped, tests undertaken using the same series are likely to be misleading. This study seeks to predict equity market volatility, using daily data on U.K. stock market returns over the period 1955–1989. We find that even apparently innocuous forms of data-snooping significantly enhance reported forecast quality, and that relatively sophisticated forecasting methods operated without data-snooping often perform worse than naive benchmarks. For predicting stock market volatility, we therefore recommend two alternative models, both of which are extremely simple."