Detecting cumulative watershed effects: the statistical power of pairing
Introduction
In the United States, hundreds of billions of dollars have been directed toward improving water quality, and there is a need to document whether these continuing expenditures are producing significant gains in water quality. Since most large point discharges are already regulated as individual sources, our most serious water quality problems are increasingly due to the cumulative effects of multiple point and nonpoint sources (USEPA, 1991; MacDonald, 2000). This means that monitoring efforts to evaluate change and assess the cost-effectiveness of pollution control efforts must increasingly be done at the watershed scale.
The detection of significant change is also a critical research issue (Reid, 1993). Much of the information on the effects of management activities on runoff and water quality has been derived from paired-watershed experiments. In the general design of these experiments, two similar catchments are monitored for a calibration period. A treatment is then imposed on one of the catchments while the other catchment is maintained as a control. Any change in the relationship between the treated and control catchments is considered a treatment effect (Wicht, 1967). To interpret the results of such studies we must have a means of determining when significant change has occurred, and to design such studies we must be able to estimate the duration of monitoring likely to be required to detect a given change. This information is also needed to guide the design of projects to monitor water quality (Ward et al., 1990; MacDonald et al., 1991).
The statistical analysis of paired-watershed studies dates back to the 1940s (Wilm, 1949; Kovner and Evans, 1954), and there is a growing body of literature on the detection of hydrologic change (see the review by Esterby, 1996). More recently, the US Environmental Protection Agency (USEPA) has published several papers and reports on the detectability of change associated with nonpoint source pollution from agricultural lands, forestry activities, and urban areas (USEPA, 1997a, 1997b, 1997c).
The procedures presented in these papers can be applied to detecting the effect of an imposed treatment at a single station, or the effect of treating one catchment in a paired-watershed design. The ability to detect a statistically significant change at a single station is a function of the sample sizes before and after treatment, the variability in each data series, the chosen level of significance, and the magnitude of the change induced by the treatment. In a paired-watershed design, the detectability of change also depends on the strength of the correlation between the variable of interest on the treated watershed and the corresponding variable on the control watershed. Hence the paired-watershed design represents a special case of adding an explanatory variable (i.e., the corresponding values of the variable of interest from the control basin). The same statistical model applies to a single station when an additional explanatory variable, such as precipitation, is correlated with the water quality variable of interest. Although the statistical model could be extended to include multiple explanatory variables, we shall confine our discussion to the benefits of adding a single explanatory variable.
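A minimal sketch of the paired-watershed model viewed as an added explanatory variable, assuming one common analysis of covariance form with a shared slope: treated = b0 + b1·control + δ·post + error, where `post` is 0 for calibration years and 1 for treatment years and δ is the treatment effect. The helper names (`fit_paired_model`, `solve_linear`) and the noise-free synthetic data are hypothetical illustrations, not the paper's own code or data; the fit is plain ordinary least squares via the normal equations.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def fit_paired_model(control: Sequence[float], treated: Sequence[float],
                     post: Sequence[int]) -> Tuple[float, float, float]:
    """Hypothetical helper: OLS fit of treated = b0 + b1*control + delta*post.

    post[i] is 0 for calibration years and 1 for treatment years.
    Returns (b0, b1, delta); delta is the estimated treatment effect.
    """
    rows = [[1.0, float(x), float(d)] for x, d in zip(control, post)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, treated)) for i in range(3)]
    b0, b1, delta = solve_linear(xtx, xty)
    return b0, b1, delta


if __name__ == "__main__":
    # Synthetic, noise-free example: treated = 2 + 0.5*control + 3*post
    control = [10.0, 14.0, 8.0, 12.0, 9.0, 15.0, 11.0, 13.0]
    post = [0, 0, 0, 0, 1, 1, 1, 1]
    treated = [2 + 0.5 * x + 3 * d for x, d in zip(control, post)]
    print(fit_paired_model(control, treated, post))  # approximately (2.0, 0.5, 3.0)
```

With noise-free data the fit recovers the generating coefficients; with real records, it is the residual scatter about the control-watershed relationship, rather than the raw year-to-year scatter, that governs how precisely δ is estimated.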
In designing studies to detect change, the statistical power for detecting change should be a primary consideration (Lettenmaier, 1976; Hirsch et al., 1982; Ward et al., 1990; MacDonald et al., 1991). Nevertheless, many studies still do not adequately consider power, which is defined as the probability that a statistical test will detect a change of a given magnitude. As a result, the sample sizes chosen may give a low probability of detecting change, and the issue of power in paired-watershed studies has not been fully recognized or explained. Previous studies also have not explicitly quantified the benefits that can accrue from using a paired-watershed design or including another explanatory variable in the analysis. Similarly, previous studies have not evaluated the cost of including an additional explanatory variable that has little or no correlation with the variable of interest.
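To make the definition of power concrete, here is a sketch for the simple step-change-in-means case under a simplifying assumption: σ_ε is treated as known, so the test statistic is exactly normal (a z-test). With σ_ε estimated from the data, as in practice, t-based power is somewhat lower for short records. The function name `power_two_sided_z` is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_sided_z(delta: float, sigma: float, n1: int, n2: int,
                      alpha: float = 0.05) -> float:
    """Power of a two-sided test for a step change `delta` in the mean,
    assuming independent normal errors with KNOWN standard deviation `sigma`.

    n1, n2: number of years before and after treatment.
    """
    se = sigma * (1.0 / n1 + 1.0 / n2) ** 0.5   # standard error of the difference in means
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = delta / se                            # standardized true change
    # Reject when |Z| > z_crit, where Z ~ N(shift, 1) under the alternative
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


if __name__ == "__main__":
    sigma, n = 1.0, 10
    se = sigma * (2.0 / n) ** 0.5
    change_at_half_width = _STD_NORMAL.inv_cdf(0.975) * se
    print(round(power_two_sided_z(change_at_half_width, sigma, n, n), 3))  # 0.5
    print(round(power_two_sided_z(0.0, sigma, n, n), 3))                   # 0.05
```

In this idealized case, a true change exactly equal to the confidence-interval half-width is detected only about half the time. Reaching 80% power requires a change of about (z₀.₉₇₅ + z₀.₈₀)·se ≈ 2.80·se, and since (2.80/1.96)² ≈ 2.04, that is roughly double the sample size needed to make the same change equal to the half-width.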
This paper addresses these limitations by: (1) presenting a power analysis for the statistical (analysis of covariance) model commonly used in paired-watershed studies; (2) determining when it is beneficial to include an explanatory variable, such as paired observations from a control watershed; and (3) quantifying the benefits of including an explanatory variable in terms of an improved ability to detect change. The results should provide a more realistic basis for managers and researchers to design their studies and water quality monitoring programs.
Physical situation
For the purpose of this paper, let us assume that we are interested in detecting the effect of some actions imposed at roughly the same time on a treatment watershed. Let us further assume that we have made, or can make, water quality observations for some period of time both before treatment (calibration period) and after treatment (treatment period). For convenience we shall use annual sediment load as the variable of interest. However, the approach is not limited to this variable or time
Concept of MDE
The minimum detectable effect, MDE (Bunte and MacDonald, 1999), or minimum detectable change, MDC (Spooner et al., 1987), is defined as the smallest change in the average value of a given water quality variable that would be considered statistically significant. This same concept was referred to as the smallest significant difference in some of the early work on paired-watershed studies (Wilm, 1949; Kovner and Evans, 1954). Both Spooner et al. (1987) and Bunte and MacDonald (1999) consider the MDE to be the change
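For a step change in means, and assuming (for this illustration) a common error variance in the calibration and treatment periods, the half-width definition restated in the summary leads to the familiar expression below. Here n₁ and n₂ are the numbers of years before and after treatment, s₁² and s₂² are the sample variances in each period, s_p is the pooled standard deviation, and t is the upper 1 − α/2 quantile of Student's t distribution:

$$\mathrm{MDE} = t_{1-\alpha/2,\,n_1+n_2-2}\; s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}, \qquad s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$

Because the MDE shrinks only with the square root of the record length, halving it requires roughly four times as many years of data.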
Confidence interval for difference in means
The simplest statistical model for representing observed values of annual sediment load for either the calibration or the treatment period is

$$y_i = \mu + \varepsilon_i$$

where $y_i$ is the observed annual sediment load for year $i$, $\mu$ is the long-term mean annual sediment load, and $\varepsilon_i$ is the random noise term, assumed here to be independent and normally distributed with mean zero and standard deviation $\sigma_\varepsilon$.
The assumption of independence will be violated when there is serial dependence that is not caused by the treatment
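A sketch of the pooled-variance t interval for the difference between treatment-period and calibration-period means under this model, assuming independent normal errors with equal variance in the two periods. To stay within the Python standard library, the Student t quantile is computed by Simpson integration of the density plus bisection. All function names and the five-year records are hypothetical, synthetic illustrations rather than data from the paper.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """Student t CDF via Simpson's rule on [0, |x|], using symmetry about 0."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    c = math.exp(log_c)

    def pdf(t: float) -> float:
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Upper quantile for 0.5 < p < 1 by bisection (assumes the answer is below 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def diff_in_means_ci(before: Sequence[float], after: Sequence[float],
                     alpha: float = 0.05) -> Tuple[float, float]:
    """Return (difference, half_width) for mean(after) - mean(before).

    Pooled-variance t interval; the half-width is the MDE under this model.
    """
    n1, n2 = len(before), len(after)
    df = n1 + n2 - 2
    s2_pooled = ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / df
    se = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
    half_width = t_quantile(1 - alpha / 2, df) * se
    return mean(after) - mean(before), half_width


if __name__ == "__main__":
    before = [3.0, 5.0, 4.0, 6.0, 2.0]  # hypothetical calibration-period loads
    after = [6.0, 8.0, 7.0, 9.0, 5.0]   # hypothetical treatment-period loads
    diff, hw = diff_in_means_ci(before, after)
    print(diff, round(hw, 3))           # 3.0 2.306
    print("significant" if abs(diff) > hw else "not significant")  # significant
```

The observed change is declared significant at the 0.05 level because it exceeds the half-width of the interval. Serial dependence, as noted above, would make this interval too narrow.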
Paired watershed approach
One of the biggest problems in detecting a change in water quality is the high variability caused by variability in precipitation or other causal processes. As discussed in Bunte and MacDonald (1999), the coefficient of variation (CV) of annual sediment loads on small undisturbed watersheds is typically close to 100%. As variability increases, so does the length of record needed to characterize the means and detect significant change.
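A rough sketch of how the required record length grows with the CV, assuming equal record lengths n before and after treatment, a known σ (z rather than t), and a change sized to equal the MDE. Writing σ = CV·μ and requiring z·CV·√(2/n) ≤ relative change gives n ≥ 2(z·CV/change)², so n grows with the square of the CV. The function `years_per_period` is hypothetical.

```python
import math
from statistics import NormalDist


def years_per_period(cv: float, rel_change: float, alpha: float = 0.05) -> int:
    """Smallest equal number of years n (before = after = n) for which a
    relative step change `rel_change` in the mean reaches the MDE.

    Large-sample sketch: sigma is treated as known, so the MDE is
    z_{1-alpha/2} * sigma * sqrt(2/n) with sigma = cv * mean.
    Dividing by the mean gives z * cv * sqrt(2/n) <= rel_change.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(2 * (z * cv / rel_change) ** 2)


if __name__ == "__main__":
    for cv in (0.5, 1.0):
        # years needed in each period to detect a 50% change in the mean
        print(cv, years_per_period(cv, 0.5))  # prints 0.5 8, then 1.0 31
```

Under these assumptions, with CV = 100% about 31 years per period would be needed before a 50% change in mean annual sediment load equals the MDE at the 0.05 level, while a CV of 50% cuts this to 8 years. Using the t distribution instead of z raises both figures slightly.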
An approach to account for and effectively reduce
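A rough way to see when pairing pays off, under stated approximations rather than the paper's exact derivation: the residual standard deviation after regressing on the control watershed is taken as σ·√(1 − ρ²), and one degree of freedom is spent on the slope, so the MDE of the covariance model relative to the means model is approximately t(df − 1)/t(df)·√(1 − ρ²), with df = n₁ + n₂ − 2. This ignores the extra variance term that appears when the control watershed's mean differs between periods, so it somewhat overstates the benefit of pairing. The function names are hypothetical, and the t quantile is computed with the standard library by Simpson integration and bisection.

```python
import math


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """Student t CDF via Simpson's rule on [0, |x|], using symmetry about 0."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    c = math.exp(log_c)

    def pdf(t: float) -> float:
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Upper quantile for 0.5 < p < 1 by bisection (assumes the answer is below 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mde_ratio(rho: float, n1: int, n2: int, alpha: float = 0.05) -> float:
    """Approximate MDE(covariance model) / MDE(means model).

    Residual SD of the covariance model is approximated as sigma*sqrt(1 - rho^2);
    one extra degree of freedom is lost to the slope.
    """
    df = n1 + n2 - 2
    p = 1 - alpha / 2
    return t_quantile(p, df - 1) / t_quantile(p, df) * math.sqrt(1 - rho ** 2)


def break_even_rho(n1: int, n2: int, alpha: float = 0.05) -> float:
    """Correlation below which the added variable enlarges the MDE (same approximation)."""
    df = n1 + n2 - 2
    p = 1 - alpha / 2
    return math.sqrt(1 - (t_quantile(p, df) / t_quantile(p, df - 1)) ** 2)


if __name__ == "__main__":
    print(round(break_even_rho(5, 5), 2))    # 0.22
    print(round(mde_ratio(0.9, 5, 5), 2))    # 0.45
```

With five years in each period, this approximation puts the break-even correlation near 0.22: below that, adding the control watershed slightly enlarges the MDE, whereas at ρ = 0.9 the MDE falls to about 45% of the means-model value.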
Summary and conclusions
The ability to detect a step change was explored for two different statistical models. The first model examined the MDE and the power to discern a step change in means. The second model included an additional explanatory variable, as is commonly done in paired-watershed experiments. For both models the MDE was defined as a change in the mean of the variable of interest that was just larger than the half-width of the calculated confidence interval for that variable. An extended power analysis
Acknowledgements
The work reported here was supported by the U.S. Environmental Protection Agency under Agreement X 825789-01-0, and this built on an earlier study supported by the National Council for Air and Stream Improvement.
References
- et al., 1993. History of forest hydrology. Journal of Hydrology.
- et al., 1972. Engineering Statistics.
- et al., 1971. Clear-cut logging and sediment production in the Oregon coast range. Water Resources Research.
- Bunte, K., MacDonald, L.H., 1999. Scale Considerations and the Detectability of Sedimentary Cumulative Watershed...
- Clausen, J.C., Spooner, J., 1993. Paired Watershed Study Design. EPA 841-F-93-009, US Environmental Protection Agency,...
- 1980. Practical Nonparametric Statistics.
- 1996. Review of methods for the detection and estimation of trends with emphasis on water quality applications. Hydrological Processes.
- 1987. Statistical Methods for Environmental Pollution Monitoring.
- Grabow, G.L., Spooner, J., Lombardo, L., Line, D.E., 1998. Detecting water quality changes before and after BMP...
- Grabow, G.L., Spooner, J., Lombardo, L., Line, D.E., 1999. Detecting water quality changes before and after BMP...
- Theory and Application of the Linear Model.
- Statistical Methods in Water Resources.
- Techniques of trend analysis for monthly water quality data. Water Resources Research.
- A method for determining the minimum duration of watershed experiments. Transactions American Geophysical Union.
- Detection of trends in water quality data from records with dependent observations. Water Resources Research.
- Considerations of scale in water quality monitoring and data analysis. Water Resources Bulletin.
Cited by (56)
Assessing the influence of urban greenness and green stormwater infrastructure on hydrology from satellite remote sensing
2022, Science of the Total Environment. Citation excerpt: Experimental designs that provide a robust accounting for watershed factors that contribute to hydrologic variability have the best chance to draw causal linkages between GSI and catchment-scale hydrologic changes (Jarden et al., 2016). Yet most studies have relied on limited data using a paired watersheds approach (e.g., Dietz and Clausen, 2008; Hager et al., 2013; Roy et al., 2008; Yang and Li, 2013), which can have severe limitations for detecting differences, especially when treatment levels are low and time extents are short (Loftis et al., 2001). As a result, direct empirical data that support watershed-scale system responses to GSI remain scant.
Power analysis for detecting the effects of best management practices on reducing nitrogen and phosphorus fluxes to the Chesapeake Bay Watershed, USA
2022, Ecological Indicators. Citation excerpt: In a recent analysis of agricultural drainages, Murphy et al. (2020) concluded through a simulation study that natural variability in event flow and concentration likely hampered power to detect the effects of BMPs, leading to inconsistent results across different drainages. Through a power analysis Loftis et al. (2001) determined that standard approaches for choosing sample sizes for paired watershed studies resulted in power levels less than 50%, and pre and post management action sample sizes would need to be doubled to obtain acceptable power levels. Studying the effects of nutrient reduction on water quality in small agricultural watersheds, Wellen et al. (2020) found it would likely take decades to centuries worth of data to detect the effects of a 20% reduction in nutrients.
An analysis of the sample size requirements for acceptable statistical power in water quality monitoring for improvement detection
2020, Ecological Indicators. Citation excerpt: Past work on statistical power and sample size requirements in the context of water quality has typically focused on trend analyses (e.g. Irvine et al., 2012). While previous research has emphasized the advantages of paired basin experiments (Loftis et al., 2001; Bishop et al., 2005; King et al., 2008), most watershed scale evaluations of conservation measures are performed on large basins, where replication is not possible (Tomer and Locke, 2011). A recent analysis by the Northeast-Midwest Institute suggested that a hypothesis test of monthly total phosphorus concentrations would require roughly 10 years of monitoring data to have confidence in the results of hypothesis tests (Betanzo et al., 2015).
Effect of contemporary forest harvesting practices on headwater stream temperatures: Initial response of the Hinkle Creek catchment, Pacific Northwest, USA
2013, Forest Ecology and Management. Citation excerpt: This adjustment facilitates intuitive interpretation of intercept data, and we normalize reference stream data in this way for maximum, mean, and minimum stream temperatures. We performed analysis of variance (ANOVA) using SAS software (version 9.1, SAS Corporation, Cary, NC) on slope and intercept parameters derived from reference-treatment regression plots, with year as a fixed effect and stream as a random effect (Meredith and Stehman, 1991; Loftis et al., 2001). We assessed the effect of treatment on each coefficient with a single degree of freedom contrast of average pre-treatment years versus the single post-treatment year.