Elsevier

Journal of Hydrology

Volume 251, Issues 1–2, 15 September 2001, Pages 49-64

Detecting cumulative watershed effects: the statistical power of pairing

https://doi.org/10.1016/S0022-1694(01)00431-0

Abstract

The statistical power for detecting change in water quality should be a primary consideration when designing monitoring studies. However, some of the standard approaches for estimating sample size result in a power of less than 50%, and the pre- and post-treatment sample sizes must be doubled to increase the power to 80%. The ability to detect change can be improved by including an additional explanatory variable, such as paired watershed measurements. However, published guidelines have not explicitly quantified the benefits of including an explanatory variable or the specific conditions that favor a paired watershed design. This paper (1) presents a power analysis for the statistical model (analysis of covariance) commonly used in paired watershed studies; (2) discusses the conditions under which it is beneficial to include an explanatory variable; and (3) quantifies the benefits of the paired watershed design. The results show that it is beneficial to include an explanatory variable even when its correlation with the water quality variable of concern is as low as about 0.3. The ability to detect change increases non-linearly as the correlation increases. Power curves quantify sample size requirements as a function of the correlation and intrinsic variability. In general, the temporal and spatial variability of many watershed-scale characteristics, such as annual sediment loads, makes it very difficult to detect changes within time spans that are useful for land managers or conducive to adaptive management.

Introduction

In the United States, hundreds of billions of dollars have been directed toward improving water quality, and there is a need to document whether these continuing expenditures are resulting in significant gains in water quality. Since most large point discharges are already regulated as individual sources, our most serious water quality problems are increasingly due to the cumulative effects of multiple point and nonpoint sources (USEPA, 1991, MacDonald, 2000). This means that monitoring efforts to evaluate change and assess the cost-effectiveness of pollution control efforts must increasingly be done at the watershed scale.

The detection of significant change is also a critical research issue (Reid, 1993). Much of the information on the effects of management activities on runoff and water quality has been derived from paired-watershed experiments. In the general design of these experiments, two similar catchments are monitored during a calibration period. A treatment is then imposed on one of the catchments while the other is maintained as a control. Any change in the relationship between the treated and control catchments is considered a treatment effect (Wicht, 1967). To interpret the results of such studies we must have a means of determining when significant change has occurred, and to design such studies we must be able to estimate the duration of monitoring likely to be required to detect a given change. This information is also needed to guide the design of projects to monitor water quality (Ward et al., 1990, MacDonald et al., 1991).
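To make the idea of a "change in the relationship between the treated and control catchments" concrete, the sketch below fits one common form of the paired-watershed model: ordinary least squares of the treated-watershed value on the control-watershed value plus a 0/1 treatment indicator, so that the indicator's coefficient estimates the post-treatment shift. It assumes the treated-versus-control slope is the same in both periods. The function name `fit_paired_ancova` and the simulated loads are hypothetical illustrations, not code or data from this study.

```python
import numpy as np


def fit_paired_ancova(y_treated, x_control, is_treatment):
    """Fit y = b0 + b1*x + b2*I + e by ordinary least squares.

    y_treated    : annual values on the treated watershed
    x_control    : matching annual values on the control watershed
    is_treatment : 0 for calibration years, 1 for treatment years
    Returns (coef, std_err, dof); coef[2] estimates the treatment shift.
    Assumes a common treated-vs-control slope in both periods.
    """
    y = np.asarray(y_treated, dtype=float)
    x = np.asarray(x_control, dtype=float)
    d = np.asarray(is_treatment, dtype=float)
    X = np.column_stack([np.ones_like(x), x, d])  # intercept, control, indicator
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]                     # n1 + n2 - 3
    s2 = resid @ resid / dof                      # residual variance estimate
    std_err = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, std_err, dof


# Hypothetical record: 6 calibration years followed by 6 treatment years.
rng = np.random.default_rng(1)
x = rng.lognormal(mean=3.0, sigma=0.6, size=12)          # control-watershed loads
d = np.r_[np.zeros(6), np.ones(6)]
y = 5.0 + 0.8 * x + 10.0 * d + rng.normal(0.0, 4.0, 12)  # true shift is 10
coef, se, dof = fit_paired_ancova(y, x, d)
print(f"estimated shift = {coef[2]:.1f} (SE {se[2]:.1f}, {dof} df)")
```

Comparing `coef[2] / se[2]` with a t distribution on `dof` degrees of freedom then tests whether the treated-versus-control relationship has shifted.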

The statistical analyses of paired-watershed studies date back to the 1940s (Wilm, 1949, Kovner and Evans, 1954), and there is a growing body of literature on the detection of hydrologic change (see review by Esterby, 1996). More recently, the US Environmental Protection Agency (USEPA) has published several papers and reports on the detectability of change associated with nonpoint source pollution from agricultural lands, forestry activities, and urban areas (USEPA, 1997a, USEPA, 1997b, USEPA, 1997c).

The procedures presented in these papers can be applied to detecting the effect of an imposed treatment at a single station, or to the effect of treating one catchment in a paired watershed design. The ability to detect a statistically significant change at a single station is a function of the sample sizes before and after treatment, the variability in each data series, the chosen level of significance, and the magnitude of the change induced by the treatment. In a paired-watershed design, the detectability of change also depends on the strength of the correlation between the variable of interest on the treated watershed and the corresponding variable on the control watershed. Hence the paired watershed design represents a special case of adding an explanatory variable (i.e., the corresponding values of the variable of interest from the control basin). The same statistical model applies to a single station when an additional explanatory variable, such as precipitation, is correlated with the water quality variable of interest. Although the statistical model could be extended to include multiple explanatory variables, we shall confine our discussion to the benefits of adding a single explanatory variable.
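A rough analytic way to see the trade-off of adding one explanatory variable: the covariate shrinks the residual standard deviation from σ to σ√(1 − ρ²), but it costs one degree of freedom and adds variance from estimating the slope. The sketch below compares approximate MDEs (confidence-interval half-widths) with and without the covariate and solves for the break-even correlation. The slope penalty 1 + 1/(n1 + n2 − 4) is an averaged approximation that assumes the covariate is normally distributed with the same mean in both periods; it is not necessarily the exact expression used in this paper, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def mde_no_covariate(sigma, n1, n2, alpha=0.05):
    """Half-width of the two-sided CI for a difference in means (pooled t)."""
    t_crit = stats.t.ppf(1 - alpha / 2, n1 + n2 - 2)
    return t_crit * sigma * np.sqrt(1 / n1 + 1 / n2)


def mde_with_covariate(sigma, rho, n1, n2, alpha=0.05):
    """Approximate MDE after adding one covariate with correlation rho.

    The residual SD shrinks to sigma * sqrt(1 - rho**2), one degree of
    freedom is spent on the slope, and the variance is inflated by
    1 + 1 / (n1 + n2 - 4), the average penalty for estimating the slope
    when the covariate is normal with the same mean in both periods
    (an assumption of this sketch).
    """
    n = n1 + n2
    t_crit = stats.t.ppf(1 - alpha / 2, n - 3)
    inflation = 1 + 1 / (n - 4)
    return t_crit * sigma * np.sqrt((1 - rho**2) * (1 / n1 + 1 / n2) * inflation)


def break_even_rho(n1, n2, alpha=0.05):
    """Correlation at which the two approximate MDEs are equal."""
    ratio = (mde_no_covariate(1.0, n1, n2, alpha)
             / mde_with_covariate(1.0, 0.0, n1, n2, alpha))
    return np.sqrt(1 - ratio**2)


for n in (5, 10, 20):
    print(f"{n} years per period: break-even rho = {break_even_rho(n, n):.2f}")
```

Under these assumptions the break-even correlation falls as the record lengthens, from a little over 0.4 with five years per period to under 0.2 with twenty, which brackets the value of about 0.3 given in the abstract.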

In designing studies to detect change, the statistical power for detecting change should be a primary consideration (Lettenmaier, 1976, Hirsch et al., 1982, Ward et al., 1990, MacDonald et al., 1991). Nevertheless, many studies still do not adequately consider power, which is defined as the probability that a statistical test will detect a change of a given magnitude. As a result, sample size calculations may yield a low probability of detecting change, and the issue of power in paired watershed studies has not been fully recognized or explained. Previous studies also have not explicitly quantified the benefits that can accrue from using a paired watershed design or including another explanatory variable in the analysis. Similarly, previous studies have not evaluated the cost of including an additional explanatory variable that has little or no correlation with the variable of interest.
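Because power is defined as a probability, it can be estimated directly by simulation: generate many synthetic calibration and treatment records with a known change, apply the test, and count how often the change is detected. The sketch below does this for the covariate model, drawing control and treated annual values as bivariate normal with correlation ρ. The distributional choices, the name `simulated_power`, and the default settings are illustrative assumptions, not the procedure used in this paper.

```python
import numpy as np
from scipy import stats


def simulated_power(delta, sigma, rho, n1, n2, alpha=0.05, n_sim=2000, seed=0):
    """Monte Carlo power of the covariate (ANCOVA) test for a step change.

    Control (x) and treated (y) annual values are drawn as bivariate normal
    with common SD sigma and correlation rho (0 <= rho < 1); the treatment
    adds delta to y in the n2 post-treatment years. Returns the fraction of
    simulations in which the two-sided t-test on the treatment coefficient
    rejects at level alpha.
    """
    rng = np.random.default_rng(seed)
    n = n1 + n2
    d = np.r_[np.zeros(n1), np.ones(n2)]
    cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
    t_crit = stats.t.ppf(1 - alpha / 2, n - 3)
    rejections = 0
    for _ in range(n_sim):
        xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        x, y = xy[:, 0], xy[:, 1] + delta * d
        X = np.column_stack([np.ones(n), x, d])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        s2 = resid @ resid / (n - 3)
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[2, 2])
        if abs(coef[2] / se) > t_crit:
            rejections += 1
    return rejections / n_sim


# Hypothetical case: 8 years per period, true change equal to one SD.
for rho in (0.0, 0.5, 0.9):
    print(rho, simulated_power(delta=1.0, sigma=1.0, rho=rho, n1=8, n2=8))
```

Setting `delta=0` should return a rejection rate near `alpha`, which is a quick check that the simulation and the test are consistent.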

This paper addresses these limitations by: (1) presenting a power analysis for the statistical (analysis of covariance) model commonly used in paired-watershed studies; (2) determining when it is beneficial to include an explanatory variable, such as paired observations from a control watershed; and (3) quantifying the benefits of including an explanatory variable in terms of an improved ability to detect change. The results should provide a more realistic basis for managers and researchers to design their studies and water quality monitoring programs.

Physical situation

For the purpose of this paper, let us assume that we are interested in detecting the effect of some actions imposed at roughly the same time on a treatment watershed. Let us further assume that we have made, or can make, water quality observations for some period of time both before treatment (calibration period) and after treatment (treatment period). For convenience we shall use annual sediment load as the variable of interest. However, the approach is not limited to this variable or time

Concept of MDE

The concept of the minimum detectable effect, MDE (Bunte and MacDonald, 1999), or minimum detectable change, MDC (Spooner et al., 1987), is defined as the smallest change in the average value of a given water quality variable that would be considered statistically significant. This same concept was referred to as the smallest significant difference in some of the early work on paired-watershed studies (Wilm, 1949, Kovner and Evans, 1954). Both Spooner et al. (1987) and Bunte and MacDonald (1999) consider the MDE to be the change

Confidence interval for difference in means

The simplest statistical model for representing observed values of annual sediment load for either the calibration or treatment period is

$$y_i = \mu + \varepsilon_i$$

where $y_i$ is the observed value of annual sediment load for year $i$, $\mu$ the long-term mean annual sediment load, and $\varepsilon_i$ the random noise term, assumed here to be independent and normally distributed with mean zero and standard deviation $\sigma_\varepsilon$.
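Under this model, the confidence interval for the difference between the treatment-period and calibration-period means is the familiar pooled two-sample interval, and the MDE is its half-width. The sketch below computes that half-width and the power of the corresponding two-sided t-test using the noncentral t distribution, with σε supplied as a known value. It illustrates why sizing a study so that the expected change just equals the MDE gives a power of only about one-half, and why roughly doubling the sample size is needed for 80% power. The function names are hypothetical, and the calculation is a textbook approximation rather than this paper's exact procedure.

```python
import numpy as np
from scipy import stats


def mde_difference_in_means(sigma, n1, n2, alpha=0.05):
    """Half-width of the two-sided (1 - alpha) CI for mu_after - mu_before.

    Assumes y_i = mu + eps_i in each period with independent normal errors
    of common standard deviation sigma (supplied here as a known value).
    """
    t_crit = stats.t.ppf(1 - alpha / 2, n1 + n2 - 2)
    return t_crit * sigma * np.sqrt(1 / n1 + 1 / n2)


def power_two_sample_t(delta, sigma, n1, n2, alpha=0.05):
    """Power of the two-sided pooled t-test when the true change is delta."""
    dof = n1 + n2 - 2
    t_crit = stats.t.ppf(1 - alpha / 2, dof)
    ncp = delta / (sigma * np.sqrt(1 / n1 + 1 / n2))  # noncentrality parameter
    # Probability that the test statistic falls outside +/- t_crit.
    return stats.nct.sf(t_crit, dof, ncp) + stats.nct.cdf(-t_crit, dof, ncp)


def sample_size_multiplier(power=0.80, alpha=0.05):
    """Large-sample factor by which n must grow for a change equal to the
    original MDE to be detected with the target power (normal approx.)."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return ((z_a + z_b) / z_a) ** 2


sigma, n = 1.0, 10
mde = mde_difference_in_means(sigma, n, n)
print(f"MDE with {n} years per period: {mde:.2f} sigma")
print(f"power if the true change equals the MDE: {power_two_sample_t(mde, sigma, n, n):.2f}")
print(f"power with {2 * n} years per period: {power_two_sample_t(mde, sigma, 2 * n, 2 * n):.2f}")
print(f"large-sample multiplier for 80% power: {sample_size_multiplier():.2f}")
```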

The assumption of independence will be violated when there is serial dependence that is not caused by the treatment

Paired watershed approach

One of the biggest problems in detecting a change in water quality is the high level of variability caused by variations in precipitation or other causal processes. As discussed in Bunte and MacDonald (1999), the coefficient of variation (CV) of annual sediment loads on small undisturbed watersheds is typically close to 100%. As variability increases, so does the time period needed to characterize the means and detect significant change.
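The effect of a CV near 100% on the required length of record can be illustrated with the usual large-sample formula for comparing two means. The sketch below returns the approximate number of years needed in each of the calibration and treatment periods to detect a given fractional change in mean annual load. It assumes independent, normally distributed annual values with a standard deviation equal to CV times the mean, which is a simplification for skewed sediment data, and it ignores the small-sample t correction, so it understates requirements for short records. The name `years_per_period` is hypothetical.

```python
import math
from statistics import NormalDist


def years_per_period(cv, relative_change, alpha=0.05, power=0.80):
    """Approximate years needed in EACH period to detect a change of
    relative_change * mean with a two-sided test (normal approximation).

    Assumes independent, normally distributed annual values whose standard
    deviation is cv * mean in both periods; skewed loads would usually be
    log-transformed first, which this sketch does not do.
    """
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    return math.ceil(2 * (z_sum * cv / relative_change) ** 2)


# Years per period needed to detect a 50% change in mean annual load.
for cv in (0.25, 0.5, 1.0):
    print(f"CV = {cv:.0%}: {years_per_period(cv, 0.5)} years per period")
```

With a CV of 100%, detecting a 50% change at α = 0.05 with 80% power requires about 63 years in each period under these assumptions, which illustrates why the text stresses reducing the effective variability.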

An approach to account for and effectively reduce

Summary and conclusions

The ability to detect a step change was explored for two different statistical models. The first model examined the MDE and the power to discern a step change in means. The second model included an additional explanatory variable, as commonly applied in paired watershed experiments. For both models the MDE was defined as a change in the mean of the variable of interest that was just larger than the half-width of the calculated confidence interval for that variable. An extended power analysis

Acknowledgements

The work reported here was supported by the U.S. Environmental Protection Agency under Agreement X 825789-01-0, and this built on an earlier study supported by the National Council for Air and Stream Improvement.

References (34)

  • McCulloch, J.S.G., et al., 1993. History of forest hydrology. Journal of Hydrology.
  • Bowker, A.H., et al., 1972. Engineering Statistics.
  • Brown, G.W., et al., 1971. Clear-cut logging and sediment production in the Oregon coast range. Water Resources Research.
  • Bunte, K., MacDonald, L.H., 1999. Scale Considerations and the Detectability of Sedimentary Cumulative Watershed...
  • Clausen, J.C., Spooner, J., 1993. Paired Watershed Study Design. EPA 841-F-93-009, US Environmental Protection Agency,...
  • Conover, W.J., 1980. Practical Nonparametric Statistics.
  • Esterby, S.R., 1996. Review of methods for the detection and estimation of trends with emphasis on water quality applications. Hydrological Processes.
  • Gilbert, R.O., 1987. Statistical Methods for Environmental Pollution Monitoring.
  • Grabow, G.L., Spooner, J., Lombardo, L., Line, D.E., 1998. Detecting water quality changes before and after BMP...
  • Grabow, G.L., Spooner, J., Lombardo, L., Line, D.E., 1999. Detecting water quality changes before and after BMP...
  • Grant, G.E., et al.
  • Graybill, F.A., 1976. Theory and Application of the Linear Model.
  • Helsel, D.R., et al., 1992. Statistical Methods in Water Resources.
  • Hirsch, R.M., et al., 1982. Techniques of trend analysis for monthly water quality data. Water Resources Research.
  • Kovner, J.L., et al., 1954. A method for determining the minimum duration of watershed experiments. Transactions American Geophysical Union.
  • Lettenmaier, D.P., 1976. Detection of trends in water quality data from records with dependent observations. Water Resources Research.
  • Loftis, J.C., et al., 1991. Considerations of scale in water quality monitoring and data analysis. Water Resources Bulletin.
  • Cited by (56)

    • Assessing the influence of urban greenness and green stormwater infrastructure on hydrology from satellite remote sensing

      2022, Science of the Total Environment
      Citation Excerpt :

      Experimental designs that provide a robust accounting for watershed factors that contribute to hydrologic variability have the best chance to draw causal linkages between GSI and catchment-scale hydrologic changes (Jarden et al., 2016). Yet most studies have relied on limited data using a paired watersheds approach (e.g., Dietz and Clausen, 2008; Hager et al., 2013; Roy et al., 2008; Yang and Li, 2013), which can have severe limitations for detecting differences, especially when treatment levels are low and time extents are short (Loftis et al., 2001). As a result, direct empirical data that support watershed-scale system responses to GSI remain scant.

    • Power analysis for detecting the effects of best management practices on reducing nitrogen and phosphorus fluxes to the Chesapeake Bay Watershed, USA

      2022, Ecological Indicators
      Citation Excerpt :

      In a recent analysis of agricultural drainages, Murphy et al. (2020) concluded through a simulation study that natural variability in event flow and concentration likely hampered power to detect the effects of BMPs, leading to inconsistent results across different drainages. Through a power analysis Loftis et al. (2001) determined that standard approaches for choosing sample sizes for paired watershed studies resulted in power levels less than 50%, and pre and post management action sample sizes would need to be doubled to obtain acceptable power levels. Studying the effects of nutrient reduction on water quality in small agricultural watersheds, Wellen et al. (2020) found it would likely take decades to centuries worth of data to detect the effects of a 20% reduction in nutrients.

    • An analysis of the sample size requirements for acceptable statistical power in water quality monitoring for improvement detection

      2020, Ecological Indicators
      Citation Excerpt :

      Past work on statistical power and sample size requirements in the context of water quality has typically focused on trend analyses (e.g. Irvine et al., 2012). While previous research has emphasized the advantages of paired basin experiments (Loftis et al., 2001; Bishop et al., 2005; King et al., 2008), most watershed scale evaluations of conservation measures are performed on large basins, where replication is not possible (Tomer and Locke, 2011). A recent analysis by the Northeast-Midwest Institute suggested that a hypothesis test of monthly total phosphorus concentrations would require roughly 10 years of monitoring data to have confidence in the results of hypothesis tests (Betanzo et al., 2015).

    • Effect of contemporary forest harvesting practices on headwater stream temperatures: Initial response of the Hinkle Creek catchment, Pacific Northwest, USA

      2013, Forest Ecology and Management
      Citation Excerpt :

      This adjustment facilitates intuitive interpretation of intercept data, and we normalize reference stream data in this way for maximum, mean, and minimum stream temperatures. We performed analysis of variance (ANOVA) using SAS software (version 9.1, SAS Corporation, Cary, NC) on slope and intercept parameters derived from reference-treatment regression plots, with year as a fixed effect and stream as a random effect (Meredith and Stehman, 1991; Loftis et al., 2001). We assessed the effect of treatment on each coefficient with a single degree of freedom contrast of average pre-treatment years versus the single post-treatment year.
