Abstract
The phosphorus index (PI) was developed as a field-scale assessment tool used to identify critical source areas of phosphorus (P) loss, thus most US states have adopted the PI as their strategy for targeted management and conservation practices for effective mitigation of P loss from agricultural landscapes to surface waters. Recent studies have focused on evaluating and updating PI weighting factors (WFs) to ensure agreement between PI values and measured losses of P. Given that the WF of each site characteristic is usually determined individually without considering possible interactions, the goal of this study was to demonstrate how artificial neural networks (ANNs) that consider real-world interdependence can be used to determine WFs. Our specific objectives were to evaluate ANN performance for predicting soluble P (SP) concentrations in tile effluent using site characteristics as predictor variables, and to evaluate whether ANN-generated WFs can be used to improve PI performance. Garson’s algorithm was used to determine the relative importance of each site characteristic to SP loss. Data from a monitored in-field laboratory were used to evaluate the ability of a PI with no WFs (PINO), a PI with WFs as proposed in the original Lemunyon and Gilbert PI (PILG), and a PI with ANN-generated WFs (PIANN), to estimate SP loss potential in tile discharge. Simulation results showed that the ANN model provided reliable estimates of SP in tile effluent (R2 = 0.99; RMSE = 0.0024). The relative importance analysis underscored the value of routine soil P testing for agronomic sufficiency for environmental stewardship, and highlighted the necessity of prioritizing both contemporary and legacy P sources during P loss risk assessments. Unlike the other PIs, PIANN was able to provide reasonable estimates of SP loss potential as illustrated with significant exponential relationships (R2 = 0.60; p < 0.001) between PIANN values and measured mean annual SP concentrations in tile effluent. These findings demonstrate that ANNs can be used to develop PIs with a strong correlation to measured SP.
Phosphorus (P) enrichment of fresh surface waters is a major water quality concern in many watersheds because of its role as the limiting nutrient for harmful and nuisance algal blooms (Sharpley et al. 1994). Significant progress has been made toward limiting point source inputs of P to P-sensitive waters. However, there has been marginal, and in most cases, elusive success in the remediation of nonpoint P sources, specifically diffuse P losses from agricultural fields (Dubrovsky et al. 2010; Kleinman et al. 2011). The degree of coincidence of P source and transport factors (critical source areas) control P movement from agricultural fields to surface waters (Sharpley et al. 2011). Therefore, the success of mitigation efforts lies in creating an understanding and representation of these two factors in P loss assessment tools (Sharpley et al. 2012; Gburek et al. 2002).
In 1993, Lemunyon and Gilbert proposed the Phosphorus Index (PI) to encompass source and transport factors controlling P movement from a field (Lemunyon and Gilbert 1993). Initially, the PI served as a voluntary, simple, educational, and qualitative screening tool for farmer identification of fields with high potential risks of P loss to runoff (Lemunyon and Gilbert 1993; Gburek et al. 2000). However, in response to the increasing water quality concern, the USDA’s Natural Resource Conservation Service (NRCS) has added the PI concept to its National Nutrient Management Conservation Practice Standard (Code 590) as one of the options available to states for P loss risk assessment (USDA NRCS 2013). In most US states, the NRCS-Nutrient Management Standard (code 590) requires the determination of PI solely or in combination with an agronomic or threshold soil test P (STP) value, i.e., a PI determination has to be done once the agronomic or threshold STP is exceeded (Sharpley et al. 2003, 2012). The original PI by Lemunyon and Gilbert (1993) was made up of eight site characteristics: soil erosion, irrigation erosion, runoff class, STP, inorganic P application rate and method, and organic P application rate and method. Each site characteristic was assigned a weighting factor (WF) to reflect its relative importance in contributing to P loss in runoff, and a rating value, i.e., 1 (low), 2 (medium), 4 (high), or 8 (very high), to represent increasing risk level. As an additive index, the final PI value for each site was obtained by multiplying each site characteristic’s weight with the corresponding rating value and summing up the resulting weighted characteristics (Lemunyon and Gilbert 1993).
To adapt the PI, many states embarked on revising and evaluating the original PI to accommodate local conditions and priorities (Sharpley et al. 2003). These modifications include the incorporation of additional site characteristics (e.g., degree of P saturation, connectivity to water bodies, subsurface drainage, best management practices, etc.), using measured field data to determine WFs, PI calculation (additive, multiplicative, or component indices), and the interpretation of the final PI values (Sharpley et al. 2003, 2013; Osmond et al. 2012). This diversity in the PI formulation and interpretation for similar situations among states made necessary a concerted effort to evaluate and validate existing PIs (Sharpley et al. 2013). One key finding from these evaluations is the great influence WFs assigned to site characteristics have on PI performance (Osmond et al. 2006). In consequence, these WFs have come under scrutiny, especially given that in many state PIs, WFs were initially assigned based on the professional judgement of experts or adapted from pre-existing PIs in neighboring states (Bolster et al. 2012; Sharpley et al. 2013, 2012; Drewry et al. 2011). Recently, more states have used findings from studies investigating the relationships between each site characteristic and measured P loss data to determine WFs leading to improved PI performance. For example, updated WFs based on measured P losses in the Kansas PI (soil erosion and STP WFs) (Sonmez et al. 2009) and Arkansas PI (STP and soluble reactive P WFs) (DeLaune et al. 2004) led to improved correlations between PI values and measured P losses.
In every state where WF determination was based on scientific data, the effect of each site characteristic on measured P loss was individually investigated (DeLaune et al. 2004; Sonmez et al. 2009; Bolster et al. 2012). Thus, the real-world synergistic and antagonistic effects among the PI source and transport characteristics were not considered (Gburek and Sharpley 1998; Sharpley et al. 2011). Artificial Neural Networks (ANNs) offer a novel approach to unravel complex nutrient loss dynamics in agricultural fields. An ANN is a computer-based system inspired by the learning process present in the vast network of neurons in the human brain (Lek and Guegan 1999). Similar to the neural networks in a human brain, an ANN is made up of interconnected processing units (neurons) organized in a predetermined topology (Lek and Guegan 1999). An ANN can handle both qualitative and quantitave data, to analyze both linear and nonlinear responses, and merge information (Schultz et al. 2000). Despite ANNs being more powerful predictive tools compared to traditional models such as linear regressions, multiple regressions, Soil and Water Assessment Tool (SWAT), the Haith’s Generalized Watershed Loading Function (GWLF), etc., most researchers shy away from ANNs due to existing criticism that they are “black boxes” (Olden and Jackson 2002; Benítez et al. 1997). Once fitted, an ANN model does not provide insights or details on the underlying relationships, relative importance (weights) of each input variable, and the structures of the covariates (inputs) with the modeled outcomes (Benítez et al. 1997). To overcome this weakness, numerous weight algorithms (e.g., partial derivatives [PaD], connection weights, Garson’s etc.) that interpret the connections and the contribution (weight) of ANN input factors have been developed (Olden and Jackson 2002). In consequence, ANNs coupled with relevant weight algorithms have recently received increased attention as potential tools suited to modeling input-output relationships in complex agricultural systems for which there is limited understanding (Yang et al. 2018; Liakos et al. 2018). One area of growth in ANN application is water quality modeling. Kaluli et al. (1998) successfully simulated nitrate (NO3−) leaching from agricultural fields and identified subirrigation, covercropping (corn [Zea mays L.] and ryegrass [Lolium multiflorum]) and a threshold nitrogen (N) rate (180 kg N ha−1) as possible ways to greatly reduce NO3− leaching from fields. Salehi et al. (2000) used ANNs to predict NO3− losses in drain outflows with their results revealing that ANNs accurately predicted NO3− loss using fewer input parameters but that the ANN model itself was site-specific (not transferable to other sites not studied). Kim et al. (2012) went further and compared the performance of ANNs with other existing nutrient models (SWAT and GWLF). Their results revealed that ANNs were as accurate or sometimes much more accurate in predicting watershed nutrient loading for various management strategies compared to SWAT and GWLF. Results from these studies and others (Sharma et al. 2003; Kim and Gilley 2008; Al-Mahallawi et al. 2012; Lallahem and Hani 2017), suggest that ANNs can be used to model nonpoint source agricultural nutrient loss to surface and groundwater. Additionally, building an ANN no longer requires advanced programming skills as several user-friendly ANN packages exist for use in open-source softwares, e.g., NeuralNet (Marcus 2018) and neuralnet (Fritsch et al. 2019), available for use in the R language environment (R Core Team 2017).
This study aimed to simulate soluble P (SP) concentrations in tile effluent using a multi-layer feed-forward artificial neural network (MLF ANN) trained by the backpropagation algorithm. The specific objectives of this study were to (1) evaluate MLF ANN model performance for predicting SP concentrations in effluent from subsurface drains with selected site characteristics as predictor variables, and (2) to compare the performance of a PI with MLF ANN-generated WFs (PIANN) to that of a PI with no WFs (PINO) and a PI with WFs as proposed in the original Lemunyon and Gilbert PI (PILG) (Lemunyon and Gilbert 1993), for predicting SP loss potential in tile discharge.
Materials and Methods
Selection of Input Variables for a Soluble Phosphorus Artificial Neural Network. The first step of this analysis was to determine the relevant site characteristics (input variables) governing SP loss (output variable) from tile drained fields. Input variables included a subset of those found in the Indiana Nutrient and Sediment Transport Risk Assessment Tool (NASTRAT) (IN-NRCS 2013), i.e., Bray/Mehlich 3 STP, soil erosion (water) (SE), surface runoff class (SR), subsurface drainage potential (SDP), and distance to waterbody (DTW), which have been considered as dominant input variables in previous P loss risk assessment studies (Nelson and Shober 2012). To meet the minimum criteria for risk assessment for P loss established by the NRCS in its Title 190 National Instruction, timing, rate and method of P (inorganic and organic) application were included in the analysis. The specific source variables were inorganic P fertilizer rate (FPR), inorganic P fertilizer application method and timing (FPA), organic P manure rate (OPR), and organic P manure application method and timing (OPA). Following the recommendation from Kleinman (2017) and Welikhe et al. (2020) to incorporate soil P sorption saturation into P loss risk assessment tools, P sorption capacity (PSC)-based environmental indices of P saturation ratio (PSR) and soil P storage capacity (SPSC) were included as additional input variables. The output of interest was SP, i.e., annual flow-weighted mean DRP concentrations (fDRP) (mg L−1).
Study Site and Creation of Data Sets. The creation of a robust ANN requires the use of a big data set that can be sufficiently divided into training, testing, and cross-validation subsets (Sinshaw et al. 2019; Berzina et al. 2009). Because a sufficiently large data set did not exist, this study followed the methods of Bolster et al. (2012) and Fiorellino et al. (2017), whereby an empirical data set (small data set) was used to generate a theoretical data set (big data set). These data sets were generated to represent well-managed agricultural fields with common cropping systems in Indiana. Here, well-managed agricultural fields refer to fields that adhere to state-established conservation practice standards for nutrient management and reduction of runoff and erosion processes (Indiana NRCS FOTG 2013). The theoretical data set with possible representative combinations of input and output variables was used to evaluate ANN perfomance for predicting SP losses in tile effluent, while the empirical data set consisting of actual (measured) site characteristics was used to test whether ANN-generated weights improved PI performance.
The empirical data set contained site characteristics, field management practices, soil, and water quality data collected from tile-drained plots at the Water Quality Field Station (WQFS), Purdue University. Together the treatments at the WQFS (table S1 in supplementary materials) provide an ideal opportunity to investigate P loss in tile effluent from well-managed fields that have received either no P or regular additions of either inorganic or organic P, and were either tilled or not tilled. Runoff and erosion is not monitored at the site; therefore data on measured P loss via these pathways were not available. However, it is important to note that the facility has little variation in slope. For in-depth details on the WQFS facility, management histories, equipment, and routine orthophosphate analytical protocols, see Ruark et al. (2009), Hernandez-Ramirez et al. (2011), and Welikhe et al. (2020).
Site characteristics and field management practices collected were those required as input into the ANN and PIs. Data were collected from the plots at the WQFS between 2011 and 2013 water years (e.g., October 1, 2010, to September 30, 2011, for water year 2011). In the NASTRAT (IN-NRCS 2013), similar to the Lemunyon and Gilbert (1993) PI, both P source and transport site characteristics are presented as categorical variables with discreet values assigned to each category (table 1). In the Lemunyon and Gilbert (1993) PI, these categories (low, medium, high, etc.) were further assigned a rating value using a base of 2 (low = 20 [1], high = 24 [16]) to represent increasing risk level from one category to the next. However, the use of categorical variables limits maximum values for P loss factors and often results in arbitrary breakpoints in calculated index values (Nelson and Shober 2012). Thus, when possible, many PIs have resorted to using continuous variables instead of categorical variables (Nelson and Shober 2012). Here, we chose to use continuous values for P source variables except P application methods (FPA and OPA) and P transport variables. The P (inorganic and organic) application methods together with their assigned rating values include no P applied (negligible category = 0), P placed with planter/injected deeper than 2 in (5 cm) (very low category = 1), P incorporated immediately before crop (low category = 2), P incorporated >3 months before crop or surface applied <3 months before crop (medium category = 4), P surface applied >3 months before crop (high category = 8) (table 1).
Categorical variables included in the empirical data set, including name, brief description, and rating values used in the study. Categories used were obtained from the Nutrient and Sediment Transport Risk Assessment Tool (IN-NRCS 2013). Percentage values assigned to each categorical variable in the theoretical data set using modified uniform probability distributions are shown in bold.
Soil samples from the WQFS were obtained in the 0 to 20 cm depth and sent to A&L Great Lakes Soil Testing Laboratory, Fort Wayne, Indiana (https://algreatlakes.com/), for routine chemical characterization. Further details on methods used during chemical characterization can be found in Welikhe et al. (2020). Table S2 presents a summary of data on P, aluminum (Al), organic matter (OM%), PSR, SPSC, and fDRP. Plots at the WQFS received either inorganic or organic P fertilizer applications (table S1). Plots receiving inorganic P fertilizer applications did so at university recommended rates based on STP (Vitosh et al. 1995), while manured treatments received yearly additions of swine effluent at rates meant to supply 255 ± 24 kg N ha−1 y−1. Rates of applied P were obtained from the WQFS field logs (table S1). Field records show inorganic P (triple super phosphate; 0-45-0) was surface applied <3 months before crop (~1 week or more before crop) and, when organic P was added, it was injected deeper than 5 cm; therefore, these variables were assigned a value of 4 and 1, respectively, in the data set (table S3). However, on years when starter fertilizer (equal mix of urea ammonium nitrate [NH4NO3; 28-0-0] and liquid ammonium phosphate [(NH4)3PO4] [10-34-0]; 19-17-0) was used as the only source of P, it was placed 5 cm below the soil surface at planting and was therefore assigned a value of 1 (table S3).
In the empirical data set (table S3), all transport variables were represented as categorical variables with each category assigned a rating value using a rating system of base 2 (table 1). Based on field observations at the WQFS, SE, SR, SDP, and DTW were assigned the following rating values: 1, 0, 4, and 1, respectively. These category values reflect a low soil loss risk (<44.8 t ha−1 y−1), a negligible risk of overland movement of soil solution from the site, a medium risk of SP losses through subsurface pathways, and that the plots at the site were >31 m away from surface water, respectively (table 1).
The first step during theoretical data set generation was the determination of the most accurate probability distribution function (fpd) for each variable based on recorded data in the empirical data set. The range of each numerical variable was defined using the range observed in the empirical data set to represent values that could potentially exist in well-managed fields (table 2). The process of fpd selection and fitting to these observed values was done using different functions in the R package fitdistrplus (Delignette-Muller and Dutang 2015; R Core Team 2017). The adequate fit of the selected fpds was examined with histograms and theoretical density plots (Q-Q plot, empirical and theoretical cumulative distribution functions, and P-P plots) (figure S1) (Delignette-Muller and Dutang 2015). Bolster et al. (2012) and Fiorellino et al. (2017) used the Annual Phosphorus Loss Estimator (APLE) to calculate total P loss from surface runoff using the generated theoretical data set as input. Our empirical data set included SP loss data from subsurface runoff (i.e., fDRP), therefore, we attempted to generate fDRP values using fpds. As there was no clear fpd fit for SP, it was generated using the 5th, 10th, 25th, 50th, 75th, 90th, 95th, 98th, and 99th percentiles (figure S2).
Range of values for continuous input variables used for generating the theoretical data set.
Both inorganic and organic P fertilizer applications were considered in this study. For simplicity, this study only considered corn even though most fields in Indiana are under corn–soybean (Glycine max [L.] Merr.) rotations. The assumption was that due to its higher yield potential and subsequent higher crop P removal rates, all P application rates determined using corn as a reference crop would encompass possible P application rates to soybean crops. In Indiana, inorganic fertilizer application rates are based on the tri-state fertilizer recommendations (Vitosh et al. 1995). These recommendations use observed STP and the potential yield of a selected crop (corn) to determine recommended inorganic P application rates for crops. The STP values generated using a log-normal distribution together with an average corn yield potential of 10,080 kg ha−1 were used to determine the corresponding inorganic P rates for the data set. Dayton et al. (2017) used a similar average corn yield goal in their sensitivity analysis of the Ohio PI. Organic P rates, which are also dependent on STP levels, were based on the organic nutrient guidelines in the Indiana NRCS Conservation Practice Standard code 590 (Indiana NRCS FOTG 2013). As per the guideline, organic P applications to soils with STP levels ≤50 mg kg−1 are based on the current crop’s (corn) N needs. Fields with STP levels between 51 to 100 mg kg−1 and 101 to 200 mg kg−1 are assigned organic P application rates that do not exceed 1.5 × crop phosphorus pentoxide (P2O5) removal rate and the crop P2O5 removal rate, respectively (Indiana NRCS FOTG 2013). To simplify the simulation, our study assumed that all manured fields received swine effluent similar to the WQFS, the reference experimental site. Subsequently, an N-based rate of 192 kg P2O5 ha−1 y−1 for swine effluent was determined for fields in the data set with STP levels ≤50 mg kg−1 following the steps outlined in Joern and Brichford (2003). Assuming that a corn–soybean rotation (with an average yield goal of 10,080 kg ha−1 for corn and 3,350 kg ha−1 for soybean similar to Dayton et al. [2017]) has an average crop removal rate of 56 kg P2O5 ha−1 y−1, fields with STP levels between 51 to 100 mg kg−1 and 101 to 200 mg kg−1 were assigned organic P rates of 84 kg P2O5 ha−1 y−1 and 56 kg P2O5 ha−1 y−1, respectively. Values for P application method and timing (FPA and OPA) were randomly assigned using modified uniform probability distributions. Based on the description of the five application methods in the empirical data set and professional knowledge, the study assumed that 99% of fields in Indiana receive inorganic P applications, which are surface applied <3 months before crop, and that there was an equal probability (0.25%) of inorganic P being applied to a field using one of the remaining four FPA methods (table 1). A similar distribution was used for organic P applications, but since organic P applications represents <20% of P applications in the region (King et al. 2018; Smith et al. 2018), 80% of the fields were assumed to receive no manure applications with the remaining fields having an equal probability (5%) of organic P being applied using any of the remaining four OPA methods (table 1).
Since data were not available for generation of transport variables, decision-making criteria based on information obtained from literature and professional knowledge were used to establish a distribution of possible values. Subsequently, values for SE, SR, SDP, and DTW were randomly assigned using a modified uniform probability distribution based on assumptions made. According to USDA NRCS (2015), recent estimates of annual soil loss by the Revised Universal Soil Loss Equation (RUSLE2) for croplands (both cultivated and uncultivated) in Indiana is approximately 6.14 ± 0.36 t ha−1 y−1. Therefore, the study assumed that most agricultural fields (99%) would be in the low soil loss category identified in the NASTRAT (< 44.8 t ha−1 y−1) (IN-NRCS 2013), with the remaining fields having equal probabilities (0.5%) of being in the medium (44.8 to 82.9 t ha−1 y−1) or high (>82.9 t ha−1 y−1) soil loss categories. Surface runoff classes (SR) are determined based on the interaction of two site characteristics: soil permeability and percentage slope of the predominant soil in the field (IN-NRCS 2013). Assuming that no agricultural fields are established on slopes >10% and that most soils belong to the moderately slow and slow soil permeability class similar to soils at the WQFS (Drummer silty clay loam and Raub silt loam), the study assigned equal probabilities (25%) to the negligible, very low, low, and medium surface runoff potential categories with zero probabilities of agricultural fields being established in areas with high and very high surface runoff potentials. Indiana NASTRAT guidelines specify that any fields with artificial subsurface drainage (at any depth) should automatically receive a medium drainage potential ranking, while fields with surface tile inlets should automatically receive a high drainage potential ranking. Since approximately 80% of cropland in Indiana has some type of subsurface drainage (Blann et al. 2009), 80% of the fields in the data set were classified as having a medium drainage potential with the rest of the fields having an equal probability (5%) of being in any of the remaining categories. Finally, for the DTW variable, it was assumed that 90% of the fields were established at ≥31 m from surface water, with 5% between 9 and 30 m from surface water, and the final 5% at ≤9 m from surface water.
Once the distributions were determined, the theoretical data set was generated by stochastic data generation ([n = 10,000] using R 3.4.0. [R Core Team 2017]) to represent possible physical and management conditions in well-managed agricultural fields.
Description and Development of a Soluble Phosphorus Multilayered Feed-Forward Artificial Neural Network. To predict SP losses, this study used a multilayered feed-forward (MLF) neural network structure. This is a relatively common network structure used in water resource applications such as the prediction of nutrient concentrations from runoff (Kim and Gilley 2008), prediction of watershed nutrient loading (Kim et al. 2012), prediction of NO3– contamination of ground water (Ehteshami et al. 2016), and risk assessment of P loss (Berzina et al. 2009). An MLF ANN is typically organized in successive layers, i.e., an input layer (independent variables), a hidden layer (connecting layer), and an output layer (dependent variables). Information flows unidirectionally through successive layers via adjustable connection weights (numeric weights) that recognize different patterns (Svozil et al. 1997). Most MLF ANNs contain one or more hidden layers, but in many water quality studies, using one hidden layer is reasonable (Wu et al. 2015; Khalil et al. 2011). Using the inputs and output generated in the theoretical data set, our MLF ANN is represented by the following equation 1:
1
where fDRP, STP, PSR, SPSC, FPR, FPA, OPR, OPA, SE, SR, SDP, and DTW are previously defined. In our study, neuralnet function in the R neuralnet package (Fritsch et al. 2019; R Core Team 2017) was used to create the MLF ANN trained by the backpropagation algorithm.
Given the inputs and output consisted of variables with different units and ranges, all data were scaled between 0 and 1 using the min-max normalization technique (Gopal et al. 2015). The datapoints (n = 10,000) were randomly divided into two subsets: training set (60%) and testing set (40%). The training set (n = 6,000) was used to adjust the connection weights, biases, and optimum parameters, and the testing set (n = 4,000) was used to confirm the actual predictive power of the network. To train the MLF ANN, the backpropagation algorithm (Hecht-Nielsen 1989) was used. This algorithm utilizes supervised training, compares its resulting outputs against target outputs, and propagates errors backwards through the systems to adjust the weights of the neurons in each layer and minimize the sum of square errors of the network (Hecht-Nielsen 1989). In this study, initial default MLF ANN parameters included the sum of squared error as the error function, 0.01 as the threshold for convergence (partial derivative of the error function to stop iteration), 100,000 as the stepmax (maximum number of steps of the training process), 1 as the number of the neurons in the hidden layer, resilient backpropagation (rprop+; the learning algorithm), and logistic function as the activation function. For additional details on network parameters, description, default settings, and available options, see Fritsch et al. (2019). To ensure optimum network configuration, the activation function and number of neurons in the hidden layer were changed. The activation function was changed from logistic to softplus, which has been shown to significantly improve model performance and convergence with fewer training steps compared to other standard functions (Zheng et al. 2015). Also, a trial and error approach was used to set the number of neurons in the hidden layer by varying them between (2n1/2 + m) and (2n + 1) (Fletcher and Goss 1993), where n and m are the number of input and output variables (neurons), respectively, until the desired network accuracy was achieved. Finally, we used 10 fold cross-validation as described by Olden et al. (2008) and Chowdhury and Shukla (2002) to test network robustness across different samples and ensure the network was not overfit to a particular set of data. In keeping with their procedure, the theoretical test data set (n = 4,000) was partitioned into 10 subsamples, each containing 400 (i.e., n/10) observations; 9 subsamples were used as the training data set, and the remaining 1 subsample was retained as the validation data set for testing the MLF ANN. This procedure was repeated 10 times (hence, 10 folds), with each of the 10 subsamples being used only once as the validation data set. Network robustness has important implications particularly when the MLF ANN will be used for prediction purposes. An optimum network is the one that is robust across different samples (Olden et al. 2008). Therefore, this cyclic process of training (feed-forward and error backpropagation), testing, and cross-validation, was repeated until the desired network accuracy was achieved.
During the optimization of the network, the goodness of fit between predicted and measured SP was evaluated using the coefficient of determination (R2) and the root mean square error (RMSE). The R2 value indicates how well the network fits the data and accounts for the variability in prediction by the variables specified in the network. R2 values >0.9, between 0.8 to 0.9, and between 0.6 to 0.8 indicate model fits that are very satisfactory, fairly good, and unsatisfactory, respectively (Lallahem and Mania 2003). The RMSE indicates how close the predicted values are to the fitting line. The smaller the RMSE value, the closer the predicted value is to the observed value.
Relative Importance Analysis of Input Variables of the Soluble Phosphorus Multilayered Feed-Forward Artificial Neural Network. Relative importance analysis is unique, as it not only quantifies relationships between input and output variables similar to common sensitivity analysis, but it also considers the potential interactions among the input variables (Garson 1991). Network weights determined from relative importance analysis are partially analogous to the coefficients in a linear model. Therefore, the combined effects of weights specific to an input variable represent its relative importance (weight) as a predictor variable (Garson 1991). Garson’s algorithm has been used in previous studies to determine the relative importance of input variables in an MLF ANN (Zheng et al. 2017; Grahovac et al. 2016; Giam and Olden 2015; Zhou et al. 2015). This study used Garson’s algorithm (garson function; package NeuralNetTools) (Marcus 2018) to partition the numerous ANN weights, and subsequently pool and scale (values ranging from 0 to 1) weights specific to each input variable to reflect their respective relative importance (Garson 1991). All relative importance values are presented here in their absolute form.
Phosphorus Index Performance. There is no existing PI in Indiana; therefore the first step in the analysis was to formulate a PI. Our study adopted a multiplicative PI formulation similar the Pennsylvania PI. This formulation is a good representation of processes governing P loss from agricultural fields given it better represents the concept of critical source areas (Gburek et al. 2000; Sharpley et al. 2003). The multiplicative PI has the general form below (equation 2):
2
where T represents PI transport factors (SE, SR, SDP, and DTW), TW represents weights for the various transport factors, S represents PI sources (STP, FPR, FPA, OPR, and OPA), SW represents weights for the source terms, and n and m represent the number of transport and source factors, respectively. Relationships between SP and both PSR and SPSC have thresholds, i.e., 0.21 and 0, respectively, above/below which P loss increases to soil solution (Welikhe et al. 2020). These indices were included in the multiplicative PI as follows: (1) for fields with PSR and SPSC values below and above the identified thresholds, respectively, the weighted PSR and SPSC values were subtracted from the total sum of source factors to indicate reduced risk of P loss from these fields, and (2) for fields with PSR and SPSC values above and below the identified thresholds, respectively, the weighted PSR and SPSC values were added to the total sum of source factors to indicate increased risk of P loss from these fields. During PIANN determination, the absolute values of SPSC were used.
To evaluate whether the ANN-generated weights improve PI accuracy, we compared the performance of a PI weighted using ANN generated weights (PIANN), a PI weighted using the Lemunyon and Gilbert (1993) PI weights (PILG), and an unweighted (no weights) PI (PINO), for predicting observed fDRP concentrations in the empirical data set. Equations for the latter PI calculations are detailed in table 3. All PIs were calculated using information (site characteristics and field management practices) from the empirical data set (table S3). Following Sharpley et al. (2001), fDRP concentrations were subsequently regressed (exponential regression) against calculated PI scores (PIANN, PINO, and PILG). All regressions were performed using R 3.4.0. (R Core Team 2017).
Equations for multiplicative phosphorus index formulations used in the study.
Results and Discussion
Performance of the Soluble Phosphorus Multilayered Feed-Forward Artificial Neural Network. Based on Fletcher and Goss (1993) criteria, during network optimization, the number of neurons in the hidden layer was varied between 6 and 23. Results showed that an MLF ANN with 7 neurons in the hidden layer gave the lowest RMSE during both testing and cross-validation (table 4). An increase in mean RMSE during cross-validation indicates model overfitting (fitting the noise in the training data) (Liu et al. 2007). In this study, as the number of neurons increased from 7, cross-validation mean RMSE increased (table 4), indicating a decrease in network performance. Thus, the final network structure consisted of 11 neurons in the input layer, 7 neurons in the hidden layer, and 1 neuron in the output layer (figure 1). Given the numerous connection weights among the neurons, attempts to trace the direction and relative magnitude of weights between neurons were not successful.
Testing and 10-foldcross validation performance of the Multilayered Feed-Forward Artificial Neural Network with increasing number of neurons in the hidden layer. The cross-validation root mean squared error is the mean root mean squared error of the 10 folds analyzed during cross-validation.
Neural interpretation diagram of the best multilayered feed-forward network structure with 11,7, and 1 neuron(s) in the input (I), hidden (H), and output (O) layers, respectively. B1 and B2 are bias terms added to hidden and output layers. Black and grey lines represent positive and negative connections respectively, while line thickness represents the relative magnitude of each connection weight.
Comparison of the predicted (predicted by the trained MLF ANN) with the generated (theoretical testing data set) SP values illustrated the accuracy of the selected MLF ANN (figure 2). The predicted and generated values in the test set were very close to the 1:1 regression line (y = x, predicted SP = generated SP). However, the slope for the best fit line relating predicted and generated SP values was 0.99 (p < 0.001), which shows that the MLF ANN slightly underestimated the concentrations of SP in tile effluent. Also, it was visually apparent that there were outlying cases that were not accurately predicted by the MLF ANN. Since the underestimation by the network was very small, we used the network as is. An R2 and RMSE value of 0.99 and 0.0024, respectively, for the test set, demonstrated a good linear fit and confirmed the ability of the trained MLF ANN to predict SP precisely. A small variation in RMSE across folds during cross-validation indicates a robust network (Liu et al. 2007); therefore, the reasonably small variation in RMSE across folds in this study is evidence that the selected MLF ANN was quite robust (table 5). Much of the variation across the folds in RMSE values is associated with each fold having random initial starting seeds and the random split of the test data set into training and validation sets during the cross-validation process (Zhang et al. 1999). Kim and Gilley (2008) showed that ANNs trained with data sets that adequately represent the critical processes involved in nutrient loss to drainage waters achieved higher predictive ability. In the present study, the high predictive ability by the MLF ANN resulted from the use of a theoretical data set whose variables were carefully generated to represent conditions that could potentially exist in well-managed agricultural fields. This highlights the importance of a good data set (generated or measured) that accurately represents existing conditions in an area of interest during ANN model building.
Selected multilayered feed-forward artificial neural network parity plot for annual soluble phosphorus concentrations in tile discharge. The black line is the 1:1 (y = x) line.
Root mean squared error values for each fold during cross-validation.
Relative Importance of the Soluble Phosphorus Multilayered Feed-Forward Artificial Neural Network Input Variables. Garson’s relative importance algorithm uses absolute values of the connection weights to calculate variable importance; therefore, the values presented here do not provide the direction of the relationship between input and output variables (Garson 1991). Input variable contributions ranged from 0.004 to 0.279, with SP losses from well-managed fields being strongly governed by source factors (table 6). Our analysis revealed that STP had the greatest weight (0.279) on SP loss. As an indicator of total sorbed P in soils (Sims et al. 2000), STP has strongly been linked to SP losses in both surface runoff (Pote et al. 1999) and subsurface drainage (Duncan et al. 2017) waters. Also, in previous, related work, Welikhe et al. (2020) showed that soils with P levels above the critical agronomic STP threshold of 22 mg P kg−1 (which corresponds to the suggested environmental STP threshold) were more prone to desorbing P and had a greater risk of losing SP to tile effluent. Therefore, the high weight assigned to STP mainly reflects that the amount of bio-available P strongly influences SP losses in tile drains in well-managed agricultural fields.
Results of the relative importance analysis for input variables of the soluble phosphorus multilayered feed-forward artificial neural network on annual flow-weighted mean dissolved reactive phosphorus concentrations in tile effluent. Values in bold and in parentheses represent Lemunyon and Gilbert (1993) weighting factors normalized to sum to 1.
Phosphorus application rates have come under scrutiny as one of the reasons for increased P loss in agricultural watersheds (Smith et al. 2015). The MLF ANN identified FPR as the second most influential site characteristic with a weight of 0.233 (table 6). The FPR are based on the Tri-state Fertilizer Recommendations (Vitosh et al. 1995). When these fertilizer recommendations were being developed, P fertilizer was relatively inexpensive in comparison to crop value, and underfertilization with its associated loss in yields was viewed as a higher economic risk than overfertilization (Nelson 1967). Therefore, historic recommendations included a safety margin to ensure yield potential would not be decreased across different soil types (Nelson 1967; Vitosh et al. 1995). The result is that on many highly productive soils where Tri-state Fertilizer Recommendations are followed, it is very likely that inorganic P is applied at rates greater than crop demand, which contributes to P enrichment of surface soils and subsequent P losses (Nizeyimana et al. 2001; Smith et al. 2015).
The SPSC had a weight very close to FPR, i.e., 0.231 and 0.233, respectively (table 6). As an index of a soil’s sink strength, the capacity dimension of SPSC takes into account previous P loading and enables the prediction of how much P a soil can sorb before becoming an environmental risk (Nair 2014; Nair et al. 2015). The close weighting between FPR and SPSC indicates that in well-managed agricultural fields both contemporary (specifically inorganic P additions) and legacy P sources should be prioritized as top site characteristics for P loss risk assessment when developing strategies aimed at abating SP loss from fields. Compared to SPSC, PSR had a small relative importance of 0.097 to SP loss in well-managed fields, which are not excessively P rich (maximum STP in the empirical data set was 104 mg P kg−1) (table 6). This is consistent with previous work that showed that P sorption capacity also has a bigger influence on SP loss potential in excessively P-rich soils (Reid et al. 2012; Bolinder et al. 2011).
Even though not many fields in the region (~20%) (King et al. 2018; Smith et al. 2018) receive organic P as we generated in the theoretical data set, the relative importance analysis identified OPR (weight = 0.084) as the fifth most influential site charateristic (table 6). Indiana soils with STP levels ≤50 mg kg−1 receive N-based manure applications (Indiana NRCS FOTG 2013). Previous research confirms that there is a higher P loss from fields receiving manure amendments applied at N-based application rates. These studies show that when manure is applied to meet the N requirements of a corn crop (N-based application), it results in the oversupply of P to soils because of the low N:P ratio in manures compared to most crops (King et al. 2018; Dodd and Sharpley 2016; Toth et al. 2006). The influence that both organic (OPR) and inorganic P (FPR) rates have on SP loss emphasizes the significance of controlling P accumulation (i.e., contemporary P sources) in soils in order to control the amount of P that remains in solution and is susceptible to leaching once soil P sorption sites are saturated (Breeuwsma and Silva 1992). With regards to P application methods, previous studies show that the potential for SP loss is exacerbated when either organic or inorganic P are surface applied, and it is generally reduced when P materials are incorporated by injection, banding, or tillage (Smith et al. 2016; Daverede et al. 2004).
Compared to other source factors, both FPA and OPA were assigned very low weights i.e., 0.007 and 0.004, respectively. The low weights were expected given the potential for SP movement with either surface or subsurface runoff assigned in the theoretical data set was not high. Also, the low weights assigned may reflect the impact of the time elapsed between the surface application of P fertilizers and runoff events. Findings by Smith et al. (2007) show that the shorter the duration between P fertilizer applications and runoff events, the greater the risk of P losses to surface water. Even though the most common P application method (especially for inorganic P) in Indiana is surface application, the timing of applications in these well-managed fields is such that runoff events do not occur soon after. This finding demonstrates the importance of considering the timing of P applications in addition to application methods in P loss risk assessments.
The transport site characteristics had low relative importance compared to the source site characteristics (table 6). In the theoretical data set, distributions of possible values for the transport site characteristics were generated using decision-making criteria based on literature and professional knowledge of well-managed agricultural fields. In consequence, the values assigned to the transport site characteristics in the theoretical data set reflected the many widely adopted conservation practices aimed at reducing P movement from a field through reductions in erosion, runoff, and P entrapment within fields (Dodd and Sharpley 2016). Among the four transport variables, SE had the highest relative importance of 0.043, with the remaining transport site characteristics (SR, SDP, and DTW) all having relatively low impacts on SP losses in tile drains (table 6). The high impact of SE may be the result of unintended consequences arising from the adoption of conservation practices. The conservation practices (e.g., no till, minimum till, cover cropping, etc.) in the region have successfully reduced soil erosion; unfortunately this reduction in soil disturbance has inadvertently increased P loss (mainly SP loss) in tile drained fields due to buildup of labile P in surface soils, especially on fields with long-term P applications (Jarvie et al. 2017; Dodd and Sharpley 2016; Duiker and Beegle 2006). Since 99% of the fields in the theoretical data set were generated to represent fields with minimal annual soil loss, the latter unintended effect of reduced soil erosion on SP losses in tile drains is a possible explanation of why soil erosion is the most important transport factor.
The SR and SDP had relative importance values that were low and close in magnitude to each other, i.e., 0.008 and 0.007, respectively (table 6). This result agrees with previously reported observations of SR and SDP contributions to SP losses. In many watersheds in the Midwest, there is an approximate 50-50 split between surface runoff and tile discharge contributions to stream flow (King et al. 2014). In these watersheds, previous studies show that, in some settings, surface runoff and tile drains exported approximately equal amounts of DRP to streams (Ruark et al. 2012; King et al. 2014; Macrae et al. 2007).
Finally, DTW was last in relative importance (0.006) among the transport factors (table 6). Reid et al. (2018) shows that the actual risk of SP loss to surface water depends on the distance and landscape characteristics between the point of SP release (edge of field) to the surface water, with higher risks being associated with shorter distances between these two points. In this study, the MLF ANN assigned the least relative importance to DTW since in the theoretical data set, 90% of the fields were established at ≥31 m from surface water (low risk category).
Overall, the ANN-generated WFs (WFANN) were different from the WFs assigned to site characteristics in the original PI (WFLG) (table 6). Unlike the WFLG, which were arbitrarily assigned (Lemunyon and Gilbert 1993), the WFANN were assigned based on relationships identified between site characteristics and measured SP loss by the MLF ANN. The difference between WFLG and WFANN further highlight the need identified by the authors of the original PI (Lemunyon and Gilbert 1993) to use field studies to more accurately assign WFs that reflect the contribution each site characteristic to P loss in an area of interest.
Comparison of Phosphorus Index Performance. There was not a significant exponential relationship between index values and measured fDRP concentrations for both the PINO (p = 0.11) and PILG (p = 0.06) index formulations (figure 3a and 3b). This indicated that both PINO and PILG poorly represented the risk of SP loss to tile discharge. According to the authors of the PILG, the function of input factor weights was to define each input’s relative contribution to P loss risk (Lemunyon and Gilbert 1993). Therefore, the poor performance of PINO (figure 3a) was expected given the absence of weights that, much like coefficients in a linear model, describe the relationship between a predictor variable and its response variable (Sharpley et al. 2012; Garson 1991). Lemunyon and Gilbert (1993) acknowledged that weights in their PI were arbitrarily selected, which explains the poor performance of PILG (figure 3b). Ideally, to better capture P loss risk in a particular region, P index weights should be obtained from measured P loss data (Sharpley et al. 2012). This explains the significant exponential relationship observed between PIANN and measured fDRP concentrations (figure 3c; R2 = 0.56, p < 0.001). Previous studies, for example, Eghball and Gilley (2001), DeLaune et al. (2004), and Sonmez et al. (2009), also reported improved PI performance when weights were based on measured P losses.
Observed annual soluble phosphorus concentrations and soluble phosphorus loss risk calculated with the (a) unweighted phosphorus index (PINO), (b) a phosphorus index weighted using Lemunyon and Gilbert (1993) weights (PILG), and (c) a phosphorus index weighted using artificial neural network generated weights (PIANN).
Distinct groups of data were observed when P loss data were plotted against both PINO and PILG (figure 3a and 3b). Fiorellino et al. (2017) showed that the distinct groups of data observed when P loss data are plotted against calculated PI values arise from the use of categorical variables, which prevent calculated PI values from relating well with measured P loss data. Given that plots used in this study had similar values for transport factors, distinct groups of data separated out based on P application rates (both organic and inorganic) and their corresponding P application method categories. Using the ANN-generated weights appeared to alleviate (to an extent) the distinct groups of data observed in the other two PIs (figure 3). This observation highlights the importance of considering possible synergistic and antagonistic interactions between site characteristics during weight determination. Despite the improved relationship between PIANN and fDRP concentrations, there was still a significant amount scatter around the best fit line (figure 3c). When Lemunyon and Gilbert (1993) came up with the PI concept, they did not intend for it to be used to quantitatively estimate actual P loss from a field, but rather it was originally intended to be used as a tool to rate a field’s relative risk to P loss based on the site characteristics. The significant amount of scattering still observed even with the use of WFsANN demonstrates the limitation of using a simple PI concept to model complex physical processes governing P loss. Yet, evaluating PIANN values against measured SP loss data was a reasonable approach for assessing its performance. Overall, the significant exponential relationship between PIANN and SP shows that the proposed PIANN for well-managed agricultural fields was directionally and magnitudinally correct, as is expected of PIs used to assess the risk of P loss (Sharpley et al. 2012).
Summary and Conclusions
The MLF ANN with 11-7-1 topology and trained by backpropagation presented satisfactory predictive ability and robust generalization capacity for modeling complex SP loss through tile drains. Through the analysis of relative importance, Garson’s algorithm showed feasibility for evaluating the weights of site characteristics used in a PI. In this study, the ANN-generated PI weights obtained were reasonable and consistent with known relationships between site characteristics and SP loss. The large weight assigned to STP during relative importance analysis underscores the value of routine soil P testing both for agronomic sufficiency and for environmental stewardship. Given the increased risk of SP loss once a soil’s STP level exceeds the agronomic critical STP level, this value can also be considered the environmental threshold (Welikhe et al. 2020). Also, the relative importance analysis highlighted the need to prioritize both contemporary and legacy P sources when determining best management practices to minimize P loss from well-managed agricultural fields. Further, unlike the PINO and PILG, the PIANN had a significant exponential relationship with fDRP and was able to provide reasonable estimates of P loss in tile effluent. These findings suggest that modifications to a PI, such as WFsANN, would likely improve the predictive ability of the risk assessment tool.
In the long-term, we propose the use of a big, multilocation, measured data set (that can be sufficiently divided into testing, training, and cross-validation subsets), with a wide range of measured field data representative of all fields in Indiana, not just well-managed fields, to allow for the identification of P risk categories in the final PI. Also, future research efforts should include all P forms (soluble, particulate, and total P) and consider P loss from all pathways not just through subsurface drainage. The authors acknowledge that although this approach is rigorous, it is a relatively simple option for developing accurately weighted PIs that better reflect complex physical processes governing P loss. They note that the reliability of this approach will depend on the accuracy of the MLF ANN on which the weights are based and will be valid only over the range of conditions considered in the analysis. Also, as is common practice, the stochastic generation of the theoretical data set from only one simulation (realization) raises the question on how representative the data set is of existing conditions in well-managed agricultural fields. Guo et al. (2018) showed that a minimum of 25 simulations are needed to accurately capture statistical characteristics of climate during stochastic weather data generation. To our knowledge, no such study has been carried out to identify the number of simulations needed to accurately capture statistical characteristics of soils and management data. Therefore, this study does not propose the adoption of these weights into an Indiana PI. Rather, we propose the use of the methods presented here during the development of a state PI, taking care that the MLF ANN used is trained, tested, and validated using a sufficiently large data set of measured inputs (source and transport factors) and output (total P) to improve PI accuracy.
Supplementary Material
Supplementary material for this article is available in the online journal at https://doi.org/10.2489/jswc.2021.00153.
- Received September 1, 2020.
- Revision received February 25, 2021.
- Accepted March 18, 2021.
- © 2021 by the Soil and Water Conservation Society