The online database Sustaining the Earth's Watersheds–Agricultural Research Data System (STEWARDS) was developed over a six-year project by a team from several USDA Agricultural Research Service (ARS) locations to deliver weather, hydrology, and water quality data to the public. Launched as part of the Conservation Effects Assessment Project (CEAP; Mausbach and Dedrick 2004), STEWARDS progressed from conception and agency commitment in 2002 to 2003, through a beta release with three watersheds on July 7, 2007, to public release and population with additional data and watersheds in 2007 to 2008 (Steiner et al. 2008). The database continues to grow, adding 1 to 2 million data records a year. At the tenth anniversary of its public availability, the authors set out to examine the impact STEWARDS has had on the scientific community, conservation programs, and the general public.
Data in STEWARDS were collected by a number of USDA ARS locations, some of which have documented their data collection efforts in the literature. The six regional watersheds established after authorization by Senate Bill 59 (USDA 1959) all have published series of papers documenting their databases. Marks (2001) introduced 9 other papers for Reynolds Creek Experimental Watershed in Idaho. Bosch et al. (2007) is the first of 5 papers for Little River Experimental Watershed in Georgia. Moran et al. (2008) introduced 19 papers for Walnut Gulch Experimental Watershed in Arizona. Bryant et al. (2011) is the first of 4 for Mahantango Creek Watershed in Pennsylvania. Steiner et al. (2014) introduced a series of 11 papers for Little Washita River Experimental Watershed in Oklahoma. Sadler et al. (2015) introduced a series of 9 for Goodwater Creek Experimental Watershed in Missouri. Two additional ARS watersheds, established in the 1990s, have shorter periods of record but are substantially documented. Hatfield et al. (1999) is the first of 7 research articles based on the North Walnut Creek watershed in Iowa. Locke (2004) introduces 17 articles focused primarily on the Beasley Lake watershed in Mississippi. Harmel et al. (2007, 2014) documented data from the Riesel watersheds in Texas, which were established in 1938 and have had water quality measurements since 2000. There are 5 additional watersheds for which data have been or will be contributed to STEWARDS but for which literature documenting the data specifically has not yet been published. For these, the best citations of research using the data are given here. From east to west, they include the Choptank River watershed in Maryland (Whitall et al. 2010; Hively et al. 2011; McCarty et al. 2014), the Walnut Creek watershed in Ohio (King et al. 2008, 2009, 2012; Smiley et al. 2011), the St. Joseph watershed in Indiana (Smith et al. 2008, 2015; Williams et al. 2018), the Little River Ditches and Lower St. Francis watersheds in Arkansas (Aryal and Reba 2017; Aryal et al. 2018), and the Upper Snake Rock watershed in Idaho (Bjorneberg et al. 2008, 2015).
The immediate question was how to determine and measure the impact of such a database. One approach was to examine how well STEWARDS met the expectations of the CEAP project, of overall public policy, and of the scientific community. The first two of these measures yield discrete yes/no answers, and the last remains subjective. Attempts to derive quantitative or objective measures suggested indicators typically found in the scientific literature: citations or downloads of the papers documenting STEWARDS, or the same for papers documenting the data contained in STEWARDS. Other indications, though not always measurable, include whether the visibility of STEWARDS generated additional exposure for ARS data, whether STEWARDS received scientific recognition through awards, and whether STEWARDS informed other databases or public policy. The objective of this paper is to address the question posed by its title using all of these means of documenting impact.
MEETING EXPECTATIONS
Expectations of CEAP. At the start of CEAP, Mausbach and Dedrick (2004) included as Objective 3 of 5 for the ARS benchmark watersheds:
“Develop water quality, water conservation, and soil quality databases that can be used to evaluate effects of conservation practices, and to compile air quality and wildlife habitat data for future assessment. These databases will be used periodically to validate and evaluate the model used in the watershed and national assessments and to validate and verify the regionalized models.”
By 2008, Richardson et al. (2008) listed the database as Objective 1 of the ARS CEAP Watershed Assessment Study: “One of the prime objectives of the USDA ARS benchmark watershed studies is to develop and implement a Web-based system that would make the data readily available.” Further, they emphasized the accomplishment: “The most lasting legacy of a watershed research program is the basic data that is obtained and available for current and future interpretation. The development and release of STEWARDS as a repository of the watershed data is a significant accomplishment.”
In the initial paper accompanying the public release, Steiner et al. (2008) listed the following impacts the development team considered STEWARDS to have had by that time, and the impacts expected to accrue from the presence of the database:
Impacts
Improved scientific credibility by documenting quality assurance/quality control procedures
Increased collaborative opportunities for individual scientists, watershed teams, and the ARS water resources program
Increased learning opportunities for participants at watersheds
Increased demands on scientists for provision of open data
Better accountability at the agency level for investment in long-term watershed research
Anticipated impacts
Increased scientific productivity at watersheds
Increased credit to scientists for contribution to open data systems
The development team had already observed the first five impacts; the two anticipated impacts have been very difficult to document. The second of these in particular has contributed to calls for credit to scientists, but evidence that such credit has been awarded is difficult to find.
Expectations of Public Policy. During STEWARDS development, the US General Accounting Office (2004) issued report 04-382, titled “Watershed Management: Better Coordination of Data Collection Efforts Needed to Support Key Decisions.” The report inventoried federal databases on water quality and quantity and recommended better coordination, better clearinghouses, and better metadata. STEWARDS was developed with these recommendations in mind, with particular attention to the metadata, which were under ARS control. Geospatial metadata complied with the Federal Geographic Data Committee standards in force at the time (FGDC 1998). Metadata for methods were modeled on the standards of the multiagency National Environmental Methods Index (https://www.nemi.gov/), adapted to include several elements needed for ARS use. The STEWARDS team had to develop metadata standards for site descriptions, as none were in common use. STEWARDS was registered in the federal metadata clearinghouse to improve discoverability of its data.
Since the public release of STEWARDS, federal policy on public availability of data acquired by federal agencies has been clarified. The February 22, 2013, policy memorandum from the White House Office of Science and Technology Policy requires that publications and data be publicly accessible if their underlying research was federally funded, either wholly or in part (Stebbins 2013). Although the We the People petition (65,000 signatures) that prompted the federal response requested only that publications be made publicly available, the memorandum directed that both publications and supporting data be made available. The following passage is relevant to STEWARDS: “…digitally formatted scientific data resulting from unclassified research supported wholly or in part by Federal funding should be stored and publicly accessible to search, retrieve, and analyze.” The memorandum explicitly emphasized the development of and adherence to data management plans, including clearly stated expectations for covering the costs of data management plan compliance and of the training required for this effort. The USDA implementation plan for the memorandum took effect in January 2016. Since then, STEWARDS geospatial metadata have been added to the Ag Data Commons and GeoPortal. Notably, because STEWARDS already existed when the memorandum was issued, the CEAP watershed project was substantially compliant in advance of the new policy.
Expectations of the Scientific Community. The expectations of the broader scientific community regarding data availability are less well documented; however, the number of signatures on the petition that prompted the policy memorandum fairly clearly indicates interest in open access (OA) to publications, and, as the policy states, the associated data are logically expected to be available as well. Scientific entities have also called for access specifically to data. Steiner et al. (2009a) summarized several such calls that predated the STEWARDS release.
Since development of the World Wide Web, there has been movement toward more open access to information. The Science Commons project (http://sciencecommons.org, accessed 25 June 2009) describes the evolution of a call for open access to information through declarations such as the Bethesda Statement on Open Access Publishing (https://legacy.earlham.edu/~peters/fos/bethesda.htm, accessed 25 June 2009), the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (http://oa.mpg.de/openaccess-berlin/berlindeclaration.html, accessed 25 June 2009), and the Budapest Open Access Initiative (http://www.soros.org/openaccess/, accessed 25 June 2009), with the latter two advocating open access to data and databases. Klump et al. (2006) discussed implications of open access information, including the need for incentives to authors and protection of the intellectual rights of the author while allowing use of data by the scientific community.
A number of journals now require source data for a journal article to be provided upon submission as a requirement for review and publication, and some journals provide means to make those data available as supplemental material with the article. Others make the storage and retrieval mechanisms available to the authors as an option. Others require that the data be permanently available to the public through persistent links included in the article.
Expectations of the scientific community regarding data in general extend beyond simple availability to the characteristics of the data and the amount and types of metadata that accompany them. In the overview and introduction to a special section describing one site's database, Sadler et al. (2015) reviewed the literature on these topics. Their summary of the state of the art for data follows:
Gray (2009) described aspects of data volume, type, documentation, access, and communication that enable data exploration needed for current science. Data must be self-described, meaning that metadata (methods, units, collection context, responsible agents, provenance, etc.) must be permanently connected to the data, including when delivered to end users.
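To make the self-description principle concrete, the following Python sketch bundles a value with its units, method, responsible agent, and provenance in one immutable object, so the context cannot be detached from the number as it is passed along. The class names, field names, and example values are hypothetical illustrations of Gray's principle, not the STEWARDS schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """Hypothetical method record: an identifier plus a free-text description."""
    method_id: str
    description: str


@dataclass(frozen=True)
class SelfDescribedValue:
    """One measurement that carries its own metadata (hypothetical fields)."""
    site_id: str
    variable: str
    value: float
    units: str
    method: Method
    responsible_agent: str
    provenance: tuple[str, ...] = ()  # processing steps, oldest first

    def describe(self) -> str:
        # Units and method are rendered together with the number, so the
        # collection context travels with the value in any text export.
        return (f"{self.site_id} {self.variable}={self.value} {self.units} "
                f"[{self.method.method_id}: {self.method.description}]")


# Illustrative values only; not taken from any STEWARDS table.
gauge = Method("M-001", "tipping-bucket rain gauge")
obs = SelfDescribedValue("SITE-A", "precipitation", 12.7, "mm", gauge,
                         "Example Research Unit", ("logger download", "QA/QC check"))
print(obs.describe())
```

Freezing the dataclass is one simple way to keep a value and its metadata from being edited apart once the record is created.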
QUANTITATIVE MEASURES
Documenting STEWARDS. Four measures specific to STEWARDS can be examined: citations of the four original papers describing the development, structure, and user interface of STEWARDS at release; downloads of data contained in STEWARDS; downloads of papers documenting data in STEWARDS; and citations of those papers. The first of these measures is not as strong as had been hoped (table 1).
Downloads of STEWARDS data are more impressive (table 2). Users register when signing in to STEWARDS, and downloads are logged. Analysis of these logs provides cumulative and annual counts of downloaded data records.
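The annual and cumulative tallies described here amount to a group-by-year sum followed by a running total. The Python sketch below assumes a simplified log of (year, watershed, records downloaded, bulk flag) tuples; that layout and the example numbers are hypothetical, not the actual STEWARDS log format. The optional exclusion mirrors how the table 2 counts leave out the one-time 2016 download of essentially the entire database.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical log entry: (year, watershed, records_downloaded, is_bulk).
LogEntry = tuple[int, str, int, bool]


def annual_and_cumulative(log: list[LogEntry], exclude_bulk: bool = True):
    """Return sorted years, records downloaded per year, and running totals."""
    per_year = Counter()
    for year, _watershed, n_records, is_bulk in log:
        if exclude_bulk and is_bulk:
            continue  # skip one-time bulk pulls such as a whole-database copy
        per_year[year] += n_records
    years = sorted(per_year)
    annual = [per_year[y] for y in years]
    cumulative = list(accumulate(annual))  # running sum across years
    return years, annual, cumulative


# Made-up example entries for illustration only.
log = [
    (2014, "Watershed A", 1_000, False),
    (2014, "Watershed B", 500, False),
    (2015, "Watershed A", 2_000, False),
    (2016, "Watershed B", 13_000_000, True),
    (2016, "Watershed A", 700, False),
]
print(annual_and_cumulative(log))
```

Grouping by watershed instead of year, with the same running sum, would give per-location cumulative curves of the kind shown in figure 1.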
The current content of STEWARDS is approximately 16 million data records. The structure of STEWARDS is a collection of flat data tables, each with multiple fields of data, so a record represents 1 to 22 measurements, with a mean of 10 measurements per record.
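As a rough worked conversion using only the figures above, the record count $N_{\text{rec}}$ can be scaled by the mean number of measurements per record $\bar{m}$ to estimate the number of measurements $N_{\text{meas}}$ held in STEWARDS; the per-record extremes of 1 and 22 give only theoretical outer bounds.

```latex
N_{\text{meas}} \approx N_{\text{rec}}\,\bar{m}
  = (16 \times 10^{6}) \times 10 \approx 1.6 \times 10^{8},
\qquad
1.6 \times 10^{7} \le N_{\text{meas}} \le (16 \times 10^{6}) \times 22 \approx 3.5 \times 10^{8}
```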
The trajectory of data downloads by watershed offers some indication of which issues of public concern might drive download numbers. As seen in figure 1, large volumes of data have been downloaded from the St. Joseph watershed, which is located in northeastern Indiana and is the only STEWARDS watershed that drains to Lake Erie. Although the reasons for the high volume cannot be known, it is reasonable to infer that water quality issues in western Lake Erie could be a cause. Roughly half of all STEWARDS record downloads are from the St. Joseph watershed, owing to download volume over the last two to three years.
Figure 1. Cumulative downloads of data contained in STEWARDS by location. Individual symbols represent individual downloads. Note that data holdings include additional watersheds near or nested around those described in the text.
Information about downloads of papers documenting data in STEWARDS is limited to articles published in journals that provide such statistics. Unfortunately for the purposes of this paper, the only relevant papers for which these statistics are available are the two most recently published collections: the Little Washita and Fort Cobb in Oklahoma (2014) and the Goodwater Creek and Salt River in Missouri (2015). The Oklahoma collection had 11 papers, of which only the introduction was made OA. That introduction had 1,388 downloads, somewhat higher than the journal's 2014 mean of 1,316 downloads for OA articles. The Missouri collection of 9 papers was made entirely OA, and its introduction has had 1,234 downloads, also somewhat higher than the journal's mean of 1,050 for OA articles published in 2015. The remaining papers in both collections had about 72% to 80% of the corresponding journal-year mean. The Oklahoma papers had a mean of 398 downloads (range 328 to 518), compared with a mean of 499 for 2014 articles without OA. The Missouri papers, excluding the introduction, ranged from 533 to 888 downloads, for a mean of 760 against a journal-year mean of 1,050 for OA articles. Downloads of the Missouri papers jumped initially and tapered off with time (figure 2): a high first-year count was followed by rates descending to means of ~50 to 100 annually (data not shown). The corresponding plot for the Oklahoma papers is much the same, but with a larger difference between download counts for the OA introduction and the non-OA papers (data not shown).
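The relative download figures in this paragraph are ratios of a collection mean to the corresponding journal-year mean. The short Python check below recomputes them from the means quoted in the text; the function name is illustrative.

```python
def relative_downloads(collection_mean: float, journal_mean: float) -> float:
    """Express a collection's mean downloads as a fraction of the journal-year mean."""
    if journal_mean <= 0:
        raise ValueError("journal mean must be positive")
    return collection_mean / journal_mean


# Means quoted in the text for papers other than the introductions.
oklahoma = relative_downloads(398, 499)    # 2014, compared with non-OA articles
missouri = relative_downloads(760, 1050)   # 2015, compared with OA articles

# Introductions, compared with OA means for their journal year.
ok_intro = relative_downloads(1388, 1316)
mo_intro = relative_downloads(1234, 1050)

print(f"Oklahoma: {oklahoma:.0%}, Missouri: {missouri:.0%}")
print(f"Introductions: {ok_intro:.0%} and {mo_intro:.0%}")
```

Because each ratio uses the journal mean for the same year and access type, it partly controls for differences in journal traffic between 2014 and 2015.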
Table 1. Citations of the initial papers documenting STEWARDS (data acquired April 2020).
Table 2. Downloads of data contained in STEWARDS by year of download during its second five years of existence. This excludes the 2016 download of essentially the entire database by a professor who apparently intended it for educational purposes. That event included approximately 13 million additional records, bringing the total to some 25 million.
Citation data for these collections can be obtained from both SCOPUS and Google Scholar for comparison with download data (table 3). However, given how recent these papers are, it is perhaps more informative to examine all collections of articles describing data in STEWARDS, which date variously from 1999 through 2011 and thus offer much more data to examine. Further, interpretation can be informed by including other collections of data papers that are not in STEWARDS but have public-facing websites for searching and downloading data. These include the Boise, Idaho, collection published in Water Resources Research in 2001 and the Tucson, Arizona, collection, also published in Water Resources Research in 2008. The collections differ in approach. The 1999 Ames, Iowa, collection predated the Journal of Environmental Quality's 2013 decision to allow data collections, so that series consisted entirely of research papers, with intensive description of the data later contributed to STEWARDS. The series from Boise, Idaho; Tifton, Georgia; and University Park, Pennsylvania, emphasized description of data, as allowed under Water Resources Research journal policy. The Oxford, Mississippi, series was published in an American Chemical Society book based on a symposium and included both research and data papers. The Tucson, Arizona, series was designed to have about equal numbers of data papers and research papers using the data. Early citation analysis of that series was promising and prompted the El Reno, Oklahoma, and Columbia, Missouri, units to adopt a similar mix of data and research papers. To the authors' knowledge, only the latter two series (2014 and 2015) used the OA options described above.
Figure 2. Cumulative downloads of papers describing data from the Columbia, Missouri, location that are contained in STEWARDS.
Table 3. Citation averages from SCOPUS and Google Scholar for collections of papers describing data contained in STEWARDS, with two collections describing non-STEWARDS data added for comparison. The authors were unable to identify the Oxford papers in SCOPUS but were able to do so in Google Scholar. JEQ = Journal of Environmental Quality. WRR = Water Resources Research. ACS = American Chemical Society (book).
It is immediately apparent that the papers in the Ames, Iowa, series are on average much more highly cited than the other papers in that journal that year. Inspection of citations of individual papers indicates that three exceeded 100 citations, and the others were only moderately above the journal mean for 1999 articles. Similarly, certain papers in other series far exceeded the mean for their collection, but apart from introductory or overview papers, the most-cited papers vary from series to series. In some cases, the precipitation data are the most widely cited, suggesting that more researchers may study rainfall than study other variables. In other cases, the stature of the research unit or of the individual researchers involved appeared important. Somewhat rare data, such as Boise's snowfall data, received somewhat more citations. Although it is still very early in the trajectory, OA appears to help.
OTHER INDICATIONS OF IMPACT
Additional Exposure of Agricultural Research Service Data. One potential indicator of the impact of STEWARDS is whether its visibility prompted additional public exposure of the data. One such instance has been identified to date. After STEWARDS was released, personnel from the US Geological Survey–US Environmental Protection Agency collaboration on water quality data approached ARS about including STEWARDS water quality data in their Water Quality Portal (WQP) (https://www.waterqualitydata.us/). Once technical details were established and the additional data and format requirements were met, ARS staff at the locations and those administering STEWARDS began contributing water quality data to WQP. Download counts from this portal are provided in table 4. Because WQP stores every measurement as a separate record, its counts are not strictly comparable to those from STEWARDS directly. Recall that STEWARDS records hold perhaps 5 to 10 measurements each, and STEWARDS data averaged 10 fields per record, so the table 4 total of some 500 million would represent some 50 million STEWARDS record downloads. Conservatively, this would be twice the STEWARDS download count, for the water quality data alone. Moreover, the WQP counts span less than four years, whereas the STEWARDS counts span at least two more. Clearly, exposure through WQP increased the impact of ARS water quality data. However, without the development and release of STEWARDS, contribution of ARS data to WQP, and thus its impact there, would likely not have occurred for some time.
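The record-count adjustment above can be sketched in Python. The first function illustrates, with a hypothetical water quality record whose field names are invented for this example, how one multi-field STEWARDS-style record becomes several one-measurement rows in a WQP-style layout. The second reverses that scaling using the text's round numbers; the comparison total of about 25 million STEWARDS records is taken from the table 2 note, assuming that is the intended comparison. Dividing by 10 rather than 5 measurements per record gives the smaller, conservative equivalent.

```python
def wide_to_long(record: dict[str, object],
                 id_fields: tuple[str, ...]) -> list[dict[str, object]]:
    """Split one multi-field record into one row per measured value."""
    ids = {k: record[k] for k in id_fields}
    return [
        {**ids, "characteristic": k, "value": v}
        for k, v in record.items()
        if k not in id_fields and v is not None  # skip identifiers and blanks
    ]


def stewards_equivalent(wqp_rows: float, measurements_per_record: float) -> float:
    """Convert per-measurement row counts back to multi-field record counts."""
    return wqp_rows / measurements_per_record


# Hypothetical water quality record; field names are illustrative only.
row = {"site": "SITE-A", "date": "2015-06-01", "nitrate_mg_L": 4.2,
       "phosphorus_mg_L": 0.11, "tss_mg_L": 35.0}
long_rows = wide_to_long(row, ("site", "date"))  # one row per measurement

WQP_ROWS = 500_000_000        # approximate table 4 total
STEWARDS_TOTAL = 25_000_000   # approximate STEWARDS total incl. 2016 bulk download
for m in (10, 5):             # 10 per record gives the conservative estimate
    eq = stewards_equivalent(WQP_ROWS, m)
    print(f"{m} per record -> {eq / 1e6:.0f} million records, "
          f"{eq / STEWARDS_TOTAL:.1f}x STEWARDS")
```

Under these assumptions the equivalent is about 50 million records (2.0 times the STEWARDS total) at 10 measurements per record and about 100 million (4.0 times) at 5.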
Scientific Recognition through Awards. Two indicators of such recognition exist. The scientific staff who led STEWARDS development received the Soil and Water Conservation Society's 2011 Conservation Research Award for the STEWARDS database. STEWARDS was also cited as one of many elements in the successful application for recognition by the American Association for the Advancement of Science (AAAS) as one of its “Exemplary Collaborative Case Studies,” presented on March 15, 2011, at the AAAS R&D roundtable. Individuals may have received other awards, but documentation is not available. Although details of performance appraisals are necessarily confidential, it is likely that all members of the development team were rewarded for their effort through annual appraisals. The scientific staff have listed the STEWARDS accomplishment in their promotion packages, and STEWARDS has been listed in multiple successful nominations for society fellow and other research awards.
Informing Other Databases. Staff involved in STEWARDS development have repeatedly been asked to brief ARS leadership on their experiences and observations during multiple data initiatives, including the ARS big data initiative that launched with a February 2013 workshop titled “Big Data and Computing: Building a Vision for ARS Information Management.” That workshop responded to the big data initiative recently announced by the White House Office of Science and Technology Policy. STEWARDS was also presented as a success story in “Expanding and Leveraging ARS Natural Resources Networks and Working Groups,” a white paper summarizing the findings of a June 2016 workshop on that topic. Elements of the STEWARDS structure were ported to the Greenhouse gas Reduction through Agricultural Carbon Enhancement network (GRACEnet) and Resilient Economic Agricultural Practices (REAP) databases. The STEWARDS site description and methods metadata structures have been proposed as a model for the many ARS databases that must include such information but lack accepted metadata standards for it. In STEWARDS, such metadata were the key factor enabling integration of comparable data obtained with slightly different methods necessitated by local conditions and resources.
Table 4. Download counts of records (rows) of STEWARDS (Agricultural Research Service) water quality data from the Water Quality Portal (WQP).
Informing Public Policy. Of all the measures and qualitative discussions considered here, impact on public policy may be the most difficult to document. For the most part, any knowledge is anecdotal; no audit trail from data to policy exists, and the supporting technical documents are not always readily available. The instances we know of usually come from data providers who were occasionally told their data were used, from members of science advisory panels present in discussions, or from legislative stakeholders involved in developing policy. Even then, it is not usually clear whether specific data were used, nor how important they were in the policy development process. For instance, Idaho ARS staff knew that the Upper Snake Rock watershed data in STEWARDS were used in the reevaluation of the Mid Snake/Upper Snake-Rock Subbasin total maximum daily load (TMDL), a document that appears to have quite limited availability (Tetra Tech 2014). We can infer that interest in the Indiana watershed data may have stemmed from recent water quality concerns in the western Lake Erie basin. Similarly, one would expect Gulf hypoxia to prompt interest in data from along the Mississippi River, and Chesapeake Bay water quality issues to prompt interest in data from the two contributing watersheds.
CONCLUSIONS AND RECOMMENDATIONS
From the above, it is clear that quantifying the impact of STEWARDS poses multiple challenges. Perhaps ironically, data about the access, use, and impact of data are usually lacking (although this has recently begun to improve). Recent trends in website analytics will help such efforts over the next decade or so, but the inferences that can currently be made are often quite qualitative and subject to many caveats that may prevent solid conclusions. That said, the following assertions appear reasonable. First, the scientific impact of contributing data is analogous (but not necessarily equal) to the impact of a scientific publication: contribution is the first step, but impact depends on further use of the data. Discoverability and accessibility of the data, and quality and availability of the associated metadata, are critical, and shortcomings in them remain something of a hindrance to raising impact. Discoverability of data appears to be fairly closely tied to the stature of the scientists collecting the data and conducting research with them. The prominence of resource issues of concern related to the data also seems quite important to the data's impact. Finally, every effort to raise visibility and reduce obstacles to data access would appear to help.
© 2020 by the Soil and Water Conservation Society