Introduction

Data on tree canopy and impervious cover provide important information on the extent and variation of these characteristics across a region. Measurements of tree canopy cover provide basic structural data used to model tree services, such as air pollution mitigation and carbon dioxide sequestration (Nowak and Crane 2002; Nowak and others 2006), while impervious surface data are important for assessing development impacts on urban temperatures, precipitation runoff, and water quality (Heisler and others 2007; Theobald and others 2009). Tree canopy and impervious cover data provide essential information related to natural resources and development planning and policies at the local to national scale.

The 2001 National Land Cover Database (NLCD) provides free, easily accessible, 30-m resolution percentage tree canopy and percentage impervious cover values for the conterminous United States created from a consistent peer-reviewed methodology (MRLC 2009). Several studies have used NLCD data for assessing the urban tree canopy cover (Bridges 2008; Nowak and Greenfield 2008), urban temperature modeling (Heisler and others 2007), estimates of canopy height (Walker and others 2007), distribution of constructed manmade surfaces (Elvidge and others 2007), non-point source nitrogen export into water systems (Shields and others 2008), and wildlife habitat distribution (Martinuzzi and others 2009). While a formal accuracy assessment of NLCD land cover estimates has been conducted (Wickham and others 2010), a formal accuracy assessment of NLCD tree canopy and impervious cover data has yet to be completed (Stehman and others 2008; US EPA 2010).

In 2007, the 2001 NLCD was made publicly available by the Multi-Resolution Land Characteristics Consortium (MRLC) (MRLC 2009; Homer and others 2007). The 2001 NLCD provides 30-m resolution classified land cover and percentage tree canopy and impervious cover estimates for the conterminous United States derived from circa 2001 Landsat 7 imagery. Twelve mapping teams employed by the MRLC used standardized data preparation, classification, and quality control to process the Landsat imagery within 65 distinct mapping zones (Huang and others 2001; Yang and others 2003; Homer and others 2004, 2007). Mapping zones were delimited to represent relative geographic homogeneity with consideration of economy (cost), physiography, land-cover distribution, spectral uniformity, and optimal edge-matching (Homer and Gallant 2001). High resolution tree canopy and impervious cover maps, derived from 1-m resolution digital orthoimagery quarter quadrangles, were used to develop unique algorithms for each mapping zone to estimate percentage tree canopy and impervious cover from raw Landsat 7 imagery (c. 2001). Each cover layer is accompanied by metadata documenting error estimates based on a cross-validation technique utilizing the algorithms and training data for each mapping zone (MRLC 2009; Homer and others 2007). According to these preliminary error estimates, the tree canopy cover values have an average error ranging from 6 to 17% and impervious cover has an average error ranging from 4 to 17% (MRLC 2009; Homer and others 2007).

With an early and limited release of the 2001 NLCD, Walton (2008) found potential underestimation of tree canopy cover in 36 cities and villages in NLCD mapping zone 63 (western New York State). A later study was developed to compare 2001 NLCD cover estimates with photo-interpreted estimates of Google Earth imagery from randomly sampled and geographically dispersed Census-designated places (e.g., cities, villages; hereafter referred to as places) and counties in the United States (Greenfield and others 2009). Results of this comparison revealed that 2001 NLCD underestimates tree canopy cover by an average of 9.7% and underestimates impervious cover by an average of 5.7% within places and 1.3% in counties. The underestimate appeared to be consistent across the country with no statistical differences among physiographic regions. However, there were statistical differences in the degree of underestimation of tree canopy cover among mapping zones and of impervious cover by population density class. The study reported here continues this work by expanding the analysis to the entire conterminous United States to further explore the differences between NLCD-derived and photo-interpreted percentage tree canopy and impervious cover among all 65 mapping zones.

Google Earth imagery is used as a reference data source for tree canopy and impervious cover estimates because of its national aerial imagery coverage. Google Earth imagery has been used to augment existing geographic data and when other data sources specific to a particular application are incomplete, inconsistent, or nonexistent. For example, Google Earth has been used to evaluate the spatial distribution of insurance risk and natural disaster mapping and crisis management (Slingsby and others 2008; Nourbakhsh and others 2006), as reference data to validate land cover maps (Cha and Park 2007), to enable the use of volunteered geographic information to post, reference and verify geographic data (Goodchild 2007; Wood and others 2007), for NLCD land cover accuracy assessments when other media were unavailable (Wickham and others 2010), and to make applications of geographic visualization and decision-making support available to the public (Sieber 2006; Butler 2006; Goodchild 2007; Sheppard and Cizek 2009).

Stehman and others (2008) designed a formal accuracy assessment of the 2001 NLCD, which includes recommended evaluation protocols to meet six defined objectives. However, only the first objective, which assesses the per-class thematic accuracy of the classified land cover, has been completed (Wickham and others 2010). The protocol set out by Stehman and others (2008) establishes a pixel-by-pixel assessment of the NLCD percentage tree canopy and impervious data that meets several MRLC objectives. Results reported here differ in that the analysis was not designed to be a pixel-by-pixel accuracy assessment; rather, it was designed to test differences between NLCD-derived and photo-interpreted estimates of overall percentage tree canopy and impervious cover for each of the 65 mapping zones. This assessment was conducted to provide a better understanding of the potential limitations of NLCD tree canopy and impervious cover estimates for each mapping zone.

Methods

The comparison between NLCD-derived and photo-interpreted tree canopy and impervious cover percentages was conducted within the boundaries of the 65 NLCD mapping zones (MRLC 2009). The NLCD 2001 percentages for tree canopy and impervious cover for each zone were derived from zone boundary maps registered with the NLCD 2001 layers in a U.S. Geological Survey USA Contiguous Albers Equal Area Conic projected coordinate system. The NLCD percentage tree canopy and impervious cover for the entire mapping zone polygon was extracted using GIS software (zonal statistics). Overall percentage cover in each zone was calculated as the total NLCD cover in the zone divided by the total area of the zone.
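The zone-level calculation can be sketched as follows. This is a minimal illustration, not the GIS workflow used in the study: it assumes the NLCD percentage layer and a zone raster have already been read into equal-sized grids of pixels, and the function name `zone_percent_cover` and the toy grids are hypothetical. Because every pixel has the same area in an equal-area projection, total cover divided by total area reduces to the mean of the pixel percentages within the zone.

```python
def zone_percent_cover(percent_grid, zone_grid, zone_id):
    """Overall percentage cover in one zone: total cover / total zone area.

    percent_grid: rows of per-pixel cover percentages (0-100).
    zone_grid: rows of zone identifiers, same shape as percent_grid.
    All pixels are assumed to have equal area (equal-area projection).
    """
    total_cover = 0.0   # summed cover, in units of "percent of one pixel"
    pixel_count = 0     # number of pixels that belong to the zone
    for pct_row, zone_row in zip(percent_grid, zone_grid):
        for pct, zone in zip(pct_row, zone_row):
            if zone == zone_id:
                total_cover += pct
                pixel_count += 1
    if pixel_count == 0:
        raise ValueError("zone has no pixels")
    return total_cover / pixel_count


# Toy 2 x 3 grids (hypothetical values, not NLCD data).
canopy = [[0, 50, 100],
          [20, 30, 40]]
zones = [[1, 1, 2],
         [1, 2, 2]]
zone1 = zone_percent_cover(canopy, zones, 1)
```

In this toy case zone 1 contains three pixels with 0, 50, and 20% canopy, so its overall cover is their mean.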

These same mapping zone boundaries were used to randomly draw a sample of 1,000 points within each zone. These points then were converted and transformed into a Google Earth compatible format (Google Inc. 2007) for photo-interpretation. Each random point was interpreted as to its cover type to statistically estimate the percentage tree canopy and impervious cover within each mapping zone.
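One way to draw such a sample is rejection sampling within the zone's bounding box, sketched below. This is an assumption about how the draw could be done, not a description of the software used in the study. The zone is treated as a simple polygon in projected (planar) coordinates; reprojection to latitude and longitude and export to a Google Earth format are omitted. The names `point_in_polygon` and `sample_points` and the L-shaped test zone are hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices of a simple, non-self-intersecting ring.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_points(polygon, count, seed=None):
    """Draw `count` uniform random points inside the polygon by rejection sampling."""
    rng = random.Random(seed)
    xs = [vx for vx, _ in polygon]
    ys = [vy for _, vy in polygon]
    points = []
    while len(points) < count:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


# Hypothetical L-shaped zone in projected metres (not a real NLCD mapping zone).
zone = [(0, 0), (4000, 0), (4000, 2000), (2000, 2000), (2000, 4000), (0, 4000)]
points = sample_points(zone, 1000, seed=42)
```

Sampling in an equal-area projection keeps the point density uniform per unit of ground area, which is what a cover proportion estimated from point hits relies on.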

Despite its widespread and growing use, past editions of Google Earth and its content have been known to have issues regarding unknown dates of imagery (dates of imagery currently are provided) and erroneous content (Goodchild 2007; Potere 2008; Sheppard and Cizek 2009). Potere (2008) specifically found that the horizontal positional accuracy of Google Earth imagery for several developed countries, including the United States, had a root mean square error of 22.6 m and a mean error of 19 m. However, positional accuracy will have a negligible effect on the results of this study because the cover estimates are based on random samples within large geographic areas (mapping zones). Sample points that are displaced from their intended coordinates will still produce a valid random sample of points within the mapping zone area for the cover analysis. Inaccurate horizontal positions would only affect the sample for points near the boundary of the mapping zones, as some points may actually represent areas outside of the mapping zone. Given the large zone area relative to the mapping zone boundary, the potential number or effect of points interpreted outside the mapping zone is negligible.

Other aerial data sources could be compared with NLCD-derived values (e.g., digital orthoimagery quarter quadrangles); however, Google Earth imagery provides one of the best means to assess overall tree canopy and impervious cover because it offers nearly complete coverage of the conterminous United States with interpretable images.

Trained photo-interpreters with experience interpreting leaf-off and leaf-on imagery classified each point as trees (yes/no), impervious surface (yes/no), or as a non-interpretable image. As reflected in the 2001 NLCD, tree canopy and impervious cover designations are not mutually exclusive (e.g., tree cover over sidewalk or road), and the photo interpreters were instructed to determine if the tree canopy covered an impervious surface, in which case it was classified as both tree and impervious. Most points (99.6%) fell on images that were readily interpretable (high-resolution imagery). Points falling on imagery with medium to coarse resolution (e.g., 30-m resolution) or with atmospheric obstructions (clouds) were considered non-interpretable and not included in the final analysis. Overall, 63 of the 65 mapping zones had at least 99% interpretable points. The lowest percent of interpretable points was 93% in zone 9.

Four photo-interpreters were used, with each mapping zone being assessed by one photo-interpreter. Photo-interpretation results were verified by having 100 points within each zone reinterpreted by another photo-interpreter. Some disagreements with the audit values were due to changes in Google imagery between the original interpretation and the audit. Zones with less than 90% agreement were reinterpreted and rechecked until at least 95% agreement was attained. Overall, the audit control checks resulted in a 95% average agreement between the original interpretation and the audit values.
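The audit check amounts to a simple percent-agreement calculation, sketched here under stated assumptions. Each point label is represented as a hypothetical `(tree, impervious)` pair, a point counts as agreeing only when both parts match, and the function names and example labels are illustrative rather than taken from the study.

```python
def percent_agreement(original, audit):
    """Percentage of audited points whose labels match the original interpretation."""
    if len(original) != len(audit) or not original:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(1 for a, b in zip(original, audit) if a == b)
    return 100.0 * matches / len(original)


def needs_reinterpretation(agreement, threshold=90.0):
    """True when agreement falls below the threshold that triggered a recheck."""
    return agreement < threshold


# Hypothetical audit of 100 points; each label is (tree, impervious).
original = [(True, False)] * 60 + [(False, True)] * 40
audit = [(True, False)] * 55 + [(False, False)] * 5 + [(False, True)] * 40
agreement = percent_agreement(original, audit)
```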

To help understand how differences within zones might vary by land-cover class, interpreted points in each zone were stratified into four groups based on general NLCD land-cover classes (general LC class): (1) Trees/shrubs (NLCD classes: deciduous forest, evergreen forest, mixed forest, scrub/shrub, and woody wetland); (2) Agriculture/grassland (classes: grassland/herbaceous, pasture/hay, and cultivated crops); (3) Developed (classes: developed open space, low intensity, medium intensity, and high intensity); and (4) Other (classes: barren land and emergent herbaceous wetland) (MRLC 2010). General NLCD classes with small areas within a zone would have a relatively small sample size. For general classes with a sample size of less than 20 interpretable points, additional random points were interpreted to ensure a minimum sample size of 20.

Within each general LC class in each zone, the percentage of tree canopy or impervious cover (p) was calculated as the number of sample points (x) hitting the cover attribute divided by the total number of interpretable sample points (n) within the general LC class (p = x/n). The standard error of the estimate (SE) was calculated as \( SE = \sqrt{\frac{p(1 - p)}{n}} \) (Lindgren and McElrath 1969). This method has been used to assess canopy cover in many cities (e.g., Nowak and others 1996). Total cover and SE for each mapping zone were calculated by weighting the general LC class cover estimates by NLCD general LC class area in each zone. The 95 and 99% confidence intervals of the photo-interpreted cover values were used to test for differences between the photo-interpreted and NLCD-predicted cover values for each zone. That is, the NLCD estimate was determined to be significantly different from a photo-interpreted value at the 95% level if the NLCD value was outside the 95% confidence interval bounds of the interpreted value, and likewise for the 99% level.
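The estimation and test described above can be sketched as follows. Several parts are assumptions rather than details given in the text: the zone-level SE is combined with the usual stratified-sampling formula (square root of the sum of squared area-weighted class SEs), the confidence intervals use normal-approximation multipliers (1.96 and 2.576), and all function names and input values are hypothetical.

```python
import math


def class_estimate(hits, n):
    """Proportion p = x/n and its standard error sqrt(p(1 - p)/n) for one class."""
    p = hits / n
    return p, math.sqrt(p * (1.0 - p) / n)


def zone_estimate(classes):
    """Area-weighted proportion and SE for a zone.

    classes: list of (hits, n, area) tuples, one per general land-cover class.
    Assumption: class SEs are combined as sqrt(sum((w_i * SE_i)^2)),
    the standard stratified-sampling formula.
    """
    total_area = sum(area for _, _, area in classes)
    p_zone = 0.0
    var_zone = 0.0
    for hits, n, area in classes:
        w = area / total_area          # class share of the zone area
        p, se = class_estimate(hits, n)
        p_zone += w * p
        var_zone += (w * se) ** 2
    return p_zone, math.sqrt(var_zone)


def differs(nlcd_p, pi_p, pi_se, z=1.96):
    """True if the NLCD value lies outside the interval pi_p +/- z * pi_se."""
    return abs(nlcd_p - pi_p) > z * pi_se


# Hypothetical zone: (points on tree canopy, interpretable points, class area in km^2).
classes = [(300, 400, 5000.0), (40, 400, 4000.0), (30, 150, 800.0), (5, 50, 200.0)]
p_zone, se_zone = zone_estimate(classes)
significant_95 = differs(0.35, p_zone, se_zone, z=1.96)
significant_99 = differs(0.35, p_zone, se_zone, z=2.576)
```

With these made-up inputs the photo-interpreted zone estimate is 43.3% canopy, and a hypothetical NLCD value of 35% falls outside both intervals.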

In some cases, the number of points falling on tree canopy or impervious cover within an NLCD general LC class would be zero, and thus the standard error and confidence interval estimate of the percentage cover would also be zero. In this case, any non-zero NLCD cover estimate would be considered significantly different from the interpreted values, no matter how small the difference. To avoid these minor differences being considered significantly different from cover values of zero, the standard error of the zero cover estimates was calculated as if one sample point had fallen on the cover attribute (i.e., x = 1 instead of x = 0). These adjustments for the test of significance are noted in the appropriate tables (Tables 1, 2). Spearman correlations were also used to determine if differences between photo-interpretation and NLCD values were correlated with the amount of photo-interpreted cover.
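A minimal sketch of this zero-hit adjustment is shown below. It assumes that only the standard error is adjusted and that the cover estimate itself stays at zero; the function name `adjusted_se` and the sample size of 20 are illustrative.

```python
import math


def adjusted_se(hits, n):
    """SE of a proportion, substituting x = 1 when no sample point hit the cover."""
    x_for_se = 1 if hits == 0 else hits
    p_for_se = x_for_se / n
    return math.sqrt(p_for_se * (1.0 - p_for_se) / n)


# Hypothetical class with 20 interpretable points and no hits on impervious cover.
p = 0 / 20                                   # the cover estimate stays at zero
se_unadjusted = math.sqrt(p * (1 - p) / 20)  # zero, so any NLCD value would "differ"
se_adjusted = adjusted_se(0, 20)             # non-zero interval width
```

The adjusted SE gives the zero estimate a non-degenerate confidence interval, so a very small non-zero NLCD value is no longer flagged automatically.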

Table 1 Difference between photo-interpreted and NLCD 2001 derived tree canopy cover values by generalized NLCD land cover classes within each mapping zone
Table 2 Difference between photo-interpreted and NLCD 2001 derived impervious cover values by generalized NLCD land cover classes within each mapping zone

Results

Comparisons of photo-interpreted and NLCD-derived values reveal that NLCD underestimates tree canopy cover by a national average of 9.7% (standard error [SE] = 1.0%) and underestimates impervious cover by 1.4% (SE = 0.4%). Results varied by mapping zone with a maximum underestimation of tree canopy cover of 28.4% (zone 3) and a maximum underestimation of impervious cover by 5.7% (zone 56) (Tables 3, 4; Figs. 1, 2). Overall, NLCD significantly underestimated tree canopy cover in 64 of the 65 zones (98%) and impervious cover in 44 zones (68%) compared to photo-interpreted cover values.

Table 3 Difference between photo-interpreted and NLCD 2001 derived tree canopy cover values by mapping zone
Table 4 Difference between photo-interpreted and NLCD 2001 derived impervious cover values by mapping zone
Fig. 1 Differences in tree canopy cover estimates between photo-interpreted (PI) values and NLCD 2001 by mapping zone (PI minus NLCD value). Differences of 0% indicate no statistical difference

Fig. 2 Differences in impervious cover estimates between photo-interpreted (PI) values and NLCD 2001 by mapping zone (PI minus NLCD value). Differences of 0% indicate no statistical difference

Based on photo-interpretation, tree canopy cover varied by mapping zone from a low of 1.6% in zone 33 to a high of 84.7% in zone 66 (Table 3). Impervious cover varied by mapping zone from a low of 0.1% in zone 21 to a high of 11.3% in zone 65 (Table 4).

Within developed land, NLCD significantly underestimated tree canopy cover in 31 mapping zones, overestimated tree cover in one zone, and had an overall underestimation of 13.7% (SE = 4.6%). NLCD estimates in developed land also significantly underestimated impervious cover in 14 mapping zones, overestimated impervious cover in three zones, and had an overall impervious cover underestimation of 5.2% (SE = 4.8%) (Tables 1, 2).

Within forest lands, NLCD significantly underestimated tree canopy cover in 57 mapping zones with an overall underestimation of 11.7% (SE = 1.4%). It also significantly underestimated impervious cover in 30 mapping zones with overall impervious cover underestimation of 0.9% (SE = 0.4%) (Tables 1, 2).

Within agricultural and grass lands, NLCD significantly underestimated tree canopy cover in 55 mapping zones with an overall underestimation of 6.7% (SE = 1.0%). It also significantly underestimated impervious cover in 38 mapping zones with overall impervious cover underestimation of 1.5% (SE = 0.5%) (Tables 1, 2).

Within other lands, NLCD significantly underestimated tree canopy cover in 19 mapping zones with an overall underestimation of 8.0% (SE = 3.6%). It did not significantly underestimate impervious cover in any mapping zones; overall impervious cover in other lands was underestimated by 1.3% (SE = 1.4%) (Tables 1, 2).

Differences between photo-interpreted and NLCD tree canopy cover were significantly correlated with the amount of photo-interpreted tree cover (Spearman correlation coefficient (rs) = 0.70). Differences between photo-interpreted and NLCD impervious cover were also significantly correlated with the amount of photo-interpreted impervious cover (rs = 0.89). Thus, differences between photo-interpretation and NLCD cover values tended to increase with increased amounts of tree canopy or impervious cover.
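For reference, a Spearman coefficient of this kind can be computed as the Pearson correlation of the two rank vectors, with tied values sharing an average rank. The sketch below is a generic implementation under that definition, not the statistical software used in the study, and the five zone values are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical zones: photo-interpreted cover (%) and PI minus NLCD difference (%).
pi_cover = [5.0, 20.0, 35.0, 50.0, 80.0]
difference = [1.0, 6.0, 4.0, 12.0, 20.0]
rs = spearman(pi_cover, difference)
```

A positive coefficient indicates that zones with more photo-interpreted cover tend to show larger differences from NLCD.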

Discussion

Differences between cover estimates generated by computer-classified NLCD and photo-interpreted images are not surprising given the methodological differences (Dougherty and others 2004) and the reported accuracy of the NLCD (MRLC 2009). However, the overall and variable underestimation of tree canopy cover by NLCD relative to photo-interpretation is important to understand because of the increasing use of NLCD products in environmental management and planning applications (e.g., use of NLCD canopy cover in evaluating habitat distribution and conservation (Martinuzzi and others 2009) or hydrologic modeling and monitoring using estimates of impervious surfaces (Journal of Hydrologic Engineering 2009)).

The average differences found in this analysis of all mapping zones were similar to differences projected from the sampling of geographically dispersed areas within varying population density classes in a preliminary analysis (Greenfield and others 2009). The preliminary analysis estimated an average NLCD underestimation in tree canopy cover of 9.7%, which is the same overall difference exhibited by the analysis of all 65 mapping zones. The preliminary analysis found an average underestimation in impervious cover of 5.7% within places (e.g., cities, villages) and 1.3% in counties. The analysis of all 65 mapping zones found an average difference of 1.4%, which is similar to the difference from the county analysis. The preliminary county estimates are more representative of the entire mapping zones because of the lower density of populations and impervious surfaces in the counties compared to places. The preliminary study underestimation of impervious surfaces in places (5.7%) is comparable to the underestimation exhibited by developed lands (5.4%), as places contain significant amounts of developed land.

In comparing NLCD tree canopy cover data with various other estimates of tree canopy cover in Syracuse, NY, NLCD produced the lowest tree cover estimate (12.7%), which was much lower than the other canopy estimates that ranged between 21.4 and 26.6% tree cover (Walton and others 2008).

Contrasting NLCD impervious and canopy estimates with cover estimates derived from higher resolution imagery within a sampled sub-watershed in urban and suburban Baltimore, MD, revealed that NLCD-derived tree canopy and impervious cover estimates were 10 and 7%, respectively, below the higher resolution estimates (Smith and others 2010). This difference was attributed to fine-scale variations in canopy (small patches of trees) and impervious cover (smaller buildings and noncontiguous pavement) that were not detected by the NLCD method. Smoothing of fine-scale variation within coarser resolution datasets has been noted in many other studies (Wickham and others 2010; Maxwell and others 2008).

The limitations of photo-interpretation methods can contribute to the differences found between NLCD and photo-interpreted estimates of tree canopy and impervious cover. One limitation is the date of photo-interpreted imagery. NLCD cover maps were based on circa 2001 imagery, while Google Earth imagery tended to be from the mid 2000s. When image interpretation began, dates of Google Earth imagery were not available. When dates became available, imagery dates ranged from the early to late 2000s, with the plurality of image dates tending to be around 2005–2006. Thus, the Google Earth images are from dates subsequent to NLCD images, and with varying temporal differences. Later imagery dates would tend to lead to increased impervious cover due to urban development, which would tend to enhance underestimation by NLCD estimates.

The same development factors that increase impervious cover would potentially decrease tree canopy cover over time as trees may be cleared to make space for impervious surfaces. However, increases in tree canopy cover could also be occurring over time through tree planting or natural regeneration. In the northeastern United States, forest land has increased by nearly 7% since 1953, mainly due to agricultural lands that have reverted to forests (Smith and others 2009). Thus, in some areas, underestimation of NLCD tree canopy cover may be exacerbated by tree growth between the dates of imagery; in other areas the underestimation may be reduced. The overall influence of potential changes in tree canopy cover due to the differing imagery dates is unknown, but believed to be minimal. If differing dates of imagery have a significant effect, this would indicate significant landscape change within the 2000s, which would signify that the NLCD cover maps are currently obsolete due to rapid landscape change. Wickham and others (2010) found that time lags between reference and map image acquisition dates have little effect on agreement and note that land-cover change is rare. Landscape change is not likely the dominant factor in the differences exhibited between NLCD and photo-interpretation estimates. It is more likely that the NLCD cover maps are underestimating tree canopy cover and, to a lesser extent, impervious cover.

Another photo-interpretation limitation may be photo-interpreter error. As various quality control tests were conducted based on paired comparisons of image interpretation among photo-interpreters, the photo-interpretation error is likely minimal. The horizontal positional accuracy of Google Earth imagery is also not a concern because the mapping zones are large. Google Earth was only used to estimate the overall proportion of cover in each mapping zone, not the accuracy of individual pixels.

For tree canopy cover, the greatest difference between photo-interpretation and NLCD values tended to be in the zones with the greatest tree cover (Figs. 1 and 3). The underestimation of tree cover was greatest in developed land (13.7%), followed by forest (11.7%), other (8.0%) and agriculture and grass lands (6.7%). One potential reason for the underestimation in tree canopy cover may be related to the tree cover structure and distribution. As tree canopy cover tends to become less contiguous (more individual tree crowns) due to development patterns, the NLCD may be underestimating these smaller clumps of canopy cover. This pattern may partly explain why developed lands have greater overall underestimation of tree canopy cover than forest lands. The potential for greater underestimation would tend to increase with greater tree cover and this underestimation may be exacerbated as the canopy cover becomes more fragmented or unevenly distributed.

Fig. 3 Tree canopy cover by mapping zone based on aerial photo-interpretation

As the overall difference between photo-interpretation and NLCD impervious cover is relatively small (1.4%) and impervious cover has likely increased between the NLCD and Google Earth images, the NLCD impervious cover estimates at the zone scale are reasonable. However, the underestimation of impervious cover tends to be greatest in the eastern United States (Fig. 2), which is the most urbanized portion of the country (Nowak and others 2005) and likely has had some of the greatest urban development in the 2000s (Nowak and Walton 2005). These results are comparable to an impervious surface area assessment of the mid-Atlantic region, which showed that NLCD underestimated impervious cover by approximately 5%, with the underestimation occurring regardless of development intensity (Jones and Jarnagin 2009). The underestimation for the mid-Atlantic region (zone 60) in this study was 3.3%. Geographic differences in accuracy have also been shown for NLCD land-cover classes (e.g., shrubland, grassland, deciduous forests) (Wickham and others 2010).

Other factors that may affect the impervious and canopy underestimation are varying geographies (e.g., topography, vegetation types), different processing or training methods, and varied interpretations and applications of the NLCD protocols used among the 12 different teams within their assigned mapping zones. One example of an interpretation that may result in the varied underestimation of surface cover is the selection of training sites and data used in developing the sub-pixel classification and algorithm used to process the Landsat imagery. The image processing and classification derived from rural training sites may vary considerably from training data obtained from urban areas, or from more homogeneous land cover to more heterogeneous land cover sites (Greenfield and others 2009; Walton and others 2008). Underestimation of tree and impervious cover may also be partly due to masking during map development, including the varied selection of ancillary data used for masking (Homer and others 2007; Homer and others 2004; Huang and others 2001; MRLC 2009; Yang and others 2003). While masking was generally used to decrease overestimation resulting from the NLCD regression application, it may have overcompensated and produced this underestimation.

While the NLCD tree canopy cover and impervious surface data are a free and easily accessible data set created with consistent methodology that may be used effectively in comparisons across the United States, users of the NLCD tree canopy cover maps should be aware of the overall and variable underestimation of tree canopy and impervious cover. Utilizing NLCD cover estimates for secondary analysis (e.g., tree biomass, rainfall interception) can lead to regional to national underestimation of these cover-dependent secondary estimates. Future research should investigate the apparent widespread underestimation to help improve national cover mapping. A formal accuracy assessment would help in this regard (e.g., Stehman and others 2008; Stehman and Czaplewski 1998, 2003).

Conclusion

NLCD 2001 cover estimates appear to be underestimating tree canopy and impervious cover across the conterminous United States to varying degrees. The absolute underestimation of tree canopy cover (9.7%) is much higher than that exhibited for impervious cover (1.4%). These results indicate that underestimation of tree canopy and impervious cover was related to the amount of tree and impervious cover, with developed lands exhibiting the greatest underestimation of both tree canopy and impervious cover. A better understanding of the differences between NLCD and photo-interpreted cover values can be used to produce more accurate cover maps across the United States.