An information-fusion method to identify pattern of spatial heterogeneity for improving the accuracy of estimation

Li, Lianfa; Wang, Jinfeng; Cao, Zhidong; Zhong, Ershun

doi:10.1007/s00477-007-0179-1

Original Paper
Published: 25 September 2007

Stochastic Environmental Research and Risk Assessment
Volume 22, pages 689–704 (2008)

Abstract

While spatial autocorrelation is used in spatial sampling surveys to improve the precision of the estimate of a population feature over areal units, spatial heterogeneity, used as the stratification frame of a survey, also often has a considerable effect on precision. In the context of increasingly rich spatiotemporal data, this paper proposes an information-fusion method to identify the pattern of spatial heterogeneity, which can then serve as an informative stratification for improving estimation accuracy. Data mining is the major analytical component of the method: multivariate statistics, association analysis, decision trees and rough sets are used for data filtering, identification of contributing factors, and examination of relationships; classification and clustering are used to identify the pattern of spatial heterogeneity from auxiliary variables relevant to the goal variable, and thus to stratify the samples. The methods are illustrated and examined in a case study of the cultivable land survey in Shandong Province, China. Unlike many stratification schemes, which use only the goal variable to stratify and are therefore oversimplified, the method fuses information from multiple sources to identify the pattern of spatial heterogeneity, stratifies samples at geographical units as an informative polygon map, and thereby increases the precision of estimates in the sampling survey, as demonstrated in the case study.

References

Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD conference on management of data, WA, USA
Alexander H, Daniel AK (1998) An efficient approach to clustering in large multimedia databases with noise. In: Proceedings of the 4th international conference on knowledge discovery and data mining, AAAI Press
Aleksander Ø (1999) Discernibility and rough sets in medicine: tools and applications (Dissertation). Norwegian University of Science and Technology, Norway
Anselin L (1992) Spacestat: a program for statistical analysis of spatial data, NCGIA, Santa Barbara
Bonham-Carter FG (1994) Geographic information systems for geoscientists: modelling with GIS. Pergamon, Ottawa
Bergen KM, Brown DG, Rutherford JF, Gustafson EJ (2005) Change detection with heterogeneous data using ecoregional stratification, statistical summaries and a land allocation algorithm. Remote Sens Environ 97:434–446
Cochran WG (1977) Sampling techniques. Wiley, New York
Cressie N (1991) Statistics for spatial data. Wiley, New York
Ertoz L, Steinbach M, Kumar V (2003) Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of third SIAM international conference on data mining, San Francisco, CA, USA
ESRI (2004) ArcGIS desktop help. ESRI, Redlands
Gallego FJ (2005) Stratified sampling of satellite images with a systematic grid of points. ISPRS J Photogramm Remote Sens 59:369–376
Gediga G, Duntsch I (2000) Statistical techniques for rough set data analysis. In: Polkowski L, Tsumoto S, Lin T (eds) Rough set methods and applications. Physica Verlag, Heidelberg, pp 545–565
Giarratano J, Riley G (1998) Expert systems: principles and programming. PWS Publishing Company, Boston
Goovaerts P, Jacquez GM, Marcus WA (2005) Geostatistical and local cluster analysis of high resolution hyperspectral imagery for detection of anomalies. Remote Sens Environ 95:351–367
Haining R (2003) Spatial data analysis: theory and practice. Cambridge University Press, Cambridge
Komorowski J, Pawlak Z, Polkowski L, Skowron A (1999) Rough sets: a tutorial. In: Rough fuzzy hybridization. Springer, Heidelberg
Kong W, Ou M (2006) Research on the change of the cultivated land’s areas and its driving factors of Shandong Province (in Chinese). Agric Econ 28:74–76
Lawrence R, Wright A (2001) Rule-based classification systems using classification and regression tree (CART) analysis. Photogramm Eng Remote Sens 67:1137–1142
Lawrence R, Bunn A, Powell S, Zambon M (2004) Classification of remotely sensed imagery using stochastic gradient boosting as a refinement of classification tree analysis. Remote Sens Environ 90:331–336
Li D, Wang S, Li D, Wang X (2002) Theories and technologies of spatial data mining and knowledge discovery (in Chinese). Geomat Inf Sci Wuhan Univ 27(3):221–233
Li L, Wang J, Liu J (2005) Optimal decision-making model of spatial sampling for survey of China’s land with remotely sensed data. Sci China Ser D 48(6):752–764
Liu J, Zheng X (2004) Correlate analysis between the dynamic changes of cultivated lands and grain total yield (in Chinese). Areal Res Dev 12(6):102–105
Liu M, Zhuang D, Hu W (2001) On current cultivated land change based on geomorphology and spatial differentiation characteristics (in Chinese). Resour Sci 23(5):11–16
McRoberts RE, Holden GR, Nelson MD, Liknes GC, Gormanson DD (2006) Using satellite imagery as ancillary data for increasing the precision of estimates for the Forest Inventory and Analysis program of USDA Forest Service. Can J For Res 36:2968–2980
Michalski R, Bratko I, Kubat M (1998) Machine learning and data mining: methods and applications. Wiley, London
Miller HJ, Han J (2001) Geographic data mining and knowledge discovery. Taylor & Francis, New York
Mitchell TM (1997) Machine learning. McGraw-Hill, New York
Mjolsness E, DeCoste D (2001) Machine learning for science: state of the art and future prospects. Science 293(5537):2051–2055
Pal SK, Ghosh A, Shankar BU (2000) Segmentation of remotely sensed images with fuzzy thresholding and quantitative evaluation. Int J Remote Sens 21(11):2269–2300
Rijsbergen CV (1979) Information retrieval, 2nd edn. Butterworths, London
Ripley BD (1981) Spatial statistics. Wiley, New York
Rodriguez-Iturbe I, Mejia JM (1974) The design of rainfall networks in time and space. Water Resour Res 10:713–728
Steinbach M, Klooster S, Potter C (2003) Discovery of climate indices using clustering, KDD 2003 Washington, DC, http://www-users.cs.umn.edu/~kumar/papers/kdd03_nasa.pdf
Tan PN, Steinbach M, Vipin K (2006) Introduction to data mining. Pearson Education, Inc., New York
Tobler W (1979) In: Gale, Olsson (eds) Cellular geography, philosophy in geography. Reidel, Dordrecht
Wang G (2001) Theory and knowledge acquirement of rough set (in Chinese). Press of Xian Transportation University, Xian
Wang J, Liu J, Zhuang D, Li L, Ge Y (2002) Spatial sampling design for monitoring the area of cultivated land. Int J Remote Sens 23(2):263–284
Witten IH, Frank E (2000) Data mining, practical machine learning tools and techniques with JAVA implementations. Elsevier, Singapore
Zhang Y, Hang Y, Chen H, Xue F, Wang J, Sun G (2001) The impact of dimensions of sampling geographic cells of statistical indicators on the distribution of disease (in Chinese). Literature and Information of Preventative Medicine 7(6):613–615
Zhang P, Steinbach M, Kumar V, Shekhar S, Tan PN, Klooster S, Pot C (2005) Discovery of patterns in earth science data using data mining. In: New generation of data mining applications, vol 4. Wiley, New York
Zeng Z (2004) Research on computer classification of satellite images and application in geoscience (in Chinese). Science Press, Beijing

Acknowledgments

This research was supported by grants 40601077/D0120 and 40471111/D0120 from the Natural Science Foundation of China, and by grant 2007AA12Z233 from the Hi-tech Research and Development Program of China (863).

Author information

Authors and Affiliations

Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, No. (A) 11, Rd. Datun, Anwai, Beijing, 100101, People’s Republic of China
Lianfa Li, Jinfeng Wang, Zhidong Cao & Ershun Zhong

Corresponding author

Correspondence to Lianfa Li.

Appendices

Appendix 1: The equation of k-nearest neighbor for a nominal variable

$$ \hat{f}(x_i) \leftarrow \mathop{\arg\max}\limits_{v \in V} \sum_{j=1}^{k} \delta\bigl(v, f(s_j \mid s_j \in N(x_i))\bigr) $$

(5)

where $\hat{f}(x_i)$ is the estimate for the grid unit $x_i$, $V$ is the finite set $\{v_1, \ldots, v_s\}$ of values of the band variable, $k$ is the number of nearest neighbors, $f(s_j \mid s_j \in N(x_i))$ is the categorical (discrete) value of the $j$th nearest neighbor unit $s_j$, which belongs to the neighborhood $N(x_i)$ of $x_i$, and $\delta(a,b) = 1$ if $a = b$ and $\delta(a,b) = 0$ otherwise.
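The majority vote in Eq. (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `knn_nominal`, the representation of grid units as `((x, y), label)` pairs, and the use of Euclidean distance to define the neighborhood are hypothetical choices, not details given in the paper.

```python
import math
from collections import Counter


def knn_nominal(target, units, k):
    """Estimate the nominal value at `target` by a k-nearest-neighbor vote.

    target: (x, y) coordinates of the grid unit to estimate (assumed layout).
    units:  list of ((x, y), label) pairs with known categorical labels.
    k:      number of nearest neighbors that vote.
    """
    if not 1 <= k <= len(units):
        raise ValueError("k must be between 1 and the number of known units")
    # Neighborhood N(x_i): the k units closest to the target (Euclidean distance
    # is an assumption made for this sketch).
    nearest = sorted(units, key=lambda u: math.dist(target, u[0]))[:k]
    # Summing delta(v, f(s_j)) over the neighbors is a count of labels equal to v.
    votes = Counter(label for _, label in nearest)
    # arg max over v in V of the vote count.
    return max(votes, key=votes.get)


units = [((0, 0), "a"), ((1, 0), "a"), ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_nominal((0.5, 0.0), units, k=3))
```

With an even `k`, ties between classes are possible; how they should be broken is not specified in the appendix, so any tie-breaking behaviour of this sketch is incidental.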

Appendix 2: A brief introduction to rough set

Rough set theory assumes that an information system is a 4-tuple $S = \langle U, B, V, f \rangle$, where $U$ is a finite set of objects (here, the grid units $x_i$ of our dataset); $B$ is a finite set of attributes (here, the bands of the dataset); $V = \bigcup_{b \in B} V_b$, where $V_b$ is the value domain of the band attribute $b$; and $f: U \times B \to V$ is a total function, called the information function, such that $f(x_i, b) \in V_b$ for every $b \in B$ and $x_i \in U$. Any pair $(b, v)$ with $b \in B$ and $v \in V_b$ is called a descriptor in $S$. In rough set theory, the attributes used for classification are called the conditional variables (here, the auxiliary variables $X_k$, where $k$ is the index of the band variable), and the classification variable is the decision variable, $Y$. Conditional variables are used to classify the objects $x_i$ in the system. If two objects $x_i$ and $x_j$ in $U$ satisfy $f(x_i, b) = f(x_j, b)$ for every $b \in B$, we call $x_i$ and $x_j$ indiscernible, and all such mutually indiscernible objects compose a class set of the goal variable. Each conditional (auxiliary) variable contributes to classifying the objects to a different degree, and this degree is called the significance of the attribute (SA) with respect to the decision variable. For more details, see Aleksander (1999), Komorowski et al. (1999) and Wang (2001).
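To make the notions of indiscernibility and attribute significance concrete, the following Python sketch groups objects into indiscernibility classes and scores one attribute. The appendix does not give a formula for SA, so the sketch assumes a definition common in the rough set literature: the drop in the dependency degree (the share of objects whose indiscernibility class has a single value of $Y$) when the attribute is removed. The function names, the table layout and the toy band values are all hypothetical.

```python
from collections import defaultdict


def indiscernibility_classes(table, attrs):
    """Group objects that share the same value f(x, b) for every b in attrs."""
    classes = defaultdict(list)
    for obj, row in table.items():
        classes[tuple(row[b] for b in attrs)].append(obj)
    return list(classes.values())


def dependency(table, attrs, decision):
    """Share of objects lying in classes that are pure in the decision variable."""
    positive = 0
    for cls in indiscernibility_classes(table, attrs):
        if len({table[obj][decision] for obj in cls}) == 1:
            positive += len(cls)
    return positive / len(table)


def significance(table, attrs, decision, b):
    """Assumed SA: loss of dependency degree when attribute b is dropped."""
    reduced = [a for a in attrs if a != b]
    return dependency(table, attrs, decision) - dependency(table, reduced, decision)


# Toy information system: five grid units, two band attributes, decision Y.
table = {
    "x1": {"band1": 0, "band2": 0, "Y": "low"},
    "x2": {"band1": 0, "band2": 1, "Y": "high"},
    "x3": {"band1": 1, "band2": 0, "Y": "low"},
    "x4": {"band1": 1, "band2": 1, "Y": "high"},
    "x5": {"band1": 1, "band2": 1, "Y": "high"},
}
bands = ["band1", "band2"]
print(indiscernibility_classes(table, bands))
print(significance(table, bands, "Y", "band1"), significance(table, bands, "Y", "band2"))
```

In this toy table `band2` alone determines `Y`, so removing it destroys the classification while removing `band1` costs nothing; the scores are meant only to show the direction of the measure, not values from the case study.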

Appendix 3: Modeling for the stratification survey of the cultivatable land’s areal proportion

Given

$N$: the number of all aerial photos that cover the whole study region;
$n$: the number of photo units sampled;
$L$: the number of strata;
$N_h$: the total number of units in the $h$th stratum;
$n_h$: the number of units sampled for analysis in the $h$th stratum;
$\beta_{ih}$: the areal proportion of cultivable land in the $i$th aerial photo of the $h$th stratum.

1. If a sample unit (an aerial photo) is overlapped by several polygons belonging to different strata, it is split into smaller sub-sample units, one per stratum. The areal proportion of cultivable land in each sub-sample unit remains the same, and each sample unit's weight in its stratum is proportional to the unit's area: $w_{ih} = S_{ih}/S_h$, where $S_{ih}$ is the area of the $i$th sample unit in the $h$th stratum and $S_h$ is the total area of all units sampled in the $h$th stratum.
2. Within each stratum, units are sampled randomly according to the principle of simple random sampling (SRS). The estimation equations for spatial proportion sampling, however, are derived from Ripley (1981) and Wang et al. (2002).

The sampling proportion:
$$ f_{h} = n_{h} /N_{h} ; $$
(6)

The number of units sampled:
$$ a = f_{h} N_{h} ; $$
(7)

The proportion:
$$ \hat{\beta }_{h} (a) = {\sum\limits_{i = 1}^{n_{h} } {\beta _{{ih}} w_{{ih}} } } $$
(8)

The variance:
$$ \begin{aligned} \hat{\sigma}_{\hat{\beta}_h(a)}(n_h)^{2} &= E_h\left[\hat{\beta}_h(a) - \beta_h(a)\right]^{2} = E\left[\frac{1}{n_h}\sum_{a=1}^{n_h} n_h \beta_h(a) w_h(a) - \frac{1}{N_h}\int_{N_h} n_h \beta_h(a) w_h(a)\,\mathrm{d}a\right]^{2} \\ &= \frac{1}{n_h}\{1 - E_h[r(a - a')]\}\,\hat{\sigma}^{2}_{\hat{\beta}_h(N_h)} = F(n_h)\,\hat{\sigma}^{2}_{\hat{\beta}_h(N_h)} \end{aligned} $$
(9)
where $\beta_h(a) = \beta_{ah}$, $w_h(a) = w_{ah}$, and $F(n_h) = (1/n_h)\{1 - E_h[r(a - a') \mid R]\}$; $E_h[r(a - a') \mid R]$ is the expected value of the spatial correlation structure of the target variable in the study region $R$ (Rodriguez-Iturbe and Mejia 1974; Ripley 1981), and
$$ \hat{\sigma}_{\hat{\beta}_h(a)}(N_h) \equiv \sum_{a=1}^{N_h} \left\{ \left[ \beta_h(a) w_h(a) - \sum_{a=1}^{N_h} \beta_h(a) w_h(a) \right]^{2} w_h(a) \right\} $$
(10)
3. For the estimation of the population mean, we mainly follow Cochran (1977). The estimate of $\bar{\beta}_{\text{STR}}$ is
$$ \hat{\bar{\beta}}_{\text{STR}} = \frac{\sum_{h=1}^{L} n_h \hat{\bar{\beta}}_h}{n} = \sum_{h=1}^{L} w_h \hat{\bar{\beta}}_h \quad \text{where}\; w_h = n_h/n \;\text{and}\; n = n_1 + n_2 + \cdots + n_L $$
(11)

The sampling proportion in each stratum:
$$ f_h = \frac{n_h}{N_h} = f = \frac{\sum_{h=1}^{L} n_h}{N} $$
(12)

The variance:
$$ \hat{V}(\hat{\bar{\beta}}) = \sum_{h=1}^{L} w_h^{2}\, \hat{V}_h(\hat{\bar{\beta}}) $$
(13)
where $\hat{V}_h(\hat{\bar{\beta}})$ is the variance of the estimate in stratum $h$.
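The chain from Eq. (8) through Eqs. (9), (11) and (13) can be traced numerically with a short Python sketch. It is a simplified reading of the appendix, not the authors' implementation: the function names and dictionary keys are hypothetical, and the expected spatial correlation $E_h[r(a - a') \mid R]$ and the stratum population variance of Eq. (10) are passed in as given numbers (`mean_corr`, `sigma2`) rather than estimated from data.

```python
def stratum_proportion(betas, areas):
    """Eq. (8): area-weighted proportion, with weights w_ih = S_ih / S_h."""
    total_area = sum(areas)
    return sum(beta * area / total_area for beta, area in zip(betas, areas))


def variance_factor(n_h, mean_corr):
    """F(n_h) = (1 / n_h) * (1 - E_h[r(a - a') | R]) from Eq. (9)."""
    return (1.0 - mean_corr) / n_h


def stratified_estimate(strata):
    """Eqs. (11) and (13) with stratum weights w_h = n_h / n.

    strata: list of dicts with the (hypothetical) keys
      betas     - sampled areal proportions in the stratum
      areas     - areas S_ih of the sampled units
      sigma2    - stratum population variance, taken as given
      mean_corr - expected spatial correlation in the stratum, taken as given
    """
    n = sum(len(s["betas"]) for s in strata)
    estimate, variance = 0.0, 0.0
    for s in strata:
        n_h = len(s["betas"])
        w_h = n_h / n
        estimate += w_h * stratum_proportion(s["betas"], s["areas"])
        v_h = variance_factor(n_h, s["mean_corr"]) * s["sigma2"]  # Eq. (9)
        variance += w_h ** 2 * v_h  # Eq. (13)
    return estimate, variance


strata = [
    {"betas": [0.8, 0.6], "areas": [1.0, 3.0], "sigma2": 0.04, "mean_corr": 0.5},
    {"betas": [0.2, 0.4], "areas": [2.0, 2.0], "sigma2": 0.02, "mean_corr": 0.0},
]
estimate, variance = stratified_estimate(strata)
print(round(estimate, 4), round(variance, 4))
```

The numbers are invented for illustration. The sketch does show the mechanism the appendix relies on: a positive expected correlation within a stratum shrinks $F(n_h)$ and therefore the stratum's contribution to the overall variance.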

About this article

Cite this article

Li, L., Wang, J., Cao, Z. et al. An information-fusion method to identify pattern of spatial heterogeneity for improving the accuracy of estimation. Stoch Environ Res Risk Assess 22, 689–704 (2008). https://doi.org/10.1007/s00477-007-0179-1

Issue Date: October 2008