Abstract
Geochemical data are commonly censored, that is, concentrations for some samples are reported as “less than” or “greater than” some value. Censored data hamper statistical analysis because certain computational techniques require a complete set of uncensored data. We show that the simple substitution method for creating an uncensored dataset, e.g., replacement by 3/4 times the detection limit, has serious flaws, and we present an objective method to determine the replacement value. Our basic premise is that the replacement value should equal the mean of the actual values represented by the qualified data. We adapt the maximum likelihood approach (Cohen, 1961) to estimate this mean. This method reproduces the mean and skewness as well as or better than a simple substitution method using 3/4 of the lower detection limit or 4/3 of the upper detection limit. For a small proportion of “less than” substitutions, a simple-substitution replacement factor of 0.55 is preferable to 3/4; for a small proportion of “greater than” substitutions, a simple-substitution replacement factor of 1.7 is preferable to 4/3, provided the resulting replacement value does not exceed 100%. For more than 10% replacement, a mean empirical factor may be used. However, empirically determined simple-substitution replacement factors usually vary among different data sets and are less reliable with more replacements. Therefore, a maximum likelihood method is superior in general. Theoretical and empirical analyses show that true replacement factors for “less thans” decrease in magnitude with more replacements and larger standard deviation; those for “greater thans” increase in magnitude with more replacements and larger standard deviation. In contrast to any simple substitution method, the maximum likelihood method reproduces these variations.
Using the maximum likelihood method for replacing “less thans” in our sample data set, correlation coefficients were estimated with reasonable accuracy in 90% of the cases for as much as 40% replacement and in 60% of the cases for 80% replacement. These results suggest that censored data can be utilized more than is commonly realized.
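The premise that the replacement value should equal the mean of the values hidden behind a “less than” can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a lognormal population and a single lower detection limit, takes the mean in concentration (not log) units, and reaches the maximum likelihood estimates by an EM iteration rather than by Cohen's (1961) tables. The function names `fit_left_censored_lognormal` and `replacement_value` are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for pdf and cdf


def fit_left_censored_lognormal(detected, n_censored, limit, iters=500, tol=1e-10):
    """ML estimates (mu, sigma) of ln(x) when n_censored values are reported as '< limit'.

    Assumption: x is lognormal. The estimates are reached by EM iteration.
    """
    y = [math.log(v) for v in detected]
    c = math.log(limit)
    n = len(y) + n_censored
    sum_y = sum(y)
    sum_y2 = sum(v * v for v in y)
    # Start from the detected values only (biased high, but adequate as a start).
    mu = sum_y / len(y)
    sigma = math.sqrt(max(sum_y2 / len(y) - mu * mu, 1e-12))
    for _ in range(iters):
        # E-step: first and second moments of ln(x) given ln(x) < c.
        a = (c - mu) / sigma
        lam = STD.pdf(a) / STD.cdf(a)
        e1 = mu - sigma * lam
        e2 = mu * mu + sigma * sigma - sigma * (c + mu) * lam
        # M-step: update mu and sigma from the completed sufficient statistics.
        new_mu = (sum_y + n_censored * e1) / n
        new_var = (sum_y2 + n_censored * e2) / n - new_mu * new_mu
        new_sigma = math.sqrt(max(new_var, 1e-12))
        converged = abs(new_mu - mu) + abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


def replacement_value(mu, sigma, limit):
    """Mean of a lognormal(mu, sigma) variable conditional on being below `limit`."""
    c = math.log(limit)
    prob_below = STD.cdf((c - mu) / sigma)
    partial_mean = math.exp(mu + 0.5 * sigma ** 2) * STD.cdf((c - mu - sigma ** 2) / sigma)
    return partial_mean / prob_below


if __name__ == "__main__":
    import random

    random.seed(1)
    limit = 0.5  # hypothetical lower detection limit
    values = [random.lognormvariate(0.0, 1.0) for _ in range(2000)]
    detected = [v for v in values if v >= limit]
    n_censored = len(values) - len(detected)

    mu, sigma = fit_left_censored_lognormal(detected, n_censored, limit)
    value = replacement_value(mu, sigma, limit)
    print("replacement value:", value)
    print("replacement factor:", value / limit)  # compare with the fixed factor 3/4
```

Dividing the replacement value by the detection limit gives a replacement factor that depends on the fitted standard deviation and on the censored fraction, which is the behavior the abstract attributes to the maximum likelihood method and which a fixed factor such as 3/4 cannot reproduce.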
References
Chayes, F., 1954, The lognormal distribution of the elements: A discussion: Geochim. et Cosmochim. Acta, v. 6, p. 119–120.
Chayes, F., 1960, On correlation between variables of constant sum: J. Geophys. Res., v. 65, p. 4185–4193.
Cohen, A. C., 1959, Simplified estimators for the normal distribution when samples are singly censored or truncated: Technometrics, v. 1, p. 217–237.
Cohen, A. C., 1961, Tables for maximum likelihood estimates: Singly truncated and singly censored samples: Technometrics, v. 3, p. 535–541.
Cohen, A. C., 1976, Progressively censored sampling in the three parameter log-normal distribution: Technometrics, v. 18, p. 99–103.
Cohn, T. A., 1988, Adjusted Maximum Likelihood Estimation of the Moments of Lognormal Populations from Type I Censored Samples: U.S. Geological Survey, Open-file Report 88-350, 34 p.
Crow, E. L., Davis, F. A., and Maxfield, M. W., 1960, Statistics Manual: Dover Publications, New York, 288 p.
De Wijs, H. J., 1951, Statistics of Ore Distribution: Geologie en Mijnbouw, 13E Jaargang Nieuw Serie, p. 365–396.
Gilliom, R. J., and Helsel, D. R., 1986, Estimation of distributional parameters for censored trace level water quality data, 1. Estimation techniques: Water Res. Res., v. 22, p. 135–146.
Helsel, D. R., and Cohn, T. A., 1988, Estimation of descriptive statistics for multiply censored water quality data: Water Res. Res., v. 24, p. 1997–2004.
Helsel, D. R., and Gilliom, R. J., 1986, Estimation of distributional parameters for censored trace level water quality data, 2. Verification and applications: Water Res. Res., v. 22, p. 147–155.
Korn, G. A., and Korn, T. M., 1968, Mathematical Handbook for Scientists and Engineers: McGraw-Hill, New York, 1130 p.
Krige, D. G., 1951, A statistical approach to some basic mine valuation problems on the Witwatersrand: J. Chem. Metall. Mining Soc. S. Afr., v. 52, p. 119–139.
Krige, D. G., 1960, On the departure of ore value distributions from lognormal models in South African gold mines: J. S. Afr. Inst. Mining Metall., v. 61, p. 231–244.
Miesch, A. T., 1967, Methods of Computation for Estimating Geochemical Abundance: U.S. Geological Survey Professional Paper 574-B, 15 p.
Miesch, A. T., 1976a, Sampling designs for geochemical surveys—Syllabus for a short course: U.S. Geological Survey Open-File Report, 76–772, 140 p.
Miesch, A. T., 1976b, Geochemical survey of Missouri—Methods of sampling, laboratory analysis and statistical reduction of data: U.S. Geological Survey Professional Paper 954-A, 39 p.
Miesch, A. T., 1982, Estimation of the geochemical threshold and its statistical significance: J. Geochem. Exp., v. 16, p. 77–104.
Miesch, A. T., and Riley, L. B., 1961, Basic statistical methods used in geochemical investigations of Colorado Plateau uranium deposits: HIMMP Trans. (Mining), v. 220, p. 247–251.
Sanford, R. F., Korzeb, S. L., Seeley, J. L., and Zamudio, J. A., 1987a, Geochemical Data for Mineralized Rocks in the Lake City Area, San Juan Volcanic Field, Southwest Colorado: U.S. Geological Survey, Open-file Report 87–54, 213 p.
Sanford, R. F., Grauch, R. I., Hon, K., Bove, D. J., and Grauch, V. J. S., 1987b, Mineral resources of the Redcloud Peak and Handies Peak Wilderness Study Areas, Hinsdale County, Colorado: U.S. Geological Survey, Bulletin 1715-B, 40 p.
Sichel, H. S., 1952, New methods in the statistical evaluation of mine sampling data: London, Inst. Mining and Metall. Trans., v. 61, p. 261–288.
Sichel, H. S., 1966, The estimation of means and associated confidence limits for small samples from lognormal populations: J. S. Afr. Inst. Mining and Metall., Symposium: Mathematical Statistics and Computer Applications in Ore Valuation, p. 106–123.
VanTrump, G., Jr., 1977, The U.S. Geological Survey RASS-STATPAC system for management and statistical reduction of geochemical data: Comp. Geosci., v. 3, p. 475–488.
Cite this article
Sanford, R.F., Pierson, C.T. & Crovelli, R.A. An objective replacement method for censored geochemical data. Math Geol 25, 59–80 (1993). https://doi.org/10.1007/BF00890676
DOI: https://doi.org/10.1007/BF00890676