Senayan


Text

Classification random forest with exact conditioning for spatial prediction of categorical variables

Francky Fouedjio - Personal Name

Machine learning methods are increasingly used to spatially predict a categorical target variable when spatially exhaustive predictor variables are available within the study region. Although these methods show competitive spatial prediction performance, by construction they do not exactly honor the observed values of the categorical target variable at sampling locations. By contrast, competing geostatistical methods inherently match the observed values at sampling locations. In many geoscience applications, it is desirable to match the observed values of the categorical target variable at sampling locations perfectly, especially when its measurements can reasonably be considered error-free. This paper addresses the problem of exact conditioning of machine learning methods for the spatial prediction of categorical variables. It introduces a classification random forest-based approach in which the categorical target variable is exactly conditioned to the data, giving it the same exact conditioning property as competing geostatistical methods. The proposed method extends previous work dedicated to continuous target variables by using an implicit representation of the categorical target variable. The basic idea is to transform the ensemble of (categorical) classification tree predictors produced by the traditional classification random forest into an ensemble of (continuous) signed distances associated with each category of the categorical target variable. An orthogonal representation of the ensemble of signed distances is then created through principal component analysis, which allows the exact conditioning problem to be reformulated as a system of linear inequalities on the principal component scores. New principal component scores that ensure exact conditioning to the data are then sampled via randomized quadratic programming. The resulting conditional signed distances are converted into an ensemble of categorical outputs, which perfectly honor the observed values of the categorical target variable at sampling locations. A majority vote is then used to aggregate the ensemble of categorical outputs. The effectiveness of the proposed method is illustrated on a simulated dataset for which ground truth is available and showcased on a real-world dataset that includes geochemical data. A comparison with geostatistical and traditional machine learning methods shows that the proposed technique can perfectly match the observed values of the categorical target variable at sampling locations while maintaining competitive out-of-sample predictive performance.
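Two of the steps named in the abstract, the implicit (signed distance) representation of a categorical variable and the majority-vote aggregation, can be illustrated with a minimal sketch. This is not the paper's implementation: the 1D unit-spaced grid, the sign convention (negative inside a category, positive outside), and the function names `signed_distance_1d`, `to_categories`, and `majority_vote` are all assumptions made for illustration, and the principal component analysis and randomized quadratic programming steps are omitted.

```python
import numpy as np


def signed_distance_1d(labels, category):
    """Signed distance of each cell to the boundary of `category`.

    Assumed convention: negative inside the category, positive outside,
    measured in grid cells on a 1D unit-spaced grid.
    """
    labels = np.asarray(labels)
    inside = labels == category
    idx = np.arange(labels.size)
    in_idx, out_idx = idx[inside], idx[~inside]
    dist = np.full(labels.size, np.inf)
    if in_idx.size == 0:
        return dist   # category absent: every cell is infinitely far outside
    if out_idx.size == 0:
        return -dist  # category everywhere: every cell is infinitely deep inside
    # Pairwise gaps between inside cells (rows) and outside cells (columns).
    gaps = np.abs(in_idx[:, None] - out_idx[None, :])
    dist[inside] = -gaps.min(axis=1)   # nearest outside cell, negated
    dist[~inside] = gaps.min(axis=0)   # nearest inside cell
    return dist


def to_categories(signed_distances):
    """Map per-category signed distances back to a categorical output.

    Each cell takes the category whose signed distance is smallest.
    """
    return np.argmin(np.asarray(signed_distances), axis=0)


def majority_vote(ensemble, n_categories):
    """Aggregate an ensemble of categorical outputs (members x cells).

    Ties are broken in favour of the lowest category index.
    """
    ensemble = np.asarray(ensemble)
    counts = np.stack([(ensemble == c).sum(axis=0) for c in range(n_categories)])
    return counts.argmax(axis=0)


# Hypothetical categorical outputs of three ensemble members on a 7-cell grid.
members = np.array([
    [0, 0, 1, 1, 1, 2, 2],
    [0, 1, 1, 1, 2, 2, 2],
    [0, 0, 0, 1, 1, 2, 2],
])

# Round trip for one member: categories -> signed distances -> categories.
sd = np.stack([signed_distance_1d(members[0], c) for c in range(3)])
recovered = to_categories(sd)

# Aggregate the ensemble.
aggregated = majority_vote(members, n_categories=3)
```

Under this sign convention, honoring an observed category c at a sampling location amounts to requiring the signed distance for c to be negative and those of all other categories to be positive there, which is consistent with the abstract's description of exact conditioning as a system of linear inequalities.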


Availability
254551, Perpustakaan BIG (External Hard Disk), Available
Detail Information
Series Title
Artificial Intelligence in Geosciences
Call Number
551
Publisher
Beijing : KeAi Communications Co. Ltd., 2021
Collation
14 pages, PDF, 7,744 KB
Language
English
ISBN/ISSN
2666-5441
Classification
551
Content Type
text
Media Type
-
Carrier Type
-
Edition
Vol. 2, December 2021
Subject(s)
Classification
Exact conditioning
Spatial prediction
Principal component analysis
Categorical variable
Signed distance
Quadratic programming
Specific Detail Info
-
Statement of Responsibility
-
Other version/related

No other version available

File Attachment
  • Classification random forest with exact conditioning for spatial prediction of categorical variables