Senayan

  • Home
  • Information
  • News
  • Help
  • Librarian
  • Member Area
  • Select Language :
    Arabic Bengali Brazilian Portuguese English Espanol German Indonesian Japanese Malay Persian Russian Thai Turkish Urdu

Search by :

ALL Author Subject ISBN/ISSN Advanced Search

Last search:

{{tmpObj[k].text}}
Image of Evaluating Imputation Methods for rainfall data under high variability in Johor River Basin, Malaysia

Text

Evaluating Imputation Methods for rainfall data under high variability in Johor River Basin, Malaysia

Zulfaqar Sa’adi - Personal Name; Zulkifli Yusop - Personal Name; Nor Eliza Alias - Personal Name; Ming Fai Chow - Personal Name; Mohd Khairul Idlan Muhammad - Personal Name; Muhammad Wafiy Adli Ramli - Personal Name; Zafar Iqbal - Personal Name; Mohammed Sanusi Shiru - Personal Name; Faizal Immaddudin Wira Rohmat - Personal Name; Nur Athirah Mohamad - Personal Name; Mohamad Faizal Ahmad - Personal Name;

Missing values in rainfall records might result in erroneous predictions and inefficient management practices with significant economic, environmental, and social consequences. This is particularly important for rainfall datasets in Peninsular Malaysia (PM) due to the high level of missingness that can affect the inherent pattern in the highly variable time series. In this work, 21 target rainfall stations in the Johor River Basin (JRB) with daily data between 1970 and 2015 were used to examine 19 different multiple imputation methods that were carried out using the Multivariate Imputation by Chained Equations (MICE) package in R. For each station, artificial missing data were added at rates of up to 5%, 10%, 20%, and 30% for different types of missingness, namely, Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR), leaving the original missing data intact. The imputation quality was evaluated based on several statistical performance metrics, namely mean absolute error (MAE), root mean square error (RMSE), normalized root mean square error (NRMSE), Nash-Sutcliffe efficiency (NSE), modified degree of agreement (MD), coefficient of determination (R2), Kling-Gupta efficiency (KGE), and volumetric efficiency (VE), which were later ranked and aggregated by using the compromise programming index (CPI) to select the best method. The results showed that linear regression predicted values (norm.predict) consistently ranked the highest under all types and levels of missingness. For example, under MAR, MNAR, and MCAR, this method showed the lowest MAE values, ranging between 0.78 and 2.25, 0.93–2.57, and 0.87–2.43, respectively. It also consistently shows higher NSE and R2 values of 0.71–0.92, 0.6–0.92, and 0.66–0.91, and 0.77–0.92, 0.71–0.93, and 0.75–0.92 under MAR, MCAR, and MNAR, respectively. The methods of mean, rf, and cart also appear to be efficient. The incorporation of the compromise programming index (CPI) as a decision-support tool has enabled an objective assessment of the output from the multiple performance metrics for the ranking and selection of the top-performing method. During validation, the Probability Density Function (PDF) demonstrated that even with up to 30% missingness, the shape of the distribution was retained after imputation compared to the actual data. The methodology proposed in this study can help in choosing suitable imputation methods for other tropical rainfall datasets, leading to improved accuracy in rainfall estimation and prediction.


Availability
167551.136Perpustakaan BIG (Eksternal Harddisk)Available
Detail Information
Series Title
Applied Computing and Geoscience - Open Access
Call Number
551.136
Publisher
Amsterdam : Elsevier., 2023
Collation
16 hlm PDF, 12.766 KB
Language
Inggris
ISBN/ISSN
2590-1974
Classification
551.136
Content Type
text
Media Type
-
Carrier Type
-
Edition
Vol.20, December 2023
Subject(s)
Daily rainfall
Johor river basin
Missing data
Multiple imputation methods
Peninsular Malaysia
Spatiotemporal variability
Specific Detail Info
-
Statement of Responsibility
-
Other version/related

No other version available

File Attachment
  • Evaluating Imputation Methods for rainfall data under high variability in Johor River Basin, Malaysia
    Missing values in rainfall records might result in erroneous predictions and inefficient management practices with significant economic, environmental, and social consequences. This is particularly important for rainfall datasets in Peninsular Malaysia (PM) due to the high level of missingness that can affect the inherent pattern in the highly variable time series. In this work, 21 target rainfall stations in the Johor River Basin (JRB) with daily data between 1970 and 2015 were used to examine 19 different multiple imputation methods that were carried out using the Multivariate Imputation by Chained Equations (MICE) package in R. For each station, artificial missing data were added at rates of up to 5%, 10%, 20%, and 30% for different types of missingness, namely, Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR), leaving the original missing data intact. The imputation quality was evaluated based on several statistical performance metrics, namely mean absolute error (MAE), root mean square error (RMSE), normalized root mean square error (NRMSE), Nash-Sutcliffe efficiency (NSE), modified degree of agreement (MD), coefficient of determination (R2), Kling-Gupta efficiency (KGE), and volumetric efficiency (VE), which were later ranked and aggregated by using the compromise programming index (CPI) to select the best method. The results showed that linear regression predicted values (norm.predict) consistently ranked the highest under all types and levels of missingness. For example, under MAR, MNAR, and MCAR, this method showed the lowest MAE values, ranging between 0.78 and 2.25, 0.93–2.57, and 0.87–2.43, respectively. It also consistently shows higher NSE and R2 values of 0.71–0.92, 0.6–0.92, and 0.66–0.91, and 0.77–0.92, 0.71–0.93, and 0.75–0.92 under MAR, MCAR, and MNAR, respectively. The methods of mean, rf, and cart also appear to be efficient. The incorporation of the compromise programming index (CPI) as a decision-support tool has enabled an objective assessment of the output from the multiple performance metrics for the ranking and selection of the top-performing method. During validation, the Probability Density Function (PDF) demonstrated that even with up to 30% missingness, the shape of the distribution was retained after imputation compared to the actual data. The methodology proposed in this study can help in choosing suitable imputation methods for other tropical rainfall datasets, leading to improved accuracy in rainfall estimation and prediction.
Comments

You must be logged in to post a comment

Senayan
  • Information
  • Services
  • Librarian
  • Member Area

About Us

As a complete Library Management System, SLiMS (Senayan Library Management System) has many features that will help libraries and librarians to do their job easily and quickly. Follow this link to show some features provided by SLiMS.

Search

start it by typing one or more keywords for title, author or subject

Keep SLiMS Alive Want to Contribute?

© 2026 — Senayan Developer Community

Powered by SLiMS
Select the topic you are interested in
  • Computer Science, Information & General Works
  • Philosophy & Psychology
  • Religion
  • Social Sciences
  • Language
  • Pure Science
  • Applied Sciences
  • Art & Recreation
  • Literature
  • History & Geography
Icons made by Freepik from www.flaticon.com
Advanced Search