MACHINE LEARNING BASED BIAS CORRECTION OF CMAQ USING EMISSION SOURCE DECOUPLED FEATURES IN THAILAND
DOI:
https://doi.org/10.21660/2026.139.g15298
Keywords:
Bias correction, Chemical transport model, Machine learning, PM2.5, SHAP
Abstract
Machine learning (ML) models are commonly used to correct biases in chemical transport model simulations of PM2.5 from multiple emission sources. However, simulated PM2.5 is typically treated as a single predictor in ML models, which limits insight into how individual sources influence the predictions. In this study, we decomposed PM2.5 concentrations simulated by the Community Multiscale Air Quality (CMAQ) model into individual contributions from biomass burning (BB), anthropogenic (AT), and other sources. These three source contributions were used as predictors in a Light Gradient Boosting Machine (LightGBM) model and interpreted via SHapley Additive exPlanations (SHAP) values to diagnose their influence on PM2.5 predictions over Thailand. The proposed model improved PM2.5 prediction compared with the original CMAQ simulations. SHAP analysis suggested that the BB contribution was the most important predictor, followed by the AT and other contributions, with mean absolute SHAP values of 8.43, 3.69, and 2.37 µg/m³, respectively. Relative to the model's expected output (about 24.56 µg/m³), the BB contribution increased the predicted values by 4.68 ± 13.45 µg/m³ in the dry season and decreased them by 7.44 ± 1.84 µg/m³ in the wet season. SHAP interaction analysis suggested that CMAQ overestimation of PM2.5 during high-pollution episodes may stem from inaccurate BB and AT emissions. These findings highlight the need to prioritize refinement of the BB emission inventory (e.g., by tuning emission factors) to reduce PM2.5 overestimation.
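The SHAP reasoning above rests on the additivity of Shapley values. Each prediction equals the model's expected output (the base value, about 24.56 µg/m³ in the study) plus one SHAP value per source contribution. Global importance is the mean absolute SHAP value per feature, and a seasonal effect is the mean signed SHAP value over that season's samples. The following is a minimal sketch of these quantities under stated assumptions. `corrected_pm25` is a hypothetical stand-in for the trained LightGBM model, the sample data are invented, and exact Shapley values computed by enumerating feature coalitions against a background dataset are swapped in for the tree-specific algorithm a SHAP library would normally apply to LightGBM. None of the printed numbers correspond to the study's results.

```python
from itertools import combinations
from math import factorial

# Feature order: CMAQ-simulated PM2.5 contributions (µg/m³) by source group.
FEATURES = ["BB", "AT", "other"]


def corrected_pm25(x):
    """Hypothetical stand-in for a trained LightGBM regressor (not the study's model).

    Caps the BB term to mimic a model that damps CMAQ overestimation
    during high-BB episodes.
    """
    bb, at, other = x
    return 0.6 * min(bb, 60.0) + 0.8 * at + 0.5 * other


def coalition_value(model, x, subset, background):
    """v(S): mean prediction when features in `subset` come from x and all
    remaining features come from each background row in turn."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact interventional Shapley values via enumeration of all coalitions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, s | {i}, background)
                        - coalition_value(model, x, s, background))
                phi[i] += weight * gain
    return phi


def mean_abs_shap(model, data, background):
    """Global importance: mean |SHAP| per feature over `data`."""
    all_phi = [shapley_values(model, x, background) for x in data]
    return {name: sum(abs(p[k]) for p in all_phi) / len(all_phi)
            for k, name in enumerate(FEATURES)}


# Toy (BB, AT, other) contributions, invented for illustration only.
samples = [
    [70.0, 20.0, 10.0],  # dry-season-like day with strong BB
    [45.0, 15.0, 8.0],   # dry-season-like day
    [5.0, 12.0, 6.0],    # wet-season-like day with weak BB
    [3.0, 10.0, 5.0],    # wet-season-like day
]

# Expected model output E[f(X)] over the background data (the SHAP base value).
base_value = sum(corrected_pm25(r) for r in samples) / len(samples)

for x in samples:
    phi = shapley_values(corrected_pm25, x, samples)
    # Efficiency: base value + sum of SHAP values reproduces the prediction.
    assert abs(base_value + sum(phi) - corrected_pm25(x)) < 1e-9

importance = mean_abs_shap(corrected_pm25, samples, samples)
for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")  # BB: 14.550, AT: 2.600, other: 0.875

# Seasonal view: mean signed BB SHAP value for dry-like vs. wet-like rows.
bb_phi = [shapley_values(corrected_pm25, x, samples)[0] for x in samples]
dry_mean = sum(bb_phi[:2]) / 2
wet_mean = sum(bb_phi[2:]) / 2
print(f"BB SHAP dry: {dry_mean:+.2f}, wet: {wet_mean:+.2f}")  # +14.55, -14.55
```

With only three predictors (BB, AT, other) there are 2³ = 8 coalitions, so exact enumeration is cheap. Tree-specific algorithms matter only when there are many features. Because this toy model is additive, each SHAP value reduces to that feature's term minus its background average, and the model has no interaction effects. Capturing the BB and AT interactions the abstract describes would require a non-additive model and SHAP interaction values.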







