Feature Transformation
Feature transformation was carried out to make the variables more informative and better suited to model training. Categorical variables such as major river basin (MRB) and dominant land cover (DLC) were encoded as integers, with one unique value per class. For MRB, the HydroSHEDS-provided HYBAS_ID was used as the unique identifier (Lehner et al., 2008).
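The integer encoding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper `encode_classes`, the sample labels, and the choice to assign codes in sorted order are all assumptions. MRB needs no such mapping, since the HydroSHEDS basin identifier is already an integer.

```python
def encode_classes(labels):
    """Map each distinct label to an integer code (sorted order, starting at 0).

    Hypothetical helper: the source only states that each class receives a
    unique integer, not how the integers are assigned.
    """
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


# Made-up dominant land cover (DLC) labels for four reservoirs
dlc = ["cropland", "trees", "cropland", "water bodies"]
dlc_codes, dlc_mapping = encode_classes(dlc)

print(dlc_codes)    # [0, 1, 0, 2]
print(dlc_mapping)  # {'cropland': 0, 'trees': 1, 'water bodies': 2}
```

The codes only label the classes; their numeric order carries no physical meaning.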
Several continuous variables exhibited high skewness, particularly those related to spatial dimensions (e.g., capacity, area, and length). To address this, a natural log transformation was applied. The original abbreviations and variable names were retained after transformation to ensure consistency across tables and figures.
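As a rough illustration of why the log transformation helps, the sketch below compares the skewness of a variable before and after taking the natural log. The sample capacities, the helper names, and the population skewness formula are assumptions made for this example. The source does not say how zero values (for instance, a land cover fraction of zero) were handled, so the sketch raises an error rather than guessing.

```python
import math


def skewness(values):
    """Population skewness: third central moment divided by std**3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """Natural log of each value; inputs must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("natural log is undefined for values <= 0")
    return [math.log(v) for v in values]


# Made-up reservoir capacities (MCM) spanning several orders of magnitude
capacity = [1.0, 10.0, 100.0, 1000.0, 10000.0]
log_capacity = log_transform(capacity)

print(skewness(capacity))      # strongly right-skewed (greater than 1)
print(skewness(log_capacity))  # approximately 0: symmetric after the log
```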
Additionally, negative values of sedimentation rate (SR), which can arise from sediment management interventions or short observation periods (Minocha and Hossain, 2025a), were replaced with zero (0 MCM/year). This adjustment prevents the models from learning physically implausible negative rates while keeping the affected records in the dataset.
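The replacement of negative SR values can be sketched as a simple floor operation, taking the stated value of 0 MCM/year as the floor. The function name `clip_negative_sr`, its `floor` parameter, and the sample rates are hypothetical and serve only to illustrate the step.

```python
def clip_negative_sr(rates, floor=0.0):
    """Replace sedimentation rates (MCM/year) below `floor` with `floor`.

    `floor` defaults to the 0 MCM/year stated in the text; it is exposed as
    a parameter only so that a different replacement value can be tried.
    """
    return [max(rate, floor) for rate in rates]


# Made-up sedimentation rates, two of them negative
sr = [0.42, -0.05, 1.30, -0.01, 0.0]
sr_clipped = clip_negative_sr(sr)

print(sr_clipped)  # [0.42, 0.0, 1.3, 0.0, 0.0]
```

Non-negative rates pass through unchanged, so no records are dropped.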
Log-Transformed Features
The following variables were transformed using the natural logarithm. Original feature numbers have been retained for consistency, and each abbreviation is the original abbreviation with a log_ prefix.
| No. | Name | Abbreviation |
|---|---|---|
| P1 | Original Built Capacity | log_OBC |
| P2 | Dam Height | log_HGT |
| P6 | Reservoir Area | log_RA |
| P7 | Reservoir Perimeter | log_RP |
| P8 | Flow Length | log_FL |
| P9 | Catchment Area | log_CA |
| P10 | Differential Catchment Area | log_DCA |
| P14 | Land cover of artificial surfaces | log_LCAS |
| P15 | Land cover of cropland | log_LCC |
| P16 | Land cover of grassland | log_LCG |
| P17 | Land cover of trees | log_LCT |
| P18 | Land cover of shrubs | log_LCS |
| P19 | Land cover of herbaceous vegetation | log_LCHV |
| P20 | Land cover of mangroves | log_LCM |
| P21 | Land cover of sparse vegetation | log_LCSV |
| P22 | Land cover of bare soil | log_LCBS |
| P23 | Land cover of snow and glaciers | log_LCSG |
| P24 | Land cover of water bodies | log_LCWB |
| P36 | Mean Annual Inflow | log_MAI |
| P37 | Peak Annual Inflow | log_PAI |
| P39 | Standard Deviation in Inflow | log_I_std |
| P42 | Mean Annual Outflow | log_MAO |
| P43 | Standard Deviation in Outflow | log_O_std |
| P47 | Mean Surface Area | log_SA_mean |
| P48 | Standard Deviation of Surface Area | log_SA_std |
| P51 | Kurtosis of Surface Area | log_SA_kurt |
| P52 | Mean Surface Area (clipped) | log_SA_mean_clip |
| P62 | Mean Annual Rainfall | log_MAR |
| P74 | Relative Original Capacity | log_ROBC |
| P75 | Geometry Complexity | log_GC |
| P82 | Rainfall per Unit Area | log_rain_per_area |
| P83 | Trapping Efficiency | log_TE |
| P84 | Residence Time | log_RT |
| P85 | Estimated Capacity Loss Rate | log_ECLR |
| P86 | Estimated Sedimentation Rate | log_ESR |
| P87 | Sediment Influx | log_SIN |
| P88 | Sediment Outflux | log_SOUT |