User Guide =========== This guide provides step-by-step instructions to generate features for reservoirs and catchments, load pretrained RECLAIM models, and predict sedimentation rates (SR) using the ensemble model or individual base models. Contents -------- 0. Overview 1. Generating Features - 1.1 Single Reservoir - 1.2 Multiple Reservoirs 2. Loading Pretrained RECLAIM Model 3. Making Predictions - 3.1 Predicting with Ensemble - 3.2 Using Base Models 4. Evaluating Predictions 5. Saving and Loading Custom Trained Models 6. Example Workflow Overview -------- The RECLAIM python package provides a stacked ensemble predictor for reservoir sedimentation rate (SR) by combining XGBoost, LightGBM, and CatBoost base models with a powerful ensemble. It also includes utilities to generate features from static and dynamic reservoir and catchment data. 1. Generating Features --------------------- 1.1 Single Reservoir ```````````````````` To compute features for a single reservoir observation, use the `create_features_per_row` function in `generate_features.py`. **Parameters** - `reservoir_static_params`: dict Required keys for reservoir static features: - `obc` : Original Built Capacity (MCM) - `hgt` : Dam Height (m) - `mrb` : Major River Basin (optional) - `lat`, `lon` : Latitude & Longitude (degrees) - `reservoir_polygon` : shapely Polygon - `inlet_point` : shapely Point (optional) - `resolution` : float (optional) - `aec_df` : pd.DataFrame with `['area','elevation']` - `catchment_static_params`: dict Required keys for catchment static features: - `ca` : Catchment Area (sq km) - `dca` : Differential Catchment Area (sq km) - `catchment_geometry` : shapely Polygon or GeoSeries - `glc_share_path` : path to GLC-Share NetCDF (land cover) - `hwsd2_path` : path to HWSD2 NetCDF (soils) - `hilda_veg_freq_path` : path to HILDA vegetation NetCDF - `terrain_path` : path to terrain/DEM derivatives NetCDF - `reservoir_dynamic_info` : dict (optional) Must contain paths and column names for: - `inflow`, `outflow`, `evaporation`, `surface_area`, `nssc`, `nssc2` - `catchment_dynamic_info` : dict (optional) Must contain paths and column names for: - `precip`, `tmin`, `tmax`, `wind` - `observation_period` : list `[OSY, OEY]` (optional) Observation start and end year. **Example** .. code-block:: python from reclaim.generate_features import create_features_per_row reservoir_static = { "obc": 150.0, "hgt": 45.0, "mrb": "Ganges", "lat": 25.6, "lon": 81.9, "reservoir_polygon": reservoir_polygon, "aec_df": aec_df } catchment_static = { "ca": 1200, "dca": 50, "catchment_geometry": catchment_geom, "glc_share_path": "data/glc.nc", "hwsd2_path": "data/soil.nc", "hilda_veg_freq_path": "data/veg.nc", "terrain_path": "data/terrain.nc" } features = create_features_per_row( reservoir_static_params=reservoir_static, catchment_static_params=catchment_static, observation_period=[2000, 2020] ) 1.2 Multiple Reservoirs ````````````````````````` For batch processing, use `create_features_multi` with a list of reservoir dictionaries. **Example** .. code-block:: python from reclaim.generate_features import create_features_multi reservoirs_input = [ { "reservoir_static_params": reservoir_static, "catchment_static_params": catchment_static, "observation_period": [2000, 2020] }, { "reservoir_static_params": reservoir_static2, "catchment_static_params": catchment_static2, "observation_period": [2005, 2020] } ] features_df = create_features_multi(reservoirs_input) This returns a combined DataFrame with one row per reservoir. 2. Loading Pretrained RECLAIM Model ----------------------------------- The package includes a pretrained ensemble model stored in `pretrained_model` folder. **Example** .. code-block:: python from reclaim.reclaim import Reclaim model = Reclaim() model.load_model() # Loads pretrained model from package folder By default, this loads the XGBoost, LightGBM, CatBoost models and metadata (feature order, cat features). 3. Making Predictions --------------------- 3.1 Predicting with Ensemble ````````````````````````` The ensemble prediction uses dynamic, instance-wise weights based on CatBoost output. **Example** .. code-block:: python predictions, weights = model.predict(features_df, return_weights=True) **Parameters** - `log_transform` (bool, default=True) – Apply log1p to stabilize high values - `dynamic_weight` (bool, default=True) – Use instance-wise weights - `threshold` (float, default=30) – Threshold separating low/high predictions - `sat_point` (float, default=70) – Saturation point for above-threshold weights - `smooth_factor` (float, default=0.2) – Controls sigmoid sharpness `weights` is a DataFrame showing the contribution of XGBoost, LightGBM, and CatBoost for each observation. Or you can predict using simple average of individual base models: .. code-block:: python average_pred = model.predict(features_df, log_transform=False, dynamic_weight=False) 3.2 Using Base Models ````````````````````````` You can also predict explicitly using one of the base models: .. code-block:: python model.main_model = "XGBoost" pred_xgb = model.predict(features_df) 4. Evaluating Predictions ------------------------- Evaluate model performance on true SR values: .. code-block:: python y_true = [...] # true sedimentation rates metrics = model.evaluate(features_df, y_true) print(metrics) # {'RMSE': ..., 'MAE': ..., 'R2': ...} 5. Saving and Loading Custom Trained Models ------------------------------------------- Save models after custom training: .. code-block:: python model.save_model(save_dir="custom_models", prefix="my_run") Load previously saved models: .. code-block:: python model.load_model(load_dir="custom_models", prefix="my_run") 6. Example Workflow ------------------- Complete example from feature generation to prediction and evaluation: .. code-block:: python from reclaim.generate_features import create_features_per_row from reclaim.reclaim import Reclaim # Step 1: Generate features features = create_features_per_row( reservoir_static_params=reservoir_static, catchment_static_params=catchment_static, observation_period=[2000, 2020] ) # Step 2: Load pretrained model model = Reclaim() model.load_model() # Step 3: Predict sedimentation rates pred_sr, weights = model.predict(features, return_weights=True) # Step 4: Inspect predictions print(pred_sr) print(weights) # Step 5: Evaluate (if ground truth available) metrics = model.evaluate(features, y_true) print(metrics)