User Guide

This guide provides step-by-step instructions to generate features for reservoirs and catchments, load pretrained RECLAIM models, and predict sedimentation rates (SR) using the ensemble model or individual base models.

Contents

  1. Overview

  2. Generating Features
    • 1.1 Single Reservoir

    • 1.2 Multiple Reservoirs

  3. Loading Pretrained RECLAIM Model

  4. Making Predictions
    • 3.1 Predicting with Ensemble

    • 3.2 Using Base Models

  5. Evaluating Predictions

  6. Saving and Loading Custom Trained Models

  7. Example Workflow

Overview

The RECLAIM python package provides a stacked ensemble predictor for reservoir sedimentation rate (SR) by combining XGBoost, LightGBM, and CatBoost base models with a powerful ensemble. It also includes utilities to generate features from static and dynamic reservoir and catchment data.

1. Generating Features

1.1 Single Reservoir

To compute features for a single reservoir observation, use the create_features_per_row function in generate_features.py.

Parameters

  • reservoir_static_params: dict Required keys for reservoir static features:

    • obc : Original Built Capacity (MCM)

    • hgt : Dam Height (m)

    • mrb : Major River Basin (optional)

    • lat, lon : Latitude & Longitude (degrees)

    • reservoir_polygon : shapely Polygon

    • inlet_point : shapely Point (optional)

    • resolution : float (optional)

    • aec_df : pd.DataFrame with [‘area’,’elevation’]

  • catchment_static_params: dict Required keys for catchment static features:

    • ca : Catchment Area (sq km)

    • dca : Differential Catchment Area (sq km)

    • catchment_geometry : shapely Polygon or GeoSeries

    • glc_share_path : path to GLC-Share NetCDF (land cover)

    • hwsd2_path : path to HWSD2 NetCDF (soils)

    • hilda_veg_freq_path : path to HILDA vegetation NetCDF

    • terrain_path : path to terrain/DEM derivatives NetCDF

  • reservoir_dynamic_info : dict (optional) Must contain paths and column names for:

    • inflow, outflow, evaporation, surface_area, nssc, nssc2

  • catchment_dynamic_info : dict (optional) Must contain paths and column names for:

    • precip, tmin, tmax, wind

  • observation_period : list [OSY, OEY] (optional) Observation start and end year.

Example

from reclaim.generate_features import create_features_per_row

reservoir_static = {
    "obc": 150.0,
    "hgt": 45.0,
    "mrb": "Ganges",
    "lat": 25.6,
    "lon": 81.9,
    "reservoir_polygon": reservoir_polygon,
    "aec_df": aec_df
}

catchment_static = {
    "ca": 1200,
    "dca": 50,
    "catchment_geometry": catchment_geom,
    "glc_share_path": "data/glc.nc",
    "hwsd2_path": "data/soil.nc",
    "hilda_veg_freq_path": "data/veg.nc",
    "terrain_path": "data/terrain.nc"
}

features = create_features_per_row(
    reservoir_static_params=reservoir_static,
    catchment_static_params=catchment_static,
    observation_period=[2000, 2020]
)

1.2 Multiple Reservoirs

For batch processing, use create_features_multi with a list of reservoir dictionaries.

Example

from reclaim.generate_features import create_features_multi

reservoirs_input = [
    {
        "reservoir_static_params": reservoir_static,
        "catchment_static_params": catchment_static,
        "observation_period": [2000, 2020]
    },
    {
        "reservoir_static_params": reservoir_static2,
        "catchment_static_params": catchment_static2,
        "observation_period": [2005, 2020]
    }
]

features_df = create_features_multi(reservoirs_input)

This returns a combined DataFrame with one row per reservoir.

2. Loading Pretrained RECLAIM Model

The package includes a pretrained ensemble model stored in pretrained_model folder.

Example

from reclaim.reclaim import Reclaim

model = Reclaim()
model.load_model()  # Loads pretrained model from package folder

By default, this loads the XGBoost, LightGBM, CatBoost models and metadata (feature order, cat features).

3. Making Predictions

3.1 Predicting with Ensemble

The ensemble prediction uses dynamic, instance-wise weights based on CatBoost output.

Example

predictions, weights = model.predict(features_df, return_weights=True)

Parameters

  • log_transform (bool, default=True) – Apply log1p to stabilize high values

  • dynamic_weight (bool, default=True) – Use instance-wise weights

  • threshold (float, default=30) – Threshold separating low/high predictions

  • sat_point (float, default=70) – Saturation point for above-threshold weights

  • smooth_factor (float, default=0.2) – Controls sigmoid sharpness

weights is a DataFrame showing the contribution of XGBoost, LightGBM, and CatBoost for each observation.

Or you can predict using simple average of individual base models:

average_pred = model.predict(features_df, log_transform=False, dynamic_weight=False)

3.2 Using Base Models

You can also predict explicitly using one of the base models:

model.main_model = "XGBoost"
pred_xgb = model.predict(features_df)

4. Evaluating Predictions

Evaluate model performance on true SR values:

y_true = [...]  # true sedimentation rates
metrics = model.evaluate(features_df, y_true)
print(metrics)  # {'RMSE': ..., 'MAE': ..., 'R2': ...}

5. Saving and Loading Custom Trained Models

Save models after custom training:

model.save_model(save_dir="custom_models", prefix="my_run")

Load previously saved models:

model.load_model(load_dir="custom_models", prefix="my_run")

6. Example Workflow

Complete example from feature generation to prediction and evaluation:

from reclaim.generate_features import create_features_per_row
from reclaim.reclaim import Reclaim

# Step 1: Generate features
features = create_features_per_row(
    reservoir_static_params=reservoir_static,
    catchment_static_params=catchment_static,
    observation_period=[2000, 2020]
)

# Step 2: Load pretrained model
model = Reclaim()
model.load_model()

# Step 3: Predict sedimentation rates
pred_sr, weights = model.predict(features, return_weights=True)

# Step 4: Inspect predictions
print(pred_sr)
print(weights)

# Step 5: Evaluate (if ground truth available)
metrics = model.evaluate(features, y_true)
print(metrics)