Getting Started
- **Simulated data**: When running `basd`, you'll likely want to adjust a simulated climate dataset, so we refer to the data we want to adjust as the "simulated data".
- **Observational data**: When running `basd`, you need a reference dataset. This will usually be an observational dataset over a historical period, so we refer to the reference data as the "observational data".
- **Application period**: `basd` adjusts and downscales data over a given time period, which we call the "application period".
- **Target period**: Similarly, `basd` uses observational data over a given time period to compare with simulated data over that same period. The period where we make this comparison is called the "target period".
When starting with `basd`, we first need to decide what climate data we wish to adjust and over what period, and what observational dataset we'll use as reference. In the quickstarter notebook we make use of small example datasets provided here.
For example, let's say we want to bias adjust and downscale one run of the CanESM5 model from the CMIP6 experiments, and we want to start with precipitation (shorthand name `pr`). We can use the W5E5v2.0 observational dataset, created for ISIMIP3, as our reference for bias adjustment and statistical downscaling. The W5E5v2.0 data covers 1970-2014, so we'll use that as our target period, and the whole CMIP6 future period of 2015-2100 as our application period.
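Those choices can be written down as plain variables. The names below are illustrative assumptions (not part of the `basd` API), matching the period variables used in the snippets later on this page.

```python
# Illustrative period settings for the CanESM5 / W5E5v2.0 example.
# These variable names are assumptions used only for this walkthrough.
target_start_year, target_end_year = 1970, 2014            # W5E5v2.0 coverage
application_start_year, application_end_year = 2015, 2100  # CMIP6 future period
```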
Then we can outline the steps we need to take to get our desired output.
1. **Save the input data to your machine.** This includes the simulated data, which covers your application period and target period, and the observational data, which covers the target period.
2. **Choose key parameters.** Each climate variable uses slightly different steps/parameters during `basd` that best model the distributions for that variable. These include the distribution family to use, lower/upper bounds and thresholds, the method of trend preservation, etc. You can learn more about each of these parameters on the Bias Adjustment and Statistical Downscaling wiki pages, or especially from Lange's paper. That paper also gives default parameters for each climate variable that are good for nearly all use-cases.
3. **Apply `basd`.** At this point the important decisions are made; you just need to create a Python script and call the `basd` functions while supplying your data and settings/parameters. Of course, this is easier said than done, so the next section is dedicated to this step.
1. In a Python script, load the relevant packages:
   - `basd` – for bias adjustment and statistical downscaling
   - `xarray` – for accessing, manipulating, and writing datasets
   - `dask` – for parallelization and lazy evaluation
   - `LocalCluster` – to spread processes over computing resources
   - `Client` – for managing processes, threads, the local cluster, etc.
   - `os` – optional, for basic OS tasks

```python
import os

import basd
import xarray as xr
from dask.distributed import LocalCluster, Client
```

2. Read in your data, and extract the three key periods.
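The imports above bring in `LocalCluster` and `Client`, though the snippets below never show them in use. A minimal sketch of starting one (worker and thread counts are illustrative; tune them to your machine):

```python
from dask.distributed import Client, LocalCluster

# Illustrative: a small local cluster. processes=False keeps workers
# in threads within this process; drop it to use separate worker
# processes for real workloads.
cluster = LocalCluster(n_workers=2, threads_per_worker=2, processes=False)
client = Client(cluster)
```

Once a `Client` is active, chunked `xarray`/`dask` operations are scheduled on the cluster automatically; call `client.close()` and `cluster.close()` when finished.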
```python
# Read in data
observational_data = xr.open_mfdataset(obs_input_file, chunks={'time': 365})
simulation_target = xr.open_mfdataset(sim_input_file, chunks={'time': 365})
simulation_application = simulation_target.copy()

# Slice time periods
observational_data = observational_data.sel(time=slice(f'{target_start_year}', f'{target_end_year}'))
simulation_target = simulation_target.sel(time=slice(f'{target_start_year}', f'{target_end_year}'))
simulation_application = simulation_application.sel(time=slice(f'{application_start_year}', f'{application_end_year}'))
```
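One thing worth knowing about the slicing above: xarray's label-based slices with year strings include both endpoints. A tiny self-contained check on synthetic data:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily dataset spanning more than the target period
time = pd.date_range('1969-01-01', '2016-12-31', freq='D')
ds = xr.Dataset({'pr': ('time', np.zeros(time.size))}, coords={'time': time})

# Year-string slices keep 1970-01-01 through 2014-12-31, endpoints included
target = ds.sel(time=slice('1970', '2014'))
```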
3. Set your settings and parameters with the `Parameters` object.

Wet-day precipitation in a grid cell over time follows a gamma distribution, which has a lower bound of 0. Thus we set this in our parameters, and we set a lower threshold of 0.0000011574 since we specified "wet days". That exact value is referenced in Lange 2019 and so used here. We also specify that we want a "mixed" additive and multiplicative trend preservation. Again, you can learn more about what all of these mean in Lange 2019, or on the other wiki pages. The `n_iterations` parameter sets how many times we perform the downscaling fitting step.

```python
# Default Precipitation Parameters
parameters = basd.Parameters(
    lower_bound=0,
    lower_threshold=0.0000011574,
    trend_preservation='mixed',
    distribution='gamma',
    n_iterations=10
)
```
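As a sanity check on that threshold value: assuming `pr` is in the standard CMIP unit of kg m⁻² s⁻¹, 0.0000011574 is simply a 0.1 mm/day wet-day cutoff converted to per-second units:

```python
# 0.1 mm/day of rain is 0.1 kg m-2 per day; divide by seconds per day
mm_per_day = 0.1
seconds_per_day = 86400
threshold = mm_per_day / seconds_per_day  # ~1.1574e-06 kg m-2 s-1
```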
4. Apply the `basd` functions.

**4.1. Bias Adjustment**

First we initialize the bias adjustment step by providing our data, parameters, and the name of the variable that we're adjusting.

```python
# Initialize Bias Adjustment
ba = basd.init_bias_adjustment(
    observational_data,
    simulation_target,
    simulation_application,
    'pr',
    parameters
)
```

Then we pass in the output of that initialization, as well as output paths and file names depending on whether we want daily or monthly data output, or both as in this case. This performs the bias adjustment algorithm and saves the output as NetCDF.

```python
# Perform Bias Adjustment
basd.adjust_bias(
    init_output=ba,                      # Initialization output
    output_dir=output_ba_path,           # Output directory path
    day_file=output_day_ba_file_name,    # Output daily data file name
    month_file=output_mon_ba_file_name   # Output monthly data file name
)
```
**4.2. Statistical Downscaling**

Again, we begin by initializing our downscaling object, which requires the input datasets, the name of the climate variable, and the parameter object. The input datasets are the observational data and the bias-adjusted data created in the previous step.
We can read in our bias-adjusted data:

```python
ba_file_name = os.path.join(output_ba_path, output_day_ba_file_name)
ba_simulation_data = xr.open_mfdataset(
    ba_file_name,
    chunks={'time': 100}
)
```
and then initialize our downscaling process:
```python
ds = basd.init_downscaling(
    observational_data,
    ba_simulation_data,
    'pr',
    parameters
)
```
Again, just like the bias adjustment, we can now run the downscaling by supplying the output of that initialization, as well as output paths and file names depending on whether we want daily or monthly data output, or both as in this case. This performs the downscaling algorithm and saves the output as NetCDF.
```python
basd.downscale(
    init_output=ds,
    output_dir=output_basd_path,
    day_file=output_day_basd_file,
    month_file=output_mon_basd_file,
    encoding={'pr': fine_encoding}
)
```
In this case we also supplied an encoding for the precipitation data as it is saved to NetCDF. It may look like the following:

```python
# NetCDF output encoding
fine_encoding = {
    'zlib': True,                 # Use zlib compression
    'shuffle': True,              # Byte-shuffle filter, often useful when lots of 0s are present
    'complevel': 5,               # Compression level
    'fletcher32': False,          # Optional checksums
    'contiguous': False,          # Storage option good for large chunked data
    'chunksizes': (1, 360, 720),  # 1 time step per chunk, 360 lat, 720 lon
    'dtype': 'float32',
    'missing_value': 1e+20,
    '_FillValue': 1e+20
}
```
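For scale, the `chunksizes` above put one full 360 × 720 field (a global 0.5° grid) in each chunk along time; a rough estimate of the per-chunk footprint at `float32`:

```python
# One chunk: 1 time step x 360 lat x 720 lon, 4 bytes per float32 value
n_values = 1 * 360 * 720
chunk_bytes = n_values * 4  # about 1 MB per chunk
```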