Create a Price Forecast Model with Probabilistic Predictions
This tutorial walks you through creating a price forecast model with prediction intervals
Tutorial Overview
In this tutorial, we will explore two different ways to model forecast uncertainty using wholesale electricity prices as our example.
- You will create and deploy nonparametric prediction intervals by directly forecasting quantiles with LightGBM.
- You will create and deploy parametric prediction intervals by forecasting the parameters of a Laplace distributions using NGBoost. Quantiles will then be generated using an operation connector.
Model Performance
Note that the models we are using in this tutorial are not optimized. The goal of this tutorial is to show you how to create and deploy a probabilistic price forecasting models. We do not recommend using these models in production.
Overview
For this tutorial, you will leverage the client library to create two NoTS. Both graphs will forecast the day-ahead locational marginal price (DA LMP) for the DLAP PGAE-APND price node. We will be using the Yes Energy source connector to access historical price data. The graphs differ in the way they model uncertainty.
- The first NoTS will use three LightGBM models to forecast the 5th percentile, the mean, and the 95th percentile.
- The second NoTS will use a single NGBoost model using a Laplace distribution. The model will output forecasts for the location and scale distribution parameters. We then use a Laplace distribution operation connector to generate forecasts for the 5th percentile, the mean, and the 95th percentile time series.
Prediction Intervals using LightGBM
This tutorial creates a graph with three output time series. The graph below shows an example of the three time series over the course of two weeks. The green line on top signifies the 95th percentile of the predictions range, the blue line the mean, and the red line the 5th percentile.
Create your NoTS
Client Library
To create the NoTS in the client library, you can use the following code.
import myst
from myst.connectors.model_connectors import lightgbm
from myst.connectors.operation_connectors import time_transformations
from myst.connectors.source_connectors import time_trends
from myst.connectors.source_connectors import yes_energy
from myst.recipes.time_series_recipes import the_weather_company
# Define the sample period used in this project.
SAMPLE_PERIOD = myst.TimeDelta("PT1H")
# Define the model fit policy to train on the most recent year of data.
MODEL_FIT_START_TIMING = myst.TimeDelta("-P1Y")
MODEL_FIT_END_TIMING = myst.TimeDelta("PT1H")
MODEL_FIT_SCHEDULE_TIMING = myst.TimeDelta("PT24H")
# Define the time series run policy to forecast the next 2 days once an hour.
FORECAST_START_TIMING = myst.TimeDelta("PT1H")
FORECAST_END_TIMING = myst.TimeDelta("PT49H")
FORECAST_SCHEDULE_TIMING = myst.TimeDelta("PT1H")
# Define the Yes Energy items of interest.
DALMP_ITEM = yes_energy.YesEnergyItem(datatype="DALMP", object_id=20000004194) # DLAP_PGAE-APND
LOAD_FORECAST_7DAY_ITEM = yes_energy.YesEnergyItem(datatype="LOAD_FORECAST_7DAY", object_id=10000328796) # PG&E
LOAD_FORECAST_2DAY_ITEM = yes_energy.YesEnergyItem(datatype="LOAD_FORECAST_2DAY", object_id=10000328796) # PG&E
SOLAR_FORECAST_ITEM = yes_energy.YesEnergyItem(datatype="SOLAR_FORECAST", object_id=10002494909) # NP15
WIND_FORECAST_ITEM = yes_energy.YesEnergyItem(datatype="WIND_FORECAST", object_id=10002494909) # NP15
def _get_label_indexer(yes_energy_item):
"""Returns the label indexer for a Yes Energy item."""
return f"{yes_energy_item.datatype}_{yes_energy_item.object_id}"
myst.authenticate()
# Create a new Project.
project = myst.Project.create(title="Prediction Intervals DA LMP")
# Create the Yes Energy target and feature time series.
yes_energy_source = project.create_source(
title="Yes Energy",
connector=yes_energy.YesEnergy(
items=[DALMP_ITEM, LOAD_FORECAST_7DAY_ITEM, LOAD_FORECAST_2DAY_ITEM, SOLAR_FORECAST_ITEM, WIND_FORECAST_ITEM],
stat=yes_energy.YesEnergyAggregation.AVG,
),
)
dalmp_ts = yes_energy_source.create_time_series(
title="DA LMP",
sample_period=SAMPLE_PERIOD,
label_indexer=_get_label_indexer(yes_energy_item=DALMP_ITEM),
)
load_forecast_7day_ts = yes_energy_source.create_time_series(
title="Load Forecast (7 day)",
sample_period=SAMPLE_PERIOD,
label_indexer=_get_label_indexer(yes_energy_item=LOAD_FORECAST_7DAY_ITEM),
)
load_forecast_2day_ts = yes_energy_source.create_time_series(
title="Load Forecast (2 day)",
sample_period=SAMPLE_PERIOD,
label_indexer=_get_label_indexer(yes_energy_item=LOAD_FORECAST_2DAY_ITEM),
)
solar_forecast_ts = yes_energy_source.create_time_series(
title="Solar Forecast",
sample_period=SAMPLE_PERIOD,
label_indexer=_get_label_indexer(yes_energy_item=SOLAR_FORECAST_ITEM),
)
wind_forecast_ts = yes_energy_source.create_time_series(
title="Wind Forecast",
sample_period=SAMPLE_PERIOD,
label_indexer=_get_label_indexer(yes_energy_item=WIND_FORECAST_ITEM),
)
load_forecast_ts = project.create_time_series(title="Load Forecast", sample_period=SAMPLE_PERIOD)
load_forecast_ts.create_layer(node=load_forecast_2day_ts, order=0) # Use 2-day forecast where possible
load_forecast_ts.create_layer(node=load_forecast_7day_ts, order=1) # Otherwise 7-day
# Create an hour of day, day of week, and day of year time series from a time trends source.
time_trends_source = project.create_source(
title="Time Trends",
connector=time_trends.TimeTrends(
sample_period=SAMPLE_PERIOD,
time_zone="UTC",
fields=[time_trends.Field.HOUR_OF_DAY, time_trends.Field.DAY_OF_WEEK, time_trends.Field.DAY_OF_YEAR],
),
)
hour_of_day_time_series = time_trends_source.create_time_series(
title="Hour of Day",
sample_period=SAMPLE_PERIOD,
label_indexer=time_trends.Field.HOUR_OF_DAY,
)
day_of_week_time_series = time_trends_source.create_time_series(
title="Day of Week",
sample_period=SAMPLE_PERIOD,
label_indexer=time_trends.Field.DAY_OF_WEEK,
)
day_of_year_time_series = time_trends_source.create_time_series(
title="Day of Year",
sample_period=SAMPLE_PERIOD,
label_indexer=time_trends.Field.DAY_OF_YEAR,
)
# Create a temperature, humidity, and wind speed time series using a The Weather Company recipe.
khaf_temperature_ts = project.create_time_series_from_recipe(
recipe=the_weather_company.TheWeatherCompany(
metar_station=the_weather_company.MetarStation.KHAF,
field=the_weather_company.Field.TEMPERATURE,
)
)
ksfo_temperature_ts = project.create_time_series_from_recipe(
recipe=the_weather_company.TheWeatherCompany(
metar_station=the_weather_company.MetarStation.KSFO,
field=the_weather_company.Field.TEMPERATURE,
)
)
ksfo_humidity_ts = project.create_time_series_from_recipe(
recipe=the_weather_company.TheWeatherCompany(
metar_station=the_weather_company.MetarStation.KSFO,
field=the_weather_company.Field.RELATIVE_HUMIDITY,
)
)
ksfo_wind_speed_ts = project.create_time_series_from_recipe(
recipe=the_weather_company.TheWeatherCompany(
metar_station=the_weather_company.MetarStation.KSFO,
field=the_weather_company.Field.WIND_SPEED,
)
)
# Create a 6-hour rolling mean of the KHAF temperature time series.
rolling_mean_operation = project.create_operation(
title="Rolling PT6H Mean",
connector=time_transformations.TimeTransformations(
rolling_window_parameters=time_transformations.RollingWindowParameters(
window_period=myst.TimeDelta("PT6H"),
min_periods=1,
centered=False,
aggregation_function=time_transformations.AggregationFunction.MEAN,
),
),
)
rolling_mean_operation.create_input(time_series=khaf_temperature_ts, group_name=time_transformations.GroupName.OPERANDS)
khaf_temperature_rolling_6h_mean_ts = rolling_mean_operation.create_time_series(
title=f"{khaf_temperature_ts.title} - Rolling PT6H Mean", sample_period=SAMPLE_PERIOD
)
# Create a 48-hour lag of the DA LMP price at the node.
lag_operation = project.create_operation(
title="Lagged PT48H",
connector=time_transformations.TimeTransformations(
shift_parameters=time_transformations.ShiftParameters(shift_period=myst.TimeDelta("PT48H")),
),
)
lag_operation.create_input(time_series=dalmp_ts, group_name=time_transformations.GroupName.OPERANDS)
lagged_48hr_dalmp_ts = lag_operation.create_time_series(
title=f"{dalmp_ts.title} - Lagged PT48H", sample_period=SAMPLE_PERIOD
)
for title, objective, alpha in (
("Mean", "mean_squared_error", None),
("P5", "quantile", 0.05),
("P95", "quantile", 0.95),
):
# Create a LightGBM model.
model = project.create_model(
title=f"{title} Model",
connector=lightgbm.LightGBM(
objective=objective,
num_boost_round=500,
max_depth=8,
min_child_weight=40,
learning_rate=0.038,
alpha=alpha,
fit_on_null_values=True,
predict_on_null_values=True,
),
)
# Add the features and target to the model.
for feature in [
load_forecast_ts,
solar_forecast_ts,
wind_forecast_ts,
hour_of_day_time_series,
day_of_week_time_series,
day_of_year_time_series,
ksfo_temperature_ts,
ksfo_humidity_ts,
ksfo_wind_speed_ts,
khaf_temperature_ts,
khaf_temperature_rolling_6h_mean_ts,
lagged_48hr_dalmp_ts,
]:
model.create_input(time_series=feature, group_name=lightgbm.GroupName.FEATURES)
model.create_input(time_series=dalmp_ts, group_name=lightgbm.GroupName.TARGETS)
# Add a fit policy to the model.
model.create_fit_policy(
schedule_timing=MODEL_FIT_SCHEDULE_TIMING,
start_timing=MODEL_FIT_START_TIMING,
end_timing=MODEL_FIT_END_TIMING
)
# Create a time series with the model predictions.
forecast_ts = model.create_time_series(title=f"{title} Forecast", sample_period=SAMPLE_PERIOD)
# Add a run policy to the time series.
forecast_ts.create_run_policy(
schedule_timing=FORECAST_SCHEDULE_TIMING,
start_timing=FORECAST_START_TIMING,
end_timing=FORECAST_END_TIMING,
)
Deploy your NoTS
Client Library
The code below will deploy your project, creating a first model fit and time series run immediately.
# Deploy the project.
project.deploy("My Deployment")
# Create ad hoc time series node run job.
forecast_time_series = ["P95 Forecast", "Mean Forecast", "P5 Forecast"]
for time_series in project.list_nodes():
if time_series.title in forecast_time_series:
time_series_run_job = time_series.run(
start_timing=FORECAST_START_TIMING,
end_timing=FORECAST_END_TIMING,
)
Probabilistic Predictions with NGBoost
This tutorial builds a NoTS with a single NGBoost model, using a Laplace distribution. The NGBoost model outputs time series for the location and scale distribution parameters. Because the Laplace distribution is symmetric, the location time series can be interpreted as the forecasted mean. To generate the 5th and the 95th percentiles, we pass the location and scale time series into an operation connector that exposes the Laplace distribution percent-point function.
Create your NoTS
Client Library
import myst
from myst.connectors.model_connectors import ngboost
from myst.connectors.operation_connectors import laplace_distribution
from myst.connectors.operation_connectors import time_transformations
from myst.connectors.source_connectors import time_trends
from myst.connectors.source_connectors import yes_energy
from myst.recipes.time_series_recipes import the_weather_company
# Define the sample period used in this project.
SAMPLE_PERIOD = myst.TimeDelta("PT1H")
# Define the model fit policy to train on the most recent year of data.
MODEL_FIT_START_TIMING = myst.TimeDelta("-P1Y")
MODEL_FIT_END_TIMING = myst.TimeDelta("PT0H")
MODEL_FIT_SCHEDULE_TIMING = myst.TimeDelta("PT24H")
# Define the time series run policy to forecast the next 2 days once an hour.
FORECAST_START_TIMING = myst.TimeDelta("PT1H")
FORECAST_END_TIMING = myst.TimeDelta("PT49H")
FORECAST_SCHEDULE_TIMING = myst.TimeDelta("PT1H")
# Define the Yes Energy items of interest.
DALMP_ITEM = yes_energy.YesEnergyItem(datatype="DALMP", object_id=20000004194) # DLAP_PGAE-APND
LOAD_FORECAST_7DAY_ITEM = yes_energy.YesEnergyItem(datatype="LOAD_FORECAST_7DAY", object_id=10000328796) # PG&E
LOAD_FORECAST_2DAY_ITEM = yes_energy.YesEnergyItem(datatype="LOAD_FORECAST_2DAY", object_id=10000328796) # PG&E
SOLAR_FORECAST_ITEM = yes_energy.YesEnergyItem(datatype="SOLAR_FORECAST", object_id=10002494909) # NP15
WIND_FORECAST_ITEM = yes_energy.YesEnergyItem(datatype="WIND_FORECAST", object_id=10002494909) # NP15
def _get_label_indexer(yes_energy_item):
"""Returns the label indexer for a Yes Energy item."""
return f"{yes_energy_item.datatype}_{yes_energy_item.object_id}"
myst.authenticate()
# Create a new Project.
project = myst.Project.create(title="Prediction Intervals DA LMP")
# Create the Yes Energy target and feature time series.
yes_energy_source = project.create_source(
title="Yes Energy",
connector=yes_energy.YesEnergy(
items=[
DALMP_ITEM,
LOAD_FORECAST_7DAY_ITEM,
LOAD_FORECAST_2DAY_ITEM,
SOLAR_FORECAST_ITEM,
WIND_FORECAST_ITEM,
],
stat=yes_energy.YesEnergyAggregation.AVG,
),
)
dalmp_ts = yes_energy_source.create_time_series(
title="DA LMP",
sample_period=SAMPLE_PERIOD,
label_indexer=_get_label_indexer(yes_energy_item=DALMP_ITEM),
)
load_forecast_7day_ts = yes_energy_source.create_time_series(
title="Load Forecast (7 day)",
sample_period=SAMPLE_PERIOD,
label_indexer=_get_label_indexer(yes_energy_item=LOAD_FORECAST_7DAY_ITEM),
)
load_forecast_2day_ts = yes_energy_source.create_time_series(
title="Load Forecast (2 day)",
sample_period=SAMPLE_PERIOD,
label_indexer=_get_label_indexer(yes_energy_item=LOAD_FORECAST_2DAY_ITEM),
)
solar_forecast_ts = yes_energy_source.create_time_series(
title="Solar Forecast",
sample_period=SAMPLE_PERIOD,
label_indexer=_get_label_indexer(yes_energy_item=SOLAR_FORECAST_ITEM),
)
wind_forecast_ts = yes_energy_source.create_time_series(
title="Wind Forecast",
sample_period=SAMPLE_PERIOD,
label_indexer=_get_label_indexer(yes_energy_item=WIND_FORECAST_ITEM),
)
load_forecast_ts = project.create_time_series(title="Load Forecast", sample_period=SAMPLE_PERIOD)
load_forecast_ts.create_layer(node=load_forecast_2day_ts, order=0) # Use 2-day forecast where possible
load_forecast_ts.create_layer(node=load_forecast_7day_ts, order=1) # Otherwise 7-day
# Create an hour of day, day of week, and day of year time series from a time trends source.
time_trends_source = project.create_source(
title="Time Trends",
connector=time_trends.TimeTrends(
sample_period=SAMPLE_PERIOD,
time_zone="UTC",
fields=[
time_trends.Field.HOUR_OF_DAY,
time_trends.Field.DAY_OF_WEEK,
time_trends.Field.DAY_OF_YEAR,
],
),
)
hour_of_day_time_series = time_trends_source.create_time_series(
title="Hour of Day",
sample_period=SAMPLE_PERIOD,
label_indexer=time_trends.Field.HOUR_OF_DAY,
)
day_of_week_time_series = time_trends_source.create_time_series(
title="Day of Week",
sample_period=SAMPLE_PERIOD,
label_indexer=time_trends.Field.DAY_OF_WEEK,
)
day_of_year_time_series = time_trends_source.create_time_series(
title="Day of Year",
sample_period=SAMPLE_PERIOD,
label_indexer=time_trends.Field.DAY_OF_YEAR,
)
# Create a temperature, humidity, and wind speed time series using a The Weather Company recipe.
khaf_temperature_ts = project.create_time_series_from_recipe(
recipe=the_weather_company.TheWeatherCompany(
metar_station=the_weather_company.MetarStation.KHAF,
field=the_weather_company.Field.TEMPERATURE,
)
)
ksfo_temperature_ts = project.create_time_series_from_recipe(
recipe=the_weather_company.TheWeatherCompany(
metar_station=the_weather_company.MetarStation.KSFO,
field=the_weather_company.Field.TEMPERATURE,
)
)
ksfo_humidity_ts = project.create_time_series_from_recipe(
recipe=the_weather_company.TheWeatherCompany(
metar_station=the_weather_company.MetarStation.KSFO,
field=the_weather_company.Field.RELATIVE_HUMIDITY,
)
)
ksfo_wind_speed_ts = project.create_time_series_from_recipe(
recipe=the_weather_company.TheWeatherCompany(
metar_station=the_weather_company.MetarStation.KSFO,
field=the_weather_company.Field.WIND_SPEED,
)
)
# Create a 6-hour rolling mean of the KHAF temperature time series.
rolling_mean_operation = project.create_operation(
title="Rolling PT6H Mean",
connector=time_transformations.TimeTransformations(
rolling_window_parameters=time_transformations.RollingWindowParameters(
window_period=myst.TimeDelta("PT6H"),
min_periods=1,
centered=False,
aggregation_function=time_transformations.AggregationFunction.MEAN,
),
),
)
rolling_mean_operation.create_input(
time_series=khaf_temperature_ts,
group_name=time_transformations.GroupName.OPERANDS,
)
khaf_temperature_rolling_6h_mean_ts = rolling_mean_operation.create_time_series(
title=f"{khaf_temperature_ts.title} - Rolling PT6H Mean",
sample_period=SAMPLE_PERIOD,
)
# Create a 48-hour lag of the DA LMP price at the node.
lag_operation = project.create_operation(
title="Lagged PT48H",
connector=time_transformations.TimeTransformations(
shift_parameters=time_transformations.ShiftParameters(shift_period=myst.TimeDelta("PT48H")),
),
)
lag_operation.create_input(time_series=dalmp_ts, group_name=time_transformations.GroupName.OPERANDS)
lagged_48hr_dalmp_ts = lag_operation.create_time_series(
title=f"{dalmp_ts.title} - Lagged PT48H", sample_period=SAMPLE_PERIOD
)
# Create a NGBoost model.
model = project.create_model(
title="NGBoost Laplace Model",
connector=ngboost.NGBoost(
distribution=ngboost.NGBoostDistribution.LAPLACE,
n_estimators=500,
learning_rate=0.038,
max_depth=8,
),
)
# Add the features and target to the model.
for feature in [
load_forecast_ts,
solar_forecast_ts,
wind_forecast_ts,
hour_of_day_time_series,
day_of_week_time_series,
day_of_year_time_series,
ksfo_temperature_ts,
ksfo_humidity_ts,
ksfo_wind_speed_ts,
khaf_temperature_ts,
khaf_temperature_rolling_6h_mean_ts,
lagged_48hr_dalmp_ts,
]:
model.create_input(time_series=feature, group_name=ngboost.GroupName.FEATURES)
model.create_input(time_series=dalmp_ts, group_name=ngboost.GroupName.TARGETS)
# Add a fit policy to the model.
model.create_fit_policy(
schedule_timing=MODEL_FIT_SCHEDULE_TIMING,
start_timing=MODEL_FIT_START_TIMING,
end_timing=MODEL_FIT_END_TIMING,
)
# Create time series for the model forecasts of location and scale parameters.
forecast_loc_ts = model.create_time_series(
title="NGBoost Laplace Forecast Location",
sample_period=SAMPLE_PERIOD,
label_indexer=ngboost.LabelIndexer.LOC,
)
forecast_scale_ts = model.create_time_series(
title="NGBoost Laplace Forecast Scale",
sample_period=SAMPLE_PERIOD,
label_indexer=ngboost.LabelIndexer.SCALE,
)
# Create time series for lower and upper bounds.
lower_bound_operation = project.create_operation(
title="Laplace Forecast P5",
connector=laplace_distribution.LaplaceDistribution(function=laplace_distribution.Function.PPF, value=0.05),
)
lower_bound_operation.create_input(time_series=forecast_loc_ts, group_name=laplace_distribution.GroupName.LOC)
lower_bound_operation.create_input(time_series=forecast_scale_ts, group_name=laplace_distribution.GroupName.SCALE)
forecasts_lower_bound = lower_bound_operation.create_time_series(
title=lower_bound_operation.title, sample_period=SAMPLE_PERIOD
)
upper_bound_operation = project.create_operation(
title="Laplace Forecast P95",
connector=laplace_distribution.LaplaceDistribution(function=laplace_distribution.Function.PPF, value=0.95),
)
upper_bound_operation.create_input(time_series=forecast_loc_ts, group_name=laplace_distribution.GroupName.LOC)
upper_bound_operation.create_input(time_series=forecast_scale_ts, group_name=laplace_distribution.GroupName.SCALE)
forecasts_upper_bound = upper_bound_operation.create_time_series(
title=upper_bound_operation.title, sample_period=SAMPLE_PERIOD
)
# Add a run policy to the time series.
for ts in (forecast_loc_ts, forecasts_lower_bound, forecasts_upper_bound):
ts.create_run_policy(
schedule_timing=FORECAST_SCHEDULE_TIMING,
start_timing=FORECAST_START_TIMING,
end_timing=FORECAST_END_TIMING,
)
Deploy your NoTS
Client Library
The code below will deploy your project, creating a first model fit and time series run immediately.
# Deploy the project.
project.deploy("My Deployment")
# Create ad hoc time series node run.
for ts in (forecast_loc_ts, forecasts_lower_bound, forecasts_upper_bound):
time_series_run_job = time_series.run(
start_timing=FORECAST_START_TIMING,
end_timing=FORECAST_END_TIMING,
)
Updated almost 2 years ago