Create a Solar Generation Forecast

This tutorial walks you through creating a forecast for solar generation.


Tutorial Overview

In this tutorial, we explain how to create and deploy a Network of Time Series (NoTS) that generates a solar generation forecast. Once deployed, the forecast is regenerated every 15 minutes on an ongoing basis.

Overview

This tutorial builds a solar generation forecast from features and methods we have found to be impactful: lagged solar production data, rolling and interpolated weather data, solar position data, and time trends (each is described below). These features feed an XGBoost model that predicts generation at 15-minute resolution.

Lagged Historical Production

This tutorial uses solar production lagged by 24 hours as a feature, so the graph can use the previous day's production data when generating forecasts.

The method below shows how to create the shifted time series. Calling it connects the original time series to a shift operation node, which in turn flows into a new terminal time series that holds the output of the shift operation.

def create_shifted_time_series(time_series: myst.TimeSeries, shift_period: myst.TimeDelta) -> myst.TimeSeries:
    shift_operation = project.create_operation(
        title=f"{shift_period} Shift",
        connector=time_transformations.TimeTransformations(
            shift_parameters=time_transformations.ShiftParameters(shift_period=shift_period)
        ),
    )
    shift_operation.create_input(time_series, group_name=time_transformations.GroupName.OPERANDS)

    return shift_operation.create_time_series(
        title=f"{time_series.title} [{shift_period} Shifted]", sample_period=SAMPLE_PERIOD
    )
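To make the effect of the shift concrete, the sketch below reproduces a 24-hour lag in plain Python on a dictionary of 15-minute samples. It assumes that a positive shift period moves each value later in time, so the shifted series at time t holds the original value from t - 24h. The helper `shift_series` is hypothetical and is not part of the Myst client library.

```python
from datetime import datetime, timedelta, timezone


def shift_series(series: dict[datetime, float], shift_period: timedelta) -> dict[datetime, float]:
    """Return a copy of `series` with every timestamp moved later by `shift_period`.

    Hypothetical helper: assumes the shift moves values forward in time, so the
    shifted value at time t equals the original value at t - shift_period.
    """
    return {timestamp + shift_period: value for timestamp, value in series.items()}


# Two days of dummy 15-minute production data (96 samples per day).
start = datetime(2023, 6, 1, tzinfo=timezone.utc)
sample_period = timedelta(minutes=15)
production = {start + i * sample_period: float(i % 96) for i in range(2 * 96)}

lagged = shift_series(production, timedelta(hours=24))

# The lagged feature at noon on day 2 is the production observed at noon on day 1.
noon_day_2 = start + timedelta(days=1, hours=12)
print(lagged[noon_day_2] == production[noon_day_2 - timedelta(hours=24)])  # True
```

Note that the lagged series extends 24 hours past the last observed sample, which is why a 24-hour lag can supply feature values across a forecast horizon of up to 24 hours.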

Interpolated Weather Data

The NoTS in this tutorial generates predictions at 15-minute intervals, but AG2 weather data is currently available only at hourly resolution. We therefore use interpolation to resample the weather data to 15-minute intervals, as shown in the method below.

def create_interpolated_time_series(time_series: myst.TimeSeries) -> myst.TimeSeries:
    interpolate_operation = project.create_operation(
        title="PT15M Interpolate",
        connector=resampling.Resampling(
            sample_period=SAMPLE_PERIOD,
            resampling_function=resampling.ResamplingFunction.INTERPOLATE,
        ),
    )
    interpolate_operation.create_input(time_series, group_name=time_transformations.GroupName.OPERANDS)

    return interpolate_operation.create_time_series(title=time_series.title, sample_period=SAMPLE_PERIOD)
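The sketch below shows what upsampling hourly weather to 15-minute intervals looks like, assuming the INTERPOLATE resampling function interpolates linearly in time (an assumption; the connector's exact method is not stated here). The helper `interpolate_linear` is hypothetical and works on a plain list of `(timestamp, value)` pairs rather than on Myst objects.

```python
from datetime import datetime, timedelta, timezone


def interpolate_linear(
    hourly: list[tuple[datetime, float]], sample_period: timedelta
) -> list[tuple[datetime, float]]:
    """Upsample sorted (timestamp, value) pairs to `sample_period` by linear interpolation.

    Hypothetical helper: assumes the resampling is linear in time.
    """
    result = []
    for (t0, v0), (t1, v1) in zip(hourly, hourly[1:]):
        t = t0
        while t < t1:
            fraction = (t - t0) / (t1 - t0)  # timedelta / timedelta -> float
            result.append((t, v0 + fraction * (v1 - v0)))
            t += sample_period
    # The last hourly point has no right-hand neighbor, so it is kept as-is.
    result.append(hourly[-1])
    return result


# Hourly cloud coverage (percent) at 00:00, 01:00, and 02:00 UTC.
base = datetime(2023, 6, 1, tzinfo=timezone.utc)
hourly_cloud_cover = [(base, 20.0), (base + timedelta(hours=1), 60.0), (base + timedelta(hours=2), 40.0)]

upsampled = interpolate_linear(hourly_cloud_cover, timedelta(minutes=15))
for timestamp, value in upsampled:
    print(timestamp.strftime("%H:%M"), value)
```

Each hourly interval gains three intermediate points. Samples after the most recent hourly value cannot be interpolated until the next hourly value arrives, which is worth keeping in mind when reading the most recent weather features.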

Rolling Weather Features

The method below creates a time series containing the rolling average of the input time series. Smoothing helps with features such as weather that can fluctuate greatly between samples. This graph applies it to weather features such as cloud cover and humidity, both of which can affect solar generation.

def create_rolling_time_series(time_series: myst.TimeSeries, rolling_period: myst.TimeDelta) -> myst.TimeSeries:
    rolling_operation = project.create_operation(
        title=f"{rolling_period} Rolling Average",
        connector=time_transformations.TimeTransformations(
            rolling_window_parameters=time_transformations.RollingWindowParameters(
                window_period=rolling_period,
                min_periods=1,
                centered=False,
                aggregation_function=time_transformations.AggregationFunction.MEAN,
            )
        ),
    )
    rolling_operation.create_input(time_series, group_name=time_transformations.GroupName.OPERANDS)

    return rolling_operation.create_time_series(
        title=f"{time_series.title} [{rolling_period} Rolling Average]", sample_period=SAMPLE_PERIOD
    )
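The parameters above describe a trailing (`centered=False`) mean that is emitted as soon as one sample is available (`min_periods=1`). The sketch below implements that behavior in plain Python so you can see how a single spike is smoothed. It assumes the window holds the current sample plus the preceding ones, up to `window_period // sample_period` samples; the helper `trailing_rolling_mean` is hypothetical.

```python
from collections import deque
from datetime import timedelta


def trailing_rolling_mean(values: list[float], window_period: timedelta, sample_period: timedelta) -> list[float]:
    """Trailing (non-centered) rolling mean with min_periods=1.

    Hypothetical helper: the window covers the current sample and the preceding
    samples, up to window_period // sample_period samples in total.
    """
    window_size = window_period // sample_period  # e.g. PT5H / PT15M = 20 samples
    window = deque(maxlen=window_size)  # oldest sample drops out automatically
    means = []
    for value in values:
        window.append(value)
        # With min_periods=1, a mean is emitted even before the window is full.
        means.append(sum(window) / len(window))
    return means


# One fully cloudy 15-minute sample followed by clear skies.
cloud_cover = [100.0] + [0.0] * 39
smoothed = trailing_rolling_mean(cloud_cover, timedelta(hours=5), timedelta(minutes=15))
print(smoothed[0], smoothed[19], smoothed[20])  # 100.0 5.0 0.0
```

Because the window is trailing, each smoothed value depends only on current and past samples, so the feature never looks ahead of the time it describes.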

Solar Data

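Solar position data, such as the sun's elevation and azimuth, is also included in the forecast. The Myst platform provides this data through a Solar Position source; the script below creates the source for a given latitude and longitude and pulls elevation and azimuth into the forecast as separate time series.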

solar_position_source = project.create_source(
    title="Solar Position",
    connector=solar_position.SolarPosition(
        sample_period=SAMPLE_PERIOD,
        latitude=43.6249,
        longitude=-72.3086,
        fields=[
            solar_position.Field.ELEVATION,
            solar_position.Field.AZIMUTH,
        ],
    ),
)
solar_elevation_ts = solar_position_source.create_time_series(
    title="Solar Elevation", sample_period=SAMPLE_PERIOD, label_indexer=solar_position.Field.ELEVATION
)
solar_azimuth_ts = solar_position_source.create_time_series(
    title="Solar Azimuth", sample_period=SAMPLE_PERIOD, label_indexer=solar_position.Field.AZIMUTH
)
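For intuition about what the elevation feature encodes, the sketch below approximates solar elevation from latitude, longitude, and a UTC timestamp. It uses a simplified textbook model (Cooper's declination formula and a solar-time estimate that ignores the equation of time), so its values are only approximate; it is not the algorithm used by the Myst SolarPosition connector, and `approximate_solar_elevation` is a hypothetical helper.

```python
import math
from datetime import datetime, timezone


def approximate_solar_elevation(latitude: float, longitude: float, when_utc: datetime) -> float:
    """Approximate solar elevation in degrees (positive means the sun is above the horizon).

    Simplified model for illustration only. `when_utc` must be expressed in UTC,
    because the hour fields are read directly.
    """
    day_of_year = when_utc.timetuple().tm_yday
    # Cooper's approximation of the solar declination, converted to radians.
    declination = math.radians(23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0)))
    hours_utc = when_utc.hour + when_utc.minute / 60.0 + when_utc.second / 3600.0
    # Local solar time: 15 degrees of longitude per hour (equation of time ignored).
    solar_time = hours_utc + longitude / 15.0
    hour_angle = math.radians(15.0 * (solar_time - 12.0))
    lat = math.radians(latitude)
    sin_elevation = (
        math.sin(lat) * math.sin(declination)
        + math.cos(lat) * math.cos(declination) * math.cos(hour_angle)
    )
    return math.degrees(math.asin(sin_elevation))


# Coordinates from the script above, near local solar noon on the June solstice.
noon_elevation = approximate_solar_elevation(43.6249, -72.3086, datetime(2023, 6, 21, 16, 49, tzinfo=timezone.utc))
print(round(noon_elevation, 1))
```

Elevation peaks near solar noon and is negative at night, which gives the model a direct, physically grounded signal for when generation is possible.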

Time Trends

We have found epoch to be a particularly useful feature for predicting solar production: because it increases steadily over time, it helps the model capture gradual changes in production from effects such as dust. Day of year and hour of day are more intuitive, since they capture the seasonal and daily cycles that determine when the sun shines more.

# Create multiple time series from a time trends source.
time_trends_source = project.create_source(
    title="Time Trends",
    connector=time_trends.TimeTrends(
        sample_period=SAMPLE_PERIOD,
        time_zone=TIME_ZONE,
        fields=[
            time_trends.Field.HOUR_OF_DAY,
            time_trends.Field.DAY_OF_YEAR,
            time_trends.Field.EPOCH,
        ],
    ),
)
hour_of_day_ts = time_trends_source.create_time_series(
    title="Hour of Day", sample_period=SAMPLE_PERIOD, label_indexer=time_trends.Field.HOUR_OF_DAY
)
day_of_year_ts = time_trends_source.create_time_series(
    title="Day of Year", sample_period=SAMPLE_PERIOD, label_indexer=time_trends.Field.DAY_OF_YEAR
)
epoch_ts = time_trends_source.create_time_series(
    title="Epoch", sample_period=SAMPLE_PERIOD, label_indexer=time_trends.Field.EPOCH
)
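The sketch below computes the three time trend features for a single timestamp in plain Python. It assumes hour of day and day of year are taken from local time in the configured time zone (the connector takes a `time_zone` argument) and that epoch is seconds since 1970-01-01 UTC; the exact definitions and units the TimeTrends connector uses may differ. `time_trend_features` is a hypothetical helper.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def time_trend_features(when: datetime, time_zone: str) -> dict[str, float]:
    """Hour of day, day of year, and epoch for a timezone-aware timestamp.

    Hypothetical helper: hour and day are read from local time in `time_zone`;
    epoch is assumed to be seconds since 1970-01-01 UTC.
    """
    local = when.astimezone(ZoneInfo(time_zone))
    return {
        "hour_of_day": local.hour,
        "day_of_year": local.timetuple().tm_yday,
        "epoch": when.timestamp(),
    }


# 03:00 UTC on January 1 is still 22:00 on December 31 in New York.
print(time_trend_features(datetime(2023, 1, 1, 3, 0, tzinfo=timezone.utc), "America/New_York"))
```

For this timestamp, `hour_of_day` is 22 and `day_of_year` is 365, which illustrates why the time zone matters: daily and seasonal cycles follow local time, not UTC.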

Putting It All Together

The script below contains all code needed to build the solar generation graph.

import myst
from myst.connectors.model_connectors import xgboost
from myst.connectors.source_connectors import time_trends
from myst.connectors.source_connectors import solar_position
from myst.connectors.operation_connectors import resampling
from myst.connectors.operation_connectors import time_transformations
from myst.recipes.time_series_recipes import the_weather_company

SAMPLE_PERIOD = myst.TimeDelta("PT15M")
RUN_START_TIME = myst.TimeDelta("PT15M")
RUN_END_TIME = myst.TimeDelta("PT24H")
TIME_ZONE = "America/New_York"

myst.authenticate()


def create_shifted_time_series(time_series: myst.TimeSeries, shift_period: myst.TimeDelta) -> myst.TimeSeries:
    shift_operation = project.create_operation(
        title=f"{shift_period} Shift",
        connector=time_transformations.TimeTransformations(
            shift_parameters=time_transformations.ShiftParameters(shift_period=shift_period)
        ),
    )
    shift_operation.create_input(time_series, group_name=time_transformations.GroupName.OPERANDS)

    return shift_operation.create_time_series(
        title=f"{time_series.title} [{shift_period} Shifted]", sample_period=SAMPLE_PERIOD
    )


def create_rolling_time_series(time_series: myst.TimeSeries, rolling_period: myst.TimeDelta) -> myst.TimeSeries:
    rolling_operation = project.create_operation(
        title=f"{rolling_period} Rolling Average",
        connector=time_transformations.TimeTransformations(
            rolling_window_parameters=time_transformations.RollingWindowParameters(
                window_period=rolling_period,
                min_periods=1,
                centered=False,
                aggregation_function=time_transformations.AggregationFunction.MEAN,
            )
        ),
    )
    rolling_operation.create_input(time_series, group_name=time_transformations.GroupName.OPERANDS)

    return rolling_operation.create_time_series(
        title=f"{time_series.title} [{rolling_period} Rolling Average]", sample_period=SAMPLE_PERIOD
    )


def create_interpolated_time_series(time_series: myst.TimeSeries) -> myst.TimeSeries:
    interpolate_operation = project.create_operation(
        title="PT15M Interpolate",
        connector=resampling.Resampling(
            sample_period=SAMPLE_PERIOD,
            resampling_function=resampling.ResamplingFunction.INTERPOLATE,
        ),
    )
    interpolate_operation.create_input(time_series, group_name=time_transformations.GroupName.OPERANDS)

    return interpolate_operation.create_time_series(title=time_series.title, sample_period=SAMPLE_PERIOD)


project = myst.Project.create(title="Solar Example")

# Create a time series that will contain the historical solar data.
# You will insert data into this time series with historical_solar_ts.insert_time_array (see Insert Dummy Data below).
historical_solar_ts = project.create_time_series(title="Historical Solar Generation", sample_period=SAMPLE_PERIOD)

# Create a transformation of the historical solar time series.
shifted_historical_solar_24h_ts = create_shifted_time_series(
    time_series=historical_solar_ts, shift_period=myst.TimeDelta("PT24H")
)

# Create multiple time series from a time trends source.
time_trends_source = project.create_source(
    title="Time Trends",
    connector=time_trends.TimeTrends(
        sample_period=SAMPLE_PERIOD,
        time_zone=TIME_ZONE,
        fields=[
            time_trends.Field.HOUR_OF_DAY,
            time_trends.Field.DAY_OF_YEAR,
            time_trends.Field.EPOCH,
        ],
    ),
)
hour_of_day_ts = time_trends_source.create_time_series(
    title="Hour of Day", sample_period=SAMPLE_PERIOD, label_indexer=time_trends.Field.HOUR_OF_DAY
)
day_of_year_ts = time_trends_source.create_time_series(
    title="Day of Year", sample_period=SAMPLE_PERIOD, label_indexer=time_trends.Field.DAY_OF_YEAR
)

# We've found epoch to be useful because solar generation can change over time
epoch_ts = time_trends_source.create_time_series(
    title="Epoch", sample_period=SAMPLE_PERIOD, label_indexer=time_trends.Field.EPOCH
)

# Create two time series from a solar position source.
# The coordinates below are near the KLEB METAR station used for the weather data;
# update the latitude and longitude to match your site.
solar_position_source = project.create_source(
    title="Solar Position",
    connector=solar_position.SolarPosition(
        sample_period=SAMPLE_PERIOD,
        latitude=43.6249,
        longitude=-72.3086,
        fields=[
            solar_position.Field.ELEVATION,
            solar_position.Field.AZIMUTH,
        ],
    ),
)
solar_elevation_ts = solar_position_source.create_time_series(
    title="Solar Elevation", sample_period=SAMPLE_PERIOD, label_indexer=solar_position.Field.ELEVATION
)
solar_azimuth_ts = solar_position_source.create_time_series(
    title="Solar Azimuth", sample_period=SAMPLE_PERIOD, label_indexer=solar_position.Field.AZIMUTH
)

# Create multiple time series containing weather signals.
# KLEB is used as an example METAR station; replace it with the station nearest your site.
kleb_humidity_ts = create_interpolated_time_series(
    time_series=project.create_time_series_from_recipe(
        recipe=the_weather_company.TheWeatherCompany(
            metar_station=the_weather_company.MetarStation.KLEB, field=the_weather_company.Field.RELATIVE_HUMIDITY
        )
    )
)
kleb_cloud_coverage_ts = create_interpolated_time_series(
    time_series=project.create_time_series_from_recipe(
        recipe=the_weather_company.TheWeatherCompany(
            metar_station=the_weather_company.MetarStation.KLEB, field=the_weather_company.Field.CLOUD_COVERAGE
        )
    )
)

# Create a 5-hour rolling average of cloud coverage.
rolling_5h_kleb_cloud_coverage_ts = create_rolling_time_series(
    time_series=kleb_cloud_coverage_ts, rolling_period=myst.TimeDelta("PT5H")
)

# Create a 5-hour rolling average of humidity.
rolling_5h_kleb_humidity_ts = create_rolling_time_series(
    time_series=kleb_humidity_ts, rolling_period=myst.TimeDelta("PT5H")
)

# Create an XGBoost model and add the features and target.
model = project.create_model(
    title="Model", connector=xgboost.XGBoost(num_boost_round=249, max_depth=4, min_child_weight=100, learning_rate=0.0276)
)
for time_series in [
    # Time features
    day_of_year_ts,
    epoch_ts,
    hour_of_day_ts,
    # Shifted solar features
    shifted_historical_solar_24h_ts,
    # Solar position features
    solar_azimuth_ts,
    solar_elevation_ts,
    # Cloud coverage features
    kleb_cloud_coverage_ts,
    rolling_5h_kleb_cloud_coverage_ts,
    # Humidity features
    kleb_humidity_ts,
    rolling_5h_kleb_humidity_ts,
]:
    model.create_input(time_series, group_name=xgboost.GroupName.FEATURES)
model.create_input(historical_solar_ts, group_name=xgboost.GroupName.TARGETS)

# Add a fit policy to the model.
model.create_fit_policy(
    start_timing=myst.TimeDelta("-P1Y"),
    end_timing=myst.TimeDelta("-PT15M"),
    schedule_timing=myst.TimeDelta("PT24H")
)

# Create a time series for the model predictions.
forecast_ts = model.create_time_series(title="Forecast", sample_period=SAMPLE_PERIOD)

# Add a run policy to the time series.
forecast_ts.create_run_policy(
    start_timing=RUN_START_TIME,
    end_timing=RUN_END_TIME,
    schedule_timing=SAMPLE_PERIOD,
)
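The fit and run policy timings in the script above read as offsets relative to the time each job is scheduled (hence the negative `-P1Y` and `-PT15M` offsets for training data). Under that assumption, the plain-Python sketch below resolves both windows for one hypothetical schedule time, showing which data a fit trains on and which horizon a run forecasts. Python's `timedelta` has no notion of years, so `-P1Y` is approximated as 365 days; the `window` helper is hypothetical and not part of the Myst client library.

```python
from datetime import datetime, timedelta, timezone

# Offsets copied from the policies above ("-P1Y" approximated as 365 days).
FIT_START, FIT_END, FIT_EVERY = timedelta(days=-365), timedelta(minutes=-15), timedelta(hours=24)
RUN_START, RUN_END, RUN_EVERY = timedelta(minutes=15), timedelta(hours=24), timedelta(minutes=15)


def window(scheduled_at: datetime, start: timedelta, end: timedelta) -> tuple[datetime, datetime]:
    """Resolve relative policy timings against the time a job is scheduled."""
    return scheduled_at + start, scheduled_at + end


scheduled_at = datetime(2023, 6, 21, 12, 0, tzinfo=timezone.utc)
fit_window = window(scheduled_at, FIT_START, FIT_END)
run_window = window(scheduled_at, RUN_START, RUN_END)

print("fit trains on", fit_window[0], "to", fit_window[1])
print("run forecasts", run_window[0], "to", run_window[1])
print("runs between consecutive fits:", FIT_EVERY // RUN_EVERY)
```

Each fit uses only data from before its schedule time, while each run looks up to 24 hours ahead; with a daily fit and a 15-minute run schedule, 96 runs share each fitted model.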

Insert Dummy Data

Use the following code to insert dummy data into the historical solar time series so you can observe how data flows through your NoTS and generates forecasts.

Note: Insert this data before deploying to ensure the model has data available for training.

import numpy as np
import pandas as pd

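# Create the dummy data array: a triangular daily profile of 96 samples
# (one day at 15-minute resolution), repeated for 365 days.
ramp = np.arange(0, 48)
daily_profile = np.concatenate((ramp, np.flip(ramp)))
data = np.tile(daily_profile, 365)

# Add noise proportional to each value across the full series, then clip negatives to zero.
data = data + data * np.random.normal(scale=0.1, size=data.size)
data = np.where(data < 0, 0, data)

# Create a pandas timestamp for the current time, floored to the 15-minute grid.
now = pd.Timestamp.now(tz="UTC").floor("15min")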

# Insert the time array into the source time series.
historical_solar_ts.insert_time_array(
    time_array=myst.TimeArray(
        sample_period=SAMPLE_PERIOD,
        start_time=now - pd.Timedelta(days=365),
        end_time=now,
        as_of_time=now,
        values=data,
    )
)
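The number of values must line up with the sample grid between `start_time` and `end_time`. Assuming the interval is half-open (start included, end excluded), a year of 15-minute samples is 365 * 96 = 35,040 values, which matches tiling the 96-sample daily profile 365 times. The hypothetical helper below computes the expected count so you can check an array's length before inserting it.

```python
from datetime import datetime, timedelta, timezone


def expected_sample_count(start: datetime, end: datetime, sample_period: timedelta) -> int:
    """Number of samples in [start, end) at `sample_period` (assumes a half-open interval)."""
    if (end - start) % sample_period:
        raise ValueError("start and end must be aligned to the sample period")
    return (end - start) // sample_period


now = datetime(2023, 6, 21, 12, 0, tzinfo=timezone.utc)
count = expected_sample_count(now - timedelta(days=365), now, timedelta(minutes=15))
print(count)  # 35040
```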

Deploy your NoTS

Once you've finished creating your NoTS, you can deploy your Project through either the Web Application or the client library.

Web Application

To create a new Deployment, click the Deploy button in the top right corner of the Project Create page. Specify a title for your Deployment and then click the Deploy button.

Once you’ve deployed a project, your Model Fit Policy and Time Series Run Policy will begin to run according to their schedules. This means that your Model will be fitted every day and your Time Series will be run every 15 minutes. Note that these Policies will run until you deactivate your Deployment.

Your model will also be fit immediately on deployment, so that any predictions scheduled before the first scheduled fit can succeed. If you want predictions before the first scheduled run, you can create an ad hoc run from your forecast time series on the Create page (click "Run node" from the overflow menu).

To track and verify the results of your Deployment, navigate to the Project Monitor space by clicking on the Monitor tab at the top of the Project page. The results table shows a list of ongoing results that are being generated by your fit and run policies. You can refresh the table by clicking on the refresh icon.

Client Library

The code below deploys your Project, which fits the model immediately, and then triggers an ad hoc run of the forecast time series so that predictions are available right away.

project.deploy(title="My Deployment")

forecast_ts.run(
    start_timing=myst.TimeDelta("PT1H"),
    end_timing=myst.TimeDelta("PT49H"),
)