MLP Regression

This topic explains how the MLP Regression model works.

The MLP Regression model provides a PyTorch implementation of a multilayer perceptron (MLP). MLPs, also referred to as "vanilla" artificial neural networks (ANNs) or deep neural networks (DNNs), consist of one or more hidden layers chained together with non-linear activation functions (the Platform uses the ReLU function). This model type can be a good choice for modeling non-linear functions.

Inputs

The MLP Regression model takes in categorical and continuous features separately. The model will learn embeddings for each categorical feature value, and those embeddings will be used in the input layer of the MLP. The continuous features are standardized within the model implementation.
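The input handling described above can be sketched in PyTorch as follows. This is a minimal illustration, not the Platform's actual implementation: the class name `MLPRegressorSketch`, the `cardinalities` argument, and the way the standardization statistics are stored are all assumptions made for the example.

```python
import torch
from torch import nn


class MLPRegressorSketch(nn.Module):
    """Illustrative MLP with per-feature embeddings (hypothetical, not the Platform's code)."""

    def __init__(self, cardinalities, num_continuous, embedding_dimension=8,
                 hidden_layer_dimensions=(128,), dropout_rate=0.0):
        super().__init__()
        # One embedding table per categorical feature, one row per feature value.
        self.embeddings = nn.ModuleList(
            [nn.Embedding(n, embedding_dimension) for n in cardinalities]
        )
        # Standardization statistics for the continuous features; in practice
        # these would be estimated from the training data.
        self.register_buffer("mean", torch.zeros(num_continuous))
        self.register_buffer("std", torch.ones(num_continuous))
        # Hidden layers chained with ReLU activations and dropout.
        layers = []
        in_dim = len(cardinalities) * embedding_dimension + num_continuous
        for out_dim in hidden_layer_dimensions:
            layers += [nn.Linear(in_dim, out_dim), nn.ReLU(), nn.Dropout(dropout_rate)]
            in_dim = out_dim
        layers.append(nn.Linear(in_dim, 1))  # single regression output
        self.mlp = nn.Sequential(*layers)

    def forward(self, categorical, continuous):
        # categorical: (batch, n_categorical) integer codes
        # continuous:  (batch, n_continuous) floats
        embedded = [emb(categorical[:, i]) for i, emb in enumerate(self.embeddings)]
        standardized = (continuous - self.mean) / self.std
        return self.mlp(torch.cat(embedded + [standardized], dim=1)).squeeze(-1)


# Hour of day (24 values) and day of year modulo 365 (365 values), plus temperature.
model = MLPRegressorSketch(cardinalities=[24, 365], num_continuous=1)
categorical = torch.tensor([[0, 10], [23, 364]])
continuous = torch.tensor([[15.0], [20.0]])
predictions = model(categorical, continuous)  # one prediction per sample
```

Because each embedding table has a fixed number of rows, an integer code outside the range seen at construction time has no row to look up, which is one way to picture why unseen categorical values are not supported.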

The MLP Regression model does not support categorical feature values that were not seen during training. For many common categorical features, such as hour of day or day of week, all feature values will be seen in even a modest number of training samples. However, certain categorical features, such as day of year or week of year, have rarer values (think February 29) that may not show up even in several years of training data. We recommend applying a modulo operator to such features so that those rare values are mapped to values in the expected range of the categorical feature. You can see how this is done in the example graph below.
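As a small illustration of the modulo mapping, the following plain-Python sketch applies `% 365` to a handful of day-of-year values. The sample values are made up for the example.

```python
# Day-of-year values, including day 366, which only occurs in leap years.
day_of_year = [1, 59, 60, 365, 366]

# Apply the modulo so that every value falls in the range 0..364.
day_of_year_mod = [day % 365 for day in day_of_year]

print(day_of_year_mod)  # [1, 59, 60, 0, 1]
```

Note that day 365 maps to 0 and day 366 maps to 1, so the rare leap-year value shares a category with a value that appears every year.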

Outputs

The MLP Regression model takes a single target series to train on and outputs a single target series. The target data is standardized for training, and the standardization is undone at prediction time, so the outputs you see are in the same units and range as the target data you provided.
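The round trip between the original and standardized target scale can be sketched with NumPy. This assumes standardization means subtracting the training mean and dividing by the training standard deviation; the exact statistics the Platform uses are not specified here, and the numbers are made up.

```python
import numpy as np

# Hypothetical target values in their original units.
targets = np.array([120.0, 135.0, 150.0, 165.0])

# Statistics computed once from the training targets.
mean, std = targets.mean(), targets.std()

# The network is trained against standardized targets.
standardized_targets = (targets - mean) / std

# At prediction time, network outputs (in standardized units) are mapped
# back to the original units by inverting the transformation.
standardized_predictions = np.array([0.0, 1.0])
predictions = standardized_predictions * std + mean
```

A standardized prediction of 0 maps back to the training mean, and a prediction of 1 maps back to one standard deviation above it.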

Training and Convergence

The MLP Regression model trains using the Adagrad optimizer for at most the maximum number of epochs specified in the model parameterization. The connector uses Kaiming initialization for the network weights, and it also provides a mechanism for early stopping based on training loss. This early stopping mechanism is controlled by the min_relative_loss_change and patience parameters.
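For readers who want to see what this setup looks like in PyTorch, here is a minimal sketch of one training step. It is not the connector's code: the choice of the normal variant of Kaiming initialization (`kaiming_normal_`), the zero bias initialization, the mean squared error loss, and the random data are all assumptions made for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A small MLP with one hidden layer of size 128 and a ReLU activation.
model = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 1))

# Kaiming (He) initialization of the linear layers (assumed: normal variant).
for module in model:
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        nn.init.zeros_(module.bias)

# Adagrad with the default learning rate and weight decay from the parameter table.
optimizer = torch.optim.Adagrad(model.parameters(), lr=1e-3, weight_decay=0.0)

# One batch of random stand-in data (batch_size=512).
features = torch.randn(512, 4)
targets = torch.randn(512, 1)

weights_before = model[0].weight.detach().clone()

# A single optimization step; MSE is an assumed loss for illustration.
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(features), targets)
loss.backward()
optimizer.step()
```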

The connector computes the average training loss across batches in each epoch and tracks the change in average training loss between consecutive epochs. If the relative change in training loss is less than min_relative_loss_change for patience epochs in a row, training stops.
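The stopping rule can be sketched in plain Python. This is an illustration rather than the connector's code: the function name `early_stopping_epoch` is hypothetical, the argument names follow the parameter table, and the relative change is assumed to be the improvement divided by the previous epoch's loss.

```python
def early_stopping_epoch(epoch_losses, min_relative_loss_change=0.0, patience=2):
    """Return the index of the epoch at which training would stop, or None.

    epoch_losses holds the batch-averaged training loss of each epoch.
    """
    stalled_epochs = 0
    for epoch in range(1, len(epoch_losses)):
        previous, current = epoch_losses[epoch - 1], epoch_losses[epoch]
        # Positive when the loss went down, negative when it went up.
        relative_improvement = (previous - current) / previous
        if relative_improvement < min_relative_loss_change:
            stalled_epochs += 1
        else:
            stalled_epochs = 0  # the run of stalled epochs is broken
        if stalled_epochs >= patience:
            return epoch
    return None


# The loss plateaus from epoch 3 onward, so with patience=3 training stops at epoch 5.
losses = [1.0, 0.8, 0.7, 0.6995, 0.6990, 0.6985]
print(early_stopping_epoch(losses, min_relative_loss_change=0.002, patience=3))  # 5
```

With the default `min_relative_loss_change` of 0, only epochs in which the loss increases count toward `patience`.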

It can be difficult to automatically determine how long a model needs to train before it has converged, or how to parameterize early stopping, without visualizing the loss curves. To that end, training loss logs that can be visualized with TensorBoard are coming soon!

Backtesting

MLP models tend to take longer to fit than linear or tree models. When backtesting the MLP Regression model, we currently limit fitting to at most once per week. In practice, we recommend fitting the MLP Regression model once per month.

Parameters

| Parameter | Description | Default Value |
| --- | --- | --- |
| batch_size | The number of examples in a training batch. | 512 |
| max_training_epochs | The maximum number of training epochs to run training for. | 30 |
| min_relative_loss_change | The minimum relative change in batch-averaged training loss across epochs to trigger early stopping. For example, a value of 0.002 triggers early stopping when the current epoch's batch-averaged training loss improvement has been < 0.2% of the previous epoch's loss for patience number of epochs. A value <= 0 triggers early stopping only when training loss increases across epochs. | 0 |
| patience | The maximum number of epochs in a row to observe a relative train loss change less than min_relative_loss_change. Training will stop after this many epochs of no train loss improvement. Setting this equal to max_training_epochs disables early stopping. | 2 |
| embedding_dimension | The number of dimensions to use for each categorical feature's embedding. | 8 |
| hidden_layer_dimensions | The dimensions of the hidden layers in the multi-layer perceptron. For example (128,) indicates just one layer of size 128; (256, 128) indicates the first hidden layer has size 256 and the second has size 128. | (128,) |
| dropout_rate | The probability of dropping out any dimension in a layer. Dropout is applied to the activations of each hidden layer with this probability. | 0 |
| learning_rate | The learning rate used by the Adagrad optimizer to control the size of the update steps during training. | 1e-3 |
| weight_decay | The weight decay parameter (L2 penalty) used by the Adagrad optimizer. | 0 |

Tutorial

This example shows how to build a simple demand forecasting model using the MLP Regression model.

import myst
import numpy as np
from myst.connectors.model_connectors import mlp_regression
from myst.connectors.source_connectors import time_trends
from myst.connectors.operation_connectors import numerical_expression
from myst.recipes.time_series_recipes import the_weather_company

myst.authenticate()

# Create a new project.
project = myst.Project.create(title="MLP Regression")

# Create an hour of day and day of year time series from a time trends source.
time_trends_source = project.create_source(
    title="Time Trends",
    connector=time_trends.TimeTrends(
        sample_period=myst.TimeDelta("PT1H"),
        time_zone="UTC",
        fields=[
            time_trends.Field.HOUR_OF_DAY,
            time_trends.Field.DAY_OF_YEAR,
        ],
    ),
)
hour_of_day_time_series = time_trends_source.create_time_series(
    title="Hour of Day",
    sample_period=myst.TimeDelta("PT1H"),
    label_indexer=time_trends.Field.HOUR_OF_DAY,
)
day_of_year_time_series = time_trends_source.create_time_series(
    title="Day of Year",
    sample_period=myst.TimeDelta("PT1H"),
    label_indexer=time_trends.Field.DAY_OF_YEAR,
)

# Create a time series that's the day of year modulo 365, so that day 366 (in leap years) is mapped to day 1.
# Note that day 365 is mapped to 0, so the resulting values fall in the range 0..364.
day_of_year_mod_operation = project.create_operation(
    title=f"{day_of_year_time_series.title} % 365",
    connector=numerical_expression.NumericalExpression(
        variable_names=["day_of_year"], math_expression="day_of_year % 365",
    ),
)
day_of_year_mod_operation.create_input(
    time_series=day_of_year_time_series, group_name="day_of_year",
)
day_of_year_mod_time_series = day_of_year_mod_operation.create_time_series(
    title=f"{day_of_year_time_series.title} % 365", sample_period=myst.TimeDelta("PT1H")
)

# Create a temperature time series using a recipe from The Weather Company.
temperature_time_series = project.create_time_series_from_recipe(
    recipe=the_weather_company.TheWeatherCompany(
        metar_station=the_weather_company.MetarStation.KSFO,
        field=the_weather_company.Field.TEMPERATURE,
    )
)

# Create a target time series and insert random data.
# TODO: Replace this with your historical demand data.
target_time_series = project.create_time_series(
    title="Historical Demand", sample_period=myst.TimeDelta("PT1H")
)
target_time_series.insert_time_array(
    time_array=myst.TimeArray(
        sample_period=myst.TimeDelta("PT1H"),
        start_time=myst.Time("2021-03-01T00:00:00Z"),
        end_time=myst.Time("2022-03-01T00:00:00Z"),
        as_of_time=myst.Time("2022-03-15T00:00:00Z"),
        values=np.random.random(365 * 24),
    )
)

# Create an MLP Regression model.
model = project.create_model(
    title="Demand Model",
    connector=mlp_regression.MLPRegression(
        max_training_epochs=50,
        min_relative_loss_change=0.002,
        patience=3,
        batch_size=512,
        embedding_dimension=8,
        hidden_layer_dimensions=(128,),
        dropout_rate=0.2,
        learning_rate=1e-3,
        weight_decay=0,
    )
)

# Add the time series as inputs to the model.
model.create_input(hour_of_day_time_series, group_name=mlp_regression.GroupName.CATEGORICAL_FEATURES)
model.create_input(day_of_year_mod_time_series, group_name=mlp_regression.GroupName.CATEGORICAL_FEATURES)
model.create_input(temperature_time_series, group_name=mlp_regression.GroupName.CONTINUOUS_FEATURES)
model.create_input(target_time_series, group_name=mlp_regression.GroupName.TARGETS)

# Add a fit policy to the model.
model.create_fit_policy(
    start_timing=myst.Time("2021-03-01T00:00:00Z"),
    end_timing=myst.Time("2022-03-01T00:00:00Z"),
    schedule_timing=myst.TimeDelta("PT1H"),
)

# Create a time series with the model predictions.
forecast_time_series = model.create_time_series(
    title="Demand Forecast", sample_period=myst.TimeDelta("PT1H")
)

# Add a run policy to the time series.
forecast_time_series.create_run_policy(
    start_timing=myst.TimeDelta("PT1H"),
    end_timing=myst.TimeDelta("PT169H"),
    schedule_timing=myst.TimeDelta("PT1H"),
)