MLP Regression
This topic explains how the MLP Regression model works.
The MLP Regression model provides a PyTorch implementation of a multilayer perceptron model. MLP models, also referred to as "vanilla" artificial neural networks (ANNs) or deep neural networks (DNNs), consist of one or more hidden layers chained together with non-linear activation functions (the Platform uses the ReLU function). This model type can be a good choice for modeling non-linear functions.
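For intuition, here is a minimal PyTorch sketch of that kind of architecture: a stack of linear layers chained with ReLU activations, followed by a linear output layer. The helper name and layer sizes are illustrative only, not the Platform's implementation.

```python
import torch.nn as nn

# Illustrative only: hidden layers chained with ReLU activations, followed by a
# linear output layer that produces the regression estimate.
def build_mlp(input_dim, hidden_layer_dimensions=(128,), output_dim=1):
    layers = []
    previous_dim = input_dim
    for hidden_dim in hidden_layer_dimensions:
        layers.append(nn.Linear(previous_dim, hidden_dim))
        layers.append(nn.ReLU())
        previous_dim = hidden_dim
    layers.append(nn.Linear(previous_dim, output_dim))
    return nn.Sequential(*layers)
```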
Inputs
The MLP Regression model takes in categorical and continuous features separately. The model will learn embeddings for each categorical feature value, and those embeddings will be used in the input layer of the MLP. The continuous features are standardized within the model implementation.
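As a rough sketch of how such inputs can be combined (an assumption for illustration, not the connector's internal code): each categorical value is mapped through an embedding table, the continuous features are z-scored, and everything is concatenated into the MLP's input vector.

```python
import torch
import torch.nn as nn

# Hypothetical input handling: embed a categorical feature (hour of day, 24 values),
# standardize a continuous feature (temperature), and concatenate them.
embedding_dimension = 8
hour_of_day_embedding = nn.Embedding(num_embeddings=24, embedding_dim=embedding_dimension)

hour_of_day = torch.tensor([0, 13, 23])               # categorical feature values
temperature = torch.tensor([[10.0], [18.5], [12.0]])  # continuous feature values

# Standardization statistics would come from the training data.
temperature_standardized = (temperature - temperature.mean()) / temperature.std()

# The concatenated vector (embedding_dimension + 1 values per sample) feeds the
# first hidden layer of the MLP.
mlp_input = torch.cat([hour_of_day_embedding(hour_of_day), temperature_standardized], dim=1)
```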
The MLP Regression model does not support categorical feature values that were not seen during training. For many common categorical features, such as hour of day or day of week, all feature values will be seen in even a modest number of training samples. However, certain categorical features, such as day or week of year, have rarer values (think February 29) that may not show up even in several years of training data. We recommend applying a modulo operator to such features to ensure that those rare values are mapped into the expected range of the categorical feature. You can see how this is done in the example graph below.
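As a quick illustration of the remapping (plain Python, independent of the Platform):

```python
# Day-of-year values 1-364 are unchanged, day 365 maps to 0, and day 366
# (leap years) maps to 1, so every value stays within the range seen in training.
for day_of_year in (1, 60, 365, 366):
    print(day_of_year, "->", day_of_year % 365)
```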
Outputs
The MLP Regression model takes a single target series to train on and outputs a single target series. The target data is standardized for training, and the standardization is undone at prediction time, so the outputs you see are in the same range as the target data you provided.
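Conceptually, the target handling amounts to a z-score transform that is inverted at prediction time. The snippet below is a sketch of that idea, not the connector's internal code.

```python
import numpy as np

# Standardize the target with statistics computed over the training data...
target = np.array([120.0, 150.0, 135.0, 160.0])
mean, std = target.mean(), target.std()
standardized_target = (target - mean) / std  # what the network actually trains on

# ...then undo the standardization at prediction time so predictions come back
# in the original units of the target series.
raw_predictions = np.array([-0.5, 0.8])
predictions = raw_predictions * std + mean
```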
Training and Convergence
The MLP Regression model trains using the Adagrad optimizer and trains for at most the maximum number of epochs specified in the model parameterization. The connector uses Kaiming initialization, and it also provides a mechanism for early stopping based on training loss. This early stopping mechanism is controlled by the `min_relative_loss_change` and `patience` parameters.
The connector computes the average training loss across batches in each epoch and tracks the change in average training loss between consecutive epochs. If the relative change in training loss is less than `min_relative_loss_change` for `patience` epochs in a row, training will stop.
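The stopping rule can be sketched as follows; this is illustrative logic based on the description above, not the connector's source.

```python
def should_stop_early(epoch_losses, min_relative_loss_change, patience):
    """Return True once the relative improvement in batch-averaged training loss has
    been below `min_relative_loss_change` for `patience` consecutive epochs."""
    epochs_without_improvement = 0
    for previous_loss, current_loss in zip(epoch_losses, epoch_losses[1:]):
        relative_change = (previous_loss - current_loss) / previous_loss
        if relative_change < min_relative_loss_change:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return True
        else:
            epochs_without_improvement = 0
    return False

# Improvement drops below 0.2% for the last two epochs, so with patience=2 and
# min_relative_loss_change=0.002 training would stop.
print(should_stop_early([1.00, 0.60, 0.40, 0.3995, 0.3991], 0.002, 2))  # True
```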
It can be difficult to automatically determine how long a model needs to train before it has converged, or how to parameterize early stopping, without visualizing the loss curves. To that end, training loss logs that can be visualized with TensorBoard are coming soon!
Backtesting
MLP models tend to take longer to fit than linear or tree models. When backtesting the MLP Regression model, we currently limit fits to at most once per week. In practice, we recommend fitting the MLP Regression model once monthly.
Parameters
| Parameter | Description | Default Value |
|---|---|---|
| `batch_size` | The number of examples in a training batch. | 512 |
| `max_training_epochs` | The maximum number of training epochs to run. | 30 |
| `min_relative_loss_change` | The minimum relative change in batch-averaged training loss across epochs to trigger early stopping. For example, a value of 0.002 triggers early stopping when the current epoch's batch-averaged training loss improvement has been < 0.2% of the previous epoch's loss for `patience` consecutive epochs. A value <= 0 triggers early stopping only when training loss increases across epochs. | 0 |
| `patience` | The maximum number of epochs in a row to observe a relative training loss change less than `min_relative_loss_change`. Training will stop after this many epochs of no training loss improvement. Setting this equal to `max_training_epochs` disables early stopping. | 2 |
| `embedding_dimension` | The number of dimensions to use for each categorical feature's embedding. | 8 |
| `hidden_layer_dimensions` | The dimensions of the hidden layers in the multilayer perceptron. For example, (128,) indicates a single hidden layer of size 128; (256, 128) indicates the first hidden layer has size 256 and the second has size 128. | (128,) |
| `dropout_rate` | The probability of dropping out any dimension in a layer. Dropout is applied to the activations of each hidden layer with this probability. | 0 |
| `learning_rate` | The learning rate used by the Adagrad optimizer to control the size of the update steps during training. | 1e-3 |
| `weight_decay` | The weight decay parameter (L2 penalty) used by the Adagrad optimizer. | 0 |
Tutorial
This example shows how to build a simple demand forecasting model using the MLP Regression model.
```python
import myst
import numpy as np

from myst.connectors.model_connectors import mlp_regression
from myst.connectors.source_connectors import time_trends
from myst.connectors.operation_connectors import numerical_expression
from myst.recipes.time_series_recipes import the_weather_company

myst.authenticate()

# Create a new project.
project = myst.Project.create(title="MLP Regression")

# Create an hour of day and day of year time series from a time trends source.
time_trends_source = project.create_source(
    title="Time Trends",
    connector=time_trends.TimeTrends(
        sample_period=myst.TimeDelta("PT1H"),
        time_zone="UTC",
        fields=[
            time_trends.Field.HOUR_OF_DAY,
            time_trends.Field.DAY_OF_YEAR,
        ],
    ),
)
hour_of_day_time_series = time_trends_source.create_time_series(
    title="Hour of Day",
    sample_period=myst.TimeDelta("PT1H"),
    label_indexer=time_trends.Field.HOUR_OF_DAY,
)
day_of_year_time_series = time_trends_source.create_time_series(
    title="Day of Year",
    sample_period=myst.TimeDelta("PT1H"),
    label_indexer=time_trends.Field.DAY_OF_YEAR,
)

# Create a time series that's the day of year modulo 365, so that day 366 (on leap years), is mapped to day 1.
day_of_year_mod_operation = project.create_operation(
    title=f"{day_of_year_time_series.title} % 365",
    connector=numerical_expression.NumericalExpression(
        variable_names=["day_of_year"],
        math_expression="day_of_year % 365",
    ),
)
day_of_year_mod_operation.create_input(
    time_series=day_of_year_time_series,
    group_name="day_of_year",
)
day_of_year_mod_time_series = day_of_year_mod_operation.create_time_series(
    title=f"{day_of_year_time_series.title} % 365",
    sample_period=myst.TimeDelta("PT1H"),
)

# Create a temperature time series using a The Weather Company recipe.
temperature_time_series = project.create_time_series_from_recipe(
    recipe=the_weather_company.TheWeatherCompany(
        metar_station=the_weather_company.MetarStation.KSFO,
        field=the_weather_company.Field.TEMPERATURE,
    )
)

# Create a target time series and insert random data.
# TODO: Replace this with your historical demand data.
target_time_series = project.create_time_series(
    title="Historical Demand",
    sample_period=myst.TimeDelta("PT1H"),
)
target_time_series.insert_time_array(
    time_array=myst.TimeArray(
        sample_period=myst.TimeDelta("PT1H"),
        start_time=myst.Time("2021-03-01T00:00:00Z"),
        end_time=myst.Time("2022-03-01T00:00:00Z"),
        as_of_time=myst.Time("2022-03-15T00:00:00Z"),
        values=np.random.random(365 * 24),
    )
)

# Create an MLP Regression model.
model = project.create_model(
    title="Demand Model",
    connector=mlp_regression.MLPRegression(
        max_training_epochs=50,
        min_relative_loss_change=0.002,
        patience=3,
        batch_size=512,
        embedding_dimension=8,
        hidden_layer_dimensions=(128,),
        dropout_rate=0.2,
        learning_rate=1e-3,
        weight_decay=0,
    ),
)

# Add the time series as inputs to the model.
model.create_input(hour_of_day_time_series, group_name=mlp_regression.GroupName.CATEGORICAL_FEATURES)
model.create_input(day_of_year_mod_time_series, group_name=mlp_regression.GroupName.CATEGORICAL_FEATURES)
model.create_input(temperature_time_series, group_name=mlp_regression.GroupName.CONTINUOUS_FEATURES)
model.create_input(target_time_series, group_name=mlp_regression.GroupName.TARGETS)

# Add a fit policy to the model.
model.create_fit_policy(
    start_timing=myst.Time("2021-03-01T00:00:00Z"),
    end_timing=myst.Time("2022-03-01T00:00:00Z"),
    schedule_timing=myst.TimeDelta("PT1H"),
)

# Create a time series with the model predictions.
forecast_time_series = model.create_time_series(
    title="Demand Forecast",
    sample_period=myst.TimeDelta("PT1H"),
)

# Add a run policy to the time series.
forecast_time_series.create_run_policy(
    start_timing=myst.TimeDelta("PT1H"),
    end_timing=myst.TimeDelta("PT169H"),
    schedule_timing=myst.TimeDelta("PT1H"),
)
```