Time Series Forecasting with CNNs in PyTorch and TensorFlow
Convolutional Neural Networks (CNNs) are commonly used for image recognition tasks, but recent advances have shown that CNNs can also be effective for time series forecasting. This approach leverages the ability of CNNs to extract patterns from sequences of data, offering a powerful alternative to traditional Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. In this article, we’ll explore how CNNs can be adapted for time series forecasting and compare them with traditional models like RNNs and LSTMs.
Table of Contents
- Why Use CNNs for Time Series Forecasting?
- CNN Architecture for Time Series Forecasting
- CNN Model for Time Series Forecasting in TensorFlow
- Evaluation Metrics Beyond MSE
- Hyperparameter Tuning
- CNN Model for Time Series Forecasting in PyTorch
- Best Practices and Final Thoughts
- Conclusion
1. Why Use CNNs for Time Series Forecasting?
CNNs are traditionally associated with image processing, but they have been successfully adapted for time series data because of their ability to learn features through convolutional layers. The advantages of using CNNs for time series forecasting include:
- Parallel Processing: Unlike RNNs and LSTMs, CNNs don’t process the input step by step, so convolutions over different parts of the window can be computed in parallel.
- Efficiency: CNNs are computationally more efficient, making them suitable for large datasets.
- Feature Extraction: The convolutional layers automatically learn hierarchical features in the time series, capturing both local and global patterns.
However, CNNs may struggle to capture long-term dependencies in the data. RNNs and LSTMs often perform better in this area by maintaining state over longer sequences.
2. CNN Architecture for Time Series Forecasting
A CNN architecture designed for time series forecasting typically uses 1D convolutions. The time axis is treated as the “spatial” dimension, with the goal of detecting local features within the time domain.
In this section, we’ll work with a more complex multivariate time series dataset that includes seasonality and multiple features such as temperature, humidity, and pressure.
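To build intuition before touching the real data, here is a minimal sketch, with made-up numbers and ignoring the bias term and activation, of what a single 1D convolution filter computes as it slides over a univariate window. It is not part of the model we train later.

import numpy as np

# Hypothetical univariate window of 10 time steps and one filter of width 3
window = np.array([0.1, 0.3, 0.2, 0.5, 0.4, 0.6, 0.8, 0.7, 0.9, 1.0])
kernel = np.array([0.25, 0.5, 0.25])  # in a real model these weights are learned

# 'valid' 1D convolution: each output is a weighted sum of 3 consecutive steps
outputs = np.array([np.dot(window[i:i+3], kernel) for i in range(len(window) - 3 + 1)])
print(outputs.shape)  # (8,) -> 10 - 3 + 1 output steps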
2.1 Data Generation and Preprocessing
We start by generating synthetic multivariate time series data.
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
# Set random seed for reproducibility
np.random.seed(42)
# Generate synthetic time series data with seasonality and multiple features
time_steps = 1000
time = np.arange(0, time_steps)
temperature = np.sin(time * 0.1) + np.random.normal(0, 0.1, time_steps)
humidity = np.cos(time * 0.1) + np.random.normal(0, 0.1, time_steps)
pressure = np.sin(time * 0.05) + np.random.normal(0, 0.05, time_steps)
# Stack features into a multivariate dataset
data = np.column_stack((temperature, humidity, pressure))
Note: Setting a random seed ensures that the results are reproducible.
2.2 Data Scaling and Windowing
Next, we normalize the data and create a dataset using a sliding window approach.
# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))
data_scaled = scaler.fit_transform(data)
# Create dataset with sliding window
def create_multivariate_dataset(data, window_size):
    x, y = [], []
    for i in range(len(data) - window_size):
        x.append(data[i:i+window_size])
        y.append(data[i+window_size, 0])  # Predicting the first feature (temperature)
    return np.array(x), np.array(y)
window_size = 20
x, y = create_multivariate_dataset(data_scaled, window_size)
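As a quick sanity check (shapes follow from the assumptions above: 1000 time steps, 3 features, window_size=20), the windowed arrays should look like this:

print(x.shape)  # (980, 20, 3) -> samples, window_size, features
print(y.shape)  # (980,)       -> next-step temperature for each window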
Splitting the Data:
We split the data into training and testing sets to evaluate the model’s performance on unseen data.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, shuffle=False
)
Note: We set shuffle=False to maintain the time order in time series data.
3. CNN Model for Time Series Forecasting in TensorFlow
3.1 Building the Model
We define a CNN model using the Conv1D layer, which is suitable for time series data.
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(
        filters=64,
        kernel_size=3,
        activation='relu',
        input_shape=(window_size, x.shape[2])
    ),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(1)  # Predict the first feature (temperature)
])
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae', 'mape'])
model.summary()
Model Architecture Explanation:
- Conv1D Layer: Extracts local patterns over the time steps.
- MaxPooling1D Layer: Reduces dimensionality and captures the most important features.
- Flatten Layer: Converts the 2D output into a 1D array for the Dense layers.
- Dense Layers: Learn higher-level representations and perform regression to predict the target value.
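To make the architecture concrete, here is how a batch flows through the layers; the shapes assume window_size=20, 3 input features, and the hyperparameters used above.

# (batch, 20, 3)  input window
# (batch, 18, 64) after Conv1D with 'valid' padding: 20 - 3 + 1 = 18 steps, 64 filters
# (batch, 9, 64)  after MaxPooling1D with pool_size=2
# (batch, 576)    after Flatten: 9 * 64
# (batch, 100)    after the first Dense layer
# (batch, 1)      final prediction (next-step temperature)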
3.2 Training the Model
history = model.fit(
    x_train, y_train,
    epochs=10,
    validation_data=(x_test, y_test)
)
Note: We use x_train and y_train for training and, for simplicity, x_test and y_test as the validation data; in a stricter setup you would hold out a separate validation split.
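If you want to inspect the training behaviour, a quick sketch using matplotlib (assuming it is installed) plots the loss curves recorded in the History object returned by model.fit:

import matplotlib.pyplot as plt

plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.legend()
plt.show()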
4. Evaluation Metrics Beyond MSE
While Mean Squared Error (MSE) is a common metric for regression tasks, using additional metrics like Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Squared Percentage Error (RMSPE) can provide a more comprehensive understanding of the model’s performance.
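For reference, with y_i the true values, ŷ_i the predictions, and n the number of test points, these metrics are defined as:
- MAE = (1/n) · Σ |y_i − ŷ_i|
- MAPE = (1/n) · Σ |(y_i − ŷ_i) / y_i|
- RMSPE = sqrt( (1/n) · Σ ((y_i − ŷ_i) / y_i)² )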
4.1 Making Predictions and Rescaling
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error
# Make predictions on the test set
predictions = model.predict(x_test)
# Fit scaler only on the target variable (temperature)
target_scaler = MinMaxScaler(feature_range=(0, 1))
target_scaler.fit(data[:, 0].reshape(-1, 1))
# Rescale the predictions and true values back to original scale
predictions_rescaled = target_scaler.inverse_transform(predictions)
y_true_rescaled = target_scaler.inverse_transform(y_test.reshape(-1, 1))
4.2 Calculating Evaluation Metrics
import numpy as np
# Calculate various metrics
mse = mean_squared_error(y_true_rescaled, predictions_rescaled)
mae = mean_absolute_error(y_true_rescaled, predictions_rescaled)
mape = mean_absolute_percentage_error(y_true_rescaled, predictions_rescaled)
# Root Mean Squared Percentage Error (RMSPE)
epsilon = 1e-10 # To prevent division by zero
rmspe = np.sqrt(np.mean(np.square((y_true_rescaled - predictions_rescaled) / (y_true_rescaled + epsilon))))
print(f'MSE: {mse}')
print(f'MAE: {mae}')
print(f'MAPE: {mape}')
print(f'RMSPE: {rmspe}')
Note: Adding a small epsilon to y_true_rescaled prevents division by zero in the RMSPE calculation.
5. Hyperparameter Tuning
Hyperparameter tuning is a critical part of optimizing your model. Here, we use Keras Tuner to tune hyperparameters such as filters, kernel size, and the number of dense units.
5.1 Using Keras Tuner
import keras_tuner as kt # Adjust the import based on your Keras Tuner version
def build_model(hp):
    model = tf.keras.Sequential([
        tf.keras.layers.Conv1D(
            filters=hp.Int('filters', min_value=32, max_value=128, step=32),
            kernel_size=hp.Choice('kernel_size', values=[3, 5]),
            activation='relu',
            input_shape=(window_size, x_train.shape[2])
        ),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(
            hp.Int('units', min_value=50, max_value=150, step=50),
            activation='relu'
        ),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
    return model

# Define the Keras Tuner search
tuner = kt.RandomSearch(
    build_model,
    objective='val_mae',
    max_trials=10,
    executions_per_trial=2,
    directory='my_dir',
    project_name='time_series_tuning'
)
# Perform the hyperparameter search
tuner.search(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
# Get the best model
best_model = tuner.get_best_models(num_models=1)[0]
Notes:
- We increased max_trials to 10 for better exploration of the hyperparameter space.
- The objective metric is set to 'val_mae'.
- Ensure that keras_tuner is installed and correctly imported.
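If you also want to see which configuration won, a small addition (not part of the search code above) is to read the best hyperparameters directly from the tuner:

best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hp.get('filters'), best_hp.get('kernel_size'), best_hp.get('units'))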
6. CNN Model for Time Series Forecasting in PyTorch
To provide a complete picture, we’ll implement a similar CNN model using PyTorch.
6.1 Data Preparation in PyTorch
import torch
from torch.utils.data import TensorDataset, DataLoader
# Convert NumPy arrays to PyTorch tensors
x_train_tensor = torch.tensor(x_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)
x_test_tensor = torch.tensor(x_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)
# Create TensorDatasets
train_dataset = TensorDataset(x_train_tensor, y_train_tensor)
test_dataset = TensorDataset(x_test_tensor, y_test_tensor)
# Create DataLoaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
Note: We set shuffle=False to maintain the sequence order.
6.2 Building the Model
import torch.nn as nn
import torch.optim as optim
class CNNTimeSeries(nn.Module):
    def __init__(self):
        super(CNNTimeSeries, self).__init__()
        self.conv1 = nn.Conv1d(in_channels=x_train.shape[2], out_channels=64, kernel_size=3)
        self.relu1 = nn.ReLU()
        self.pool = nn.MaxPool1d(kernel_size=2)
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(64 * ((window_size - 2) // 2), 100)
        self.relu2 = nn.ReLU()
        self.fc2 = nn.Linear(100, 1)

    def forward(self, x):
        x = x.permute(0, 2, 1)  # Reshape to (batch_size, channels, sequence_length)
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu2(x)
        x = self.fc2(x)
        return x
model = CNNTimeSeries()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
Model Explanation:
- Conv1d Layer: Processes the time series data with 1D convolution.
- MaxPool1d Layer: Reduces the dimensionality and focuses on the most significant features.
- Flatten Layer: Prepares data for the fully connected layers.
- Fully Connected Layers: Learn higher-level representations and perform regression.
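As a quick sanity check (a sketch, not part of the training loop), you can push a dummy batch through the untrained model to confirm that the layer sizes line up:

dummy_batch = torch.randn(4, window_size, x_train.shape[2])  # (batch, sequence_length, features)
with torch.no_grad():
    print(model(dummy_batch).shape)  # expected: torch.Size([4, 1])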
6.3 Training the Model
num_epochs = 10
model.train()
for epoch in range(num_epochs):
    running_loss = 0.0
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs.squeeze(), targets)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
    epoch_loss = running_loss / len(train_loader.dataset)
    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.4f}')
6.4 Evaluating the Model
model.eval()
predictions = []
true_values = []
with torch.no_grad():
    for inputs, targets in test_loader:
        outputs = model(inputs)
        predictions.extend(outputs.squeeze().numpy())
        true_values.extend(targets.numpy())
# Rescale predictions and true values
predictions_rescaled = target_scaler.inverse_transform(np.array(predictions).reshape(-1, 1))
y_true_rescaled = target_scaler.inverse_transform(np.array(true_values).reshape(-1, 1))
# Calculate evaluation metrics
mse = mean_squared_error(y_true_rescaled, predictions_rescaled)
mae = mean_absolute_error(y_true_rescaled, predictions_rescaled)
mape = mean_absolute_percentage_error(y_true_rescaled, predictions_rescaled)
epsilon = 1e-10
rmspe = np.sqrt(np.mean(np.square((y_true_rescaled - predictions_rescaled) / (y_true_rescaled + epsilon))))
print(f'MSE: {mse}')
print(f'MAE: {mae}')
print(f'MAPE: {mape}')
print(f'RMSPE: {rmspe}')
7. Best Practices and Final Thoughts
Here are some best practices for using CNNs in time series forecasting:
- Preprocessing: Always normalize or standardize your data before feeding it into a CNN.
- Data Splitting: Maintain the temporal order when splitting data; avoid shuffling for time series data.
- Cross-Validation: Use time series cross-validation techniques like TimeSeriesSplit to get a more reliable estimate of model performance:

from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for train_index, test_index in tscv.split(x):
    x_train_cv, x_test_cv = x[train_index], x[test_index]
    y_train_cv, y_test_cv = y[train_index], y[test_index]
    # Train and evaluate your model on each fold

- Hyperparameter Tuning: Tuning the architecture of the CNN, such as the number of filters, kernel size, and pooling strategy, is essential for optimizing performance.
- Regularization Techniques: Apply dropout layers and L2 regularization to prevent overfitting:

# Adding a Dropout layer in the TensorFlow model
tf.keras.layers.Dropout(0.2)

- Early Stopping: Use early stopping during training to prevent overfitting:

early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
model.fit(x_train, y_train, epochs=50, validation_data=(x_test, y_test), callbacks=[early_stopping])

- Monitor Metrics: Include evaluation metrics like MAE and MAPE in model.compile to monitor them during training.
8. Conclusion
CNNs offer a powerful and efficient alternative to traditional RNNs and LSTMs for time series forecasting, especially when local patterns in the data are important. They are computationally less expensive and can capture both short- and medium-term dependencies. However, for tasks where long-term dependencies are critical, RNNs and LSTMs may still have the edge due to their ability to maintain state across time steps.
In practice, the choice of model will depend on the specific characteristics of your dataset and the problem you’re solving. CNNs are well-suited for applications where quick feature extraction from sequences is needed, while LSTMs and RNNs excel in handling complex temporal dependencies.
Experimenting with different architectures, tuning hyperparameters, and applying proper regularization will yield the best results, ensuring that your model generalizes well to unseen data. The use of cross-validation and more advanced evaluation metrics will also provide a clearer picture of your model’s performance, helping you make informed decisions about its deployment.