Time Series Forecasting with CNNs in PyTorch and TensorFlow


Convolutional Neural Networks (CNNs) are commonly used for image recognition tasks, but recent advances have shown that CNNs can also be effective for time series forecasting. This approach leverages the ability of CNNs to extract patterns from sequences of data, offering a powerful alternative to traditional Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. In this article, we’ll explore how CNNs can be adapted for time series forecasting and compare them with traditional models like RNNs and LSTMs.


Table of Contents

  1. Why Use CNNs for Time Series Forecasting?
  2. CNN Architecture for Time Series Forecasting
  3. CNN Model for Time Series Forecasting in TensorFlow
  4. Evaluation Metrics Beyond MSE
  5. Hyperparameter Tuning
  6. CNN Model for Time Series Forecasting in PyTorch
  7. Best Practices and Final Thoughts
  8. Conclusion

1. Why Use CNNs for Time Series Forecasting?

CNNs are traditionally associated with image processing, but they have been successfully adapted for time series data because of their ability to learn features through convolutional layers. The advantages of using CNNs for time series forecasting include:

  • Parallel Processing: Unlike RNNs and LSTMs, CNNs don’t process a sequence step by step; all time steps in a window are convolved at once, which parallelizes well on modern hardware.
  • Efficiency: Because of this parallelism, CNNs typically train faster than recurrent models of comparable size, making them suitable for large datasets.
  • Feature Extraction: The convolutional layers automatically learn hierarchical features in the time series, capturing both local and global patterns.

However, CNNs may struggle to capture long-term dependencies in the data. RNNs and LSTMs often perform better in this area by maintaining state over longer sequences.
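
One common remedy for this limitation is dilated convolution, popularized by WaveNet-style temporal convolutional networks: stacking layers with growing dilation rates widens the receptive field exponentially. The following is a minimal, illustrative sketch (not part of the worked example below), assuming a (window_size, n_features) input like the one we build later:

import tensorflow as tf

# Sketch: stacked dilated Conv1D layers; dilations 1, 2, 4 with kernel_size=2
# give a receptive field of 1 + 1 + 2 + 4 = 8 time steps
dilated_block = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, kernel_size=2, dilation_rate=1,
                           padding='causal', activation='relu'),
    tf.keras.layers.Conv1D(32, kernel_size=2, dilation_rate=2,
                           padding='causal', activation='relu'),
    tf.keras.layers.Conv1D(32, kernel_size=2, dilation_rate=4,
                           padding='causal', activation='relu'),
])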


2. CNN Architecture for Time Series Forecasting

A CNN architecture designed for time series forecasting typically uses 1D convolutions. The time axis plays the role of the “spatial” dimension: filters slide along it to detect local patterns within the time domain.

In this section, we’ll work with a more complex multivariate time series dataset that includes seasonality and multiple features such as temperature, humidity, and pressure.

2.1 Data Generation and Preprocessing

We start by generating synthetic multivariate time series data.

import numpy as np
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler

# Set random seed for reproducibility
np.random.seed(42)

# Generate synthetic time series data with seasonality and multiple features
time_steps = 1000
time = np.arange(0, time_steps)
temperature = np.sin(time * 0.1) + np.random.normal(0, 0.1, time_steps)
humidity = np.cos(time * 0.1) + np.random.normal(0, 0.1, time_steps)
pressure = np.sin(time * 0.05) + np.random.normal(0, 0.05, time_steps)

# Stack features into a multivariate dataset
data = np.column_stack((temperature, humidity, pressure))

Note: Setting the NumPy seed makes the synthetic data reproducible. TensorFlow and PyTorch maintain their own random state (tf.random.set_seed, torch.manual_seed), so set those too if you need fully reproducible training.

2.2 Data Scaling and Windowing

Next, we normalize the data and create a dataset using a sliding window approach.

# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))
data_scaled = scaler.fit_transform(data)

# Create dataset with sliding window
def create_multivariate_dataset(data, window_size):
    x, y = [], []
    for i in range(len(data) - window_size):
        x.append(data[i:i+window_size])
        y.append(data[i+window_size, 0])  # Predicting the first feature (temperature)
    return np.array(x), np.array(y)

window_size = 20
x, y = create_multivariate_dataset(data_scaled, window_size)
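
A quick sanity check on the result: with 1,000 time steps and a window of 20, the sliding window produces 980 samples.

print(x.shape)  # (980, 20, 3) -> samples, window_size, features
print(y.shape)  # (980,)       -> one temperature target per window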

Splitting the Data:

We split the data into training and testing sets to evaluate the model’s performance on unseen data.

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, shuffle=False  # random_state has no effect when shuffle=False
)

Note: We set shuffle=False so that chronological order is preserved; the test set is the final 20% of the series.
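
Because the split is not shuffled, the test set is simply the last 196 windows:

print(x_train.shape, x_test.shape)  # (784, 20, 3) (196, 20, 3)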


3. CNN Model for Time Series Forecasting in TensorFlow

3.1 Building the Model

We define a CNN model using the Conv1D layer, suitable for time series data.

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(
        filters=64,
        kernel_size=3,
        activation='relu',
        input_shape=(window_size, x.shape[2])
    ),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(1)  # Predict the first feature (temperature)
])

model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae', 'mape'])
model.summary()

Model Architecture Explanation:

  • Conv1D Layer: Extracts local patterns over the time steps.
  • MaxPooling1D Layer: Reduces dimensionality and captures the most important features.
  • Flatten Layer: Converts the 2D output into a 1D array for the Dense layers.
  • Dense Layers: Learn higher-level representations and perform regression to predict the target value.
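
For reference, here is how the shapes work out for window_size=20 and 3 input features; the flattened size of 576 becomes relevant again when we size the first linear layer of the PyTorch model in Section 6.

# Shape walkthrough (batch dimension first):
# Input:            (batch, 20, 3)
# Conv1D, k=3:      (batch, 18, 64)   # 20 - 3 + 1 = 18 time steps
# MaxPooling1D, 2:  (batch, 9, 64)    # 18 // 2 = 9
# Flatten:          (batch, 576)      # 9 * 64
# Dense(100):       (batch, 100)
# Dense(1):         (batch, 1)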

3.2 Training the Model

history = model.fit(
    x_train, y_train,
    epochs=10,
    validation_data=(x_test, y_test)
)

Note: The test set doubles as the validation set here for simplicity; in a stricter workflow you would hold out a separate validation split and reserve the test set for final evaluation.
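
To judge over- or underfitting at a glance, it helps to plot the training history; a minimal sketch using matplotlib:

import matplotlib.pyplot as plt

# Plot training vs. validation loss per epoch
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.legend()
plt.show()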


4. Evaluation Metrics Beyond MSE

While Mean Squared Error (MSE) is a common metric for regression tasks, using additional metrics like Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Squared Percentage Error (RMSPE) can provide a more comprehensive understanding of the model’s performance.
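
For reference, these metrics can be written directly in NumPy. The sketch below shows the definitions used in this section (sklearn's implementations differ slightly in edge-case handling, e.g. the MAPE denominator):

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred, eps=1e-10):
    return np.mean(np.abs((y_true - y_pred) / (y_true + eps)))

def rmspe(y_true, y_pred, eps=1e-10):
    return np.sqrt(np.mean(np.square((y_true - y_pred) / (y_true + eps))))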

4.1 Making Predictions and Rescaling

from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error

# Make predictions on the test set
predictions = model.predict(x_test)

# MinMaxScaler scales each feature independently, so a scaler fit on the
# temperature column alone reproduces that column's scaling and can invert it
target_scaler = MinMaxScaler(feature_range=(0, 1))
target_scaler.fit(data[:, 0].reshape(-1, 1))

# Rescale the predictions and true values back to original scale
predictions_rescaled = target_scaler.inverse_transform(predictions)
y_true_rescaled = target_scaler.inverse_transform(y_test.reshape(-1, 1))

4.2 Calculating Evaluation Metrics

# Calculate various metrics
mse = mean_squared_error(y_true_rescaled, predictions_rescaled)
mae = mean_absolute_error(y_true_rescaled, predictions_rescaled)
mape = mean_absolute_percentage_error(y_true_rescaled, predictions_rescaled)

# Root Mean Squared Percentage Error (RMSPE)
epsilon = 1e-10  # To prevent division by zero
rmspe = np.sqrt(np.mean(np.square((y_true_rescaled - predictions_rescaled) / (y_true_rescaled + epsilon))))

print(f'MSE: {mse}')
print(f'MAE: {mae}')
print(f'MAPE: {mape}')
print(f'RMSPE: {rmspe}')

Note: Adding a small epsilon to y_true_rescaled prevents division by zero in the RMSPE calculation.
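
Plotting predictions against the actual values often reveals patterns that aggregate metrics hide, such as lag or systematic under-prediction:

import matplotlib.pyplot as plt

# Compare predicted vs. actual temperature on the test set
plt.plot(y_true_rescaled, label='actual temperature')
plt.plot(predictions_rescaled, label='predicted temperature')
plt.xlabel('Test set time step')
plt.ylabel('Temperature')
plt.legend()
plt.show()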


5. Hyperparameter Tuning

Hyperparameter tuning is a critical part of optimizing your model. Here, we use Keras Tuner to tune hyperparameters such as filters, kernel size, and the number of dense units.

5.1 Using Keras Tuner

import keras_tuner as kt  # Adjust the import based on your Keras Tuner version

def build_model(hp):
    model = tf.keras.Sequential([
        tf.keras.layers.Conv1D(
            filters=hp.Int('filters', min_value=32, max_value=128, step=32),
            kernel_size=hp.Choice('kernel_size', values=[3, 5]),
            activation='relu',
            input_shape=(window_size, x_train.shape[2])
        ),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(
            hp.Int('units', min_value=50, max_value=150, step=50),
            activation='relu'
        ),
        tf.keras.layers.Dense(1)
    ])
    
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
    return model

# Define the Keras Tuner
tuner = kt.RandomSearch(
    build_model,
    objective='val_mae',
    max_trials=10,
    executions_per_trial=2,
    directory='my_dir',
    project_name='time_series_tuning'
)

# Perform the hyperparameter search
tuner.search(x_train, y_train, epochs=10, validation_data=(x_test, y_test))

# Get the best model
best_model = tuner.get_best_models(num_models=1)[0]

Notes:

  • max_trials=10 gives the search room to explore the filter, kernel-size, and unit ranges defined above; increase it for a more thorough search.
  • The objective 'val_mae' tells the tuner to minimize validation MAE.
  • Ensure that keras_tuner is installed and correctly imported.
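
To inspect which configuration won, the tuner can also return the best hyperparameter values directly:

best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hp.values)  # dict with the winning 'filters', 'kernel_size', and 'units'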

6. CNN Model for Time Series Forecasting in PyTorch

To provide a complete picture, we’ll implement a similar CNN model using PyTorch.

6.1 Data Preparation in PyTorch

import torch
from torch.utils.data import TensorDataset, DataLoader

# Convert NumPy arrays to PyTorch tensors
x_train_tensor = torch.tensor(x_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)
x_test_tensor = torch.tensor(x_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)

# Create TensorDatasets
train_dataset = TensorDataset(x_train_tensor, y_train_tensor)
test_dataset = TensorDataset(x_test_tensor, y_test_tensor)

# Create DataLoaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

Note: The chronological train/test split must not be shuffled, but once windows are formed they are independent training samples, so shuffle=True in train_loader is also a valid (often beneficial) choice; we keep shuffle=False here for simplicity.

6.2 Building the Model

import torch.nn as nn
import torch.optim as optim

class CNNTimeSeries(nn.Module):
    def __init__(self):
        super(CNNTimeSeries, self).__init__()
        self.conv1 = nn.Conv1d(in_channels=x_train.shape[2], out_channels=64, kernel_size=3)
        self.relu1 = nn.ReLU()
        self.pool = nn.MaxPool1d(kernel_size=2)
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(64 * ((window_size - 2) // 2), 100)  # conv (k=3) leaves 18 steps, pooling halves to 9 -> 64 * 9 = 576
        self.relu2 = nn.ReLU()
        self.fc2 = nn.Linear(100, 1)
        
    def forward(self, x):
        x = x.permute(0, 2, 1)  # Permute to (batch_size, channels, sequence_length), as nn.Conv1d expects
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu2(x)
        x = self.fc2(x)
        return x

model = CNNTimeSeries()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Model Explanation:

  • Conv1d Layer: Processes the time series data with 1D convolution.
  • MaxPool1d Layer: Reduces the dimensionality and focuses on the most significant features.
  • Flatten Layer: Prepares data for the fully connected layers.
  • Fully Connected Layers: Learn higher-level representations and perform regression.
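
A quick way to verify the layer sizing, in particular the 64 * 9 = 576 input to fc1, is to push a dummy batch through the model:

# Sanity check: a dummy batch of 8 windows with 3 features each
dummy = torch.randn(8, window_size, x_train.shape[2])
print(model(dummy).shape)  # torch.Size([8, 1])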

6.3 Training the Model

num_epochs = 10
model.train()

for epoch in range(num_epochs):
    running_loss = 0.0
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs.squeeze(1), targets)  # squeeze(1) preserves the batch dim even for a batch of size 1
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
    epoch_loss = running_loss / len(train_loader.dataset)
    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.4f}')
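
If you also want an epoch-level validation signal, mirroring validation_data in the TensorFlow example, a minimal sketch that could run at the end of each epoch:

# Sketch: end-of-epoch validation pass (place inside the epoch loop)
model.eval()
val_loss = 0.0
with torch.no_grad():
    for inputs, targets in test_loader:
        outputs = model(inputs)
        val_loss += criterion(outputs.squeeze(1), targets).item() * inputs.size(0)
print(f'Validation loss: {val_loss / len(test_loader.dataset):.4f}')
model.train()  # switch back to training mode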

6.4 Evaluating the Model

model.eval()
predictions = []
true_values = []

with torch.no_grad():
    for inputs, targets in test_loader:
        outputs = model(inputs)
        predictions.extend(outputs.squeeze(1).numpy())
        true_values.extend(targets.numpy())

# Rescale predictions and true values
predictions_rescaled = target_scaler.inverse_transform(np.array(predictions).reshape(-1, 1))
y_true_rescaled = target_scaler.inverse_transform(np.array(true_values).reshape(-1, 1))

# Calculate evaluation metrics
mse = mean_squared_error(y_true_rescaled, predictions_rescaled)
mae = mean_absolute_error(y_true_rescaled, predictions_rescaled)
mape = mean_absolute_percentage_error(y_true_rescaled, predictions_rescaled)
epsilon = 1e-10
rmspe = np.sqrt(np.mean(np.square((y_true_rescaled - predictions_rescaled) / (y_true_rescaled + epsilon))))

print(f'MSE: {mse}')
print(f'MAE: {mae}')
print(f'MAPE: {mape}')
print(f'RMSPE: {rmspe}')

7. Best Practices and Final Thoughts

Here are some best practices for using CNNs in time series forecasting:

  • Preprocessing: Always normalize or standardize your data before feeding it into a CNN.

  • Data Splitting: Maintain the temporal order when splitting data; avoid shuffling for time series data.

  • Cross-Validation: Use time series cross-validation techniques like TimeSeriesSplit to get a more reliable estimate of model performance.

    from sklearn.model_selection import TimeSeriesSplit
    
    tscv = TimeSeriesSplit(n_splits=5)
    for train_index, test_index in tscv.split(x):
        x_train_cv, x_test_cv = x[train_index], x[test_index]
        y_train_cv, y_test_cv = y[train_index], y[test_index]
        # Train and evaluate your model
    
  • Hyperparameter Tuning: Tuning the architecture of the CNN, such as the number of filters, kernel size, and pooling strategy, is essential for optimizing performance.

  • Regularization Techniques: Apply dropout layers and L2 regularization to prevent overfitting (a fuller sketch follows this list).

    # Adding Dropout layer in TensorFlow model
    tf.keras.layers.Dropout(0.2)
    
  • Early Stopping: Use early stopping during training to prevent overfitting.

    early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
    model.fit(x_train, y_train, epochs=50, validation_data=(x_test, y_test), callbacks=[early_stopping])
    
  • Monitor Metrics: Include evaluation metrics like MAE and MAPE in model.compile to monitor them during training.
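
As a concrete illustration of the regularization bullet above, here is a sketch of where Dropout and L2 weight decay could slot into the TensorFlow model from Section 3 (the regularization strengths are illustrative, not tuned):

from tensorflow.keras import regularizers

# Sketch: the Section 3 model with Dropout and L2 regularization added
model_reg = tf.keras.Sequential([
    tf.keras.layers.Conv1D(64, kernel_size=3, activation='relu',
                           input_shape=(window_size, x.shape[2]),
                           kernel_regularizer=regularizers.l2(1e-4)),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.2),  # randomly zeroes 20% of activations during training
    tf.keras.layers.Dense(100, activation='relu',
                          kernel_regularizer=regularizers.l2(1e-4)),
    tf.keras.layers.Dense(1)
])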


8. Conclusion

CNNs offer a powerful and efficient alternative to traditional RNNs and LSTMs for time series forecasting, especially when local patterns in the data are important. They are computationally less expensive and can capture both short- and medium-term dependencies. However, for tasks where long-term dependencies are critical, RNNs and LSTMs may still have the edge due to their ability to maintain state across time steps.

In practice, the choice of model will depend on the specific characteristics of your dataset and the problem you’re solving. CNNs are well-suited for applications where quick feature extraction from sequences is needed, while LSTMs and RNNs excel in handling complex temporal dependencies.

Experimenting with different architectures, tuning hyperparameters, and applying proper regularization will yield the best results, ensuring that your model generalizes well to unseen data. The use of cross-validation and more advanced evaluation metrics will also provide a clearer picture of your model’s performance, helping you make informed decisions about its deployment.

© 2024 Dominic Kneup