The recently updated system or environment doesn't have the necessary CUDA library or drivers installed for cuDNN #5080

mariuslesniak · 2025-01-31T12:08:20Z

I am training a deep neural network with TensorFlow's Keras API on a T4 instance. This worked well for about a year until a day ago, when a CUDA-related error started to appear.

Describe the current behavior
The error is:
"InvalidArgumentError: Graph execution error:

Detected at node sequential_1/bidirectional_1/forward_lstm_1/CudnnRNNV3 defined at (most recent call last)"
The rest of the error output is in the attached file.

Describe the expected behavior
The expected behaviour is for the model to train on the GPU without this error. The "Dnn is not supported" message indicates that the LSTM layer in the model is trying to use the cuDNN implementation, which is optimized for NVIDIA GPUs, but either no compatible GPU is available in the environment or cuDNN is not properly configured. The fact that the code worked well until yesterday suggests an unaccounted-for change in the system or environment.
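One way to narrow this down is to ask TensorFlow what the runtime actually exposes. The sketch below is a hypothetical helper (`collect_gpu_diagnostics` is not part of any library) that reports the visible GPUs and the CUDA/cuDNN versions TensorFlow was built against, using `tf.config.list_physical_devices`, `tf.test.is_built_with_cuda` and `tf.sysconfig.get_build_info`. Note that the build info describes what TensorFlow was compiled for, not which libraries are installed on the machine; an empty GPU list on a T4 runtime would still point to an environment problem rather than a model problem.

```python
import importlib.util


def collect_gpu_diagnostics():
    """Return a dict describing what TensorFlow can see (GPUs, CUDA/cuDNN build info)."""
    info = {
        "tensorflow_version": None,
        "gpus": [],
        "built_with_cuda": None,
        "cuda_version": None,
        "cudnn_version": None,
    }
    if importlib.util.find_spec("tensorflow") is None:
        # TensorFlow is not installed in this environment; nothing else to report.
        return info

    import tensorflow as tf

    info["tensorflow_version"] = tf.__version__
    # Physical GPUs TensorFlow has detected (empty list means CPU only).
    info["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    info["built_with_cuda"] = tf.test.is_built_with_cuda()
    # Versions TensorFlow was compiled against; CPU-only builds omit these keys.
    build = tf.sysconfig.get_build_info()
    info["cuda_version"] = build.get("cuda_version")
    info["cudnn_version"] = build.get("cudnn_version")
    return info


if __name__ == "__main__":
    for key, value in collect_gpu_diagnostics().items():
        print(f"{key}: {value}")
```

Running this in both the current and the fallback runtime and comparing the output would show whether the two environments differ in what TensorFlow detects.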

What web browser you are using
I am using Chrome

Additional context
Link to a minimal, public, self-contained notebook that reproduces this issue.

Share the file from your GitHub account via File > Save a copy as a GitHub Gist, or share Drive notebooks using the Share button, then 'Get Shareable Link'.

mariuslesniak · 2025-01-31T20:35:39Z

I have also included some parts of the code below:

Imports and setting up the environment

!pip install -Uqq fastai

import numpy as np
import pandas as pd
import os
import string
from IPython.display import FileLink
from datetime import date
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, Bidirectional, Dropout

The rest of the code ....................................

    # Creating training data set
    # --------------------------
    print('Creating the training set')
    train = df.copy()
    train.head(window_length+1)

    train.tail(window_length+1)

    train_rows = train.values.shape[0]
    train_samples = np.empty([ train_rows - factor_a * window_length, window_length, number_of_features], dtype=float)
    train_labels = np.empty([ train_rows - factor_a * window_length, number_of_features], dtype=float)
    for i in range(0, train_rows - factor_a * window_length):
        train_samples[i] = train.iloc[i : i+window_length, 0 : number_of_features]
        train_labels[i] = train.iloc[i+window_length : i+window_length+1, 0 : number_of_features]


    print('Creating scales samples')
    scaler = StandardScaler()
    transformed_dataset = scaler.fit_transform(train.values)
    scaled_train_samples = pd.DataFrame(data=transformed_dataset, index=train.index)
    scaled_train_samples.head(window_length+1)
    x_train = np.empty([ train_rows - factor_a * window_length, window_length, number_of_features], dtype=float)
    y_train = np.empty([ train_rows - factor_a * window_length, number_of_features], dtype=float)

    for i in range(0, train_rows - factor_a * window_length):
        x_train[i] = scaled_train_samples.iloc[i : i+window_length, 0 : number_of_features]
        y_train[i] = scaled_train_samples.iloc[i+window_length : i+window_length+1, 0 : number_of_features]

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense, Bidirectional, Dropout
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.metrics import mse

    physical_devices = tf.config.list_physical_devices('GPU')
    print("Num GPUs Available: ", len(physical_devices))
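    try:
        # Memory growth must be set before the GPU is initialised.
        tf.config.experimental.set_memory_growth(physical_devices[0], True)
    except (IndexError, ValueError, RuntimeError):
        # IndexError: no GPU found; ValueError: invalid device;
        # RuntimeError: the device was already initialised.
        pass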

    # Initialising the RNN
    model = Sequential()
    # Adding the input layer and the LSTM layer
    model.add(Bidirectional(LSTM(240,
                            input_shape = (window_length, number_of_features),
                            return_sequences = True)))
    # Adding a first Dropout layer
    model.add(Dropout(0.2))
    # Adding a second LSTM layer
    model.add(Bidirectional(LSTM(240,
                            input_shape = (window_length, number_of_features),
                            return_sequences = True)))
    # Adding a second Dropout layer
    model.add(Dropout(0.2))
    # Adding a third LSTM layer
    model.add(Bidirectional(LSTM(240,
                            input_shape = (window_length, number_of_features),
                            return_sequences = True)))
    # Adding a fourth LSTM layer
    model.add(Bidirectional(LSTM(240,
                            input_shape = (window_length, number_of_features),
                            return_sequences = False)))
    # Adding a third Dropout layer
    model.add(Dropout(0.2))
    # Adding the first output layer
    model.add(Dense(70))
    # Adding the last output layer
    model.add(Dense(number_of_features))

    model.compile(optimizer=Adam(learning_rate=0.0001), loss ='mse', metrics=['accuracy'])

    model.fit(x=x_train, y=y_train, batch_size=100, epochs=2000, verbose=2)

The rest of the code .........................................................................
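The two `for` loops in the snippet above build `train_samples`/`train_labels` and `x_train`/`y_train` one row at a time. A vectorized equivalent could look like the sketch below. It is an illustration only: `make_windows` is a hypothetical name, it assumes NumPy 1.20+ for `sliding_window_view`, and it expects the feature columns to be selected beforehand (for example `scaled_train_samples.values[:, :number_of_features]`).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(data, window_length, factor_a=1):
    """Vectorized equivalent of the per-row loops that build x_train / y_train.

    data: 2-D array of shape (rows, number_of_features).
    Returns (samples, labels) with shapes
    (rows - factor_a * window_length, window_length, number_of_features) and
    (rows - factor_a * window_length, number_of_features).
    """
    data = np.asarray(data, dtype=float)
    if factor_a < 1:
        raise ValueError("factor_a must be >= 1 so every window has a following label row")
    n_samples = data.shape[0] - factor_a * window_length
    if n_samples <= 0:
        raise ValueError("not enough rows for the requested window_length / factor_a")

    # sliding_window_view returns shape (rows - window_length + 1, features, window_length);
    # move the window axis in front of the feature axis to match the loop's layout.
    windows = sliding_window_view(data, window_length, axis=0).transpose(0, 2, 1)
    samples = windows[:n_samples].copy()
    # Label i is the single row that immediately follows window i.
    labels = data[window_length:window_length + n_samples].copy()
    return samples, labels
```

With the names from the code above, this would be called as `x_train, y_train = make_windows(scaled_train_samples.values[:, :number_of_features], window_length, factor_a)`.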

metrizable · 2025-01-31T22:30:44Z

@mariuslesniak Thanks for filing the issue and thanks for using Colab.

Thanks for sharing some of the code. You mentioned:

"The rest of the error code is contained in the attached file"

Could you upload this file so that we can help troubleshoot? Also, are you able to provide a minimal reproducible example that we can run and debug? Thanks!

mariuslesniak · 2025-02-01T22:49:30Z

Thanks for your message. In response, I have attached three files as follows:

(1) parametric_colab_forecasting_loop.ipynb: a cut-down version of my code that you can run to reproduce the problem (uploaded as a .txt file)
(2) source-data-history.csv: a set of example input data
(3) predict-data-template.csv: an output data template required by the program

In file (1), lines 28, 31, 36, 45-47, 57, 242 and 265 refer to the file names of (2) and (3). In my case, files (2) and (3) were placed on Google Drive in a directory called MyDrive/Colab_files, mounted at /content/drive.
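Because the same two file names appear on several lines of the notebook, one option is to define them once. A minimal sketch, assuming the Drive layout described above; the constant names and the helper `mount_drive_if_in_colab` are hypothetical, and `google.colab.drive.mount` is only importable inside a Colab runtime.

```python
from pathlib import Path

# Hypothetical constants; the directory layout follows the description in the thread.
DRIVE_MOUNT = Path("/content/drive")
DATA_DIR = DRIVE_MOUNT / "MyDrive" / "Colab_files"
SOURCE_CSV = DATA_DIR / "source-data-history.csv"      # file (2): input history
TEMPLATE_CSV = DATA_DIR / "predict-data-template.csv"  # file (3): output template


def mount_drive_if_in_colab() -> bool:
    """Mount Google Drive when running in Colab; return False anywhere else."""
    try:
        from google.colab import drive  # only available inside a Colab runtime
    except ImportError:
        return False
    drive.mount(str(DRIVE_MOUNT))
    return True
```

The notebook could then call `pd.read_csv(SOURCE_CSV)` instead of repeating the literal path, so moving the files only requires changing `DATA_DIR`.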

Hope this will help.

Kind regards

parametric_colab_forecasting_loop.txt

source-data-history.csv
predict-data-template.csv

metrizable · 2025-02-03T16:32:09Z

@mariuslesniak Thanks for the example. I was able to successfully run the code in parametric_colab_forecasting_loop.txt with your provided .csv files. I did make two modifications: 1) I lowered the cycle count limit to finish in a timely fashion, and 2) updated the code to fix the warning: "Do not pass an input_shape/input_dim argument to a layer. When using Sequential models, prefer using an Input(shape) object as the first layer in the model instead":

# Initialising the RNN
model = Sequential()
model.add(keras.Input(shape=(window_length, number_of_features)))
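Applied to the whole model from the original post, that change means declaring the input shape once at the top and dropping `input_shape` from the individual `LSTM` layers. A minimal sketch, assuming TensorFlow's bundled Keras; `build_model` is a hypothetical helper name and the layer sizes are copied from the code above. It leaves out `metrics=['accuracy']`, since accuracy is not a meaningful metric for an MSE regression target.

```python
def build_model(window_length, number_of_features, units=240):
    """Rebuild the model from the original post with a single keras.Input layer."""
    from tensorflow import keras
    from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Dropout

    model = keras.Sequential([
        # Declare the input shape once instead of passing input_shape to each LSTM.
        keras.Input(shape=(window_length, number_of_features)),
        Bidirectional(LSTM(units, return_sequences=True)),
        Dropout(0.2),
        Bidirectional(LSTM(units, return_sequences=True)),
        Dropout(0.2),
        Bidirectional(LSTM(units, return_sequences=True)),
        # The last recurrent layer returns only the final time step.
        Bidirectional(LSTM(units, return_sequences=False)),
        Dropout(0.2),
        Dense(70),
        Dense(number_of_features),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0001), loss="mse")
    return model
```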

I ran your sample code on a GPU T4 runtime and did not see any errors, neither the one cited in the OP ("InvalidArgumentError: Graph execution error") nor any other, in the output or in the final results.

It may be that your larger cycle count causes later errors, but that would seem unrelated to CUDA not being configured correctly. Could you share a notebook, with its output saved, that includes the error?

mariuslesniak · 2025-02-04T10:12:36Z

Hi, I made the suggested correction regarding "input_shape" and reran the code. Unfortunately, I still get the same error when running it in the latest available Colab runtime. I have attached the edited code (parametric_colab_forecasting_loop.txt) as well as the resulting error (output.txt).
I used a T4 GPU and checked the versions: TensorFlow 2.18.0, NVIDIA driver 550.54.15, CUDA 12.4.
If I run the code in the fallback runtime version, available via the Command Palette's "Use fallback runtime version" command, everything works fine.
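If the fallback runtime is not an option, one possible stopgap (an assumption, not a confirmed fix for this issue) is to stop Keras from selecting the cuDNN kernel for the LSTM layers. In Keras 3, which TensorFlow 2.18 uses, `keras.layers.LSTM` accepts a `use_cudnn` argument (`"auto"` by default); setting it to `False` should make the layer use the generic implementation instead of the `CudnnRNNV3` op named in the error, at the cost of slower training. The helper name `make_lstm` is hypothetical, and the argument does not exist in Keras 2.

```python
def make_lstm(units, return_sequences, disable_cudnn=True):
    """Create an LSTM layer, optionally opting out of the cuDNN kernel (Keras 3 only)."""
    from tensorflow import keras

    # use_cudnn="auto" (the Keras 3 default) picks cuDNN when the layer config allows it;
    # use_cudnn=False forces the generic, slower implementation.
    return keras.layers.LSTM(
        units,
        return_sequences=return_sequences,
        use_cudnn=False if disable_cudnn else "auto",
    )
```

Each `LSTM(240, ...)` in the model would then become, for example, `Bidirectional(make_lstm(240, return_sequences=True))`. This sidesteps the cuDNN path rather than repairing the runtime, so it is mainly useful for confirming that cuDNN is the failing component.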

parametric_colab_forecasting_loop.txt

output.txt

Kind regards,

mariuslesniak added the bug label Jan 31, 2025

metrizable added the reply-needed label Jan 31, 2025
