For the discrete model, specifically the model definition in kdd99_model.py: why does the prediction layer use a sigmoid activation rather than softmax, given that KDD99 is a multi-class classification problem?

    pred = tf.keras.layers.Dense(n_labels, activation='sigmoid')(net)

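To make the first question concrete, here is a minimal pure-Python sketch comparing a per-class sigmoid with a softmax over the same pre-activation scores. It is not taken from kdd99_model.py, and the 5-class score vector is made up for illustration:

```python
import math

def sigmoid(z):
    """Element-wise logistic sigmoid: each class score is squashed independently."""
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def softmax(z):
    """Softmax over all classes: outputs are coupled and sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical pre-activation scores for one sample over 5 classes.
logits = [2.0, 1.0, 0.5, -1.0, -3.0]

sig = sigmoid(logits)
soft = softmax(logits)
print("sigmoid:", [round(p, 3) for p in sig], "sum =", round(sum(sig), 3))
print("softmax:", [round(p, 3) for p in soft], "sum =", round(sum(soft), 3))
```

Sigmoid treats each class independently, so its outputs need not sum to 1, whereas softmax yields a probability distribution over the classes. Both preserve the ordering of the scores, so the predicted class (the argmax) is the same either way.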
Also, why is from_logits set to True in the SparseCategoricalCrossentropy loss if the prediction layer already applies a sigmoid, so that its outputs are no longer logits?

    model_full.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                       metrics=['accuracy'],
                       optimizer='adam')
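The second question can be illustrated with a pure-Python sketch of the sparse softmax cross-entropy that from_logits=True implies (softmax applied inside the loss). It mimics the formula rather than calling Keras, and the very confident 5-class score vector and its label are hypothetical:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def softmax_ce(label, scores):
    """Sparse softmax cross-entropy: what from_logits=True computes from `scores`."""
    m = max(scores)  # shift for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]

# Hypothetical, very confident pre-activation scores; the true label is class 0.
z = [20.0, -20.0, -20.0, -20.0, -20.0]
label = 0

# Model emits raw logits and the loss applies softmax once.
loss_raw = softmax_ce(label, z)

# Model emits sigmoid(z) but the loss still treats those values as logits,
# so the outputs are squashed twice (sigmoid, then softmax).
loss_sig = softmax_ce(label, [sigmoid(v) for v in z])

# With every input to the softmax confined to (0, 1), the loss cannot drop
# below log(1 + (K - 1) / e) for K classes, however confident the model is.
K = len(z)
floor = math.log(1.0 + (K - 1) / math.e)

print(f"raw logits:      {loss_raw:.6f}")
print(f"sigmoid outputs: {loss_sig:.6f}")
print(f"floor for K={K}:  {floor:.6f}")
```

Under these assumptions, raw logits let the loss approach 0, while sigmoid outputs fed to a loss that expects logits keep it above log(1 + (K - 1)/e), about 0.905 for K = 5.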