haeun161/GAN_DataAugmentation-UNSW_NB15_Dataset

GAN-UNSW-NB15-Dataset

Comparison of Models (SMOTE, GAN) for Data Augmentation with the UNSW-NB15 Dataset

  • Proposed Model: GAN Data Augmentation + Security-related Feature Elimination
  • Compared Models:
    1. No Data Augmentation
    2. SMOTE Data Augmentation
    3. GAN Data Augmentation

About Dataset

UNSW-NB15 is a network intrusion dataset. It contains nine attack categories, including DoS, Worms, Backdoors, and Fuzzers. The dataset contains raw network packets. The training set has 175,341 records and the testing set has 82,332 records, covering both attack and normal traffic. Link: https://paperswithcode.com/dataset/unsw-nb15

Image: Train set & Test set

  • In this experiment, we used the same amount of data as shown in the image above. (*The Train Set was used for Data Augmentation.)

Background (Data Analysis -> Proposed Solution)

(1) Data Analysis

In this dataset, the number of records in some attack categories (Backdoor, Analysis, Shellcode, Worms) is significantly smaller than in the others. When the number of instances per category is highly imbalanced, several problems can arise during classification.

For example:

  1. Model Bias: The model may become biased towards the majority class, leading to poor performance on minority classes.
  2. Poor Generalization: The model might not learn the characteristics of the minority classes well, resulting in poor generalization when making predictions on new data.
  3. Skewed Metrics: Evaluation metrics such as accuracy may be misleading, as a high accuracy can be achieved by simply predicting the majority class.
  4. Overfitting: The model may overfit the majority class data, capturing noise instead of the underlying patterns.
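The "Skewed Metrics" point can be illustrated with a small sketch. The labels below are hypothetical toy values, not the real UNSW-NB15 class counts, and the helper names are made up for this example. It compares the accuracy of a classifier that always predicts the majority class with its per-class recall.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)

def per_class_recall(y_true, y_pred):
    """Recall per class: correct predictions / true instances of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical toy labels (assumption): 95 normal records, 5 Worms records.
y_true = ["Normal"] * 95 + ["Worms"] * 5
y_pred = ["Normal"] * 100  # a model that ignores the minority class

baseline = majority_baseline_accuracy(y_true)
recall = per_class_recall(y_true, y_pred)
print(baseline)  # 0.95
print(recall)    # {'Normal': 1.0, 'Worms': 0.0}
```

Accuracy looks high while recall on the minority class is zero, which is why per-class metrics are worth reporting alongside accuracy on imbalanced data.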

(2) Proposed Solution

  1. Make balanced data by augmenting data
  2. To enhance security, remove some security-related features during training
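The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in main.py: the oversampler follows the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), the function names are hypothetical, and which columns count as "security-related" is not stated in this README, so the dropped column is an arbitrary placeholder.

```python
import math
import random

def smote_like_oversample(samples, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea).
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # k nearest neighbours of `base` among the other minority samples
        others = [s for s in samples if s is not base]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

def drop_features(rows, names_to_drop):
    """Return copies of the row dicts without the listed feature names."""
    drop = set(names_to_drop)
    return [{name: v for name, v in row.items() if name not in drop}
            for row in rows]

# Toy minority-class samples (assumption), two numeric features each.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_oversample(minority, n_new=6)

# Placeholder row; "sbytes" is dropped purely for illustration.
rows = [{"dur": 0.1, "sbytes": 200, "label": 1}]
reduced = drop_features(rows, ["sbytes"])
print(len(new_points), reduced)
```

Every synthetic point lies on a segment between two real minority samples, so it stays inside the region already covered by that class. A GAN-based augmenter would replace `smote_like_oversample` with samples drawn from a trained generator.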

Model Selection

*Note: The linked Notion page is written in Korean.

  1. Research on Generative AI for Data Augmentation -> select a Generative AI model
  2. Related Work
    • Network Intrusion Detection Based on Supervised Adversarial Variational Auto-Encoder With Regularization

Experiments and analysis

1. No Data Augmentation -> Train

  • Evaluation Metric: Accuracy (image)

  • Data: ['SMOTE oversampled Data', 'GAN oversampled Data']

  • Training Accuracy (%): [99.9045918367347, 100.0]

  • Test Accuracy (%): [85.42678571428571, 99.995290349927]
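One simple way to read the numbers above is the gap between training and test accuracy, which is a common first indicator of overfitting. The sketch below only does that arithmetic on the values reported in this README; the variable names are illustrative.

```python
# Values copied from the results listed above.
data = ["SMOTE oversampled Data", "GAN oversampled Data"]
train_acc = [99.9045918367347, 100.0]
test_acc = [85.42678571428571, 99.995290349927]

# Train/test gap in percentage points for each data variant.
gaps = {name: tr - te for name, tr, te in zip(data, train_acc, test_acc)}

for name, gap in gaps.items():
    print(f"{name}: train-test gap = {gap:.4f} points")
```

The gap is roughly 14.48 points for the SMOTE data and under 0.01 points for the GAN data. A small gap alone does not rule out problems such as synthetic samples that closely resemble test records, so it should be read together with the other metrics.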

2. With GAN & SMOTE Data Augmentation -> Train

  • Evaluation Metric: Accuracy, Memory Usage, Elapsed Time (image)
    • Work in progress: resolving the overfitting problem
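Memory usage and elapsed time can be collected with the Python standard library. The sketch below is an assumption about how such measurements might be taken, not the method used in main.py: `measure` and `fake_training_step` are hypothetical names, and `tracemalloc` only tracks memory allocated through Python's allocator, so memory held by native libraries may not be counted.

```python
import time
import tracemalloc

def measure(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        # Always stop tracing, even if fn raises.
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak

def fake_training_step(n):
    # Placeholder workload standing in for model training (assumption).
    return sum([i * i for i in range(n)])

result, elapsed, peak = measure(fake_training_step, 10_000)
print(f"elapsed: {elapsed:.4f} s, peak memory: {peak / 1024:.1f} KiB")
```

Wrapping each training variant (no augmentation, SMOTE, GAN) in the same helper keeps the three measurements comparable.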

3. Security-related Feature Elimination -> GAN & SMOTE Data Augmentation -> Train

  • Evaluation Metric: Accuracy, Memory Usage, Elapsed Time (image)
    • Work in progress: resolving the overfitting problem

Usage

Dataset:

Uses the preprocessed UNSW-NB15 dataset, saved as datset.csv

Run

run python main.py

About

UNSW-NB15 Data Augmentation with a GAN -> comparison of training results
