GAN-UNSW-NB15-Dataset

Comparison of Models (SMOTE, GAN) for Data Augmentation with the UNSW-NB15 Dataset

Proposed Model: GAN Data Augmentation + Security-related Feature Elimination
Compared Models:
1. No Data Augmentation
2. SMOTE Data Augmentation
3. GAN Data Augmentation

About Dataset

UNSW-NB15 is a network intrusion dataset. It contains nine different attack types, including DoS, Worms, Backdoors, and Fuzzers. The dataset contains raw network packets. The training set has 175,341 records and the testing set has 82,332 records, covering both attack and normal traffic.
Link: https://paperswithcode.com/dataset/unsw-nb15

Train set & Test set

In this experiment, we used the same amount of data as shown in the image above. (*The training set was used for data augmentation.)

Background (Data Analysis -> Proposed Solution)

(1) Data Analysis

In this dataset, the number of records in some attack categories (Backdoor, Analysis, Shellcode, Worms) is significantly smaller than in the others. When the number of instances per category is highly imbalanced, several problems can arise during classification, for example:

1. Model Bias: The model may become biased towards the majority class, leading to poor performance on minority classes.
2. Poor Generalization: The model might not learn the characteristics of the minority classes well, resulting in poor generalization when making predictions on new data.
3. Skewed Metrics: Evaluation metrics such as accuracy may be misleading, as a high accuracy can be achieved by simply predicting the majority class.
4. Overfitting: The model may overfit the majority class data, capturing noise instead of the underlying patterns.
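
The "Skewed Metrics" point can be made concrete with a small sketch. The label counts below are hypothetical and chosen only for illustration; they are not the real UNSW-NB15 class frequencies. The sketch shows a predictor that always outputs the majority class reaching high accuracy while never detecting a minority class.

```python
from collections import Counter

# Hypothetical label counts, chosen only to illustrate imbalance;
# they are NOT the real UNSW-NB15 class frequencies.
labels = ["Normal"] * 900 + ["DoS"] * 80 + ["Backdoor"] * 15 + ["Worms"] * 5


def majority_baseline(y):
    """Predict the most frequent label for every sample."""
    majority_label, _ = Counter(y).most_common(1)[0]
    return [majority_label] * len(y)


def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall per class: correct predictions of a class / samples of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


predictions = majority_baseline(labels)
overall_accuracy = accuracy(labels, predictions)
recall = per_class_recall(labels, predictions)

print(f"accuracy: {overall_accuracy:.2f}")
for cls, value in recall.items():
    print(f"recall[{cls}]: {value:.2f}")
```

Per-class recall (or a similar per-class metric) makes the failure on minority classes visible even when overall accuracy looks high.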

(2) Proposed Solution

- Balance the dataset by augmenting the minority-class data
- To enhance security, remove some security-related features during training
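
As a rough illustration of the balancing step, here is a minimal SMOTE-style sketch in pure Python. It is a simplified stand-in written for this README, not the code used in this repository and not the imbalanced-learn implementation. The function name `smote_like_oversample`, the parameter `k`, and the toy rows are all assumptions.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-feature minority-class rows (not real UNSW-NB15 records).
minority_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
majority_count = 10  # hypothetical size of the majority class

new_rows = smote_like_oversample(minority_rows, majority_count - len(minority_rows))
balanced_minority = minority_rows + new_rows
print(len(balanced_minority), "minority rows after oversampling")
```

Each synthetic row lies on the line segment between two existing minority rows, so no new value falls outside the range already seen for that class. A GAN-based generator replaces this interpolation with samples drawn from a learned model.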

Model Selection

*Note: the linked Notion page is written in Korean.

Research on Generative AI for Data Augmentation -> select a Generative AI model
- https://button-breeze-d77.notion.site/AI-Generative-AI-data-augmentation-c6391a3b082f403591e913ae3cd94661?pvs=4
Related Work
- Network Intrusion Detection Based on Supervised Adversarial Variational Auto-Encoder With Regularization

Experiments and analysis

1. No Data Augmentation -> Train

Evaluation Metric: Accuracy
Data: ['SMOTE oversampled Data', 'GAN oversampled Data']
Training Accuracy: [99.9045918367347, 100.0]
Test Accuracy: [85.42678571428571, 99.995290349927]

2. GAN & SMOTE Data Augmentation -> Train

Evaluation Metrics: Accuracy, Memory Usage, Elapsed Time
- Work in progress: addressing the overfitting problem
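
One possible way to collect the Elapsed Time and Memory Usage metrics is sketched below using only the Python standard library. The helper `measure` and the function `dummy_training_step` are hypothetical names introduced for this example; the repository may measure these values differently. Note that `tracemalloc` only tracks memory allocated by Python itself, so allocations made inside native libraries may not be counted.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def dummy_training_step(n_rows):
    # Hypothetical stand-in for a real training run.
    data = [float(i) for i in range(n_rows)]
    return sum(data) / len(data)


result, elapsed_seconds, peak_bytes = measure(dummy_training_step, 100_000)
print(f"elapsed: {elapsed_seconds:.4f} s, peak memory: {peak_bytes / 1024:.1f} KiB")
```

Wrapping each training configuration in the same helper keeps the time and memory figures comparable across the SMOTE and GAN runs.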

3. Security-related Feature Elimination -> GAN & SMOTE Data Augmentation -> Train

Evaluation Metrics: Accuracy, Memory Usage, Elapsed Time
- Work in progress: addressing the overfitting problem
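
The feature elimination step that precedes augmentation in this pipeline could look roughly like the sketch below. This README does not list which features are removed, so the column set `SENSITIVE_COLUMNS` and the in-memory sample rows are assumptions made only for illustration.

```python
import csv
import io

# Hypothetical columns treated as "security-related"; the README does
# not list which features the project actually removes.
SENSITIVE_COLUMNS = {"srcip", "dstip"}


def drop_columns(rows, columns_to_drop):
    """Return copies of the row dicts without the given columns."""
    return [
        {key: value for key, value in row.items() if key not in columns_to_drop}
        for row in rows
    ]


# Tiny in-memory CSV standing in for the preprocessed dataset file.
sample_csv = io.StringIO(
    "srcip,dstip,dur,sbytes,attack_cat\n"
    "10.0.0.1,10.0.0.2,0.12,496,Normal\n"
    "10.0.0.3,10.0.0.4,0.65,1762,Backdoor\n"
)

rows = list(csv.DictReader(sample_csv))
reduced_rows = drop_columns(rows, SENSITIVE_COLUMNS)
print(list(reduced_rows[0].keys()))
```

Dropping the columns before augmentation means the generator never sees the removed features, so the synthetic rows cannot reproduce them.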

Usage

Dataset:

Use the preprocessed UNSW-NB15 dataset, saved as datset.csv.

Run

python main.py


haeun161/GAN_DataAugmentation-UNSW_NB15_Dataset
