IPMN_1
Requirements

Install the Python packages:

pip install -r requirements.txt

Source data: SAML-D.csv (950 MB), from https://www.kaggle.com/datasets/berkanoztas/synthetic-transaction-monitoring-dataset-aml

Configure your data path in ./ipmn_proflow/config_82d.json:

{
  "DATAPATH": "E:/default_download/IPMN_pro/IPMN_1/data/",
  "ORI_ALL_CSV": "SAML-D.csv",
  ...
}
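
A minimal sketch of reading such a config (the project's own loader is the load_config call used in main.py; this standalone version simply assumes the file is plain JSON):

import json

def read_config(config_path="./ipmn_proflow/config_82d.json"):
    # Read the JSON config file and return it as a plain dict.
    with open(config_path, "r", encoding="utf-8") as f:
        return json.load(f)

config = read_config()
print(config["DATAPATH"] + config["ORI_ALL_CSV"])  # full path to SAML-D.csv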

Run in Preset Config

Runs on the SAML-D dataset split 8:2, with multi-window features (14d_4d and 7d_7d), building basic, graph, and pattern features.

python ./ipmn_proflow/main.py --config_path ./config_82d.json

Custom Configs in JSON

You can build your own .json config by following the same pattern.

General Configuration

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| DATAPATH | string | "E:/default_download/IPMN_pro/IPMN_1/data/" | Root directory for data files |
| ORI_ALL_CSV | string | "SAML-D.csv" | Original dataset filename |
| IBM_CSV | string | "sampled_IBM.csv" | IBM-specific dataset filename |
| SAVE_TRANS | string | "saved_transformer.pkl" | Filename for the saved feature transformer |
| SAVE_MODEL | string | "saved_model.pkl" | Filename for the saved trained model |
| RANDOM_SEED | int | 42 | Random seed for reproducibility |
| SAVE_LEVEL | int | -1 | Save level flag: -1 = no model save, 0 = model save only, 1 = dataset save, 2 = data with predictions |
| SHOW_LEVEL | int | 1 | Verbosity level: 0 = show nothing, 1 = show importance, 2 = save tree, 3 = permutation importance, 4 = SHAP summary, 5 = SHAP waterfall |

Feature & Label Configuration

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| STANDARD_INPUT_PARAM | list[string] | See below | List of standard input features |
| STANDARD_INPUT_LABEL | string | "Is_laundering" | Binary classification label |
| MULTI_CLASS_LABEL | string | "Laundering_type" | Multi-class classification label |
| STANDARD_DROP_PARAM | list[string] | ["Date", "Timestamp", "Year", "Month"] | Columns to drop during preprocessing |

Default STANDARD_INPUT_PARAM:

[
  "Is_laundering",
  "Laundering_type",
  "Date",
  "Time",
  "Sender_account",
  "Receiver_account",
  "Amount",
  "Payment_currency",
  "Received_currency",
  "Payment_type"
]
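
For illustration, a hedged sketch of what the label split amounts to under these defaults (the project's actual split_label in main.py may differ):

import pandas as pd

def split_label_sketch(config, df):
    # Separate the binary label, then drop it, the multi-class label,
    # and the unused time columns from the feature matrix.
    y = df[config["STANDARD_INPUT_LABEL"]]
    drop_cols = [config["STANDARD_INPUT_LABEL"], config["MULTI_CLASS_LABEL"]]
    drop_cols += config["STANDARD_DROP_PARAM"]
    X = df.drop(columns=drop_cols, errors="ignore")
    return y, X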

Sliding Window Configuration

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| STEP_UNIT | string | "d" | Unit of the time step (e.g., "d" for day) |
| WINDOW_SIZE | int | 10 | Size of the sliding window |
| SLIDER_STEP | int | 1 | Step size for the sliding window |
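
These three keys describe a window that moves through time. Purely as an illustration (not the project's implementation), a generator over daily windows could look like:

import pandas as pd

def iter_windows(dates, window_size=10, slider_step=1, step_unit="d"):
    # Yield (start, end) boundaries of a WINDOW_SIZE-wide window
    # that advances by SLIDER_STEP units each iteration.
    width = pd.Timedelta(window_size, unit=step_unit)
    step = pd.Timedelta(slider_step, unit=step_unit)
    start = pd.to_datetime(dates).min()
    last = pd.to_datetime(dates).max()
    while start + width <= last:
        yield start, start + width
        start += step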

Model Tuning Parameters

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| PARAM_GRID | dict | { "max_depth": [14, 16], "eta": [0.12, 0.14] } | Grid for hyperparameter tuning |
| TPR | float | 0.95 | Target true positive rate |
| TPR_SET | int | 0 | Flag to enable TPR adjustment (0 = disabled) |
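
A minimal sketch of how PARAM_GRID plausibly drives the tuning, assuming an XGBoost classifier wrapped in scikit-learn's GridSearchCV (the project's actual wiring lives in model.py):

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {"max_depth": [14, 16], "eta": [0.12, 0.14]}  # PARAM_GRID from the config
search = GridSearchCV(XGBClassifier(), param_grid, scoring="roc_auc", cv=3)
# search.fit(X_train, y_train); search.best_estimator_ is then the tuned model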

Dataset Modes

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| DATASET_MODES | string | "quick_test" | Mode for selecting training/testing data |

Valid values:

  • quick_test
  • all_d73
  • all_d82
  • first_2_d73
  • first_4_d73
  • IBM_d73
  • specific_train_specific_test

Parameter Modes

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| PARAMETER_MODES | string | "basic" | Mode for feature processing |

Valid values:

  • origin
  • basic
  • window_graph
  • multi_window_graph
  • window_all
  • multi_window_all

Quick Test Mode Parameters

Used only when DATASET_MODES = "quick_test".

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| QT_TRAIN_START | string | "2022/11/01" | Start date for training data |
| QT_TRAIN_END | string | "2022/11/30" | End date for training data |
| QT_TEST_START | string | "2023/04/01" | Start date for testing data |
| QT_TEST_END | string | "2023/04/30" | End date for testing data |
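
For example, a config fragment that trains on one month and tests on another (values are the defaults above):

{
  "DATASET_MODES": "quick_test",
  "QT_TRAIN_START": "2022/11/01",
  "QT_TRAIN_END": "2022/11/30",
  "QT_TEST_START": "2023/04/01",
  "QT_TEST_END": "2023/04/30"
}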

Specific Train/Test Mode Parameters

Used only when DATASET_MODES = "specific_train_specific_test".

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| SP_TRAIN_FILE | string | "2022-11.csv" | Specific training file |
| SP_TEST_FILE | string | "2023-06.csv" | Specific testing file |
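
Likewise, a fragment for the file-based mode:

{
  "DATASET_MODES": "specific_train_specific_test",
  "SP_TRAIN_FILE": "2022-11.csv",
  "SP_TEST_FILE": "2023-06.csv"
}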

Custom Function Model

To plug in your own function or model, match the inputs and outputs of the pipeline stages in main.py:

    config = load_config()

    # Load the train/test data and separate labels from features
    train_set, test_set = load_dataset(config)

    y_train, y_test, X_train, X_test = split_label(config, train_set, test_set)

    # Build window/graph/pattern features according to PARAMETER_MODES
    X_train, X_test = add_parameter(config, X_train, X_test)

    save_feature_data2csv(config, y_train, y_test, X_train, X_test)

    # Encode features, then grid-search and train the model
    X_train, X_test, model_save_id = encode_feature(config, X_train, X_test)

    grid_search_model = config_model(config)

    trained_grid_search_model = train_model(grid_search_model, X_train, y_train)

    best_model = search_best_save(config, trained_grid_search_model, model_save_id)

    # Analyse features, predict on the test set, and save the results
    feature_analysis(config, best_model, X_test, y_test)

    test_probabilities = test_model(best_model, X_test)

    save_predict_data2csv_float(config, test_probabilities, X_test, y_test)

    y_pred = analysis_performance(config, y_test, test_probabilities)

    save_predict_data2csv_bool(config, y_pred, X_test, y_test)
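
For example, a hypothetical drop-in for config_model: the downstream steps (train_model, search_best_save) expect a grid-search object they can fit and query for the best estimator, so any scikit-learn-compatible search should slot in:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

def config_model_custom(config):
    # Hypothetical replacement: a random forest tuned over its own grid.
    grid = {"n_estimators": [100, 200]}
    return GridSearchCV(
        RandomForestClassifier(random_state=config["RANDOM_SEED"]),
        grid, scoring="roc_auc", cv=3)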

IBM dataset

HI-Medium_Trans.csv (2.82 GB), from https://www.kaggle.com/datasets/ealtman2019/ibm-transactions-for-anti-money-laundering-aml

Run ./data/sample.py to split it into sampled_IBM.csv and sampled_IBM_pred.csv.
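
sample.py is the authoritative script; purely as an illustration, chunked sampling of a file that size might look like the sketch below (the 5% fraction is an assumption):

import pandas as pd

# Illustration only: stream the large CSV in chunks and keep a random sample.
chunks = pd.read_csv("HI-Medium_Trans.csv", chunksize=1_000_000)
sample = pd.concat(chunk.sample(frac=0.05, random_state=42) for chunk in chunks)
sample.to_csv("sampled_IBM.csv", index=False)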

Directory Structure


IPMN_1/
├── data/
│   ├── SAML-D.csv (need download)
│   ├── HI-Medium_Trans.csv (need download)
│   ├── sampled_IBM.csv (need run sample)
│   ├── sampled_IBM_pred.csv (need run sample)
│   └── sample.py
├── ipmn_proflow/
│   ├── parameter_handler/ (handle window and feature)
│   ├── xgb_trees/ (tree structure store path)
│   ├── main.py
│   ├── config_82d.json
│   ├── __init__.py
│   ├── analysis.py
│   ├── config.py
│   ├── dataloader.py
│   ├── datasaver.py
│   ├── imports.py ("from imports import *")
│   ├── model.py
│   ├── param_feature.py
│   └── predictor.py
├── utili/ (tools that have been used)
├── requirements.txt
└── README.md

License

This project (https://github.com/dyinjin/IPMN_1) is open-sourced under the MIT License.
