Install the required Python packages:

```
pip install -r requirements.txt
```
Source data: SAML-D.csv (950 MB), https://www.kaggle.com/datasets/berkanoztas/synthetic-transaction-monitoring-dataset-aml
Configure your data path in ./ipmn_proflow/config_82d.json:

```
{
    "DATAPATH": "E:/default_download/IPMN_pro/IPMN_1/data/",
    "ORI_ALL_CSV": "SAML-D.csv",
    ...
}
```

The following command runs on the SAML-D dataset with an 8:2 train/test split and multi-window features (14d_4d and 7d_7d), building basic, graph, and pattern features:
```
python ./ipmn_proflow/main.py --config_path ./config_82d.json
```
You can build your own .json config; just follow the same pattern.
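For orientation, here is a config skeleton assembled purely from the default values documented in the tables below (STANDARD_INPUT_PARAM is omitted for brevity; its default is listed after the key tables). Treat it as a sketch, not a verbatim copy of config_82d.json:

```json
{
    "DATAPATH": "E:/default_download/IPMN_pro/IPMN_1/data/",
    "ORI_ALL_CSV": "SAML-D.csv",
    "IBM_CSV": "sampled_IBM.csv",
    "SAVE_TRANS": "saved_transformer.pkl",
    "SAVE_MODEL": "saved_model.pkl",
    "RANDOM_SEED": 42,
    "SAVE_LEVEL": -1,
    "SHOW_LEVEL": 1,
    "STANDARD_INPUT_LABEL": "Is_laundering",
    "MULTI_CLASS_LABEL": "Laundering_type",
    "STANDARD_DROP_PARAM": ["Date", "Timestamp", "Year", "Month"],
    "STEP_UNIT": "d",
    "WINDOW_SIZE": 10,
    "SLIDER_STEP": 1,
    "PARAM_GRID": { "max_depth": [14, 16], "eta": [0.12, 0.14] },
    "TPR": 0.95,
    "TPR_SET": 0,
    "DATASET_MODES": "quick_test",
    "PARAMETER_MODES": "basic",
    "QT_TRAIN_START": "2022/11/01",
    "QT_TRAIN_END": "2022/11/30",
    "QT_TEST_START": "2023/04/01",
    "QT_TEST_END": "2023/04/30",
    "SP_TRAIN_FILE": "2022-11.csv",
    "SP_TEST_FILE": "2023-06.csv"
}
```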
| Key | Type | Default | Description |
|---|---|---|---|
| DATAPATH | string | "E:/default_download/IPMN_pro/IPMN_1/data/" | Root directory for data files |
| ORI_ALL_CSV | string | "SAML-D.csv" | Original dataset filename |
| IBM_CSV | string | "sampled_IBM.csv" | IBM-specific dataset filename |
| SAVE_TRANS | string | "saved_transformer.pkl" | Filename to save the feature transformer |
| SAVE_MODEL | string | "saved_model.pkl" | Filename to save the trained model |
| RANDOM_SEED | int | 42 | Random seed for reproducibility |
| SAVE_LEVEL | int | -1 | Save level: -1 = no model save, 0 = model save only, 1 = also save dataset, 2 = also save data with predictions |
| SHOW_LEVEL | int | 1 | Verbosity of logging/output: 0 = show nothing, 1 = feature importance, 2 = save tree, 3 = permutation importance, 4 = SHAP summary, 5 = SHAP waterfall |
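The pipeline is driven entirely by this file. As a minimal sketch of what loading such a config can look like (the project's actual loader presumably lives in ipmn_proflow/config.py; the --config_path flag matches the documented CLI):

```python
import argparse
import json

def load_config():
    """Parse --config_path and return the JSON config as a plain dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_path", default="./config_82d.json")
    args = parser.parse_args()
    with open(args.config_path, "r", encoding="utf-8") as f:
        return json.load(f)
```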
| Key | Type | Default | Description |
|---|---|---|---|
| STANDARD_INPUT_PARAM | list[string] | See below | List of standard input features |
| STANDARD_INPUT_LABEL | string | "Is_laundering" | Binary classification label |
| MULTI_CLASS_LABEL | string | "Laundering_type" | Multi-class classification label |
| STANDARD_DROP_PARAM | list[string] | ["Date", "Timestamp", "Year", "Month"] | Columns to drop during preprocessing |
Default STANDARD_INPUT_PARAM:

```json
[
"Is_laundering",
"Laundering_type",
"Date",
"Time",
"Sender_account",
"Receiver_account",
"Amount",
"Payment_currency",
"Received_currency",
"Payment_type"
]
```
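To illustrate how these label keys are typically consumed, here is a hypothetical split_label matching the signature used in main.py further below; the real implementation may differ:

```python
def split_label(config, train_set, test_set):
    """Pop the binary label off each pandas DataFrame and drop non-features."""
    label = config["STANDARD_INPUT_LABEL"]                # "Is_laundering"
    extra = [config["MULTI_CLASS_LABEL"], *config["STANDARD_DROP_PARAM"]]
    y_train, y_test = train_set[label], test_set[label]
    X_train = train_set.drop(columns=[label, *extra], errors="ignore")
    X_test = test_set.drop(columns=[label, *extra], errors="ignore")
    return y_train, y_test, X_train, X_test
```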
| Key | Type | Default | Description |
|---|---|---|---|
| STEP_UNIT | string | "d" | Unit of the time step (e.g., "d" = day) |
| WINDOW_SIZE | int | 10 | Size of the sliding window |
| SLIDER_STEP | int | 1 | Step size of the sliding window |
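A hypothetical helper showing what a sliding window with these defaults means (a 10-day window advancing one day at a time); the repo's real logic lives in parameter_handler/:

```python
import pandas as pd

def iter_windows(df, date_col="Date", window_size=10, slider_step=1, step_unit="d"):
    """Yield (start, end, frame) for each sliding-window position."""
    size = pd.Timedelta(window_size, unit=step_unit)
    step = pd.Timedelta(slider_step, unit=step_unit)
    start = df[date_col].min()
    while start <= df[date_col].max():
        end = start + size
        yield start, end, df[(df[date_col] >= start) & (df[date_col] < end)]
        start += step
```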
| Key | Type | Default | Description |
|---|---|---|---|
| PARAM_GRID | dict | { "max_depth": [14, 16], "eta": [0.12, 0.14] } | Grid for hyperparameter tuning |
| TPR | float | 0.95 | Target true positive rate |
| TPR_SET | int | 0 | Flag to enable TPR adjustment (0 = disabled) |
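A sketch of how PARAM_GRID, RANDOM_SEED, and the TPR target plausibly combine, assuming an XGBoost classifier tuned with scikit-learn grid search (the actual wiring is in model.py; "eta" is XGBoost's native name for the learning rate):

```python
from sklearn.metrics import roc_curve
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {"max_depth": [14, 16], "eta": [0.12, 0.14]}  # PARAM_GRID default
search = GridSearchCV(XGBClassifier(random_state=42), param_grid, cv=3)
# search.fit(X_train, y_train); best_model = search.best_estimator_

def threshold_for_tpr(y_true, probs, target_tpr=0.95):
    """With TPR_SET enabled, pick the highest threshold reaching the TPR target."""
    fpr, tpr, thresholds = roc_curve(y_true, probs)
    return thresholds[tpr >= target_tpr][0]
```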
| Key | Type | Default | Description |
|---|---|---|---|
| DATASET_MODES | string | "quick_test" | Mode for selecting training/testing data |

Valid values: `quick_test`, `all_d73`, `all_d82`, `first_2_d73`, `first_4_d73`, `IBM_d73`, `specific_train_specific_test`
| Key | Type | Default | Description |
|---|---|---|---|
| PARAMETER_MODES | string | "basic" | Mode for feature processing |

Valid values: `origin`, `basic`, `window_graph`, `multi_window_graph`, `window_all`, `multi_window_all`
The following keys are used only when DATASET_MODES = "quick_test".
| Key | Type | Default | Description |
|---|---|---|---|
| QT_TRAIN_START | string | "2022/11/01" | Start date for training data |
| QT_TRAIN_END | string | "2022/11/30" | End date for training data |
| QT_TEST_START | string | "2023/04/01" | Start date for testing data |
| QT_TEST_END | string | "2023/04/30" | End date for testing data |
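For concreteness, this cut amounts to a date filter like the following (illustrative only; assumes the Date column parses cleanly, and the real selection presumably lives in dataloader.py):

```python
import pandas as pd

df = pd.read_csv("SAML-D.csv", parse_dates=["Date"])
train_set = df[(df["Date"] >= "2022/11/01") & (df["Date"] <= "2022/11/30")]
test_set = df[(df["Date"] >= "2023/04/01") & (df["Date"] <= "2023/04/30")]
```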
The following keys are used only when DATASET_MODES = "specific_train_specific_test".
| Key | Type | Default | Description |
|---|---|---|---|
| SP_TRAIN_FILE | string | "2022-11.csv" | Specific training file |
| SP_TEST_FILE | string | "2023-06.csv" | Specific testing file |
The pipeline in main.py is a straight chain; to customize it, just match each stage's inputs and outputs:

```python
config = load_config()
train_set, test_set = load_dataset(config)
y_train, y_test, X_train, X_test = split_label(config, train_set, test_set)
X_train, X_test = add_parameter(config, X_train, X_test)
save_feature_data2csv(config, y_train, y_test, X_train, X_test)
X_train, X_test, model_save_id = encode_feature(config, X_train, X_test)
grid_search_model = config_model(config)
trained_grid_search_model = train_model(grid_search_model, X_train, y_train)
best_model = search_best_save(config, trained_grid_search_model, model_save_id)
feature_analysis(config, best_model, X_test, y_test)
test_probabilities = test_model(best_model, X_test)
save_predict_data2csv_float(config, test_probabilities, X_test, y_test)
y_pred = analysis_performance(config, y_test, test_probabilities)
save_predict_data2csv_bool(config, y_pred, X_test, y_test)
```
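Once SAVE_LEVEL >= 0 has written the pickles named by SAVE_TRANS and SAVE_MODEL, they can be reloaded to score new data. A hypothetical sketch (whether predictor.py does exactly this is an assumption, as is the sklearn-style transform API):

```python
import pickle

with open("saved_transformer.pkl", "rb") as f:  # SAVE_TRANS
    transformer = pickle.load(f)
with open("saved_model.pkl", "rb") as f:        # SAVE_MODEL
    model = pickle.load(f)

# X_new = transformer.transform(new_transactions)  # hypothetical new data
# probs = model.predict_proba(X_new)[:, 1]
```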
Second source dataset: HI-Medium_Trans.csv (2.82 GB), https://www.kaggle.com/datasets/ealtman2019/ibm-transactions-for-anti-money-laundering-aml. Before using it, you need to run ./data/sample.py, which splits it into sampled_IBM.csv and sampled_IBM_pred.csv.
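The real sampling script ships at ./data/sample.py; purely as an illustration of the kind of split it produces (the 5% and 20% fractions here are invented for the sketch):

```python
import pandas as pd

# Read the 2.82 GB file in chunks and keep a manageable random sample.
parts = [chunk.sample(frac=0.05, random_state=42)
         for chunk in pd.read_csv("HI-Medium_Trans.csv", chunksize=1_000_000)]
sample = pd.concat(parts, ignore_index=True)

pred = sample.sample(frac=0.2, random_state=42)   # held out for prediction
sample.drop(pred.index).to_csv("sampled_IBM.csv", index=False)
pred.to_csv("sampled_IBM_pred.csv", index=False)
```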
```
IPMN_1/
├── data/
│   ├── SAML-D.csv            (download required)
│   ├── HI-Medium_Trans.csv   (download required)
│   ├── sampled_IBM.csv       (created by sample.py)
│   ├── sampled_IBM_pred.csv  (created by sample.py)
│   └── sample.py
├── ipmn_proflow/
│   ├── parameter_handler/    (window and feature handling)
│   ├── xgb_trees/            (output path for exported tree structures)
│   ├── main.py
│   ├── config_82d.json
│   ├── __init__.py
│   ├── analysis.py
│   ├── config.py
│   ├── dataloader.py
│   ├── datasaver.py
│   ├── imports.py            ("from imports import *")
│   ├── model.py
│   ├── param_feature.py
│   └── predictor.py
├── utili/                    (assorted utility tools)
├── requirements.txt
└── README.md
```
https://github.com/dyinjin/IPMN_1
This project is open-sourced under the MIT License.