Poly-MigrationBench is a follow-up to MigrationBench.
While MigrationBench focuses exclusively on Java, real-world code migration spans multiple ecosystems. To address this broader scope, we developed Poly-MigrationBench, an extension that introduces additional languages and platforms. Following a data curation process similar to that of MigrationBench, we additionally collected:
- 100 .NET Framework repositories. The migration target is .NET Core.
- 74 Node.js repositories with a Node.js version below 22. The migration target is Node.js 22.
- 83 Python repositories with a Python version below 3.13. The migration target is Python 3.13.
For more details on the problem formulation, dataset curation pipeline and evaluation framework, read our paper: MigrationBench: Repository-Level Code Migration Benchmark from Java 8
There are three datasets in Poly-MigrationBench:
- All repositories included in the datasets are under the `MIT` or `Apache-2.0` license.
| Index | Dataset | Size | Notes |
|---|---|---|---|
| 1 | Poly-MigrationBench-dotnet | 100 | Under .NET Framework |
| 2 | Poly-MigrationBench-node | 74 | Under Node.js <= 20 |
| 3 | Poly-MigrationBench-python | 83 | Under Python <= 3.12 |
Metadata is provided in a CSV file for each dataset.
Poly-MigrationBench-dotnet:
- `repo` (str): The original repo URL without the `https://github.com/` prefix
- `base_commit` (str): Base commit id
  - At this commit the repository is under .NET Framework. The repositories can build and pass unit tests
- `num_cs_files` (int): Number of `*.cs` files in the repository at `base_commit`
- `root_sln_or_csproj_files` (str): The list of `*.sln` and `*.csproj` files under the project root directory at `base_commit`. Files are separated by `;`
- `verify_command` (str): The command used to verify migration success, which is also the verifier `v` introduced in Section 3.1 of the paper MigrationBench: Repository-Level Code Migration Benchmark from Java 8.
- `license` (str): The license of the repository, either MIT or Apache-2.0 for the whole dataset
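As a minimal sketch of how these columns might be consumed, the snippet below parses one metadata row with Python's standard `csv` module and splits the `;`-separated `root_sln_or_csproj_files` field. The sample row, including the repository name, commit id, file names, and `verify_command` value, is hypothetical and only illustrates the shape of the data; the column order in the real CSV file is also an assumption.

```python
import csv
import io


def split_project_files(field: str) -> list[str]:
    """Split the `;`-separated `root_sln_or_csproj_files` field into paths."""
    return [part.strip() for part in field.split(";") if part.strip()]


# Hypothetical row, made up for illustration; it is not taken from the dataset.
SAMPLE_CSV = (
    "repo,base_commit,num_cs_files,root_sln_or_csproj_files,verify_command,license\n"
    "example-org/example-repo,0123abc,12,App.sln;App.csproj,dotnet test,MIT\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
first = rows[0]

# csv.DictReader yields strings, so integer columns need an explicit cast.
num_cs_files = int(first["num_cs_files"])
project_files = split_project_files(first["root_sln_or_csproj_files"])

print(first["repo"], num_cs_files, project_files)
```

To read the real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle for the dataset's CSV.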
Poly-MigrationBench-node:
- `repo` (str): The original repo URL without the `https://github.com/` prefix
- `base_commit` (str): Base commit id
  - At this commit the repository is under Node.js <= 20
- `num_js_and_ts_files` (int): Number of `*.js` and `*.ts` files in the repository at `base_commit`
- `num_loc` (int): Number of lines of code for `*.js` and `*.ts` files in the repository at `base_commit`
- `num_unit_test` (int): Number of unit tests
- `install_command` (str): The command used to install dependencies
- `build_command` (Optional[str]): The command used to build the project; it may be empty if there is no build command
- `test_command` (str): The command used to run unit tests
- `deploy_command` (Optional[str]): The command used to deploy the AWS Lambda project; it may be empty if there is no deploy command. Depending on the migration goal, any non-empty combination of the `install_command`, `build_command`, `test_command`, or `deploy_command` can serve as the verifier `v` for migration success.
- `license` (str): The license of the repository, either MIT or Apache-2.0 for the whole dataset
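The idea that any non-empty combination of the four commands can act as the verifier `v` can be sketched as follows. This is an illustrative assumption about how one might compose and run the commands, not the evaluation framework from the paper; the helper names `select_commands` and `run_verifier` and the sample command strings are hypothetical.

```python
import subprocess
from typing import Mapping, Optional, Sequence

# Column names from the metadata description, in the order they would naturally run.
COMMAND_FIELDS = ("install_command", "build_command", "test_command", "deploy_command")


def select_commands(
    row: Mapping[str, Optional[str]],
    fields: Sequence[str] = COMMAND_FIELDS,
) -> list[str]:
    """Return the non-empty commands for the requested fields, in pipeline order."""
    chosen = []
    for name in COMMAND_FIELDS:
        if name in fields:
            value = (row.get(name) or "").strip()
            if value:
                chosen.append(value)
    if not chosen:
        raise ValueError("the verifier needs at least one non-empty command")
    return chosen


def run_verifier(commands: Sequence[str], cwd: str) -> bool:
    """Run each command inside `cwd`; succeed only if every command exits with 0."""
    for command in commands:
        if subprocess.run(command, shell=True, cwd=cwd).returncode != 0:
            return False
    return True


# Hypothetical row: the command strings are illustrative, not taken from the dataset.
row = {
    "install_command": "npm install",
    "build_command": "",
    "test_command": "npm test",
    "deploy_command": None,
}
print(select_commands(row))
print(select_commands(row, fields=("test_command",)))
```

Calling `run_verifier(select_commands(row), cwd=...)` on a checked-out repository would then report whether the chosen commands all succeed. Stopping at the first failing command keeps later steps such as deployment from running on a broken build.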
Poly-MigrationBench-python:
- `repo` (str): The original repo URL without the `https://github.com/` prefix
- `base_commit` (str): Base commit id
  - At this commit the repository is under Python <= 3.12
- `num_files` (int), `num_loc`, `build_command`
- `num_py_files` (int): Number of `*.py` files in the repository at `base_commit`
- `num_loc` (int): Number of lines of code for `*.py` files in the repository at `base_commit`
- `license` (str): The license of the repository, either MIT or Apache-2.0 for the whole dataset
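Because `repo` omits the `https://github.com/` prefix and `base_commit` pins the pre-migration state, a row from any of the three datasets can be turned into clone and checkout commands. The sketch below only builds the command strings; the helper name `checkout_commands`, the sample repository, the commit id, and the destination path are hypothetical.

```python
import shlex

GITHUB_PREFIX = "https://github.com/"


def checkout_commands(repo: str, base_commit: str, dest: str) -> list[str]:
    """Build shell commands that clone `repo` into `dest` and check out `base_commit`."""
    url = GITHUB_PREFIX + repo
    return [
        f"git clone {shlex.quote(url)} {shlex.quote(dest)}",
        # `git -C <dir>` runs the subcommand as if started inside <dir>.
        f"git -C {shlex.quote(dest)} checkout {shlex.quote(base_commit)}",
    ]


# Hypothetical values for illustration only.
for command in checkout_commands("example-org/example-repo", "0123abc", "work/repo"):
    print(command)
```

Quoting each argument with `shlex.quote` keeps the commands safe to pass to a shell even if a path contains spaces.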
@misc{liu2025migrationbenchrepositorylevelcodemigration,
title={MigrationBench: Repository-Level Code Migration Benchmark from Java 8},
author={Linbo Liu and Xinle Liu and Qiang Zhou and Lin Chen and Yihan Liu and Hoan Nguyen and Behrooz Omidvar-Tehrani and Xi Shen and Jun Huan and Omer Tripp and Anoop Deoras},
year={2025},
eprint={2505.09569},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2505.09569},
}