Skip to content

Latest commit

 

History

History

btw2025-incsliceline-p111

Reproducibility Submission BTW 2025, Paper 111

Authors: Frederic Caspar Zoepffel, Christina Dionysio, and Matthias Boehm

Paper Name: Incremental SliceLine for Iterative ML Model Debugging under Updates

Paper Links:

Source Code Info:

Datasets Used:

Hardware and Software Info: The experiments use a single scale-out server node:

  • Scale-out node: single AMD EPYC 7443P CPU at 2.8-4.0 GHz (24 physical/48 virtual cores), 256GB DDR4 RAM at 3.2 GHz balanced across 8 memory channels, 2 x 480 GB SATA SSDs (system), 8 x 2TB SATA HDDs (data).
  • The software stack comprises Ubuntu 20.04.6, Apache Hadoop 3.3, Apache Spark 3.5, OpenJDK 11 (with -Xmx200g -Xms200g), and Apache SystemDS from the main branch (as of Jan 03, 2025).

Setup and Experiments: The repository is pre-populated with the paper's experimental results (./results), individual plots (./plots), and SystemDS source code. The entire experimental evaluation can be run via ./runAll.sh, which deletes the results and plots and performs setup, dataset download and preparation, local experiments, and plotting. However, for a more controlled evaluation, we recommend running the individual steps separately (e.g., via nohup ./run5Experiments.sh & which takes approximately 20h):

./run1SetupDependencies.sh;
./run2SetupSystemDS.sh;
./run3DownloadData.sh;
./run4PrepareLocalData.sh;
./run5Experiments.sh;      # timed measurements
./run6PlotResults.sh;

Last Update: Jan 13, 2025 (initial setup)


[1] Matthias Boehm, Iulian Antonov, Sebastian Baunsgaard, Mark Dokter, Robert Ginthoer, Kevin Innerebner, Florijan Klezin, Stefanie N. Lindstaedt, Arnab Phani, Benjamin Rath, Berthold Reinwald, Shafaq Siddiqui, and Sebastian Benjamin Wrede: SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle, CIDR 2020