The benchmarks evaluate the performance of getML's FastProp algorithm against five other open-source libraries for automated feature engineering on relational data and time series.
Datasets:
air_pollution
The dataset contains hourly data on air pollution and weather in Beijing, China. The challenge is to predict the PM2.5 concentration for the next hour.
S. De Vito, E. Massera, M. Piga, L. Martinotto, and G. Di Francia, “On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario,” Sensors and Actuators B: Chemical, vol. 129, no. 2, pp. 750–757, 2008. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925400507007691
A detailed demonstration of how to work with this dataset can be found in the getML-demo repository.
dodgers
The dataset contains five-minute measurements of traffic near Los Angeles. The traffic volume can be affected by games that the LA Dodgers, a popular Los Angeles baseball team, host in the nearby stadium, but the effect is not strong enough to make such events easy to spot in the data. The challenge is to predict the traffic volume for the next five-minute interval.
A. Ihler, J. Hutchins, and P. Smyth, “Adaptive event detection with time-varying Poisson processes,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, pp. 207–216.
A detailed demonstration of how to work with this dataset can be found in the getML-demo repository.
energy
The dataset contains measurements of the electricity consumption of a single household in ten-minute intervals. The challenge is to predict the energy consumption of all household appliances for the next ten-minute interval.
interstate94
The dataset contains hourly data on the traffic volume on Interstate 94 from Minneapolis to St. Paul. The challenge is to predict the traffic volume for the next hour.
A detailed demonstration of how to work with this dataset can be found in the getML-demo repository.
tetouan
The dataset contains the electricity consumption of three different zones in Tetouan, a city in northern Morocco, measured in ten-minute intervals. The challenge is to predict the electricity consumption in Zone 1 for the next ten-minute interval.
A. Salam and A. El Hibaoui, “Comparison of machine learning algorithms for the power consumption prediction: Case study of Tetouan city,” in 2018 6th International Renewable and Sustainable Energy Conference (IRSEC). IEEE, 2018, pp. 1–5.
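All five datasets pose the same kind of task: one-step-ahead forecasting, where the label for a row is the value of the series at the next time step. The following is a minimal POSIX sh/awk sketch of that labeling, assuming a comma-separated file with a header row, rows sorted by time, and no quoted fields containing commas. The helper name add_next_step_target is hypothetical; it is not part of getML or the benchmark code, which may define targets differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of getML or the benchmarks):
# reads CSV from stdin and appends a "target" column that holds the value
# of column $1 (1-based) from the NEXT row, i.e. a one-step-ahead label.
# Assumes a header row, rows sorted by time, and naive comma splitting.
# The last row has no successor, so it is dropped from the output.
add_next_step_target() {
  awk -F, -v col="$1" 'BEGIN { OFS = "," }
    NR == 1 { print $0, "target"; next }  # extend the header
    NR > 2  { print prev, $col }          # previous row + current value
    { prev = $0 }                         # remember the current row'
}
```

For example, `printf 'time,pm25\n1,10\n2,12\n3,15\n' | add_next_step_target 2` labels the rows at times 1 and 2 with the PM2.5 values observed at times 2 and 3.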
Libraries:
Build the image with the required Python version and libraries installed:
$ docker compose build
Important: Because of the libraries used, these benchmarks only run on the x86_64 architecture.
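Since the x86_64 requirement otherwise only surfaces when something fails, a wrapper script can check the host before building. The following is a minimal POSIX sh sketch; the function names is_supported_arch and build_benchmarks are hypothetical, and treating "amd64" as equivalent to "x86_64" is an assumption (some systems report the same architecture under that name).

```shell
#!/bin/sh
# Hypothetical helper: succeed only for architecture names that denote x86_64.
# `uname -m` prints "x86_64" on 64-bit Intel/AMD Linux; some systems use "amd64".
is_supported_arch() {
  case "$1" in
    x86_64 | amd64) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: refuse to build on unsupported hosts.
build_benchmarks() {
  host_arch="$(uname -m)"
  if ! is_supported_arch "$host_arch"; then
    echo "Unsupported architecture: $host_arch (x86_64 required)" >&2
    return 1
  fi
  docker compose build
}
```

After sourcing the file, calling build_benchmarks runs docker compose build only when uname -m reports a supported value; on other hosts it prints an error and returns a non-zero status.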
Run the benchmarks inside a container based on the built image:
$ docker compose up benchmarks
The benchmark logs are printed to the terminal during the run step. They can be recalled later via:
$ docker compose logs benchmarks
For easier reading and scrolling, use for example:
$ docker compose logs benchmarks 2>&1 | sed 's/\r/\n/g' | less
Remove the container, its images, logs and volumes:
$ docker compose down --volumes --rmi all
Note:
The --rmi option for removing images currently does not work with podman compose. Images have to be removed separately, for example via:
$ podman image prune --all
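The Docker and Podman cleanup paths can be wrapped in a single function. The following is a minimal POSIX sh sketch; the names cleanup_benchmarks and run and the DRY_RUN switch are hypothetical, and it assumes podman compose accepts --volumes for down. Keep in mind that podman image prune --all removes every image not used by a container, not only the images built for the benchmarks.

```shell
#!/bin/sh
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: tear down the benchmark stack with either engine.
# podman compose does not support --rmi, so images are pruned separately.
cleanup_benchmarks() {
  case "${1:-docker}" in
    docker)
      run docker compose down --volumes --rmi all ;;
    podman)
      run podman compose down --volumes &&
        run podman image prune --all ;;
    *)
      echo "unknown engine: $1" >&2
      return 1 ;;
  esac
}
```

Running DRY_RUN=1 cleanup_benchmarks podman prints the two Podman commands without executing them, which is a cheap way to review what would be removed.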