Releases: oneapi-src/oneDAL
Intel® oneAPI Data Analytics Library 2021.3
The release introduces the following changes:
📚 Support Materials
The following additional materials were created:
- Medium blogs:
- Superior Machine Learning Performance on the Latest Intel Xeon Scalable Processors
- Leverage Intel Optimizations in Scikit-Learn (SVM Performance Training and Inference)
- Optimizing CatBoost Performance
- Performance Optimizations for End-to-End AI Pipelines
- Optimizing the End-to-End Training Pipeline on Apache Spark Clusters
- Kaggle kernels:
- [Tabular Playground Series - Apr 2021] RF with Intel Extension for Scikit-learn
- [Tabular Playground Series - Apr 2021] SVM with Intel Extension for Scikit-learn
- [Tabular Playground Series - Apr 2021] SVM with scikit-learn-intelex
- Samples that illustrate the usage of Intel Extension for Scikit-learn
🛠️ Library Engineering
- Introduced a new Python package, Intel® Extension for Scikit-learn*. The scikit-learn-intelex package contains the scikit-learn patching functionality that was originally available in the daal4py package. All future updates for the patches will be available only in Intel® Extension for Scikit-learn. We recommend using the scikit-learn-intelex package instead of daal4py.
- Download the extension using one of the following commands:
pip install scikit-learn-intelex
conda install scikit-learn-intelex -c conda-forge
- Enable Scikit-learn patching:
from sklearnex import patch_sklearn
patch_sklearn()
- Made the DPC++ runtime an optional dependency of daal4py. To enable the DPC++ backend, install the dpcpp_cpp_rt package. This change reduces the default package size with all dependencies from 1.2 GB to 400 MB.
- Added the support of building oneDAL-based applications with /MD and /MDd options on Windows. The -d suffix is used in the names of oneDAL libraries that are built with debug run-time (/MDd).
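As a minimal sketch of how the patching described above fits into a script, the example below calls patch_sklearn() before importing the estimator and falls back to stock scikit-learn when the extension is not installed. The helper name patched_svc_accuracy and the synthetic dataset are illustrative assumptions, not part of the release.

```python
import importlib.util


def patched_svc_accuracy():
    """Train SVC on synthetic data, applying patching when it is available.

    Hypothetical helper for illustration only. Returns None when
    scikit-learn itself is not installed.
    """
    if importlib.util.find_spec("sklearn") is None:
        return None
    if importlib.util.find_spec("sklearnex") is not None:
        from sklearnex import patch_sklearn

        # Patch first, then import the estimators that should be accelerated.
        patch_sklearn()

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    model = SVC(kernel="rbf").fit(X, y)
    return model.score(X, y)


accuracy = patched_svc_accuracy()
print("training accuracy:", accuracy)
```

The estimator code is the same with or without the extension; only the two patching lines differ.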
🚨 What's New
Introduced new oneDAL and daal4py functionality:
- CPU:
- SVM Regression algorithm
- NuSVM algorithm for both Classification and Regression tasks
- Polynomial kernel support for all SVM algorithms (SVC, SVR, NuSVC, NuSVR)
- Minkowski and Chebyshev distances for kNN Brute-force
- The brute-force method and the voting mode support for kNN algorithm in oneDAL interfaces
- Multiclass support for SVM algorithms in oneDAL interfaces
- CSR-matrix support for SVM algorithms in oneDAL interfaces
- Subgraph Isomorphism algorithm technical preview
- Single Source Shortest Path (SSSP) algorithm technical preview
Improved oneDAL and daal4py performance for the following algorithms:
- CPU:
- Support Vector Machines training and prediction
- Linear, Ridge, ElasticNet, and LASSO regressions prediction
- GPU:
- Decision Forest training and prediction
- Principal Components Analysis training
Introduced the support of scikit-learn 1.0 version in Intel Extension for Scikit-learn.
- The 2021.3 release of Intel Extension for Scikit-learn supports the latest scikit-learn releases: 0.22.X, 0.23.X, 0.24.X and 1.0.X.
Introduced new functionality for Intel Extension for Scikit-learn:
- General:
- The support of patch_sklearn for all algorithms
- CPU:
- Acceleration of SVR estimator
- Acceleration of NuSVC and NuSVR estimators
- Polynomial kernel support in SVM algorithms
Improved the performance of the following scikit-learn estimators via scikit-learn patching:
- SVM algorithms training and prediction
- Linear, Ridge, ElasticNet, and Lasso regressions prediction
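The SVR, NuSVR, and polynomial-kernel items above can be exercised with ordinary scikit-learn code. The sketch below is an illustration under assumptions: the function name, the synthetic data, and the chosen hyperparameters are hypothetical, and the code runs on stock scikit-learn when scikit-learn-intelex is absent.

```python
import importlib.util


def fit_poly_svm_regressors():
    """Fit SVR and NuSVR with a polynomial kernel on synthetic data.

    Hypothetical helper for illustration only. Returns a dict mapping
    estimator name to the shape of its predictions, or None when
    scikit-learn or NumPy is not installed.
    """
    for required in ("sklearn", "numpy"):
        if importlib.util.find_spec(required) is None:
            return None
    if importlib.util.find_spec("sklearnex") is not None:
        from sklearnex import patch_sklearn

        patch_sklearn()  # patch before importing the estimators

    import numpy as np
    from sklearn.svm import SVR, NuSVR

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(150, 3))
    y = X[:, 0] ** 2 + 0.5 * X[:, 1]  # simple quadratic target

    models = {
        "SVR": SVR(kernel="poly", degree=2),
        "NuSVR": NuSVR(kernel="poly", degree=2, nu=0.5),
    }
    return {name: m.fit(X, y).predict(X[:5]).shape for name, m in models.items()}


prediction_shapes = fit_poly_svm_regressors()
print(prediction_shapes)
```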
Fixed the following issues:
- General:
- Fixed binary incompatibility for the versions of numpy earlier than 1.19.4
- Fixed an issue with a very large number of trees (> 7000) for Random Forest algorithm.
- Fixed patch_sklearn to patch both fit and predict methods of Logistic Regression when the algorithm is given as a single parameter to patch_sklearn
- CPU:
- Improved numerical stability of training for Alternating Least Squares (ALS) and Linear and Ridge regressions with Normal Equations method
- Reduced the memory consumption of SVM prediction
- GPU:
- Fixed an issue with kernel compilation on the platforms without hardware FP64 support
❗ Known Issues
- Intel® Extension for Scikit-learn and daal4py packages installed from the PyPI repository can’t be found on Debian systems (including Google Colab). Mitigation: add the “site-packages” folder to the Python package search path before importing the packages:
import sys
import os
import site
sys.path.append(os.path.join(os.path.dirname(site.getsitepackages()[0]), "site-packages"))
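A slightly more defensive variant of the mitigation above might look like the sketch below. It computes the same path but appends it only if the folder exists and is not already on sys.path. The function name add_site_packages_to_path is a hypothetical helper, not something the packages provide.

```python
import os
import site
import sys


def add_site_packages_to_path():
    """Append the sibling "site-packages" folder to sys.path if needed.

    Hypothetical helper: it builds the same path as the mitigation in the
    release notes and returns it, whether or not it was appended.
    """
    base = os.path.dirname(site.getsitepackages()[0])
    candidate = os.path.join(base, "site-packages")
    # Skip folders that do not exist and avoid duplicate entries.
    if os.path.isdir(candidate) and candidate not in sys.path:
        sys.path.append(candidate)
    return candidate


candidate_path = add_site_packages_to_path()
print("checked:", candidate_path)
```

Run this before importing sklearnex or daal4py; calling it more than once is harmless.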
Intel® oneAPI Data Analytics Library 2021.2
The release introduces the following changes:
Library Engineering:
- Enabled new PyPI distribution channel for daal4py:
- Four latest Python versions (3.6, 3.7, 3.8, 3.9) are supported on Linux, Windows and macOS.
- Support of both CPU and GPU is included in the package.
- You can download daal4py using the following command:
pip install daal4py
- Introduced CMake support for oneDAL examples
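After installing from PyPI, one way to confirm which daal4py version is present is to query the package metadata. This is a sketch that assumes Python 3.8 or later, where importlib.metadata is part of the standard library; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


daal4py_version = installed_version("daal4py")
print("daal4py:", daal4py_version or "not installed")
```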
Support Materials
The following additional materials were created:
- Medium blogs:
- Kaggle kernels:
What's New
Introduced new oneDAL and daal4py functionality:
- CPU:
- Hist method for Decision Forest Classification and Regression, which outperforms the existing exact method
- Bit-to-bit results reproducibility for: Linear and Ridge regressions, LASSO and ElasticNet, KMeans training and initialization, PCA, SVM, kNN Brute Force method, Decision Forest Classification and Regression
- GPU:
- Multi-node multi-GPU algorithms: KMeans (batch), Covariance (batch and online), Low order moments (batch and online) and PCA
- Sparsity support for SVM algorithm
Improved oneDAL and daal4py performance for the following algorithms:
- CPU:
- Decision Forest training Classification and Regression
- Support Vector Machines training and prediction
- Logistic Regression, Logistic Loss and Cross Entropy for non-homogeneous input types
- GPU:
- Decision Forest training Classification and Regression
- All algorithms with GPU kernels (as a result of migration to Unified Shared Memory data management)
- Reduced performance overhead for oneAPI C++ interfaces on CPU and oneAPI DPC++ interfaces on GPU
Added technical preview features in Graph Analytics:
- CPU:
- Local and Global Triangle Counting
Introduced new functionality for scikit-learn patching through daal4py:
- CPU:
- Patches for four latest scikit-learn releases: 0.21.X, 0.22.X, 0.23.X and 0.24.X
- Acceleration of roc_auc_score function
- Bit-to-bit results reproducibility for: LinearRegression, Ridge, SVC, KMeans, PCA, Lasso, ElasticNet, tSNE, KNeighborsClassifier, KNeighborsRegressor, NearestNeighbors, RandomForestClassifier, RandomForestRegressor
Improved performance of the following scikit-learn estimators via scikit-learn patching:
- CPU:
- RandomForestClassifier and RandomForestRegressor scikit-learn estimators: training and prediction
- Principal Component Analysis (PCA) scikit-learn estimator: training
- Support Vector Classification (SVC) scikit-learn estimators: training and prediction
- Support Vector Classification (SVC) scikit-learn estimator with the probability==True parameter: training and prediction
Fixed the following issues:
- Scikit-learn patching:
- Improved accuracy of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
- Fixed patching issues with pairwise_distances
- Fixed the behavior of the patch_sklearn and unpatch_sklearn functions
- Fixed unexpected behavior that made accelerated functionality unavailable through scikit-learn patching if the input was not of float32 or float64 data types. Scikit-learn patching now works with all numpy data types.
- Fixed a memory leak that appeared when DataFrame from pandas was used as an input type
- Fixed performance issue for interoperability with Modin
- daal4py:
- Fixed the crash of SVM and kNN algorithms on Windows on GPU
- oneDAL:
- Improved accuracy of Decision Forest Classification and Regression on CPU
- Improved accuracy of KMeans algorithm on GPU
- Improved stability of Linear Regression and Logistic Regression algorithms on GPU
Known Issues
- oneDAL vars.sh script does not support KornShell
Intel® oneAPI Data Analytics Library 2021.1
The release contains all functionality of Intel® DAAL. See Intel® DAAL release notes for more details.
What's New
Library Engineering:
- Renamed the library from Intel® Data Analytics Acceleration Library to Intel® oneAPI Data Analytics Library and changed the package names to reflect this.
- Deprecated 32-bit version of the library.
- Introduced Intel GPU support for both OpenCL and Level Zero backends.
- Introduced Unified Shared Memory (USM) support
Introduced new Intel® oneDAL and daal4py functionality:
- GPU:
- Batch algorithms: K-means, Covariance, PCA, Logistic Regression, Linear Regression, Random Forest Classification and Regression, Gradient Boosting Classification and Regression, kNN, SVM, DBSCAN and Low-order moments
- Online algorithms: Covariance, PCA, Linear Regression and Low-order moments
- Added Data Management functionality to support DPC++ APIs: a new table type for representation of SYCL-based numeric tables (SyclNumericTable) and an optimized CSV data source
Improved Intel® oneDAL and daal4py performance for the following algorithms:
- CPU:
- Logistic Regression training and prediction
- k-Nearest Neighbors prediction with Brute Force method
- Logistic Loss and Cross Entropy objective functions
Added Technical Preview Features in Graph Analytics:
- CPU:
- Undirected graph without edge and vertex weights (undirected_adjacency_array_graph), where vertex indices can only be of type int32
- Jaccard Similarity Coefficients for all pairs of vertices, a batch algorithm that processes the graph by blocks
Aligned the library with Intel® oneDAL Specification 1.0 for the following algorithms:
- CPU/GPU: K-means, PCA, kNN
Introduced new functionality for scikit-learn patching through daal4py:
- CPU:
- Acceleration of NearestNeighbors and KNeighborsRegressor scikit-learn estimators with Brute Force and K-D tree methods
- Acceleration of TSNE scikit-learn estimator
- GPU:
- Intel GPU support in scikit-learn for DBSCAN, K-means, Linear and Logistic Regression
Improved performance of the following scikit-learn estimators via scikit-learn patching:
- CPU:
- LogisticRegression fit, predict and predict_proba methods
- KNeighborsClassifier predict, predict_proba and kneighbors methods with “brute” method
Known Issues
- Intel® oneDAL DPC++ APIs do not work on GEN12 graphics with OpenCL backend. Use Level Zero backend for such cases.
- train_test_split in daal4py patches for Scikit-learn can produce incorrect shuffling on Windows*
Intel® DAAL 2020 Update 3
What's New in Intel® DAAL 2020 Update 3:
Introduced new Intel® DAAL and daal4py functionality:
- Brute Force method for k-Nearest Neighbors classification algorithm, which for datasets with more than 13 features demonstrates a better performance than the existing K-D tree method
- k-Nearest Neighbors search for K-D tree and Brute Force methods with computation of distances to nearest neighbors and their indices
Extended existing Intel® DAAL and daal4py functionality:
- Voting methods for prediction in k-Nearest Neighbors classification and search: based on inverse-distance and uniform weighting
- New parameters in Decision Forest classification and regression: minObservationsInSplitNode, minWeightFractionInLeafNode, minImpurityDecreaseInSplitNode, maxLeafNodes with best-first strategy and sample weights
- Support of Support Vector Machine (SVM) decision function for Multi-class Classifier
Improved Intel® DAAL and daal4py performance for the following algorithms:
- SVM training and prediction
- Decision Forest classification training
- RBF and Linear kernel functions
Introduced new daal4py functionality:
- Conversion of trained XGBoost* and LightGBM* models into a daal4py Gradient Boosted Trees model for fast prediction
- Support of Modin* DataFrame as an input
Introduced new functionality for scikit-learn patching through daal4py:
- Acceleration of KNeighborsClassifier scikit-learn estimator with Brute Force and K-D tree methods
- Acceleration of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
- Sparse input support for KMeans and Support Vector Classification (SVC) scikit-learn estimators
- Prediction of probabilities for SVC scikit-learn estimator
- Support of ‘normalize’ parameter for Lasso and ElasticNet scikit-learn estimators
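In scikit-learn terms, the Brute Force and K-D tree methods listed above correspond to the algorithm="brute" and algorithm="kd_tree" settings of KNeighborsClassifier. The sketch below fits both on synthetic data. It is an illustration under assumptions: the helper name and dataset are hypothetical, and patching is applied through the newer sklearnex package (recommended in the 2021.3 notes) only when it is installed.

```python
import importlib.util


def knn_predictions_by_algorithm():
    """Fit KNeighborsClassifier with the brute-force and K-D tree methods.

    Hypothetical helper for illustration only. Returns a dict mapping
    algorithm name to the number of predictions made, or None when
    scikit-learn is not installed.
    """
    if importlib.util.find_spec("sklearn") is None:
        return None
    if importlib.util.find_spec("sklearnex") is not None:
        from sklearnex import patch_sklearn

        patch_sklearn()  # patch before importing the estimator

    from sklearn.datasets import make_classification
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=300, n_features=8, random_state=0)
    counts = {}
    for algorithm in ("brute", "kd_tree"):
        clf = KNeighborsClassifier(n_neighbors=5, algorithm=algorithm)
        clf.fit(X, y)
        counts[algorithm] = len(clf.predict(X[:10]))
    return counts


knn_counts = knn_predictions_by_algorithm()
print(knn_counts)
```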
Improved performance of the following functionality for scikit-learn patching through daal4py:
- train_test_split()
- Support Vector Classification (SVC) fit and prediction
Dependencies
fix one-algorithm build and spicific prediction case after probabilit…
DAAL 2020
DAAL 2019 Update 4
Revision: 33235
Linux* (32-bit and 64-bit binary): l_daal_oss_p_2019.4.007.tgz
macOS* (32-bit and 64-bit binary): m_daal_oss_p_2019.4.007.tgz
Note: To get the sources, clone the repository using a Git client with the Git LFS module enabled. We are working with GitHub support to make the "Source code (zip)" and "Source code (tar.gz)" archives work correctly.