# Intel® Extension for Scikit-learn Random Forest for Yolanda dataset

In [1]:
from time import time
from sklearn import metrics
from sklearn.model_selection import train_test_split

In [2]:
from sklearn.datasets import fetch_openml
x, y = fetch_openml(name='Yolanda', return_X_y=True)

In [3]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=72)
x_train.shape, x_test.shape

((280000, 100), (120000, 100))

Intel Extension for Scikit-learn (previously known as daal4py) contains drop-in replacement functionality for the stock scikit-learn package. You can take advantage of the performance optimizations of Intel Extension for Scikit-learn by adding just two lines of code before the usual scikit-learn imports:

In [4]:
from sklearnex import patch_sklearn
patch_sklearn()

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


Intel(R) Extension for Scikit-learn patching affects the performance of specific scikit-learn functionality. Refer to the [list of supported algorithms and parameters](https://intel.github.io/scikit-learn-intelex/algorithms.html) for details. When unsupported parameters are used, the package falls back to the original scikit-learn implementation. If the patching does not cover your scenarios, [submit an issue on GitHub](https://github.com/intel/scikit-learn-intelex/issues).
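One way to check whether a given call ran the accelerated path or fell back to stock scikit-learn is to turn on the extension's verbose logging. The sketch below assumes that the extension reports its dispatch decisions through a logger named `sklearnex` at INFO level, as described in the project's documentation; the exact wording of the messages may differ between versions.

```python
import logging

# Assumption: the extension logs whether each call ran the accelerated
# version or fell back to stock scikit-learn via the 'sklearnex' logger.
logging.basicConfig()  # attach a default handler so messages are printed
sklearnex_logger = logging.getLogger("sklearnex")
sklearnex_logger.setLevel(logging.INFO)
```

With this in place before training, each patched call should print a line indicating which implementation was used.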

In [5]:
params = {
    'n_estimators': 150,
    'random_state': 44,
    'n_jobs': -1
}

Train a Random Forest model and predict with Intel(R) Extension for Scikit-learn on the Yolanda dataset

In [6]:
start = time()
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(**params).fit(x_train, y_train)
pred = rf.predict(x_test)
f"Intel(R) extension for Scikit-learn time: {(time() - start):.2f} s"

'Intel(R) extension for Scikit-learn time: 47.02 s'

In [7]:
print('Mean Squared Error: {:.4f}'.format(metrics.mean_squared_error(y_test, pred)))

Mean Squared Error: 83.6223
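Note that `metrics.mean_squared_error`, called with its default arguments, returns the mean squared error (MSE); the root mean squared error (RMSE) is its square root. The following is a minimal sketch of that relationship in plain Python. The helper `mse` and the toy values are illustrative and are not part of the notebook.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data (hypothetical): squared errors are 1, 0 and 4.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
toy_mse = mse(y_true, y_pred)   # (1 + 0 + 4) / 3
toy_rmse = math.sqrt(toy_mse)

# Applying the same conversion to the value printed above.
reported_value = 83.6223
reported_rmse = math.sqrt(reported_value)
print(f"toy MSE: {toy_mse:.4f}, toy RMSE: {toy_rmse:.4f}")
print(f"square root of the reported value: {reported_rmse:.2f}")
```

Because the square root is monotonic, comparing two models by MSE or by RMSE gives the same ranking; only the units differ.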


To cancel the optimizations, call *unpatch_sklearn* and re-import the RandomForestRegressor class.

In [8]:
from sklearnex import unpatch_sklearn
unpatch_sklearn()

Train a Random Forest model and predict with the original scikit-learn library on the Yolanda dataset

In [9]:
start = time()
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(**params).fit(x_train, y_train)
pred = rf.predict(x_test)
f"Original Scikit-learn time: {(time() - start):.2f} s"

'Original Scikit-learn time: 193.25 s'
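The timing cells measure wall-clock time with `time()` and include the import statement in the measured region. As a hedged alternative, the sketch below wraps a call in a small helper built on `time.perf_counter`, a monotonic clock intended for measuring intervals. The helper name `timed` and the stand-in workload are illustrative, not part of the notebook.

```python
from time import perf_counter

def timed(func, *args, **kwargs):
    """Run func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = perf_counter()
    result = func(*args, **kwargs)
    return result, perf_counter() - start

# Stand-in workload; in the notebook this would be the fit/predict step.
total, elapsed = timed(sum, range(1_000_000))
print(f"elapsed: {elapsed:.4f} s")
```

Timing the patched and unpatched runs through the same helper keeps the measured region identical in both cases.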

In [10]:
print('Mean Squared Error: {:.4f}'.format(metrics.mean_squared_error(y_test, pred)))

Mean Squared Error: 83.8013


With scikit-learn-intelex patching you can:

- Use your scikit-learn code for training and prediction with minimal changes (a couple of lines of code);
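- Speed up training and prediction of scikit-learn models;
- Get comparable quality (the error values of the two runs above differ by well under 1%);
- Get a speedup of more than **4** times (47.02 s versus 193.25 s in this run).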
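The speedup and quality claims can be checked directly from the numbers printed by the cells above. The following sketch only restates that arithmetic; the variable names are illustrative.

```python
# Values copied from the notebook outputs above.
patched_time_s = 47.02    # with Intel(R) Extension for Scikit-learn
stock_time_s = 193.25     # with original scikit-learn
patched_error = 83.6223   # error value printed by cell 7
stock_error = 83.8013     # error value printed by cell 10

speedup = stock_time_s / patched_time_s
rel_error_diff = abs(stock_error - patched_error) / stock_error

print(f"speedup: {speedup:.2f}x")
print(f"relative difference in error: {rel_error_diff:.2%}")
```

These figures come from a single run on one machine, so the exact ratio will likely vary with hardware, library versions, and the number of available cores.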