sklearn StandardScaler vs dask StandardScaler. #979

@Arunes007

I am getting different results from sklearn StandardScaler and dask StandardScaler.

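import sklearn.preprocessing
import dask_ml.preprocessing

# df_pd and df_dask are assumed to be a pandas and a Dask DataFrame over the same data
scaler_sk = sklearn.preprocessing.StandardScaler()
scaler_d = dask_ml.preprocessing.StandardScaler()

scaler_sk.fit(df_pd[["SUMMESSAGECOUNT"]])
scaler_d.fit(df_dask[["SUMMESSAGECOUNT"]])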

Dask scaler

scaler_d.mean_[0], scaler_d.var_[0]
output: (19.157653421114507, 47431.17794342375)

Sklearn Scaler

scaler_sk.mean_[0], scaler_sk.var_[0]
output: (19.157653421114507, 47431.17794342373)

I know the difference is negligible (the means match exactly and the variances differ only in the last printed digit), but it is influencing my model training with Prophet. Could you please suggest a way to make the two results identical without calling compute()?
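For anyone reading along, here is a minimal sketch of how small the gap is, using only the two variances pasted above. One plausible explanation, which is an assumption and not something confirmed in this thread, is that the two libraries accumulate the sums in a different order (for example, per partition and then combined). Floating-point addition is not associative, so a different grouping can change the last bits. The variable names below are made up for illustration.

```python
import math

# Values copied from the outputs reported above.
var_dask = 47431.17794342375
var_sklearn = 47431.17794342373

# Floating-point addition is not associative: grouping the same terms
# differently can change the last bits of the result.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)
print(left_to_right == regrouped)  # False

# The two variances are not bit-identical ...
print(var_dask == var_sklearn)  # False

# ... but they agree within a very tight relative tolerance.
rel_diff = abs(var_dask - var_sklearn) / var_sklearn
print(rel_diff)
print(math.isclose(var_dask, var_sklearn, rel_tol=1e-12))  # True
```

If the grouping assumption holds, comparing with a tolerance (as `math.isclose` does here) is more robust than expecting bit-for-bit equality between the two scalers.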
