Release Version 1.6.0 · databricks/koalas

Improved Plotly backend support

We improved plotting support by implementing pie, histogram, and box plots with the Plotly plot backend. Koalas can now plot data with Plotly via:

DataFrame.plot.pie and Series.plot.pie (#1971)
DataFrame.plot.hist and Series.plot.hist (#1999)
Series.plot.box (#2007)

In addition, we optimized the histogram calculation to run as a single pass over the DataFrame (#1997), instead of launching a separate job for each Series in the DataFrame.
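The single-pass idea can be sketched in plain Python. This is only an illustration of the counting strategy, not the Koalas or Spark implementation: the helper name `histogram_counts`, the row-tuple layout, and the precomputed bin edges are all assumptions made for the example.

```python
from bisect import bisect_right


def histogram_counts(rows, n_columns, edges):
    """Bin every column while iterating over the rows exactly once.

    Hypothetical helper for illustration; not part of Koalas.
    """
    n_bins = len(edges) - 1
    counts = [[0] * n_bins for _ in range(n_columns)]
    for row in rows:  # one pass over the data, regardless of column count
        for col, value in enumerate(row):
            if value is None or not edges[0] <= value <= edges[-1]:
                continue  # skip missing or out-of-range values
            # Bins are left-closed; the last bin also includes its right edge.
            bin_index = min(bisect_right(edges, value) - 1, n_bins - 1)
            counts[col][bin_index] += 1
    return counts


rows = [(1, 10), (2, 20), (3, 30), (4, 40)]
edges = [0, 10, 20, 30, 40]  # shared bin edges, assumed to be known up front
counts = histogram_counts(rows, n_columns=2, edges=edges)
print(counts)
```

The per-column alternative would scan the rows once for each column, which in a distributed setting corresponds to one job per Series.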

Operations between Series and Index

The operations between Series and Index are now supported as below (#1996):

>>> kser = ks.Series([1, 2, 3, 4, 5, 6, 7])
>>> kidx = ks.Index([0, 1, 2, 3, 4, 5, 6])

>>> (kser + 1 + 10 * kidx).sort_index()
0     2
1    13
2    24
3    35
4    46
5    57
6    68
dtype: int64
>>> (kidx + 1 + 10 * kser).sort_index()
0    11
1    22
2    33
3    44
4    55
5    66
6    77
dtype: int64

Support setting a `Series` via attribute access

We added support for setting a column via attribute assignment in DataFrame (#1989).

>>> kdf = ks.DataFrame({'A': [1, 2, 3, None]})
>>> kdf.A = kdf.A.fillna(kdf.A.median())
>>> kdf
     A
0  1.0
1  2.0
2  3.0
3  2.0
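As a rough mental model of how attribute assignment can be routed to a column, the toy class below intercepts `__setattr__` and updates an existing column instead of creating a plain instance attribute. `TinyFrame` is a hypothetical stand-in written for this note; it is not how Koalas implements the feature.

```python
class TinyFrame:
    """Toy stand-in for a DataFrame; hypothetical, for illustration only."""

    def __init__(self, data):
        # Bypass our own __setattr__ while setting up internal state.
        object.__setattr__(self, "_columns", dict(data))

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        columns = object.__getattribute__(self, "_columns")
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in self._columns:
            # Existing column: replace its data.
            self._columns[name] = list(value)
        else:
            # Anything else stays a regular Python attribute.
            object.__setattr__(self, name, value)


frame = TinyFrame({"A": [1.0, 2.0, 3.0, None]})
# Fill the missing value with 2.0, the median of the non-missing values.
frame.A = [2.0 if v is None else v for v in frame.A]
print(frame.A)
```

Assigning to a name that is not an existing column falls through to `object.__setattr__`, so the sketch only ever updates columns that are already present.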

Other new features, improvements and bug fixes

We added the following new features:

Series:

factorize (#1972)
sem (#1993)

DataFrame:

insert (#1983)
sem (#1993)
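For readers unfamiliar with `sem`, it computes the standard error of the mean. The sketch below shows the usual definition in plain Python, assuming missing values are skipped and `ddof=1` by default as in pandas. The function here is illustrative and is not the Koalas code.

```python
import math


def sem(values, ddof=1):
    """Standard error of the mean: sqrt(variance / n), skipping None."""
    data = [v for v in values if v is not None]
    n = len(data)
    if n - ddof <= 0:
        return math.nan  # not enough observations for this ddof
    mean = sum(data) / n
    variance = sum((v - mean) ** 2 for v in data) / (n - ddof)
    return math.sqrt(variance / n)


result = sem([1.0, 2.0, 3.0, None])
print(result)
```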

In addition, we added new parameters:

Add min_count parameter for Frame.sum. (#1978)
Add ddof parameter for GroupBy.std() and GroupBy.var() (#1994)
Support ddof parameter for std and var. (#1986)
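The semantics of these parameters can be illustrated with the standard library alone. This sketch assumes they behave as their pandas counterparts do: `ddof` is subtracted from the number of observations in the variance divisor, and `min_count` is the minimum number of non-missing values required before `sum` returns a number. The helper `sum_with_min_count` is hypothetical.

```python
import math
import statistics


def sum_with_min_count(values, min_count=0):
    """Sum the non-missing values, or return NaN if there are too few."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_count:
        return math.nan
    return sum(valid)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
population_std = statistics.pstdev(data)  # divisor n, i.e. ddof=0
sample_std = statistics.stdev(data)       # divisor n - 1, i.e. ddof=1

print(population_std, sample_std)
print(sum_with_min_count([1, None], min_count=1))
print(sum_with_min_count([1, None], min_count=2))
```

With `ddof=1` the divisor is smaller, so the sample standard deviation is always at least as large as the population one for the same data.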

Along with the following fixes:

Fix stat functions with no numeric columns. (#1967)
Fix DataFrame.replace with NaN/None values (#1962)
Fix cumsum and cumprod. (#1982)
Use Python type name instead of Spark's in error messages. (#1985)
Use object.__setattr__ in Series. (#1991)
Adjust Series.mode to match pandas Series.mode (#1995)
Adjust data when all the values in a column are nulls. (#2004)
Fix as_spark_type to not support "bigint". (#2011)
