Version 1.6.0

@HyukjinKwon HyukjinKwon released this 22 Jan 09:40
· 111 commits to master since this release

Improved Plotly backend support

We improved plotting support by implementing pie, histogram and box plots for the Plotly plotting backend. Koalas can now plot data with Plotly via:

  • DataFrame.plot.pie and Series.plot.pie (#1971)

  • DataFrame.plot.hist and Series.plot.hist (#1999)

  • Series.plot.box (#2007)

In addition, we optimized the histogram calculation for DataFrame so that it runs in a single pass (#1997) instead of launching a separate job for each Series in the DataFrame.
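
The single-pass idea can be illustrated with a small pure-Python sketch. This is not the Koalas implementation (which runs on Spark); the function name `histograms_single_pass`, the row layout, and the shared bin edges are assumptions made for illustration only. It shows how bin counts for every column can be accumulated during one scan of the rows rather than one scan per column.

```python
from bisect import bisect_right


def histograms_single_pass(rows, columns, edges):
    """Count values per bin for every column in one pass over the rows.

    rows:    iterable of dicts mapping column name -> number (or None)
    columns: names of the columns to histogram
    edges:   sorted bin edges shared by all columns; bins are
             [edges[i], edges[i + 1]), with the last bin closed on the right
    """
    n_bins = len(edges) - 1
    counts = {col: [0] * n_bins for col in columns}
    for row in rows:  # the data is scanned exactly once
        for col in columns:
            value = row.get(col)
            if value is None or value < edges[0] or value > edges[-1]:
                continue  # skip missing and out-of-range values
            # bisect_right locates the bin; clamp the right edge into the last bin
            idx = min(bisect_right(edges, value) - 1, n_bins - 1)
            counts[col][idx] += 1
    return counts


# Hypothetical sample data with two columns.
rows = [
    {"a": 1, "b": 5},
    {"a": 2, "b": 5},
    {"a": 4, "b": 1},
    {"a": 6, "b": None},
]
result = histograms_single_pass(rows, ["a", "b"], edges=[0, 2, 4, 6])
print(result)
```

The per-column alternative would repeat the outer loop once for each column, which in a distributed setting corresponds to one job per Series.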

Operations between Series and Index

Operations between Series and Index are now supported, as shown below (#1996):

>>> kser = ks.Series([1, 2, 3, 4, 5, 6, 7])
>>> kidx = ks.Index([0, 1, 2, 3, 4, 5, 6])

>>> (kser + 1 + 10 * kidx).sort_index()
0     2
1    13
2    24
3    35
4    46
5    57
6    68
dtype: int64
>>> (kidx + 1 + 10 * kser).sort_index()
0    11
1    22
2    33
3    44
4    55
5    66
6    77
dtype: int64

Support setting a Series via attribute access

We have added support for setting a column via attribute assignment in DataFrame (#1989).

>>> kdf = ks.DataFrame({'A': [1, 2, 3, None]})
>>> kdf.A = kdf.A.fillna(kdf.A.median())
>>> kdf
     A
0  1.0
1  2.0
2  3.0
3  2.0
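
For readers curious how attribute assignment can be routed to a column at all, the following toy class sketches the general Python mechanism. It is not Koalas code: `TinyFrame` and its `_columns` dict are hypothetical names, and the actual change in #1989 may be structured differently. The sketch only shows `__setattr__` updating an existing column while still allowing ordinary attributes.

```python
class TinyFrame:
    """Toy column container; a hypothetical stand-in, not the Koalas DataFrame."""

    def __init__(self, data):
        # Bypass our own __setattr__ so the internal dict is a real attribute.
        object.__setattr__(self, "_columns", dict(data))

    def __getattr__(self, name):
        # Called only when normal lookup fails: fall back to column lookup.
        columns = object.__getattribute__(self, "_columns")
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Assigning to an existing column name updates that column;
        # any other name becomes an ordinary instance attribute.
        if name in self._columns:
            self._columns[name] = value
        else:
            object.__setattr__(self, name, value)


frame = TinyFrame({"A": [1.0, 2.0, 3.0, None]})
# Replace the missing value with 2.0 (the median of 1.0, 2.0, 3.0).
frame.A = [2.0 if v is None else v for v in frame.A]
frame.note = "not a column"  # stored as a plain attribute
print(frame.A)
```

In this sketch only names that already exist as columns are treated as column assignments, so a typo in an attribute name does not silently create a new column.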

Other new features, improvements and bug fixes

We added the following new features:

Series:

DataFrame

In addition, we implemented the following new parameters:

  • Add min_count parameter for Frame.sum. (#1978)
  • Added ddof parameter for GroupBy.std() and GroupBy.var() (#1994)
  • Support ddof parameter for std and var. (#1986)
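
As a reminder of what these parameters control, here is a small pure-Python sketch of the pandas semantics that Koalas aims to follow. The helper names `variance` and `total` are hypothetical and are not Koalas functions: `ddof` sets the divisor to `n - ddof` over the non-null values, and `min_count` is the number of non-null values required before a sum is returned instead of NaN.

```python
import math


def variance(values, ddof=1):
    """Variance of the non-null values, using the divisor (n - ddof)."""
    data = [v for v in values if v is not None]
    n = len(data)
    if n - ddof <= 0:
        return math.nan  # not enough values for the requested ddof
    mean = sum(data) / n
    return sum((v - mean) ** 2 for v in data) / (n - ddof)


def total(values, min_count=0):
    """Sum of the non-null values, or NaN if fewer than min_count are present."""
    data = [v for v in values if v is not None]
    if len(data) < min_count:
        return math.nan
    return sum(data)


values = [1.0, 2.0, 3.0, None]
sample_var = variance(values, ddof=1)      # squared deviations sum to 2.0; 2.0 / 2
population_var = variance(values, ddof=0)  # same sum divided by 3
enough = total(values, min_count=3)        # three non-null values are present
too_few = total(values, min_count=4)       # only three non-null values, so NaN
print(sample_var, population_var, enough, too_few)
```

The standard deviation follows as the square root of the variance for the same `ddof`.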

Along with the following fixes:

  • Fix stat functions with no numeric columns. (#1967)
  • Fix DataFrame.replace with NaN/None values (#1962)
  • Fix cumsum and cumprod. (#1982)
  • Use Python type name instead of Spark's in error messages. (#1985)
  • Use object.__setattr__ in Series. (#1991)
  • Adjust Series.mode to match pandas Series.mode (#1995)
  • Adjust data when all the values in a column are nulls. (#2004)
  • Fix as_spark_type to not support "bigint". (#2011)