Version 1.6.0
Improved Plotly backend support
We improved plotting support by implementing pie, histogram and box plots with the Plotly plot backend. Koalas can now plot data with Plotly via:
- DataFrame.plot.pie and Series.plot.pie (#1971)
- DataFrame.plot.hist and Series.plot.hist (#1999)
- Series.plot.box (#2007)
In addition, we optimized the histogram calculation to run as a single pass over a DataFrame (#1997) instead of launching a separate job for each Series in the DataFrame.
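The single-pass idea can be illustrated with a small plain-Python sketch. This is not the Spark-based implementation from #1997; the function name `histograms_single_pass`, the row layout and the bin edges are all hypothetical and chosen only to show that every column can be binned during one scan of the rows.

```python
from bisect import bisect_right


def histograms_single_pass(rows, columns, edges):
    """Count the values of several columns into shared bins in one scan of rows."""
    n_bins = len(edges) - 1
    counts = {col: [0] * n_bins for col in columns}
    for row in rows:  # a single pass over the data, handling all columns at once
        for col in columns:
            value = row.get(col)
            if value is None or value < edges[0] or value > edges[-1]:
                continue  # skip missing or out-of-range values
            # Bins are half-open [left, right); the last bin also includes its right edge.
            idx = min(bisect_right(edges, value) - 1, n_bins - 1)
            counts[col][idx] += 1
    return counts


rows = [{"a": 1, "b": 5}, {"a": 2, "b": 5}, {"a": 9, "b": 0}, {"a": 10, "b": None}]
edges = [0, 5, 10]
result = histograms_single_pass(rows, ["a", "b"], edges)
print(result)  # {'a': [2, 2], 'b': [1, 2]}
```

Computing each column separately would scan `rows` once per column; here the cost of the scan is paid once regardless of how many columns are plotted.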
Operations between Series and Index
Operations between Series and Index are now supported, as shown below (#1996):
>>> import databricks.koalas as ks
>>> kser = ks.Series([1, 2, 3, 4, 5, 6, 7])
>>> kidx = ks.Index([0, 1, 2, 3, 4, 5, 6])
>>> (kser + 1 + 10 * kidx).sort_index()
0     2
1    13
2    24
3    35
4    46
5    57
6    68
dtype: int64
>>> (kidx + 1 + 10 * kser).sort_index()
0    11
1    22
2    33
3    44
4    55
5    66
6    77
dtype: int64
Support setting a Series via attribute access
We have added support for setting a column via attribute assignment in DataFrame (#1989).
>>> kdf = ks.DataFrame({'A': [1, 2, 3, None]})
>>> kdf.A = kdf.A.fillna(kdf.A.median())
>>> kdf
     A
0  1.0
1  2.0
2  3.0
3  2.0
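As a rough illustration of how attribute assignment can be routed to a column, here is a toy sketch. `ToyFrame` is a hypothetical stand-in, not the Koalas class, and it assumes the pandas-style rule that only an existing column name is treated as a column; whether Koalas follows that rule in every detail is an assumption here.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not the Koalas implementation."""

    def __init__(self, data):
        # Bypass our own __setattr__ while setting up internal state.
        object.__setattr__(self, "_columns", dict(data))

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        columns = object.__getattribute__(self, "_columns")
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in self._columns:
            # Existing column: replace its data.
            self._columns[name] = list(value)
        else:
            # Anything else becomes a plain Python attribute.
            object.__setattr__(self, name, value)


frame = ToyFrame({"A": [1.0, 2.0, 3.0, None]})
frame.A = [2.0 if v is None else v for v in frame.A]  # fill the gap with the median, 2.0
print(frame.A)  # [1.0, 2.0, 3.0, 2.0]

frame.note = "not a column"
print("note" in frame._columns)  # False
```

The use of `object.__setattr__` for internal state avoids recursing into the custom `__setattr__`.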
Other new features, improvements and bug fixes
We added the following new features:
Series:
DataFrame:
In addition, we implemented the following new parameters:
- Add min_count parameter for Frame.sum. (#1978)
- Add ddof parameter for GroupBy.std() and GroupBy.var(). (#1994)
- Support ddof parameter for std and var. (#1986)
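For readers unfamiliar with these parameters, the sketch below shows what `ddof` and `min_count` mean, following the pandas definitions: the variance divisor is `N - ddof`, and a sum over fewer than `min_count` valid values yields NaN. The helper names `variance` and `total` are hypothetical, written in plain Python, and the defaults shown are the pandas ones; that Koalas uses the same defaults is an assumption.

```python
import math


def variance(values, ddof=1):
    """Variance of the non-missing values, using N - ddof as the divisor."""
    valid = [v for v in values if v is not None]
    n = len(valid)
    if n - ddof <= 0:
        return math.nan  # not enough values for the requested ddof
    mean = sum(valid) / n
    return sum((v - mean) ** 2 for v in valid) / (n - ddof)


def total(values, min_count=0):
    """Sum of the non-missing values, or NaN if fewer than min_count are present."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_count:
        return math.nan
    return sum(valid)


data = [1.0, 2.0, 3.0, 4.0]
print(variance(data, ddof=1))             # sample variance, divisor N - 1
print(variance(data, ddof=0))             # population variance: 1.25
print(math.sqrt(variance(data, ddof=0)))  # std is the square root of var
print(total([None, None]))                # 0
print(total([None, None], min_count=1))   # nan
```

With `min_count=1`, an all-null column sums to NaN instead of 0, which makes "no data" distinguishable from "data that sums to zero".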
Along with the following fixes:
- Fix stat functions with no numeric columns. (#1967)
- Fix DataFrame.replace with NaN/None values (#1962)
- Fix cumsum and cumprod. (#1982)
- Use Python type name instead of Spark's in error messages. (#1985)
- Use object.__setattr__ in Series. (#1991)
- Adjust Series.mode to match pandas Series.mode (#1995)
- Adjust data when all the values in a column are nulls. (#2004)
- Fix as_spark_type to not support "bigint". (#2011)