Version 1.3.0
pandas 1.1 support
We verified the behaviors of pandas 1.1 in Koalas. Koalas now supports pandas 1.1 officially (#1688, #1822, #1829).
Support for non-string names
Non-string names are now supported (#1784). Previously, names in Koalas, e.g., df.columns, df.columns.names, and df.index.names, had to be a string or a tuple of strings; they can now be of any other data type that Spark supports.
Before:
>>> kdf = ks.DataFrame([[1, 'x'], [2, 'y'], [3, 'z']])
>>> kdf.columns
Index(['0', '1'], dtype='object')
After:
>>> kdf = ks.DataFrame([[1, 'x'], [2, 'y'], [3, 'z']])
>>> kdf.columns
Int64Index([0, 1], dtype='int64')
Improve distributed-sequence default index
Creating a distributed-sequence default index is now faster because it avoids the interaction between Python and the JVM (#1699).
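As a rough illustration of what a globally sequential index over partitioned data involves, the sketch below numbers rows across partitions using per-partition counts and offsets. It is a plain-Python conceptual model only: the function name distributed_sequence_index and the list-of-lists partition layout are assumptions made for illustration, and it is not the Koalas implementation nor a description of the change in #1699.

```python
from itertools import accumulate


def distributed_sequence_index(partitions):
    """Attach a global 0..N-1 index to rows that are split across partitions."""
    # Pass 1: each partition reports only its row count.
    counts = [len(part) for part in partitions]
    # Exclusive prefix sum: the number at which each partition starts counting.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: each partition numbers its own rows independently from its offset.
    return [
        [(offset + i, row) for i, row in enumerate(part)]
        for part, offset in zip(partitions, offsets)
    ]


# Hypothetical data: four partitions, one of them empty.
partitions = [["a", "b"], ["c"], [], ["d", "e", "f"]]
indexed = distributed_sequence_index(partitions)
```

In Koalas itself, the default index type is selected with the compute.default_index_type option, for example ks.set_option('compute.default_index_type', 'distributed-sequence').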
Standardize binary operations between int and str columns
Make the behaviors of binary operations (+, -, *, /, //, %) between int and str columns consistent with the respective pandas behaviors (#1828).
It standardizes binary operations as follows:
- +: raise TypeError between an int column and a str column (or string literal)
- *: act as Spark SQL repeat between an int column (or int literal) and a str column; raise TypeError if a string literal is involved
- -, /, //, % (modulo): raise TypeError if a str column (or string literal) is involved
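The rules above can be restated as a small lookup function. This is only a plain-Python model of the listed rules, not Koalas code: the Operand type, the four operand constants, and expected_outcome are hypothetical names introduced here, and any combination the rules do not mention is reported as "not covered" rather than guessed.

```python
from typing import NamedTuple


class Operand(NamedTuple):
    kind: str   # "column" or "literal"
    dtype: str  # "int" or "str"


INT_COL = Operand("column", "int")
STR_COL = Operand("column", "str")
INT_LIT = Operand("literal", "int")
STR_LIT = Operand("literal", "str")


def expected_outcome(op: str, left: Operand, right: Operand) -> str:
    """Return "TypeError", "repeat", or "not covered" per the listed rules."""
    operands = (left, right)
    has_str = any(o.dtype == "str" for o in operands)
    has_int = any(o.dtype == "int" for o in operands)
    if op in ("-", "/", "//", "%"):
        # Any str operand (column or literal) is rejected.
        return "TypeError" if has_str else "not covered"
    if op == "+":
        # An int column combined with a str column or string literal is rejected.
        return "TypeError" if INT_COL in operands and has_str else "not covered"
    if op == "*":
        # A string literal is rejected outright.
        if STR_LIT in operands:
            return "TypeError"
        # A str column times an int column or int literal repeats the string.
        return "repeat" if STR_COL in operands and has_int else "not covered"
    raise ValueError(f"unsupported operator: {op!r}")


# Example lookups against the rules.
plus_mixed = expected_outcome("+", INT_COL, STR_COL)
times_mixed = expected_outcome("*", INT_LIT, STR_COL)
```

Keeping "not covered" as an explicit result makes it clear where this model stops; combinations such as str column + str column are outside what the list states.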
Other new features and improvements
We added the following new features:
DataFrame:
Series:
- reindex (#1737)
- explode (#1777)
- pad (#1786)
- argmin (#1790)
- argmax (#1790)
- argsort (#1793)
- backfill (#1798)
Index:
- inferred_type (#1745)
- item (#1744)
- is_unique (#1766)
- asi8 (#1764)
- is_type_compatible (#1765)
- view (#1788)
- insert (#1804)
MultiIndex:
- inferred_type (#1745)
- item (#1744)
- is_unique (#1766)
- asi8 (#1764)
- is_type_compatible (#1765)
- from_frame (#1762)
- view (#1788)
- insert (#1804)
GroupBy:
- get_group (#1783)
Other improvements
- Fix DataFrame.mad to work properly (#1749)
- Fix Series name after binary operations (#1753)
- Fix GroupBy.cum~ for matching with pandas' behavior (#1708)
- Fix cumprod to work properly with Integer columns (#1750)
- Fix DataFrame.join for MultiIndex (#1771)
- Handle exceptions properly in from_frame (#1791)
- Fix iloc for slice(None, 0) (#1767)
- Fix Series.__repr__ when Series.name is None (#1796)
- DataFrame.reindex supports koalas Index parameter (#1741)
- Fix Series.fillna with inplace=True on non-nullable column (#1809)
- Input check in various APIs (#1808, #1810, #1811, #1812, #1813, #1814, #1816, #1824)
- Fix to_list to work properly in pandas==0.23 (#1823)
- Fix Series.astype to work properly (#1818)
- Frame.groupby supports dropna (#1815)