Release Version 1.3.0 · databricks/koalas

pandas 1.1 support

We verified Koalas' behavior against pandas 1.1, and Koalas now officially supports pandas 1.1 (#1688, #1822, #1829).

Support for non-string names

Koalas now supports non-string names (#1784). Previously, names in Koalas, e.g., df.columns, df.columns.names, and df.index.names, had to be a string or a tuple of strings. They can now be of any other data type that Spark supports.

Before:

>>> kdf = ks.DataFrame([[1, 'x'], [2, 'y'], [3, 'z']])
>>> kdf.columns
Index(['0', '1'], dtype='object')

After:

>>> kdf = ks.DataFrame([[1, 'x'], [2, 'y'], [3, 'z']])
>>> kdf.columns
Int64Index([0, 1], dtype='int64')

Improve `distributed-sequence` default index

Creating a `distributed-sequence` default index is now faster because it avoids the interaction between Python and the JVM (#1699).
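For readers unfamiliar with the idea, a globally sequential index (0, 1, 2, ...) over distributed data can be built in two passes: count the rows in each partition, then number each partition's rows starting from the sum of the counts of the partitions before it. Spark's RDD.zipWithIndex works this way. The pure-Python sketch below, where plain lists stand in for partitions and `distributed_sequence` is a hypothetical name, illustrates only that general idea; it is not Koalas' implementation and does not model the Python/JVM interaction that #1699 removes.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def distributed_sequence(partitions: Sequence[Sequence[str]]) -> List[List[Tuple[int, str]]]:
    """Hypothetical sketch: attach a gap-free 0..n-1 index across all partitions."""
    # Pass 1: count the rows in every partition (in Spark, each count runs where the data lives).
    counts = [len(part) for part in partitions]
    # Prefix sums give each partition's starting offset: [0, c0, c0 + c1, ...].
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: number rows locally and shift them by the partition's offset.
    return [
        [(offset + i, row) for i, row in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]
```

For example, `distributed_sequence([["a", "b"], [], ["c", "d", "e"]])` returns `[[(0, "a"), (1, "b")], [], [(2, "c"), (3, "d"), (4, "e")]]`: empty partitions simply contribute nothing to the offsets.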

Standardize binary operations between int and str columns

Binary operations (+, -, *, /, //, %) between int and str columns now behave consistently with pandas (#1828).

The operations are standardized as follows:

+: raises TypeError between an int column and a str column (or string literal)
*: acts as Spark SQL repeat between an int column (or int literal) and a str column; raises TypeError if a string literal is involved
-, /, //, % (modulo): raise TypeError if a str column (or string literal) is involved
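The rules above can be summarized as a small decision function. This is a hypothetical pure-Python sketch (`Operand`, `resolve_int_str_op`, and `repeat` are illustrative names, not Koalas APIs) that covers only operations with exactly one int operand and one str operand; it does not call Koalas or Spark.

```python
from typing import List, NamedTuple


class Operand(NamedTuple):
    """Hypothetical description of one side of a binary operation."""
    dtype: str        # "int" or "str"
    is_literal: bool  # True for a Python literal, False for a column


# Operators that always raise TypeError when an int and a str operand meet.
REJECTED_OPS = {"+", "-", "/", "//", "%"}


def resolve_int_str_op(op: str, left: Operand, right: Operand) -> str:
    """Return how a mixed int/str operation is evaluated, or raise TypeError."""
    if {left.dtype, right.dtype} != {"int", "str"}:
        raise ValueError("this sketch only models one int and one str operand")
    str_side = left if left.dtype == "str" else right
    if op == "*":
        # A string literal is rejected; a str column is repeated like Spark SQL repeat().
        if str_side.is_literal:
            raise TypeError("cannot multiply by a string literal")
        return "repeat"
    if op in REJECTED_OPS:
        raise TypeError(f"unsupported operand types for {op}: int and str")
    raise ValueError(f"unknown operator {op!r}")


def repeat(strings: List[str], counts: List[int]) -> List[str]:
    """Element-wise string repetition, standing in for str column * int column."""
    return [s * n for s, n in zip(strings, counts)]
```

For example, `resolve_int_str_op("*", Operand("int", False), Operand("str", False))` returns `"repeat"`, and `repeat(["a", "bc"], [3, 2])` gives `["aaa", "bcbc"]`, while `"+"` with any int/str mix raises TypeError.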

Other new features and improvements

We added the following new features:

DataFrame:

product (#1739)
from_dict (#1778)
pad (#1786)
backfill (#1798)

Series:

reindex (#1737)
explode (#1777)
pad (#1786)
argmin (#1790)
argmax (#1790)
argsort (#1793)
backfill (#1798)

Index:

inferred_type (#1745)
item (#1744)
is_unique (#1766)
asi8 (#1764)
is_type_compatible (#1765)
view (#1788)
insert (#1804)

MultiIndex:

inferred_type (#1745)
item (#1744)
is_unique (#1766)
asi8 (#1764)
is_type_compatible (#1765)
from_frame (#1762)
view (#1788)
insert (#1804)

GroupBy:

get_group (#1783)
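In pandas, pad and backfill are synonyms for fillna with method='ffill' and method='bfill'. Assuming the new Koalas pad and backfill follow the same semantics, the pure-Python sketch below illustrates them on a plain list, with None standing in for a missing value. The functions are hypothetical stand-ins, not Koalas code, and ignore options such as limit.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def pad(values: List[Optional[T]]) -> List[Optional[T]]:
    """Forward-fill: replace each None with the last non-None value before it."""
    filled: List[Optional[T]] = []
    last: Optional[T] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last if v is None else v)
    return filled


def backfill(values: List[Optional[T]]) -> List[Optional[T]]:
    """Backward-fill: replace each None with the next non-None value after it."""
    # Backward fill is a forward fill over the reversed list, reversed back.
    return pad(values[::-1])[::-1]
```

For example, `pad([None, 1, None, None, 3, None])` gives `[None, 1, 1, 1, 3, 3]`, and `backfill` on the same list gives `[1, 1, 3, 3, 3, None]`; leading or trailing gaps stay missing when there is nothing to fill from.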

Other improvements

Fix DataFrame.mad to work properly (#1749)
Fix Series name after binary operations (#1753)
Fix GroupBy.cum~ to match pandas' behavior (#1708)
Fix cumprod to work properly with integer columns (#1750)
Fix DataFrame.join for MultiIndex (#1771)
Handle exceptions properly in from_frame (#1791)
Fix iloc for slice(None, 0) (#1767)
Fix Series.__repr__ when Series.name is None (#1796)
DataFrame.reindex supports a Koalas Index as a parameter (#1741)
Fix Series.fillna with inplace=True on a non-nullable column (#1809)
Add input checks to various APIs (#1808, #1810, #1811, #1812, #1813, #1814, #1816, #1824)
Fix to_list to work properly with pandas 0.23 (#1823)
Fix Series.astype to work properly (#1818)
Frame.groupby supports dropna (#1815)
