pwwang/datar

datar

A Grammar of Data Manipulation in Python

Documentation | Reference Maps | Notebook Examples | API

datar is a re-imagining of APIs for data manipulation in Python, with support for multiple backends. The APIs are aligned with the tidyverse packages in R as closely as possible.

Installation

pip install -U datar

# install with a backend
pip install -U datar[pandas]

# Support for more backends is coming soon

Backends

Repo
datar-numpy
datar-pandas
datar-arrow

Example usage

# with pandas backend
from datar import f
from datar.dplyr import mutate, filter_, if_else
from datar.tibble import tibble
# or
# from datar.all import f, mutate, filter_, if_else, tibble

df = tibble(
    x=range(4),  # or c[:4]  (from datar.base import c)
    y=['zero', 'one', 'two', 'three']
)
df >> mutate(z=f.x)
"""# output:
        x        y       z
  <int64> <object> <int64>
0       0     zero       0
1       1      one       1
2       2      two       2
3       3    three       3
"""

df >> mutate(z=if_else(f.x>1, 1, 0))
"""# output:
        x        y       z
  <int64> <object> <int64>
0       0     zero       0
1       1      one       0
2       2      two       1
3       3    three       1
"""

df >> filter_(f.x>1)
"""# output:
        x        y
  <int64> <object>
0       2      two
1       3    three
"""

df >> mutate(z=if_else(f.x>1, 1, 0)) >> filter_(f.z==1)
"""# output:
        x        y       z
  <int64> <object> <int64>
0       2      two       1
1       3    three       1
"""
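
To make the `df >> verb(...)` and `f.x` syntax above less magical, here is a minimal pure-Python sketch of the general idea: a verb call returns a pending object that receives its data through `__rrshift__`, and `f.x` builds a deferred expression that is evaluated later against the data. This is only an illustration under simplifying assumptions (rows are plain dicts, only `>` and `==` are supported). The names `Verb`, `Expr`, `Symbol`, `verb`, and `resolve` are hypothetical and are not how datar or pipda is actually implemented.

```python
import operator


class Verb:
    """A call whose first argument (the data) is supplied later by `>>`."""

    def __init__(self, func, args, kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def __rrshift__(self, data):
        # `data >> verb` falls back to this method when `data`
        # (here a plain list) does not implement `>>` itself.
        return self.func(data, *self.args, **self.kwargs)


def verb(func):
    """Hypothetical decorator: make `func(data, ...)` usable as `data >> func(...)`."""
    def factory(*args, **kwargs):
        return Verb(func, args, kwargs)
    return factory


def resolve(value, row):
    """Evaluate deferred expressions against a row; pass plain values through."""
    return value.evaluate(row) if isinstance(value, Expr) else value


class Expr:
    """A deferred expression, evaluated against one row (a dict) at a time."""

    def __init__(self, evaluate):
        self.evaluate = evaluate

    def _binary(self, op, other):
        return Expr(lambda row: op(self.evaluate(row), resolve(other, row)))

    def __gt__(self, other):
        return self._binary(operator.gt, other)

    def __eq__(self, other):
        return self._binary(operator.eq, other)


class Symbol:
    """`f.x` builds an Expr that looks up column "x" in a row."""

    def __getattr__(self, name):
        return Expr(lambda row: row[name])


f = Symbol()


@verb
def mutate(rows, **columns):
    # Add or overwrite columns; each value may be a deferred expression.
    return [
        {**row, **{key: resolve(value, row) for key, value in columns.items()}}
        for row in rows
    ]


@verb
def filter_(rows, condition):
    # Keep only the rows for which the deferred condition is truthy.
    return [row for row in rows if resolve(condition, row)]


rows = [{"x": i, "y": name} for i, name in enumerate(["zero", "one", "two", "three"])]
result = rows >> mutate(z=f.x) >> filter_(f.z > 1)
print(result)
```

Because nothing is computed until the data arrives on the left-hand side of `>>`, the same verb call can be written before any data exists, which is what allows pipelines to be chained left to right.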
# works with plotnine
# example grabbed from https://github.com/has2k1/plydata
import numpy
from datar import f
from datar.base import sin, pi
from datar.tibble import tibble
from datar.dplyr import mutate, if_else
from plotnine import ggplot, aes, geom_line, theme_classic

df = tibble(x=numpy.linspace(0, 2 * pi, 500))
(
    df
    >> mutate(y=sin(f.x), sign=if_else(f.y >= 0, "positive", "negative"))
    >> ggplot(aes(x="x", y="y"))
    + theme_classic()
    + geom_line(aes(color="sign"), size=1.2)
)

# very easy to integrate with other libraries
# for example: klib
import klib
from pipda import register_verb
from datar import f
from datar.data import iris
from datar.dplyr import pull

dist_plot = register_verb(func=klib.dist_plot)
iris >> pull(f.Sepal_Length) >> dist_plot()
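
The integration pattern above, wrapping an existing function so it can sit at the end of a pipeline, can be sketched with the standard library alone. The helper `as_verb` below is a hypothetical stand-in for the idea behind `register_verb`, not pipda's actual implementation, and the input values are arbitrary illustrative numbers.

```python
import statistics


class PendingCall:
    """Holds a function and its extra arguments until data arrives via `>>`."""

    def __init__(self, func, args, kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def __rrshift__(self, data):
        # The piped value becomes the first positional argument.
        return self.func(data, *self.args, **self.kwargs)


def as_verb(func):
    """Hypothetical helper: wrap any `func(data, ...)` for use with `>>`."""
    def factory(*args, **kwargs):
        return PendingCall(func, args, kwargs)
    return factory


# Wrap functions that know nothing about piping.
mean = as_verb(statistics.mean)
rounded = as_verb(round)

values = [5.1, 4.9, 4.7, 4.6, 5.0]  # illustrative numbers only
result = values >> mean() >> rounded(2)
print(result)
```

The wrapped function itself is left untouched; only the calling convention changes, which is why third-party functions can be adapted in one line.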

Testimonials

@coforfe:

Thanks for your excellent package to port R (dplyr) flow of processing to Python. I have been using other alternatives, and yours is the one that offers the most extensive and equivalent to what is possible now with dplyr.