This file documents notable changes between versions of quivr.
- experimental.shmem has been removed. Users are encouraged to use tools such as ray for parallel processing and for shared memory of pyarrow types.
- Table.invalid_mask and Table.separate_invalid have been added to allow users to select rows that fail validation checks.
- concatenate now supports a validate argument, which lets you postpone automatic validation of a table until after concatenation.
- Table.drop_duplicates has been added to remove duplicate rows from a table.
- Table.unique_indices has been added to return the indices of the first or last occurrence of each unique row, or of each unique combination of values in a subset of columns.
- Concatenating empty quivr tables will no longer raise an error with incompatible attributes. Instead, attributes will be taken from the first non-empty table.
0.7.2 - 2023-10-18
- The from_data, from_list, from_rows, and from_pydict constructors, which were deprecated in 0.6.0, have been removed; from_kwargs is generally preferred when constructing from Python values. (#33)
- Quivr tables will now round-trip correctly with all types. Previously, FixedSizeLists, LargeStrings, LargeBinary, and other unusual types could be incorrectly handled when loading from flattened dataframes. (#58)
0.7.0 - 2023-10-03
- quivr.experimental.shmem provides new utilities for running functions against quivr Tables with multiple processes in shared memory:
  - to_shared_memory and from_shared_memory can be used to write a quivr Table to shared memory and read it back. This allows separate processes to work on slices of a Table without copying any data or using redundant memory.
  - execute_parallel is a function that simplifies running a function against a Table's data with multiple processes. The Table's data is split up using a configurable partitioning strategy, and each partition is passed to a separate worker. Results are returned in a streaming iterator as they complete.
  - ChunkedPartitioning and GroupedPartitioning are classes which represent two possible partitioning strategies: uniform chunks of fixed size, or partitions whose rows share a common value for a particular column. Additional partitioning strategies can be supplied by subclassing the Partitioning class.
- Conversion of Tables to and from pandas DataFrames can now preserve Table attributes (#56). Three possible approaches are available:
  - "add_columns": Store attribute values by repeating them in every row of the dataframe. Each attribute gets a separate column. For subtables, attributes are stored under a dot-delimited prefix.
  - "attrs": Use the experimental pandas.DataFrame.attrs API to store attributes directly on the DataFrame as a dictionary.
  - "drop": Remove the attributes entirely. This was the old behavior.
  These options are available in the Table.to_dataframe method. When a dataframe is loaded with from_dataframe or from_flat_dataframe, attributes are inferred.
- Column names can be dot-delimited in Table.select and Table.column to reference subtables. (#53)
- Column names can be dot-delimited in Table.sort_by to reference subtables. (#54)
0.6.6 - 2023-09-27
- Columns which are masked to hide all data can now be accessed. (#51)
0.6.5 - 2023-08-30
- Concatenating empty tables no longer raises a ValueError.
0.6.4 - 2023-08-24
- Table.set_column was added. This new method returns a copy of the table with a single column replaced.
- Accessing columns is now much faster. (#47)
- Setting columns through normal Python assignment statements (table.x = ...) is no longer possible. It was an accident that it worked in the first place. Instead, use Table.set_column.
0.6.3 - 2023-08-16
- Indexing a table with a negative integer (table[-1], for example) now works as it does for other Python sequences: by counting from the end. (#40)
0.6.2 - 2023-08-08
This patch release is an addendum to 0.6.1, which attempted to resolve issues with nullable subtables that have non-nullable fields, but whose fix was incomplete.
- Several more fixes for loading null subtables with non-null columns: correctly handling round-tripping, loading from PyArrow tables, and more.
0.6.1 - 2023-08-08
- When a Table contains a nullable subtable column, and that subtable has non-nullable fields, from_kwargs would reject a null value for the subtable column. This is now fixed. (#37)
0.6.0 - 2023-08-07
This release has several major changes:
- Linkages are added.
- Columns are now non-nullable by default.
- Many Table constructors are deprecated.
- combine_linkages and combine_multikeylinkages, utility functions for concatenating linkages, were added. (#27)
- Columns now accept default values, which are used in place of nulls. (#25)
- New from_pyarrow constructor for loading data from a PyArrow table. (#24)
- Schemas are more rigorously enforced when loading data, particularly from PyArrow Tables. All columns of the source data are checked for consistency.
- Columns are now non-nullable by default. (#35)
- Several attribute names are now reserved by quivr, and it is an error to name columns using the reserved names, which include "table", "schema", and a few other more obscure internal names. (#36)
- Column validators are run by default when constructing a Table instance. (#32)
- Attributes are now immutable by default. They can optionally be made mutable by passing mutable=True in their constructors. (#28)
- The from_data, from_list, from_rows, and from_pydict constructors are now deprecated; from_kwargs is generally preferred when constructing from Python values. (#33)
- The unadvertised experimental StringIndex structure has been removed. Linkages do the same job much better.
- The unadvertised experimental quivr.matrix module has been removed. Use FixedSizeList columns instead.
0.5.0 - 2023-07-21
- Linkage and MultiKeyLinkage, two constructs for working with multiple Table instances with common keys, were added. (#21)
- Documentation is now generated and published to Read the Docs. Find it at https://quivr.readthedocs.org/.
0.4.3 - 2023-07-18
- Table.equals, which checks two Table instances for equality, has been added. (#17)
- All public modules now have full type annotations, which are verified with mypy.
- Column validators no longer crash on columns with all null values. (#13)
0.4.2 - 2023-06-05
- Pandas Series objects can now be passed to Table constructors such as from_kwargs. (edb7482)
0.4.1 - 2023-06-05
- Table Attributes and Columns can now be accessed as class-level attributes. This accesses the Column or Attribute object itself rather than the data it points to.
0.4.0 - 2023-06-05
- Table "Fields" are renamed to "Columns."
0.3.4 - 2023-05-26
- Changes made to support Python 3.9 and 3.10.
0.3.3 - 2023-05-26
- Attributes are added: scalar values that can be attached to an entire Table instance. These are serialized in Table metadata so they survive encoding and decoding.
- Added a Table.empty() method, which creates a table of length zero.
0.3.2 - 2023-05-18
- Column Validators are added: tools for ensuring that the data in a Table passes checks.
0.3.1 - 2023-05-18
- Added a from_parquet method to Table.
- Correctly cast inputs to the right schema type when constructing a Table instance.
0.3.0 - 2023-05-17
- Added support for instance-level attributes via the with_table pattern.
0.2.3 - 2023-05-17
Extra release to fix an issue publishing to PyPI.
0.2.2 - 2023-05-17
- Allow nullable columns to be passed in as None via from_kwargs. (#2)
- Import Table at the package level. (#1)
0.2.1 - 2023-05-05
- Added a py.typed file to the package so that type checkers pick up its annotations.
- Made SubTableField a Generic type.
0.2.0 - 2023-05-04
Instead of naming a pyarrow.Schema
as a class-level attribute, quivr
now supports an explicit Field type which is used to describe the
fields used in a Table. Implementations of many Fields based on
PyArrow types are provided.
0.1.1 - 2023-05-02
Added a variety of convenience constructors for Table instances.
0.1.0 - 2023-05-01
First tagged release. Many, many changes to the core concept.
Initial commit - 2023-04-08
Initial commit of the original idea, implemented via metaclasses.