Merge pull request #315 from t20100/update-c-blosc2
Updated `c-blosc2` to v2.15.0 and the `ZStd` library to v1.5.6
t20100 authored Jul 23, 2024
2 parents d93971f + 0a19402 commit 193a0ac
Showing 338 changed files with 10,738 additions and 5,762 deletions.
doc/information.rst (6 changes: 3 additions & 3 deletions)
@@ -76,7 +76,7 @@ HDF5 compression filters and compression libraries sources were obtained from:
* `hdf5-blosc plugin <https://github.com/Blosc/hdf5-blosc>`_ (v1.0.1)
using `c-blosc <https://github.com/Blosc/c-blosc>`_ (v1.21.5), LZ4, Snappy, ZLib and ZStd.
* hdf5-blosc2 plugin (from `PyTables <https://github.com/PyTables/PyTables/>`_ v3.9.2)
using `c-blosc2 <https://github.com/Blosc/c-blosc2>`_ (v2.13.2), LZ4, ZLib and ZStd.
using `c-blosc2 <https://github.com/Blosc/c-blosc2>`_ (v2.15.0), LZ4, ZLib and ZStd.
* `FCIDECOMP plugin <https://gitlab.eumetsat.int/open-source/data-tailor-plugins/fcidecomp>`_
(`v2.0.1 <https://gitlab.eumetsat.int/open-source/data-tailor-plugins/fcidecomp/-/tree/e88f83c03bafcd0769c167dca14aa7aabf728e1b>`_)
using `CharLS <https://github.com/team-charls/charls>`_ (v2.1.0).
@@ -94,9 +94,9 @@ HDF5 compression filters and compression libraries sources were obtained from:

Sources of compression libraries shared accross multiple filters were obtained from:

* `LZ4 v1.9.4 <https://github.com/Blosc/c-blosc2/tree/v2.13.2/internal-complibs/lz4-1.9.4>`_
* `LZ4 v1.9.4 <https://github.com/Blosc/c-blosc2/tree/v2.15.0/internal-complibs/lz4-1.9.4>`_
* `Snappy v1.2.1 <https://github.com/google/snappy>`_
* `ZStd v1.5.5 <https://github.com/Blosc/c-blosc2/tree/v2.13.2/internal-complibs/zstd-1.5.5>`_
* `ZStd v1.5.6 <https://github.com/Blosc/c-blosc2/tree/v2.15.0/internal-complibs/zstd-1.5.6>`_
* `ZLib v1.2.13 <https://github.com/Blosc/c-blosc/tree/v1.21.5/internal-complibs/zlib-1.2.13>`_

When compiled with Intel IPP, the LZ4 compression library is replaced with `LZ4 v1.9.3 <https://github.com/lz4/lz4/releases/tag/v1.9.3>`_ patched with a patch from Intel IPP 2021.7.0.
src/c-blosc2/.github/workflows/fuzz.yml (2 changes: 1 addition & 1 deletion)
@@ -1,5 +1,5 @@
name: CI Fuzz
on: [pull_request]
on: [push, pull_request]
jobs:
Fuzzing:
runs-on: ubuntu-latest
src/c-blosc2/ANNOUNCE.md (11 changes: 5 additions & 6 deletions)
@@ -1,13 +1,12 @@
# Announcing C-Blosc2 2.13.2
# Announcing C-Blosc2 2.15.0
A fast, compressed and persistent binary data store library for C.

## What is new?

This is a patch release for improving of SSSE3 detection on Visual Studio.
Also, documentation for the globally registered filters and codecs has been
added:
https://www.blosc.org/c-blosc2/reference/utility_variables.html#codes-for-filters
https://www.blosc.org/c-blosc2/reference/utility_variables.html#compressor-codecs
This is a minor release in which a new io mode
was added to memory-map files. Furthermore, the `io_cb` read API was changed, so the
`SOVERSION` was bumped. In addition, the internal zstd sources were updated to 1.5.6 and some
other improvements were made.

For more info, please see the release notes in:

src/c-blosc2/CMakeLists.txt (46 changes: 20 additions & 26 deletions)
@@ -1,6 +1,6 @@
# Blosc - Blocked Shuffling and Compression Library
#
# Copyright (c) 2021 The Blosc Development Team <[email protected]>
# Copyright (c) 2021 Blosc Development Team <[email protected]>
# https://blosc.org
# License: BSD 3-Clause (see LICENSE.txt)
#
@@ -271,53 +271,49 @@ endif()
# for newer hardware on older machines as well as cross-compilation.
message(STATUS "Building for system processor ${CMAKE_SYSTEM_PROCESSOR}")
message(STATUS "Building for compiler ID ${CMAKE_C_COMPILER_ID}")
if(CMAKE_SYSTEM_PROCESSOR STREQUAL i386 OR
CMAKE_SYSTEM_PROCESSOR STREQUAL i686 OR
CMAKE_SYSTEM_PROCESSOR STREQUAL x86_64 OR
CMAKE_SYSTEM_PROCESSOR STREQUAL amd64 OR
CMAKE_SYSTEM_PROCESSOR STREQUAL AMD64)
if(CMAKE_C_COMPILER_ID STREQUAL GNU)
if(CMAKE_SYSTEM_PROCESSOR MATCHES i386|i686|x86_64|amd64|AMD64)
if(CMAKE_C_COMPILER_ID MATCHES GNU)
# We need C99 (GNU99 more exactly)
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -std=gnu99")
set(COMPILER_SUPPORT_SSE2 TRUE)
if(CMAKE_C_COMPILER_VERSION VERSION_GREATER 4.7 OR CMAKE_C_COMPILER_VERSION VERSION_EQUAL 4.7)
if(CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL 4.7)
set(COMPILER_SUPPORT_AVX2 TRUE)
else()
set(COMPILER_SUPPORT_AVX2 FALSE)
endif()
# GCC 10.3.2 (the version in manylinux_2014) seems to have issues supporting dynamic dispatching
# of AVX512. GCC 11.4 is the first minimal version that works well here.
# That means that Linux wheels will have AVX512 disabled, but that's life.
if(CMAKE_C_COMPILER_VERSION VERSION_GREATER 11.4 OR CMAKE_C_COMPILER_VERSION VERSION_EQUAL 11.4)
if(CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL 11.4)
set(COMPILER_SUPPORT_AVX512 TRUE)
else()
set(COMPILER_SUPPORT_AVX512 FALSE)
endif()
elseif(CMAKE_C_COMPILER_ID STREQUAL Clang OR CMAKE_C_COMPILER_ID STREQUAL AppleClang)
elseif(CMAKE_C_COMPILER_ID MATCHES Clang|AppleClang)
set(COMPILER_SUPPORT_SSE2 TRUE)
if(CMAKE_C_COMPILER_VERSION VERSION_GREATER 3.2 OR CMAKE_C_COMPILER_VERSION VERSION_EQUAL 3.2)
if(CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL 3.2)
set(COMPILER_SUPPORT_AVX2 TRUE)
else()
set(COMPILER_SUPPORT_AVX2 FALSE)
endif()
# Clang 13 is the minimum version that we know that works with AVX512 dynamic dispatch.
# Perhaps lesser versions work too, better to err on the safe side.
if(CMAKE_C_COMPILER_VERSION VERSION_GREATER 13.0 OR CMAKE_C_COMPILER_VERSION VERSION_EQUAL 13.0)
if(CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL 13.0)
set(COMPILER_SUPPORT_AVX512 TRUE)
else()
set(COMPILER_SUPPORT_AVX512 FALSE)
endif()
elseif(CMAKE_C_COMPILER_ID STREQUAL Intel)
elseif(CMAKE_C_COMPILER_ID MATCHES Intel|IntelLLVM)
# All Intel compilers since the introduction of AVX512 in 2016 should support it, so activate all SIMD flavors
set(COMPILER_SUPPORT_SSE2 TRUE)
set(COMPILER_SUPPORT_AVX2 TRUE)
set(COMPILER_SUPPORT_AVX512 TRUE)
elseif(MSVC)
set(COMPILER_SUPPORT_SSE2 TRUE)
if(CMAKE_C_COMPILER_VERSION VERSION_GREATER 18.00.30501 OR CMAKE_C_COMPILER_VERSION VERSION_EQUAL 18.00.30501)
if(CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL 18.00.30501)
set(COMPILER_SUPPORT_AVX2 TRUE)
# AVX512 starts to be supported since Visual Studio 17 15.0
elseif(CMAKE_C_COMPILER_VERSION VERSION_GREATER 19.10.25017 OR CMAKE_C_COMPILER_VERSION VERSION_EQUAL 19.10.25017)
elseif(CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL 19.10.25017)
set(COMPILER_SUPPORT_AVX512 TRUE)
else()
set(COMPILER_SUPPORT_AVX2 FALSE)
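This hunk replaces every `VERSION_GREATER X OR VERSION_EQUAL X` pair with a single `VERSION_GREATER_EQUAL X` (an operator CMake has offered since 3.7). CMake compares such strings component by component as integers, treating omitted components as zero, so the two spellings are equivalent. The C sketch below mimics that comparison under a simplifying assumption (purely numeric, dot-separated components; CMake's own parser is not reproduced) and replays the order of the MSVC tests visible above. In those lines, any compiler at or above `19.10.25017` also satisfies the earlier `18.00.30501` test, so the `elseif` that sets `COMPILER_SUPPORT_AVX512` appears unreachable as written. All function and enum names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Reads one dot-separated numeric component and advances *s past it.
 * An exhausted string yields 0, like an omitted CMake version component. */
static long next_component(const char **s) {
  char *end;
  long value;
  if (**s == '\0') {
    return 0;
  }
  value = strtol(*s, &end, 10);
  if (end == *s) {
    /* Not a number: this simplified sketch stops parsing here. */
    *s += strlen(*s);
    return 0;
  }
  *s = (*end == '.') ? end + 1 : end;  /* skip the separator */
  return value;
}

/* strcmp-style comparison of versions such as "18.00.30501". */
int version_cmp(const char *a, const char *b) {
  while (*a != '\0' || *b != '\0') {
    long ca = next_component(&a);
    long cb = next_component(&b);
    if (ca != cb) {
      return (ca < cb) ? -1 : 1;
    }
  }
  return 0;
}

/* The removed spelling: VERSION_GREATER X OR VERSION_EQUAL X. */
bool version_ge_old(const char *v, const char *x) {
  int c = version_cmp(v, x);
  return c > 0 || c == 0;
}

/* The added spelling: VERSION_GREATER_EQUAL X. */
bool version_ge(const char *v, const char *x) {
  return version_cmp(v, x) >= 0;
}

typedef enum { MSVC_AVX2_ON, MSVC_AVX512_ON, MSVC_AVX2_OFF } msvc_result;

/* Replays the if/elseif order of the MSVC branch shown in the hunk. */
msvc_result msvc_cascade(const char *msvc_version) {
  if (version_ge(msvc_version, "18.00.30501")) {
    return MSVC_AVX2_ON;
  } else if (version_ge(msvc_version, "19.10.25017")) {
    /* Never reached: every such version already passed the test above. */
    return MSVC_AVX512_ON;
  } else {
    return MSVC_AVX2_OFF;
  }
}
```

Reaching the AVX512 branch would require testing the higher threshold first; whether that is the upstream intent is not stated in this diff.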
@@ -329,17 +325,15 @@ if(CMAKE_SYSTEM_PROCESSOR STREQUAL i386 OR
# Unrecognized compiler. Emit a warning message to let the user know hardware-acceleration won't be available.
message(WARNING "Unable to determine which ${CMAKE_SYSTEM_PROCESSOR} hardware features are supported by the C compiler (${CMAKE_C_COMPILER_ID} ${CMAKE_C_COMPILER_VERSION}).")
endif()
elseif(CMAKE_SYSTEM_PROCESSOR STREQUAL armv7l OR
CMAKE_SYSTEM_PROCESSOR STREQUAL aarch64 OR
CMAKE_SYSTEM_PROCESSOR STREQUAL arm64)
if(CMAKE_C_COMPILER_ID STREQUAL GNU)
if(CMAKE_C_COMPILER_VERSION VERSION_GREATER 5.2 OR CMAKE_C_COMPILER_VERSION VERSION_EQUAL 5.2)
elseif(CMAKE_SYSTEM_PROCESSOR MATCHES armv7l|aarch64|arm64)
if(CMAKE_C_COMPILER_ID MATCHES GNU)
if(CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL 5.2)
set(COMPILER_SUPPORT_NEON TRUE)
else()
set(COMPILER_SUPPORT_NEON FALSE)
endif()
elseif(CMAKE_C_COMPILER_ID STREQUAL Clang OR CMAKE_C_COMPILER_ID STREQUAL AppleClang)
if(CMAKE_C_COMPILER_VERSION VERSION_GREATER 3.3 OR CMAKE_C_COMPILER_VERSION VERSION_EQUAL 3.3)
elseif(CMAKE_C_COMPILER_ID MATCHES Clang|AppleClang)
if(CMAKE_C_COMPILER_VERSION VERSION_GREATER_EQUAL 3.3)
set(COMPILER_SUPPORT_NEON TRUE)
else()
set(COMPILER_SUPPORT_NEON FALSE)
@@ -350,9 +344,9 @@ elseif(CMAKE_SYSTEM_PROCESSOR STREQUAL armv7l OR
message(WARNING "Unable to determine which ${CMAKE_SYSTEM_PROCESSOR} hardware features are supported by the C compiler (${CMAKE_C_COMPILER_ID} ${CMAKE_C_COMPILER_VERSION}).")
endif()
elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "^(ppc64le|powerpc64le)")
if(CMAKE_C_COMPILER_ID STREQUAL GNU AND CMAKE_C_COMPILER_VERSION VERSION_GREATER 8)
if(CMAKE_C_COMPILER_ID MATCHES GNU AND CMAKE_C_COMPILER_VERSION VERSION_GREATER 8)
set(COMPILER_SUPPORT_ALTIVEC TRUE)
elseif(CMAKE_C_COMPILER_ID STREQUAL Clang AND CMAKE_C_COMPILER_VERSION VERSION_GREATER 13)
elseif(CMAKE_C_COMPILER_ID MATCHES Clang AND CMAKE_C_COMPILER_VERSION VERSION_GREATER 13)
set(COMPILER_SUPPORT_ALTIVEC TRUE)
else()
set(COMPILER_SUPPORT_ALTIVEC FALSE)
@@ -381,13 +375,13 @@ endif()

# Set the "-msse2" build flag only if the CMAKE_C_FLAGS is not already set.
# Probably "-msse2" should be appended to CMAKE_C_FLAGS_RELEASE.
if(CMAKE_C_COMPILER_ID STREQUAL GNU OR CMAKE_C_COMPILER_ID STREQUAL Clang OR CMAKE_C_COMPILER_ID STREQUAL Intel)
if(CMAKE_C_COMPILER_ID MATCHES GNU|Clang|Intel|IntelLLVM)
if(NOT CMAKE_C_FLAGS AND COMPILER_SUPPORT_SSE2)
set(CMAKE_C_FLAGS -msse2 CACHE STRING "C flags." FORCE)
endif()
endif()

if(CMAKE_C_COMPILER_ID STREQUAL Intel OR CMAKE_C_COMPILER_ID STREQUAL Clang OR HAIKU)
if(CMAKE_C_COMPILER_ID MATCHES Intel|IntelLLVM|Clang OR HAIKU)
# We need to tell Intel and Clang compilers about what level of POSIX they support
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D_XOPEN_SOURCE=600")
endif()
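The CMake hunks also swap chains of `STREQUAL` tests for a single `MATCHES` with `|`-separated alternatives. `MATCHES` takes a regular expression and, unless anchored with `^` or `$`, succeeds when the pattern occurs anywhere in the string. The C sketch below uses POSIX `regcomp`/`regexec` as a stand-in for CMake's regex engine (an assumption: the engines differ in details, but both search unanchored) to show two consequences: `MATCHES Clang` is also true for `AppleClang`, which the old `STREQUAL Clang` tests in the `-msse2` and `_XOPEN_SOURCE` blocks did not accept, and `MATCHES Intel` already covers `IntelLLVM`, so listing both alternatives is redundant but harmless. The helper name is hypothetical and the code needs a POSIX system.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the POSIX extended regex `pattern` occurs anywhere in
 * `value`, approximating CMake's unanchored `if(<value> MATCHES <pattern>)`.
 * Returns false if the pattern does not compile. */
bool cmake_like_matches(const char *value, const char *pattern) {
  regex_t re;
  int rc;
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
    return false;
  }
  rc = regexec(&re, value, 0, NULL, 0);
  regfree(&re);
  return rc == 0;
}
```

Anchoring the pattern, for example `^(GNU|Clang)$`, would restore the exact-match behaviour of the removed `STREQUAL` tests if that were ever wanted.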
src/c-blosc2/DEVELOPING-GUIDE.rst (10 changes: 4 additions & 6 deletions)
@@ -1,14 +1,12 @@
Some conventions used in C-Blosc2
=================================

* Use C99 designated initialization whenever possible (specially in examples).
* Use C99 designated initialization only in examples. Libraries should use C89 initialization, which is more portable, specially with C++ (designated initialization in C++ is supported only since C++20).

* Use _new and _free for memory allocating constructors and destructors and _init and _destroy for non-memory allocating constructors and destructors.

* Lines must not exceed 120 characters. If a line is too long, it must be broken into several lines.

Naming things
-------------
* Conditional bodies must always use braces, even if they are one-liners. The only exception that can be is when the conditional is a single line and the body is a single line:

Naming is one of the most time-consuming tasks, but critical for communicating effectively. Here it is a preliminary list of names that I am not comfortable with:

* We are currently calling `filters` to a data transformation function that essentially produces the same amount of data, but with bytes shuffled or transformed in different ways. Perhaps `transformers` would be a better name?
if (condition) whatever();
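The updated conventions ask library code to avoid C99 designated initializers (keeping them for examples), pair `_new`/`_free` with heap-allocating constructors and `_init`/`_destroy` with non-allocating ones, and brace conditional bodies. The sketch below applies the three rules to a hypothetical `counter` type that is not part of C-Blosc2; only `example_sum_of_first`, standing in for example code, uses a designated initializer.

```c
#include <stdlib.h>

/* Hypothetical type used only to illustrate the conventions. */
typedef struct {
  int start;
  int step;
  int value;
} counter;

/* Library code: C89-style positional initializer. */
static const counter counter_zero = {0, 0, 0};

/* Non-allocating constructor: the caller owns the storage. */
void counter_init(counter *c, int start, int step) {
  c->start = start;
  c->step = step;
  c->value = start;
}

/* Non-allocating destructor: nothing to release, so reset the fields. */
void counter_destroy(counter *c) {
  *c = counter_zero;
}

/* Allocating constructor: release the result with counter_free(). */
counter *counter_new(int start, int step) {
  counter *c = malloc(sizeof(*c));
  if (c == NULL) {
    return NULL;  /* braces even for a one-line body */
  }
  counter_init(c, start, step);
  return c;
}

/* Allocating destructor: pairs with counter_new(). */
void counter_free(counter *c) {
  if (c != NULL) {
    counter_destroy(c);
    free(c);
  }
}

/* Returns the current value, then advances by `step`. */
int counter_next(counter *c) {
  int current = c->value;
  c->value += c->step;
  return current;
}

/* Example code: C99 designated initializers are allowed here. */
int example_sum_of_first(int n) {
  counter c = {.start = 0, .step = 2, .value = 0};
  int total = 0;
  for (int i = 0; i < n; i++) {
    total += counter_next(&c);
  }
  return total;
}
```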
src/c-blosc2/LICENSE.txt (2 changes: 1 addition & 1 deletion)
@@ -3,7 +3,7 @@ BSD License
For Blosc - A blocking, shuffling and lossless compression library

Copyright (c) 2009-2018 Francesc Alted <[email protected]>
Copyright (c) 2019-present The Blosc Development Team <[email protected]>
Copyright (c) 2019-present Blosc Development Team <[email protected]>

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
src/c-blosc2/README.rst (18 changes: 16 additions & 2 deletions)
@@ -6,7 +6,7 @@ A fast, compressed and persistent data store library for C
==========================================================


:Author: The Blosc Development Team
:Author: Blosc Development Team
:Contact: [email protected]
:URL: https://www.blosc.org
:Gitter: |gitter|
@@ -37,6 +37,7 @@ C-Blosc2 is the new major version of `C-Blosc <https://github.com/Blosc/c-blosc>

See a 3 minutes `introductory video to Blosc2 <https://www.youtube.com/watch?v=ER12R7FXosk>`_.


Blosc2 NDim: an N-Dimensional store
===================================

@@ -119,6 +120,19 @@ More info about the `improved capabilities of C-Blosc2 can be found in this talk
C-Blosc2 API and format have been frozen, and that means that there is guarantee that your programs will continue to work with future versions of the library, and that next releases will be able to read from persistent storage generated from previous releases (as of 2.0.0).


Open format
===========

The Blosc2 format is open and documented in the next documents:

* [The chunk; the basic building block](https://github.com/Blosc/c-blosc2/blob/main/README_CHUNK_FORMAT.rst)
* [The cframe; this is made of different chunks in contiguous storage](https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst)
* [The sframe; a variation of the cframe for sparse storage](https://github.com/Blosc/c-blosc2/blob/main/README_SFRAME_FORMAT.rst)
* [The b2nd metalayer; info for the n-dimensional data container](https://github.com/Blosc/c-blosc2/blob/main/README_B2ND_METALAYER.rst)

All these documents take less than 1000 lines of text, so they should be easy to read and understand. In our opinion, this is very important for the long-term success of the library, as it allows for third-party implementations of the format, and also for the users to understand what is going on under the hood.


Python wrapper
==============

@@ -250,4 +264,4 @@ See `THANKS document <https://github.com/Blosc/c-blosc2/blob/main/THANKS.rst>`_.

----

-- The Blosc Development Team. **We make compression better.**
-- Blosc Development Team. **We make compression better.**
src/c-blosc2/README_CFRAME_FORMAT.rst (6 changes: 3 additions & 3 deletions)
@@ -60,12 +60,12 @@ The header contains information needed to decompress the Blosc chunks contained
+-- [msgpack] int32

The filter pipeline is stored next in the header. It contains 6 slots, one for each filter that can be applied. For
each slot there is a byte used to store the filter code in `filter_codes` and an associated byte used to store any
possible filter meta-info in `filter_meta`::
each slot there is a byte used to store the filter ID in `filters` and an associated byte used to store any
possible filter meta-info in `filters_meta`::


|-45|-46|-47|-48|-49|-4A|-4B|-4C|-4D|-4E|-4F|-50|-51|-52|-53|-54|-55|-56|
| d2| X | filter_codes |_f4|_f5| filter_meta | | |
| d2| X | filters |_f4|_f5| filters_meta | | |
|---|---|-------------------------------|-------------------------------|
^ ^ ^ ^ ^ ^
| | | | | +-- reserved
src/c-blosc2/README_CHUNK_FORMAT.rst (21 changes: 11 additions & 10 deletions)
@@ -40,7 +40,7 @@ Starting in Blosc 2.0.0, there is an extension of the header above that allows
for encoding blocks with a filter pipeline::

1+|-0-|-1-|-2-|-3-|-4-|-5-|-6-|-7-|-8-|-9-|-A-|-B-|-C-|-D-|-E-|-F-|
| filter codes | ^ | ^ | filter meta | ^ | ^ |
| filters | ^ | ^ | filters_meta | ^ | ^ |
| | | |
| +- compcode_meta | +-blosc2_flags
+- user-defined codec +-reserved
@@ -105,8 +105,8 @@ for encoding blocks with a filter pipeline::
:cbytes:
(``int32``) Compressed size of the buffer (including this header).

:filter_codes:
(``uint8``) Filter code.
:filters:
(``uint8``) Filter ID.

:``0``:
No shuffle (for compatibility with Blosc1).
@@ -121,11 +121,12 @@ for encoding blocks with a filter pipeline::
:``4``:
Truncate precision filter.
:``5``:
User-defined filter.
Sentinel. IDs larger than this are either global registered or user-defined filters.

The filter pipeline has 6 reserved slots for the filters. They are applied sequentially to the chunk according
to their index in increasing order. The type of filter applied is specified by the `filter_code`. Each
`filter_code` has an associated field in `filter_meta` that can contain metadata about the filter.
The filter pipeline has 6 reserved slots for the filters IDs. They are applied sequentially
to the chunk according to their index (in increasing order). The type of filter applied is
specified by the ID. Each ID has an associated field in `filters_meta` that can contain metadata
about the filter.

:udcodec:
(``uint8``) User-defined codec identifier.
@@ -135,10 +136,10 @@ for encoding blocks with a filter pipeline::

Metadata associated with the compression codec.

:filter_meta:
(``uint8``) Filter metadata.
:filters_meta:
(``uint8``) Filter metadata associated to each filter ID.

Metadata associated with the filter code.
Metadata associated with the filter ID.

:blosc2_flags:
(``bitfield``) The flags for a Blosc2 buffer.
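The filter pipeline documented above sits in the extended chunk header. Reading offsets off the diagram, the six `filters` IDs occupy bytes `0x10`-`0x15` and the six `filters_meta` bytes occupy `0x18`-`0x1D`. The C sketch below decodes those slots from a raw 32-byte header and returns them in increasing slot index, the order in which the format says they are applied. The struct and function names are hypothetical. The name table uses IDs 0, 4 and 5 as documented here, and assumes IDs 1 to 3 are the `BLOSC_SHUFFLE`, `BLOSC_BITSHUFFLE` and `BLOSC_DELTA` constants from `blosc2.h`; anything above 5 is reported only as registered or user-defined.

```c
#include <stddef.h>
#include <stdint.h>

#define PIPELINE_SLOTS 6          /* the format reserves 6 filter slots */
#define FILTERS_OFFSET 0x10       /* first `filters` byte */
#define FILTERS_META_OFFSET 0x18  /* first `filters_meta` byte */
#define EXTENDED_HEADER_LEN 32    /* basic 16-byte header + extension row */

typedef struct {
  uint8_t id;    /* filter ID from `filters` */
  uint8_t meta;  /* matching byte from `filters_meta` */
} pipeline_slot;

/* Copies the active slots (ID != 0, i.e. not "no shuffle") into `out`,
 * in application order. Returns the count, or -1 if `len` is too short. */
int decode_pipeline(const uint8_t *header, size_t len,
                    pipeline_slot out[PIPELINE_SLOTS]) {
  int n = 0;
  int i;
  if (len < EXTENDED_HEADER_LEN) {
    return -1;
  }
  for (i = 0; i < PIPELINE_SLOTS; i++) {
    uint8_t id = header[FILTERS_OFFSET + i];
    if (id == 0) {
      continue;  /* no-op slot */
    }
    out[n].id = id;
    out[n].meta = header[FILTERS_META_OFFSET + i];
    n++;
  }
  return n;
}

/* Human-readable label for a filter ID. */
const char *filter_name(uint8_t id) {
  switch (id) {
    case 0: return "noshuffle";
    case 1: return "shuffle";
    case 2: return "bitshuffle";
    case 3: return "delta";
    case 4: return "trunc_prec";
    case 5: return "sentinel";
    default: return "registered-or-user-defined";
  }
}
```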
src/c-blosc2/RELEASE_NOTES.md (76 changes: 76 additions & 0 deletions)
@@ -1,6 +1,82 @@
Release notes for C-Blosc2
==========================

Changes from 2.14.4 to 2.15.0
=============================

* Removed some duplicated functions. See https://github.com/Blosc/c-blosc2/issues/503.

* Added a new io mode to memory map files. This forced to change the `io_cb` read API.
See https://github.com/Blosc/c-blosc2/blob/main/tests/test_mmap.c to see an example on
how to use it.

* Updated the `SOVERSION` to 4 due to the API change in `io_cb` read.

* Added functions to get cparams, dparams, storage and io defaults respectively.

* Internal zstd sources updated to 1.5.6.

* Fixed a bug when setting a slice using prefilters.
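The notes tie the new memory-map io mode to a change in the `io_cb` read API. A plausible motivation, sketched here as an assumption rather than a description of the C-Blosc2 code, is that an `mmap`-backed reader can hand back a pointer into the mapping instead of copying into a caller-supplied buffer. The POSIX C sketch below illustrates that idea with a hypothetical `mapped_file` type whose read function returns the address through a `void **` out-parameter. It is not the C-Blosc2 callback signature; `tests/test_mmap.c` shows the real usage.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical read-only memory-mapped file. */
typedef struct {
  void *addr;
  int64_t size;
} mapped_file;

/* Maps `path` read-only. Returns 0 on success, -1 on failure. */
int mapped_open(mapped_file *mf, const char *path) {
  struct stat st;
  int fd = open(path, O_RDONLY);
  if (fd < 0) {
    return -1;
  }
  /* mmap() rejects zero-length mappings, so treat empty files as errors. */
  if (fstat(fd, &st) != 0 || st.st_size == 0) {
    close(fd);
    return -1;
  }
  mf->addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  /* the mapping stays valid after the descriptor is closed */
  if (mf->addr == MAP_FAILED) {
    return -1;
  }
  mf->size = (int64_t)st.st_size;
  return 0;
}

/* Zero-copy "read": points *ptr at offset `position` inside the mapping and
 * returns how many of the requested `nbytes` are available (0 past EOF). */
int64_t mapped_read(mapped_file *mf, void **ptr, int64_t nbytes,
                    int64_t position) {
  if (position < 0 || position >= mf->size || nbytes < 0) {
    *ptr = NULL;
    return 0;
  }
  if (nbytes > mf->size - position) {
    nbytes = mf->size - position;  /* clamp at end of file */
  }
  *ptr = (char *)mf->addr + position;
  return nbytes;
}

/* Unmaps the file and clears the handle. */
void mapped_close(mapped_file *mf) {
  munmap(mf->addr, (size_t)mf->size);
  mf->addr = NULL;
  mf->size = 0;
}
```

A copying reader would instead take a destination buffer; with a mapping, returning the address avoids one `memcpy` per read, at the cost of tying the pointer's lifetime to the mapping.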


Changes from 2.14.3 to 2.14.4
=============================

* Bumped SONAME due to recent API changes. See https://github.com/Blosc/c-blosc2/issues/581.


Changes from 2.14.2 to 2.14.3
=============================

* More fixes for internal fuzzer.


Changes from 2.14.1 to 2.14.2
=============================

* Fixes for CVE-2024-3203 and CVE-2024-3204.


Changes from 2.14.0 to 2.14.1
=============================

* When loading plugins, first try with `python` and then `python3`.
This is because many linux distros do not have `python` as a
symlink to `python3` anymore.


Changes from 2.13.2 to 2.14.0
=============================

* Fixed a bug preventing buffers to be appended to empty (0-sized) b2nd arrays.

* New acceleration path for `b2nd_append()`. This new path is
much faster (up to 4x) than the previous one, specially for large arrays.
See `bench/bench_stack_append.c` for the bench of use.

* New examples for using the `b2nd_set_slice_cbuffer()` and
`b2nd_append()` functions for adding data into existing b2nd arrays.
See `examples/example_stack_images.c`.

* Now, ``python3`` is used for finding plugins instead of ``python``.
This is because many linux distros do not have ``python`` as a symlink
to ``python3`` anymore.

* New round of fixing warnings. Now, C-Blosc2 should be relatively free of them.

* Small performance tweak for clevel 1 in BloscLZ codec.

* Fixed a leak in frame code. Closes #591. Thanks to @LuMingYinDetect.

* Disable shuffle repeat in filters pipeline. This was broken
since the initial implemented, and it was never documented.
Also, compression ratios do not seem to be improved in our experiments,
so this capability has been removed completely.

* Support for new Intel compilers (2023.0.1 and on). Fixes #533.
Thanks to Nick Papior.


Changes from 2.13.1 to 2.13.2
=============================
