added predictors #86


Merged
merged 23 commits into developmentseed:main from predictors on Jun 12, 2025

Conversation

feefladder
Contributor

@feefladder feefladder commented Apr 2, 2025

Added predictors and tests.

Since floating point predictors shuffle horizontal padding into the output, quite a lot more information is needed, so I made a public PredictorInfo struct with public methods that give tile/chunk info.

Summary:

  • Added crate-public unpredict_float/hdiff functions
    • Endianness fixing lives together with the predictors: the horizontal predictor fixes endianness first and then does the differencing, whereas the floating point predictor first does the horizontal differencing (on bytes) and then fixes endianness together with the shuffling. The bytes are always ordered big-endian, like in the spec PDF.
    • Predictor::None -> use fix_endianness
    • Predictor::Horizontal -> horizontal prediction and endianness, inside unpredict_hdiff
    • Predictor::Float -> floating point prediction and endianness, based on this comment
  • Because endianness is needed info, it is included in the IFD and its from_tags function.
  • Added a PredictorInfo struct, inspired by tiff2.
    • provided functions are expected to be used by the user in their own implementation:
      • included PlanarConfiguration, even though no decoding actually supports Planar, only Chunky
      • included SampleFormat, even though we don't test for them in predictors
      • made functions work on Planar, except if bits_per_sample is non-homogeneous
    • has all needed info to determine:
      • byte width of a chunk row
      • number of samples and their size(s)
  • Made the abstraction that a stripped tiff = tiled tiff with chunk_width=image_width
    • In image-tiff and tiff2, strips and tiles are kept separate, where the end result is that the same calculations are done through different functions with different implementations.
    • I wanted to make this abstraction earlier in tiff2, but realized it too late
    • Can be easily undone by erroring when trying to create a PredictorInfo on non-tiled tiff
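
As a rough illustration of the horizontal predictor mentioned above, here is a minimal sketch for the simplest 8-bit case (names and signature are assumptions, not this PR's API; the real unpredict_hdiff also handles endianness and wider sample types):

```rust
/// Sketch: each sample was stored as the difference from the sample
/// `samples_per_pixel` positions earlier, so decoding is a running wrapping sum.
fn unpredict_hdiff_u8(row: &mut [u8], samples_per_pixel: usize) {
    for i in samples_per_pixel..row.len() {
        row[i] = row[i].wrapping_add(row[i - samples_per_pixel]);
    }
}

fn main() {
    // a grayscale row stored as [first, +1, +1, +1]
    let mut row = [10u8, 1, 1, 1];
    unpredict_hdiff_u8(&mut row, 1);
    assert_eq!(row, [10, 11, 12, 13]);
}
```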

Some notes:

  • I did put in the possibility for multiple sample formats, which links our lifetime to the tile... I think it'd also be perfectly fine to accept only a single value, but then I think it'd also make sense to reflect that at IFD?
  • image-tiff does not support horizontal predictor for floating-point rasters, GDAL does. Since here the two layers are decoupled, I didn't put any checks, but just give the user bogus output if they give bogus input.
  • output being Bytes, rather than having a &mut [u8] function input. I would prefer &mut [u8], because then the data can be read directly into a user-provided buffer whose alignment is also ensured (e.g. by initializing a Vec<f32> buffer and then bytemucking it to &mut [u8])
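
The user-provided-buffer idea in the last note could look roughly like this (a sketch under assumed names; bytemuck::cast_slice_mut would do the same cast, here std's align_to_mut keeps it dependency-free):

```rust
/// View a user-owned Vec<f32> as &mut [u8] so decode output can land in it
/// directly; the Vec guarantees f32 alignment up front.
fn as_byte_view(buf: &mut [f32]) -> &mut [u8] {
    // narrowing f32 -> u8 can never be misaligned, so head/tail are empty
    let (head, body, tail) = unsafe { buf.align_to_mut::<u8>() };
    debug_assert!(head.is_empty() && tail.is_empty());
    body
}

fn main() {
    let mut pixels = vec![0f32; 2];
    as_byte_view(&mut pixels)[..4].copy_from_slice(&1.5f32.to_ne_bytes());
    assert_eq!(pixels, [1.5, 0.0]);
}
```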

@feefladder feefladder force-pushed the predictors branch 2 times, most recently from 19f36f8 to e0a80af Compare April 2, 2025 14:17
src/predictor.rs Outdated
predictor_info: &PredictorInfo,
tile_x: u32,
tile_y: u32,
) -> AsyncTiffResult<Bytes>; // having this Bytes will give alignment issues later on
Member

Why? Bytes is pretty much just Arc<Vec<u8>>

Contributor Author

So if:

  • the tiff is something with alignment > 1, e.g. f32
  • the global allocator gives out a misaligned Vec (which it doesn't often do)

then the user has two options:

  1. copy over the data into a Vec using f32::from_ne_bytes()
  2. use bytemuck and hope for the best

I looked into this quite deeply, and afaik most "standard" allocators allocate with alignment 8; it would just save a copy in my mind?
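
The two options can be sketched like so (assumed helper names; option 2 is essentially the check bytemuck::try_cast_slice does for you):

```rust
/// Option 1: always-safe copy via from_ne_bytes.
fn bytes_to_f32_copy(buf: &[u8]) -> Vec<f32> {
    buf.chunks_exact(4)
        .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Option 2: zero-copy reinterpretation, which only succeeds when the
/// allocator happened to hand out a 4-byte-aligned buffer.
fn bytes_as_f32_zero_copy(buf: &[u8]) -> Option<&[f32]> {
    let (head, body, tail) = unsafe { buf.align_to::<f32>() };
    (head.is_empty() && tail.is_empty()).then_some(body)
}

fn main() {
    let bytes = 1.5f32.to_ne_bytes().repeat(3);
    assert_eq!(bytes_to_f32_copy(&bytes), [1.5, 1.5, 1.5]);
    // the zero-copy path may or may not apply, depending on alignment
    if let Some(view) = bytes_as_f32_zero_copy(&bytes) {
        assert_eq!(view, [1.5, 1.5, 1.5]);
    }
}
```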

Member

I think instead we should discuss: at what points we should be storing Vec<u8> and at what points structured array types.

Member

@kylebarron kylebarron Apr 2, 2025

I think to add to my previous comment: the question is where we convert out of Bytes. I think the core networking trait (AsyncFileReader) should remain as it is, where get_bytes returns bytes::Bytes. Most networking code, at least reqwest and object_store return buffers as Bytes.

There's no way to convert from a Bytes to a Vec<u8> zero-copy. (You can sometimes convert from a Bytes to BytesMut zero-copy). So that implies at some point we make a data copy from a Bytes into a Vec<T> in order to be safe. Or we could use similar code as from the Arrow project and build typed interfaces on top of an Arc<Bytes>, such as their Buffer type.

Contributor Author

@feefladder feefladder Apr 2, 2025

I think to add to my previous comment: the question is where we convert out of Bytes. I think the core networking trait (AsyncFileReader) should remain as it is, where get_bytes returns bytes::Bytes. Most networking code, at least reqwest and object_store return buffers as Bytes.

I'm all up for not changing AsyncFileReader. I would say after the decompression step is where we can have typed arrays, since decompression itself is not zero-copy (except for no compression). In contrast to reqwest and object_store, we actually know the underlying datatype.

Then for the endianness fixing and horizontal predictor, it is also nice to have exclusive access to the underlying buffer (&mut[u8]), since that is zero-copy (not the float predictor). Passing typed arrays over to the predictor doesn't make much sense there imho either, since the operations don't differentiate between e.g. f64 and u64 and it's already quite complex as-is.

so I thought something like:

// pseudocode
impl Tile {
    fn decode(&self, decoder: &Decoder) -> AsyncTiffResult<DecodingResult> {
        // tiff2's DecodingResult
        let mut res: DecodingResult = todo!(); // smart buffer sizing
        decoder.decode(&self.compressed_bytes, res.buf_mut()); // bytemuck casting
        match self.predictor_info.predictor {
            // mutates in-place
            Predictor::Horizontal => unpredict_horizontal(res.buf_mut(), &self.predictor_info, self.x),
            // also mutates in-place, but uses a copy
            Predictor::Float => unpredict_float(res.buf_mut(), &self.predictor_info, self.x, self.y),
        }
        Ok(res)
    }
}

But maybe put this in a separate issue/PR? I didn't put it in here, because this PR was already medium-large and "my ideal situation^" would change some non-related things.

There's no way to convert from a Bytes to a Vec<u8> zero-copy. (You can sometimes convert from a Bytes to BytesMut zero-copy). So that implies at some point we make a data copy from a Bytes into a Vec<T> in order to be safe. Or we could use similar code as from the Arrow project and build typed interfaces on top of an Arc<Bytes>, such as their Buffer type.

In the current (this PR) implementation, I already did a BytesMut::from() quite a few times, even though we were still in the same decoding step.

Sidenote: there is also some discussion going on over at image-tiff, where they want to just output a Bytes or &[u8], but then also the alignment issue was raised and somewhat ignored.

but... separate issue/PR? I think this discussion is more about optimization/api than predictors? (even though they overlap quite a bit, could also be here?) #87

src/tile.rs Outdated
}

/// The number of chunks in the horizontal (x) direction
pub fn chunks_across(&self) -> u32 {


To add to my previous comment on removing the PredictorInfo struct: both chunks_across and chunks_down are useful outside of handling predictors. It would make sense to include these on the IFD struct (e.g. IFD.chunks_across).

Member

Even if we keep the PredictorInfo, as I think we probably should, we can put this functionality in a shared TileMath trait which both the IFD and PredictorInfo implement
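
A sketch of what that shared trait could look like (hypothetical names, not the crate's actual API; div_ceil needs Rust ≥ 1.73):

```rust
/// Hypothetical shared trait: anything that knows its image and chunk
/// dimensions gets the chunk-count math for free.
trait TileMath {
    fn image_width(&self) -> u32;
    fn image_height(&self) -> u32;
    fn chunk_width(&self) -> u32;
    fn chunk_height(&self) -> u32;

    /// The number of chunks in the horizontal (x) direction
    fn chunks_across(&self) -> u32 {
        self.image_width().div_ceil(self.chunk_width())
    }
    /// The number of chunks in the vertical (y) direction
    fn chunks_down(&self) -> u32 {
        self.image_height().div_ceil(self.chunk_height())
    }
}

// stand-in for IFD (or PredictorInfo), just to exercise the trait
struct Dims { w: u32, h: u32, cw: u32, ch: u32 }
impl TileMath for Dims {
    fn image_width(&self) -> u32 { self.w }
    fn image_height(&self) -> u32 { self.h }
    fn chunk_width(&self) -> u32 { self.cw }
    fn chunk_height(&self) -> u32 { self.ch }
}

fn main() {
    // 512x300 image in 256x256 tiles -> 2 tiles across, 2 down
    let ifd = Dims { w: 512, h: 300, cw: 256, ch: 256 };
    assert_eq!(ifd.chunks_across(), 2);
    assert_eq!(ifd.chunks_down(), 2);
}
```

Note the default methods also cover the stripped case, since a strip is treated as a tile with chunk_width = image_width.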

Contributor Author

@feefladder feefladder Apr 3, 2025

Also, having PredictorInfo separate made tests easier than having to build a full IFD with partially unused required data.

One other option still is to add all PredictorInfo attributes to Tile and then pass a &Tile into the unpredict_... functions, for a bit more of a flat struct layout.

Then implement the TileMath trait for Tile and IFD, which makes a lot of sense api-wise

@kylebarron
Member

I made a PR with suggested changes into this branch. See feefladder#1

@feefladder
Contributor Author

feefladder commented Apr 2, 2025

temporary sadness: "Ah noo... I just finished incorporating feedback :'("

src/predictor.rs Outdated
) -> AsyncTiffResult<Bytes> {
let output_row_stride = predictor_info.output_row_stride(tile_x)?;
let mut res: BytesMut =
BytesMut::zeroed(output_row_stride * predictor_info.output_rows(tile_y)?);
Contributor Author

Here I now used output_rows, which is larger if we have the Planar planar config. I thought then at least we can give the user the data and they can split it themselves?

Contributor Author

So now the Planar output will be:

[
Red...,
Green.,
Blue..,
]

where each band is chunk_width_pixels()*chunk_height_pixels()
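
Splitting that Planar output yourself could be as simple as this (a sketch with assumed names, relying on the equally sized bands laid out above):

```rust
/// Split a Planar chunk into per-band slices; each band is
/// chunk_width_pixels * chunk_height_pixels samples long.
fn split_planar(buf: &[u8], bands: usize) -> Vec<&[u8]> {
    buf.chunks_exact(buf.len() / bands).collect()
}

fn main() {
    // 3 bands of 4 bytes each
    let buf: Vec<u8> = (0..12).collect();
    let bands = split_planar(&buf, 3);
    assert_eq!(bands, [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]);
}
```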

Contributor Author

I thought this made sense, because the decoder (decompression) step decodes the entire data, which is structured like this.

Comment on lines +107 to +125
/// the number of rows the output has, taking padding and PlanarConfiguration into account.
fn output_rows(&self, y: u32) -> AsyncTiffResult<usize> {
    match self.planar_configuration {
        PlanarConfiguration::Chunky => Ok(self.chunk_height_pixels(y)? as usize),
        PlanarConfiguration::Planar => Ok((self.chunk_height_pixels(y)? as usize)
            .saturating_mul(self.samples_per_pixel as _)),
    }
}

fn bits_per_pixel(&self) -> usize {
    match self.planar_configuration {
        PlanarConfiguration::Chunky => {
            self.bits_per_sample as usize * self.samples_per_pixel as usize
        }
        PlanarConfiguration::Planar => self.bits_per_sample as usize,
    }
}
Contributor Author

these depend on planar_config

@feefladder
Contributor Author

Thanks @kylebarron for the changes!

Comment on lines +209 to +216
#[cfg(target_endian = "little")]
if let Endianness::LittleEndian = byte_order {
    return buffer;
}
#[cfg(target_endian = "big")]
if let Endianness::BigEndian = byte_order {
    return buffer;
}
Contributor Author

@feefladder feefladder Apr 3, 2025

I thought splitting the cfg -> no-op out here was a bit clearer; before, the cfgs were mixed inside the match statement.
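
For reference, the non-no-op path boils down to an in-place byte swap per sample width; a rough sketch (assumed name and signature, not the crate's actual fix_endianness):

```rust
/// Swap each sample's bytes in place when the file and host endianness differ.
fn swap_sample_bytes(buf: &mut [u8], bits_per_sample: u16) {
    match bits_per_sample {
        16 => buf.chunks_exact_mut(2).for_each(|c| c.reverse()),
        32 => buf.chunks_exact_mut(4).for_each(|c| c.reverse()),
        64 => buf.chunks_exact_mut(8).for_each(|c| c.reverse()),
        _ => {} // 8-bit samples need no swapping
    }
}

fn main() {
    let mut buf = [0x12u8, 0x34, 0xAB, 0xCD];
    swap_sample_bytes(&mut buf, 16);
    assert_eq!(buf, [0x34, 0x12, 0xCD, 0xAB]);
}
```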

Comment on lines +81 to +91
Predictor::None => Ok(fix_endianness(
    decoded_tile,
    self.predictor_info.endianness(),
    self.predictor_info.bits_per_sample(),
)),
Predictor::Horizontal => {
    unpredict_hdiff(decoded_tile, &self.predictor_info, self.x as _)
}
Predictor::FloatingPoint => {
    unpredict_float(decoded_tile, &self.predictor_info, self.x as _, self.y as _)
}
Contributor Author

I removed the trait and only have crate-public functions now

Contributor Author

@feefladder feefladder Apr 3, 2025

This should also help with fewer copies as mocked up in #87, since the float predictor doesn't modify in place, and a shared trait doesn't allow that differentiation.

@feefladder
Contributor Author

feefladder commented Apr 3, 2025

resolved: On another note: since there is now this abstraction of "chunks" for tiles and strips, the name Tile doesn't make much sense anymore for stripped tiffs, but I think that's fine and is merely a documentation issue.

Member

@weiji14 weiji14 left a comment

Code looks reasonable to me, only some minor suggestions around the docs, and one question I had around the reverse floating point prediction implementation.

src/predictor.rs Outdated
input[i] = input[i].wrapping_add(input[i - samples]);
}
// reverse byte shuffle and fix endianness
for (i, chunk) in output.chunks_mut(4).enumerate() {
Member

Would chunks_exact_mut be ok to use here? Is there a chance that chunks may not have exactly 4 bytes at the end?

Contributor Author

Should be ok, since the chunking is based on f32 being 4 bytes; same for f64 with 8-byte chunks.
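
To make the chunk math concrete, here is a sketch of reverse f32 prediction (assumed signature, mirroring the snippet under discussion; chunks_exact_mut would slot in where chunks_mut is, since the output length is a multiple of 4 by construction):

```rust
/// Sketch: undo the byte-wise horizontal differencing, then undo the shuffle
/// that grouped all 1st bytes, all 2nd bytes, ... of the row's floats
/// together. Output bytes end up big-endian, matching the spec's byte order.
fn rev_predict_f32(input: &mut [u8], output: &mut [u8], samples: usize) {
    // undo horizontal differencing on raw bytes
    for i in samples..input.len() {
        input[i] = input[i].wrapping_add(input[i - samples]);
    }
    // un-shuffle: byte planes -> interleaved big-endian floats
    let len = output.len() / 4;
    for (i, chunk) in output.chunks_exact_mut(4).enumerate() {
        for (b, byte) in chunk.iter_mut().enumerate() {
            *byte = input[b * len + i];
        }
    }
}

fn main() {
    // shuffled + differenced encoding of [1.0f32, 2.0f32]
    let mut input = [0x3Fu8, 0x01, 0x40, 0x80, 0x00, 0x00, 0x00, 0x00];
    let mut output = [0u8; 8];
    rev_predict_f32(&mut input, &mut output, 1);
    let f0 = f32::from_be_bytes(output[..4].try_into().unwrap());
    let f1 = f32::from_be_bytes(output[4..].try_into().unwrap());
    assert_eq!((f0, f1), (1.0, 2.0));
}
```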

Member

@weiji14 weiji14 left a comment

Looks good to me, I'll merge in ~24 hours if there are no objections. Thanks again @feefladder!

@feefladder
Contributor Author

maybe ping @kylebarron who is/was main dev of this repo?

@weiji14 weiji14 merged commit 6fe3da9 into developmentseed:main Jun 12, 2025
6 checks passed
@weiji14
Member

weiji14 commented Jun 12, 2025

Oops, saw your message too late. Kyle's said he hasn't got much time for async-tiff lately, so he asked me to review. Feel free to open more patches as needed and I'll try my best to review them 😄

@feefladder
Contributor Author

ah ok great, I have one lying around for #87, let's see if that goes...

There was also a thingy around #82, where he put a lot of work in and I didn't appreciate it (I was more like "why don't you merge my changes?"). Turns out the direction he was going was actually better (allowing for stateless reading and diversifying tag reading in downstream code), but #82 doesn't let me do that, since some functions/structs I need are private. Now, since there were emotions involved, and I think GitHub doesn't cater well to those types of conversations, it all stalled a bit (I think also #83). I think it's all very solvable, but a bit sensitive? <- do you know a good way forward there?

but first lemme PR for #87

@weiji14
Member

weiji14 commented Jun 12, 2025

I'm not well versed enough in Rust to know whether these low-level implementation details matter, and we're all probably desiring slightly different things out of this library for very different use-cases (e.g. web-tiling, game-engines, gpu-shaders, etc). I say we can try and work on some low hanging fruit, and work on some core abstractions (e.g. I really want #100 to be solved), and anything super specific could live in our own personal repos like https://github.com/feefladder/tiff2, https://github.com/weiji14/cog3pio, etc. How does that sound?

@feefladder
Contributor Author

Sounds good! tiff2 also implements all the low-level stuff asynchronously, and I think it's better to do that in only one place to prevent fragmentation of the ecosystem. Because I spent most of my time on tiff2, where the biggest chunk was low-level stuff, I don't really know that much about what would be needed for parsing geo tags or making a registry for that (what is needed for most tag parsing?). I could chip in on #100 a bit with some thoughts.

I already made a start at something called async-tiff-wasm (maybe async-cog-wasm is a better name) that would do the more specific stuff for reading cogs into wasm games etc. I think I'll open an issue about making #82 work for me when the need arises, for now I need to finish a report about these things.

64 => rev_predict_f64(in_buf, out_buf, predictor_info.samples_per_pixel as _),
_ => {
return Err(AsyncTiffError::General(format!(
"thou shalt not predict f{bit_depth:?}"
Member

In my opinion this is unprofessional language for a serious repository
