added predictors #86


Merged
merged 23 commits into developmentseed:main from predictors on Jun 12, 2025

Conversation

feefladder
Contributor

@feefladder feefladder commented Apr 2, 2025

Added predictors and tests.

Since floating point predictors shuffle horizontal padding into the output, quite a lot more information is needed, so I made a public PredictorInfo struct with public methods that give tile/chunk info.

Summary:

  • Added crate-public unpredict_float/hdiff functions
    • Endianness fixing lives together with the predictors: the horizontal predictor fixes endianness first and then does the differencing, whereas the floating point predictor first does the horizontal differencing (on bytes) and then fixes endianness together with the shuffling. The bytes are always ordered big-endian, like in the spec PDF.
    • Predictor::None -> use fix_endianness
    • Predictor::Horizontal -> horizontal prediction and endianness, inside unpredict_hdiff
    • Predictor::Float -> floating point prediction and endianness, based on this comment
  • Because endianness is needed info, it is included in the IFD and its from_tags function.
  • Added a PredictorInfo struct, inspired by tiff2.
    • provided functions are expected to be used by the user in their own implementation:
      • included PlanarConfiguration, even though no decoding actually supports Planar, only Chunky
      • included SampleFormat, even though we don't test for them in predictors
      • made functions work on Planar, except if bits_per_sample is non-homogeneous
    • has all needed info to determine:
      • byte width of a chunk row
      • number of samples and their size(s)
  • Made the abstraction that a stripped tiff = tiled tiff with chunk_width=image_width
    • In image-tiff and tiff2, strips and tiles are kept separate, where the end result is that the same calculations are done through different functions with different implementations.
    • I wanted to make this abstraction earlier in tiff2, but realized it too late
    • Can be easily undone by erroring when trying to create a PredictorInfo on non-tiled tiff
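
As a rough illustration of the horizontal predictor mentioned above, here is a minimal sketch for the simplest 8-bit case (names and signature are assumptions, not this PR's API; the real unpredict_hdiff also handles endianness and wider sample types):

```rust
/// Sketch: each sample was stored as the difference from the sample
/// `samples_per_pixel` positions earlier, so decoding is a running wrapping sum.
fn unpredict_hdiff_u8(row: &mut [u8], samples_per_pixel: usize) {
    for i in samples_per_pixel..row.len() {
        row[i] = row[i].wrapping_add(row[i - samples_per_pixel]);
    }
}

fn main() {
    // a grayscale row stored as [first, +1, +1, +1]
    let mut row = [10u8, 1, 1, 1];
    unpredict_hdiff_u8(&mut row, 1);
    assert_eq!(row, [10, 11, 12, 13]);
}
```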

Some notes:

  • I did put in the possibility for multiple sample formats, which links our lifetime to the tile... I think it'd also be perfectly fine to accept only a single value, but then I think it'd also make sense to reflect that at IFD?
  • image-tiff does not support horizontal predictor for floating-point rasters, GDAL does. Since here the two layers are decoupled, I didn't put any checks, but just give the user bogus output if they give bogus input.
  • output being Bytes, rather than having a &mut [u8] function input. I would prefer &mut [u8], because then the data can be read directly into a user-provided buffer whose alignment is also ensured (e.g. by initializing a Vec<f32> buffer and then bytemucking it to &mut [u8])
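
The user-provided-buffer idea in the last note could look roughly like this (a sketch under assumed names; bytemuck::cast_slice_mut would do the same cast, here std's align_to_mut keeps it dependency-free):

```rust
/// View a user-owned Vec<f32> as &mut [u8] so decode output can land in it
/// directly; the Vec guarantees f32 alignment up front.
fn as_byte_view(buf: &mut [f32]) -> &mut [u8] {
    // narrowing f32 -> u8 can never be misaligned, so head/tail are empty
    let (head, body, tail) = unsafe { buf.align_to_mut::<u8>() };
    debug_assert!(head.is_empty() && tail.is_empty());
    body
}

fn main() {
    let mut pixels = vec![0f32; 2];
    as_byte_view(&mut pixels)[..4].copy_from_slice(&1.5f32.to_ne_bytes());
    assert_eq!(pixels, [1.5, 0.0]);
}
```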

@feefladder feefladder force-pushed the predictors branch 2 times, most recently from 19f36f8 to e0a80af Compare April 2, 2025 14:17
src/predictor.rs Outdated
predictor_info: &PredictorInfo,
tile_x: u32,
tile_y: u32,
) -> AsyncTiffResult<Bytes>; // having this Bytes will give alignment issues later on
Member

Why? Bytes is pretty much just Arc<Vec<u8>>

Contributor Author

So if:

  • the tiff is something with alignment > 1, e.g. f32
  • the global allocator gives out a misaligned Vec (which it doesn't often do)

then the user has two options:

  1. copy over the data into a Vec using f32::from_ne_bytes()
  2. use bytemuck and hope for the best

I looked into this quite deeply, and afaik most "standard" allocators allocate with alignment 8; it would just save a copy in my mind?
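
The two options can be sketched like so (assumed helper names; option 2 is essentially the check bytemuck::try_cast_slice does for you):

```rust
/// Option 1: always-safe copy via from_ne_bytes.
fn bytes_to_f32_copy(buf: &[u8]) -> Vec<f32> {
    buf.chunks_exact(4)
        .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Option 2: zero-copy reinterpretation, which only succeeds when the
/// allocator happened to hand out a 4-byte-aligned buffer.
fn bytes_as_f32_zero_copy(buf: &[u8]) -> Option<&[f32]> {
    let (head, body, tail) = unsafe { buf.align_to::<f32>() };
    (head.is_empty() && tail.is_empty()).then_some(body)
}

fn main() {
    let bytes = 1.5f32.to_ne_bytes().repeat(3);
    assert_eq!(bytes_to_f32_copy(&bytes), [1.5, 1.5, 1.5]);
    // the zero-copy path may or may not apply, depending on alignment
    if let Some(view) = bytes_as_f32_zero_copy(&bytes) {
        assert_eq!(view, [1.5, 1.5, 1.5]);
    }
}
```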

Member

I think instead we should discuss: at what points we should be storing Vec<u8> and at what points structured array types.

Member

@kylebarron kylebarron Apr 2, 2025

I think to add to my previous comment: the question is where we convert out of Bytes. I think the core networking trait (AsyncFileReader) should remain as it is, where get_bytes returns bytes::Bytes. Most networking code, at least reqwest and object_store return buffers as Bytes.

There's no way to convert from a Bytes to a Vec<u8> zero-copy. (You can sometimes convert from a Bytes to BytesMut zero-copy). So that implies at some point we make a data copy from a Bytes into a Vec<T> in order to be safe. Or we could use similar code as from the Arrow project and build typed interfaces on top of an Arc<Bytes>, such as their Buffer type.

Contributor Author

@feefladder feefladder Apr 2, 2025

I think to add to my previous comment: the question is where we convert out of Bytes. I think the core networking trait (AsyncFileReader) should remain as it is, where get_bytes returns bytes::Bytes. Most networking code, at least reqwest and object_store return buffers as Bytes.

I'm all up for not changing AsyncFileReader. I would say after the decompression step is where we can have typed arrays, since decompression itself is not zero-copy (except for no compression). In contrast to reqwest and object_store, we actually know the underlying datatype.

Then for the endianness fixing and horizontal predictor, it is also nice to have exclusive access to the underlying buffer (&mut[u8]), since that is zero-copy (not the float predictor). Passing typed arrays over to the predictor doesn't make much sense there imho either, since the operations don't differentiate between e.g. f64 and u64 and it's already quite complex as-is.

so I thought something like:

// pseudocode
impl Tile {
    fn decode(&self, decoder: &Decoder) -> AsyncTiffResult<DecodingResult> {
        // tiff2's DecodingResult
        let mut res: DecodingResult = todo!(); // smart buffer sizing
        decoder.decode(&self.compressed_bytes, res.buf_mut()); // bytemuck casting
        match self.predictor_info.predictor {
            // mutates in-place
            Predictor::Horizontal => unpredict_horizontal(res.buf_mut(), &self.predictor_info, self.x),
            // also mutates in-place, but uses a copy
            Predictor::Float => unpredict_float(res.buf_mut(), &self.predictor_info, self.x, self.y),
        }
        Ok(res)
    }
}

But maybe put this in a separate issue/PR? I didn't put it in here, because this PR was already medium-large and "my ideal situation^" would change some non-related things.

There's no way to convert from a Bytes to a Vec<u8> zero-copy. (You can sometimes convert from a Bytes to BytesMut zero-copy). So that implies at some point we make a data copy from a Bytes into a Vec<T> in order to be safe. Or we could use similar code as from the Arrow project and build typed interfaces on top of an Arc<Bytes>, such as their Buffer type.

In the current (this PR) implementation, I already did a BytesMut::from() quite a few times, even though we were still in the same decoding step.

Sidenote: there is also some discussion going on over at image-tiff, where they want to just output a Bytes or &[u8], but then also the alignment issue was raised and somewhat ignored.

but... separate issue/PR? I think this discussion is more about optimization/api than predictors? (even though they overlap quite a bit, could also be here?) #87

src/tile.rs Outdated
}

/// The number of chunks in the horizontal (x) direction
pub fn chunks_across(&self) -> u32 {


To add to my previous comment on removing the PredictorInfo struct: both chunks_across and chunks_down are useful outside of handling predictors. It would make sense to include these on the IFD struct (e.g. IFD.chunks_across).

Member

Even if we keep the PredictorInfo, as I think we probably should, we can put this functionality in a shared TileMath trait which both the IFD and PredictorInfo implement
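
A sketch of what that shared trait could look like (hypothetical names, not the crate's actual API; div_ceil needs Rust ≥ 1.73):

```rust
/// Hypothetical shared trait: anything that knows its image and chunk
/// dimensions gets the chunk-count math for free.
trait TileMath {
    fn image_width(&self) -> u32;
    fn image_height(&self) -> u32;
    fn chunk_width(&self) -> u32;
    fn chunk_height(&self) -> u32;

    /// The number of chunks in the horizontal (x) direction
    fn chunks_across(&self) -> u32 {
        self.image_width().div_ceil(self.chunk_width())
    }
    /// The number of chunks in the vertical (y) direction
    fn chunks_down(&self) -> u32 {
        self.image_height().div_ceil(self.chunk_height())
    }
}

// stand-in for IFD (or PredictorInfo), just to exercise the trait
struct Dims { w: u32, h: u32, cw: u32, ch: u32 }
impl TileMath for Dims {
    fn image_width(&self) -> u32 { self.w }
    fn image_height(&self) -> u32 { self.h }
    fn chunk_width(&self) -> u32 { self.cw }
    fn chunk_height(&self) -> u32 { self.ch }
}

fn main() {
    // 512x300 image in 256x256 tiles -> 2 tiles across, 2 down
    let ifd = Dims { w: 512, h: 300, cw: 256, ch: 256 };
    assert_eq!(ifd.chunks_across(), 2);
    assert_eq!(ifd.chunks_down(), 2);
}
```

Note the default methods also cover the stripped case, since a strip is treated as a tile with chunk_width = image_width.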

Contributor Author

@feefladder feefladder Apr 3, 2025

Also, having PredictorInfo separate made tests easier than having to build a full IFD with partially unused required data.

One other option still is to add all PredictorInfo attributes to Tile and then pass a &Tile into the unpredict_... functions, for a bit more of a flat struct layout.

Then implement the TileMath trait for Tile and IFD, which makes a lot of sense api-wise

@kylebarron
Member

I made a PR with suggested changes into this branch. See feefladder#1

@feefladder
Contributor Author

feefladder commented Apr 2, 2025

temporary sadness: "Ah noo... I just finished incorporating feedback :'("

src/predictor.rs Outdated
) -> AsyncTiffResult<Bytes> {
let output_row_stride = predictor_info.output_row_stride(tile_x)?;
let mut res: BytesMut =
BytesMut::zeroed(output_row_stride * predictor_info.output_rows(tile_y)?);
Contributor Author

Here I now used output_rows, which is larger if we have the Planar planar config. I thought then at least we can give the user the data and they can split it themselves?

Contributor Author

So now the Planar output will be:

[
Red...,
Green.,
Blue..,
]

where each band is chunk_width_pixels()*chunk_height_pixels()
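
Splitting that Planar output yourself could be as simple as this (a sketch with assumed names, relying on the equally sized bands laid out above):

```rust
/// Split a Planar chunk into per-band slices; each band is
/// chunk_width_pixels * chunk_height_pixels samples long.
fn split_planar(buf: &[u8], bands: usize) -> Vec<&[u8]> {
    buf.chunks_exact(buf.len() / bands).collect()
}

fn main() {
    // 3 bands of 4 bytes each
    let buf: Vec<u8> = (0..12).collect();
    let bands = split_planar(&buf, 3);
    assert_eq!(bands, [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]);
}
```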

Contributor Author

I thought this made sense, because the decoder (decompression) step decodes the entire data, which is structured like this.

Comment on lines +107 to +125
/// the number of rows the output has, taking padding and PlanarConfiguration into account.
fn output_rows(&self, y: u32) -> AsyncTiffResult<usize> {
    match self.planar_configuration {
        PlanarConfiguration::Chunky => Ok(self.chunk_height_pixels(y)? as usize),
        PlanarConfiguration::Planar => Ok((self.chunk_height_pixels(y)? as usize)
            .saturating_mul(self.samples_per_pixel as _)),
    }
}

fn bits_per_pixel(&self) -> usize {
    match self.planar_configuration {
        PlanarConfiguration::Chunky => {
            self.bits_per_sample as usize * self.samples_per_pixel as usize
        }
        PlanarConfiguration::Planar => self.bits_per_sample as usize,
    }
}
Contributor Author

these depend on planar_config

@feefladder
Contributor Author

Thanks @kylebarron for the changes!

Comment on lines +209 to +216
#[cfg(target_endian = "little")]
if let Endianness::LittleEndian = byte_order {
    return buffer;
}
#[cfg(target_endian = "big")]
if let Endianness::BigEndian = byte_order {
    return buffer;
}
Contributor Author

@feefladder feefladder Apr 3, 2025

I thought splitting the cfg -> no-op out here was a bit clearer; before, the cfgs were mixed inside the match statement.
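
For reference, the non-no-op path boils down to an in-place byte swap per sample width; a rough sketch (assumed name and signature, not the crate's actual fix_endianness):

```rust
/// Swap each sample's bytes in place when the file and host endianness differ.
fn swap_sample_bytes(buf: &mut [u8], bits_per_sample: u16) {
    match bits_per_sample {
        16 => buf.chunks_exact_mut(2).for_each(|c| c.reverse()),
        32 => buf.chunks_exact_mut(4).for_each(|c| c.reverse()),
        64 => buf.chunks_exact_mut(8).for_each(|c| c.reverse()),
        _ => {} // 8-bit samples need no swapping
    }
}

fn main() {
    let mut buf = [0x12u8, 0x34, 0xAB, 0xCD];
    swap_sample_bytes(&mut buf, 16);
    assert_eq!(buf, [0x34, 0x12, 0xCD, 0xAB]);
}
```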

Comment on lines +81 to +91
Predictor::None => Ok(fix_endianness(
    decoded_tile,
    self.predictor_info.endianness(),
    self.predictor_info.bits_per_sample(),
)),
Predictor::Horizontal => {
    unpredict_hdiff(decoded_tile, &self.predictor_info, self.x as _)
}
Predictor::FloatingPoint => {
    unpredict_float(decoded_tile, &self.predictor_info, self.x as _, self.y as _)
}
Contributor Author

I removed the trait and only have crate-public functions now

Contributor Author

@feefladder feefladder Apr 3, 2025

This should also help with fewer copies as mocked up in #87, since the float predictor doesn't modify in place, and a shared trait doesn't allow that differentiation.

@feefladder
Contributor Author

feefladder commented Apr 3, 2025

resolved: On another note: since there is now this abstraction of "chunks" for tiles and strips, the name Tile doesn't make much sense anymore for stripped tiffs, but I think that's fine and is merely a documentation issue.

Member

@weiji14 weiji14 left a comment

Code looks reasonable to me, only some minor suggestions around the docs, and one question I had around the reverse floating point prediction implementation.

src/predictor.rs Outdated
input[i] = input[i].wrapping_add(input[i - samples]);
}
// reverse byte shuffle and fix endianness
for (i, chunk) in output.chunks_mut(4).enumerate() {
Member

Would chunks_exact_mut be ok to use here? Is there a chance that chunks may not have exactly 4 bytes at the end?

Contributor Author

Should be ok, since the chunking is based on f32 being 4 bytes; same for f64 with 8-byte chunks.
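
To make the chunk math concrete, here is a sketch of reverse f32 prediction (assumed signature, mirroring the snippet under discussion; chunks_exact_mut would slot in where chunks_mut is, since the output length is a multiple of 4 by construction):

```rust
/// Sketch: undo the byte-wise horizontal differencing, then undo the shuffle
/// that grouped all 1st bytes, all 2nd bytes, ... of the row's floats
/// together. Output bytes end up big-endian, matching the spec's byte order.
fn rev_predict_f32(input: &mut [u8], output: &mut [u8], samples: usize) {
    // undo horizontal differencing on raw bytes
    for i in samples..input.len() {
        input[i] = input[i].wrapping_add(input[i - samples]);
    }
    // un-shuffle: byte planes -> interleaved big-endian floats
    let len = output.len() / 4;
    for (i, chunk) in output.chunks_exact_mut(4).enumerate() {
        for (b, byte) in chunk.iter_mut().enumerate() {
            *byte = input[b * len + i];
        }
    }
}

fn main() {
    // shuffled + differenced encoding of [1.0f32, 2.0f32]
    let mut input = [0x3Fu8, 0x01, 0x40, 0x80, 0x00, 0x00, 0x00, 0x00];
    let mut output = [0u8; 8];
    rev_predict_f32(&mut input, &mut output, 1);
    let f0 = f32::from_be_bytes(output[..4].try_into().unwrap());
    let f1 = f32::from_be_bytes(output[4..].try_into().unwrap());
    assert_eq!((f0, f1), (1.0, 2.0));
}
```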

Member

@weiji14 weiji14 left a comment

Looks good to me, I'll merge in ~24 hours if there are no objections. Thanks again @feefladder!

@feefladder
Contributor Author

maybe ping @kylebarron who is/was main dev of this repo?

@weiji14 weiji14 merged commit 6fe3da9 into developmentseed:main Jun 12, 2025
6 checks passed
@weiji14
Member

weiji14 commented Jun 12, 2025

Oops, saw your message too late. Kyle's said he hasn't got much time for async-tiff lately, so he asked me to review. Feel free to open more patches as needed and I'll try my best to review them 😄

@feefladder
Contributor Author

ah ok great, I have one lying around for #87, let's see if that goes...

There was also a thingy around #82, where he put a lot of work in and I didn't appreciate it (I was more like "why don't you merge my changes?"). Turns out the direction he was going was actually better (allowing for stateless reading and diversifying tag reading in downstream code), but #82 doesn't let me do that, since some functions/structs I need are private. Now, since there were emotions involved, and I think GitHub doesn't cater well to those types of conversations, it all stalled a bit (I think also #83). I think it's all very solvable, but a bit sensitive? <- do you know a good way forward there?

but first lemme PR for #87

@weiji14
Member

weiji14 commented Jun 12, 2025

I'm not well versed enough in Rust to know whether these low-level implementation details matter, and we're all probably desiring slightly different things out of this library for very different use-cases (e.g. web-tiling, game-engines, gpu-shaders, etc). I say we can try and work on some low hanging fruit, and work on some core abstractions (e.g. I really want #100 to be solved), and anything super specific could live in our own personal repos like https://github.com/feefladder/tiff2, https://github.com/weiji14/cog3pio, etc. How does that sound?

@feefladder
Contributor Author

Sounds good! tiff2 also implements all the low-level stuff asynchronously, and I think it's better to do that in only one place to prevent fragmentation of the ecosystem. Because I spent most of my time on tiff2, where the biggest chunk was low-level stuff, I don't really know that much about what would be needed for parsing geo tags or making a registry for that (what is needed for most tag parsing?). I could chip in on #100 a bit with some thoughts.

I already made a start at something called async-tiff-wasm (maybe async-cog-wasm is a better name) that would do the more specific stuff for reading cogs into wasm games etc. I think I'll open an issue about making #82 work for me when the need arises, for now I need to finish a report about these things.

64 => rev_predict_f64(in_buf, out_buf, predictor_info.samples_per_pixel as _),
_ => {
return Err(AsyncTiffError::General(format!(
"thou shalt not predict f{bit_depth:?}"
Member

In my opinion this is unprofessional language for a serious repository
