Enable Foyer disk based cache for Parquet warm reads #1
vishwasgarg18 wants to merge 1 commit into datafusion from
Conversation
| @@ -0,0 +1,32 @@
| /*
Not the right place. Placed here to avoid a compilation issue.
Update the location after open sourcing tiered storage.
vishwasgarg18 left a comment

Foyer cache integration.
| use vectorized_exec_spi::{log_info, log_error, log_debug};
|
| // Default page cache budgets — overridden by Java settings via createCache()
| const DEFAULT_PAGE_CACHE_MEMORY_BYTES: usize = 256 * 1024 * 1024; // 256 MB L1 memory
Discuss and update.
Foyer provides an out-of-the-box memory cache. This can be updated to -1 to make it a disk-only cache.
| let inner: HybridCache<String, CachedBytes> = rt.block_on(async move {
|     HybridCacheBuilder::new()
|         .with_name("foyer-parquet-page-cache")
|         .memory(memory_capacity_bytes)
We only need the disk cache; let's not use memory.
|
| /// An [`ObjectStore`] wrapper that caches `get_range` / `get_ranges` results
| /// in the Foyer hybrid (memory + disk) page cache.
| pub struct CachingObjectStore {
This should be part of the tiered object store, correct? The cache information is passed there, and then the cache is used.
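A minimal sketch of the wrapper shape under discussion, using a simplified `RangeStore` trait in place of the real `object_store::ObjectStore` and a plain `HashMap` in place of the Foyer disk cache (all names here are hypothetical, not the PR's actual API):

```rust
use std::collections::HashMap;
use std::ops::Range;
use std::sync::Mutex;

// Hypothetical stand-in for the real object store trait.
trait RangeStore {
    fn get_range(&self, path: &str, range: Range<usize>) -> Vec<u8>;
}

// Wrapper that consults a page cache before hitting the inner store.
struct CachingStore<S: RangeStore> {
    inner: S,
    // Stand-in for the Foyer disk cache; keyed by "path:start-end".
    cache: Mutex<HashMap<String, Vec<u8>>>,
}

impl<S: RangeStore> CachingStore<S> {
    fn get_range(&self, path: &str, range: Range<usize>) -> Vec<u8> {
        let key = format!("{path}:{}-{}", range.start, range.end);
        if let Some(hit) = self.cache.lock().unwrap().get(&key) {
            return hit.clone(); // warm read served from cache
        }
        let bytes = self.inner.get_range(path, range); // cold read
        self.cache.lock().unwrap().insert(key, bytes.clone());
        bytes
    }
}
```

Placing this wrapper in the tiered object store keeps the cache wiring out of the execution layer, which is the point the comment above is making.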
| // Default page cache budgets — overridden by Java settings via createCache()
| const DEFAULT_PAGE_CACHE_MEMORY_BYTES: usize = 256 * 1024 * 1024; // 256 MB L1 memory
| const DEFAULT_PAGE_CACHE_DISK_BYTES: usize = 10 * 1024 * 1024 * 1024; // 10 GB L2 disk
| const DEFAULT_PAGE_CACHE_DIR: &str = "/tmp/foyer-page-cache";
All of these should be settings; nothing should be a hard-coded default.
| * combined with the byte range, e.g. {@code "data/nodes/0/.../parquet/_parquet_0.parquet:4096-8192"}.
| * The exact key format is an implementation detail of the provider.
| */
| public interface PageCacheProvider {
Why do we need a page cache provider? We have a cache strategy provider, correct? For Parquet we can have a pass-through on the Java side and Foyer in Rust, correct?
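For reference, the key format documented in the interface above ("path combined with the byte range") can be captured in a one-line helper; the function name here is a hypothetical illustration, not part of the PR:

```rust
// Hypothetical helper building the documented cache key "path:start-end".
fn page_cache_key(path: &str, start: u64, end: u64) -> String {
    format!("{path}:{start}-{end}")
}
```

Keeping key construction in one place (whichever provider ends up owning it) avoids Java and Rust drifting apart on the format.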
| /** Whether this input has been closed */
| private boolean closed = false;
|
| public CachedParquetIndexInput(
DataFusion is taking care of Parquet file reads, so why is this required?
| /// Parse the eviction_type string for the PAGES cache type.
| /// Expected format: "<disk_capacity_bytes>|<disk_dir>"
| /// Falls back to defaults if the string is malformed (e.g. plain "LRU" from old Java code).
| fn parse_page_cache_params(eviction_str: &str) -> (usize, String) {
Let's not make changes to liquid cache.
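If this parsing moves out of liquid cache, the behavior described in the doc comment above could be sketched roughly as follows (the default values are illustrative placeholders, not the PR's actual constants):

```rust
// Sketch: parse "<disk_capacity_bytes>|<disk_dir>", falling back to
// defaults when the string is malformed (e.g. a plain "LRU" eviction name).
fn parse_page_cache_params(eviction_str: &str) -> (usize, String) {
    // Placeholder defaults for illustration only.
    const DEFAULT_DISK_BYTES: usize = 10 * 1024 * 1024 * 1024; // 10 GB
    const DEFAULT_DIR: &str = "/tmp/foyer-page-cache";

    match eviction_str.split_once('|') {
        Some((bytes, dir)) if !dir.is_empty() => match bytes.parse::<usize>() {
            // Well-formed "<bytes>|<dir>" pair.
            Ok(capacity) => (capacity, dir.to_string()),
            // Non-numeric capacity: fall back to defaults.
            Err(_) => (DEFAULT_DISK_BYTES, DEFAULT_DIR.to_string()),
        },
        // No '|' separator (e.g. legacy "LRU"): fall back to defaults.
        _ => (DEFAULT_DISK_BYTES, DEFAULT_DIR.to_string()),
    }
}
```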
| * by this plugin, without any classloader visibility issues.
| */
| public class DataFusionPlugin extends Plugin implements ActionPlugin, SearchEnginePlugin, AnalyticsBackEndPlugin, ExtensiblePlugin, SearchAnalyticsBackEndPlugin {
| public class DataFusionPlugin extends Plugin implements ActionPlugin, SearchEnginePlugin, AnalyticsBackEndPlugin, ExtensiblePlugin, SearchAnalyticsBackEndPlugin, PageCacheProvider {
Let's not add more implementation to the main plugin. We need to wrap things around the plugin instead.
| // Called by CachedParquetCacheStrategy in the tiered-storage module.
|
| @Override
| public byte[] getPageRange(String path, int start, int end) {
The DataFusion plugin should not be aware of caches.
| /// which causes `JoinError::Cancelled` panics in `foyer-storage`. We therefore keep
| /// the runtime as an `Arc` field so it is dropped only after the `HybridCache` itself.
| #[derive(Clone)]
| pub struct FoyerDiskPageCache {
This should be part of the tiered object store.
Description
[Describe what this change achieves]
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.