
Enable Foyer disk based cache for Parquet warm reads #1

Open
vishwasgarg18 wants to merge 1 commit into datafusion from
parquet/foyer-integeration

Conversation

@vishwasgarg18
Collaborator

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@@ -0,0 +1,32 @@
/*
Collaborator Author

Not the right place; placed here to avoid a compilation issue. Update the location after open-sourcing Tiered Storage.

Collaborator Author

@vishwasgarg18 left a comment

Foyer cache integration.

@vishwasgarg18 changed the title from "Enable Foyer disk based cache for Parquet reads" to "Enable Foyer disk based cache for Parquet warm reads" on Mar 31, 2026
use vectorized_exec_spi::{log_info, log_error, log_debug};

// Default page cache budgets — overridden by Java settings via createCache()
const DEFAULT_PAGE_CACHE_MEMORY_BYTES: usize = 256 * 1024 * 1024; // 256 MB L1 memory
Collaborator Author

Discuss and update.

Foyer provides an out-of-the-box memory cache. This can be set to -1 to make the cache disk-only.
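The "-1 for disk-only" idea floated in this comment could be handled when mapping the Java-side setting to Foyer's memory capacity. A minimal sketch, assuming a sentinel convention that is not part of this PR (the function name and the "0 means unset" rule are illustrative assumptions):

```rust
/// Map the Java-side memory setting to a capacity in bytes.
/// Assumed convention: a negative value (e.g. -1) means "disk-only",
/// so the L1 memory tier is sized to zero; zero means "unset" and
/// falls back to the built-in default.
fn memory_capacity_bytes(java_setting: i64, default_bytes: usize) -> usize {
    if java_setting < 0 {
        0 // disk-only: disable the L1 memory tier entirely
    } else if java_setting == 0 {
        default_bytes // unset: use the compiled-in default
    } else {
        java_setting as usize
    }
}

fn main() {
    const DEFAULT: usize = 256 * 1024 * 1024; // mirrors DEFAULT_PAGE_CACHE_MEMORY_BYTES
    assert_eq!(memory_capacity_bytes(-1, DEFAULT), 0);
    assert_eq!(memory_capacity_bytes(0, DEFAULT), DEFAULT);
    assert_eq!(memory_capacity_bytes(1024, DEFAULT), 1024);
}
```

The result would then be passed to the builder's `.memory(...)` call in place of the hard-coded default.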

let inner: HybridCache<String, CachedBytes> = rt.block_on(async move {
HybridCacheBuilder::new()
.with_name("foyer-parquet-page-cache")
.memory(memory_capacity_bytes)
Owner

We only need the disk cache; let's not use memory.


/// An [`ObjectStore`] wrapper that caches `get_range` / `get_ranges` results
/// in the Foyer hybrid (memory + disk) page cache.
pub struct CachingObjectStore {
Owner

This should be part of the tiered object store, correct? The cache information is passed there, and then the cache is used.
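The wrapper pattern under discussion — consult the page cache before delegating a ranged read, and populate it on a miss — can be sketched without the `object_store` or `foyer` crates. The trait and type names below are illustrative stand-ins, not the PR's actual API:

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::ops::Range;

/// Minimal stand-in for an object store's ranged-read API.
trait RangeReader {
    fn get_range(&self, path: &str, range: Range<usize>) -> Vec<u8>;
}

/// Wrapper that serves `get_range` from a page cache when possible,
/// falling back to (and populating from) the inner store on a miss.
struct CachingReader<R: RangeReader> {
    inner: R,
    cache: RefCell<HashMap<String, Vec<u8>>>,
}

impl<R: RangeReader> CachingReader<R> {
    fn new(inner: R) -> Self {
        Self { inner, cache: RefCell::new(HashMap::new()) }
    }

    fn get_range(&self, path: &str, range: Range<usize>) -> Vec<u8> {
        // Key mirrors the "path:start-end" scheme described in the PR.
        let key = format!("{path}:{}-{}", range.start, range.end);
        if let Some(bytes) = self.cache.borrow().get(&key) {
            return bytes.clone(); // warm read: served from cache
        }
        let bytes = self.inner.get_range(path, range);
        self.cache.borrow_mut().insert(key, bytes.clone());
        bytes
    }
}

/// Backing store that counts how often it is actually hit.
struct CountingStore(Cell<usize>);

impl RangeReader for CountingStore {
    fn get_range(&self, _path: &str, range: Range<usize>) -> Vec<u8> {
        self.0.set(self.0.get() + 1);
        vec![0u8; range.len()]
    }
}

fn main() {
    let store = CachingReader::new(CountingStore(Cell::new(0)));
    store.get_range("idx/_parquet_0.parquet", 4096..8192);
    store.get_range("idx/_parquet_0.parquet", 4096..8192);
    // The second, identical read never reached the backing store.
    assert_eq!(store.inner.0.get(), 1);
}
```

Folding this behavior into the tiered object store, as the reviewer suggests, would mean the wrapper lives there and receives the cache handle rather than constructing it.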

// Default page cache budgets — overridden by Java settings via createCache()
const DEFAULT_PAGE_CACHE_MEMORY_BYTES: usize = 256 * 1024 * 1024; // 256 MB L1 memory
const DEFAULT_PAGE_CACHE_DISK_BYTES: usize = 10 * 1024 * 1024 * 1024; // 10 GB L2 disk
const DEFAULT_PAGE_CACHE_DIR: &str = "/tmp/foyer-page-cache";
Owner

All of this should come from settings; nothing should be hard-coded as a default.

* combined with the byte range, e.g. {@code "data/nodes/0/.../parquet/_parquet_0.parquet:4096-8192"}.
* The exact key format is an implementation detail of the provider.
*/
public interface PageCacheProvider {
Owner

Why do we need a page cache provider? We have a cache strategy provider, correct? For Parquet we can have a pass-through on the Java side and Foyer in Rust, correct?
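The key scheme documented above the interface — the file path combined with a byte range — amounts to a one-line helper. A sketch with a hypothetical helper name and a made-up path (the PR calls the exact format an implementation detail of the provider):

```rust
/// Build a page-cache key from a file path and a byte range,
/// following the "<path>:<start>-<end>" convention documented on
/// PageCacheProvider. The helper name here is illustrative only.
fn page_cache_key(path: &str, start: u64, end: u64) -> String {
    format!("{path}:{start}-{end}")
}

fn main() {
    assert_eq!(
        page_cache_key("idx/_parquet_0.parquet", 4096, 8192),
        "idx/_parquet_0.parquet:4096-8192"
    );
}
```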

/** Whether this input has been closed */
private boolean closed = false;

public CachedParquetIndexInput(
Owner

DataFusion is taking care of Parquet file reads; why is this required?

/// Parse the eviction_type string for PAGES cache type.
/// Expected format: "<disk_capacity_bytes>|<disk_dir>"
/// Falls back to defaults if the string is malformed (e.g. plain "LRU" from old Java code).
fn parse_page_cache_params(eviction_str: &str) -> (usize, String) {
Owner

Let's not make changes to LiquidCache.

* by this plugin, without any classloader visibility issues.
*/
public class DataFusionPlugin extends Plugin implements ActionPlugin, SearchEnginePlugin, AnalyticsBackEndPlugin, ExtensiblePlugin, SearchAnalyticsBackEndPlugin {
public class DataFusionPlugin extends Plugin implements ActionPlugin, SearchEnginePlugin, AnalyticsBackEndPlugin, ExtensiblePlugin, SearchAnalyticsBackEndPlugin, PageCacheProvider {
Owner

Let's not add more implementation to the main plugin. We need to wrap things around the plugin instead.

// Called by CachedParquetCacheStrategy in the tiered-storage module.

@Override
public byte[] getPageRange(String path, int start, int end) {
Owner

The DataFusion plugin should not be aware of caches.

/// which causes `JoinError::Cancelled` panics in `foyer-storage`. We therefore keep
/// the runtime as an `Arc` field so it is dropped only after the `HybridCache` itself.
#[derive(Clone)]
pub struct FoyerDiskPageCache {
Owner

This should be part of the tiered object store.
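The drop-order guarantee the `FoyerDiskPageCache` doc comment relies on — Rust drops struct fields in declaration order, so a runtime field declared after the cache field outlives the cache — can be demonstrated with plain std types. The PR keeps the runtime behind an `Arc`; these stand-in types show only the ordering property:

```rust
use std::cell::RefCell;

thread_local! {
    // Records the order in which our stand-in types are dropped.
    static DROP_LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

struct Cache;   // stand-in for foyer's HybridCache
struct Runtime; // stand-in for the tokio Runtime

impl Drop for Cache {
    fn drop(&mut self) {
        DROP_LOG.with(|l| l.borrow_mut().push("cache"));
    }
}
impl Drop for Runtime {
    fn drop(&mut self) {
        DROP_LOG.with(|l| l.borrow_mut().push("runtime"));
    }
}

/// Fields drop in declaration order, so listing the cache first
/// guarantees the runtime is still alive while the cache shuts
/// down — the property the PR's comment relies on to avoid
/// `JoinError::Cancelled` panics in foyer-storage.
#[allow(dead_code)]
struct FoyerDiskPageCache {
    cache: Cache,
    runtime: Runtime, // must be declared after `cache`
}

fn main() {
    drop(FoyerDiskPageCache { cache: Cache, runtime: Runtime });
    DROP_LOG.with(|l| assert_eq!(*l.borrow(), ["cache", "runtime"]));
}
```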


2 participants