Found while investigating #9058: in that issue, getpage requests were observed visiting layers older than recently written image layers.
It seems that, under some circumstances, a getpage request at exactly the LSN of an existing image layer can fail to hit that image layer. It is not clear whether reading at exactly that LSN matters; it may simply be that reads do not hit image layers until the current in-memory layer is closed.
There is a lot of uncertainty here; I am not claiming to have conclusively diagnosed this.
In the experimental test branch (`jcsp/layer-map-search-at-image-lsn-2`), some log lines are hacked in to record, at INFO level, which layers are visited. In the test, a checkpoint line is commented out:
```python
# Uncomment this checkpoint, and the logs will show getpage requests hitting the image layers we
# just created. However, without the checkpoint, getpage requests will hit one InMemoryLayer and
# one persistent delta layer.
# env.pageserver.http_client().timeline_checkpoint(tenant_id, timeline_id, wait_until_uploaded=True)
```
The presence or absence of in-memory layers shouldn't affect whether reads hit an image layer, but apparently it does.
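To make the symptom concrete, here is a toy model in Rust. Everything in it is hypothetical: the `Kind`/`Layer` types, the `newest`/`search`/`visited_layers` helpers, the LSNs and the layer names are illustrative simplifications, not Neon's real layer map API. It assumes the read path suspected later in this issue: the open in-memory layer is visited first, `cont_lsn` then drops to that layer's start, and historic layers are only searched below it.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    InMemory,
    Delta,
    Image,
}

// Hypothetical, simplified layer for a single key. LSN ranges are end-exclusive;
// an image at LSN x is modelled as x..x+1.
#[derive(Debug)]
struct Layer {
    name: &'static str,
    kind: Kind,
    lsn_range: Range<u64>,
}

fn layer(name: &'static str, kind: Kind, lsn_range: Range<u64>) -> Layer {
    Layer { name, kind, lsn_range }
}

// Newest historic layer of `kind` that starts below `cont_lsn`.
fn newest(layers: &[Layer], kind: Kind, cont_lsn: u64) -> Option<&Layer> {
    layers
        .iter()
        .filter(|l| l.kind == kind && l.lsn_range.start < cont_lsn)
        .max_by_key(|l| l.lsn_range.start)
}

// Toy historic search: returns the next layer to read and where the read continues.
fn search(layers: &[Layer], cont_lsn: u64) -> Option<(&Layer, u64)> {
    match (newest(layers, Kind::Image, cont_lsn), newest(layers, Kind::Delta, cont_lsn)) {
        // A delta holding records newer than the image is read first, down to the image.
        (Some(img), Some(d)) if d.lsn_range.end.min(cont_lsn) > img.lsn_range.end => {
            Some((d, img.lsn_range.end))
        }
        (Some(img), _) => Some((img, img.lsn_range.start)),
        (None, Some(d)) => Some((d, d.lsn_range.start)),
        (None, None) => None,
    }
}

// Suspected read path: in-memory layer first, then jump cont_lsn to its start.
fn visited_layers(layers: &[Layer], read_lsn: u64) -> Vec<&'static str> {
    let mut visited = Vec::new();
    let mut cont_lsn = read_lsn + 1; // exclusive upper bound of the read
    if let Some(l) = layers
        .iter()
        .find(|l| l.kind == Kind::InMemory && l.lsn_range.start < cont_lsn)
    {
        visited.push(l.name);
        cont_lsn = l.lsn_range.start; // historic layers are only searched below this
    }
    while let Some((l, next_lsn)) = search(layers, cont_lsn) {
        visited.push(l.name);
        if l.kind == Kind::Image {
            break; // an image layer fully reconstructs the page
        }
        cont_lsn = next_lsn;
    }
    visited
}

fn main() {
    // Image layer at LSN 150 while the open in-memory layer starts at LSN 100.
    let open = vec![
        layer("delta_0_100", Kind::Delta, 0..100),
        layer("image_150", Kind::Image, 150..151),
        layer("inmem_100", Kind::InMemory, 100..u64::MAX),
    ];
    // The same timeline after a checkpoint has frozen and flushed the in-memory layer.
    let flushed = vec![
        layer("delta_0_100", Kind::Delta, 0..100),
        layer("image_150", Kind::Image, 150..151),
        layer("delta_100_200", Kind::Delta, 100..200),
    ];
    println!("open:    {:?}", visited_layers(&open, 150));
    println!("flushed: {:?}", visited_layers(&flushed, 150));
}
```

In this model, a read at LSN 150 with the in-memory layer open visits `inmem_100` and then `delta_0_100`, never `image_150`; after the simulated checkpoint the same read visits only `image_150`, which mirrors the log observations described above.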
This test reads at exactly the LSN of the image layer, but I can also reproduce the issue with some writes between generating the image layer and doing the read, so this does not only occur when reading at exactly the image layer's LSN. I suspect our reads skip the image layer until the next time we freeze the ephemeral layer.
Perhaps this logic in `get_vectored_reconstruct_data_timeline` is at fault:
```rust
match in_memory_layer {
    Some(l) => {
        let lsn_range = l.get_lsn_range().start..cont_lsn;
        fringe.update(
            ReadableLayer::InMemoryLayer(l),
            unmapped_keyspace.clone(),
            lsn_range,
        );
    }
    // ... (remaining arms elided)
}
```
...because `lsn_range` is constructed from the absolute start of the layer. Our `cont_lsn` therefore jumps back to the start of the oldest in-memory layer before we look at any historic layers at all, so a historic image layer whose LSN falls inside that range would never be considered.
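A narrower sketch of the quoted range computation, contrasting the current `start..cont_lsn` with one possible alternative that stops the in-memory read at an image layer lying inside it. The function names and the `image_lsn` parameter are hypothetical; this is an illustration of the reasoning above, not a proposed patch and not necessarily how upstream addresses it.

```rust
use std::ops::Range;

// Range currently handed to the in-memory layer, as in the quoted snippet:
// from the layer's absolute start up to `cont_lsn` (exclusive).
fn in_memory_range_current(layer_start: u64, cont_lsn: u64) -> Range<u64> {
    layer_start..cont_lsn
}

// Hypothetical alternative: if the newest historic image layer at or below the
// read (`image_lsn`) lies inside the in-memory layer, only read the in-memory
// layer above the image, so the read continues at `image_lsn + 1`.
fn in_memory_range_image_bounded(
    layer_start: u64,
    cont_lsn: u64,
    image_lsn: Option<u64>,
) -> Range<u64> {
    let floor = match image_lsn {
        Some(lsn) if lsn >= layer_start => lsn + 1,
        _ => layer_start,
    };
    // Clamp so the range is never inverted; an empty range means nothing to read.
    floor.min(cont_lsn)..cont_lsn
}

fn main() {
    // Read at LSN 150 (cont_lsn = 151); in-memory layer starts at 100; image at 150.
    println!("{:?}", in_memory_range_current(100, 151)); // 100..151
    println!("{:?}", in_memory_range_image_bounded(100, 151, Some(150))); // 151..151
    println!("{:?}", in_memory_range_image_bounded(100, 161, Some(150))); // 151..161
}
```

With the bounded floor, a read at exactly the image LSN reads nothing from the in-memory layer and continues at `image_lsn + 1`, where a historic search would find the image layer instead of jumping below the in-memory layer's start.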
Branch with experimental test:
https://github.com/neondatabase/neon/tree/jcsp/layer-map-search-at-image-lsn-2