Skip to content

New mapped_private mode: avoid crash by flushing dirty pages when memory pressure gets high #24

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 30 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
91cf743
Bump required `cmake` version as suggested in PR #21.
greg7mdp Oct 5, 2023
302274b
Cleanup compiler feature dependency as suggested in review of PR #21.
greg7mdp Oct 5, 2023
2cf9842
Implement `check_memory_and_flush_if_needed` function
greg7mdp Oct 5, 2023
cf8ff37
Update threshold and comment.
greg7mdp Oct 5, 2023
6663302
Return more data from `check_memory_and_flush_if_needed`
greg7mdp Oct 5, 2023
3460d0e
Always return `oom` score if available.
greg7mdp Oct 5, 2023
90c653b
Don't force immediate flush of all state dirty pages.
greg7mdp Oct 5, 2023
11f9090
Disable `save` messages when processing `oom` event.
greg7mdp Oct 5, 2023
6ee4f83
Recreate copy_on_write mapping after disk file is updated.
greg7mdp Oct 6, 2023
ba91bc2
Destroy old region before creating a new one.
greg7mdp Oct 6, 2023
664b069
Use `mmap` with `MAP_FIXED` to ensure we get the same address
greg7mdp Oct 6, 2023
febb50c
Don't use mmap/munmap when _WIN32 is defined.
greg7mdp Oct 6, 2023
c55e62e
Use `_cow_size` consistently.
greg7mdp Oct 6, 2023
923444c
Remove unnecessary `_cow_size` and rename a couple new variables.
greg7mdp Oct 7, 2023
b782713
Update return value of `check_memory_and_flush_if_needed`
greg7mdp Oct 10, 2023
34a5033
Check for empty `std::optional` before returning `oom_post`.
greg7mdp Oct 10, 2023
408af79
Add test exercising the `flush` of dirty pages when the `oom` score g…
greg7mdp Oct 11, 2023
e63909f
Skip `munmap` to avoid race condition between `munmap` and `mmap`.
greg7mdp Oct 11, 2023
1e56487
Fix mismatched sign comparison.
greg7mdp Oct 12, 2023
580a743
Small cleanup; add wait in test for accurate soft-dirty bit read.
greg7mdp Oct 12, 2023
1819359
Merge branch 'main' of github.com:AntelopeIO/chainbase into gh_1721
greg7mdp Oct 13, 2023
930611e
Update test to insert in the db right after doing `clear_refs`.
greg7mdp Oct 13, 2023
113f0be
Add db consistency check after flush in test.
greg7mdp Oct 15, 2023
f6343e7
Fix wrong delay in test, and remove unnecesary wait.
greg7mdp Oct 19, 2023
3e11aa8
Remove unnecessary `cerr` output in test.
greg7mdp Oct 19, 2023
1160b4f
Merge branch 'main' into gh_1721
greg7mdp Nov 3, 2023
b9968ed
Make oom test work on systems that don't support oom.
greg7mdp Nov 3, 2023
63b76dc
Fix issue with incorrect `database_size` when requesting a smaller `s…
greg7mdp Nov 6, 2023
734dc46
Check for `mmap` failing (could happen if RAM size < database_size).
greg7mdp Nov 7, 2023
7b9a4f0
Make whitespace change for `_database_size` fix same as in `[5.0]` PR.
greg7mdp Nov 8, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 7 additions & 6 deletions src/pinnable_mapped_file.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -272,7 +272,6 @@ std::optional<int> pinnable_mapped_file::get_oom_score() const {
// When the check is done, return a struct including the number of pages flushed to disk and pre/post oom scores
// -------------------------------------------------------------------------------------------------------------
std::optional<pinnable_mapped_file::memory_check_result> pinnable_mapped_file::check_memory_and_flush_if_needed() {
size_t written_pages {0};
if (_non_file_mapped_mapping || _sharable || !_writable)
return {};

Expand All @@ -284,8 +283,10 @@ std::optional<pinnable_mapped_file::memory_check_result> pinnable_mapped_file::c
check_time = current_time + _oom_delay;

auto oom_score = pagemap_accessor::read_oom_score();
std::optional<int> oom_post;
if (oom_score) {
size_t written_pages {0};
std::optional<int> oom_post;

if (*oom_score >= _oom_threshold) {
// linux returned a high out-of-memory (oom) score for the current process, indicating a high
// probablility that the process will be killed soon (The valid range is from 0 to 1000.
Expand Down Expand Up @@ -403,13 +404,13 @@ size_t pinnable_mapped_file::save_database_file(bool flush, bool closing_db) {
pagemap_accessor pagemap;
size_t written_pages {0};
auto [src, sz] = get_region_to_save();
bool mapped_writable_instance = std::find(_instance_tracker.begin(), _instance_tracker.end(), this) != _instance_tracker.end();
bool cow_instance = std::find(_instance_tracker.begin(), _instance_tracker.end(), this) != _instance_tracker.end();

while(offset != sz) {
size_t copy_size = std::min(_db_size_copy_increment, sz - offset);
if (!mapped_writable_instance ||
if (!cow_instance ||
!pagemap.update_file_from_region({ src + offset, copy_size }, _file_mapping, offset, flush, written_pages)) {
if (mapped_writable_instance)
if (cow_instance)
std::cerr << "CHAINBASE: ERROR: pagemap update of db file failed... using non-pagemap version" << '\n';
if(!all_zeros(src+offset, copy_size)) {
bip::mapped_region dst_rgn(_file_mapping, bip::read_write, offset, copy_size);
Expand All @@ -432,7 +433,7 @@ size_t pinnable_mapped_file::save_database_file(bool flush, bool closing_db) {
}
if (closing_db) {
std::cerr << "CHAINBASE: Writing \"" << _database_name << "\" database file, complete." << '\n';
} else if (mapped_writable_instance) {
} else if (cow_instance) {
#ifndef _WIN32
// we are saving while processing... recreate the copy_on_write mapping with clean pages.
// --------------------------------------------------------------------------------------
Expand Down
9 changes: 7 additions & 2 deletions test/test.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@

#include <iostream>
#include "temp_directory.hpp"
#include <thread>

using namespace chainbase;
using namespace boost::multi_index;
Expand Down Expand Up @@ -146,16 +147,20 @@ BOOST_AUTO_TEST_CASE( oom_flush_dirty_pages ) {
db.create<book>( [i]( book& b ) { b.a = (int)i; b.b = (int)(i+1); } );
if (i % 1000 == 0) {
if (auto res = db.check_memory_and_flush_if_needed()) {
// we need to wait some time after clearing the soft-dirty bits from the task's PTEs
// for the next read to be accurate (see https://www.kernel.org/doc/Documentation/vm/soft-dirty.txt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying clearing the bits is asynchronous? If that's the case the whole approach seems dangerous.

I was initially worried about that when doing the initial review, but when I looked through the kernel source it seemed like it was synchronous.

Or instead is the problem here in this test that the oom_score is only updated at some interval and we need to wait around for that update to occur? If that's the case, I don't think citing that soft-dirty.txt refers to the same issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is not updating the oom_score (in the test we don't care about the value being updated because we set an arbitrarily low threshold).
It seems that if you check for the soft-dirty bits on the pages right after clearing them, you don't get correct results.

The reason I linked this url is:

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that when we don't wait, we see more pages as dirty as we should, so I don't think it is dangerous, except for the fact that we'd flush more pages to disk than strictly needed. But I agree that it is unnerving.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's dangerous because if it's not synchronous you'll miss writes that occur on pages after writing to clear_refs but before the dirty bit is cleared asynchronously in the background.

After save_database_file() we should have completely fresh pages anyways though right? So why are we seeing seemingly spurious dirty bits in brand new pages? Is it possible the dirty bit is only valid based on some other criteria? (some other bit being set?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's dangerous because if it's not synchronous you'll miss writes that occur on pages after writing to clear_refs but before the dirty bit is cleared asynchronously in the background.

If it behaved like that the feature would be utterly useless.

Is it possible the dirty bit is only valid based on some other criteria? (some other bit being set?)

No I don't think so because all we need to do is wait a second or two and then we see the correct pages dirty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's dangerous because if it's not synchronous you'll miss writes that occur on pages after writing to clear_refs but before the dirty bit is cleared asynchronously in the background.

I have tested it and you don't miss writes.

If you write to a page right after clear_refs, yo see the dirty bit whether you check right after the clear_refs, or two seconds later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do the kernel unit tests not require waiting?

Copy link
Contributor Author

@greg7mdp greg7mdp Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, the test doesn't check exactly the same thing. The check has one page with the soft-dirty bit set and verifies that the bit is cleared immediately after clear_refs.

What I see in my test is that newly mapped pages, which were never written to, report being soft-dirty for a second or two after clear-refs, and then don't anymore. But if one of these pages is written to immediately after clear-refs, the soft-dirty bit will remain set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spoonincode you are correct, there was an issue in the test (now corrected) that made me think that clear_refs did not clear the soft-dirty bit immediately, but that was an error on my part. So my original comment was wrong, the clearing of the soft-dirty bits is synchronous.

std::this_thread::sleep_for (std::chrono::milliseconds(2000));

std::cerr << "oom score: " << res->oom_score_before << '\n';
if (res->num_pages_written > 0) {
std::cerr << "Flushed " << res->num_pages_written << " pages to disk\n";
if (++flush_count == 4)
if (++flush_count == 6)
break;
}
}
}
}
BOOST_REQUIRE( flush_count == 4 );
BOOST_REQUIRE( flush_count == 6 );
}

// BOOST_AUTO_TEST_SUITE_END()