feat: allow configuring clear-cache statically and injecting torn writes at runtime #7

Merged
merged 13 commits into from
Jun 18, 2024
22 changes: 22 additions & 0 deletions README.md
@@ -93,6 +93,14 @@ file="output1.txt"
occurrence=5
parts=3 #or parts_bytes=[4096,3600,1260]
persist=[1,3]

[[injection]]
type="clear"

@devzizu devzizu Jun 12, 2024

A couple of suggestions here before diving deep into a review:

  1. If we want to add a type="clear" fault, then we should keep the already known syntax, which is type=clear-cache, so that specifying crash=false (assumed to be the default) has the same effect as sending lazyfs::clear-cache to the faults FIFO.

    • With this new type, and the option to specify crash=true, we must remove lazyfs::crash from the API and replace it with lazyfs::clear-cache with more options, namely the timing, op, occurrence, and crash (e.g., lazyfs::clear-cache::timing=...::op=...::occurrence=...::crash), where crash is an optional parameter and may be omitted.
    • We must also ensure the user never pre-configures a clear-cache without a timing and an op, but they can still send lazyfs::clear-cache::crash to the FIFO to simulate "crash right now".
  2. Instead of the above, why can't we just add support for the existing lazyfs::crash fault under the config file? Do we need to specify a timing for the clear cache without crashing LazyFS?

WDYT?

TL;DR: We are going with option 1, but keeping the crash fault option for now; we will probably deprecate it soon. @mj-ramos Can you please change the fault syntax of the clear fault type to clear-cache?

from="f1.txt"
timing="before"
op="fsync"
occurrence=6
crash=true
```

I recommend following the `simple` cache configuration (indicating the cache size and using a similar configuration file as `default.toml`), since it's currently the most tested schema in our experiments. Additionally, for the section **[cache]**, you can specify the following:
@@ -105,6 +113,7 @@ I recommend following the `simple` cache configuration (indicating the cache siz

- **torn-seq**: This fault type is used when a sequence of system calls, targeting a single file, is executed consecutively without an intervening `fsync`. *In the example*, during the second group of consecutive writes (the group number is defined by the parameter `occurrence`), to the file "output.txt", the first and fourth writes will be persisted to disk (the writes to be persisted are defined by the parameter `persist`). After the fourth write (the last in the `persist` vector), LazyFS will crash itself.
- **torn-op**: This fault type involves dividing a write system call into smaller parts, with some of these parts being persisted while others are not. In the example, the fifth write issued (the number of the write is defined by the parameter `occurrence`) to the file "output1.txt" will be divided into three equal parts if the `parts` parameter is used, or into customizable-sized parts if the `parts_bytes` parameter is defined. In the commented code, there's an example of using `parts_bytes`, where the write will be torn into three parts: the first with 4096 bytes, the second with 3600 bytes, and the last with 1260 bytes. The `persist` vector determines which parts will be persisted. After the persistence of these parts, LazyFS will crash.
- **clear-cache**: Clears unsynced data at a certain point in the execution. In the example above, this fault will be injected after (`timing`) the sixth (`occurrence`) `fsync` (`op`) to the file "f1.txt" (`from`). The `op` parameter must be a system call, and if it involves two paths (such as `rename`), the `to` parameter should also be specified. The `crash` parameter determines whether LazyFS should crash after the fault injection.
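
The `torn-op` splitting described above can be sketched in a few lines of C++. This is only an illustration, not LazyFS's implementation: the names `Part`, `split_equal`, `split_bytes`, and `persisted_parts` are hypothetical, the `persist` indices are assumed to be 1-based (as the examples suggest), and the handling of a write size that is not divisible by `parts` is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte range [offset, offset + length) of one part of a torn write.
struct Part {
    std::size_t offset;
    std::size_t length;
};

// Split a write of `size` bytes into `parts` pieces of equal length.
// Assumption: when `size` is not divisible by `parts`, the last piece
// absorbs the remainder; LazyFS may handle this case differently.
std::vector<Part> split_equal(std::size_t size, std::size_t parts) {
    assert(parts > 0);
    std::vector<Part> out;
    std::size_t base = size / parts;
    for (std::size_t i = 0; i < parts; ++i) {
        std::size_t len = (i + 1 == parts) ? size - base * i : base;
        out.push_back({base * i, len});
    }
    return out;
}

// Split a write according to explicit part sizes (the `parts_bytes` form).
std::vector<Part> split_bytes(const std::vector<std::size_t>& sizes) {
    std::vector<Part> out;
    std::size_t offset = 0;
    for (std::size_t len : sizes) {
        out.push_back({offset, len});
        offset += len;
    }
    return out;
}

// Select the parts named in `persist` (assumed to be 1-based indices).
std::vector<Part> persisted_parts(const std::vector<Part>& parts,
                                  const std::vector<std::size_t>& persist) {
    std::vector<Part> out;
    for (std::size_t idx : persist) {
        assert(idx >= 1 && idx <= parts.size());
        out.push_back(parts[idx - 1]);
    }
    return out;
}
```

With the values from the example, `persisted_parts(split_bytes({4096, 3600, 1260}), {1, 3})` selects the byte ranges starting at offsets 0 and 7696, so the middle 3600 bytes are the ones that would be lost.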

Other parameters:

@@ -209,6 +218,19 @@ Finally, one can control LazyFS by echoing the following commands to the configu

> Kills LazyFS before executing a link operation to the file pattern 'fileabd'.

- **Kill the filesystem** after injecting `torn-op` or `torn-seq` faults:

The parameters are the same as the ones presented in the above configuration file. Parameters that have multiple values must be specified without the brackets (e.g., `persist=1,2`).

- ```bash
echo "lazyfs::torn-op::file=...::persist=...::parts=...::occurrence=..." > /my/path/faults.fifo
```

- ```bash
echo "lazyfs::torn-seq::op=...::file=...::persist=...::occurrence=..." > /my/path/faults.fifo
```
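
To make the command format concrete, here is a minimal C++ sketch of how a string such as the ones above could be broken into a fault type and its parameters. It is not LazyFS's parser: `FaultCommand`, `split`, `parse_command`, and `parse_int_list` are hypothetical names, and the sketch only accepts the `key=value` form.

```cpp
#include <map>
#include <string>
#include <vector>

// Split `s` on every occurrence of the delimiter `sep`.
std::vector<std::string> split(const std::string& s, const std::string& sep) {
    std::vector<std::string> out;
    std::size_t start = 0, pos;
    while ((pos = s.find(sep, start)) != std::string::npos) {
        out.push_back(s.substr(start, pos - start));
        start = pos + sep.size();
    }
    out.push_back(s.substr(start));
    return out;
}

struct FaultCommand {
    std::string type;                           // e.g. "torn-op"
    std::map<std::string, std::string> params;  // e.g. {"persist", "1,2"}
};

// Parse "lazyfs::<type>::k=v::k=v...". Returns false on malformed input.
bool parse_command(const std::string& line, FaultCommand& cmd) {
    std::vector<std::string> fields = split(line, "::");
    if (fields.size() < 2 || fields[0] != "lazyfs") return false;
    cmd.type = fields[1];
    cmd.params.clear();
    for (std::size_t i = 2; i < fields.size(); ++i) {
        std::size_t eq = fields[i].find('=');
        if (eq == std::string::npos) return false;  // only key=value is handled
        cmd.params[fields[i].substr(0, eq)] = fields[i].substr(eq + 1);
    }
    return true;
}

// Turn a comma-separated value such as "1,2" into {1, 2}.
std::vector<int> parse_int_list(const std::string& value) {
    std::vector<int> out;
    for (const std::string& item : split(value, ","))
        out.push_back(std::stoi(item));
    return out;
}
```

Because multi-valued parameters arrive without brackets, a value like `persist=1,2` can be split on commas directly, which is what `parse_int_list` does.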


LazyFS expects every buffer written to the FIFO file to end with a newline character (**echo** adds one by default). Thus, if using `pwrite`, for example, make sure the buffer ends with `\n`.
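
For programs that talk to the FIFO directly rather than through **echo**, the newline requirement can be handled in a small helper. The following is a sketch under assumptions: `send_fault_command` is a hypothetical name, it uses plain POSIX `open`/`write` rather than `pwrite`, and the FIFO path is whatever the caller configured.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>
#include <string>

// Write one command to the faults FIFO at `fifo_path`, appending the
// trailing '\n' if the caller left it out. Returns true on success.
bool send_fault_command(const std::string& fifo_path, std::string command) {
    if (command.empty() || command.back() != '\n') command.push_back('\n');

    // Note: opening a FIFO write-only blocks until a reader has it open.
    int fd = open(fifo_path.c_str(), O_WRONLY);
    if (fd < 0) return false;

    const char* data = command.data();
    std::size_t left = command.size();
    while (left > 0) {
        ssize_t n = write(fd, data, left);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, retry the write
            close(fd);
            return false;
        }
        data += n;
        left -= static_cast<std::size_t>(n);
    }
    return close(fd) == 0;
}
```

A call such as `send_fault_command("/my/path/faults.fifo", "lazyfs::clear-cache")` would then have the same effect as the equivalent `echo` command.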

## Contact
2 changes: 1 addition & 1 deletion lazyfs/config/default.toml
@@ -12,4 +12,4 @@ blocks_per_page=1
# no_pages=10
[filesystem]
log_all_operations=false
logfile=""
logfile=""
125 changes: 79 additions & 46 deletions lazyfs/include/lazyfs/lazyfs.hpp
@@ -102,11 +102,13 @@ class LazyFS : public Fusepp::Fuse<LazyFS> {
/**
* @brief Faults programmed in the configuration file.
*/
unordered_map<string,vector<cache::config::Fault*>>* faults;
unordered_map<string,vector<faults::Fault*>>* faults;


/**
* @brief Faults of LazyFS crash injected during runtime.
*
* obsolete!
*/
std::unordered_map<string, unordered_set<string>> crash_faults;

@@ -121,16 +123,47 @@ class LazyFS : public Fusepp::Fuse<LazyFS> {
std::mutex write_lock;

/**
* @brief Path of the current fault being injected.
* @brief Current faults being injected.
*/
string path_injecting_fault;
vector<string> injecting_fault;

/**
* @brief Lock for path of current injected fault.
* @brief Lock for current injected faults.
*/
std::mutex path_injecting_fault_lock;
std::mutex injecting_fault_lock;

public:

/**
* @brief Map of faults associated with each filesystem operation
*
*/
// operation -> [((from_rgx, to_rgx), ...]
std::unordered_map<string, vector<pair<std::regex, string>>> crash_faults_before_map;
std::unordered_map<string, vector<pair<std::regex, string>>> crash_faults_after_map;

/**
* @brief Map of allowed operations to have a crash fault
*
*/
std::unordered_set<string> allow_crash_fs_operations = {"unlink",
"truncate",
"fsync",
"write",
"create",
"access",
"open",
"read",
"rename",
"link",
"symlink"};

/**
* @brief Map of operations that have two paths
*
*/
std::unordered_set<string> fs_op_multi_path = {"rename", "link", "symlink"};

/**
* @brief Construct a new LazyFS object.
*
@@ -149,7 +182,7 @@ class LazyFS : public Fusepp::Fuse<LazyFS> {
cache::config::Config* config,
std::thread* faults_handler_thread,
void (*fht_worker) (LazyFS* filesystem),
unordered_map<string,vector<cache::config::Fault*>>* faults);
unordered_map<string,vector<faults::Fault*>>* faults);

/**
* @brief Destroy the LazyFS object
@@ -158,9 +191,9 @@ class LazyFS : public Fusepp::Fuse<LazyFS> {
~LazyFS ();

/**
* @brief Get path of the fault currently being injected.
* @brief Get faults currently being injected.
*/
string get_path_injecting_fault();
vector<string> get_injecting_fault();

/**
* @brief Fifo: (fault) Clear the cached contents
@@ -182,18 +215,18 @@ class LazyFS : public Fusepp::Fuse<LazyFS> {

/**
* @brief Fifo: Reports which files have unsynced data.
* @param path_to_exclude Path to be excluded from the report.
* @param paths_to_exclude Paths to be excluded from the report.
*
*/
void command_unsynced_data_report (string path_to_exclude);
void command_unsynced_data_report (vector<string> paths_to_exclude);

/**
* @brief Checks if a programmed reorder fault for the given path and operation exists. If so, updates the counter and returns the fault.
* @param path Path of the file
* @param op Operation ('write','fsync',...)
* @return Pointer to the ReorderF object
*/
cache::config::ReorderF* get_and_update_reorder_fault(string path, string op);
faults::ReorderF* get_and_update_reorder_fault(string path, string op);

/**
* @brief Persists a write if there is a programmed reorder fault for write in the given path and if the counter matches one of the writes to persist.
@@ -328,36 +361,6 @@ class LazyFS : public Fusepp::Fuse<LazyFS> {
static int lfs_chmod (const char*, mode_t, struct fuse_file_info*);
static int lfs_chown (const char*, uid_t, gid_t, fuse_file_info*);

/**
* @brief Map of faults associated with each filesystem operation
*
*/
// operation -> [((from_rgx, to_rgx), before?true:false), ...]
std::unordered_map<string, vector<pair<std::regex, string>>> crash_faults_before_map;
std::unordered_map<string, vector<pair<std::regex, string>>> crash_faults_after_map;

/**
* @brief Map of allowed operations to have a crash fault
*
*/
std::unordered_set<string> allow_crash_fs_operations = {"unlink",
"truncate",
"fsync",
"write",
"create",
"access",
"open",
"read",
"rename",
"link",
"symlink"};

/**
* @brief Map of operations that have two paths
*
*/
std::unordered_set<string> fs_op_multi_path = {"rename", "link", "symlink"};

/**
* @brief Adds a crash fault to the faults map
*
@@ -373,16 +376,46 @@ class LazyFS : public Fusepp::Fuse<LazyFS> {
string crash_regex_to);

/**
* @brief kills lazyfs with SIGINT if any fault condition verifies
* @brief Adds a torn-seq fault to the faults map. Returns a vector with errors if any.
*
* @param path path of the fault
* @param op system call
* @param persist which parts of the write to persist
* @return errors
*/
vector<string> add_torn_seq_fault(string path, string op, string persist);

/**
* @brief Adds a torn-op fault to the faults map. Returns a vector with errors if any.
*
* @param path path of the fault
* @param parts which parts of the write to persist
* @param parts_bytes division of the write in bytes
* @param persist which parts of the write to persist
* @return errors
*/
vector<string> add_torn_op_fault(string path, string parts, string parts_bytes, string persist);

/**
* @brief Kills lazyfs with SIGKILL if any fault condition verifies
*
* @param opname operation to check
* @param optiming one of 'allow_crash_fs_operations'
* @param opname one of 'allow_crash_fs_operations'
* @param optiming timing for triggering fault operation ('before' or 'after' a given system call)
* @param from_op_path source path specified in the operation
* @param dest_op_path destination path specified in the operation
* @param fault_type type of fault that triggered the crash
*/
void
trigger_crash_fault (string opname, string optiming, string from_op_path, string to_op_path, string fault_type);
void trigger_crash_fault (string opname, string optiming, string from_op_path, string to_op_path, string fault_type);

/**
* @brief Triggers a clear fault if condition is verified.
*
* @param opname operation name
* @param optiming timing for triggering fault operation ('before' or 'after')
* @param from_op_path source path specified in the operation
* @param dest_op_path destination path specified in the operation
*/
void trigger_configured_clear_fault (string opname, string optiming, string from_path, string to_path);
};

} // namespace lazyfs
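
As a reading aid for the `crash_faults_before_map` / `crash_faults_after_map` layout declared above, the sketch below shows one way a lookup over an `operation -> [(from_rgx, to_rgx), ...]` map could work. It is not the code from this PR: `FaultMap`, `kMultiPathOps`, and `crash_fault_matches` are hypothetical names, and the use of `std::regex_search` and the choice to consult the destination pattern only for two-path operations are assumptions.

```cpp
#include <regex>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Same shape as the maps in the header: operation -> [(from regex, to pattern)].
using FaultMap =
    std::unordered_map<std::string,
                       std::vector<std::pair<std::regex, std::string>>>;

// Operations that carry two paths, mirroring `fs_op_multi_path`.
const std::unordered_set<std::string> kMultiPathOps = {"rename", "link", "symlink"};

// Returns true when some fault registered for `opname` matches the paths.
// Assumption: the destination pattern is only checked for two-path operations.
bool crash_fault_matches(const FaultMap& faults, const std::string& opname,
                         const std::string& from_path, const std::string& to_path) {
    auto it = faults.find(opname);
    if (it == faults.end()) return false;

    bool two_paths = kMultiPathOps.count(opname) > 0;
    for (const auto& fault : it->second) {
        if (!std::regex_search(from_path, fault.first)) continue;
        if (!two_paths) return true;
        // The destination is stored as a string, so compile it before matching.
        if (std::regex_search(to_path, std::regex(fault.second))) return true;
    }
    return false;
}
```

A caller would pick the "before" or "after" map according to the timing and then decide whether to kill the process, which is the role `trigger_crash_fault` plays in the header.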