Write support using Block-Cache (#1258)
* Adding support for writing using block-cache
vibhansa-msft authored Jan 22, 2024
1 parent d918c11 commit fdfe001
Showing 34 changed files with 2,132 additions and 115 deletions.
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -1,11 +1,14 @@
## 2.1.3 (Unreleased)
## 2.2.0 (Unreleased)
**Bug Fixes**
- Invalidate attribute cache entry on `PathAlreadyExists` error in create directory operation.
- When `$HOME` environment variable is not present, use the current directory.
- Fixed mount failure on nonempty mount path for fuse3.

**Features**
- Support CPK for block storage accounts.
- Added support to write files using block-cache
    - Optimized for sequential writing
    - Editing/Appending existing files works only if files were originally created using block-cache with the same block size

## 2.1.2 (2023-11-17)
**Bug Fixes**
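
The same-block-size restriction on editing and appending in the 2.2.0 notes can be pictured with a little block arithmetic. The sketch below is illustrative only: `blocks_touched` is a hypothetical helper, the 16 MB block size is an assumed example value, and it says nothing about how blobfuse2 itself stages or commits blocks.

```shell
#!/usr/bin/env bash
# Sketch only: maps an append onto fixed-size blocks.
# All sizes are whole megabytes; the block size is an assumed example value.

# blocks_touched FILE_SIZE_MB APPEND_MB BLOCK_MB
# Prints the zero-based index of the first and last block an append writes to.
blocks_touched() {
  local size=$1 append=$2 block=$3
  local first=$(( size / block ))                # block holding the old end of file
  local last=$(( (size + append - 1) / block ))  # block holding the new last megabyte
  echo "$first $last"
}

# A 20 MB file with 16 MB blocks, appended with 13 MB of data.
blocks_touched 20 13 16
```

In this example the append starts inside the partially filled block 1 and spills into block 2. If the file had been laid out with a different block size, those indices would point at different byte ranges, which is one plausible reading of why the block size has to match.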
11 changes: 8 additions & 3 deletions README.md
@@ -7,7 +7,7 @@ Blobfuse2 is stable, and is ***supported by Microsoft*** provided that it is use

## NOTICE
- We have seen some customer issues around files getting corrupted when `streaming` is used in write mode. Kindly avoid using this feature for write while we investigate and resolve it.

- You can now use block-cache instead of streaming for both read and write workflows; it offers much better performance than streaming. To enable it, pass `--block-cache` on the CLI, or list `block-cache` instead of `streaming` as a component in the config file.
## Supported Platforms
Visit [this](https://github.com/Azure/azure-storage-fuse/wiki/Blobfuse2-Supported-Platforms) page to see list of supported linux distros.

@@ -16,7 +16,7 @@ Visit [this](https://github.com/Azure/azure-storage-fuse/wiki/Blobfuse2-Supporte
- Basic file system operations such as mkdir, opendir, readdir, rmdir, open,
read, create, write, close, unlink, truncate, stat, rename
- Local caching to improve subsequent access times
- Streaming to support reading AND writing large files
- Streaming/Block-Cache to support reading AND writing large files
- Parallel downloads and uploads to improve access time for large files
- Multiple mounts to the same container for read-only workloads

@@ -43,7 +43,7 @@ One of the biggest BlobFuse2 features is our brand new health monitor. It allows
- CLI to check or update a parameter in the encrypted config
- Set MD5 sum of a blob while uploading
- Validate MD5 sum on download and fail file open on mismatch
- Large file writing through write streaming
- Large file writing through write streaming/Block-Cache

## Blobfuse2 performance compared to blobfuse(v1.x.x)
- 'git clone' operation is 25% faster (tested with vscode repo cloning)
@@ -112,6 +112,7 @@ To learn about a specific command, just include the name of the command (For exa
* `--secure-config=true` : Config file is encrypted using the `blobfuse2 secure` command.
* `--passphrase=<STRING>` : Passphrase used to encrypt/decrypt config file.
* `--wait-for-mount=<TIMEOUT IN SECONDS>` : Let parent process wait for given timeout before exit to ensure child has started.
* `--block-cache` : Enables block-cache instead of file-cache. This works only when mounting without a config file.
- Attribute cache options
* `--attr-cache-timeout=<TIMEOUT IN SECONDS>`: The timeout for the attribute cache entries.
* `--no-symlinks=true`: To improve performance disable symlink support.
@@ -137,7 +138,9 @@ To learn about a specific command, just include the name of the command (For exa
* `--block-cache-pool-size=<SIZE IN MB>`: Size of pool to be used for caching. This limits total memory used by block-cache.
* `--block-cache-path=<PATH>`: Path where downloaded blocks will be persisted. Not providing this parameter will disable the disk caching.
* `--block-cache-disk-size=<SIZE IN MB>`: Disk space to be used for caching.
* `--block-cache-disk-timeout=<seconds>`: Timeout for which disk cache is valid.
* `--block-cache-prefetch=<Number of blocks>`: Number of blocks to prefetch at max when sequential reads are in progress.
* `--block-cache-parallelism=<count>`: Number of parallel threads doing upload/download operation.
* `--block-cache-prefetch-on-open=true`: Start prefetching on open system call instead of waiting for first read. Enhances perf if file is read sequentially from offset 0.
- Fuse options
* `--attr-timeout=<TIMEOUT IN SECONDS>`: Time the kernel can cache inode attributes.
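
Put together, the block-cache flags listed above might be combined as in the following sketch. It only prints a command line rather than mounting anything; the mount path and numeric values are made-up examples, `build_block_cache_mount` is a hypothetical helper, and storage account credentials and container selection are deliberately left out.

```shell
#!/usr/bin/env bash
# Sketch: assemble a config-less block-cache mount command from the README flags.
# The path and the numbers passed below are illustrative assumptions, not recommendations.

# build_block_cache_mount MOUNT_DIR POOL_MB PREFETCH_BLOCKS PARALLELISM
build_block_cache_mount() {
  local mount_dir=$1 pool_mb=$2 prefetch=$3 parallelism=$4
  # --block-cache is documented to work only when no config file is given,
  # so no --config-file flag is added here.
  echo "blobfuse2 mount $mount_dir --block-cache" \
       "--block-cache-pool-size=$pool_mb" \
       "--block-cache-prefetch=$prefetch" \
       "--block-cache-parallelism=$parallelism"
}

build_block_cache_mount /tmp/blobfuse2mnt 1024 12 32
```

Printing the command first makes it easy to review the flag set before running it against a real account.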
@@ -190,8 +193,10 @@ Please refer to this diagram to decide on whether to use the file cache or strea

![alt text](./config_decision_tree.png?raw=true "File Cache vs. Streaming")

NOTE: At any point in above diagram `streaming` can be replaced by `block-cache`.
- [Sample File Cache Config](./sampleFileCacheConfig.yaml)
- [Sample Stream Config](./sampleStreamingConfig.yaml)
- [Sample Block-Cache Config](./sampleBlockCacheConfig.yaml)

## Frequently Asked Questions
- How do I generate a SAS with permissions for rename?
134 changes: 133 additions & 1 deletion azure-pipeline-templates/e2e-tests-block-cache.yml
@@ -63,7 +63,7 @@ steps:
- script: |
for i in {1,2,3,4,5,6,7,8,9,10,20,30,50,100,200,1024,2048,4096}; do echo $i; done | parallel --will-cite -j 5 'head -c {}M < /dev/urandom > ${{ parameters.mount_dir }}/myfile_{}'
ls -l ${{ parameters.mount_dir }}/myfile_*
ls -lh ${{ parameters.mount_dir }}/myfile_*
displayName: 'Generate data'
- script: |
@@ -105,12 +105,144 @@ steps:
displayName: 'Unmount RO mount'
- script: |
echo "----------------------------------------------"
cat $(WORK_DIR)/md5sum_block_cache.txt
echo "----------------------------------------------"
cat $(WORK_DIR)/md5sum_file_cache.txt
echo "----------------------------------------------"
diff $(WORK_DIR)/md5sum_block_cache.txt $(WORK_DIR)/md5sum_file_cache.txt
if [ $? -ne 0 ]; then
exit 1
fi
displayName: 'Compare md5Sum'
- template: 'mount.yml'
parameters:
working_dir: $(WORK_DIR)
mount_dir: ${{ parameters.mount_dir }}
temp_dir: ${{ parameters.temp_dir }}
prefix: ${{ parameters.idstring }}
ro_mount: true
mountStep:
script: |
$(WORK_DIR)/blobfuse2 mount ${{ parameters.mount_dir }} --config-file=${{ parameters.config_file }} --default-working-dir=$(WORK_DIR)
- script: |
for i in {1,2,3,4,5,6,7,8,9,10,20,30,50,100,200,1024,2048,4096}; do echo $i; done | parallel --will-cite -j 5 'cp ${{ parameters.mount_dir }}/myfile_{} ${{ parameters.mount_dir }}/myfileCopy_{}'
md5sum ${{ parameters.mount_dir }}/myfileCopy_* > $(WORK_DIR)/md5sum_block_cache_write.txt
ls -lh ${{ parameters.mount_dir }}/myfile*
displayName: 'Copy files using block-cache'
- script: |
rm -rf ${{ parameters.mount_dir }}/myfile*
displayName: 'Clear files using block-cache'
- script: |
$(WORK_DIR)/blobfuse2 unmount all
displayName: 'Unmount RW mount'
- script: |
echo "----------------------------------------------"
cat $(WORK_DIR)/md5sum_block_cache_write.txt
cat $(WORK_DIR)/md5sum_block_cache_write.txt | cut -d " " -f1 > $(WORK_DIR)/md5sum_block_cache_write.txt1
echo "----------------------------------------------"
cat $(WORK_DIR)/md5sum_file_cache.txt
cat $(WORK_DIR)/md5sum_file_cache.txt | cut -d " " -f1 > $(WORK_DIR)/md5sum_file_cache.txt1
echo "----------------------------------------------"
diff $(WORK_DIR)/md5sum_block_cache_write.txt1 $(WORK_DIR)/md5sum_file_cache.txt1
if [ $? -ne 0 ]; then
exit 1
fi
displayName: 'Compare md5Sum of block-cache writes'
- template: 'mount.yml'
parameters:
working_dir: $(WORK_DIR)
mount_dir: ${{ parameters.mount_dir }}
temp_dir: ${{ parameters.temp_dir }}
prefix: ${{ parameters.idstring }}
ro_mount: true
mountStep:
script: |
$(WORK_DIR)/blobfuse2 mount ${{ parameters.mount_dir }} --config-file=${{ parameters.config_file }} --default-working-dir=$(WORK_DIR)
- script: |
rm -rf $(WORK_DIR)/localfile*
for i in {1,2,3,4,5,6,7,8,9,10,20,30,50,100,200,1024,2048,4096}; do echo $i; done | parallel --will-cite -j 5 'head -c {}M < /dev/urandom > $(WORK_DIR)/localfile{}'
displayName: 'Generate local files'
- script: |
rm -rf ${{ parameters.mount_dir }}/remotefile*
for i in {1,2,3,4,5,6,7,8,9,10,20,30,50,100,200,1024,2048,4096}; do echo $i; done | parallel --will-cite -j 5 'cp $(WORK_DIR)/localfile{} ${{ parameters.mount_dir }}/remotefile{}'
displayName: 'Upload local files'
- script: |
md5sum $(WORK_DIR)/localfile* > $(WORK_DIR)/md5sum_local_modified.txt
md5sum ${{ parameters.mount_dir }}/remotefile* > $(WORK_DIR)/md5sum_remote_modified.txt
echo "----------------------------------------------"
cat $(WORK_DIR)/md5sum_local_modified.txt
cat $(WORK_DIR)/md5sum_local_modified.txt | cut -d " " -f1 > $(WORK_DIR)/md5sum_local_modified.txt1
echo "----------------------------------------------"
cat $(WORK_DIR)/md5sum_remote_modified.txt
cat $(WORK_DIR)/md5sum_remote_modified.txt | cut -d " " -f1 > $(WORK_DIR)/md5sum_remote_modified.txt1
echo "----------------------------------------------"
diff $(WORK_DIR)/md5sum_local_modified.txt1 $(WORK_DIR)/md5sum_remote_modified.txt1
if [ $? -ne 0 ]; then
exit 1
fi
head -c 13M < /dev/urandom > $(WORK_DIR)/additionaldata.data
displayName: 'Compare MD5 before modification'
- script: |
for i in {1,2,3,4,5,6,7,8,9,10,20,30,50,100,200,1024,2048,4096}; do echo $i; done | parallel --will-cite -j 5 'cat $(WORK_DIR)/additionaldata.data >> $(WORK_DIR)/localfile{}'
ls -lh $(WORK_DIR)/localfile*
displayName: 'Modify local files'
- script: |
for i in {1,2,3,4,5,6,7,8,9,10,20,30,50,100,200,1024,2048,4096}; do echo $i; done | parallel --will-cite -j 5 'cat $(WORK_DIR)/additionaldata.data >> ${{ parameters.mount_dir }}/remotefile{}'
ls -lh ${{ parameters.mount_dir }}/remotefile*
displayName: 'Modify remote files'
- script: |
$(WORK_DIR)/blobfuse2 unmount all
displayName: 'Unmount RW mount'
- template: 'mount.yml'
parameters:
working_dir: $(WORK_DIR)
mount_dir: ${{ parameters.mount_dir }}
temp_dir: ${{ parameters.temp_dir }}
prefix: ${{ parameters.idstring }}
ro_mount: true
mountStep:
script: |
$(WORK_DIR)/blobfuse2 mount ${{ parameters.mount_dir }} --config-file=${{ parameters.config_file }} --default-working-dir=$(WORK_DIR)
- script: |
md5sum $(WORK_DIR)/localfile* > $(WORK_DIR)/md5sum_local_modified.txt
md5sum ${{ parameters.mount_dir }}/remotefile* > $(WORK_DIR)/md5sum_remote_modified.txt
echo "----------------------------------------------"
cat $(WORK_DIR)/md5sum_local_modified.txt
cat $(WORK_DIR)/md5sum_local_modified.txt | cut -d " " -f1 > $(WORK_DIR)/md5sum_local_modified.txt1
echo "----------------------------------------------"
cat $(WORK_DIR)/md5sum_remote_modified.txt
cat $(WORK_DIR)/md5sum_remote_modified.txt | cut -d " " -f1 > $(WORK_DIR)/md5sum_remote_modified.txt1
echo "----------------------------------------------"
diff $(WORK_DIR)/md5sum_local_modified.txt1 $(WORK_DIR)/md5sum_remote_modified.txt1
if [ $? -ne 0 ]; then
exit 1
fi
displayName: 'Compare MD5 of modified files'
- script: |
rm -rf $(WORK_DIR)/localfile*
rm -rf ${{ parameters.mount_dir }}/myfile* ${{ parameters.mount_dir }}/remotefile*
displayName: 'Clear local and remote test files'
- script: |
$(WORK_DIR)/blobfuse2 unmount all
displayName: 'Unmount RW mount'
- template: 'cleanup.yml'
parameters:
working_dir: $(WORK_DIR)
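
The comparison steps in this template strip the file-name column before diffing, because the two checksum lists refer to differently named files. A minimal standalone sketch of that idea follows, assuming `md5sum` and `cut` are available; `same_content` and the temporary file names are hypothetical and are not part of the pipeline.

```shell
#!/usr/bin/env bash
# Sketch: compare two groups of files by MD5 digest only, ignoring file names.

# same_content PREFIX_A PREFIX_B
# Succeeds when files matching PREFIX_A* and PREFIX_B* have identical digests,
# compared in glob (sorted) order.
same_content() {
  a_sums=$(md5sum "$1"* | cut -d " " -f1)   # keep only the digest column
  b_sums=$(md5sum "$2"* | cut -d " " -f1)
  [ "$a_sums" = "$b_sums" ]
}

# Tiny demonstration with throwaway files in a temporary directory.
work=$(mktemp -d)
printf 'hello' > "$work/local_1"
printf 'hello' > "$work/remote_1"

if same_content "$work/local_" "$work/remote_"; then
  echo "checksums match"
else
  echo "checksums differ"
fi
```

The comparison relies on both groups sorting in the same order, which holds when the files share the same suffixes, as `localfile{}` and `remotefile{}` do in the steps above.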
76 changes: 71 additions & 5 deletions blobfuse2-perf.yaml
@@ -13,14 +13,14 @@ parameters:
- name: resnet_test
displayName: 'ResNet50 Test'
type: boolean
default: true
default: false


stages:
- stage: ShortRunning
jobs:
- job: PerformanceEval
timeoutInMinutes: 240
timeoutInMinutes: 2800 # roughly two days (46 h 40 min)
strategy:
matrix:
Ubuntu-20:
@@ -38,9 +38,13 @@ stages:
- name: MOUNT_DIR
value: "/home/vsts/workv2/blobfuse2mnt"
- name: TEMP_DIR
value: "/home/vsts/workv2/blobfuse2tmp"
value: "/mnt/blobfuse2tmp"
- name: BLOBFUSE2_CFG
value: "$(System.DefaultWorkingDirectory)/blobfuse2_manual_perf.yaml"
- name: BLOBFUSE2_FILE_CFG
value: "$(System.DefaultWorkingDirectory)/blobfuse2_file_perf.yaml"
- name: BLOBFUSE2_BLOCK_CFG
value: "$(System.DefaultWorkingDirectory)/blobfuse2_block_perf.yaml"
- name: BLOBFUSE_CFG
value: "$(System.DefaultWorkingDirectory)/blobfuse_manual_perf.cfg"
- name: GOPATH
@@ -58,6 +62,10 @@ stages:
hostnamectl
displayName: 'Print Agent Info'
- script: |
df -h
displayName: 'Print Storage details'
- script: |
sudo apt-get update --fix-missing -o Dpkg::Options::="--force-confnew"
sudo apt-get install fuse3 make cmake gcc g++ python3-setuptools python3-pip parallel fio -y -o Dpkg::Options::="--force-confnew"
@@ -81,6 +89,9 @@ stages:
sudo chown -R `whoami` $(ROOT_DIR)
chmod 777 $(ROOT_DIR)
mkdir -p $(ROOT_DIR)/go/src
sudo mkdir -p $(TEMP_DIR)
sudo chown -R `whoami` $(TEMP_DIR)
sudo chmod 777 $(TEMP_DIR)
displayName: 'Create Directory Structure'
# Checkout the code
@@ -119,6 +130,27 @@ stages:
ACCOUNT_ENDPOINT: 'https://$(PERF_WEEKLY_STO_BLOB_ACC_NAME).blob.core.windows.net'
continueOnError: false
- script: |
cd $(WORK_DIR)
$(WORK_DIR)/blobfuse2 gen-test-config --config-file=azure_key_perf.yaml --container-name=cont1 --temp-path=$(TEMP_DIR) --output-file=$(BLOBFUSE2_FILE_CFG)
$(WORK_DIR)/blobfuse2 gen-test-config --config-file=azure_block_perf.yaml --container-name=cont1 --temp-path=$(TEMP_DIR) --output-file=$(BLOBFUSE2_BLOCK_CFG)
echo "---------------------------------------------------"
echo " File Cache config"
echo "---------------------------------------------------"
cat $(BLOBFUSE2_FILE_CFG)
echo "---------------------------------------------------"
echo " Block Cache config"
echo "---------------------------------------------------"
cat $(BLOBFUSE2_BLOCK_CFG)
echo "---------------------------------------------------"
displayName: "Generate v2 Config File for File vs Block"
env:
NIGHTLY_STO_ACC_NAME: $(PERF_WEEKLY_STO_BLOB_ACC_NAME)
NIGHTLY_STO_ACC_KEY: $(PERF_WEEKLY_STO_BLOB_ACC_KEY)
ACCOUNT_TYPE: 'block'
ACCOUNT_ENDPOINT: 'https://$(PERF_WEEKLY_STO_BLOB_ACC_NAME).blob.core.windows.net'
continueOnError: false
- script: |
touch $(BLOBFUSE_CFG)
echo "accountName $(PERF_WEEKLY_STO_BLOB_ACC_NAME)" >> $(BLOBFUSE_CFG)
@@ -129,10 +161,39 @@ stages:
displayName: "Generate v1 Config File"
continueOnError: false
# --------------------------------------------------------------------------------------------
# Block vs File Tests
- script: |
chmod 777 ./test/scripts/file_block_compare.sh
rm -rf $(MOUNT_DIR)/fio/*
./test/scripts/file_block_compare.sh $(MOUNT_DIR)/fio $(TEMP_DIR) $(BLOBFUSE2_FILE_CFG) $(BLOBFUSE2_BLOCK_CFG) rw
displayName: 'Block-File Compare Test'
workingDirectory: $(WORK_DIR)
- script: |
echo "-----------------------------------------------------------------------------"
echo "Write test results with dd"
echo "-----------------------------------------------------------------------------"
cat file_block_write.txt
echo .
# echo "-----------------------------------------------------------------------------"
# echo "Read test results with dd"
# cat file_block_read_dd.txt
# echo .
echo "-----------------------------------------------------------------------------"
echo "Read test results with FIO"
echo "-----------------------------------------------------------------------------"
cat file_block_read.txt
echo .
echo "-----------------------------------------------------------------------------"
displayName: 'Block-File Compare Results'
workingDirectory: $(WORK_DIR)
# --------------------------------------------------------------------------------------------
# FIO Tests
- script: |
chmod 777 ./test/scripts/fio.sh
rm -rf $(MOUNT_DIR)/fio/*
./test/scripts/fio.sh $(MOUNT_DIR)/fio $(TEMP_DIR) $(BLOBFUSE2_CFG) $(BLOBFUSE_CFG) rw
displayName: 'FIO Sequential Test'
workingDirectory: $(WORK_DIR)
@@ -144,6 +205,7 @@ stages:
- script: |
chmod 777 ./test/scripts/fio.sh
rm -rf $(MOUNT_DIR)/fio/*
./test/scripts/fio.sh $(MOUNT_DIR)/fio $(TEMP_DIR) $(BLOBFUSE2_CFG) $(BLOBFUSE_CFG) randrw
displayName: 'FIO Random Test'
workingDirectory: $(WORK_DIR)
@@ -155,6 +217,7 @@ stages:
- script: |
chmod 777 ./test/scripts/fio.sh
rm -rf $(MOUNT_DIR)/fio/*
./test/scripts/fio.sh $(MOUNT_DIR)/fio $(TEMP_DIR) $(BLOBFUSE2_CFG) $(BLOBFUSE_CFG) rw csi
displayName: 'FIO CSI Test'
workingDirectory: $(WORK_DIR)
@@ -168,7 +231,7 @@ stages:
# Upload-DownloadFIO Tests
- script: |
chmod 777 ./test/scripts/run.sh
./test/scripts/run.sh $(MOUNT_DIR)/run $(TEMP_DIR) $(BLOBFUSE2_CFG) $(BLOBFUSE_CFG)
./test/scripts/run.sh $(MOUNT_DIR)/run $(TEMP_DIR) $(BLOBFUSE2_CFG) $(BLOBFUSE_CFG) $(BLOBFUSE2_BLOCK_CFG)
displayName: 'Upload Download'
workingDirectory: $(WORK_DIR)
@@ -215,7 +278,7 @@ stages:
- name: MOUNT_DIR
value: "/home/vsts/workv2/blob_mnt"
- name: TEMP_DIR
value: "/home/vsts/workv2/blobfuse2tmp"
value: "/mnt/blobfuse2tmp"
- name: BLOBFUSE2_CFG
value: "$(System.DefaultWorkingDirectory)/blobfuse2_manual_perf.yaml"
- name: GOPATH
@@ -247,6 +310,9 @@ stages:
sudo chown -R `whoami` $(ROOT_DIR)
chmod 777 $(ROOT_DIR)
mkdir -p $(ROOT_DIR)/go/src
sudo mkdir -p $(TEMP_DIR)
sudo chown -R `whoami` $(TEMP_DIR)
sudo chmod 777 $(TEMP_DIR)
displayName: 'Create Directory Structure'
# Checkout the code