Commit fdfe001

Write support using Block-Cache (#1258)
* Adding support for writing using block-cache
1 parent d918c11 commit fdfe001

34 files changed, +2132 -115 lines changed

CHANGELOG.md

+4-1
@@ -1,11 +1,14 @@
1-
## 2.1.3 (Unreleased)
1+
## 2.2.0 (Unreleased)
22
**Bug Fixes**
33
- Invalidate attribute cache entry on `PathAlreadyExists` error in create directory operation.
44
- When `$HOME` environment variable is not present, use the current directory.
55
- Fixed mount failure on nonempty mount path for fuse3.
66

77
**Features**
88
- Support CPK for block storage accounts.
9+
- Added support to write files using block-cache
10+
- Optimized for sequential writing
11+
- Editing/Appending existing files works only if files were originally created using block-cache with the same block size
912

1013
## 2.1.2 (2023-11-17)
1114
**Bug Fixes**

README.md

+8-3
@@ -7,7 +7,7 @@ Blobfuse2 is stable, and is ***supported by Microsoft*** provided that it is use
77

88
## NOTICE
99
- We have seen some customer issues around files getting corrupted when `streaming` is used in write mode. Kindly avoid using this feature for write while we investigate and resolve it.
10-
10+
- You can now use block-cache instead of streaming for both read and write workflows; it offers much better performance than streaming. To enable it, pass `--block-cache` on the command line, or list `block-cache` as a component in the config file in place of `streaming`.
1111
## Supported Platforms
1212
Visit [this](https://github.com/Azure/azure-storage-fuse/wiki/Blobfuse2-Supported-Platforms) page to see list of supported linux distros.
1313

@@ -16,7 +16,7 @@ Visit [this](https://github.com/Azure/azure-storage-fuse/wiki/Blobfuse2-Supporte
1616
- Basic file system operations such as mkdir, opendir, readdir, rmdir, open,
1717
read, create, write, close, unlink, truncate, stat, rename
1818
- Local caching to improve subsequent access times
19-
- Streaming to support reading AND writing large files
19+
- Streaming/Block-Cache to support reading AND writing large files
2020
- Parallel downloads and uploads to improve access time for large files
2121
- Multiple mounts to the same container for read-only workloads
2222

@@ -43,7 +43,7 @@ One of the biggest BlobFuse2 features is our brand new health monitor. It allows
4343
- CLI to check or update a parameter in the encrypted config
4444
- Set MD5 sum of a blob while uploading
4545
- Validate MD5 sum on download and fail file open on mismatch
46-
- Large file writing through write streaming
46+
- Large file writing through write streaming/Block-Cache
4747

4848
## Blobfuse2 performance compared to blobfuse(v1.x.x)
4949
- 'git clone' operation is 25% faster (tested with vscode repo cloning)
@@ -112,6 +112,7 @@ To learn about a specific command, just include the name of the command (For exa
112112
* `--secure-config=true` : Config file is encrypted using the `blobfuse2 secure` command.
113113
* `--passphrase=<STRING>` : Passphrase used to encrypt/decrypt config file.
114114
* `--wait-for-mount=<TIMEOUT IN SECONDS>` : Let parent process wait for given timeout before exit to ensure child has started.
115+
* `--block-cache` : Enable block-cache instead of file-cache. This works only when mounting without a config file.
115116
- Attribute cache options
116117
* `--attr-cache-timeout=<TIMEOUT IN SECONDS>`: The timeout for the attribute cache entries.
117118
* `--no-symlinks=true`: To improve performance disable symlink support.
@@ -137,7 +138,9 @@ To learn about a specific command, just include the name of the command (For exa
137138
* `--block-cache-pool-size=<SIZE IN MB>`: Size of pool to be used for caching. This limits total memory used by block-cache.
138139
* `--block-cache-path=<PATH>`: Path where downloaded blocks will be persisted. Not providing this parameter will disable the disk caching.
139140
* `--block-cache-disk-size=<SIZE IN MB>`: Disk space to be used for caching.
141+
* `--block-cache-disk-timeout=<seconds>`: Timeout for which disk cache is valid.
140142
* `--block-cache-prefetch=<Number of blocks>`: Number of blocks to prefetch at max when sequential reads are in progress.
143+
* `--block-cache-parallelism=<count>`: Number of parallel threads performing upload/download operations.
141144
* `--block-cache-prefetch-on-open=true`: Start prefetching on open system call instead of waiting for first read. Enhances perf if file is read sequentially from offset 0.
142145
- Fuse options
143146
* `--attr-timeout=<TIMEOUT IN SECONDS>`: Time the kernel can cache inode attributes.
@@ -190,8 +193,10 @@ Please refer to this diagram to decide on whether to use the file cache or strea
190193

191194
![alt text](./config_decision_tree.png?raw=true "File Cache vs. Streaming")
192195

196+
NOTE: At any point in the above diagram, `streaming` can be replaced by `block-cache`.
193197
- [Sample File Cache Config](./sampleFileCacheConfig.yaml)
194198
- [Sample Stream Config](./sampleStreamingConfig.yaml)
199+
- [Sample Block-Cache Config](./sampleBlockCacheConfig.yaml)
195200

196201
## Frequently Asked Questions
197202
- How do I generate a SAS with permissions for rename?
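
The block-cache flags added to the README in this diff can be combined into a single mount command. The following is a minimal sketch that only assembles and prints such a command; it does not run `blobfuse2`. The helper name `build_mount_cmd`, the paths, and every numeric value are illustrative assumptions rather than recommended settings. Because `--block-cache` is documented as working only without a config file, no `--config-file` is passed, and storage account credentials and container selection are left out and would have to be supplied separately.

```shell
# build_mount_cmd is a hypothetical helper used only for illustration.
# It prints a blobfuse2 mount command that uses the block-cache flags
# documented in the README diff; nothing is mounted.
build_mount_cmd() {
    mount_dir="$1"
    # All sizes and counts below are assumed example values.
    echo "blobfuse2 mount $mount_dir" \
        "--block-cache" \
        "--block-cache-pool-size=1024" \
        "--block-cache-path=/tmp/blobfuse2blocks" \
        "--block-cache-disk-size=4096" \
        "--block-cache-disk-timeout=120" \
        "--block-cache-prefetch=12" \
        "--block-cache-parallelism=32"
}

# Print the command for an assumed mount directory.
build_mount_cmd "${MOUNT_DIR:-/tmp/blobfuse2mnt}"
```

Omitting `--block-cache-path` would, per the README text, disable disk caching and keep blocks in the memory pool only.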

azure-pipeline-templates/e2e-tests-block-cache.yml

+133-1
@@ -63,7 +63,7 @@ steps:
6363
6464
- script: |
6565
for i in {1,2,3,4,5,6,7,8,9,10,20,30,50,100,200,1024,2048,4096}; do echo $i; done | parallel --will-cite -j 5 'head -c {}M < /dev/urandom > ${{ parameters.mount_dir }}/myfile_{}'
66-
ls -l ${{ parameters.mount_dir }}/myfile_*
66+
ls -lh ${{ parameters.mount_dir }}/myfile_*
6767
displayName: 'Generate data'
6868
6969
- script: |
@@ -105,12 +105,144 @@ steps:
105105
displayName: 'Unmount RO mount'
106106
107107
- script: |
108+
echo "----------------------------------------------"
109+
cat $(WORK_DIR)/md5sum_block_cache.txt
110+
echo "----------------------------------------------"
111+
cat $(WORK_DIR)/md5sum_file_cache.txt
112+
echo "----------------------------------------------"
108113
diff $(WORK_DIR)/md5sum_block_cache.txt $(WORK_DIR)/md5sum_file_cache.txt
109114
if [ $? -ne 0 ]; then
110115
exit 1
111116
fi
112117
displayName: 'Compare md5Sum'
113118
119+
- template: 'mount.yml'
120+
parameters:
121+
working_dir: $(WORK_DIR)
122+
mount_dir: ${{ parameters.mount_dir }}
123+
temp_dir: ${{ parameters.temp_dir }}
124+
prefix: ${{ parameters.idstring }}
125+
ro_mount: true
126+
mountStep:
127+
script: |
128+
$(WORK_DIR)/blobfuse2 mount ${{ parameters.mount_dir }} --config-file=${{ parameters.config_file }} --default-working-dir=$(WORK_DIR)
129+
130+
- script: |
131+
for i in {1,2,3,4,5,6,7,8,9,10,20,30,50,100,200,1024,2048,4096}; do echo $i; done | parallel --will-cite -j 5 'cp ${{ parameters.mount_dir }}/myfile_{} ${{ parameters.mount_dir }}/myfileCopy_{}'
132+
md5sum ${{ parameters.mount_dir }}/myfileCopy_* > $(WORK_DIR)/md5sum_block_cache_write.txt
133+
ls -lh ${{ parameters.mount_dir }}/myfile*
134+
displayName: 'Copy files using block-cache'
135+
136+
- script: |
137+
rm -rf ${{ parameters.mount_dir }}/myfile*
138+
displayName: 'Clear files using block-cache'
139+
140+
- script: |
141+
$(WORK_DIR)/blobfuse2 unmount all
142+
displayName: 'Unmount RW mount'
143+
144+
- script: |
145+
echo "----------------------------------------------"
146+
cat $(WORK_DIR)/md5sum_block_cache_write.txt
147+
cat $(WORK_DIR)/md5sum_block_cache_write.txt | cut -d " " -f1 > $(WORK_DIR)/md5sum_block_cache_write.txt1
148+
echo "----------------------------------------------"
149+
cat $(WORK_DIR)/md5sum_file_cache.txt
150+
cat $(WORK_DIR)/md5sum_file_cache.txt | cut -d " " -f1 > $(WORK_DIR)/md5sum_file_cache.txt1
151+
echo "----------------------------------------------"
152+
diff $(WORK_DIR)/md5sum_block_cache_write.txt1 $(WORK_DIR)/md5sum_file_cache.txt1
153+
if [ $? -ne 0 ]; then
154+
exit 1
155+
fi
156+
displayName: 'Compare md5Sum'
157+
158+
- template: 'mount.yml'
159+
parameters:
160+
working_dir: $(WORK_DIR)
161+
mount_dir: ${{ parameters.mount_dir }}
162+
temp_dir: ${{ parameters.temp_dir }}
163+
prefix: ${{ parameters.idstring }}
164+
ro_mount: true
165+
mountStep:
166+
script: |
167+
$(WORK_DIR)/blobfuse2 mount ${{ parameters.mount_dir }} --config-file=${{ parameters.config_file }} --default-working-dir=$(WORK_DIR)
168+
169+
- script: |
170+
rm -rf $(WORK_DIR)/localfile*
171+
for i in {1,2,3,4,5,6,7,8,9,10,20,30,50,100,200,1024,2048,4096}; do echo $i; done | parallel --will-cite -j 5 'head -c {}M < /dev/urandom > $(WORK_DIR)/localfile{}'
172+
displayName: 'Generate local files'
173+
174+
- script: |
175+
rm -rf ${{ parameters.mount_dir }}/remotefile*
176+
for i in {1,2,3,4,5,6,7,8,9,10,20,30,50,100,200,1024,2048,4096}; do echo $i; done | parallel --will-cite -j 5 'cp $(WORK_DIR)/localfile{} ${{ parameters.mount_dir }}/remotefile{}'
177+
displayName: 'Upload local files'
178+
179+
- script: |
180+
md5sum $(WORK_DIR)/localfile* > $(WORK_DIR)/md5sum_local_modified.txt
181+
md5sum ${{ parameters.mount_dir }}/remotefile* > $(WORK_DIR)/md5sum_remote_modified.txt
182+
echo "----------------------------------------------"
183+
cat $(WORK_DIR)/md5sum_local_modified.txt
184+
cat $(WORK_DIR)/md5sum_local_modified.txt | cut -d " " -f1 > $(WORK_DIR)/md5sum_local_modified.txt1
185+
echo "----------------------------------------------"
186+
cat $(WORK_DIR)/md5sum_remote_modified.txt
187+
cat $(WORK_DIR)/md5sum_remote_modified.txt | cut -d " " -f1 > $(WORK_DIR)/md5sum_remote_modified.txt1
188+
echo "----------------------------------------------"
189+
diff $(WORK_DIR)/md5sum_local_modified.txt1 $(WORK_DIR)/md5sum_remote_modified.txt1
190+
if [ $? -ne 0 ]; then
191+
exit 1
192+
fi
193+
head -c 13M < /dev/urandom > $(WORK_DIR)/additionaldata.data
194+
displayName: 'Compare MD5 before modification'
195+
196+
- script: |
197+
for i in {1,2,3,4,5,6,7,8,9,10,20,30,50,100,200,1024,2048,4096}; do echo $i; done | parallel --will-cite -j 5 'cat $(WORK_DIR)/additionaldata.data >> $(WORK_DIR)/localfile{}'
198+
ls -lh $(WORK_DIR)/localfile*
199+
displayName: 'Modify local files'
200+
201+
- script: |
202+
for i in {1,2,3,4,5,6,7,8,9,10,20,30,50,100,200,1024,2048,4096}; do echo $i; done | parallel --will-cite -j 5 'cat $(WORK_DIR)/additionaldata.data >> ${{ parameters.mount_dir }}/remotefile{}'
203+
ls -lh ${{ parameters.mount_dir }}/remotefile*
204+
displayName: 'Modify remote files'
205+
206+
- script: |
207+
$(WORK_DIR)/blobfuse2 unmount all
208+
displayName: 'Unmount RW mount'
209+
210+
- template: 'mount.yml'
211+
parameters:
212+
working_dir: $(WORK_DIR)
213+
mount_dir: ${{ parameters.mount_dir }}
214+
temp_dir: ${{ parameters.temp_dir }}
215+
prefix: ${{ parameters.idstring }}
216+
ro_mount: true
217+
mountStep:
218+
script: |
219+
$(WORK_DIR)/blobfuse2 mount ${{ parameters.mount_dir }} --config-file=${{ parameters.config_file }} --default-working-dir=$(WORK_DIR)
220+
221+
- script: |
222+
md5sum $(WORK_DIR)/localfile* > $(WORK_DIR)/md5sum_local_modified.txt
223+
md5sum ${{ parameters.mount_dir }}/remotefile* > $(WORK_DIR)/md5sum_remote_modified.txt
224+
echo "----------------------------------------------"
225+
cat $(WORK_DIR)/md5sum_local_modified.txt
226+
cat $(WORK_DIR)/md5sum_local_modified.txt | cut -d " " -f1 > $(WORK_DIR)/md5sum_local_modified.txt1
227+
echo "----------------------------------------------"
228+
cat $(WORK_DIR)/md5sum_remote_modified.txt
229+
cat $(WORK_DIR)/md5sum_remote_modified.txt | cut -d " " -f1 > $(WORK_DIR)/md5sum_remote_modified.txt1
230+
echo "----------------------------------------------"
231+
diff $(WORK_DIR)/md5sum_local_modified.txt1 $(WORK_DIR)/md5sum_remote_modified.txt1
232+
if [ $? -ne 0 ]; then
233+
exit 1
234+
fi
235+
displayName: 'Compare MD5 of modified files'
236+
237+
- script: |
238+
rm -rf $(WORK_DIR)/localfile*
239+
rm -rf ${{ parameters.mount_dir }}/myfile*
240+
displayName: 'Clear files using block-cache'
241+
242+
- script: |
243+
$(WORK_DIR)/blobfuse2 unmount all
244+
displayName: 'Unmount RW mount'
245+
114246
- template: 'cleanup.yml'
115247
parameters:
116248
working_dir: $(WORK_DIR)
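
The new pipeline steps above repeat the same pattern several times: write an `md5sum` listing for two sets of files, strip the file names with `cut`, and `diff` the remaining hashes. A minimal sketch of that comparison as a reusable function is shown below. The name `compare_md5` and the demo file paths are hypothetical and do not appear in the pipeline; the sketch assumes both listings are in the same file order, as they are when produced from matching globs.

```shell
# compare_md5 is a hypothetical helper, not part of the pipeline.
# It compares only the checksum column of two md5sum listings, so that
# differing file names (myfile_1 vs myfileCopy_1) do not cause a mismatch.
compare_md5() {
    # md5sum prints "<hash>  <path>"; keep the hash column only.
    left_hashes=$(cut -d " " -f1 "$1") || return 2
    right_hashes=$(cut -d " " -f1 "$2") || return 2
    if [ "$left_hashes" != "$right_hashes" ]; then
        echo "checksum mismatch between $1 and $2" >&2
        return 1
    fi
}

# Demo with two throwaway listings (hashes and paths are made up).
tmp=$(mktemp -d)
printf 'abc  /mnt/a/myfile_1\n' > "$tmp/a.txt"
printf 'abc  /mnt/a/myfileCopy_1\n' > "$tmp/b.txt"
compare_md5 "$tmp/a.txt" "$tmp/b.txt" && echo "checksums match"
rm -rf "$tmp"
```

In a pipeline step, a non-zero return from such a function would play the role of the `if [ $? -ne 0 ]; then exit 1; fi` check used after each `diff`.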

blobfuse2-perf.yaml

+71-5
@@ -13,14 +13,14 @@ parameters:
1313
- name: resnet_test
1414
displayName: 'ResNet50 Test'
1515
type: boolean
16-
default: true
16+
default: false
1717

1818

1919
stages:
2020
- stage: ShortRunning
2121
jobs:
2222
- job: PerformanceEval
23-
timeoutInMinutes: 240
23+
timeoutInMinutes: 2800 # roughly two days (46 h 40 min)
2424
strategy:
2525
matrix:
2626
Ubuntu-20:
@@ -38,9 +38,13 @@ stages:
3838
- name: MOUNT_DIR
3939
value: "/home/vsts/workv2/blobfuse2mnt"
4040
- name: TEMP_DIR
41-
value: "/home/vsts/workv2/blobfuse2tmp"
41+
value: "/mnt/blobfuse2tmp"
4242
- name: BLOBFUSE2_CFG
4343
value: "$(System.DefaultWorkingDirectory)/blobfuse2_manual_perf.yaml"
44+
- name: BLOBFUSE2_FILE_CFG
45+
value: "$(System.DefaultWorkingDirectory)/blobfuse2_file_perf.yaml"
46+
- name: BLOBFUSE2_BLOCK_CFG
47+
value: "$(System.DefaultWorkingDirectory)/blobfuse2_block_perf.yaml"
4448
- name: BLOBFUSE_CFG
4549
value: "$(System.DefaultWorkingDirectory)/blobfuse_manual_perf.cfg"
4650
- name: GOPATH
@@ -58,6 +62,10 @@ stages:
5862
hostnamectl
5963
displayName: 'Print Agent Info'
6064
65+
- script: |
66+
df -h
67+
displayName: 'Print Storage details'
68+
6169
- script: |
6270
sudo apt-get update --fix-missing -o Dpkg::Options::="--force-confnew"
6371
sudo apt-get install fuse3 make cmake gcc g++ python3-setuptools python3-pip parallel fio -y -o Dpkg::Options::="--force-confnew"
@@ -81,6 +89,9 @@ stages:
8189
sudo chown -R `whoami` $(ROOT_DIR)
8290
chmod 777 $(ROOT_DIR)
8391
mkdir -p $(ROOT_DIR)/go/src
92+
sudo mkdir -p $(TEMP_DIR)
93+
sudo chown -R `whoami` $(TEMP_DIR)
94+
sudo chmod 777 $(TEMP_DIR)
8495
displayName: 'Create Directory Structure'
8596
8697
# Checkout the code
@@ -119,6 +130,27 @@ stages:
119130
ACCOUNT_ENDPOINT: 'https://$(PERF_WEEKLY_STO_BLOB_ACC_NAME).blob.core.windows.net'
120131
continueOnError: false
121132
133+
- script: |
134+
cd $(WORK_DIR)
135+
$(WORK_DIR)/blobfuse2 gen-test-config --config-file=azure_key_perf.yaml --container-name=cont1 --temp-path=$(TEMP_DIR) --output-file=$(BLOBFUSE2_FILE_CFG)
136+
$(WORK_DIR)/blobfuse2 gen-test-config --config-file=azure_block_perf.yaml --container-name=cont1 --temp-path=$(TEMP_DIR) --output-file=$(BLOBFUSE2_BLOCK_CFG)
137+
echo "---------------------------------------------------"
138+
echo " File Cache config"
139+
echo "---------------------------------------------------"
140+
cat $(BLOBFUSE2_FILE_CFG)
141+
echo "---------------------------------------------------"
142+
echo " Block Cache config"
143+
echo "---------------------------------------------------"
144+
cat $(BLOBFUSE2_BLOCK_CFG)
145+
echo "---------------------------------------------------"
146+
displayName: "Generate v2 Config File for File vs Block"
147+
env:
148+
NIGHTLY_STO_ACC_NAME: $(PERF_WEEKLY_STO_BLOB_ACC_NAME)
149+
NIGHTLY_STO_ACC_KEY: $(PERF_WEEKLY_STO_BLOB_ACC_KEY)
150+
ACCOUNT_TYPE: 'block'
151+
ACCOUNT_ENDPOINT: 'https://$(PERF_WEEKLY_STO_BLOB_ACC_NAME).blob.core.windows.net'
152+
continueOnError: false
153+
122154
- script: |
123155
touch $(BLOBFUSE_CFG)
124156
echo "accountName $(PERF_WEEKLY_STO_BLOB_ACC_NAME)" >> $(BLOBFUSE_CFG)
@@ -129,10 +161,39 @@ stages:
129161
displayName: "Generate v1 Config File"
130162
continueOnError: false
131163
164+
# --------------------------------------------------------------------------------------------
165+
# Block vs File Tests
166+
- script: |
167+
chmod 777 ./test/scripts/file_block_compare.sh
168+
rm -rf $(MOUNT_DIR)/fio/*
169+
./test/scripts/file_block_compare.sh $(MOUNT_DIR)/fio $(TEMP_DIR) $(BLOBFUSE2_FILE_CFG) $(BLOBFUSE2_BLOCK_CFG) rw
170+
displayName: 'Block-File Compare Test'
171+
workingDirectory: $(WORK_DIR)
172+
173+
- script: |
174+
echo "-----------------------------------------------------------------------------"
175+
echo "Write test results with dd"
176+
echo "-----------------------------------------------------------------------------"
177+
cat file_block_write.txt
178+
echo .
179+
# echo "-----------------------------------------------------------------------------"
180+
# echo "Read test results with dd"
181+
# cat file_block_read_dd.txt
182+
# echo .
183+
echo "-----------------------------------------------------------------------------"
184+
echo "Read test results with FIO"
185+
echo "-----------------------------------------------------------------------------"
186+
cat file_block_read.txt
187+
echo .
188+
echo "-----------------------------------------------------------------------------"
189+
displayName: 'Block-File Compare Test'
190+
workingDirectory: $(WORK_DIR)
191+
132192
# --------------------------------------------------------------------------------------------
133193
# FIO Tests
134194
- script: |
135195
chmod 777 ./test/scripts/fio.sh
196+
rm -rf $(MOUNT_DIR)/fio/*
136197
./test/scripts/fio.sh $(MOUNT_DIR)/fio $(TEMP_DIR) $(BLOBFUSE2_CFG) $(BLOBFUSE_CFG) rw
137198
displayName: 'FIO Sequential Test'
138199
workingDirectory: $(WORK_DIR)
@@ -144,6 +205,7 @@ stages:
144205
145206
- script: |
146207
chmod 777 ./test/scripts/fio.sh
208+
rm -rf $(MOUNT_DIR)/fio/*
147209
./test/scripts/fio.sh $(MOUNT_DIR)/fio $(TEMP_DIR) $(BLOBFUSE2_CFG) $(BLOBFUSE_CFG) randrw
148210
displayName: 'FIO Random Test'
149211
workingDirectory: $(WORK_DIR)
@@ -155,6 +217,7 @@ stages:
155217
156218
- script: |
157219
chmod 777 ./test/scripts/fio.sh
220+
rm -rf $(MOUNT_DIR)/fio/*
158221
./test/scripts/fio.sh $(MOUNT_DIR)/fio $(TEMP_DIR) $(BLOBFUSE2_CFG) $(BLOBFUSE_CFG) rw csi
159222
displayName: 'FIO CSI Test'
160223
workingDirectory: $(WORK_DIR)
@@ -168,7 +231,7 @@ stages:
168231
# Upload-DownloadFIO Tests
169232
- script: |
170233
chmod 777 ./test/scripts/run.sh
171-
./test/scripts/run.sh $(MOUNT_DIR)/run $(TEMP_DIR) $(BLOBFUSE2_CFG) $(BLOBFUSE_CFG)
234+
./test/scripts/run.sh $(MOUNT_DIR)/run $(TEMP_DIR) $(BLOBFUSE2_CFG) $(BLOBFUSE_CFG) $(BLOBFUSE2_BLOCK_CFG)
172235
173236
displayName: 'Upload Download'
174237
workingDirectory: $(WORK_DIR)
@@ -215,7 +278,7 @@ stages:
215278
- name: MOUNT_DIR
216279
value: "/home/vsts/workv2/blob_mnt"
217280
- name: TEMP_DIR
218-
value: "/home/vsts/workv2/blobfuse2tmp"
281+
value: "/mnt/blobfuse2tmp"
219282
- name: BLOBFUSE2_CFG
220283
value: "$(System.DefaultWorkingDirectory)/blobfuse2_manual_perf.yaml"
221284
- name: GOPATH
@@ -247,6 +310,9 @@ stages:
247310
sudo chown -R `whoami` $(ROOT_DIR)
248311
chmod 777 $(ROOT_DIR)
249312
mkdir -p $(ROOT_DIR)/go/src
313+
sudo mkdir -p $(TEMP_DIR)
314+
sudo chown -R `whoami` $(TEMP_DIR)
315+
sudo chmod 777 $(TEMP_DIR)
250316
displayName: 'Create Directory Structure'
251317
252318
# Checkout the code
