Commit bdc6052

Merge pull request #207 from visual-layer/dbickson-patch-1: Update RUN.md
2 parents (2d5f5f4 + 856cb3d)
1 file changed: 3 additions, 3 deletions

RUN.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -233,9 +233,9 @@ fastdup.create_duplicates_gallery(os.path.join(test_dir, 'similarity.csv'))
 
 ## Working with tar/tgz/zip files as input <a name="tar"/>
 
-Some popular datasets like [LAOIN 400M](https://laion.ai/laion-400-open-dataset/) use webdataset compressed formats. Fastdup supports the following compressed file formats: `tar,tgz,tar.gz,zip`. Those compressed files can be located in a local folder or remote s3 or minio path.
+Some popular datasets like [LAION 400M](https://laion.ai/laion-400-open-dataset/) use webdataset compressed formats. Fastdup supports the following compressed file formats: `tar,tgz,tar.gz,zip`. Those compressed files can be located in a local folder or remote s3 or minio path.
 
-For example, the LAOIN dataset contains the following tar files:
+For example, the LAION dataset contains the following tar files:
 
 ```
 00000.tar containing:
````
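The paragraph corrected above describes fastdup's support for compressed webdataset shards. As an illustration, here is a minimal sketch of pointing fastdup at a folder of such shards; the paths are hypothetical, and the `fastdup.run` and `create_duplicates_gallery` calls are the ones already shown in RUN.md:

```python
import os
import fastdup

# Hypothetical locations: any local folder or remote s3/minio path holding
# tar, tgz, tar.gz, or zip shards (e.g. 00000.tar, 00001.tar, ...) should do.
input_dir = '/data/laion_shards'
work_dir = '/data/fastdup_work'

# fastdup reads the images inside the compressed shards and indexes them.
fastdup.run(input_dir, work_dir=work_dir)

# Browse near-duplicate pairs found across the shards.
fastdup.create_duplicates_gallery(os.path.join(work_dir, 'similarity.csv'))
```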
````diff
@@ -280,7 +280,7 @@ Once all jobs are finished, collect all the output files from the `work_dir` int
 
 ```python
 import fastdup
-fastdup.run('', run_mode=2, work_dir='/path/to/work_dir')
+fastdup.run('s3://mybucket/myfolder', run_mode=2, work_dir='/path/to/work_dir')
 ```
 
 For running on 50M images you will need an ubuntu machine with 32 cores and 256GB RAM. We are working on further scaling the implementation for the full dataset - stay tuned!
````
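For context on the fixed call above: this part of RUN.md describes a distributed recipe in which workers extract features and a single machine then aggregates them. A minimal sketch of that two-phase flow, assuming fastdup's documented run modes (run_mode=1 for feature extraction only, run_mode=2 for nearest-neighbor search over precomputed features) and a hypothetical shard path:

```python
import fastdup

# Phase 1, on each worker (hypothetical shard path): extract features only.
fastdup.run('s3://mybucket/myfolder/part_00', run_mode=1,
            work_dir='/path/to/work_dir')

# Phase 2, on one machine, after every worker's outputs have been collected
# into a single work_dir: nearest-neighbor search over the combined features.
fastdup.run('s3://mybucket/myfolder', run_mode=2, work_dir='/path/to/work_dir')
```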
