Duplicate File Cleaner

Scans a given folder and its subfolders to find duplicate files and move them to a staging location. No files are deleted.

Approach:

1. Builds a dictionary keyed by file size, with a list of files for each size. Files that share a size are candidate duplicates.
2. Iterates through the dictionary, hashes the files in each same-size list, and removes any file whose hash appears only once. This drops files that matched on size but are not actual duplicates.
3. The remaining list is the list of duplicates. It is iterated again to move the files to a staging location, where the user can delete them later. A CLI lets the user choose which of the duplicates to keep. The others are moved.
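
Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the code in duplicateFileCleaner.py: the function names `file_hash` and `find_duplicates` are hypothetical, and SHA-256 is an assumed choice, since the hash algorithm the script uses is not stated here.

```python
import hashlib
import os
from collections import defaultdict


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks (assumed algorithm)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(search_root):
    """Return a list of groups; each group is a list of paths with identical content."""
    # Step 1: group every regular file under search_root by its size in bytes.
    by_size = defaultdict(list)
    for folder, _subfolders, names in os.walk(search_root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files that share a size, then keep the hashes
    # that occur more than once.
    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot be a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_hash(path)].append(path)
        duplicates.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return duplicates
```

Grouping by size first means most files are never read at all; only files that already share a size pay the cost of hashing.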

Usage:

Set these variables to configure the script.

search_root
Set to the path to start searching from.
target_location
Set to the path where files will be moved.
is_test
Set to test the behavior of the algorithm without moving the files.
auto_move
Optional: Set to skip the prompt and automatically keep the first file of each set of duplicates, moving the rest.
log_enabled
Optional: Set to print more verbose messages in the terminal during execution.
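
To show how these settings might interact, here is a hedged sketch of the move step. It assumes `is_test` is a boolean and that `auto_move` corresponds to keeping the first file of each group. The function `move_duplicates` and its `keep_index` parameter are hypothetical names, and the numeric suffix used for name collisions is an illustrative choice rather than the script's documented behavior.

```python
import os
import shutil


def move_duplicates(group, target_location, is_test=False, keep_index=0):
    """Keep one file from a duplicate group and move the others.

    Returns the (source, destination) pairs that were moved, or that
    would have been moved when is_test is True.
    """
    moves = []
    for index, source in enumerate(group):
        if index == keep_index:
            continue  # this is the copy that stays in place
        base, ext = os.path.splitext(os.path.basename(source))
        destination = os.path.join(target_location, base + ext)
        # Duplicates often share a file name, so avoid overwriting a file
        # already in the staging folder or already planned in this run.
        taken = {dest for _src, dest in moves}
        counter = 1
        while os.path.exists(destination) or destination in taken:
            destination = os.path.join(target_location, f"{base}_{counter}{ext}")
            counter += 1
        moves.append((source, destination))
        if not is_test:
            os.makedirs(target_location, exist_ok=True)
            shutil.move(source, destination)
    return moves
```

With `is_test=True` the function only reports what it would do, which matches the stated purpose of testing the behavior without moving files. An interactive run could pass the user's choice as `keep_index` instead of the default.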

Run duplicateFileCleaner.py

adamThornton/duplicateFileCleaner
