Scans a provided folder and subfolders to find and move duplicate files. No files are deleted.
1. Makes a dictionary of file sizes with a list of files for each file size. This identifies duplicates by file size.
2. Iterates through the dictionary and generates hashes on the file size based list of duplicates and removes items with a hash that only appears once. This removes files from the dictionary that were a file size match but, indeed, are not an actual duplicate.
3. The remain list is the list of duplicates and is then iterated again to move the files to a staging location prior to the user deleting. A CLI will allow the user to choose which of the duplicates to keep. The other will be moved.
Set these variables to configure the script.
- search_root
Set to the path to start searching from. - target_location
Set to the path to where files will be moved to. - is_test
Set to test the behavior of the algorithm without moving the files. - auto_move
Optional: Set if you don't want to choose the file to keep from the duplicates and instead automatically keep the first one and move the rest. - log_enabled
Optional: Set to print more verbose messages in the terminal during execution.
Run duplicateFileCleaner.py