adamThornton/duplicateFileCleaner

Duplicate File Cleaner

Scans a given folder and its subfolders for duplicate files and moves the extra copies to a staging location. No files are deleted.

Approach:

1. Builds a dictionary that maps each file size to the list of files of that size. Files that share a size are candidate duplicates.
2. Iterates through the dictionary, hashes the files in each size group, and removes files whose hash appears only once. This drops files that matched on size but are not, in fact, duplicates.
3. The remaining entries are the actual duplicates. The script iterates over them again and moves the extra copies to a staging location, where the user can review them before deleting anything. A CLI lets the user choose which of the duplicates to keep; the others are moved.
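Steps 1 and 2 can be sketched in Python roughly as follows. This is an illustrative sketch, not the script's actual code: the function names `group_by_size`, `hash_file` and `find_duplicates` are hypothetical, and SHA-256 read in 64 KiB chunks is an assumed choice, since the README does not say which hash the script uses.

```python
import hashlib
import os
from collections import defaultdict


def group_by_size(search_root):
    """Step 1: map each file size to the list of file paths with that size."""
    sizes = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                sizes[os.path.getsize(path)].append(path)
    return sizes


def hash_file(path, chunk_size=64 * 1024):
    """Hash a file in chunks so large files are never read into memory at once."""
    digest = hashlib.sha256()  # assumed hash algorithm
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(search_root):
    """Step 2: within each size group, keep only files whose hash appears more than once."""
    duplicates = {}
    for paths in group_by_size(search_root).values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[hash_file(path)].append(path)
        for digest, same in by_hash.items():
            if len(same) > 1:
                duplicates[digest] = sorted(same)
    return duplicates
```

Hashing only the size groups with more than one member is what makes the size pass worthwhile: most files have a unique size and are never read at all. Each value in the returned dictionary is one set of identical files for step 3 to process.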

Usage:

Set these variables to configure the script.

  1. search_root
    Set to the path to start searching from.
  2. target_location
    Set to the path where duplicate files will be moved.
  3. is_test
    Set to run the algorithm as a dry run, without moving any files.
  4. auto_move
    Optional: Set to skip choosing a file interactively; the first file in each set of duplicates is kept automatically and the rest are moved.
  5. log_enabled
    Optional: Set to print verbose messages to the terminal during execution.
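The configuration variables and step 3 might fit together as in the sketch below. The variable names come from the list above, but their types (strings and booleans), the example paths, and the helpers `choose_keeper`, `unique_destination` and `move_duplicates` are hypothetical, chosen for illustration; the script's actual prompt and its handling of name clashes may differ.

```python
import os
import shutil

# Hypothetical example values; adjust before running.
search_root = "/path/to/search"
target_location = "/path/to/staging"
is_test = True       # dry run: report what would move, move nothing
auto_move = False    # True: keep the first file of each set without prompting
log_enabled = False  # True: print a line for every move


def choose_keeper(paths):
    """Ask the user which file in a duplicate set to keep; returns its index."""
    for i, path in enumerate(paths):
        print(f"  [{i}] {path}")
    while True:
        answer = input("Keep which file? ")
        if answer.isdigit() and int(answer) < len(paths):
            return int(answer)
        print("Please enter one of the listed numbers.")


def unique_destination(target_location, filename, taken):
    """Return a path in target_location not already used (appends _1, _2, ...)."""
    base, ext = os.path.splitext(filename)
    candidate = os.path.join(target_location, filename)
    n = 1
    while os.path.exists(candidate) or candidate in taken:
        candidate = os.path.join(target_location, f"{base}_{n}{ext}")
        n += 1
    return candidate


def move_duplicates(duplicate_sets, target_location, is_test, auto_move, log_enabled):
    """Step 3: keep one file per set and move the rest; returns (source, destination) pairs."""
    moves = []
    taken = set()
    if not is_test:
        os.makedirs(target_location, exist_ok=True)
    for paths in duplicate_sets:
        keep = 0 if auto_move else choose_keeper(paths)
        for i, path in enumerate(paths):
            if i == keep:
                continue
            dest = unique_destination(target_location, os.path.basename(path), taken)
            taken.add(dest)
            if is_test or log_enabled:
                print(f"{'Would move' if is_test else 'Moving'} {path} -> {dest}")
            if not is_test:
                shutil.move(path, dest)
            moves.append((path, dest))
    return moves
```

Duplicates often share a file name, and `shutil.move` may overwrite an existing destination file depending on the platform's `os.rename` behavior, so suffixing clashing names is one way to keep the "no files are deleted" guarantee. Running once with `is_test = True` prints the planned moves before anything is touched.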

After setting the variables, run duplicateFileCleaner.py.

About

Finds duplicate files and gives user choice of which one to keep.