Keeping only the oldest versions of duplicate files
I have a simple cron script on the Raspberry Pi which creates a list of installed packages (the output of dpkg -l) every day. Most days the list doesn't change, so I end up with a pile of dated files with identical contents.
So on a whim I wrote this little script to remove the newest duplicates, i.e. all but the oldest, from a directory of files, or a list of files provided on the command line.
As I create the list in cron using
08 02 * * * dpkg -l > $DPKG_DIR/dpkg.$(date +\%Y-\%m-\%d).txt 2>&1
I get files named
dpkg.2012-09-30.txt and so on.
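The \% escapes in the crontab line are there because cron treats a bare % in the command field as a newline. As a small illustration of the naming scheme (written in Python rather than shell, and with `dpkg_list_name` being a hypothetical helper that is not part of the cron job), the same date format gives names that sort chronologically as plain strings:

```python
from datetime import date


def dpkg_list_name(day: date) -> str:
    """Build the file name the cron job would produce for a given day."""
    return day.strftime("dpkg.%Y-%m-%d.txt")


# Two consecutive days, deliberately listed out of order.
names = [dpkg_list_name(date(2012, 10, 1)), dpkg_list_name(date(2012, 9, 30))]

# Year-month-day names sort chronologically as ordinary strings.
print(sorted(names))
```

That ordering is why a shell glob such as dpkg.2012-0* hands the files over oldest first.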
So to remove the duplicates I run my script as
./remove_newest_dupes.pl -do dpkg.2012-0*
Well, I usually run it like this first, without -do, just to check what it's going to do:
./remove_newest_dupes.pl -vv dpkg.2012-0*
Once the real run is done, I'm left with only the first version of each dpkg list file.
The script is located in my Tools directory on GitHub.
You can also pass the -old option to remove the oldest duplicates instead, leaving just the newest copy of each unique file.
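For anyone curious about the idea behind it, here is a minimal sketch in Python. It is not the Perl script itself, and it rests on two assumptions of mine: files count as duplicates when their contents hash to the same SHA-256 digest, and "oldest" means the earliest modification time. The function names `find_dupes_to_remove` and `remove_dupes` are made up for this illustration.

```python
import hashlib
import os
from collections import defaultdict


def find_dupes_to_remove(paths, remove_oldest=False):
    """Return the files to delete, keeping one copy per distinct content.

    Assumption: age is judged by modification time; the real script
    may order the files differently.
    """
    # Group the paths by a digest of their contents.
    groups = defaultdict(list)
    for path in paths:
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        groups[digest].append(path)

    to_remove = []
    for same in groups.values():
        same.sort(key=os.path.getmtime)  # oldest first
        # Keep the newest copy with remove_oldest, otherwise the oldest.
        keep = same[-1] if remove_oldest else same[0]
        to_remove.extend(p for p in same if p != keep)
    return to_remove


def remove_dupes(paths, remove_oldest=False, do=False):
    """Report the duplicates; only delete them when do=True (like -do)."""
    victims = find_dupes_to_remove(paths, remove_oldest)
    for path in victims:
        print(("removing " if do else "would remove ") + path)
        if do:
            os.remove(path)
    return victims
```

Calling `remove_dupes(files)` only reports what would go, `remove_dupes(files, do=True)` actually deletes, and `remove_oldest=True` plays the role of -old.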
The script usage is shown below:
remove_newest_dupes.pl [-h|-help] [-do] [-old] [-v] [ ]

    -h|help: Show this message
    -do:     Doit - actually delete the selected files
    -old:    Removed oldest dupes
    -v:      Increase verbosity

This script detects duplicate files and removes the newest duplicates
(or oldest if the -old option is specified)

This can be useful for example if a cron script creates some daily status files
e.g. the output of dpkg -l
So we keep only the files as they change and not any intermediate duplicates.

e.g.
To remove newest duplicated files in current dir:
    remove_newest_dupes.pl

To remove oldest duplicated files in current dir:
    remove_newest_dupes.pl -old

To remove oldest duplicated files from provided list:
    remove_newest_dupes.pl -old FILE1 FILE1_NEWER FILE1_VERYOLD FILE2 FILE2_OLD

Would remove files
    FILE1 FILE1_VERYOLD FILE2_OLD
Keeping oldest copies of FILE1, FILE2:
    FILE1_NEWER FILE2