Sunday, September 30, 2012

Keeping only the oldest versions of duplicate files

I wrote a useful little utility this morning.
I have a simple cron script on the Raspberry Pi which saves the list of installed packages (the output of dpkg -l) every day. Most days this list doesn't change, so I end up with lots of dated files with identical contents.

So, on a whim, I wrote this little script to remove the newest duplicates (i.e. all but the oldest copy of each file) from a directory of files, or from a list of files given on the command line.
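The script itself is written in Perl and isn't reproduced here. As a rough illustration of the idea only, here is a minimal Python sketch. It assumes that "age" means file modification time and that two files are duplicates when their SHA-256 content hashes match (any identical pair counts, not only consecutive ones). The names `file_digest` and `select_dupes` are hypothetical and do not come from the script.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def select_dupes(paths, keep_oldest=True):
    """Return the files to delete: every member of a group of identical files
    except the one kept (the oldest by default, the newest if keep_oldest is
    False, which corresponds to the -old option)."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_digest(path)].append(path)

    to_delete = []
    for members in groups.values():
        # Assumption: "oldest" means smallest mtime; the name breaks ties.
        members.sort(key=lambda p: (os.path.getmtime(p), p))
        keep = members[0] if keep_oldest else members[-1]
        to_delete.extend(p for p in members if p != keep)
    return sorted(to_delete)
```

For the cron output described here, something like `select_dupes(glob.glob("dpkg.2012-0*"))` would return the newer duplicates, and `keep_oldest=False` would return the older ones instead.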

As I create the list in cron using
     08 02 * * * dpkg -l > $DPKG_DIR/dpkg.$(date +\%Y-\%m-\%d).txt 2>&1

I get files named
     dpkg.2012-09-30.txt and so on.

So to remove the duplicates I run my script as
     ./remove_newest_dupes.pl -do dpkg.2012-0*

Well, I usually run it first without -do, just to check what it's going to do:
     ./remove_newest_dupes.pl -vv dpkg.2012-0*
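Because nothing is deleted unless -do is given, a verbose run without it is a safe preview. A minimal Python sketch of that dry-run-by-default pattern follows. `remove_files` is a hypothetical name, and printing the "Would remove" lines only when verbosity is raised is an assumption about how the real script reports.

```python
import os


def remove_files(paths, doit=False, verbose=0):
    """Delete each path only if doit is True (the -do behaviour);
    otherwise this is a dry run that at most reports what it would do.
    Returns the list of files actually deleted."""
    removed = []
    for path in paths:
        if not doit:
            if verbose:
                print(f"Would remove {path}")
            continue
        if verbose:
            print(f"Removing {path}")
        os.remove(path)
        removed.append(path)
    return removed
```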

I'm then left with only the first (oldest) file for each distinct version of the dpkg list.

The script is located in my Tools directory on GitHub.

You can also pass the -old option to remove the oldest duplicates instead, keeping just the newest copy of each unique file.

The script usage is shown below:

Usage: remove_newest_dupes.pl

  remove_newest_dupes.pl [-h|-help] [-do] [-old] [-v] []

    -h|-help: Show this message

    -do:  Doit - actually delete the selected files
    -old: Remove oldest dupes
    -v:   Increase verbosity

This script detects duplicate files and removes the newest duplicates
  (or oldest if the -old option is specified)

This can be useful, for example, if a cron script creates daily status files
(e.g. the output of dpkg -l): we keep only the files where the content changes,
and not the intermediate duplicates.

e.g.
  To remove newest duplicated files in current dir:
    remove_newest_dupes.pl

  To remove oldest duplicated files in current dir:
    remove_newest_dupes.pl -old

  To remove oldest duplicated files from provided list:
    remove_newest_dupes.pl -old FILE1 FILE1_NEWER FILE1_VERYOLD FILE2 FILE2_OLD

  Would remove files FILE1 FILE1_VERYOLD FILE2_OLD
  Keeping newest copies of FILE1, FILE2: FILE1_NEWER FILE2
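The options in the usage text map fairly directly onto Python's argparse, which may help if you want to reproduce the interface in another language. This is a hypothetical mirror for illustration, not the script's actual Perl option handling; `build_parser` and the `doit`/`verbose` attribute names are illustrative choices.

```python
import argparse


def build_parser():
    """Hypothetical Python mirror of the options in the usage text."""
    parser = argparse.ArgumentParser(
        prog="remove_newest_dupes.pl",
        add_help=False,  # -h/-help are declared explicitly below
        description="Detect duplicate files and remove the newest duplicates.",
    )
    parser.add_argument("-h", "-help", action="help",
                        help="Show this message")
    parser.add_argument("-do", action="store_true", dest="doit",
                        help="Doit - actually delete the selected files")
    parser.add_argument("-old", action="store_true",
                        help="Remove oldest dupes")
    # Repeatable: -v gives 1, -vv gives 2, and so on.
    parser.add_argument("-v", action="count", default=0, dest="verbose",
                        help="Increase verbosity")
    parser.add_argument("files", nargs="*",
                        help="Files to check (none means the current dir)")
    return parser
```

With this parser, `-vv` yields a verbosity of 2, and when no files are given `files` is an empty list, which the caller would treat as "scan the current directory".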
