
A Note About Removing Files With find(1)

I've seen it suggested, on the internet and elsewhere, that when there are too many arguments for rm(1) to handle, the following command will suffice:

% find /path -exec rm -rf {} \;

While certainly functional, it's not optimal. If there are thousands of files (as is often the case at my job), this command is slow, slow, slow. The reason is the fork() and exec() calls that find(1) has to make for every single file in order to run rm(1). Instead, you could optimize find(1) by using "-delete":

% find /path -delete

This is much faster, but it has one VERY nasty side effect. If you place "-delete" in the wrong spot in your find(1) command, you could delete everything under "/path" before the rest of the expression is evaluated. From the find(1) manual:

Warnings: Don't forget that the find command line is evaluated as an expression, so putting -delete first will make find try to delete everything below the starting points you specified. When testing a find line that you later intend to use with -delete, you should explicitly specify -depth in order to avoid later surprises. Because -delete implies -depth, you cannot usefully use -prune and -delete together.
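
As a rough sketch of the manual's advice (the scratch directory and file names below are made up for illustration), you can rehearse the expression with "-print" sitting in the slot where "-delete" will eventually go, and only swap it in once the listing looks right:

```shell
#!/bin/sh
# Work in a throwaway tree so nothing real is at risk.
# All directory and file names here are hypothetical.
scratch=$(mktemp -d)
mkdir -p "$scratch/logs/old"
touch "$scratch/logs/a.log" "$scratch/logs/old/b.log" "$scratch/logs/keep.txt"

# Dry run: the tests come first, -depth is spelled out, and -print
# stands in for -delete.
preview=$(find "$scratch/logs" -depth -type f -name '*.log' -print)
printf 'would delete:\n%s\n' "$preview"

# Real run: identical tests, with -delete as the LAST term.
find "$scratch/logs" -depth -type f -name '*.log' -delete

# Count what survived (tr strips the padding some wc builds add).
remaining=$(find "$scratch/logs" -type f | wc -l | tr -d ' ')
echo "files left: $remaining"

rm -rf "$scratch"
```

Only the two ".log" files should go; "keep.txt" stays because it never matches the tests that precede "-delete".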

One nice benefit of "-delete", however, is the proper handling of whitespace in your filenames, such as spaces, tabs or the newline character, because find(1) removes each file itself and the names never pass through another program. Thankfully, there is another option that handles such names just as well, which is not only supported in GNU/Linux, but also in FreeBSD (and perhaps others):

% find /path -print0 | xargs -0 rm -rf

This avoids the excessive fork() and exec() system calls from our first command, and doesn't have the nasty side effects of "-delete". Further, because "-print0" makes find(1) terminate each filename with a NUL character, and "-0" tells xargs(1) to split its input on NUL rather than whitespace, filenames containing spaces, tabs or newlines are handled properly. Time the three commands above and compare; I have found the last to be the fastest.
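
To make both points concrete, here is a small sketch that counts child processes instead of deleting anything. The scratch directory and the three file names are invented for the demonstration, and "sh -c" stands in for rm(1) so that each invocation can announce itself:

```shell
#!/bin/sh
# Throwaway directory with three awkwardly named files (names are made up).
scratch=$(mktemp -d)
nl_name=$(printf 'line\nbreak')          # a filename containing a newline
touch "$scratch/plain" "$scratch/two words" "$scratch/$nl_name"

# "-exec ... \;" starts one child per file: one "run" line for each.
per_file=$(find "$scratch" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# "-print0 | xargs -0" hands all the names to a single child.
batched=$(find "$scratch" -type f -print0 | xargs -0 sh -c 'echo run' sh | wc -l | tr -d ' ')

# That single child still sees each name as exactly one argument,
# even the names containing a space or a newline.
argc=$(find "$scratch" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "children with -exec \\;: $per_file"
echo "children with xargs -0: $batched"
echo "arguments seen by the batched child: $argc"

rm -rf "$scratch"
```

With only three files everything fits in one batch; with a very large number of files xargs(1) splits the list into several batches, which is still far fewer processes than one per file.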

We can squeeze some extra juice out of the command, though. All we need to do is cd(1) into the directory we want find(1) to operate on first:

% cd /path && find . -print0 | xargs -0 rm -rf

Having removed millions of files this way (yes, I do actually remove that many, often), I have found this last find(1) command to be the fastest in terms of sheer speed. It moves. You may see the same results.
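
One detail worth seeing in action is the "&&": if the cd fails, the whole find | xargs pipeline is skipped rather than run against wherever you happen to be standing. The sketch below uses a made-up scratch tree and a deliberately misspelled directory. The "-mindepth 1" is my own addition here (GNU and FreeBSD find both accept it) so that "." itself is not handed to rm(1), and the parentheses just keep the cd from changing the calling shell's directory:

```shell
#!/bin/sh
# Hypothetical layout: one directory to empty, one that must survive.
scratch=$(mktemp -d)
mkdir -p "$scratch/target/sub" "$scratch/bystander"
touch "$scratch/target/sub/f1" "$scratch/target/f2" "$scratch/bystander/precious"

# Misspelled directory: cd fails, so && never starts the pipeline.
( cd "$scratch/tarjet" 2>/dev/null && find . -mindepth 1 -print0 | xargs -0 rm -rf ) \
    || echo "cd failed, nothing was removed"
after_typo=$(find "$scratch" -type f | wc -l | tr -d ' ')

# Correct directory: everything beneath it is removed, nothing else.
( cd "$scratch/target" && find . -mindepth 1 -print0 | xargs -0 rm -rf )
after_real=$(find "$scratch" -type f | wc -l | tr -d ' ')

echo "files after the typo:     $after_typo"
echo "files after the real run: $after_real"

rm -rf "$scratch"
```

Had the two commands been joined with ";" instead, the first attempt would have run find(1) in the current directory after the failed cd.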

FYI.

{ 6 } Comments

  1. patrick using Mozilla Compatible 5.0 on GNU/Linux 64 bits | December 21, 2011 at 12:54 am | Permalink

    hi,

    you should improve your last command like so

    cd /path && find . -print0 | xargs -0 rm -rf

    to avoid deleting all files from the cwd if the cd fails (e.g. due to spelling mistake although i guess one would use tab completion of the terminal anyway)...

    you could also use the find command to delete many files like this

    find /path -exec rm -rf {} +

    which builds the args for rm like xargs (check man page)

    patrick

  2. richard using Firefox 8.0 on Windows 7 | December 21, 2011 at 3:33 am | Permalink

    Hi Aaron,

    You say

    -- begin quote --
    because of "-print0" as a find(1) argument, and "-0" with xargs(1), we can handle files with NUL characters
    -- end quote --

    But this is not correct. The -0 option uses a NUL character to terminate the name of each file when passing names between find and xargs. This allows files with whitespace, quotes and similar special characters in their names to be deleted. But it does not allow files with a NUL character in the name.

    Richard

  3. Aaron Toponce using Debian IceWeasel 9.0 on GNU/Linux 64 bits | December 21, 2011 at 9:48 am | Permalink

    @richard- I think you misunderstood the sentence. Please try reading it again (this time with some updates).

  4. Aaron Toponce using Debian IceWeasel 9.0 on GNU/Linux 64 bits | December 21, 2011 at 9:48 am | Permalink

    @patrick- Yes, you're absolutely right about "&&" versus ";". Updated.

  5. Flimm using Google Chrome 16.0.912.63 on GNU/Linux 64 bits | December 21, 2011 at 12:25 pm | Permalink

    "the proper handling of NUL characters in your filename, such as spaces, tabs or the newline character"

    The space character (U+0020), the tab character (U+0009) and the newline character (U+000A) are all examples of whitespace. They are not the NUL character.

    There is only one NUL character (U+0000). In ASCII and in UTF-8, it is encoded as a single byte with the value zero. http://en.wikipedia.org/wiki/Null_character

    The NUL character and the forward slash character are the only two banned characters in a filename. Because the NUL character/byte is guaranteed not to be found in a filename, you can separate a list of filenames with a NUL byte. This is why -print0 is considered safe.

  6. richard using Firefox 8.0 on Windows 7 | December 21, 2011 at 12:51 pm | Permalink

    Aaron, I think that the confusion is as @Flimm mentions that NUL is not the same as whitespace. To quote the xargs manpage:

    "-0 Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are not special (every character is taken literally)."

    Regards, Richard
