Softpanorama

May the source be with you, but remember the KISS principle ;-)

Unix find tutorial

Additional means of controlling tree traversal


Part 9: Additional means of controlling tree traversal

By default, the find command searches all subdirectories under the specified path, optionally restricted to particular filesystems (see also Unix Find Tutorial: Using find for backups).

Find has several options that help to control tree traversal. But you are not limited to find's built-in capabilities: any file search can be organized in two stages:

  1. Selecting the directories in which the file search will be performed
  2. Performing actual file search using the set of directories found in previous step.
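The two stages can be sketched in shell. This is a minimal illustration; the selection criteria (directories named src, files ending in .c) are placeholders for whatever you actually need:

```shell
# Stage 1: select the directories in which the search will be performed
# (here, every directory named "src" under the current tree).
# Stage 2: perform the actual file search inside each selected directory.
find . -type d -name src -print | while read -r dir; do
    find "$dir" -type f -name '*.c' -print
done
```

For pathnames that may contain newlines, the first stage should emit NUL-terminated names with -print0 and the loop should read NUL-delimited input instead.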

This two-stage method is far more powerful than any "native" find capability for controlling traversal, but in simple cases the native capabilities are adequate. The most useful among them are:

  1. -mount Don't descend directories on other filesystems. An alternate name for -xdev, for compatibility with some other versions of find.
  2. -fstype type File is on a filesystem of type type. The valid filesystem types vary among different versions of Unix; an incomplete list of filesystem types that are accepted on some version of Unix or another is: ufs, 4.2, 4.3, nfs, tmp, mfs, S51K, S52K. You can use -printf with the %F directive to see the types of your filesystems.
  3. -prune -- if the current file is a directory, do not descend into it during the search; just ignore it. This predicate always evaluates to true and "consumes" all matched directories. In other words, it has the side effect of blocking the descent into a directory whenever the current file is one. Note: if -depth is given, the predicate evaluates to false. Because -delete implies -depth, you cannot usefully use -prune and -delete together.

    It can be used to ignore the directories below the current one. If the expression contains no actions other than -prune, -print is performed on all files for which the expression is true. Note that the -prune predicate is ignored if the -d option is specified.

    To list files in the current directory using GNU find:

    find . \( -path './*' -prune \) ...

    On any version of find you can use this more complex (but more portable) code:

    find . \( ! -name . -prune \) ...

    which says to prune (don't descend into) any directories except .(current). For example,

    find . \( ! -name . -prune \) -name "*.c" -print
  4. -path pattern. File name matches the simple (DOS-style or shell) wildcard pattern. Shell patterns do not treat `/' or `.' as metacharacters. For example,
    find . -path "./sr*sc"
    will print an entry for a directory called ./src/misc (if one exists). To ignore a whole directory tree, use -prune rather than checking every file in the tree. For example, to skip the directory `src/emacs' and all files and directories under it, and print the names of the other files found, do something like this:
    find . -path ./src/emacs -prune -o -print
    Note that the pattern match test applies to the whole file name, starting from one of the start points named on the command line. It would only make sense to use an absolute path name here if the relevant start point is also an absolute path. This means that this command will never match anything:
    find bar -path /foo/bar/myfile -print
    Find compares the -path argument with the concatenation of a directory name and the base name of the file it's examining. Since the concatenation will never end with a slash, -path arguments ending in a slash will match nothing (except perhaps a start point specified on the command line). The predicate -path is also supported by HP-UX find and will be in a forthcoming version of the POSIX standard.
  5. -maxdepth is a GNU extension. Using the -maxdepth option, the search depth can be limited. With -maxdepth 0, find evaluates only the paths given on the command line and does not look inside any directory; to search the current directory itself without descending into subdirectories, use -maxdepth 1. For example
    find . -maxdepth 0 -print

    To search one level of directories below /usr, use:

    find /usr -maxdepth 1 -print
The maxdepth option can be used in conjunction with other find options such as -name, -nouser, -atime, etc.
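For example, to list only the configuration files directly under /etc without recursing into its subdirectories (the path and pattern here are illustrative):

```shell
# -maxdepth 1 limits find to /etc itself and its immediate entries;
# -type f and -name then select regular files matching *.conf.
find /etc -maxdepth 1 -type f -name '*.conf' -print
```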

Note:

-maxdepth 1 will include . unless you also specify -mindepth 1. A portable way to include . is:

 find . \( -name . -o -prune \) ...

The \( and \) are just parentheses used for grouping, escaped so the shell does not interpret them.

[This information was posted by Stephane Chazelas, on 3/10/09 in newsgroup comp.unix.shell.]

More complex examples

Example 1:

As a system administrator you can use find to locate suspicious files (e.g., world writable files, files with no valid owner and/or group, SetUID files, files with unusual permissions, sizes, names, or dates). Here's a final more complex example (which I saved as a shell script):

find / -noleaf -wholename '/proc' -prune \
     -o -wholename '/sys' -prune \
     -o -wholename '/dev' -prune \
     -o -perm -2 ! -type l  ! -type s \
     ! \( -type d -perm -1000 \) -print

This says to search the whole system, skipping the directories /proc, /sys, and /dev. The GNU -noleaf option tells find not to assume all remaining mounted filesystems are Unix file systems (you might have a mounted CD, for instance). The -o is the Boolean OR operator, and ! is the Boolean NOT operator (it applies to the following test).

So these criteria say to locate files that are world writable (-perm -2, same as -o=w) and NOT symlinks (! -type l) and NOT sockets (! -type s) and NOT directories with the sticky (or text) bit set (! \( -type d -perm -1000 \)). (Symlinks, sockets and directories with the sticky bit set are often world-writable and generally not suspicious.)

A common request is a way to find all the hard links to some file. Using ls -li file will tell you how many hard links the file has, and the inode number. You can locate all pathnames to this file with:

     find mount-point -xdev -inum inode-number

Since hard links are restricted to a single filesystem, you need to search that whole filesystem so you start the search at the filesystem's mount point. (This is likely to be either /home or / for files in your home directory.) The -xdev options tells find to not search any other filesystems.
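The whole procedure can be sketched like this (the filename notes.txt is made up):

```shell
# The inode number is the first field of "ls -i" output.
ino=$(ls -i notes.txt | awk '{print $1}')
# Every hard link to the file shares that inode on the same filesystem.
find . -xdev -inum "$ino" -print
```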

(While most Unix and all Linux systems have a find command that supports the -inum criterion, this isn't POSIX standard. Older Unix systems provided the ncheck utility instead that could be used for this.)

Example 2:

cd /source-dir
find . -name .snapshot -prune -o \( \! -name '*~' -print0 \) | cpio -pmd0 /dest-dir
This command copies the contents of /source-dir to /dest-dir, but omits files and directories named .snapshot (and anything in them). It also omits files or directories whose name ends in ~, but not their contents. The construct -prune -o \( ... -print0 \) is quite common. The idea here is that the expression before -prune matches things which are to be pruned. However, the -prune action itself returns true, so the following -o ensures that the right hand side is evaluated only for those directories which didn't get pruned (the contents of the pruned directories are not even visited, so their contents are irrelevant). The expression on the right hand side of the -o is in parentheses only for clarity.

It emphasizes that the -print0 action takes place only for things that didn't have -prune applied to them. Because the default `and' condition between tests binds more tightly than -o, this is the default anyway, but the parentheses help to show what is going on.



Old News ;-)

Traversal control By Jerry Peek

September 22, 2008 | www.linux-mag.com

Since find(1) came into being decades ago, programmers have been adding new features. Here's the fourth of a series about some of those.

A few months ago, we finished the third of a series about features added to longstanding utility programs. This month we'll look at the new features that GNU programmers and others have added on top of what find(1) already had. (You can "find" an introduction to find here: A Very Valuable Find.)

There are lots of versions of find. We'll cover the GNU version 4.1.20 (the latest, as of this writing, from the Debian stable distribution).

Filename Matching

Older versions of find had one way to check the name of an entry: the -name test. The argument to -name is a case-sensitive filename or shell wildcard pattern.

Shell wildcards are simpler than grep-like regular expressions, but they limit the matching -name can do. For instance, matching a file named with all uppercase characters is tough with shell wildcards (but simple with a regular expression, as we'll see soon with -regex).

The string or wildcard pattern after -name is compared to the name of the entry currently being scanned, not that entry's pathname. So, for instance, it's easy to know whether the current filename ends in .c, but it's a lot harder to know whether that file is in a directory named src.

The -path test, which was added fairly early to many find versions, is a shell wildcard-type pattern match against the entire current pathname. So, the test -path '*src/*.c' gets close to what we want here: it matches any pathname containing src, followed by any number of characters and a literal .c. That could be a file ./src/foo.c, but it could also be a file ./src/subdir/bar.c, or ./TeXsrc/foo.c, or something even messier. The wide-open matching of * meaning "zero or more of any character" can cause trouble when you need a specific pathname match.
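One way to tighten the match is to anchor the pattern at the start point, so the src component must sit directly under it; a small sketch:

```shell
# Anchored at "./src/": matches ./src/foo.c and ./src/subdir/bar.c,
# but not ./TeXsrc/foo.c, because the prefix must be exactly "./src/".
find . -path './src/*.c' -print
```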

GNU find has several other name tests: -iname does case-insensitive -name matching, and -regex and -iregex match the entire pathname against a regular expression (case-sensitively and case-insensitively, respectively).

Another new test is -lname, which matches the target of a symbolic link. (Using other name tests, like -name, matches the name of the symlink itself.) The corresponding -ilname test does case-insensitive matching of the symlink target.
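The difference is easy to see with a throwaway symlink (the target path here is made up):

```shell
ln -s /usr/lib/libcrypto.so here.so
find . -lname '*libcrypto*' -print   # matches via the link target: ./here.so
find . -name '*libcrypto*' -print    # no match; the link itself is named here.so
```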

There are two other new tests and options for symbolic links.

Timestamp matching

Older versions of find matched timestamps only in 24-hour intervals. For instance, the tests -mtime -3 and -mtime 2 are both true for files modified between 72 and 48 hours ago. Besides being a bit hard to understand at first, the three timestamp tests (-atime, -ctime and -mtime) also are limited to 24-hour granularity. If you needed more accuracy, you'd have to use -newer or ! -newer to match a timestamp file - often one created by touch(1). (Worse yet, many versions of find would silently ignore more than one -newer test in the same expression!)

The new -amin, -cmin and -mmin tests check timestamps a certain number of minutes ago. For instance, to find files accessed within the past hour, use -amin -60. (Note that it's hard to test last-access times for directories. That's because, when find searches through a file tree, it accesses all of the directories - which updates all directories' last-access timestamps.)
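For instance, to list the log files modified in the last hour (the path is illustrative):

```shell
# -mmin -60: last-modified less than 60 minutes ago.
find /var/log -type f -mmin -60 -print
```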

Another new option, -daystart, tells find to measure times from the beginning of today instead of in 24-hour multiples. This frees you from dependence on the time of day you run find.

Directory Control

Early versions didn't give you much control over which directories find visited. Once -prune was added, you could write an expression to keep find from descending into certain directories. For instance, to keep from descending into the ./src subdirectory, you can do something like this:

find . -path ./src -prune -o …

And to skip all directories named lib (and all of their subdirectories):

find . -name lib -prune -o …

The -prune action is good for avoiding certain directories, but - without the regular expression tests added later, at least - it's not so good for limiting searches to a particular depth. In particular, it may not be obvious how to process only the entries in the current directory without any recursion. (The answer with -prune is:

find . \( -type d ! -name . -prune \) \
   -o …

which "prunes" all directories except the current directory ".".)

The new -mindepth and -maxdepth options make this a lot easier. Use -maxdepth n to descend no more than n levels below the command-line arguments. The option -maxdepth 0 tells find to evaluate only the command-line arguments.

In the same way, -mindepth n tells find to ignore the first n levels of subdirectories. Also, -mindepth 1 processes all files except the command-line arguments. For instance, find subdir -mindepth 1 -ls will descend into subdir and list each of its contents, but won't list subdir itself.
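Combining the two gives a clean "one level only" listing:

```shell
# Exactly the immediate contents of subdir: skip subdir itself
# (-mindepth 1) and don't descend any further (-maxdepth 1).
find subdir -mindepth 1 -maxdepth 1 -print
```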

The -depth option has been in quite a few versions of find; it's not as "new" as some of the other features we cover. It's not related to -maxdepth or -mindepth, though. The sidebar "-depth explained" has more information about how this option is used.

-depth explained

Because find is often used to give filenames to archive programs like tar, it's worth understanding -depth and that part of its purpose.

A tar archive is a stream of bytes that contain header information for each file (including its name and access permissions) followed by that file's data. The archive is extracted in order from first byte to last.

Let's say that you archive an unwritable directory. When you later extract that directory from the archive, its permissions are restored as it is extracted. If the unwritable directory comes out of the archive before its contents, the files inside it cannot be written. The -depth option makes find list a directory's contents before the directory itself, so tar archives (and later extracts) the contents first and sets the restrictive directory permissions last.
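The reordering is easy to see on a tiny tree:

```shell
mkdir -p d/sub && touch d/sub/file
find d -print          # pre-order: d, d/sub, d/sub/file
find d -depth -print   # post-order: d/sub/file, d/sub, d
```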

One "new" addition - which is actually in a lot of find versions - is -xdev or -mount. (GNU find understands both of those.) It tells find not to descend into directories mounted from other filesystems. This is handy, for example, to avoid network-mounted filesystems.

A more specific test is -fstype, which tests true if a file is on a certain type of filesystem. For instance, ! -fstype nfs is true for a file that's not on an NFS-type filesystem. Different systems have different filesystem names and types, though. To get a listing of what's on your system, use the new -printf action with its %F format directive to display the filesystems from the second field of each /etc/mtab entry:

% find / /proc /dev/pts /dev/shm -maxdepth 0 -printf '%-20p type %F\n'
/                    type ext3
/proc                type proc
/dev/pts             type devpts
/dev/shm             type tmpfs
…

(You'll probably find that same data in the second and third fields of each entry in /proc/mounts.)

Text Output

Early versions of find had basically one choice for outputting a pathname: print it to the standard output. Later, -ls was added; it gives an output format similar to ls -l. The new -printf action lets you use a C-like printf format. This has the usual format specifiers like the filename and the last-modification date, but it has others specific to find. For instance, %H tells you which command-line argument find was processing when it found this entry. One simple use for this is to make your own version of ls that gives just the information you want. As an example, the following bash function, findc, searches the command-line arguments (or, if there are no arguments, the current directory . instead) and prints information about all filenames ending with .c:

findc()
{
  find "${@-.}" -name '*.c' -printf \
    'DEPTH %2d  GROUP %-10g  NAME %f\n'
}

(Note that the stat(1) utility might be simpler to use if you want a recursive listing and if stat's format specifiers give the information you want.)

The longstanding -print action writes a pathname to the standard output, followed by a newline character. If that pathname happens to contain a newline, you get two newlines. (A newline is legal in a filename.) Most shells also break command-line arguments into words at whitespace (tabs, spaces and newlines); this means that command substitution (the backquote operators) could fail if, say, a filename contained spaces. It wasn't too long before programmers fixed this problem by adding the -print0 action; it outputs a pathname followed by NUL (a zero byte). Because NUL isn't legal in a filename, this pathname delimiter solved the problem - when find's output was piped to the command xargs -0, which accepts NUL as an argument separator.
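A typical pairing looks like this (the *.log pattern is illustrative):

```shell
# -print0 terminates each pathname with NUL; xargs -0 splits its input
# only on NUL, so names containing spaces or newlines pass through intact.
find . -type f -name '*.log' -print0 | xargs -0 rm -f
```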

Because find can do many different tests as it traverses a filesystem, it's good to be able to choose what should be done in each individual case. For instance, if you run a nightly cron job to clean up various files and directories from all of your disks, it's nice to do all of the tests in a single pass through the filesystem - instead of making another complete pass for each of your tests. But it's also good to avoid the overhead of running utilities like rm and rmdir over and over, once per file, in a find job like this one using -exec:

find /var/tmp -mtime +3 \( \
  \( -type f -exec rm -f {} \; \) -o \
  \( -type d -exec ..... {} \; \) \
\)

This inefficiency could be solved by replacing -exec with -print or -print0, then piping find's output to xargs. xargs collects arguments and passes them to another program each time it has collected "enough." But all the text from -print or -print0 goes to find's standard output, so there's been no easy way to tell which pathnames were from which test (which are files, which are directories…).

The new -fprintf and -fprint0 actions can solve this problem. They write a formatted string to a file you specify. For instance, the following example writes a NUL-separated list of the files from /var/tmp into the file named by $files and a list of directories into the file named by $dirs:

dirs=`mktemp`
files=`mktemp`
find /var/tmp \( \
  \( -type f -fprint0 "$files" \) -o \
  \( -type d -fprint0 "$dirs" \) \
\)

Other New Tests

The -empty test is true for an empty file or directory. (An empty file has no bytes; an empty directory has no entries.) One place this is handy is for removing empty directories while you're cleaning a filesystem. If you also use -depth, all of the files in a directory should be removed before find examines the directory itself. Then you can use an expression like the following:

find /tmp -depth \( \
  \( -mtime +3 -type f -exec rm -f {} \; \) \
  -o \( -type d -empty -exec rmdir {} \; \) \
\)

The -false "test" is always false, and -true is always true. These are a lot more efficient than the old methods (-exec false and -exec true) that execute the external Linux commands false(1) and true(1).

The -perm test has long accepted arguments like -perm 222 (which means "exactly mode 222" - that is, write-only) and -perm -222 (which means "all of the write (2) mode bits are set"). Now -perm also accepts arguments starting with a plus sign, meaning "any of these bits are set." For instance, -perm +222 is true when any write bit is set. (Recent GNU find spells this -perm /222; the + form is deprecated.)
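The argument forms behave like this on a mode-644 file (a minimal sketch; the + form is omitted because newer GNU find rejects it):

```shell
touch file && chmod 644 file
find file -perm 644 -print    # exact mode: matches
find file -perm -644 -print   # all of these bits set: matches
find file -perm -222 -print   # all write bits set: no match (group/other can't write)
```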

Does -prune work like -maxdepth in Unix find on AIX? (Ask Dave Taylor!)

I'm not sure that -prune is what you want, though. Here's what the find man page on my system says about it:

-prune This primary always evaluates to true. It causes find to not descend into the current file. Note, the -prune primary has no effect if the -d option was specified.

Is this a viable replacement for -maxdepth?

When I run it the output isn't useful:

$ find . -prune -print
.
$
Hmmmm.... replacing the "." with a "*" proves interesting (yes, I'm making the find more interesting too, just matching files that are non-zero in size):
$ find * -prune -type f -size +0c -print
African Singing.aif
Branding for Writers.doc
GYBGCH12.doc
KF BPlan-04-1123.doc
Parent Night.aif
Rahima Keynote.aif
lumxtiger_outline final.doc
master-adwords.pdf
$
Maybe that's what you need. (Ignore the specific files I have; you can see what I'm working on, as this is my desktop. ;-)

Try what I suggested, see what kind of results you get!


AIX find indeed does not have the -maxdepth flag. The man page on my AIX 5.2 system returns this for the -prune flag. There is only one find there, /usr/bin/find:

-prune Always evaluates to the value True. Stops the descent of the current path name if it is a directory. If the -depth flag is specified, the -prune flag is ignored.

Posted by: Scott at December 17, 2004 2:59 PM

Some "creative" use of grep / sed / awk applied against the find output should allow you to select only to a chosen depth; given a relatively stable structure, you might also be better off driving from a stored list of directories instead of building it on the fly.

Posted by: Mike C. Baker at February 11, 2005 4:04 PM

Well yes, you can do this:

find /tmp/* -prune

which limits the search to only the /tmp directory (no subdirs). However, "/tmp/*" is evaluated by the shell before being passed to "find" and could possibly exceed the maximum allowed length.

And while it is certainly possible to parse the output for the desired results, this does not stop the "find" command from needless searching. The find could be lengthy if the filesystem is large and/or over a network.

Without a "maxdepth" option, the best way to list all files in a given directory is:

(cd $DIR && find . ! -name . -prune)

Other options to "find" can be added as needed, and the parentheses can be removed if the changedir is not bothersome.

Posted by: Gus Schlachter at March 23, 2007 2:30 PM

Thanks Dave, again. I was doing this on a Solaris machine (no -maxdepth there either), using -prune with the -d option, which was wrong. The AIX help, where it says don't use -prune with -d, pointed out the problem; it works like a charm now.

Posted by: Frankce10 at May 5, 2009 8:58 AM

For running find on a directory with many thousand files, the above command with "*" does not work, and fails with "The parameter or environment lists are too long.".
Any easy alternative to this??

Posted by: Ankush Jhalani at October 11, 2009 4:01

Controlling depth with find - The UNIX and Linux Forums


#1 05-19-2008 la_womn Registered User


Controlling depth with find


I have the following script:

Quote:
#!/usr/bin/ksh

export MDIR=$PS_HOME/datafiles

if [[ -f $MDIR/bldtuout.txt ]]; then
  rm $MDIR/bldtuout.txt
fi

if [[ -f $MDIR/bldterr.txt ]]; then
  rm $MDIR/bldterr.txt
fi

NAME=$1
SERVER=$2
DIRECTORY=$3
DATABASE=$4
ID=$5

echo "*" $NAME $DATABASE $DIRECTORY $SERVER >> $MDIR/bldtuout.txt
/usr/bin/ssh -q $ID@$SERVER "find $DIRECTORY -type d -exec du -ks {} \;" \
  >> $MDIR/bldtuout.txt 2>> $MDIR/bldterr.txt

Now they have added on a new requirement, they only want to go to a certain depth in the directories returned. How do I code it to only go say 3 directories deeper than $DIRECTORY?


#2 05-20-2008 era Herder of Useless Cats (On Sabbatical)


Some versions of find have a -maxdepth option. In the absence of that, something like find | egrep -v '/.*/.*/.*/' | xargs du -ks is probably the way to go.
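The egrep filter works by counting slashes: a relative pathname from find has one slash per level, and the pattern /.*/.*/.*/ only matches paths containing at least four, so everything deeper than three levels is thrown away. A sketch (with du left off to keep it simple):

```shell
# Keep pathnames with at most three slashes, i.e. at most three
# levels below the starting directory.
find . | egrep -v '/.*/.*/.*/'
```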


#3 05-20-2008 la_womn Registered User


There is no -maxdepth available in my version of find.

How do I use the code you gave me? When I tried it, I got 6786785.

I put it in like this:

Quote: find $DIRECTORY -type d | egrep -v '/.*/.*/.*/' | du -ks

What am I doing wrong?

#4 05-20-2008 shamrock Registered User


Code: /usr/bin/ssh -q $ID@$SERVER "find $DIRECTORY -type d -exec du -ks {} \; | awk -F/ 'NF <= 3'" >> $MDIR/bldtuout.txt 2>> $MDIR/bldterr.txt