Advanced find & grep for searching large filesystems

grep is a great little command and probably the first “advanced” command most people learn when first exposed to shell scripting.

But what do you do when you need to regularly search a LARGE file system that takes a normal grep command minutes (hours?) to traverse?

Even worse, what if that file system mixes user data in with developer code, plus svn repositories, archive folders, and all sorts of other nonsense that returns loads of false positives for even the most targeted, code-specific grep you can write?

I of course found myself in the latter scenario and was forced to take my “grep game” to a new level. (grep game…..that gives me an idea….)

The magic command:

find . -not -path '*/.svn/*' | xargs grep -ls "String to find"

It starts by using the find command to build the list of files we want to grep. You can add as many "-not -path" tests as you need. (don't pipe " | " them, just chain them one after another)
The glob pattern that follows each one is what you'll want to specially craft to eliminate the directories and file types you want to skip. (Note that "-not -path" takes a shell-style glob, not a REGEX, and it matches against the whole path find prints, so patterns usually need a leading and trailing "*".) This is extremely important if you are getting too many results back or if the search takes too long to run.
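As a minimal sketch of chaining exclusions (the directory and file names here are made up for the demo), each skipped directory gets its own "-not -path" glob:

```shell
# Hypothetical layout: real code next to directories we want to skip.
mkdir -p demo/src demo/.svn demo/archive
echo 'TODO: fix me' > demo/src/main.c
echo 'TODO: fix me' > demo/.svn/props
echo 'TODO: fix me' > demo/archive/old.c

# Chain one -not -path test per directory to exclude; -type f
# keeps directory names themselves out of the list.
find demo -type f -not -path '*/.svn/*' -not -path '*/archive/*'
```

Only demo/src/main.c survives the filters; the .svn and archive copies never reach grep at all.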

After that, we just pipe the find results into a basic grep command.
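One refinement worth knowing if your tree has filenames with spaces in them: plain xargs splits on whitespace, so find's "-print0" paired with "xargs -0" is the safe form of the same pipeline. A small sketch (the project name and search string are invented for the example):

```shell
# Hypothetical tree containing a filename with a space in it.
mkdir -p proj/src proj/.svn
echo 'magic_token here' > 'proj/src/my file.c'
echo 'magic_token here' > proj/.svn/entries

# -print0 / xargs -0 delimit filenames with NUL bytes instead of
# whitespace, so "my file.c" is passed to grep as one argument.
find proj -type f -not -path '*/.svn/*' -print0 \
  | xargs -0 grep -ls 'magic_token'
```

Without -print0/-0, grep would be handed "proj/src/my" and "file.c" as two separate (nonexistent) files.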

Important Notes About GREP Options Here:
grep -l "string"   # will only show filenames

grep -s "string"   # will suppress error messages

grep -q "string"   # won't print any results at all
                   # useful with echo $?
                   # if you just want to know whether something exists in a folder

grep -r "string"   # is not only unnecessary here, it is actively harmful.
                   # grep will recurse into any directory names find hands it,
                   # ignoring your -not -path filters and searching those
                   # folders anyway, negating any value we get from find.
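The -q option deserves a quick sketch, since its whole point is the exit status rather than any output (the file and string below are invented for the demo):

```shell
# Hypothetical haystack file.
mkdir -p q-demo
echo 'needle' > q-demo/hay.txt

# grep -q prints nothing; it exits 0 on a match, 1 on no match,
# which is exactly what shell conditionals test.
if grep -q 'needle' q-demo/hay.txt; then
  echo 'found'
fi
```

In a script this means you can branch on the search directly, no output parsing required.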
And that’s it!
Obviously this can be very powerful inside a shell script once you've spent some time crafting the find command's filters.

Written by: Edward Romano

I dabble in, and occasionally obsess over, technology and problems that bug me