Searching Files
Many programs provide facilities for searching files for
groups of characters that match patterns.
These programs include:
- all the standard editors (vi, emacs, ed, ex, sed ...)
- special scripting languages such as awk and perl
- pattern matching commands, namely grep, egrep
and fgrep.
The Grep Family of Commands
- GREP is an acronym -- Global Regular-Expression Print.
- The commands in the family are:
| grep | the basic searching and pattern-matching command. |
| egrep | extended grep, a version of grep that has more powerful pattern matching features. |
| fgrep | fixed string grep, a supposedly faster version of grep where the patterns are restricted to fixed strings. (In practice, egrep is usually faster than fgrep!) |
- Example: a trivial use of grep to find all occurrences of the
identifier Counter in the source code files of our
project:
grep Counter */*.c */*.h
The filename and the text of each line containing the
string Counter are output, once for every matching line. (The line
number is shown as well only when the -n flag is given.)
- The standard Unix usage of
grep/egrep/fgrep is:
grep [flags] pattern file1 file2 ...
or
grep [flags] -e pattern file1 file2 ...
(and similarly for fgrep and egrep).
- AIX, in common with other POSIX-conforming systems, has extended the grep command invocation so that:
grep [flags] -E pattern file1 file2 ...
is equivalent to invoking egrep, and
grep [flags] -F pattern file1 file2 ...
is equivalent to invoking fgrep.
- Patterns should almost always be enclosed in single quotes so
that the csh command shell does not interpret characters like
`$' and `*' in unintended ways.
- The grep regular-expression notation has been borrowed by many
other programs, including non-Unix programs. (Unfortunately the
filename pattern matching facilities in csh use a substantially
different notation.)
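The invocation forms above can be tried out with a short script. This is only a sketch: the scratch directory and the file names (main.c, main.h, neg.c, prices.txt) are invented for illustration, and it assumes a grep that accepts the -e and -F flags described above.

```shell
# Build a scratch project with hypothetical source files.
dir=$(mktemp -d)
mkdir "$dir/src"
printf 'int Counter = 0;\nint total = 0;\n' > "$dir/src/main.c"
printf 'extern int Counter;\n' > "$dir/src/main.h"
printf 'x = -1;\ny = 1;\n' > "$dir/src/neg.c"
printf 'cost is $5.00\ncost is 5x00\n' > "$dir/prices.txt"
cd "$dir"

# Several files are searched, so each matching line is prefixed
# by the name of the file it came from.
plain=$(grep Counter */*.c */*.h)

# -e marks the next argument as the pattern, which is needed when
# the pattern itself begins with a dash.
dash=$(grep -e '-1' src/neg.c)

# -F treats the pattern as a fixed string. The single quotes keep the
# shell away from '$'; -F keeps grep from treating '$' and '.' specially.
fixed=$(grep -F '$5.00' prices.txt)

printf '%s\n' "$plain" "$dash" "$fixed"
```

Only one file is named in the last two commands, so grep leaves off the filename prefix there.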
Regular-Expressions in Grep
The following conventions appear to be common to all the programs
that support regular expression patterns (vi, awk,
perl ...).
- Any ordinary character matches itself
- The special characters are:
. ^ $ [ * \
- The notations \. \^ \$ \[ \* and \\ each match exactly one
occurrence of the special character given after the backslash.
- Special pattern matching notations:
| . | A period matches any single character (but not a newline). |
| ^ | A caret matches an empty string at the beginning of the line. |
| $ | A dollar symbol matches an empty string at the end of the line. |
| [abc] | A set of characters, from which exactly one character will be matched. A shorthand notation exists for a range of characters, e.g. [a-z]. |
| [^abc] | This matches exactly one character, and that character is not contained in the set after the caret symbol. |
| * | An asterisk following a one-character pattern (i.e. an ordinary character, a period or the square bracket set notation) indicates that zero or more repetitions of that one-character pattern are allowed. |
Note that only a subset of the notation has been covered above.
For more complete information, you should consult the on-line
manual pages.
Examples using Basic Grep Patterns
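A few illustrative patterns, tried against a small word list. The list and the variable names are made up for this sketch, and LC_ALL=C is set on the assumption that the range [a-z] should mean plain lowercase ASCII.

```shell
export LC_ALL=C
f=$(mktemp)
printf '%s\n' 'cat' 'cart' 'ct' 'scatter' 'c.t' 'Cat 9' > "$f"

# '.' matches any single character: c, any one character, t.
dot=$(grep 'c.t' "$f")             # cat, scatter, c.t

# '\.' matches only a literal period.
literal=$(grep 'c\.t' "$f")        # c.t

# '^' and '$' tie the pattern to the start and end of the line.
whole=$(grep '^c.t$' "$f")         # cat, c.t

# A bracket set matches exactly one of the listed characters.
either_case=$(grep '^[cC]at' "$f") # cat, Cat 9

# '[^a-z]' matches one character that is not a lowercase letter.
non_letter=$(grep '[^a-z]$' "$f")  # Cat 9

# '*' allows zero or more repetitions of the one-character pattern before it.
star=$(grep '^ca*r*t$' "$f")       # cat, cart, ct

printf '%s\n' "$dot" "$literal" "$whole" "$either_case" "$non_letter" "$star"
rm -f "$f"
```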
Regular Expressions in Egrep
Egrep has an annoyingly different set of pattern matching operations
compared to grep.
It is more powerful: it includes the following operators, which grep
either lacks or provides only in a restricted form.
| * | matches zero or more repetitions of the preceding regular expression. (I.e. we are not constrained to a one-character pattern.) |
| + | matches one or more repetitions of the preceding regular expression. |
| ? | matches zero or one repetitions of the preceding regular expression. |
| | | separates two regular expressions, either of which is matched. |
| ( .. ) | Sub-expressions in a regular expression may be enclosed in parentheses. |
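The extended operators can be sketched as follows. The word list is invented for illustration, and the script uses grep -E, which the section on the grep family describes as equivalent to egrep (an assumption made here so that the sketch also runs where a separate egrep command is not installed).

```shell
f=$(mktemp)
printf '%s\n' 'color' 'colour' 'colouur' 'ab' 'abab' 'gray' 'grey' 'groy' > "$f"

# '?' : the preceding 'u' may appear zero or one times.
optional=$(grep -E '^colou?r$' "$f")   # color, colour

# '+' : the preceding 'u' must appear one or more times.
plus=$(grep -E '^colou+r$' "$f")       # colour, colouur

# '*' applied to a parenthesized sub-expression, not just one character.
repeated=$(grep -E '^(ab)*$' "$f")     # ab, abab

# '|' : either alternative inside the parentheses may match.
either=$(grep -E '^gr(a|e)y$' "$f")    # gray, grey

printf '%s\n' "$optional" "$plus" "$repeated" "$either"
rm -f "$f"
```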
Command-Line Flags
The commonly used flags are:
| -i | ignore case of letters when pattern matching |
| -n | prefix each line containing a match with its line number |
| -v | invert the pattern matching so that lines which do not match the pattern are output by grep/egrep/fgrep. |
| -w | match the pattern only against whole words (i.e. this is equivalent to enclosing the pattern in \< and \> brackets). |
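The effect of each flag can be seen on a four-line sample file. The file contents and variable names are hypothetical, chosen only to make the differences visible.

```shell
f=$(mktemp)
printf '%s\n' 'The cat sat' 'concatenate' 'CAT scan' 'dog' > "$f"

# -i: upper and lower case letters are treated alike.
nocase=$(grep -i 'cat' "$f")     # The cat sat, concatenate, CAT scan

# -n: each matching line is prefixed by its line number.
numbered=$(grep -n 'cat' "$f")   # 1:The cat sat, 2:concatenate

# -v: only the lines that do NOT match are output.
inverted=$(grep -v 'cat' "$f")   # CAT scan, dog

# -w: 'cat' must stand alone as a word, so 'concatenate' is skipped.
words=$(grep -w 'cat' "$f")      # The cat sat

printf '%s\n' "$nocase" "$numbered" "$inverted" "$words"
rm -f "$f"
```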
More Examples of Use
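Flags and patterns are often combined, and the output of one grep can be piped into another. The following is a sketch only: the configuration file and its contents are invented, and grep -E again stands in for egrep.

```shell
conf=$(mktemp)
printf '%s\n' '# Port settings' 'Port 8080' 'host example' \
    '# old port 80' 'backup_port 9090' > "$conf"

# Drop the comment lines first, then look for the whole word 'port'
# in any mixture of case. 'backup_port' is not matched because the
# underscore counts as part of the word.
active=$(grep -v '^#' "$conf" | grep -i -w 'port')

# Number the lines that begin with either keyword followed by a space.
keyed=$(grep -n -i -E '^(port|host) ' "$conf")

printf '%s\n' "$active" "$keyed"
rm -f "$conf"
```

Note that in the first command the second grep sees only the lines that survived the first, so adding -n there would number lines of the filtered stream rather than of the original file.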