The name grep comes from the ed (and vim) command “g/re/p”, which means globally search for a given regular expression and print (display) the matching lines.
Regular Expressions
The grep utilities allow the user to search text files for lines that match a regular expression (regexp). A regular expression is a search string made up of text and one or more of 11 special characters. A simple example is matching the start of a line.
Sample File
The basic form of grep may be used to find simple text within a particular file or files. In order to try the examples, first create the sample file.
Use an editor such as nano or vim to copy the text below into a file called myfile.
xyzde
exyzd
dexyz
d?gxyz
xxz
xzz
x\z
x*z
xz
x z
XYZ
XYYZ
xYz
xyyz
xyyyz
xyyyyz
Although you may copy and paste the examples in the text (note that double quotes may not copy properly), commands need to be typed in order to learn them properly.
Before trying the examples, view the sample file:
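As a sketch, the file can also be created from the command line and then displayed with cat. The printf line is only one possible way of writing the same sixteen lines and is an assumption, not part of the original instructions:

```shell
# Write the sixteen sample lines to myfile (%s prints each argument literally,
# so the backslash in 'x\z' is kept as-is)
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# Display the file
cat myfile
```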
Simple Search
To find the text ‘xyz’ within the file run the following:
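A minimal sketch of the search follows. The first line only recreates myfile so that the block runs on its own:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# Print every line that contains the text xyz (the search is case sensitive)
grep xyz myfile
```

Only the four lines containing lower-case xyz are printed. XYZ and xYz are skipped because grep is case sensitive by default.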
Using Colours
To display matches in colour, use the --color option (note the double hyphen) or simply create an alias. For example:
$ grep --color xyz myfile
or
$ alias grep='grep --color'
$ grep xyz myfile
Options
Common options used with the grep command include:
- -i find all lines irrespective of case
- -c count how many lines contain the text
- -n display line numbers of matching lines
- -l display only file names that match
- -r recursive search of sub-directories
- -v find all lines NOT containing the text
For example:
$ grep -ic xyz myfile # count lines with text
$ grep -in xyz myfile # show line numbers
Create Multiple Files
Before trying to search multiple files, first create several new files:
$ echo -e "xyz\nxzz\nXYZ" > myfile2
$ echo -e "xxx\nyyy" > myfile3
$ cat myfile
$ cat myfile2
$ cat myfile3
Search Multiple Files
To search multiple files using filenames or a wildcard enter:
$ grep -in xyz my*
# match filenames beginning with ‘my’
Exercise I
- First count how many lines there are in the file /etc/passwd.
- Now find all occurrences of the text var in the file /etc/passwd.
- Find how many lines in the file contain the text var.
- Find how many lines do NOT contain the text var.
- Find the entry for your login in the /etc/passwd file.
Exercise solutions can be found at the end of this article.
Using Regular Expressions
The command grep may also be used with regular expressions by using one or more of eleven special characters or symbols to refine the search. A regular expression is a character string that includes special characters to allow pattern matching within utilities such as grep, vim and sed. Note that the pattern should be enclosed in quotes so that the shell does not interpret the special characters itself.
The special characters available include:
^ | Start of a line |
$ | End of a line |
. | Any character (except \n newline) |
* | 0 or more of previous expression |
\ | Preceding a symbol makes it a literal character |
Note that the *, which may be used at the command line to match any number of characters including none, is not used in the same way here.
Also note the use of quotes in the following examples.
Examples
To find all lines starting with text using the ^ character:
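A hedged sketch using the sample file (the printf line only recreates myfile so the block runs on its own):

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# ^ anchors the pattern to the start of the line
grep '^xyz' myfile
```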
To find all lines ending with text using the $ character:
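One way this might look against the sample file, which is recreated first so the block stands alone:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# $ anchors the pattern to the end of the line; single quotes stop the shell touching the $
grep 'xyz$' myfile
```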
To find lines consisting only of a string, using both the ^ and $ characters:
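As an illustration, the sketch below looks for lines that are exactly xz. The choice of xz is an assumption made because the sample file has no line that is exactly xyz:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# With both anchors the whole line must be xz, nothing before or after
grep '^xz$' myfile
```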
To find lines using the . to match any character:
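A small sketch of the dot in use. The anchors are added here only to keep the output short:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# Match three-character lines: x, then exactly one character of any kind, then z
grep '^x.z$' myfile
```

The line xz is not printed, because the dot must match exactly one character.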
To find lines using the * to match 0 or more of the previous expression:
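The following sketch shows the difference from the shell wildcard: here * applies to the y before it, so zero y characters also count as a match:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# x, then zero or more y characters, then z, and nothing else on the line
grep '^xy*z$' myfile
```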
To find lines using .* to match 0 or more of any character:
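A possible example, again against the recreated sample file:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# Lines that start with x and end with z, with anything (or nothing) in between
grep '^x.*z$' myfile
```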
To find lines using the \ to escape the * character:
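A sketch of the escape in action. Without the backslash the * would be treated as a quantifier rather than a literal star:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# \* matches a literal asterisk
grep 'x\*z' myfile
```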
To find the \ character use:
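The backslash has to escape itself, as this short sketch shows. Single quotes are used so that the shell passes both backslashes through to grep unchanged:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# \\ in the pattern matches one literal backslash
grep 'x\\z' myfile
```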
Extended grep – egrep
The grep command supports only basic regular expressions by default. However, the command egrep:
- allows the use of extended regular expressions
- may simultaneously search for more than one expression
Note that the expressions must be enclosed within a pair of quotes.
To use colours, use --color or again create an alias:
$ alias egrep='egrep --color'
In order to search for more than one regex the egrep command may be written over multiple lines. However, this can also be done using these special characters:
| | Alternation, either one or the other |
(…) | Logical grouping of part of an expression |
This extracts the lines which begin with root, uucp or mail from the file, the | symbol meaning either of the options.
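Such a search might be sketched as follows. To keep the block runnable anywhere it uses a small hypothetical file, passwd.sample, with made-up entries in place of the real /etc/passwd, and it uses grep -E, which accepts the same patterns as egrep:

```shell
# Hypothetical stand-in for /etc/passwd (entries are illustrative only)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'mail:x:8:8:mail:/var/mail:/usr/sbin/nologin' \
  'uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin' > passwd.sample

# Form 1: one pattern per line inside the quotes; a line matching any of them is printed
grep -E '^root
^uucp
^mail' passwd.sample

# Form 2: the same search on one line using ( ) for grouping and | for alternation
grep -E '^(root|uucp|mail)' passwd.sample
```

Both forms print the root, mail and uucp entries and skip daemon.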
The following command will not work, although no error message is displayed, because the basic grep command treats (, | and ) as literal characters rather than as grouping and alternation:
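A sketch of the failing search, using a hypothetical passwd.sample file with made-up entries instead of the real /etc/passwd:

```shell
# Hypothetical stand-in for /etc/passwd (entries are illustrative only)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'mail:x:8:8:mail:/var/mail:/usr/sbin/nologin' \
  'uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin' > passwd.sample

# Basic grep looks for the literal text "(root|uucp|mail)" at the start of a line,
# finds nothing and exits with status 1 without printing an error
grep '^(root|uucp|mail)' passwd.sample || echo "no lines matched"
```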
However, on most Linux systems the command grep -E is the same as using egrep, and recent GNU grep releases mark egrep as obsolescent in favour of grep -E:
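For instance, the search for several account names can be written with grep -E. The file passwd.sample and its entries are hypothetical and stand in for /etc/passwd:

```shell
# Hypothetical stand-in for /etc/passwd (entries are illustrative only)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'mail:x:8:8:mail:/var/mail:/usr/sbin/nologin' > passwd.sample

# -E switches grep to extended regular expressions, so ( ) and | are special
grep -E '^(root|uucp|mail)' passwd.sample
```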
Using Filters
Piping is the process of sending the output of one command as input into another command and is one of the most powerful Linux tools available.
Commands that appear in a pipeline are often referred to as filters since in many cases they sift through or modify the input passed to them before sending the modified stream to standard output.
In the following example, standard output from ls -l is passed as standard input to the grep command. Output from the grep command is then passed as input to the more command.
This will display only directories in /etc:
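On a real system the pipeline would typically be ls -l /etc | grep '^d' | more. The sketch below applies the same filter to a scratch directory instead, so that the result is predictable. The names subdir and notes.txt are made up for the example:

```shell
# Build a scratch directory containing one sub-directory and one ordinary file
dir=$(mktemp -d)
mkdir "$dir/subdir"
touch "$dir/notes.txt"

# In a long listing, directory entries start with the letter d,
# so grep '^d' keeps only those lines
ls -l "$dir" | grep '^d'
```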
The following commands are examples of using filters:
Sample File
In order to try the review exercise, first create the following sample file.
Use an editor such as nano or vim to copy the text below into a file called people:
Personal E.Smith 25400
Training A.Brown 27500
Training C.Browen 23400
(Admin) R.Bron 30500
Goodsout T.Smyth 30000
Personal F.Jones 25000
training* C.Evans 25500
Goodsout W.Pope 30400
Groundfloor T.Smythe 30500
Personal J.Maler 33000
Exercise II
- Display the file people and examine its contents.
- Find all lines containing the string Smith in the file people. Hint: use the command grep but remember that, by default, it is case sensitive.
- Create a new file, npeople, containing all lines beginning with the string Personal in the people file. Hint: use the command grep with >.
- Confirm the contents of the file npeople by listing the file.
- Now append all lines where the text ends with the string 500 in the file people to the file npeople. Hint: use the command grep with >>.
- Again, confirm the contents of the file npeople by listing the file.
- Find the IP Address of the server which is stored in the file /etc/hosts. Hint: use the command grep with $(hostname).
- Use egrep to extract from the /etc/passwd file account lines containing lp or your own user id.
Exercise solutions can be found at the end of this article.
More Regular Expressions
A regular expression can be thought of as wildcards on steroids.
There are eleven characters with special meanings: the opening and closing square brackets [ ], the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening and closing round brackets ( ) and the opening and closing curly braces { }. These special characters are also often called metacharacters.
Here is the full set of special characters:
^ | Start of a line |
$ | End of a line |
. | Any character (except \n newline) |
* | 0 or more of previous expression |
| | Alternation, either one or the other |
[…] | Explicit set of characters to match |
+ | 1 or more of previous expression |
? | 0 or 1 of previous expression |
\ | Preceding a symbol makes it a literal character |
{…} | Explicit quantifier notation |
(…) | Logical grouping of part of an expression |
The default version of grep has only limited regular expression support. In order for all of the following examples to work, use egrep or grep -E instead.
To find lines using the | to match either expression:
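A minimal sketch, written with grep -E (which accepts the same patterns as egrep) and the recreated sample file:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# Print lines containing either xxz or xzz
grep -E 'xxz|xzz' myfile
```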
To find lines using | to match either expression within a string also use ( ):
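The sketch below limits the alternation to the middle character by wrapping it in ( ). It assumes grep -E in place of egrep:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# x, then either x or z, then z
grep -E 'x(x|z)z' myfile
```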
To find lines using [ ] to match any one character from a set:
To find lines using [^ ] to match any one character NOT in the set:
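As an illustration, a set can pick out both the lower-case and upper-case y. The sketch uses grep -E in place of egrep:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# x, then one character from the set y or Y, then z
grep -E 'x[yY]z' myfile
```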
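A caret placed directly after the opening bracket negates the set, as this sketch with grep -E suggests:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# x, then any one character except lower-case y, then z
grep -E 'x[^y]z' myfile
```

Note that xz is not printed: the negated set still has to match exactly one character.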
To find lines using the * to match 0 or more of the previous expression:
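The same quantifier works unchanged with extended expressions. This sketch drops the anchors, so xy*z may appear anywhere in the line:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# x, zero or more y characters, then z, anywhere in the line
grep -E 'xy*z' myfile
```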
To find lines using the + to match 1 or more of the previous expression:
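A sketch with grep -E follows. Unlike *, the + requires at least one y, so the lines xz, xxz and xzz drop out:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# x, one or more y characters, then z
grep -E 'xy+z' myfile
```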
To find lines using the ? to match 0 or 1 of the previous expression:
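With ? the y becomes optional but may not repeat, as this sketch with grep -E illustrates:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# x, at most one y, then z; xyyz and longer runs of y do not match
grep -E 'xy?z' myfile
```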
Exercise III
- Find all lines containing the names Evans or Maler in the file people.
- Find all lines containing the names Smith, Smyth or Smythe in the file people.
- Find all lines containing the names Brown, Browen or Bron in the file people.
If you have time:
- Find the line containing the string (admin), including the brackets, in the file people.
- Find the line containing the character * in the file people.
- Combine the two previous searches to find both expressions with a single command.
More Examples
To find lines using . and * to match any set of characters:
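One possible example, picking out lines that start with d and end with z whatever lies in between (the choice of d is only for illustration):

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# d at the start, z at the end, any characters between them
grep -E '^d.*z$' myfile
```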
To find lines using { } to match the previous expression exactly N times:
$ egrep '^xy{4}z' myfile
To find lines using { } to match N or more times:
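A sketch of the open-ended form, where a comma after the number means "or more". It uses grep -E in place of egrep:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# x at the start of the line, two or more y characters, then z
grep -E '^xy{2,}z' myfile
```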
To find lines using { } to match at least N times but not more than M times:
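The bounded form might look like this, again with grep -E. The limits 2 and 3 are chosen only to suit the sample file:

```shell
# Recreate the sample file so this block is self-contained
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile
# x at the start of the line, between two and three y characters, then z
grep -E '^xy{2,3}z' myfile
```

The line xyyyyz is left out because it has four y characters.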
Conclusion
In this tutorial we first looked at using grep in its simple form to find text in a file or in multiple files. We then combined the text to be searched for with simple regular expressions and then more complex ones using egrep.
Next Steps
I hope you will put the knowledge gained here to good use. Try out grep commands on your own data and remember, regular expressions as described here can be used in much the same form in vi, sed and awk, although each tool has its own variations in syntax!
Exercise Solutions
Exercise I
First count how many lines there are in the file /etc/passwd.
$ wc -l /etc/passwd
Now find all occurrences of the text var in the file /etc/passwd.
$ grep var /etc/passwd
Find how many lines in the file contain the text var.
$ grep -c var /etc/passwd
Find how many lines do NOT contain the text var.
$ grep -vc var /etc/passwd
Find the entry for your login in the /etc/passwd file.
$ grep kdm /etc/passwd # replace kdm with your own login name
Exercise II
Display the file people and examine its contents.
$ cat people
Find all lines containing the string Smith in the file people.
$ grep 'Smith' people
Create a new file, npeople, containing all lines beginning with the string Personal in the people file
$ grep '^Personal' people > npeople
Confirm the contents of the file npeople by listing the file.
$ cat npeople
Now append all lines where the text ends with the string 500 in the file people to the file npeople.
$ grep '500$' people >> npeople
Again, confirm the contents of the file npeople by listing the file.
$ cat npeople
Find the IP Address of the server which is stored in the file /etc/hosts.
$ grep "$(hostname)" /etc/hosts
Use egrep to extract from the /etc/passwd file account lines containing lp or your own user id.
$ egrep '(lp|kdm:)' /etc/passwd
Exercise III
Find all lines containing the names Evans or Maler in the file people.
$ egrep 'Evans|Maler' people
Find all lines containing the names Smith, Smyth or Smythe in the file people.
$ egrep 'Sm(i|y)the?' people
Find all lines containing the names Brown, Browen or Bron in the file people.
$ egrep 'Brow?e?n' people
Find the line containing the string (admin), including the brackets, in the file people.
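One possible answer is sketched below. The brackets are escaped so that they are treated as literal characters, and -i is added because the file stores the text as (Admin) with a capital A. The printf line only recreates the people file so the block runs on its own:

```shell
# Recreate the people file so this block is self-contained
printf '%s\n' 'Personal E.Smith 25400' 'Training A.Brown 27500' 'Training C.Browen 23400' \
  '(Admin) R.Bron 30500' 'Goodsout T.Smyth 30000' 'Personal F.Jones 25000' \
  'training* C.Evans 25500' 'Goodsout W.Pope 30400' 'Groundfloor T.Smythe 30500' \
  'Personal J.Maler 33000' > people
# \( and \) match literal brackets; -i ignores the difference between admin and Admin
grep -Ei '\(admin\)' people
```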
Find the line containing the character * in the file people.
$ egrep '\*' people
Combine the two previous searches to find both expressions with a single command.
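A sketch of a combined search, joining the two escaped patterns with |. It uses grep -E in place of egrep, plus -i because the file stores the text as (Admin):

```shell
# Recreate the people file so this block is self-contained
printf '%s\n' 'Personal E.Smith 25400' 'Training A.Brown 27500' 'Training C.Browen 23400' \
  '(Admin) R.Bron 30500' 'Goodsout T.Smyth 30000' 'Personal F.Jones 25000' \
  'training* C.Evans 25500' 'Goodsout W.Pope 30400' 'Groundfloor T.Smythe 30500' \
  'Personal J.Maler 33000' > people
# Either the literal text (admin) in any case, or a literal asterisk
grep -Ei '\(admin\)|\*' people
```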