grep and Regular Expressions
Here is grep's usage.
unix> grep search_string file_name
The argument file_name is optional but nearly always used; when it is omitted, grep reads from stdin. The grep command puts every line containing search_string to stdout.
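To make that behavior concrete, here is a minimal Python sketch of the same idea. It assumes a plain substring test (real grep treats search_string as a pattern, which we get to shortly), and the name grep_lines is made up for illustration.

```python
import io


def grep_lines(search_string, lines):
    """Yield each line that contains search_string (plain substring test)."""
    for line in lines:
        if search_string in line:
            yield line


# Any iterable of lines works: an open file, sys.stdin, or an in-memory buffer.
sample = io.StringIO("apple pie\nbanana split\npineapple juice\n")
matches = list(grep_lines("apple", sample))
print("".join(matches), end="")
```

Passing sys.stdin as the second argument mimics what grep does when file_name is left off.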
Open the man page for grep. Look at the options. Which ones do you think will be handy? I put one in the table below; let's discuss others.
Option | Action |
---|---|
-i, --ignore-case | ignores case |
Wait!!!! But there's more!
There is a language for textual patterns called regular expressions. We can use this along with grep as an immensely powerful search tool. You can also use regular expressions in vi to search in a file. We are going to learn how to use this tool today.
Character Classes
These are character wildcards and are the "bricks" of regular expressions.
Download the file sampler.txt at the left; we will use it to do some spelunking.
Regexes level 1: Juxtaposition
The regex101 site is an excellent tool for practicing regexes and for debugging.
Kris Jordan on regular expressions
Special Character Classes
Pattern | Meaning |
---|---|
^ | start of line |
$ | end of line |
. | any one character except \n |
\d | any decimal digit |
\s | whitespace (\n, \t, " ", \r) |
\b | word boundary |
clue: butter g__t
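Here is a small sketch of the special character classes at work, using Python's re module. The sample text is invented for illustration, and the clue is written as g..t, with one dot per blank.

```python
import re

text = "line 1: goat\nline 22: gnat boat"

# ^ anchors to the start of each line when re.MULTILINE is set.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# \d matches one decimal digit.
digits = re.findall(r"\d", text)

# \b is a word boundary; each . stands for exactly one character.
clue_hits = re.findall(r"\bg..t\b", text)

# \s matches one whitespace character, including the newline.
pieces = re.split(r"\s", text)

print(line_starts, digits, clue_hits, pieces)
```

Without the \b on each side, g..t could also match inside a longer word.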
All multiplicity operators are postfix unary operators. Multiplicity has precedence over juxtaposition; override with ().
Operator | Matches |
---|---|
{2} | the pattern occurring twice |
{n} | the pattern occurring n times |
+ | one or more of the pattern |
? | once or nonce |
* | zero or more of the pattern |
Piece | Pattern |
---|---|
BEGIN | ^ |
a possible + or - | [+-]? |
a sequence of one or more digits | \d+, or [0-9]+ |
END | $ |
^[+-]?\d+$
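A quick way to try this pattern out is a Python sketch like the one below. The function name is_signed_int is made up for illustration; the regex is the one above.

```python
import re

# Optional sign, then one or more digits, anchored at both ends.
SIGNED_INT = re.compile(r"^[+-]?\d+$")


def is_signed_int(s):
    """Return True if s looks like an integer with an optional sign."""
    return SIGNED_INT.search(s) is not None


for candidate in ["42", "-7", "+0", "", "+", "4.2", "--3", "12a"]:
    print(repr(candidate), is_signed_int(candidate))
```

One caveat: in Python, $ also matches just before a trailing newline, so "42\n" passes this test. Use SIGNED_INT.fullmatch (and drop the anchors) if that matters.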
| is or
Numeral | Value |
---|---|
M | 1000 |
D | 500 |
C | 100 |
L | 50 |
X | 10 |
V | 5 |
I | 1 |
CM | 900 |
CD | 400 |
XC | 90 |
XL | 40 |
IX | 9 |
IV | 4 |
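The table suggests a regex for Roman numerals built from or and multiplicity. The pattern below is one possible answer, not necessarily the one worked out in class; it assumes standard subtractive notation and lets M repeat any number of times. The name is_roman is made up for illustration.

```python
import re

# Thousands, then hundreds, tens, and units, each group offering the
# subtractive forms (CM, CD, ...) or the additive form (D?C{0,3}, ...).
ROMAN = re.compile(r"M*(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def is_roman(s):
    """Return True if s is a well-formed Roman numeral."""
    # Every piece of the pattern is optional, so the empty string would
    # match; rule it out first. fullmatch plays the role of ^ ... $.
    return bool(s) and ROMAN.fullmatch(s) is not None


for candidate in ["MCMXCIV", "XLII", "MMXXIV", "IIII", "IC", "VX", ""]:
    print(repr(candidate), is_roman(candidate))
```

Notice how each | is fenced in by parentheses so it only chooses among the forms for one decimal place.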
Regexes level 2: Multiplicity
These are all postfix unary operators with precedence over juxtapositions. Use () to override the order of operations.
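Operator | Meaning |
---|---|
{n} | Exactly n times |
{m,n} | At least m, but not more than n times |
* | Zero or more times |
+ | One or more times |
? | Zero or one time |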
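The precedence rule is easiest to see by experiment. This sketch uses Python's re.fullmatch, which succeeds only if the whole string matches; the test strings are invented for illustration.

```python
import re


def matches(pattern, s):
    """True if the entire string s matches pattern."""
    return re.fullmatch(pattern, s) is not None


# Multiplicity binds tighter than juxtaposition: in ab+ the + applies to b only.
tight = matches(r"ab+", "abbb")
not_repeated = matches(r"ab+", "abab")
# Parentheses override the order of operations.
grouped = matches(r"(ab)+", "abab")

exactly_three = matches(r"\d{3}", "123")
two_to_four = [s for s in ["a", "aa", "aaaa", "aaaaa"] if matches(r"a{2,4}", s)]
optional_u = [w for w in ["color", "colour", "colouur"] if matches(r"colou?r", w)]
star_allows_zero = matches(r"ab*", "a")

print(tight, not_repeated, grouped, exactly_three)
print(two_to_four, optional_u, star_allows_zero)
```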
Orring
The | symbol does or. Bound its enthusiasm with parentheses.
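To see why the parentheses matter, compare an unbounded | with a bounded one. This is a small Python sketch with a made-up word list.

```python
import re

words = ["gray", "grey", "gra", "ey", "graey"]

# Without parentheses, | splits the whole pattern: "gra" or "ey".
unbounded = [w for w in words if re.fullmatch(r"gra|ey", w)]

# With parentheses, | chooses only between "a" and "e".
bounded = [w for w in words if re.fullmatch(r"gr(a|e)y", w)]

print(unbounded)
print(bounded)
```

Or has lower precedence than juxtaposition, so without parentheses it swallows everything on either side of it.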
A Python or Java Warmup
Write cat in Python or Java. This program takes one or more files and puts them to stdout. If no file is specified, have it use stdin (sys.stdin in Python) as a file. Can you make it behave like UNIX's cat?
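If you get stuck, here is one possible Python sketch of the core. The function name cat and its out parameter are choices made for this illustration; error handling and cat's options are left out.

```python
import sys


def cat(paths, out=None):
    """Copy each named file to out, in order; read stdin if none are named."""
    if out is None:
        out = sys.stdout
    if not paths:
        # No file specified: treat stdin as the file.
        for line in sys.stdin:
            out.write(line)
        return
    for path in paths:
        with open(path) as f:
            for line in f:
                out.write(line)
```

To use it as a program, call cat(sys.argv[1:]) at the bottom of the script. UNIX's cat also treats a file name of - as stdin, which this sketch does not attempt.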
Using Regexes in Python
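As a starting point, here is a minimal sketch of a grep-like filter built on Python's re module. The function name grep_regex and its ignore_case parameter (standing in for grep's -i) are made up for illustration, and the sample lines are invented.

```python
import re


def grep_regex(pattern, lines, ignore_case=False):
    """Return the lines in which pattern matches somewhere."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)  # compile once, reuse for every line
    return [line for line in lines if regex.search(line)]


lines = ["Butter the toast", "a goat butts", "GNAT", "great"]
hits = grep_regex(r"\bg..t\b", lines)
hits_i = grep_regex(r"\bg..t\b", lines, ignore_case=True)
print(hits)
print(hits_i)
```

Note the r in front of each pattern: a raw string keeps Python from interpreting the backslashes before the regex engine sees them.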
Using Regexes in Java
sort
uniq
tr
tee What does this do?
fold
nl