Contents
- Introduction
- What Are Regular Expressions?
- grep
- Metacharacters (Escaped With \)
- Anchors
- Character Classes
- POSIX
- Quantifiers
- Groups And Ranges
- Special Characters
- String Replacement
- Assertions
- Summary
Introduction
Text data plays an important role on all Unix-like systems, such as Linux, and a whole family of command line tools exists to manipulate it. But before we can fully appreciate all the features offered by these tools, we first have to examine a technology that is frequently associated with their most sophisticated uses: regular expressions.
What Are Regular Expressions?
Simply put, regular expressions are symbolic notations used to identify patterns in text. In some ways, they resemble the shell’s wildcard method of matching file and pathnames but on a much grander scale. Regular expressions are supported by many command line tools and by most programming languages to facilitate the solution of text manipulation problems. However, to further confuse things, not all regular expressions are the same; they vary slightly from tool to tool and from programming language to language. For our discussion, we will limit ourselves to regular expressions as described in the POSIX standard (which will cover most of the command line tools), as opposed to many programming languages (most notably Perl), which use slightly larger and richer sets of notations.
grep
The main program we will use to work with regular expressions is our old pal grep. The name grep is actually derived from the phrase “global regular expression print,” so we can see that grep has something to do with regular expressions. In essence, grep searches text files for text matching a specified regular expression and outputs any line containing a match to standard output.
Here is a list of commonly used grep options:
-i
- Ignore case. Do not distinguish between uppercase and lowercase characters.
-v
- Invert match. Normally, grep prints lines that contain a match. This option causes grep to print every line that does not contain a match.
-c
- Print the number of matching lines (or non-matching lines if the -v option is also specified) instead of the lines themselves.
-l
- Print the name of each file that contains a match instead of the lines themselves.
-L
- Like the -l option, but print only the names of files that do not contain matches.
-n
- Prefix each matching line with the number of the line within the file.
-h
- For multifile searches, suppress the output of filenames.
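As a minimal sketch of these options in action, the script below builds two small sample files in a temporary directory and runs grep once per option, capturing each result in a variable. The file names and contents are invented for illustration, and the -L and -h options are assumed to be available (they are in GNU and BSD grep).

```shell
#!/bin/sh
# Sample files and their contents are invented for illustration.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'Apple pie\nbanana split\napple tart\n' > fruits.txt
printf 'carrot cake\n' > veg.txt

# -i: ignore case, so "Apple pie" and "apple tart" both match
ignore_case=$(grep -i 'apple' fruits.txt)

# -v: keep only the lines that do NOT contain lowercase "apple"
inverted=$(grep -v 'apple' fruits.txt)

# -c: count matching lines instead of printing them
count=$(grep -c 'apple' fruits.txt)

# -l / -L: list the files with / without a match
with_match=$(grep -l 'apple' fruits.txt veg.txt)
without_match=$(grep -L 'apple' fruits.txt veg.txt)

# -n: prefix each matching line with its line number
numbered=$(grep -n 'apple' fruits.txt)

# -h: search several files but leave out the "filename:" prefix
no_names=$(grep -h 'e' fruits.txt veg.txt)

printf '%s\n' "$ignore_case" "$inverted" "$count" \
    "$with_match" "$without_match" "$numbered" "$no_names"
```

Note that without -i the pattern is case-sensitive, which is why -c reports a single matching line here even though two lines mention apples.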
Metacharacters (Escaped With \)
^
[
.
$
{
*
(
\
+
)
|
?
<
>
Anchors
^
- Start of string, or start of line in multi-line pattern
\A
- Start of string
$
- End of string, or end of line in multi-line pattern
\Z
- End of string
\b
- Word boundary
\B
- Not word boundary
\<
- Start of word
\>
- End of word
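The line anchors and word anchors can be tried directly with grep. The sketch below counts how many lines of an invented word list match each anchored pattern. It assumes a grep that supports the \< and \> word anchors (GNU grep does; they are an extension rather than part of POSIX), and it leaves out \A and \Z, which belong to Perl-style engines.

```shell
#!/bin/sh
# Sample words are invented for illustration.
words=$(printf 'cat\nconcat\ncatalog\nthe cat sat\n')

# ^ ties the match to the start of the line: "cat" and "catalog"
starts=$(printf '%s\n' "$words" | grep -c '^cat')

# $ ties the match to the end of the line: "cat" and "concat"
ends=$(printf '%s\n' "$words" | grep -c 'cat$')

# Both anchors together match only a line that is exactly "cat"
exact=$(printf '%s\n' "$words" | grep -c '^cat$')

# \< and \> match "cat" only as a whole word (assumed GNU extension):
# "cat" and "the cat sat", but not "concat" or "catalog"
whole_word=$(printf '%s\n' "$words" | grep -c '\<cat\>')

echo "$starts $ends $exact $whole_word"
```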
Character Classes
\c
- Control character
\s
- White space
\S
- Not white space
\d
- Digit
\D
- Not digit
\w
- Word
\W
- Not word
\x
- Hexadecimal digit
\O
- Octal digit
POSIX
[:upper:]
- Upper case letters
[:lower:]
- Lower case letters
[:alpha:]
- All letters
[:alnum:]
- Digits and letters
[:digit:]
- Digits
[:xdigit:]
- Hexadecimal digits
[:punct:]
- Punctuation
[:blank:]
- Space and tab
[:space:]
- All whitespace characters (space, tab, newline, and so on)
[:cntrl:]
- Control characters
[:graph:]
- Printed characters
[:print:]
- Printed characters and spaces
[:word:]
- Digits, letters and underscore (a non-POSIX extension found in Perl-style engines)
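One detail that is easy to miss is that a POSIX class name is only valid inside a bracket expression, so it is written with doubled brackets, as in [[:upper:]]. The following sketch counts matching lines in an invented sample, using grep -E wherever the + quantifier is needed.

```shell
#!/bin/sh
# Sample lines are invented for illustration.
sample=$(printf 'Hello\nworld\n42\nfoo bar\nA1\n')

# Lines that begin with an upper case letter: "Hello" and "A1"
upper_start=$(printf '%s\n' "$sample" | grep -c '^[[:upper:]]')

# Lines made up of digits only: "42"
digits_only=$(printf '%s\n' "$sample" | grep -Ec '^[[:digit:]]+$')

# Lines that contain a space or a tab: "foo bar"
has_blank=$(printf '%s\n' "$sample" | grep -c '[[:blank:]]')

# Lines made up of letters and digits only: everything except "foo bar"
alnum_only=$(printf '%s\n' "$sample" | grep -Ec '^[[:alnum:]]+$')

echo "$upper_start $digits_only $has_blank $alnum_only"
```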
Quantifiers
*
- 0 or more
+
- 1 or more
?
- 0 or 1
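A quick way to see the difference between the three quantifiers is to apply each one to the same element. The sketch below uses grep -E, because in basic regular expressions an unescaped + or ? is an ordinary character; the sample strings are invented for illustration.

```shell
#!/bin/sh
# Sample strings are invented for illustration.
sample=$(printf 'ac\nabc\nabbc\n')

# b* : zero or more b's, so "ac", "abc" and "abbc" all match
star=$(printf '%s\n' "$sample" | grep -Ec '^ab*c$')

# b+ : one or more b's, so "ac" no longer matches
plus=$(printf '%s\n' "$sample" | grep -Ec '^ab+c$')

# b? : zero or one b, so "abbc" no longer matches
optional=$(printf '%s\n' "$sample" | grep -Ec '^ab?c$')

echo "$star $plus $optional"
```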
Groups And Ranges
.
- Any character except new line (\n)
(a|b)
- a or b
(...)
- Group
(?:...)
- Passive (non-capturing) group
[abc]
- Range (a or b or c)
[^abc]
- Not (a or b or c)
[a-q]
- Lower case letter from a to q
[A-Q]
- Upper case letter from A to Q
[0-7]
- Digit from 0 to 7
\x
- Group/subpattern number “x”
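Most of these constructs work in grep as written. The sketch below runs them against an invented word list: alternation needs grep -E, while the back-reference is written in basic syntax as \(...\) followed by \1. The non-capturing (?:...) form is a Perl-style feature and is not attempted here.

```shell
#!/bin/sh
# Sample words are invented for illustration.
sample=$(printf 'gray\ngrey\ngriy\nbook\n7up\n9am\n')

# (a|e): alternation inside a group matches "gray" and "grey"
alternation=$(printf '%s\n' "$sample" | grep -Ec '^gr(a|e)y$')

# [ae]: a bracket expression gives the same result for single characters
bracket=$(printf '%s\n' "$sample" | grep -c '^gr[ae]y$')

# [^ae]: any character except a or e, so only "griy"
negated=$(printf '%s\n' "$sample" | grep -c '^gr[^ae]y$')

# . : any single character, so "gray", "grey" and "griy"
any_char=$(printf '%s\n' "$sample" | grep -c '^gr.y$')

# [0-7]: a digit from 0 to 7 at the start, so "7up" but not "9am"
digit_range=$(printf '%s\n' "$sample" | grep -c '^[0-7]')

# \1: back-reference to group 1, here any doubled character ("book")
doubled=$(printf '%s\n' "$sample" | grep -c '\(.\)\1')

echo "$alternation $bracket $negated $any_char $digit_range $doubled"
```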
Special Characters
\n
- New line
\r
- Carriage return
\t
- Tab
\v
- Vertical tab
\f
- Form feed
\xxx
- Octal character xxx
\xhh
- Hex character hh
String Replacement
$n
- nth non-passive group
$2
- “xyz” in /^(abc(xyz))$/
$1
- “xyz” in /^(?:abc)(xyz)$/
$`
- Before matched string
$'
- After matched string
$+
- Last matched string
$&
- Entire matched string
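The $n notation above is the Perl-style spelling. grep itself does not perform replacement, so the sketch below uses sed as a stand-in (an assumption, since sed is not otherwise covered here). In sed the nth group is written \n and the entire match is written &. The -E option, assumed to be available as it is in GNU and BSD sed, selects extended syntax so the parentheses need no escaping.

```shell
#!/bin/sh
# The input string mirrors the /^(abc(xyz))$/ example above.

# \2 is the second (inner) group, counted by opening parenthesis
inner=$(printf 'abcxyz\n' | sed -E 's/^(abc(xyz))$/\2/')

# \1 is the first (outer) group, which here spans the whole string
outer=$(printf 'abcxyz\n' | sed -E 's/^(abc(xyz))$/\1/')

# & stands for the entire matched string
wrapped=$(printf 'abcxyz\n' | sed -E 's/xyz/[&]/')

echo "$inner $outer $wrapped"
```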
Assertions
?=
- Lookahead assertion
?!
- Negative lookahead
?<=
- Lookbehind assertion
?!= or ?<!
- Negative lookbehind
?>
- Once-only Subexpression
?()
- Condition [if then]
?()|
- Condition [if then else]
?#
- Comment
Summary
In this chapter, we saw a few of the many uses of regular expressions. We can find even more if we use regular expressions to search for additional applications that use them.