- What Are Regular Expressions?
- Metacharacters (Escaped With \)
- Character Classes
- Groups And Ranges
- Special Characters
- String Replacement
Text data plays an important role on all Unix-like systems, such as Linux, and many command line tools exist to manipulate it. But before we can fully appreciate all the features offered by these tools, we first have to examine a technology that is frequently associated with their most sophisticated uses: regular expressions.
What Are Regular Expressions?
Simply put, regular expressions are symbolic notations used to identify patterns in text. In some ways, they resemble the shell’s wildcard method of matching file and pathnames but on a much grander scale. Regular expressions are supported by many command line tools and by most programming languages to facilitate the solution of text manipulation problems. However, to further confuse things, not all regular expressions are the same; they vary slightly from tool to tool and from programming language to language. For our discussion, we will limit ourselves to regular expressions as described in the POSIX standard (which will cover most of the command line tools), as opposed to many programming languages (most notably Perl), which use slightly larger and richer sets of notations.
The main program we will use to work with regular expressions is our old pal grep. The name grep is actually derived from the phrase "global regular expression print," so we can see that grep has something to do with regular expressions. In essence, grep searches text files for text matching a specified regular expression and outputs any line containing a match to standard output.
Here is a list of commonly used grep options:
-i- Ignore case. Do not distinguish between uppercase and lowercase characters.
-v- Invert match. Normally, grep prints lines that contain a match. This option causes grep to print every line that does not contain a match.
-c- Print the number of matching lines (or non-matching lines if the -v option is also specified) instead of the lines themselves.
-l- Print the name of each file that contains a match instead of the lines themselves.
-L- Like the -l option, but print only the names of files that do not contain matches.
-n- Prefix each matching line with the number of the line within the file.
-h- For multifile searches, suppress the output of filenames.
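To make the effect of these options concrete, here is a minimal sketch in Python rather than grep itself. The `grep` function below is a hypothetical stand-in written for illustration, its parameter names are our own, and Python's `re` module uses a Perl-style dialect rather than strict POSIX syntax. It mimics only the -i, -v, -c, and -n options on a list of lines.

```python
import re

def grep(pattern, lines, ignore_case=False, invert=False,
         count=False, line_numbers=False):
    """Hypothetical helper that mimics grep's -i, -v, -c and -n options."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    selected = []
    for number, line in enumerate(lines, start=1):
        matched = regex.search(line) is not None
        if matched != invert:  # invert=True selects the non-matching lines
            selected.append(f"{number}:{line}" if line_numbers else line)
    # Like -c, report how many lines were selected instead of the lines.
    return len(selected) if count else selected

lines = ["Zip archive", "bzip2 utility", "gunzip tool", "tar archive"]

print(grep("zip", lines))                     # plain search
print(grep("zip", lines, ignore_case=True))   # like -i
print(grep("zip", lines, invert=True))        # like -v
print(grep("zip", lines, count=True))         # like -c
print(grep("zip", lines, line_numbers=True))  # like -n
```

Note that the count is a count of lines, not of individual matches, which is also how the -c option behaves.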
Metacharacters (Escaped With \)
^- Start of string, or start of line in multi-line pattern
\A- Start of string
$- End of string, or end of line in multi-line pattern
\Z- End of string
\b- Word boundary
\B- Not word boundary
\<- Start of word
\>- End of word
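The anchors above can be tried out with a short sketch. It uses Python's `re` module as an assumed test bed, not grep; Python does not support \< and \>, so word edges are shown with \b and \B only. The sample text and variable names are made up for illustration.

```python
import re

text = "cat concat\ncatalog cat"

# Without MULTILINE, ^ and $ anchor to the whole string only.
start_plain = re.findall(r"^cat", text)
start_multi = re.findall(r"^cat", text, re.MULTILINE)
end_multi = re.findall(r"cat$", text, re.MULTILINE)

# \A and \Z ignore MULTILINE and always mean start/end of the string.
start_abs = re.findall(r"\Acat", text, re.MULTILINE)
end_abs = re.findall(r"cat\Z", text, re.MULTILINE)

# \b marks a word boundary; \B marks a position that is not one.
whole_words = re.findall(r"\bcat\b", text)
inside_word = re.findall(r"\Bcat", text)

print(len(start_plain), len(start_multi), len(end_multi))  # 1 2 2
print(len(start_abs), len(end_abs))                        # 1 1
print(len(whole_words), len(inside_word))                  # 2 1
```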
Character Classes
\c- Control character
\s- White space
\S- Not white space
\d- Digit
\D- Not digit
\w- Word
\W- Not word
\x- Hexadecimal digit
\O- Octal digit
[:upper:]- Upper case letters
[:lower:]- Lower case letters
[:alpha:]- All letters
[:alnum:]- Digits and letters
[:xdigit:]- Hexadecimal digits
[:blank:]- Space and tab
[:space:]- Whitespace characters (space, tab, newline, and similar)
[:cntrl:]- Control characters
[:graph:]- Printed characters
[:print:]- Printed characters and spaces
[:word:]- Digits, letters and underscore
*- 0 or more
+- 1 or more
?- 0 or 1
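A brief sketch of the shorthand classes and quantifiers, again using Python's `re` module as an assumption rather than grep. Python does not understand POSIX classes such as [:upper:], so the ASCII range [A-Z] stands in for it here. The sample strings are invented for the example.

```python
import re

sample = "Order 66: 3 apples,\t12 pears"

digits = re.findall(r"\d+", sample)   # runs of one or more digits
words = re.findall(r"\w+", sample)    # runs of word characters
fields = re.split(r"\s+", sample)     # split on any run of white space
upper = re.findall(r"[A-Z]", sample)  # ASCII stand-in for [:upper:]

# ? makes the preceding item optional, * allows zero or more, + one or more.
spellings = re.findall(r"colou?r", "color colour colouur")
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")

print(digits)     # ['66', '3', '12']
print(words)      # ['Order', '66', '3', 'apples', '12', 'pears']
print(fields)     # ['Order', '66:', '3', 'apples,', '12', 'pears']
print(upper)      # ['O']
print(spellings)  # ['color', 'colour']
print(star)       # ['a', 'ab', 'abbb']
print(plus)       # ['ab', 'abbb']
```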
Groups And Ranges
.- Any character except new line (\n)
(a|b)- a or b
(?:...)- Passive (non-capturing) group
[abc]- Range (a or b or c)
[^abc]- Not (a or b or c)
[a-q]- Lower case letter from a to q
[A-Q]- Upper case letter from A to Q
[0-7]- Digit from 0 to 7
\x- Group/subpattern number "x" (a backreference, where x is a number, such as \1)
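The grouping and range notations can be sketched as follows. This is an illustration in Python's `re` module, which we assume here in place of grep; the sample strings are our own.

```python
import re

# . matches any character except a newline.
dot = re.findall(r"c.t", "cat c\nt cut")

# (a|b) captures the alternative that matched; (?:...) groups without capturing.
captured = re.search(r"(a|b)c", "xbc").group(1)
passive = re.findall(r"gr(?:a|e)y", "gray grey groy")

# Bracket expressions match one character from a set or a range.
a_to_q = re.findall(r"[a-q]", "quiz")
octal = re.findall(r"[0-7]+", "0755 898")
not_abc = re.findall(r"[^abc]", "cabbage")

# \1 refers back to whatever the first group matched (a repeated word here).
repeat = re.search(r"\b(\w+) \1\b", "it was the the best")

print(dot)              # ['cat', 'cut']
print(captured)         # b
print(passive)          # ['gray', 'grey']
print(a_to_q)           # ['q', 'i']
print(octal)            # ['0755']
print(not_abc)          # ['g', 'e']
print(repeat.group(0))  # the the
```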
Special Characters
\n- New line
\r- Carriage return
\v- Vertical tab
\f- Form feed
\xxx- Octal character xxx
\xhh- Hex character hh
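As a small sketch of these escapes, assuming Python's `re` module (other tools may treat octal and hex escapes differently), the same character can be written literally, in hex, or in octal:

```python
import re

# \r?\n splits on both Windows-style and Unix-style line endings.
parts = re.split(r"\r?\n", "one\r\ntwo\nthree")

# \x41 (hex) and \101 (octal) both denote the letter "A".
hex_hits = re.findall(r"\x41", "ABA")
octal_hits = re.findall(r"\101", "ABA")

# \f matches a form feed character.
has_form_feed = re.search(r"\f", "page1\fpage2") is not None

print(parts)          # ['one', 'two', 'three']
print(hex_hits)       # ['A', 'A']
print(octal_hits)     # ['A', 'A']
print(has_form_feed)  # True
```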
String Replacement
$n- nth non-passive group
$2- "xyz" in /^(abc(xyz))$/
$1- "xyz" in /^(?:abc)(xyz)$/
$`- Before matched string
$'- After matched string
$+- Last matched string
$&- Entire matched string
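The $-style variables above come from Perl-like tools. As a rough sketch only, here is how the same ideas look in Python's `re` module, which is our assumption for illustration: it writes group references as \1 and \2 in replacement strings, and the text before and after a match has to be sliced out by hand. $+ is left out because we are not certain of an exact Python equivalent.

```python
import re

# Group numbering follows the order of opening parentheses.
nested = re.search(r"^(abc(xyz))$", "abcxyz")
passive = re.search(r"^(?:abc)(xyz)$", "abcxyz")
print(nested.group(2))   # xyz, like $2
print(passive.group(1))  # xyz, like $1 (the passive group is not counted)

text = "left-MATCH-right"
m = re.search(r"[A-Z]+", text)
before = text[:m.start()]  # like $`
whole = m.group(0)         # like $&
after = text[m.end():]     # like $'
print(before, whole, after)

# In a replacement string, \1 and \2 play the role of $1 and $2.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@example")
print(swapped)  # example at user
```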
?=- Lookahead assertion
?!- Negative lookahead
?<=- Lookbehind assertion
?<!- Negative lookbehind
?>- Once-only Subexpression
?()- Condition [if then]
?()|- Condition [if then else]
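These assertions are not part of POSIX regular expressions, so the sketch below assumes Python's `re` module instead of grep. The once-only subexpression (?>...) is omitted because its availability depends on the Python version. The sample strings are made up.

```python
import re

prices = "10 dollars, 20 euros"
ahead = re.findall(r"\d+(?= dollars)", prices)           # followed by " dollars"
not_ahead = re.findall(r"\b\d+\b(?! dollars)", prices)   # not followed by it

costs = "cost $30, qty 4"
behind = re.findall(r"(?<=\$)\d+", costs)                # preceded by "$"
not_behind = re.findall(r"(?<!\$)\b\d+", costs)          # not preceded by "$"

# Condition [if then else]: if group 1 matched "(", require ")", else require ";".
cond = re.compile(r"(\()?\d+(?(1)\)|;)")
results = [cond.fullmatch(s) is not None for s in ["(12)", "12;", "(12;", "12)"]]

print(ahead)       # ['10']
print(not_ahead)   # ['20']
print(behind)      # ['30']
print(not_behind)  # ['4']
print(results)     # [True, True, False, False]
```

The lookaround assertions check the surrounding text without consuming it, which is why only the digits appear in the results.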
In this chapter, we saw a few of the many uses of regular expressions. We can find even more if we use regular expressions to search for additional applications that use them.