A regular expression is a pattern that the regular expression engine attempts to match in input text. A pattern consists of one or more character literals, operators, or constructs.
Each section in this quick reference lists a particular category of characters, operators, and constructs that you can use to define regular expressions:
Character escapes
Character classes
Anchors
Grouping constructs
Quantifiers
Backreference constructs
Alternation constructs
Substitutions
Regular expression options
Miscellaneous constructs
Character Escapes
The backslash character (\) in a regular expression indicates that the character that follows it either is a special character (as shown in the following table), or should be interpreted literally.
Escaped character | Description | Pattern | Matches |
---|---|---|---|
\a | Matches a bell character, \u0007. | \a | "\u0007" in "Error!" + '\u0007' |
\b | In a character class, matches a backspace, \u0008. | [\b]{3,} | "\b\b\b\b" in "\b\b\b\b" |
\t | Matches a tab, \u0009. | (\w+)\t | "item1\t", "item2\t" in "item1\titem2\t" |
\r | Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) | \r\n(\w+) | "\r\nThese" in "\r\nThese are\ntwo lines." |
\v | Matches a vertical tab, \u000B. | [\v]{2,} | "\v\v\v" in "\v\v\v" |
\f | Matches a form feed, \u000C. | [\f]{2,} | "\f\f\f" in "\f\f\f" |
\n | Matches a new line, \u000A. | \r\n(\w+) | "\r\nThese" in "\r\nThese are\ntwo lines." |
\e | Matches an escape, \u001B. | \e | "\x001B" in "\x001B" |
\nnn | Uses octal representation to specify a character (nnn consists of two or three digits). | \w\040\w | "a b", "c d" in "a bc d" |
\xnn | Uses hexadecimal representation to specify a character (nn consists of exactly two digits). | \w\x20\w | "a b", "c d" in "a bc d" |
\cX or \cx | Matches the ASCII control character that is specified by X or x, where X or x is the letter of the control character. | \cC | "\x0003" in "\x0003" (Ctrl-C) |
\unnnn | Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by nnnn). | \w\u0020\w | "a b", "c d" in "a bc d" |
\ | When followed by a character that is not recognized as an escaped character in this and other tables in this topic, matches that character. For example, \* is the same as \x2A, and \. is the same as \x2E. This allows the regular expression engine to disambiguate language elements (such as * or ?) and character literals (represented by \* or \?). | \d+[\+-x\*]\d+ | "2+2" and "3*9" in "(2+2) * 3*9" |
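The escapes in the table above can be tried out in a few lines. The sketch below is an assumption-laden illustration: it uses Python's built-in `re` module rather than the engine this reference describes, so it covers only the escapes the two share (`\t`, `[\b]`, octal, `\x`, `\u`, and backslash-escaped operators). Python's `re` rejects `\e` and `\cX`, so those rows are not shown.

```python
import re

# \t matches a tab: each word that is followed by a tab character.
tabs = re.findall(r"\w+\t", "item1\titem2\t")
# ['item1\t', 'item2\t']

# Inside a character class, \b means backspace (U+0008), not a word boundary.
backspaces = re.search(r"[\b]{3,}", "\b\b\b\b").group()

# Octal, hexadecimal, and Unicode escapes can all denote the space character.
text = "a bc d"
octal_hits = re.findall(r"\w\040\w", text)    # ['a b', 'c d']
hex_hits = re.findall(r"\w\x20\w", text)      # ['a b', 'c d']
unicode_hits = re.findall(r"\w\u0020\w", text)  # ['a b', 'c d']

# A backslash before an operator makes it a literal: \* matches an asterisk.
sums = re.findall(r"\d+[\+-x\*]\d+", "(2+2) * 3*9")
# ['2+2', '3*9']

print(tabs, octal_hits, hex_hits, unicode_hits, sums)
```

Note that `\+-x` inside the brackets is read as a range from `+` to `x`, which is why the class also needs an explicit `\*` to cover the asterisk.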
Character Classes
A character class matches any one of a set of characters. Character classes include the language elements listed in the following table.
Character class | Description | Pattern | Matches |
---|---|---|---|
[character_group] | Matches any single character in character_group. By default, the match is case-sensitive. | [ae] | "a" in "gray"; "a", "e" in "lane" |
[^character_group] | Negation: Matches any single character that is not in character_group. By default, characters in character_group are case-sensitive. | [^aei] | "r", "g", "n" in "reign" |
[first-last] | Character range: Matches any single character in the range from first to last. | [A-Z] | "A", "B" in "AB123" |
. | Wildcard: Matches any single character except \n. To match a literal period character (. or \u002E), you must precede it with the escape character (\.). | a.e | "ave" in "nave"; "ate" in "water" |
\p{name} | Matches any single character in the Unicode general category or named block specified by name. | \p{Lu}; \p{IsCyrillic} | "C", "L" in "City Lights"; "Д", "Ж" in "ДЖem" |
\P{name} | Matches any single character that is not in the Unicode general category or named block specified by name. | \P{Lu}; \P{IsCyrillic} | "i", "t", "y" in "City"; "e", "m" in "ДЖem" |
\w | Matches any word character. | \w | "I", "D", "A", "1", "3" in "ID A1.3" |
\W | Matches any non-word character. | \W | " ", "." in "ID A1.3" |
\s | Matches any white-space character. | \w\s | "D " in "ID A1.3" |
\S | Matches any non-white-space character. | \s\S | " _" in "int __ctr" |
\d | Matches any decimal digit. | \d | "4" in "4 = IV" |
\D | Matches any character other than a decimal digit. | \D | " ", "=", " ", "I", "V" in "4 = IV" |
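As a rough check of the character classes above, the following sketch reproduces most rows of the table. It assumes Python's built-in `re` module, not the engine this reference documents, so treat it as an approximation. Python's `re` has no `\p{name}` syntax, so the uppercase-letter row is emulated with `unicodedata.category`, which is a substitute technique and not the same construct.

```python
import re
import unicodedata

# Positive group, negated group, and range.
group = re.findall(r"[ae]", "lane")        # ['a', 'e']
negated = re.findall(r"[^aei]", "reign")   # ['r', 'g', 'n']
in_range = re.findall(r"[A-Z]", "AB123")   # ['A', 'B']

# The wildcard matches any single character except a newline.
wildcard = re.findall(r"a.e", "nave water")  # ['ave', 'ate']

# Shorthand classes and their negations.
word = re.findall(r"\w", "ID A1.3")          # ['I', 'D', 'A', '1', '3']
non_word = re.findall(r"\W", "ID A1.3")      # [' ', '.']
word_space = re.findall(r"\w\s", "ID A1.3")  # ['D ']
space_non_space = re.findall(r"\s\S", "int __ctr")  # [' _']
digit = re.findall(r"\d", "4 = IV")          # ['4']
non_digit = re.findall(r"\D", "4 = IV")      # [' ', '=', ' ', 'I', 'V']

# Stand-in for \p{Lu}: filter by Unicode general category "Lu".
uppercase = [ch for ch in "City Lights" if unicodedata.category(ch) == "Lu"]
# ['C', 'L']

print(group, negated, in_range, wildcard, uppercase)
```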