A regular expression is a pattern that the regular expression engine attempts to match in input text. A pattern consists of one or more character literals, operators, or constructs.

Each section in this quick reference lists a particular category of characters, operators, and constructs that you can use to define regular expressions:
Character escapes
Character classes
Anchors
Grouping constructs
Quantifiers
Backreference constructs
Alternation constructs
Substitutions
Regular expression options
Miscellaneous constructs
Character Escapes
The backslash character (\) in a regular expression indicates that the character that follows it either is a special character (as shown in the following table), or should be interpreted literally.
| Escaped character | Description | Pattern | Matches |
|---|---|---|---|
| \a | Matches a bell character, \u0007. | \a | "\u0007" in "Error!" + '\u0007' |
| \b | In a character class, matches a backspace, \u0008. | [\b]{3,} | "\b\b\b\b" in "\b\b\b\b" |
| \t | Matches a tab, \u0009. | (\w+)\t | "item1\t", "item2\t" in "item1\titem2\t" |
| \r | Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) | \r\n(\w+) | "\r\nThese" in "\r\nThese are\ntwo lines." |
| \v | Matches a vertical tab, \u000B. | [\v]{2,} | "\v\v\v" in "\v\v\v" |
| \f | Matches a form feed, \u000C. | [\f]{2,} | "\f\f\f" in "\f\f\f" |
| \n | Matches a new line, \u000A. | \r\n(\w+) | "\r\nThese" in "\r\nThese are\ntwo lines." |
| \e | Matches an escape, \u001B. | \e | "\x001B" in "\x001B" |
| \nnn | Uses octal representation to specify a character (nnn consists of two or three digits). | \w\040\w | "a b", "c d" in "a bc d" |
| \xnn | Uses hexadecimal representation to specify a character (nn consists of exactly two digits). | \w\x20\w | "a b", "c d" in "a bc d" |
| \cX, \cx | Matches the ASCII control character that is specified by X or x, where X or x is the letter of the control character. | \cC | "\x0003" in "\x0003" (Ctrl-C) |
| \unnnn | Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by nnnn). | \w\u0020\w | "a b", "c d" in "a bc d" |
| \ | When followed by a character that is not recognized as an escaped character in this and other tables in this topic, matches that character. For example, \* is the same as \x2A, and \. is the same as \x2E. This allows the regular expression engine to disambiguate language elements (such as * or ?) and character literals (represented by \* or \?). | \d+[\+-x\*]\d+ | "2+2" and "3*9" in "(2+2) * 3*9" |
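
A few rows of the table above can be tried out with a short script. This is a minimal sketch that assumes Python's `re` module as a stand-in engine; it is not the engine this reference describes, and some escapes in the table (for example `\e` and `\cX`) are not accepted by `re`. The variable names are illustrative only.

```python
import re

# (\w+)\t : a word followed by a tab character.
tab_matches = [m.group(0) for m in re.finditer(r"(\w+)\t", "item1\titem2\t")]

# \r\n(\w+) : a carriage return, a newline, then a captured word.
crlf = re.search(r"\r\n(\w+)", "\r\nThese are\ntwo lines.")

# Three spellings of the space character: octal, hexadecimal, and Unicode.
space_matches = {
    pattern: re.findall(pattern, "a bc d")
    for pattern in (r"\w\040\w", r"\w\x20\w", r"\w\u0020\w")
}

# Inside the brackets, \+-x is a range from '+' to 'x', and \* is a literal '*'.
arithmetic = re.findall(r"\d+[\+-x\*]\d+", "(2+2) * 3*9")

print(tab_matches)                          # ['item1\t', 'item2\t']
print(repr(crlf.group(0)), crlf.group(1))   # '\r\nThese' These
for pattern, found in space_matches.items():
    print(pattern, found)                   # each pattern finds ['a b', 'c d']
print(arithmetic)                           # ['2+2', '3*9']
```

Raw strings (`r"..."`) keep the backslashes intact so that the regular expression engine, rather than the Python string parser, interprets the escapes.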
Character Classes
A character class matches any one of a set of characters. Character classes include the language elements listed in the following table.
| Character class | Description | Pattern | Matches |
|---|---|---|---|
| [character_group] | Matches any single character in character_group. By default, the match is case-sensitive. | [ae] | "a" in "gray"; "a", "e" in "lane" |
| [^character_group] | Negation: Matches any single character that is not in character_group. By default, characters in character_group are case-sensitive. | [^aei] | "r", "g", "n" in "reign" |
| [first-last] | Character range: Matches any single character in the range from first to last. | [A-Z] | "A", "B" in "AB123" |
| . | Wildcard: Matches any single character except \n. To match a literal period character (. or \u002E), you must precede it with the escape character (\.). | a.e | "ave" in "nave"; "ate" in "water" |
| \p{name} | Matches any single character in the Unicode general category or named block specified by name. | \p{Lu}, \p{IsCyrillic} | \p{Lu}: "C", "L" in "City Lights"; \p{IsCyrillic}: "Д", "Ж" in "ДЖem" |
| \P{name} | Matches any single character that is not in the Unicode general category or named block specified by name. | \P{Lu}, \P{IsCyrillic} | \P{Lu}: "i", "t", "y" in "City"; \P{IsCyrillic}: "e", "m" in "ДЖem" |
| \w | Matches any word character. | \w | "I", "D", "A", "1", "3" in "ID A1.3" |
| \W | Matches any non-word character. | \W | " ", "." in "ID A1.3" |
| \s | Matches any white-space character. | \w\s | "D " in "ID A1.3" |
| \S | Matches any non-white-space character. | \s\S | " _" in "int __ctr" |
| \d | Matches any decimal digit. | \d | "4" in "4 = IV" |
| \D | Matches any character other than a decimal digit. | \D | " ", "=", " ", "I", "V" in "4 = IV" |
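
Several of the character classes above can be checked the same way. This is again a minimal sketch that assumes Python's `re` module as a stand-in engine. It skips `\p{name}` and `\P{name}`, which `re` does not support, and the `matches` helper is a hypothetical convenience wrapper, not part of any library.

```python
import re

def matches(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)

results = {
    "[ae]":   matches(r"[ae]", "lane"),       # positive character group
    "[^aei]": matches(r"[^aei]", "reign"),    # negated character group
    "[A-Z]":  matches(r"[A-Z]", "AB123"),     # character range
    "a.e":    matches(r"a.e", "water"),       # wildcard between two literals
    r"\w":    matches(r"\w", "ID A1.3"),      # word characters
    r"\W":    matches(r"\W", "ID A1.3"),      # non-word characters
    r"\w\s":  matches(r"\w\s", "ID A1.3"),    # word character, then white space
    r"\s\S":  matches(r"\s\S", "int __ctr"),  # white space, then non-white space
    r"\d":    matches(r"\d", "4 = IV"),       # decimal digits
    r"\D":    matches(r"\D", "4 = IV"),       # everything except decimal digits
}

for pattern, found in results.items():
    print(f"{pattern:8} {found}")
```

Each result should line up with the Matches column of the corresponding row. For example, `[^aei]` applied to "reign" yields "r", "g", and "n".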
