
Regular Expressions in the .NET Framework, Part I: Character Escapes and Character Classes

A regular expression is a pattern that the regular expression engine attempts to match in input text. A pattern consists of one or more character literals, operators, or constructs.
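As a minimal sketch of that idea, the following C# snippet hands a pattern and some input text to the engine through the System.Text.RegularExpressions.Regex class and collects whatever it matches. The class name PatternDemo, the helper FindAll, and the sample input are illustrative assumptions rather than part of the quick reference.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Returns the text of every match the engine finds for pattern in input.
    public static List<string> FindAll(string pattern, string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        // "g", "r" and "y" are character literals; [ae] is a construct
        // (a character class that accepts either "a" or "e").
        foreach (string hit in FindAll("gr[ae]y", "gray or grey"))
        {
            Console.WriteLine(hit);   // prints "gray", then "grey"
        }
    }
}
```

Note that Regex.Matches takes the input text first and the pattern second.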

Each section in this quick reference lists one category of characters, operators, and constructs that you can use to define regular expressions. Part I covers the first two of the following categories:

Character escapes
Character classes
Anchors
Grouping constructs
Quantifiers
Backreference constructs
Alternation constructs
Substitutions
Regular expression options
Miscellaneous constructs

Character Escapes

The backslash character (\) in a regular expression indicates that the character that follows it either is a special character (as shown in the following table), or should be interpreted literally.

Escaped character | Description | Pattern | Matches
\a | Matches a bell character, \u0007. | \a | "\u0007" in "Error!" + '\u0007'
\b | In a character class, matches a backspace, \u0008. | [\b]{3,} | "\b\b\b\b" in "\b\b\b\b"
\t | Matches a tab, \u0009. | (\w+)\t | "item1\t", "item2\t" in "item1\titem2\t"
\r | Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) | \r\n(\w+) | "\r\nThese" in "\r\nThese are\ntwo lines."
\v | Matches a vertical tab, \u000B. | [\v]{2,} | "\v\v\v" in "\v\v\v"
\f | Matches a form feed, \u000C. | [\f]{2,} | "\f\f\f" in "\f\f\f"
\n | Matches a new line, \u000A. | \r\n(\w+) | "\r\nThese" in "\r\nThese are\ntwo lines."
\e | Matches an escape, \u001B. | \e | "\x001B" in "\x001B"
\nnn | Uses octal representation to specify a character (nnn consists of two or three digits). | \w\040\w | "a b", "c d" in "a bc d"
\xnn | Uses hexadecimal representation to specify a character (nn consists of exactly two digits). | \w\x20\w | "a b", "c d" in "a bc d"
\cX, \cx | Matches the ASCII control character that is specified by X or x, where X or x is the letter of the control character. | \cC | "\x0003" in "\x0003" (Ctrl-C)
\unnnn | Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by nnnn). | \w\u0020\w | "a b", "c d" in "a bc d"
\ | When followed by a character that is not recognized as an escaped character in this and other tables in this topic, matches that character. For example, \* is the same as \x2A, and \. is the same as \x2E. This allows the regular expression engine to disambiguate language elements (such as * or ?) and character literals (represented by \* or \?). | \d+[\+-x\*]\d+ | "2+2" and "3*9" in "(2+2) * 3*9"
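A few of these escapes can be tried out with a short C# sketch. The class name EscapeDemo and the helper FindAll are hypothetical names chosen for illustration; the patterns and inputs are taken from the table, and Regex.Escape is the standard .NET method for adding the backslashes a literal needs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Returns the text of every match of pattern in input.
    public static List<string> FindAll(string pattern, string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        // Verbatim strings (@"...") hand the backslashes to the regex engine untouched.
        // Octal, hexadecimal, and Unicode escapes for the same space character:
        foreach (string p in new[] { @"\w\040\w", @"\w\x20\w", @"\w\u0020\w" })
        {
            Console.WriteLine(string.Join(", ", FindAll(p, "a bc d")));   // a b, c d
        }

        // The input uses C# string escapes, so it contains real tab characters.
        Console.WriteLine(FindAll(@"(\w+)\t", "item1\titem2\t").Count);   // 2

        // \+ and \* are literal characters inside the character group.
        Console.WriteLine(string.Join(", ",
            FindAll(@"\d+[\+-x\*]\d+", "(2+2) * 3*9")));                  // 2+2, 3*9

        // Regex.Escape adds the backslashes needed to treat text literally.
        Console.WriteLine(Regex.Escape("3*9"));                           // 3\*9
    }
}
```

Writing the patterns as verbatim strings keeps the C# compiler from interpreting the backslashes before the regular expression engine sees them; in a regular C# string literal each backslash in the pattern would have to be doubled.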

Character Classes

A character class matches any one of a set of characters. Character classes include the language elements listed in the following table.

Character class | Description | Pattern | Matches
[character_group] | Matches any single character in character_group. By default, the match is case-sensitive. | [ae] | "a" in "gray"; "a", "e" in "lane"
[^character_group] | Negation: Matches any single character that is not in character_group. By default, characters in character_group are case-sensitive. | [^aei] | "r", "g", "n" in "reign"
[first-last] | Character range: Matches any single character in the range from first to last. | [A-Z] | "A", "B" in "AB123"
. | Wildcard: Matches any single character except \n. To match a literal period character (. or \u002E), you must precede it with the escape character (\.). | a.e | "ave" in "nave"; "ate" in "water"
\p{name} | Matches any single character in the Unicode general category or named block specified by name. | \p{Lu}; \p{IsCyrillic} | "C", "L" in "City Lights"; "Д", "Ж" in "ДЖem"
\P{name} | Matches any single character that is not in the Unicode general category or named block specified by name. | \P{Lu}; \P{IsCyrillic} | "i", "t", "y" in "City"; "e", "m" in "ДЖem"
\w | Matches any word character. | \w | "I", "D", "A", "1", "3" in "ID A1.3"
\W | Matches any non-word character. | \W | " ", "." in "ID A1.3"
\s | Matches any white-space character. | \w\s | "D " in "ID A1.3"
\S | Matches any non-white-space character. | \s\S | " _" in "int __ctr"
\d | Matches any decimal digit. | \d | "4" in "4 = IV"
\D | Matches any character other than a decimal digit. | \D | " ", "=", " ", "I", "V" in "4 = IV"
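The character classes above can be exercised the same way. The sketch below is one possible way to check several rows of the table in C#; the class name CharacterClassDemo and the helpers FindAll and Show are assumptions made for this example, while the patterns and inputs come from the table.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    // Collects the text of every match of pattern in input.
    public static List<string> FindAll(string pattern, string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    // Prints the pattern followed by each match wrapped in brackets,
    // so that matches consisting of white space stay visible.
    private static void Show(string pattern, string input)
    {
        Console.WriteLine(pattern + " -> [" +
            string.Join("][", FindAll(pattern, input)) + "]");
    }

    public static void Main()
    {
        Show(@"[ae]", "lane");            // matches: a, e
        Show(@"[^aei]", "reign");         // matches: r, g, n
        Show(@"[A-Z]", "AB123");          // matches: A, B
        Show(@"a.e", "water");            // matches: ate
        Show(@"\p{Lu}", "City Lights");   // matches: C, L
        Show(@"\P{Lu}", "City");          // matches: i, t, y
        Show(@"\W", "ID A1.3");           // matches: a space and a period
        Show(@"\s\S", "int __ctr");       // matches: a space followed by "_"
        Show(@"\D", "4 = IV");            // matches: space, =, space, I, V

        // The wildcard does not match a newline under the default options.
        Console.WriteLine(Regex.IsMatch("\n", "."));   // False
    }
}
```

Each call uses the default options, so the matches are case-sensitive, as the table notes.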
