Vim Regex Collections: Special Characters, Equivalence Classes, and Optional Matching

          The square brackets in character class expressions are additional to
          the square brackets delimiting a collection.  For example, the
          following is a plausible pattern for a UNIX filename:
          "[-./[:alnum:]_~]\+".  That is, a list of at least one character,
          each of which is either '-', '.', '/', alphabetic, numeric, '_' or
          '~'.
          These items only work for 8-bit characters, except [:lower:] and
          [:upper:] also work for multibyte characters when using the new
          regexp engine.  See |two-engines|.  In the future these items may
          work for multibyte characters.  For now, to get all "alpha"
          characters you can use: [[:lower:][:upper:]].

          The "Func" column shows what library function is used.  The
          implementation depends on the system.  Otherwise:
          (1) Uses islower() for ASCII and Vim builtin rules for other
              characters.
          (2) Uses Vim builtin rules.
          (3) As with (1) but using isupper().
                                                        */[[=* *[==]*
        - An equivalence class.  This means that characters are matched that
          have almost the same meaning, e.g., when ignoring accents.  This
          only works for Unicode, latin1 and latin9.  The form is:
                [=a=]
                                                        */[[.* *[..]*
        - A collation element.  This currently simply accepts a single
          character in the form:
                [.a.]
                                                          */\]*
        - To include a literal ']', '^', '-' or '\' in the collection, put a
          backslash before it: "[xyz\]]", "[\^xyz]", "[xy\-z]" and "[xyz\\]".
          (Note: POSIX does not support the use of a backslash this way).  For
          ']' you can also make it the first character (following a possible
          "^"):  "[]xyz]" or "[^]xyz]".
          For '-' you can also make it the first or last character: "[-xyz]",
          "[^-xyz]" or "[xyz-]".  For '\' you can also let it be followed by
          any character that's not in "^]-\bdertnoUux".  "[\xyz]" matches '\',
          'x', 'y' and 'z'.  It's better to use "\\" though, future expansions
          may use other characters after '\'.
        - Omitting the trailing ] is not considered an error.  "[]" works like
          "[]]", it matches the ']' character.
        - The following translations are accepted when the 'l' flag is not
          included in 'cpoptions':
                \e      <Esc>
                \t      <Tab>
                \r      <CR>    (NOT end-of-line!)
                \b      <BS>
                \n      line break, see above |/[\n]|
                \d123   decimal number of character
                \o40    octal number of character up to 0o377
                \x20    hexadecimal number of character up to 0xff
                \u20AC  hex. number of multibyte character up to 0xffff
                \U1234  hex. number of multibyte character up to 8 characters
                        0xffffffff |E1541|
          NOTE: The other backslash codes mentioned above do not work inside
          []!
        - Matching with a collection can be slow, because each character in
          the text has to be compared with each character in the collection.
          Use one of the other atoms above when possible.  Example: "\d" is
          much faster than "[0-9]" and matches the same characters.  However,
          the new |NFA| regexp engine deals with this better than the old one.

                                                */\%[]* *E69* *E70* *E369*
\%[]    A sequence of optionally matched atoms.  This always matches.
        It matches as much of the list of atoms it contains as possible.  Thus
        it stops at the first atom that doesn't match.  For example: >
                /r\%[ead]
<       matches "r", "re", "rea" or "read".  The longest that matches is used.
        To match the Ex command "function", where "fu" is required and
        "nction" is optional, this would work: >
                /\<fu\%[nction]\>
<       The end-of-word atom "\>" is used to avoid matching "fu" in "full".
        It gets more complicated when the atoms are not ordinary characters.
        You don't often have to use it, but it is possible.  Example: >
                /\<r\%[[eo]ad]\>
<       Matches the words "r", "re", "ro", "rea", "roa", "read" and "road".
        There can be no \(\), \%(\) or \z(\) items inside the [] and \%[] does
        not nest.
        To include a "[" use "[[]" and for "]" use "[]]", e.g.,: >
                /index\%[[[]0[]]]
<       matches "index", "index[", "index[0" and "index[0]".

                                */\%d* */\%x* */\%o* */\%u* */\%U* *E678*

\%d123  Matches the character specified with a decimal number.  Must be
        followed by a non-digit.
\%o40
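
The way \%[] consumes "as much of the list of atoms as possible" can be
illustrated outside Vim.  The following is a minimal Python sketch, not Vim's
implementation: it assumes that \%[abc] can be modelled as the nested optional
groups a(b(c)?)?, it uses Python's "re" syntax for the atoms, and it lets "\b"
stand in for Vim's "\<" and "\>".  The helper name expand_optional_sequence is
hypothetical.

```python
import re


def expand_optional_sequence(prefix, atoms):
    # Model an optionally matched sequence as nested optional groups:
    # atoms ["e", "a", "d"] become (?:e(?:a(?:d)?)?)?
    # Each atom must already be a valid Python regex fragment.
    nested = ""
    for atom in reversed(atoms):
        # Wrap the current atom around everything that may follow it, so a
        # later atom can only match when all earlier atoms matched.
        nested = "(?:" + atom + nested + ")?"
    return prefix + nested


# Rough counterpart of the Vim pattern /\<r\%[[eo]ad]\>
# (assumption: \b approximates Vim's \< and \>).
pattern = re.compile(
    r"\b" + expand_optional_sequence("r", ["[eo]", "a", "d"]) + r"\b"
)

candidates = ["r", "re", "ro", "rea", "roa", "read", "road", "red", "rad"]
matched = [word for word in candidates if pattern.fullmatch(word)]
print(matched)
```

Building the groups from the last atom outwards keeps the "stop at the first
atom that doesn't match" rule: "red" is rejected because "d" is only reachable
after "a" has matched.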

This section covers various aspects of Vim regex collections. It explains how to include literal special characters like ']', '^', '-', and '\' within a collection. It then details the use of equivalence classes for matching characters with similar meanings (e.g., ignoring accents), and collation elements. Further, the section explains how to define a sequence of optionally matched atoms using \%[], which matches as much of the contained list as possible.
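
As an illustration of the backslash rule for literal ']', '^', '-' and '\',
the following Python sketch builds the text of a Vim collection from a string
of characters that should all be matched literally.  It is only a sketch under
stated assumptions: the helper name vim_collection is hypothetical, ranges such
as "a-z" are deliberately not supported (every '-' is escaped), and no attempt
is made to guard against input that happens to spell "[:", "[=" or "[.".

```python
def vim_collection(chars, negate=False):
    # Characters that the text says become literal when preceded by a
    # backslash inside a collection.
    special = set("]^-\\")
    body = "".join("\\" + c if c in special else c for c in chars)
    # A leading "^" negates the collection; it is added unescaped on purpose.
    return "[" + ("^" if negate else "") + body + "]"


# The same four cases the text gives as examples.
for literal_chars in ["xyz]", "^xyz", "xy-z", "xyz\\"]:
    print(vim_collection(literal_chars))
```

Escaping every special character with a backslash is simpler than relying on
position (first or last in the collection), at the cost of producing patterns
that POSIX tools would not accept.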