Vim Regex Collections: Negation, Character Ranges, and Classes

'x', 'y' or 'z'
        [a-zA-Z]$       any alphabetic character at the end of a line
        \c[a-z]$        same
        [А-яЁё]         Russian alphabet (with utf-8 and cp1251)

                                                                */[\n]*
        With "\_" prepended, the collection also includes the end-of-line.
        The same can be done by including "\n" in the collection.  With
        "\_", the end-of-line is matched even when the collection starts
        with "^"!  Thus "\_[^ab]" matches the end-of-line and any character
        but "a" and "b".  This keeps Vi compatibility: without the "\_" or
        "\n" the collection does not match an end-of-line.
                                                                *E769*
        When the ']' is missing, Vim does not give an error message but
        assumes no collection is used.  This is useful for searching for
        '['.  However, you do get E769 for internal searching.  Also be
        aware that in a `:substitute` command the whole command becomes the
        pattern.  E.g. ":s/[/x/" searches for "[/x" and replaces it with
        nothing.  It does not search for "[" and replace it with "x"!

                                                                *E944* *E945*
        If the sequence begins with "^", it matches any single character NOT
        in the collection: "[^xyz]" matches anything but 'x', 'y' and 'z'.
        - If two characters in the sequence are separated by '-', this is
          shorthand for the full list of characters between them, inclusive.
          E.g., "[0-9]" matches any decimal digit.  If the starting
          character exceeds the ending character, e.g. "[c-a]", error E944
          occurs.  Non-ASCII characters can be used, but in the old regexp
          engine their character values must not be more than 256 apart.
          For example, searching for "[\u3000-\u4000]" after ":set re=1"
          gives error E945.  Prepending "\%#=2", which selects the new (NFA)
          engine, fixes it.
        - A character class expression is evaluated to the set of characters
          belonging to that character class.
          The following character classes are supported:
                  Name           Func      Contents ~
*[:alnum:]*       [:alnum:]      isalnum   ASCII letters and digits
*[:alpha:]*       [:alpha:]      isalpha   ASCII letters
*[:blank:]*       [:blank:]                space and tab
*[:cntrl:]*       [:cntrl:]      iscntrl   ASCII control characters
*[:digit:]*       [:digit:]                decimal digits '0' to '9'
*[:graph:]*       [:graph:]      isgraph   ASCII printable characters excluding
                                           space
*[:lower:]*       [:lower:]      (1)       lowercase letters (all letters when
                                           'ignorecase' is used)
*[:print:]*       [:print:]      (2)       printable characters including space
*[:punct:]*       [:punct:]      ispunct   ASCII punctuation characters
*[:space:]*       [:space:]                whitespace characters: space, tab, CR,
                                           NL, vertical tab, form feed
*[:upper:]*       [:upper:]      (3)       uppercase letters (all letters when
                                           'ignorecase' is used)
*[:xdigit:]*      [:xdigit:]               hexadecimal digits: 0-9, a-f, A-F
*[:return:]*      [:return:]               the <CR> character
*[:tab:]*         [:tab:]                  the <Tab> character
*[:escape:]*      [:escape:]               the <Esc> character
*[:backspace:]*   [:backspace:]            the <BS> character
*[:ident:]*       [:ident:]                identifier character (same as "\i")
*[:keyword:]*     [:keyword:]              keyword character (same as "\k")
*[:fname:]*       [:fname:]                file name character (same as "\f")
          The square brackets in character class expressions are additional
          to the square brackets delimiting a collection.  For example, the
          following is a plausible pattern for a UNIX filename:
          "[-./[:alnum:]_~]\+".  That is, a run of at least one character,
          each of which is '-', '.', '/', an ASCII letter or digit, '_' or
          '~'.
          These items only work for 8-bit characters, except that [:lower:]
          and [:upper:] also work for multibyte characters when using the
          new regexp engine.  See |two-engines|.  In the future these items
          may work for multibyte characters.  For now, to get all "alpha"
          characters you can use: [[:lower:][:upper:]].

          The "Func" column shows which library function is used.  The
          implementation depends on the system.  The numbered entries work
          as follows:
          (1) Uses islower() for ASCII and Vim builtin rules for other
          characters.
          (2) Uses Vim builtin rules.
          (3) As with (1) but using isupper().
                                                        */[[=* *[==]*
        - An equivalence class.  This means that characters are matched that
          have almost the same

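To make the end-of-line and range rules concrete, here is a small Python model. It is a sketch under stated assumptions, not Vim's implementation: `vim_collection_to_py`, `check_range` and `VimRangeError` are hypothetical names, the translator only handles plain characters, ranges and a leading "^", and the E945 check assumes "more than 256 apart" means the end code point minus the start code point exceeds 256. One difference it highlights is that Python's `re` lets "[^ab]" match a newline, which is what Vim's "\_[^ab]" does, while Vim's plain "[^ab]" never matches an end-of-line.

```python
import re


class VimRangeError(ValueError):
    """Hypothetical error type whose message is the Vim error code."""


def vim_collection_to_py(body: str, eol: bool = False) -> str:
    r"""Translate a simple Vim collection body (the text between '[' and ']')
    into a Python character class.

    Hypothetical helper: only plain characters, ranges and a leading '^' are
    handled; [:class:] items and Vim backslash escapes are not translated.
    eol=False models "[...]": the class never matches a newline.
    eol=True models "\_[...]": the class also matches a newline, even when
    it is negated.
    """
    if body.startswith("^"):
        # Python's [^...] matches "\n" by default, so exclude it explicitly
        # unless the Vim collection had the "\_" prefix.
        return "[^" + body[1:] + ("" if eol else "\\n") + "]"
    # A non-negated class matches "\n" only in the "\_" form.
    return "[" + body + ("\\n" if eol else "") + "]"


def check_range(start: str, end: str, old_engine: bool = False) -> None:
    """Model of the documented range rules, not Vim's source code.

    E944: the start character is greater than the end character.
    E945: with the old (backtracking) engine, the code points are more than
          256 apart (assumed here to mean end - start > 256).
    """
    lo, hi = ord(start), ord(end)
    if lo > hi:
        raise VimRangeError("E944")
    if old_engine and hi - lo > 256:
        raise VimRangeError("E945")


# Vim's "[^ab]" skips the line break; "\_[^ab]" matches it.
print(re.findall(vim_collection_to_py("^ab"), "a\nb"))            # []
print(re.findall(vim_collection_to_py("^ab", eol=True), "a\nb"))  # ['\n']
```

With these definitions, `check_range("c", "a")` raises `VimRangeError("E944")` and `check_range("\u3000", "\u4000", old_engine=True)` raises `VimRangeError("E945")`, mirroring the examples in the text, while the same range passes when `old_engine` is false.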
This section covers collections in Vim regular expressions: negating a collection with '^', character ranges such as '[a-z]', and character class expressions such as '[:alnum:]', '[:alpha:]' and '[:digit:]' for matching sets of characters. It also explains when a collection matches an end-of-line and which library functions implement some of the character classes.