19th chunk of `runtime/doc/pattern.txt`
 this
works similar to the usage of <NL> for a <Nul>.

When working with expression evaluation, a <NL> character in the pattern
matches a <NL> in the string.  The use of "\n" (backslash n) to match a <NL>
doesn't work there; it only works to match text in the buffer.

				*pattern-multi-byte* *pattern-multibyte*
Patterns will also work with multibyte characters, mostly as you would
expect.  But invalid bytes may cause trouble: a pattern with an invalid byte
will probably never match.

==============================================================================
8. Composing characters					*patterns-composing*

							*/\Z*
When "\Z" appears anywhere in the pattern, all composing characters are
ignored.  Thus only the base characters need to match; the composing
characters may be different and the number of composing characters may differ.
Exception: If the pattern starts with one or more composing characters, these
must match.
							*/\%C*
Use "\%C" to skip any composing characters.  For example, the pattern "a" does
not match in "càt" (where the a has the composing character 0x0300), but
"a\%C" does.  Note that this does not match "cát" (where the á is character
0xe1, it does not have a composing character).  It does match "cat" (where
the a is just an a).

When a composing character appears at the start of the pattern or after an
item that doesn't include the composing character, a match is found at any
character that includes this composing character.

When using a dot and a composing character, this works the same as the
composing character by itself, except that it doesn't matter what comes before
this.

The order of composing characters does not matter.  Also, the text may have
more composing characters than the pattern; it still matches.  But all
composing characters in the pattern must be found in the text.

Suppose B is a base character and x and y are composing characters:
	pattern		text		match ~
	Bxy		Bxy		yes (perfect match)
	Bxy		Byx		yes (order ignored)
	Bxy		By		no (x missing)
	Bxy		Bx		no (y missing)
	Bx		Bx		yes (perfect match)
	Bx		By		no (x missing)
	Bx		Bxy		yes (extra y ignored)
	Bx		Byx		yes (extra y ignored)
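
The table can be read as a subset test: the base characters must be equal, and
every composing character of the pattern must also be present on the text
character.  The Python sketch below models only that rule for a single
character.  It is not how Vim implements matching; the helper names
split_char() and composing_match() are made up for illustration, and each
input is assumed to be one base character followed by its composing
characters.

```python
import unicodedata

def split_char(s):
    """Split one base character from the composing characters after it.

    Assumes s[0] is the base character.  Returns (base, set_of_marks).
    """
    base = s[0]
    marks = {c for c in s[1:] if unicodedata.combining(c)}
    return base, marks

def composing_match(pattern, text):
    """Same base, and all composing characters of the pattern are found
    in the text.  Sets are used, so the order does not matter."""
    p_base, p_marks = split_char(pattern)
    t_base, t_marks = split_char(text)
    return p_base == t_base and p_marks <= t_marks

B = "a"
x = "\u0300"   # COMBINING GRAVE ACCENT
y = "\u0301"   # COMBINING ACUTE ACCENT

# The rows of the table above, in the same order.
rows = [
    (B + x + y, B + x + y),
    (B + x + y, B + y + x),
    (B + x + y, B + y),
    (B + x + y, B + x),
    (B + x, B + x),
    (B + x, B + y),
    (B + x, B + x + y),
    (B + x, B + y + x),
]
results = [composing_match(p, t) for p, t in rows]
print(results)
```

The subset comparison is what makes "extra y ignored" and "order ignored"
fall out of the same check.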

==============================================================================
9. Compare with Perl patterns				*perl-patterns*

Vim's regexes are most similar to Perl's, in terms of what you can do.  The
difference between them is mostly just notation;  here's a summary of where
they differ:

Capability			in Vimspeak	in Perlspeak ~
force case insensitivity	\c		(?i)
force case sensitivity		\C		(?-i)
backref-less grouping		\%(atom\)	(?:atom)
conservative quantifiers	\{-n,m}		*?, +?, ??, {}?
0-width match			atom\@=		(?=atom)
0-width non-match		atom\@!		(?!atom)
0-width preceding match		atom\@<=	(?<=atom)
0-width preceding non-match	atom\@<!	(?<!atom)
match without retry		atom\@>		(?>atom)
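
The Perl column can be tried out in any engine that borrows Perl's notation.
The sketch below uses Python's "re" module as a stand-in.  That is an
assumption made for convenience: Python is not Perl and does not accept every
Perl construct, so only a few rows are covered.  The sample strings and the
helper found() are invented; the Vim notation for each pattern is given in
the comment beside it.

```python
import re

def found(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    m = re.search(pattern, text)
    return m.group() if m else None

# Perl-style notation                       Vim notation
insensitive = found(r"(?i)foo", "FOO")      # \cfoo
grouped = found(r"(?:ab)+", "ababx")        # \%(ab\)\+
lazy = found(r"<.*?>", "<a><b>")            # <.\{-}>
greedy = found(r"<.*>", "<a><b>")           # <.*>
ahead = found(r"foo(?=bar)", "foobar")      # foo\(bar\)\@=
not_ahead = found(r"foo(?!bar)", "foobar")  # foo\(bar\)\@!
behind = found(r"(?<=foo)bar", "foobar")    # \(foo\)\@<=bar
not_behind = found(r"(?<!foo)bar", "foobar")  # \(foo\)\@<!bar

print(insensitive, grouped, lazy, greedy)
print(ahead, not_ahead, behind, not_behind)
```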

Vim and Perl handle newline characters inside a string a bit differently:

In Perl, ^ and $ only match at the very beginning and end of the text,
by default, but you can set the 'm' flag, which lets them match at
embedded newlines as well.  You can also set the 's' flag, which causes
a . to match newlines as well.  (Both these flags can be changed inside
a pattern using the same syntax used for the i flag above, BTW.)

On the other hand, Vim's ^ and $ always match at embedded newlines, and
you get two separate atoms, \%^ and \%$, which only match at the very
start and end of the text, respectively.  Vim solves the second problem
by giving you the \_ "modifier":  put it in front of a . or a character
class, and they will match newlines as well.
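
The Perl side of this difference can be observed with Python's "re" module,
which uses the same 'm' and 's' flag letters.  This is a stand-in chosen for
convenience, not Perl itself, and the sample text is invented.  Roughly
speaking, ^ and $ without the 'm' flag behave like Vim's \%^ and \%$, ^ and $
with the 'm' flag behave like Vim's plain ^ and $, and a . under the 's' flag
behaves like Vim's \_. atom.

```python
import re

text = "one\ntwo\nthree"

# Default: ^ and $ match only at the very start and end of the text.
starts_default = re.findall(r"^\w+", text)
ends_default = re.findall(r"\w+$", text)

# With the 'm' flag they also match at the embedded newlines.
starts_multiline = re.findall(r"(?m)^\w+", text)
ends_multiline = re.findall(r"(?m)\w+$", text)

# Default: . does not match a newline.
dot_default = re.search(r"one.two", text)

# With the 's' flag it does.
dot_all = re.search(r"(?s)one.two", text)

print(starts_default, starts_multiline)
print(ends_default, ends_multiline)
print(dot_default is None, dot_all is not None)
```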

Finally, these constructs are unique to Perl:
- execution of arbitrary code in the regex:  (?{perl code})
- conditional expressions:  (?(condition)true-expr|false-expr)

...and these are unique to Vim:
- changing the magic-ness of a pattern:  \v \V \m \M
   (very useful for avoiding backslashitis)
- sequence of optionally matching atoms:  \%[atoms]
- \& (which is to \|
