used on MS-Windows, UTF-32 is not
widespread as file format.
*mbyte-combining* *mbyte-composing*
A composing or combining character is used to change the meaning of the
character before it. The combining characters are drawn on top of the
preceding character.
Nvim largely follows the definition of extended grapheme clusters in UAX#29
in the Unicode standard, with some modifications: An ascii char will always
start a new cluster. In addition 'arabicshape' enables the combining of some
arabic letters, when they are shaped to be displayed together in a single cell.
Too big combined characters cannot be displayed, but they can still be
inspected using the |g8| and |ga| commands described below.
When editing text a composing character is mostly considered part of the
preceding character. For example "x" will delete a character and its
following composing characters by default.
If the 'delcombine' option is on, then pressing 'x' will delete the combining
characters, one at a time, then the base character. But when inserting, you
type the first character and the following composing characters separately,
after which they will be joined. The "r" command will not allow you to type a
combining character, because it doesn't know one is coming. Use "R" instead.
Bytes which are not part of a valid UTF-8 byte sequence are handled like a
single character and displayed as <xx>, where "xx" is the hex value of the
byte.
Overlong sequences are not handled specially and displayed like a valid
character. However, search patterns may not match on an overlong sequence.
(an overlong sequence is where more bytes are used than required for the
character.) An exception is NUL (zero) which is displayed as "<00>".
In the file and buffer the full range of Unicode characters can be used (31
bits). However, displaying only works for the characters present in the
selected font.
Useful commands:
- "ga" shows the decimal, hexadecimal and octal value of the character under
the cursor. If there are composing characters these are shown too. (If the
message is truncated, use ":messages").
- "g8" shows the bytes used in a UTF-8 character, also the composing
characters, as hex numbers.
- ":set fileencodings=" forces using UTF-8 for all files. The
default is to automatically detect the encoding of a file.
STARTING VIM
You might want to select the font used for the menus. Unfortunately this
doesn't always work. See the system specific remarks below, and 'langmenu'.
USING UTF-8 IN X-WINDOWS *utf-8-in-xwindows*
You need to specify a font to be used. For double-wide characters another
font is required, which is exactly twice as wide. There are three ways to do
this:
1. Set 'guifont' and let Nvim find a matching 'guifontwide'
2. Set 'guifont' and 'guifontwide'
See the documentation