


neovim

9th chunk of `runtime/doc/mbyte.txt`
 used on MS-Windows, UTF-32 is not
widespread as file format.


					*mbyte-combining* *mbyte-composing*
A composing or combining character is used to change the meaning of the
character before it.  The combining characters are drawn on top of the
preceding character.

Nvim largely follows the definition of extended grapheme clusters in UAX#29
in the Unicode standard, with some modifications: An ASCII char will always
start a new cluster.  In addition, 'arabicshape' enables the combining of some
Arabic letters when they are shaped to be displayed together in a single cell.

Combined characters that are too big cannot be displayed, but they can still
be inspected using the |g8| and |ga| commands described below.
When editing text a composing character is mostly considered part of the
preceding character.  For example "x" will delete a character and its
following composing characters by default.
If the 'delcombine' option is on, then pressing "x" will delete the combining
characters, one at a time, then the base character.  But when inserting, you
type the first character and the following composing characters separately,
after which they will be joined.  The "r" command will not allow you to type a
combining character, because it doesn't know one is coming.  Use "R" instead.
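
The grouping and the two "x" behaviours described above can be modelled outside
Nvim.  The following is a minimal sketch, not Nvim's implementation: it treats
any character with a non-zero canonical combining class as a composing
character, which is a simplification of UAX#29.  The names clusters() and
delete_x() are hypothetical, and deleting the last composing character first
is an assumption about the order used with 'delcombine'.

```python
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplification: only characters with a non-zero canonical combining
    class are treated as composing characters (not full UAX#29).
    """
    out = []
    for ch in text:
        if out and unicodedata.combining(ch) != 0:
            out[-1] += ch      # attach the mark to the preceding cluster
        else:
            out.append(ch)     # start a new cluster
    return out


def delete_x(text, delcombine=False):
    """Model "x" on the first cluster of text (hypothetical helper)."""
    cl = clusters(text)
    if not cl:
        return text
    first = cl[0]
    if delcombine and len(first) > 1:
        # Assumption: one composing character is removed per "x", last first.
        return first[:-1] + "".join(cl[1:])
    # Default: the base character and all its composing characters go at once.
    return "".join(cl[1:])


text = "e\u0301\u0323x"   # "e" + combining acute + combining dot below, then "x"
print(clusters(text))
print(repr(delete_x(text)))
print(repr(delete_x(text, delcombine=True)))
```

With 'delcombine' modelled as on, repeated calls strip one composing character
at a time and only then remove the base character.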

Bytes which are not part of a valid UTF-8 byte sequence are handled like a
single character and displayed as <xx>, where "xx" is the hex value of the
byte.
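
The <xx> notation can be illustrated with a short sketch.  This is not how
Nvim decodes text: it relies on Python's strict UTF-8 decoder with the
"surrogateescape" error handler, which also rejects overlong sequences that
Nvim would display as ordinary characters.  The function name display() and
the lower-case hex digits are assumptions made for the illustration.

```python
def display(data: bytes) -> str:
    """Render bytes as text, showing undecodable bytes as <xx>."""
    # "surrogateescape" maps each undecodable byte 0xNN to U+DC00 + 0xNN.
    text = data.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append("<%02x>" % (cp - 0xDC00))   # recover the original byte
        else:
            out.append(ch)
    return "".join(out)


print(display(b"a\xffb"))
print(display(b"caf\xc3\xa9"))
```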

Overlong sequences are not handled specially and are displayed like a valid
character.  However, search patterns may not match on an overlong sequence.
(An overlong sequence is one that uses more bytes than the character
requires.)  An exception is NUL (zero), which is displayed as "<00>".
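
To make the idea of an overlong sequence concrete, here is a minimal sketch
of a lenient decoder that accepts them.  It is an illustration only, not
Nvim's decoder; decode_lenient(), min_len() and is_overlong() are
hypothetical names.  The byte-level search at the end shows one way a pattern
can fail to match an overlong form, which is an assumption about the cause
rather than a description of Nvim's regexp engine.

```python
def decode_lenient(seq: bytes) -> int:
    """Decode one UTF-8 sequence (1 to 4 bytes) without rejecting overlong forms."""
    first = seq[0]
    if first < 0x80:
        n, cp = 1, first
    elif first >> 5 == 0b110:
        n, cp = 2, first & 0x1F
    elif first >> 4 == 0b1110:
        n, cp = 3, first & 0x0F
    elif first >> 3 == 0b11110:
        n, cp = 4, first & 0x07
    else:
        raise ValueError("invalid lead byte")
    if len(seq) != n:
        raise ValueError("wrong sequence length")
    for b in seq[1:]:
        if b >> 6 != 0b10:
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (b & 0x3F)   # append six payload bits
    return cp


def min_len(cp: int) -> int:
    """Shortest UTF-8 length for a code point up to U+10FFFF."""
    return 1 if cp < 0x80 else 2 if cp < 0x800 else 3 if cp < 0x10000 else 4


def is_overlong(seq: bytes) -> bool:
    return len(seq) > min_len(decode_lenient(seq))


overlong_slash = b"\xc0\xaf"                    # two-byte form of "/" (U+002F)
print(hex(decode_lenient(overlong_slash)))      # same code point as b"/"
print(is_overlong(overlong_slash), is_overlong(b"/"))
print(b"/" in overlong_slash)                   # a byte-wise search misses it
```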

In the file and buffer the full range of Unicode characters can be used (31
bits).  However, displaying only works for the characters present in the
selected font.

Useful commands:
- "ga" shows the decimal, hexadecimal and octal value of the character under
  the cursor.  If there are composing characters these are shown too.  (If the
  message is truncated, use ":messages").
- "g8" shows the bytes used in a UTF-8 character, also the composing
  characters, as hex numbers.
- ":set fileencodings=" forces using UTF-8 for all files.  The
  default is to automatically detect the encoding of a file.
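
The values reported by "ga" and "g8" can be reproduced outside Nvim for
comparison.  This is a rough sketch: ga_like() and g8_like() are hypothetical
names, and the exact message layout (including the " + " separator before
each composing character) is an assumption, not a copy of Nvim's output.

```python
def ga_like(cluster: str):
    """Decimal, hexadecimal and octal value of each character in a cluster."""
    return [(ord(ch), "%04x" % ord(ch), "%o" % ord(ch)) for ch in cluster]


def g8_like(cluster: str) -> str:
    """UTF-8 bytes of each character as hex, one group per character."""
    groups = []
    for ch in cluster:
        groups.append(" ".join("%02x" % b for b in ch.encode("utf-8")))
    # Assumed layout: composing characters are separated with " + ".
    return " + ".join(groups)


cluster = "e\u0301"   # "e" followed by a combining acute accent
print(ga_like(cluster))
print(g8_like(cluster))
```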


STARTING VIM

You might want to select the font used for the menus.  Unfortunately this
doesn't always work.  See the system specific remarks below, and 'langmenu'.


USING UTF-8 IN X-WINDOWS				*utf-8-in-xwindows*

You need to specify a font to be used.  For double-wide characters another
font is required, which is exactly twice as wide.  There are two ways to do
this:

1. Set 'guifont' and let Nvim find a matching 'guifontwide'
2. Set 'guifont' and 'guifontwide'

See the documentation

Title: Combining Characters, UTF-8 Handling, and Configuring UTF-8 in X-Windows for Nvim
Summary
This section discusses how Nvim handles combining characters, treating them as part of the preceding character for editing purposes, while explaining the behavior of commands like 'x' and 'r' in relation to combining characters. It also covers how Nvim deals with invalid UTF-8 byte sequences and overlong sequences. Furthermore, the section details commands for inspecting characters and their UTF-8 representation, and provides guidance on configuring UTF-8 in X-Windows by setting appropriate fonts for Nvim.