Combining Characters, UTF-8 Handling, and Configuring UTF-8 in X-Windows for Nvim

used on MS-Windows, UTF-32 is not widespread as a file format.

*mbyte-combining* *mbyte-composing*
A composing or combining character is used to change the meaning of the character before it. The combining characters are drawn on top of the preceding character.

Nvim largely follows the definition of extended grapheme clusters in UAX #29 of the Unicode standard, with some modifications: an ASCII character always starts a new cluster. In addition, 'arabicshape' enables the combining of some Arabic letters when they are shaped to be displayed together in a single cell. Combined characters that are too big cannot be displayed, but they can still be inspected with the |g8| and |ga| commands described below.

When editing text, a composing character is mostly considered part of the preceding character. For example, "x" deletes a character together with its following composing characters by default. If the 'delcombine' option is on, pressing "x" deletes the combining characters one at a time, then the base character.

When inserting, however, you type the first character and the following composing characters separately, after which they are joined. The "r" command does not allow you to type a combining character, because it doesn't know one is coming. Use "R" instead.

Bytes that are not part of a valid UTF-8 byte sequence are handled like a single character and displayed as <xx>, where "xx" is the hex value of the byte.

Overlong sequences are not handled specially and are displayed like a valid character. However, search patterns may not match on an overlong sequence. (An overlong sequence is one that uses more bytes than required for the character.) An exception is NUL (zero): an overlong sequence encoding it is displayed as "<00>".

In the file and buffer, the full range of Unicode characters can be used (31 bits). However, displaying only works for the characters present in the selected font.

Useful commands:
- "ga" shows the decimal, hexadecimal and octal value of the character under the cursor.
  If there are composing characters, these are shown too. (If the message is truncated, use ":messages".)
- "g8" shows the bytes used in a UTF-8 character, including the composing characters, as hex numbers.
- ":set fileencodings=" forces using UTF-8 for all files. The default is to automatically detect the encoding of a file.

STARTING VIM

You might want to select the font used for the menus. Unfortunately this doesn't always work. See the system-specific remarks below, and 'langmenu'.

USING UTF-8 IN X-WINDOWS                                   *utf-8-in-xwindows*

You need to specify a font to be used. For double-wide characters another font is required, which is exactly twice as wide. There are two ways to do this:

1. Set 'guifont' and let Nvim find a matching 'guifontwide'
2. Set 'guifont' and 'guifontwide'

See the documentation

This section discusses how Nvim handles combining characters, treating them as part of the preceding character for editing purposes, while explaining the behavior of commands like 'x' and 'r' in relation to combining characters. It also covers how Nvim deals with invalid UTF-8 byte sequences and overlong sequences. Furthermore, the section details commands for inspecting characters and their UTF-8 representation, and provides guidance on configuring UTF-8 in X-Windows by setting appropriate fonts for Nvim.
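The composing-character rules described above can be approximated in a few lines of Python. This is a rough, hypothetical sketch, not Nvim's implementation and not full UAX #29. It treats any code point in the Unicode "Mark" categories (Mn, Mc, Me) as composing and ignores 'arabicshape'. It also assumes that with 'delcombine' on, "x" removes the last composing character first. The function names are invented for illustration.

```python
import unicodedata


def is_composing(ch: str) -> bool:
    """Assumption: treat Unicode marks (categories Mn, Mc, Me) as composing."""
    return unicodedata.category(ch).startswith("M")


def split_clusters(text: str) -> list[str]:
    """Group text into a base character plus its following composing characters.

    Simplified approximation of grapheme clusters: a composing character joins
    the previous cluster, anything else starts a new one. ASCII characters are
    never marks, so each ASCII character always starts a new cluster.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and is_composing(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def delete_at_cursor(cluster: str, delcombine: bool) -> str:
    """Return what is left of `cluster` after one "x" press.

    Without 'delcombine' the whole cluster is deleted. With it, one composing
    character is removed per press (assumed: the last one first), and the base
    character goes last.
    """
    if delcombine and len(cluster) > 1:
        return cluster[:-1]
    return ""


def presses_to_delete(cluster: str, delcombine: bool) -> int:
    """Count the "x" presses needed to remove a whole cluster."""
    presses = 0
    while cluster:
        cluster = delete_at_cursor(cluster, delcombine)
        presses += 1
    return presses


if __name__ == "__main__":
    text = "ae\u0301\u0323b"  # "e" + COMBINING ACUTE ACCENT + COMBINING DOT BELOW
    middle = split_clusters(text)[1]
    print(presses_to_delete(middle, delcombine=False))  # 1
    print(presses_to_delete(middle, delcombine=True))   # 3
```

Here "e" with two accents forms one cluster. A single "x" removes it by default. With 'delcombine' on, it takes three presses: two for the composing characters and one for the base.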
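The handling of invalid and overlong byte sequences described above can be illustrated with a lenient decoder. This is a hypothetical sketch, not Nvim's decoder. It decodes overlong sequences like ordinary ones and shows an overlong NUL as "<00>". Each byte that does not start a complete sequence becomes "<xx>" in lowercase hex. Python strings stop at U+10FFFF, unlike the 31-bit range a buffer allows, so the sketch only handles sequences of up to 4 bytes.

```python
def decode_lenient(data: bytes) -> list[str]:
    """Split raw bytes into display units, following the rules described above.

    - ASCII bytes map to their character.
    - A complete 2- to 4-byte sequence maps to its character, even if it is
      overlong; an overlong encoding of NUL becomes "<00>".
    - Any other byte is shown on its own as "<xx>" (two lowercase hex digits).
    Hypothetical sketch: 5- and 6-byte sequences and code points above
    U+10FFFF are treated as invalid because Python strings cannot hold them.
    """
    units: list[str] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            units.append(chr(b))
            i += 1
            continue
        # Expected sequence length and payload bits from the lead byte.
        if 0xC0 <= b <= 0xDF:
            length, cp = 2, b & 0x1F
        elif 0xE0 <= b <= 0xEF:
            length, cp = 3, b & 0x0F
        elif 0xF0 <= b <= 0xF7:
            length, cp = 4, b & 0x07
        else:
            length, cp = 0, 0  # stray continuation byte or unsupported lead byte
        tail = data[i + 1 : i + length]
        if length and len(tail) == length - 1 and all((t & 0xC0) == 0x80 for t in tail):
            for t in tail:
                cp = (cp << 6) | (t & 0x3F)
            if cp <= 0x10FFFF:
                # No overlong check: C1 81 decodes to "A" just like 41 does.
                units.append("<00>" if cp == 0 else chr(cp))
                i += length
                continue
        units.append(f"<{b:02x}>")
        i += 1
    return units


if __name__ == "__main__":
    print(decode_lenient(b"caf\xe9"))  # ['c', 'a', 'f', '<e9>']
```

In the usage line, 0xE9 is a Latin-1 "é" that was never converted to UTF-8. It looks like the start of a 3-byte sequence, but no continuation bytes follow, so it is shown as a single "<e9>" unit.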
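The values that "ga" and "g8" report can be computed for a character and its composing characters as follows. This sketch only derives the numbers. The function names and the " + " separator between characters are this example's own choices, not a reproduction of Nvim's exact message format.

```python
def ga_values(cluster: str) -> list[tuple[str, int, str, str]]:
    """Decimal, hex and octal values for a character and each composing character."""
    return [(ch, ord(ch), format(ord(ch), "x"), format(ord(ch), "o")) for ch in cluster]


def g8_bytes(cluster: str) -> str:
    """UTF-8 bytes of each character in the cluster as hex, joined by " + "."""
    return " + ".join(
        " ".join(f"{b:02x}" for b in ch.encode("utf-8")) for ch in cluster
    )


if __name__ == "__main__":
    cluster = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT
    for ch, dec, hx, octal in ga_values(cluster):
        print(f"U+{ord(ch):04X}: decimal {dec}, hex {hx}, octal {octal}")
    print(g8_bytes(cluster))  # 65 + cc 81
```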