neovim

8th chunk of `runtime/doc/mbyte.txt`

ב֟    0x59f   CP   qarnei-parah
ב֪    0x5aa   Cy   yerach-ben-yomo
ב֫    0x5ab   Co   ole
ב֬    0x5ac   Ci   iluy
ב֭    0x5ad   Cd   dehi
ב֮    0x5ae   Cn   zinor
ב֯    0x5af   CC   masora circle

Combining forms:
ﬠ    0xfb20  X`   Alternative ayin
ﬡ    0xfb21  X'   Alternative alef
ﬢ    0xfb22  X-d  Alternative dalet
ﬣ    0xfb23  X-h  Alternative he
ﬤ    0xfb24  X-k  Alternative kaf
ﬥ    0xfb25  X-l  Alternative lamed
ﬦ    0xfb26  X-m  Alternative mem-sofit
ﬧ    0xfb27  X-r  Alternative resh
ﬨ    0xfb28  X-t  Alternative tav
﬩    0xfb29  X-+  Alternative plus
שׁ    0xfb2a  XW   shin+shin-dot
שׂ    0xfb2b  Xw   shin+sin-dot
שּׁ    0xfb2c  X..W  shin+shin-dot+dagesh
שּׂ    0xfb2d  X..w  shin+sin-dot+dagesh
אַ    0xfb2e  XA   alef+patah
אָ    0xfb2f  XO   alef+qamats
אּ    0xfb30  XI   alef+hiriq (mapiq)
בּ    0xfb31  X.b  bet+dagesh
גּ    0xfb32  X.g  gimel+dagesh
דּ    0xfb33  X.d  dalet+dagesh
הּ    0xfb34  X.h  he+dagesh
וּ    0xfb35  Xu  vav+dagesh
זּ    0xfb36  X.z  zayin+dagesh
טּ    0xfb38  X.T  tet+dagesh
יּ    0xfb39  X.y  yud+dagesh
ךּ    0xfb3a  X.K  kaf sofit+dagesh
כּ    0xfb3b  X.k  kaf+dagesh
לּ    0xfb3c  X.l  lamed+dagesh
מּ    0xfb3e  X.m  mem+dagesh
נּ    0xfb40  X.n  nun+dagesh
סּ    0xfb41  X.s  samech+dagesh
ףּ    0xfb43  X.P  pe sofit+dagesh
פּ    0xfb44  X.p  pe+dagesh
צּ    0xfb46  X.x  tsadi+dagesh
קּ    0xfb47  X.q  qof+dagesh
רּ    0xfb48  X.r  resh+dagesh
שּ    0xfb49  X.w  shin+dagesh
תּ    0xfb4a  X.t  tav+dagesh
וֹ    0xfb4b  Xo   vav+holam
בֿ    0xfb4c  XRb  bet+rafe
כֿ    0xfb4d  XRk  kaf+rafe
פֿ    0xfb4e  XRp  pe+rafe
ﭏ    0xfb4f  Xal  alef-lamed

==============================================================================
Using UTF-8				*mbyte-utf8* *UTF-8* *utf-8* *utf8*
							*Unicode* *unicode*
The Unicode character set was designed to include all characters from other
character sets.  Therefore it is possible to write text in (almost) any
language using Unicode.  It is also mostly possible to mix these languages in
one file, which most other encodings do not allow.

Unicode can be encoded in several ways.  The most popular one is UTF-8, which
uses one or more bytes for each character and is backwards compatible with
ASCII.  On MS-Windows UTF-16 is also used (previously UCS-2), which uses
16-bit words.  Nvim supports all of these encodings, but always uses UTF-8
internally.
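
The difference between bytes and characters can be checked from Vim script.
The following is only a minimal sketch, not part of the original text.  It
assumes the builtin functions |nr2char()|, |strlen()| and |strchars()|, and
relies on Nvim always using UTF-8 internally: >vim
    " ASCII characters keep their one-byte encoding in UTF-8.
    echo strlen('a')
    " U+00E9 needs two bytes, U+20AC three and U+1F600 four.
    echo strlen(nr2char(0xE9))
    echo strlen(nr2char(0x20AC))
    echo strlen(nr2char(0x1F600))
    " strchars() counts characters instead of bytes: "a" plus U+20AC is 2.
    echo strchars("a\u20ac")
<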

Nvim supports double-width characters; this works best with 'guifontwide'.
When only 'guifont' is set, wide characters are drawn in the normal width,
followed by a space to fill the gap.
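
How many screen cells a character occupies can be queried with |strwidth()|.
This is a small illustrative sketch; the sample character U+4E2D is chosen
here as an example of an East Asian wide character and does not come from the
text above: >vim
    " An ASCII letter occupies one screen cell.
    echo strwidth('a')
    " U+4E2D is a double-width character and occupies two cells.
    echo strwidth(nr2char(0x4E2D))
<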

EMOJI							*emoji*

You can list emoji characters using this script: >vim
    :source $VIMRUNTIME/scripts/emoji_list.lua
<
							*bom-bytes*
When reading a file a BOM (Byte Order Mark) can be used to recognize the
Unicode encoding:
	EF BB BF     UTF-8
	FE FF        UTF-16 big endian
	FF FE        UTF-16 little endian
	00 00 FE FF  UTF-32 big endian
	FF FE 00 00  UTF-32 little endian

UTF-8 is the recommended encoding.  Note that it's difficult to tell UTF-16
and UTF-32 apart: the UTF-32 little endian BOM starts with the same two bytes
as the UTF-16 little endian BOM.  UTF-16 is often used on MS-Windows, UTF-32
is not widespread as a file format.
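
The BOM table can be turned into a lookup.  The sketch below is only an
illustration of that table and not necessarily how Nvim itself decides.
GuessBomEncoding() is a hypothetical helper name, and preferring UTF-32 for
FF FE 00 00 is an assumption, since those bytes may also be UTF-16 little
endian text that starts with a NUL character: >vim
    " Hypothetical helper: guess the encoding from the first bytes of a file.
    " {bytes} is a List of byte values, for example [0xEF, 0xBB, 0xBF].
    function! GuessBomEncoding(bytes) abort
      let head = a:bytes[0 : 3]
      " Test the four-byte marks first, because FF FE 00 00 also starts with
      " the UTF-16 little endian mark FF FE.
      if head == [0x00, 0x00, 0xFE, 0xFF]
        return 'UTF-32 big endian'
      elseif head == [0xFF, 0xFE, 0x00, 0x00]
        return 'UTF-32 little endian'
      elseif head[0 : 2] == [0xEF, 0xBB, 0xBF]
        return 'UTF-8'
      elseif head[0 : 1] == [0xFE, 0xFF]
        return 'UTF-16 big endian'
      elseif head[0 : 1] == [0xFF, 0xFE]
        return 'UTF-16 little endian'
      endif
      " No BOM recognized.
      return ''
    endfunction

    " Echoes "UTF-8".
    echo GuessBomEncoding([0xEF, 0xBB, 0xBF, 0x68])
    " Echoes "UTF-16 little endian".
    echo GuessBomEncoding([0xFF, 0xFE, 0x68, 0x00])
<
For a file that is already loaded, `:setlocal fileencoding? bomb?` shows the
values of the 'fileencoding' and 'bomb' options for the current buffer.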


					*mbyte-combining* *mbyte-composing*
A composing or combining character is used to change the meaning of the
character before it.  The combining characters are drawn on top of the
preceding character.
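
A base character followed by a composing character can be examined from Vim
script.  This is a minimal sketch, not part of the original text.  The sample
string ("e" followed by U+0301, the combining acute accent) and the variable
name are assumptions made for illustration: >vim
    " "e" followed by U+0301 COMBINING ACUTE ACCENT.
    let text = "e\u0301"
    " Three bytes: one for "e" and two for U+0301.
    echo strlen(text)
    " Two characters when the composing character is counted separately.
    echo strchars(text)
    " One character when composing characters are skipped.
    echo strchars(text, 1)
    " The composing character is drawn on top of "e": one screen cell.
    echo strwidth(text)
    " The individual code points: [101, 769].
    echo str2list(text)
<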

Nvim largely follows the definition of extended grapheme clusters in UAX#29
in the Unicode standard, with some modifications: an ASCII character always
starts a new cluster.  In addition, 'arabicshape' enables the combining of
some Arabic letters when they are shaped to be displayed together in a single
cell.

Combined characters that are too big cannot be displayed, but they can still
be inspected using the |g8| and |ga| commands described below.
When editing text a composing character is mostly considered part of the
preceding character.  For example "x" will delete a character and its
following composing characters by default.
If the 'delcombine' option is on, then pressing 'x' will delete the combining
characters, one at a time,
