ב֟ 0x59f CP qarnei-parah
ב֪ 0x5aa Cy yerach-ben-yomo
ב֫ 0x5ab Co ole
ב֬ 0x5ac Ci iluy
ב֭ 0x5ad Cd dehi
ב֮ 0x5ae Cn zinor
ב֯ 0x5af CC masora circle
Combining forms:
ﬠ 0xfb20 X` Alternative ayin
ﬡ 0xfb21 X' Alternative alef
ﬢ 0xfb22 X-d Alternative dalet
ﬣ 0xfb23 X-h Alternative he
ﬤ 0xfb24 X-k Alternative kaf
ﬥ 0xfb25 X-l Alternative lamed
ﬦ 0xfb26 X-m Alternative mem-sofit
ﬧ 0xfb27 X-r Alternative resh
ﬨ 0xfb28 X-t Alternative tav
﬩ 0xfb29 X-+ Alternative plus
שׁ 0xfb2a XW shin+shin-dot
שׂ 0xfb2b Xw shin+sin-dot
שּׁ 0xfb2c X..W shin+shin-dot+dagesh
שּׂ 0xfb2d X..w shin+sin-dot+dagesh
אַ 0xfb2e XA alef+patah
אָ 0xfb2f XO alef+qamats
אּ 0xfb30 XI alef+mapiq
בּ 0xfb31 X.b bet+dagesh
גּ 0xfb32 X.g gimel+dagesh
דּ 0xfb33 X.d dalet+dagesh
הּ 0xfb34 X.h he+dagesh
וּ 0xfb35 Xu vav+dagesh
זּ 0xfb36 X.z zayin+dagesh
טּ 0xfb38 X.T tet+dagesh
יּ 0xfb39 X.y yud+dagesh
ךּ 0xfb3a X.K kaf sofit+dagesh
כּ 0xfb3b X.k kaf+dagesh
לּ 0xfb3c X.l lamed+dagesh
מּ 0xfb3e X.m mem+dagesh
נּ 0xfb40 X.n nun+dagesh
סּ 0xfb41 X.s samech+dagesh
ףּ 0xfb43 X.P pe sofit+dagesh
פּ 0xfb44 X.p pe+dagesh
צּ 0xfb46 X.x tsadi+dagesh
קּ 0xfb47 X.q qof+dagesh
רּ 0xfb48 X.r resh+dagesh
שּ 0xfb49 X.w shin+dagesh
תּ 0xfb4a X.t tav+dagesh
וֹ 0xfb4b Xo vav+holam
בֿ 0xfb4c XRb bet+rafe
כֿ 0xfb4d XRk kaf+rafe
פֿ 0xfb4e XRp pe+rafe
ﭏ 0xfb4f Xal alef-lamed
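As a quick check outside Nvim, the precomposed letters above can be split into their parts with Python's standard `unicodedata` module. This is a minimal sketch, not part of Nvim, and the helper name `components` is hypothetical. It prints the canonical (NFD) decomposition of a character as code points: >python
import unicodedata

# Illustrative sketch (not Nvim code): `components` is a hypothetical helper.
def components(ch):
    """Return the code points of the canonical (NFD) decomposition of ch."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

# shin+shin-dot+dagesh (0xfb2c): shin, dagesh, shin dot.
print(components("\ufb2c"))  # ['U+05E9', 'U+05BC', 'U+05C1']
# alef with mapiq (0xfb30): mapiq shares its code point with dagesh.
print(components("\ufb30"))  # ['U+05D0', 'U+05BC']
# The alternative forms have no canonical decomposition and stay unchanged.
print(components("\ufb20"))  # ['U+FB20']
<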
==============================================================================
Using UTF-8 *mbyte-utf8* *UTF-8* *utf-8* *utf8*
*Unicode* *unicode*
The Unicode character set was designed to include all characters from other
character sets. Therefore it is possible to write text in (almost) any
language using Unicode. In most cases these languages can also be mixed in
one file, which is impossible with other encodings.
Unicode can be encoded in several ways. The most popular one is UTF-8, which
uses one to four bytes for each character and is backwards compatible with
ASCII. On MS-Windows UTF-16 is also used (previously UCS-2), which uses one
or two 16-bit words per character. Nvim supports all of these encodings, but
always uses UTF-8 internally.
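The following Python sketch (illustrative only, not Nvim code; `encoded_sizes` is a hypothetical helper) counts how much space a few sample characters take in each encoding. It uses the "utf-16-le" codec so that no BOM is included in the count: >python
# Illustrative sketch (not Nvim code): `encoded_sizes` is a hypothetical helper.
def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 16-bit word count) for one character."""
    utf8_bytes = len(ch.encode("utf-8"))
    # utf-16-le does not prepend a BOM; every 16-bit word is 2 bytes.
    utf16_words = len(ch.encode("utf-16-le")) // 2
    return utf8_bytes, utf16_words

print(encoded_sizes("A"))           # (1, 1)
print(encoded_sizes("\u05d0"))      # (2, 1)  alef
print(encoded_sizes("\u20ac"))      # (3, 1)  euro sign
print(encoded_sizes("\U0001F600"))  # (4, 2)  emoji outside the BMP
<
ASCII characters take a single byte in UTF-8, which is what keeps it backwards
compatible with ASCII. Characters outside the Basic Multilingual Plane need
two 16-bit words (a surrogate pair) in UTF-16, which UCS-2 cannot represent.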
Nvim supports double-width characters; this works best with 'guifontwide'.
When only 'guifont' is used, wide characters are drawn at normal width,
followed by a space to fill the gap.
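Whether a character is double-width follows the Unicode East Asian Width
property. A rough Python approximation, assuming that "W" (wide) and "F"
(fullwidth) characters take two cells, nonspacing and enclosing marks take
none, and everything else takes one, might look like this. Nvim uses its own
width tables, so this sketch and its hypothetical `cell_width` helper will not
match it in every case: >python
import unicodedata

# Illustrative approximation (not Nvim's width tables): `cell_width` is hypothetical.
def cell_width(text):
    """Approximate the number of screen cells needed to display text."""
    cells = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue  # nonspacing/enclosing marks are drawn over the previous cell
        # "W" (wide) and "F" (fullwidth) characters occupy two cells.
        cells += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return cells

print(cell_width("abc"))           # 3
print(cell_width("\u6f22\u5b57"))  # 4  (two CJK ideographs)
print(cell_width("e\u0301"))       # 1  (e + combining acute accent)
<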
EMOJI *emoji*
You can list emoji characters using this script: >vim
:source $VIMRUNTIME/scripts/emoji_list.lua
<
*bom-bytes*
When reading a file a BOM (Byte Order Mark) can be used to recognize the
Unicode encoding:
EF BB BF UTF-8
FE FF UTF-16 big endian
FF FE UTF-16 little endian
00 00 FE FF UTF-32 big endian
FF FE 00 00 UTF-32 little endian
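UTF-8 is the recommended encoding. Note that it's difficult to tell UTF-16
and UTF-32 apart: the UTF-16 little endian BOM (FF FE) is also the start of
the UTF-32 little endian BOM (FF FE 00 00). UTF-16 is often used on
MS-Windows; UTF-32 is not widespread as a file format.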
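The BOM table above can be turned into a detection routine. This Python
sketch is an illustration only (`detect_bom` and `BOMS` are hypothetical
names, not Nvim internals); it checks the four-byte UTF-32 marks before the
two-byte UTF-16 marks, because FF FE is a prefix of FF FE 00 00: >python
# Illustrative sketch (not Nvim code): `BOMS` and `detect_bom` are hypothetical.
# The 4-byte UTF-32 BOMs come first: FF FE is a prefix of FF FE 00 00.
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

def detect_bom(data):
    """Return (encoding, BOM length) if data starts with a BOM, else (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

print(detect_bom(b"\xef\xbb\xbfhello"))  # ('utf-8', 3)
print(detect_bom(b"\xff\xfeh\x00"))      # ('utf-16-le', 2)
print(detect_bom(b"plain text"))         # (None, 0)
<
The ordering also shows the ambiguity: a UTF-16 little endian file whose first
character after the BOM is U+0000 starts with the same four bytes as a UTF-32
little endian BOM, so it would be reported as UTF-32.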
*mbyte-combining* *mbyte-composing*
A composing or combining character is used to change the meaning of the
character before it. The combining characters are drawn on top of the
preceding character.
Nvim largely follows the definition of extended grapheme clusters in UAX #29
of the Unicode Standard, with some modifications: an ASCII character always
starts a new cluster. In addition, 'arabicshape' enables combining some
Arabic letters when they are shaped to be displayed together in a single cell.
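Full UAX #29 segmentation has many more rules. The Python sketch below models
only the simplest case, as an assumption: a combining mark (general category
Mn, Mc or Me) joins the preceding character and anything else starts a new
cluster, so ASCII characters, which are never marks, always start one.
`split_clusters` is a hypothetical name and this is not Nvim's
implementation: >python
import unicodedata

# Simplified model, not full UAX #29 and not Nvim code: `split_clusters` is hypothetical.
def split_clusters(text):
    """Split text into simplified clusters: a base character plus its marks."""
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        # Marks (Mn, Mc, Me) attach to the previous cluster; every other
        # character, including all ASCII characters, starts a new one.
        if clusters and is_mark:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# "shalom" with points: shin+shin-dot+qamats, lamed, vav+holam, mem-sofit.
word = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"
print(len(word), len(split_clusters(word)))  # 7 4
<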
Combined characters that are too big cannot be displayed, but they can still
be inspected using the |g8| and |ga| commands described below.
When editing text, a composing character is mostly considered part of the
preceding character. For example, "x" deletes a character together with its
following composing characters by default.
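A simplified Python model of this default (an assumption for illustration;
`delete_char` is hypothetical and not how Nvim implements "x") removes the
character at a given index along with any combining marks that immediately
follow it: >python
import unicodedata

# Illustrative sketch only: `delete_char` is hypothetical. It treats characters
# of general category M* as composing characters and assumes that index points
# at a base character.
def delete_char(text, index):
    """Delete text[index] and the composing characters that follow it."""
    end = index + 1
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return text[:index] + text[end:]

print(delete_char("e\u0301x", 0))                        # x
print(delete_char("\u05d5\u05b9\u05dd", 0) == "\u05dd")  # True
<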
If the 'delcombine' option is on, then pressing 'x' will delete the combining
characters, one at a time,