neovim

4th chunk of `runtime/doc/mbyte.txt`
 to iso-8859-9
1   cp861	similar to iso-8859-1
1   cp862	similar to iso-8859-1
1   cp863	similar to iso-8859-8
1   cp865	similar to iso-8859-1
1   cp866	similar to iso-8859-5
1   cp869	similar to iso-8859-7
1   cp874	Thai
1   cp1250	Czech, Polish, etc.
1   cp1251	Cyrillic
1   cp1253	Greek
1   cp1254	Turkish
1   cp1255	Hebrew
1   cp1256	Arabic
1   cp1257	Baltic
1   cp1258	Vietnamese
1   cp{number}	MS-Windows: any installed single-byte codepage
2   cp932	Japanese (Windows only)
2   euc-jp	Japanese
2   sjis	Japanese
2   cp949	Korean
2   euc-kr	Korean
2   cp936	simplified Chinese (Windows only)
2   euc-cn	simplified Chinese
2   cp950	traditional Chinese (alias for big5)
2   big5	traditional Chinese (alias for cp950)
2   euc-tw	traditional Chinese
2   2byte-{name} any double-byte encoding (Vim-specific name)
2   cp{number}	MS-Windows: any installed double-byte codepage
u   utf-8	32 bit UTF-8 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2	16 bit UCS-2 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2le	like ucs-2, little endian
u   utf-16	ucs-2 extended with double-words for more characters
u   utf-16le	like utf-16, little endian
u   ucs-4	32 bit UCS-4 encoded Unicode (ISO/IEC 10646-1)
u   ucs-4le	like ucs-4, little endian

The {name} can be any encoding name that your system supports.  It is passed
to iconv() to convert between UTF-8 and the encoding of the file.
For MS-Windows "cp{number}" means using codepage {number}.
Examples: >
		:set fileencoding=8bit-cp1252
		:set fileencoding=2byte-cp932

The MS-Windows codepage 1252 is very similar to latin1.  For practical reasons
the same encoding is used for both and it is called latin1.  'isprint' can be
used to choose whether the characters 0x80 - 0xA0 are displayed.
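
As a sketch of the 'isprint' setting mentioned above: the value below adds the
range 128-255 (0x80 - 0xFF) to the printable characters, so that 0x80 - 0xA0
are displayed directly.  The value is only an illustration; the default on
your system may differ: >
		" "@" stands for alphabetic characters, 128-255 is a range.
		:set isprint=@,128-255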

Several aliases can be used; they are translated to one of the names above.
Incomplete list:

1   ansi	same as latin1 (obsolete, for backward compatibility)
2   japan	Japanese: "euc-jp"
2   korea	Korean: "euc-kr"
2   prc		simplified Chinese: "euc-cn"
2   chinese     same as "prc"
2   taiwan	traditional Chinese: "euc-tw"
u   utf8	same as utf-8
u   unicode	same as ucs-2
u   ucs2be	same as ucs-2 (big endian)
u   ucs-2be	same as ucs-2 (big endian)
u   ucs-4be	same as ucs-4 (big endian)
u   utf-32	same as ucs-4
u   utf-32le	same as ucs-4le
    default     the encoding of the current locale.
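
A small illustration of the alias translation, assuming the value is
translated to the canonical name when the option is set.  Querying the option
afterwards should then report one of the names from the table above rather
than the alias: >
		" Set 'fileencoding' with the alias "utf8"...
		:set fileencoding=utf8
		" ...then query it to see the name that is actually stored.
		:set fileencoding?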

For the UCS codes the byte order matters.  This is tricky, so use UTF-8
whenever you can.  The default is big-endian (most significant byte comes
first):
	    name	bytes		char ~
	    ucs-2	      11 22	    1122
	    ucs-2le	      22 11	    1122
	    ucs-4	11 22 33 44	11223344
	    ucs-4le	44 33 22 11	11223344

On MS-Windows systems you often want to use "ucs-2le", because MS-Windows
uses little-endian UCS-2.
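
A sketch of selecting the byte order explicitly with the |++enc| argument when
reading and writing.  The file names are hypothetical: >
		" Read a file that is stored as little-endian UCS-2.
		:edit ++enc=ucs-2le windows-file.txt
		" Write the buffer to another file as UTF-8, where byte order
		" is not an issue.
		:write ++enc=utf-8 converted.txt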

There are a few encodings which are similar, but not exactly the same.  Vim
treats them as if they were different encodings, so that conversion will be
done when needed.  You might want to use the similar name to avoid conversion
or when conversion is not possible:

	cp932, shift-jis, sjis
	cp936, euc-cn

CONVERSION						*charset-conversion*

Vim will automatically convert from one encoding to another in several places:
- When reading a file and 'fileencoding' is different from "utf-8"
- When writing a file and 'fileencoding' is different from "utf-8"
- When displaying messages and the encoding used for LC_MESSAGES differs from
  "utf-8" (requires a gettext version that supports this).
- When reading a Vim script where |:scriptencoding| is different from
  "utf-8".
Most of these require iconv.  Conversion for reading and writing files may
also be specified with the 'charconvert' option.
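
A minimal sketch of a 'charconvert' expression, not a definitive
implementation.  The function name MyCharConvert() is hypothetical and an
external "iconv" program is assumed to be available in $PATH.  The expression
is expected to convert the file v:fname_in from v:charconvert_from to
v:charconvert_to, write the result to v:fname_out, and return zero on
success: >
		" Hypothetical helper; assumes an external "iconv" program.
		function! MyCharConvert() abort
		  call system('iconv -f ' .. shellescape(v:charconvert_from)
		        \ .. ' -t ' .. shellescape(v:charconvert_to)
		        \ .. ' < ' .. shellescape(v:fname_in)
		        \ .. ' > ' .. shellescape(v:fname_out))
		  " Zero means success; a non-zero value makes the conversion fail.
		  return v:shell_error
		endfunction
		set charconvert=MyCharConvert()
<
Depending on the build, this expression may only be consulted when the
built-in conversion cannot handle the requested pair of encodings.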

Useful utilities for converting the charset:
    All:	    iconv
	GNU iconv can convert most encodings.  Unicode is used as the
	intermediate encoding, which allows conversion from and to all other
	encodings.  See https://directory.fsf.org/wiki/Libiconv.


							*mbyte-conversion*
When reading and writing files in an encoding different from "utf-8",
conversion needs to be done.  These conversions

Title: Detailed Encoding Options and Character Conversion in Vim
Summary
This section details various encoding options supported by Vim, including single-byte (cp), double-byte (euc, sjis, big5), and Unicode (utf, ucs) encodings, as well as aliases for common encodings. It explains how Vim uses iconv() for converting between UTF-8 and file encodings, particularly on MS-Windows. It also describes automatic character conversion in scenarios like reading/writing files and displaying messages, highlighting the role of iconv and 'charconvert' for character set conversion. Additionally, it mentions the need for conversion when reading and writing files in encodings different from "utf-8".