to iso-8859-9
1 cp861 similar to iso-8859-1
1 cp862 similar to iso-8859-8
1 cp863 similar to iso-8859-1
1 cp865 similar to iso-8859-1
1 cp866 similar to iso-8859-5
1 cp869 similar to iso-8859-7
1 cp874 Thai
1 cp1250 Czech, Polish, etc.
1 cp1251 Cyrillic
1 cp1253 Greek
1 cp1254 Turkish
1 cp1255 Hebrew
1 cp1256 Arabic
1 cp1257 Baltic
1 cp1258 Vietnamese
1 cp{number} MS-Windows: any installed single-byte codepage
2 cp932 Japanese (Windows only)
2 euc-jp Japanese
2 sjis Japanese
2 cp949 Korean
2 euc-kr Korean
2 cp936 simplified Chinese (Windows only)
2 euc-cn simplified Chinese
2 cp950 traditional Chinese (alias for big5)
2 big5 traditional Chinese (alias for cp950)
2 euc-tw traditional Chinese
2 2byte-{name} any double-byte encoding (Vim-specific name)
2 cp{number} MS-Windows: any installed double-byte codepage
u utf-8 32 bit UTF-8 encoded Unicode (ISO/IEC 10646-1)
u ucs-2 16 bit UCS-2 encoded Unicode (ISO/IEC 10646-1)
u ucs-2le like ucs-2, little endian
u utf-16 ucs-2 extended with double-words for more characters
u utf-16le like utf-16, little endian
u ucs-4 32 bit UCS-4 encoded Unicode (ISO/IEC 10646-1)
u ucs-4le like ucs-4, little endian
The {name} can be any encoding name that your system supports. It is passed
to iconv() to convert between UTF-8 and the encoding of the file.
For MS-Windows "cp{number}" means using codepage {number}.
Examples: >
	:set fileencoding=8bit-cp1252
	:set fileencoding=2byte-cp932
The MS-Windows codepage 1252 is very similar to latin1. For practical reasons
the same encoding is used and it's called latin1. 'isprint' can be used to
display the characters 0x80 - 0xA0 or not.
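The difference between codepage 1252 and latin1 is confined to a small byte
range. A minimal sketch in Python (used here only for illustration; it is not
part of Vim, and the codec names "cp1252" and "latin-1" are Python's, not
Vim's) that lists the byte values the two encodings decode differently:

```python
# Compare how each byte value decodes under cp1252 and latin-1 (ISO-8859-1).
def differing_bytes():
    """Return the byte values that cp1252 and latin-1 decode differently."""
    result = []
    for value in range(256):
        raw = bytes([value])
        latin1_char = raw.decode("latin-1")
        # Python's cp1252 codec leaves a few byte values undefined;
        # count those as different too.
        try:
            cp1252_char = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252_char = None
        if cp1252_char != latin1_char:
            result.append(value)
    return result


diff = differing_bytes()
print(f"0x{min(diff):02X} - 0x{max(diff):02X}")  # prints "0x80 - 0x9F"
```

Every other byte value maps to the same character in both encodings, which is
why treating the two as one encoding works for most text.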
Several aliases can be used; they are translated to one of the names above.
Incomplete list:
1 ansi same as latin1 (obsolete, for backward compatibility)
2 japan Japanese: "euc-jp"
2 korea Korean: "euc-kr"
2 prc simplified Chinese: "euc-cn"
2 chinese same as "prc"
2 taiwan traditional Chinese: "euc-tw"
u utf8 same as utf-8
u unicode same as ucs-2
u ucs2be same as ucs-2 (big endian)
u ucs-2be same as ucs-2 (big endian)
u ucs-4be same as ucs-4 (big endian)
u utf-32 same as ucs-4
u utf-32le same as ucs-4le
default the encoding of the current locale.
For the UCS codes the byte order matters. This is tricky, so use UTF-8
whenever you can. The default is big-endian (most significant byte comes
first):
    name        bytes          char ~
    ucs-2             11 22        1122
    ucs-2le           22 11        1122
    ucs-4       11 22 33 44    11223344
    ucs-4le     44 33 22 11    11223344
On MS-Windows systems you often want to use "ucs-2le", because it uses little
endian UCS-2.
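The byte orders above can be reproduced with a short Python sketch. The helper
name ucs_bytes() is hypothetical and only serializes an integer with a fixed
width; it is not how Vim implements these encodings. The value 0x11223344 is
the illustrative number from the table, not a valid Unicode character.

```python
def ucs_bytes(code, width, little_endian=False):
    """Serialize a character code as UCS-2 (width=2) or UCS-4 (width=4).

    Returns the bytes as a space-separated hex string, in file order.
    """
    order = "little" if little_endian else "big"
    return code.to_bytes(width, order).hex(" ")


print(ucs_bytes(0x1122, 2))                            # ucs-2:   11 22
print(ucs_bytes(0x1122, 2, little_endian=True))        # ucs-2le: 22 11
print(ucs_bytes(0x11223344, 4))                        # ucs-4:   11 22 33 44
print(ucs_bytes(0x11223344, 4, little_endian=True))    # ucs-4le: 44 33 22 11
```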
There are a few encodings which are similar, but not exactly the same. Vim
treats them as if they were different encodings, so that conversion will be
done when needed. You might want to use the similar name to avoid conversion
or when conversion is not possible:
cp932, shift-jis, sjis
cp936, euc-cn
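To see why these pairs are not interchangeable, the following Python sketch
checks a few characters against Python's own codecs. This assumes Python's
"cp932", "shift_jis", "gbk" and "gb2312" codecs as stand-ins for cp932, sjis,
cp936 and euc-cn; an iconv library may differ in detail. The helper
can_encode() is a hypothetical name.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+2460 (circled digit one) is a vendor extension present in cp932 only.
print(can_encode("\u2460", "cp932"), can_encode("\u2460", "shift_jis"))

# U+9555 exists in gbk (cp936) but not in gb2312 (euc-cn).
print(can_encode("\u9555", "gbk"), can_encode("\u9555", "gb2312"))

# The same two bytes can also decode to different characters.
same = b"\x81\x60".decode("cp932") == b"\x81\x60".decode("shift_jis")
print(same)
```

Text that stays within the common subset converts identically under either
name, which is the case where using the similar name to avoid conversion is
safe.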
CONVERSION *charset-conversion*
Vim will automatically convert from one encoding to another in several places:
- When reading a file and 'fileencoding' is different from "utf-8"
- When writing a file and 'fileencoding' is different from "utf-8"
- When displaying messages and the encoding used for LC_MESSAGES differs from
  "utf-8" (requires a gettext version that supports this).
- When reading a Vim script where |:scriptencoding| is different from
  "utf-8".
Most of these require iconv. Conversion for reading and writing files may
also be specified with the 'charconvert' option.
Useful utilities for converting the charset:
All: iconv
GNU iconv can convert most encodings. Unicode is used as the
intermediate encoding, which allows conversion from and to all other
encodings. See https://directory.fsf.org/wiki/Libiconv.
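The idea of using Unicode as the intermediate encoding can be sketched in a
few lines of Python. This is an illustration of the principle only, not of
iconv's implementation; the function name convert() and the sample text are
assumptions made for the example.

```python
def convert(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Convert bytes from one encoding to another via Unicode.

    Step 1 decodes the source bytes to Unicode text; step 2 encodes that
    text in the target encoding. Either step raises an error when a byte
    sequence or character has no mapping.
    """
    text = data.decode(source)   # source encoding -> Unicode
    return text.encode(target)   # Unicode -> target encoding


# Sample text stored in the Cyrillic codepage cp1251.
original = "Привет".encode("cp1251")
converted = convert(original, "cp1251")
print(len(original), len(converted))  # byte counts before and after
print(converted.decode("utf-8"))
```

Because every conversion passes through Unicode, supporting a new encoding
only needs a mapping to and from Unicode rather than one to every other
encoding.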
*mbyte-conversion*
When reading and writing files in an encoding different from "utf-8",
conversion needs to be done. These conversions