Detailed Encoding Options and Character Conversion in Vim

to iso-8859-9
1   cp861       similar to iso-8859-1
1   cp862       similar to iso-8859-1
1   cp863       similar to iso-8859-8
1   cp865       similar to iso-8859-1
1   cp866       similar to iso-8859-5
1   cp869       similar to iso-8859-7
1   cp874       Thai
1   cp1250      Czech, Polish, etc.
1   cp1251      Cyrillic
1   cp1253      Greek
1   cp1254      Turkish
1   cp1255      Hebrew
1   cp1256      Arabic
1   cp1257      Baltic
1   cp1258      Vietnamese
1   cp{number}  MS-Windows: any installed single-byte codepage
2   cp932       Japanese (Windows only)
2   euc-jp      Japanese
2   sjis        Japanese
2   cp949       Korean
2   euc-kr      Korean
2   cp936       simplified Chinese (Windows only)
2   euc-cn      simplified Chinese
2   cp950       traditional Chinese (alias for big5)
2   big5        traditional Chinese (alias for cp950)
2   euc-tw      traditional Chinese
2   2byte-{name} any double-byte encoding (Vim-specific name)
2   cp{number}  MS-Windows: any installed double-byte codepage
u   utf-8       32 bit UTF-8 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2       16 bit UCS-2 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2le     like ucs-2, little endian
u   utf-16      ucs-2 extended with double-words for more characters
u   utf-16le    like utf-16, little endian
u   ucs-4       32 bit UCS-4 encoded Unicode (ISO/IEC 10646-1)
u   ucs-4le     like ucs-4, little endian

The {name} can be any encoding name that your system supports.  It is passed
to iconv() to convert between UTF-8 and the encoding of the file.
For MS-Windows "cp{number}" means using codepage {number}.
Examples: >
		:set fileencoding=8bit-cp1252
		:set fileencoding=2byte-cp932
<
The MS-Windows codepage 1252 is very similar to latin1.  For practical reasons
the same encoding is used and it's called latin1.  'isprint' can be used to
display the characters 0x80 - 0xA0 or not.

Several aliases can be used, they are translated to one of the names above.
Incomplete list:

1   ansi        same as latin1 (obsolete, for backward compatibility)
2   japan       Japanese: "euc-jp"
2   korea       Korean: "euc-kr"
2   prc         simplified Chinese: "euc-cn"
2   chinese     same as "prc"
2   taiwan      traditional Chinese: "euc-tw"
u   utf8        same as utf-8
u   unicode     same as ucs-2
u   ucs2be      same as ucs-2 (big endian)
u   ucs-2be     same as ucs-2 (big endian)
u   ucs-4be     same as ucs-4 (big endian)
u   utf-32      same as ucs-4
u   utf-32le    same as ucs-4le
    default     the encoding of the current locale.

For the UCS codes the byte order matters.  This is tricky, use UTF-8 whenever
you can.  The default is to use big-endian (most significant byte comes
first):
	    name        bytes           char ~
	    ucs-2             11 22         1122
	    ucs-2le           22 11         1122
	    ucs-4       11 22 33 44     11223344
	    ucs-4le     44 33 22 11     11223344

On MS-Windows systems you often want to use "ucs-2le", because it uses little
endian UCS-2.

There are a few encodings which are similar, but not exactly the same.  Vim
treats them as if they were different encodings, so that conversion will be
done when needed.  You might want to use the similar name to avoid conversion
or when conversion is not possible:

	cp932, shift-jis, sjis
	cp936, euc-cn

CONVERSION                              *charset-conversion*

Vim will automatically convert from one to another encoding in several places:
- When reading a file and 'fileencoding' is different from "utf-8"
- When writing a file and 'fileencoding' is different from "utf-8"
- When displaying messages and the encoding used for LC_MESSAGES differs from
  "utf-8" (requires a gettext version that supports this).
- When reading a Vim script where |:scriptencoding| is different from
  "utf-8".
Most of these require iconv.  Conversion for reading and writing files may
also be specified with the 'charconvert' option.

Useful utilities for converting the charset:
    All:        iconv
	GNU iconv can convert most encodings.  Unicode is used as the
	intermediate encoding, which allows conversion from and to all other
	encodings.  See https://directory.fsf.org/wiki/Libiconv.
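
The byte-order table and the "Unicode as the intermediate encoding" idea
above can be illustrated outside of Vim.  What follows is a minimal Python
sketch, not Vim's or iconv's actual implementation.  The helper names
code_unit_bytes() and convert() and the LAYOUTS table are hypothetical and
exist only for this illustration, and the codec names ("latin-1", "cp1251",
"utf-8") are Python's spellings, which may differ from the names Vim or iconv
accept.

```python
import struct

# Hypothetical mapping from the Vim names in the byte-order table to
# struct formats: ">" is big-endian, "<" is little-endian,
# "H" is an unsigned 16-bit unit, "I" is an unsigned 32-bit unit.
LAYOUTS = {
    "ucs-2": ">H",
    "ucs-2le": "<H",
    "ucs-4": ">I",
    "ucs-4le": "<I",
}


def code_unit_bytes(name: str, value: int) -> bytes:
    """Return the bytes that one code unit occupies under the given name."""
    return struct.pack(LAYOUTS[name], value)


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Convert between two encodings by going through Unicode.

    The bytes are first decoded into a Unicode string (the intermediate
    form) and then encoded into the target encoding.
    """
    text = data.decode(src)
    return text.encode(dst)


if __name__ == "__main__":
    # Reproduce the rows of the byte-order table.
    rows = [
        ("ucs-2", 0x1122),
        ("ucs-2le", 0x1122),
        ("ucs-4", 0x11223344),
        ("ucs-4le", 0x11223344),
    ]
    for name, value in rows:
        print(f"{name:8} {code_unit_bytes(name, value).hex(' ')}")

    # Convert a single latin1 byte (e with acute accent) to UTF-8.
    print(convert(b"\xe9", "latin-1", "utf-8"))
```

The struct module is used for the table rows because the sample value
11223344 only illustrates byte layout; it is not a valid Unicode code point,
so it cannot be produced by encoding a character.
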
                                                        *mbyte-conversion*
When reading and writing files in an encoding different from "utf-8",
conversion needs to be done.  These conversions

This section lists the encoding names Vim supports: single-byte codepages (cp...), double-byte encodings (euc-*, sjis, big5), and Unicode encodings (utf-*, ucs-*), along with their aliases. It explains that a {name} is passed to iconv() to convert between UTF-8 and the encoding of the file, and that on MS-Windows "cp{number}" selects codepage {number}. It also lists the places where Vim converts automatically (reading and writing files, displaying messages, and reading Vim scripts) and notes that most of these require iconv, with 'charconvert' as an alternative for reading and writing files.