 view a file that way, if you have lots of time at hand.

	Note:
	Since 'encoding' is used for all text inside Vim, changing it makes
	all non-ASCII text invalid.  You will notice this when using registers
	and the |shada-file| (e.g., a remembered search pattern).  It's
	recommended to set 'encoding' in your vimrc file, and leave it alone.

==============================================================================
*45.4*	Editing files with a different encoding

Suppose you have set up Vim to use Unicode, and you want to edit a file that
is in 16-bit Unicode.  Sounds simple, right?  Well, Vim actually uses utf-8
encoding internally, so the 16-bit text must be converted: the character set
(Unicode) is the same, but the encoding (utf-8 versus 16-bit) is different.
   Vim will try to detect what kind of file you are editing.  It uses the
encoding names in the 'fileencodings' option.  When using Unicode, the default
value is: "ucs-bom,utf-8,latin1".  This means that Vim checks the file to see
if it's one of these encodings:

	ucs-bom		File must start with a Byte Order Mark (BOM).  This
			allows detection of 16-bit, 32-bit and utf-8 Unicode
			encodings.
	utf-8		utf-8 Unicode.  This is rejected when a sequence of
			bytes is illegal in utf-8.
	latin1		The good old 8-bit encoding.  Always works.

When you start editing that 16-bit Unicode file, and it has a BOM, Vim will
detect this and convert the file to utf-8 when reading it.  The 'fileencoding'
option (without an "s" at the end) is set to the detected value.  In this
case it is "utf-16le", meaning Unicode, 16-bit and little-endian.  This file
format is common on MS-Windows (e.g., for registry files).
   When writing the file, Vim will compare 'fileencoding' with 'encoding'.  If
they are different, the text will be converted.
   An empty value for 'fileencoding' means that no conversion is done: the
text is assumed to be encoded with 'encoding'.
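
A possible way to check what was detected and then convert the buffer to
utf-8 on the next write is shown here.  It assumes you want the utf-8 file
without a BOM: 'bomb' is set when a file that starts with a BOM is read, and
it would make Vim write a utf-8 BOM as well: >

	" Show the encoding that was detected for this buffer.
	:set fileencoding?
	" Assumption: the converted file should not start with a BOM.
	:set nobomb
	" Convert the text to utf-8 when the buffer is written.
	:set fileencoding=utf-8
	:write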

If the default 'fileencodings' value does not suit you, set it to the
encodings you want Vim to try.  Vim moves on to the next value only when the
current one is found to be invalid for the file.  Putting "latin1" first
doesn't work, because it is never illegal.  For example, to fall back to
Japanese (Shift-JIS) when the file has no BOM and isn't valid utf-8: >

	:set fileencodings=ucs-bom,utf-8,sjis

See |encoding-values| for suggested values.  Other values may work as well;
whether they do depends on which conversions are available.
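
To make such a setting permanent, put it in your vimrc file.  A sketch that
also keeps "latin1" as the final fallback, assuming the Shift-JIS conversion
on your system rejects bytes that are not valid Shift-JIS: >

	" Vim tries these left to right and uses the first that fits.
	" latin1 never fails, so it is only useful as the last entry.
	set fileencodings=ucs-bom,utf-8,sjis,latin1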


FORCING AN ENCODING

If the automatic detection doesn't work, you must tell Vim which encoding the
file uses.  Example: >

	:edit ++enc=koi8-r russian.txt

The "++enc" part specifies the name of the encoding to be used for this file
only.  Vim will convert the file from the specified encoding (KOI8-R, a
Russian encoding, in this example) to 'encoding'.  'fileencoding' will also be
set to the specified encoding, so that the reverse conversion can be done when
writing the file.
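
If the guess was wrong, you can re-read the current file with another
encoding by leaving out the file name.  In this sketch "cp1251" is only an
example of a second guess; use ":edit!" instead if you want to discard
unsaved changes: >

	" Re-read the current file as cp1251.
	:edit ++enc=cp1251
	" Confirm which encoding is now used for this buffer.
	:set fileencoding?
<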
   The same argument can be used when writing the file.  This way you can
actually use Vim to convert a file.  Example: >

	:write ++enc=utf-8 russian.txt
<
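
To keep the original file untouched, write the converted text to a new file
instead.  The name "russian-utf8.txt" is only an example: >

	" Read the KOI8-R file, then write a utf-8 copy next to it.
	:edit ++enc=koi8-r russian.txt
	:write ++enc=utf-8 russian-utf8.txt
<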
	Note:
	Conversion may result in lost characters.  Conversion from an encoding
	to Unicode and back is mostly free of this problem, unless there are
	illegal characters.  Conversion from Unicode to other encodings often
	loses information when the file contains text in more than one language.

==============================================================================
