4th chunk of `Documentation/gitattributes.adoc`

-------------------------------------------------
$ echo "* text=auto" >.gitattributes
$ git add --renormalize .
$ git status        # Show files that will be normalized
$ git commit -m "Introduce end-of-line normalization"
-------------------------------------------------

If any files that should not be normalized show up in 'git status',
unset their `text` attribute before running 'git add -u'.

------------------------
manual.pdf	-text
------------------------

Conversely, text files that Git does not detect can have normalization
enabled manually.

------------------------
weirdchars.txt	text
------------------------
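
To confirm that Git reads the two attribute lines above as intended, 'git check-attr' can be queried for each path. The following is a minimal sketch that runs in a throwaway repository (the temporary directory is an assumption of the example, not part of the workflow above); the file names are the ones used in the examples.

```shell
# Throwaway repository so the check does not touch a real project.
repo=$(mktemp -d)
cd "$repo" && git init -q

# The two attribute lines from the examples above.
printf 'manual.pdf\t-text\nweirdchars.txt\ttext\n' >.gitattributes

# check-attr evaluates .gitattributes; the files themselves need not exist.
pdf_attr=$(git check-attr text -- manual.pdf)
txt_attr=$(git check-attr text -- weirdchars.txt)

echo "$pdf_attr"    # manual.pdf: text: unset
echo "$txt_attr"    # weirdchars.txt: text: set
```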

If `core.safecrlf` is set to "true" or "warn", Git checks whether
the conversion is reversible for the current setting of
`core.autocrlf`.  For "true", Git rejects irreversible
conversions; for "warn", Git only prints a warning but accepts
an irreversible conversion.  The safety triggers to prevent such
a conversion from being applied to the files in the work tree, but
there are a few exceptions.  Even though...

- 'git add' itself does not touch the files in the work tree, the
  next checkout would, so the safety triggers;

- 'git apply' to update a text file with a patch does touch the files
  in the work tree, but the operation is about text files and CRLF
  conversion is about fixing the line ending inconsistencies, so the
  safety does not trigger;

- 'git diff' itself does not touch the files in the work tree, it is
  often run to inspect the changes you intend to 'git add' next.  To
  catch potential problems early, the safety triggers.
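
The difference between "true" and "warn" can be seen with 'git add' in a scratch repository. This is a sketch under assumed settings: `core.autocrlf` is set to "input" so that CRLF becomes LF on add and is never converted back, which makes the conversion irreversible. The file name `crlf.txt` is hypothetical.

```shell
# Throwaway repository for the experiment.
repo=$(mktemp -d)
cd "$repo" && git init -q

# Assumed settings for this sketch: convert CRLF to LF on add, never back.
git config core.autocrlf input
git config core.safecrlf true

# A file whose only line ends in CRLF (hypothetical name).
printf 'hello\r\n' >crlf.txt

# With core.safecrlf=true the add is expected to be rejected.
if git add crlf.txt 2>/dev/null; then
    true_result=accepted
else
    true_result=rejected
fi

# With "warn" the same add is expected to go through with only a warning.
git config core.safecrlf warn
if git add crlf.txt 2>/dev/null; then
    warn_result=accepted
else
    warn_result=rejected
fi

echo "true: $true_result, warn: $warn_result"
```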


`working-tree-encoding`
^^^^^^^^^^^^^^^^^^^^^^^

Git recognizes files encoded in ASCII or one of its supersets (e.g.
UTF-8, ISO-8859-1, ...) as text files. Files encoded in certain other
encodings (e.g. UTF-16) are interpreted as binary and consequently
built-in Git text processing tools (e.g. 'git diff') as well as most Git
web front ends do not visualize the contents of these files by default.

In these cases you can tell Git the encoding of a file in the working
directory with the `working-tree-encoding` attribute. If a file with this
attribute is added to Git, then Git re-encodes the content from the
specified encoding to UTF-8. Finally, Git stores the UTF-8 encoded
content in its internal data structure (called "the index"). On checkout
the content is re-encoded back to the specified encoding.
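
The re-encoding on add can be observed by comparing the staged blob with the file on disk. The sketch below assumes a Git version that supports `working-tree-encoding` and was built with iconv support; the file name `hi.ps1` and the temporary repository are hypothetical.

```shell
# Throwaway repository for the experiment.
repo=$(mktemp -d)
cd "$repo" && git init -q

# Declare PowerShell scripts as UTF-16 in the working tree.
echo '*.ps1 text working-tree-encoding=UTF-16' >.gitattributes

# Write "hi" plus a newline as UTF-16LE with a byte order mark (FF FE),
# using octal escapes so that no external converter is needed.
printf '\377\376h\000i\000\n\000' >hi.ps1

git add .gitattributes hi.ps1

# The staged blob is expected to hold UTF-8; the file on disk stays UTF-16.
staged=$(git show :hi.ps1)
staged_size=$(git cat-file -s :hi.ps1)
disk_size=$(($(wc -c <hi.ps1)))

echo "staged content: $staged"
echo "staged: $staged_size bytes, on disk: $disk_size bytes"
```

If the attribute takes effect, the staged size is smaller than the on-disk size, because each ASCII character needs one byte in UTF-8 and two in UTF-16.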

Please note that using the `working-tree-encoding` attribute may have a
number of pitfalls:

- Alternative Git implementations (e.g. JGit or libgit2) and older Git
  versions (as of March 2018) do not support the `working-tree-encoding`
  attribute. If you decide to use the `working-tree-encoding` attribute
  in your repository, then it is strongly recommended to ensure that all
  clients working with the repository support it.
+
For example, Microsoft Visual Studio resource files (`*.rc`) or
PowerShell script files (`*.ps1`) are sometimes encoded in UTF-16.
If you declare `*.ps1` files as UTF-16 and you add `foo.ps1` with
a `working-tree-encoding` enabled Git client, then `foo.ps1` will be
stored as UTF-8 internally. A client without `working-tree-encoding`
support will check out `foo.ps1` as a UTF-8 encoded file. This will
typically cause trouble for the users of this file.
+
If a Git client that does not support the `working-tree-encoding`
attribute adds a new file `bar.ps1`, then `bar.ps1` will be
stored "as-is" internally (in this example probably as UTF-16).
A client with `working-tree-encoding` support will interpret the
internal contents as UTF-8 and try to convert it to UTF-16 on checkout.
That operation will fail and cause an error.

- Reencoding content to non-UTF encodings can cause errors as the
  conversion might not be UTF-8 round trip safe. If you suspect that
  your encoding is not round trip safe, then add it to
  `core.checkRoundtripEncoding` to make Git check the round trip
  encoding (see linkgit:git-config[1]). SHIFT-JIS (Japanese character
  set) is known to have round trip

Title: Git Text File Handling and Encoding
Summary
Git handles text file line endings and encodings with attributes like 'text' and 'working-tree-encoding', allowing for normalization and conversion between different encodings, with considerations for safety triggers, compatibility, and potential pitfalls, such as round trip encoding safety.