Home Explore Blog CI



git

6th chunk of `Documentation/git-pack-objects.adoc`
3521383671f6d2490a6ad4be8dd694c3b1c1a41890b9ec840000000100000bd1
 magnitude
to the hash function. This version excels at distinguishing short paths
and finding renames across directories. However, the hash function depends
primarily on the final 16 bytes of the path. If there are many paths in
the repo that have the same final 16 bytes and differ only by parent
directory, then this name-hash may lead to too many collisions and cause
poor results. At the moment, this version is required when writing
reachability bitmap files with `--write-bitmap-index`.
+
The name hash version `2` has similar locality features as version `1`,
except it considers each path component separately and overlays the hashes
with a shift. This still prioritizes the final bytes of the path, but also
"salts" the lower bits of the hash using the parent directory names. This
method allows for some of the locality benefits of version `1` while
breaking most of the collisions from a similarly-named file appearing in
many different directories. At the moment, this version is not allowed
when writing reachability bitmap files with `--write-bitmap-index` and it
will be automatically changed to version `1`.


DELTA ISLANDS
-------------

When possible, `pack-objects` tries to reuse existing on-disk deltas to
avoid having to search for new ones on the fly. This is an important
optimization for serving fetches, because it means the server can avoid
inflating most objects at all and just send the bytes directly from
disk. This optimization can't work when an object is stored as a delta
against a base which the receiver does not have (and which we are not
already sending). In that case the server "breaks" the delta and has to
find a new one, which has a high CPU cost. Therefore it's important for
performance that the set of objects in on-disk delta relationships match
what a client would fetch.

In a normal repository, this tends to work automatically. The objects
are mostly reachable from the branches and tags, and that's what clients
fetch. Any deltas we find on the server are likely to be between objects
the client has or will have.

But in some repository setups, you may have several related but separate
groups of ref tips, with clients tending to fetch those groups
independently. For example, imagine that you are hosting several "forks"
of a repository in a single shared object store, and letting clients
view them as separate repositories through `GIT_NAMESPACE` or separate
repos using the alternates mechanism. A naive repack may find that the
optimal delta for an object is against a base that is only found in
another fork. But when a client fetches, they will not have the base
object, and we'll have to find a new delta on the fly.

A similar situation may exist if you have many refs outside of
`refs/heads/` and `refs/tags/` that point to related objects (e.g.,
`refs/pull` or `refs/changes` used by some hosting providers). By
default, clients fetch only heads and tags, and deltas against objects
found only in those other groups cannot be sent as-is.

Delta islands solve

Title: Git Delta Islands and Name Hash Versions
Summary
The git pack-objects command uses name hash versions to optimize delta compression, with version 1 prioritizing hash locality and version 2 reducing collisions by considering each path component separately, and also features delta islands, which help improve performance by matching on-disk delta relationships with client fetch patterns, especially in repository setups with separate groups of ref tips, such as hosting multiple forks in a single object store.