9th chunk of `Documentation/gitformat-pack.adoc`
 removed immediately, since doing so could race with
an incoming push which may reference an object which is about to be deleted.
Instead, those unreachable objects are stored as loose objects and stay that way
until they are older than the expiration window, at which point they are removed
by linkgit:git-prune[1].

Git must store these unreachable objects loose in order to keep track of their
per-object mtimes. If these unreachable objects were written into one big pack,
then either freshening that pack (because an object contained within it was
re-written) or creating a new pack of unreachable objects would cause the pack's
mtime to get updated, and the objects within it would never leave the expiration
window. Instead, objects are stored loose in order to keep track of the
individual object mtimes and avoid a situation where all cruft objects are
freshened at once.
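
The difference between a single pack-level mtime and per-object mtimes can be
illustrated with a small model. This is only a sketch, not how Git is
implemented: the `SinglePack` and `LooseStore` classes and the two-week
`GRACE_PERIOD` are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field

GRACE_PERIOD = 14 * 24 * 3600  # hypothetical two-week window, in seconds


@dataclass
class SinglePack:
    """All unreachable objects share one mtime: the pack file's."""
    objects: set
    mtime: int

    def freshen(self, oid, now):
        # Touching any one object can only update the pack file's mtime.
        if oid in self.objects:
            self.mtime = now

    def expired(self, now):
        # Either every object has expired or none has.
        return set(self.objects) if now - self.mtime > GRACE_PERIOD else set()


@dataclass
class LooseStore:
    """Each unreachable object is its own file with its own mtime."""
    mtimes: dict = field(default_factory=dict)

    def freshen(self, oid, now):
        if oid in self.mtimes:
            self.mtimes[oid] = now

    def expired(self, now):
        return {oid for oid, m in self.mtimes.items() if now - m > GRACE_PERIOD}


day = 24 * 3600
pack = SinglePack(objects={"a", "b", "c"}, mtime=0)
loose = LooseStore(mtimes={"a": 0, "b": 0, "c": 0})

# On day 13, object "a" is re-written, which freshens it.
pack.freshen("a", 13 * day)
loose.freshen("a", 13 * day)

# On day 15, "b" and "c" are past the window, but only the loose store can tell.
pack_expired = pack.expired(15 * day)
loose_expired = loose.expired(15 * day)
```

In this model the single pack reports nothing as expired, because freshening
"a" reset the clock for every object it contains, while the loose store still
expires "b" and "c".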

This can lead to undesirable situations when a repository contains many
unreachable objects which have not yet left the grace period. Having large
directories in the shards of `.git/objects` can lead to decreased performance in
the repository. Worse, given enough unreachable objects, this can lead to inode
starvation and degrade the performance of the whole system. And because those
objects can never be packed, these repositories often take up a large amount of
disk space: loose objects can only be zlib-compressed, not stored in delta
chains.
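
To make the cost concrete, the following sketch shows how a single loose object
maps to its own zlib-compressed file under one of the 256 shard directories. It
assumes a repository using the SHA-1 object format; `loose_object` is a
hypothetical helper written for this example, not part of Git.

```python
import hashlib
import zlib


def loose_object(content: bytes, kind: str = "blob"):
    """Return (object id, relative path, compressed bytes) for a loose object."""
    # A loose object is "<type> <size>\0<content>", hashed and then deflated.
    store = f"{kind} {len(content)}".encode() + b"\x00" + content
    oid = hashlib.sha1(store).hexdigest()
    # The first two hex digits select one of 256 shard directories.
    path = f"objects/{oid[:2]}/{oid[2:]}"
    return oid, path, zlib.compress(store)


# The empty blob, as a small example.
oid, path, data = loose_object(b"")
```

Every unreachable object kept loose costs one such file (and one inode), and
each file is compressed on its own, with no opportunity to delta against
similar objects.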

=== Cruft packs

A cruft pack eliminates the need for storing unreachable objects in a loose
state by including the per-object mtimes in a separate file alongside a single
pack containing all of the objects that would otherwise be stored loose.

A cruft pack is written by `git repack --cruft` when generating a new pack,
using linkgit:git-pack-objects[1]'s `--cruft` option. Note that `git repack --cruft`
is a classic all-into-one repack, meaning that everything in the resulting pack is
reachable, and everything else is unreachable. Once that pack is written, the
`--cruft` option instructs `git repack` to generate another pack containing only
objects not packed in the previous step (which equates to packing all
unreachable objects together). This progresses as follows:

  1. Enumerate every object, marking as a traversal tip any object which (a) is
     not contained in a kept-pack, and (b) has an mtime within the grace
     period.

  2. Perform a reachability traversal based on the tips gathered in the previous
     step, adding every object along the way to the pack.

  3. Write the pack out, along with a `.mtimes` file that records the per-object
     timestamps.
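
The three steps above can be sketched over a toy object graph. This is a
simplified model under stated assumptions, not Git's implementation: the `Obj`
type, the `plan_cruft_pack` function, and the `cutoff` parameter (the oldest
mtime still inside the grace period) are hypothetical, and the sketch assumes
that objects already stored in a kept pack are never copied into the cruft
pack.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    mtime: int                      # per-object modification time (epoch seconds)
    in_kept_pack: bool = False      # True if stored in a pack the repack keeps
    refs: list = field(default_factory=list)  # ids of objects this one points at


def plan_cruft_pack(objects, cutoff):
    """Return {oid: mtime} for the objects a cruft pack would hold."""
    # Step 1: tips are objects outside kept packs whose mtime is recent enough.
    tips = [oid for oid, o in objects.items()
            if not o.in_kept_pack and o.mtime >= cutoff]

    # Step 2: walk from the tips, collecting everything reached along the way.
    packed, stack = {}, list(tips)
    while stack:
        oid = stack.pop()
        obj = objects[oid]
        # Assumption: objects already in a kept pack are not copied again.
        if oid in packed or obj.in_kept_pack:
            continue
        packed[oid] = obj.mtime
        stack.extend(obj.refs)

    # Step 3: the pack and its .mtimes table cover the same set of objects.
    return packed


objects = {
    "commit_new": Obj(mtime=100, refs=["tree_old"]),
    "tree_old": Obj(mtime=10, refs=["blob_kept"]),
    "blob_kept": Obj(mtime=5, in_kept_pack=True),
    "blob_stale": Obj(mtime=10),
}
cruft = plan_cruft_pack(objects, cutoff=50)
```

Here `tree_old` is older than the cutoff but is still packed, because it is
reachable from the recent `commit_new`. `blob_stale` is old and unreferenced,
so it is left out, and `blob_kept` is skipped because a kept pack already holds
it.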

This mode is invoked internally by linkgit:git-repack[1] when instructed to
write a cruft pack. Crucially, the set of in-core kept packs is exactly the set
of packs which will not be deleted by the repack; in other words, they contain
all of the repository's reachable objects.

When a repository already has a cruft pack, `git repack --cruft` typically only
adds objects to it. An exception to this is when `git repack` is given the
`--cruft-expiration` option, which allows the generated cruft pack to omit
expired objects instead of waiting for linkgit:git-gc[1] to expire those objects
later on.
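
As a usage illustration, the two invocations differ only in the extra option.
The helper names below are hypothetical, and `2.weeks.ago` is just an example
value. The sketch adds `-d` so that packs made redundant by the repack are
removed.

```python
import subprocess


def cruft_repack_argv(expiration=None):
    """Build the argument list for a cruft repack."""
    argv = ["git", "repack", "--cruft", "-d"]
    if expiration is not None:
        # --cruft-expiration takes an approxidate such as "2.weeks.ago".
        argv.append(f"--cruft-expiration={expiration}")
    return argv


def run_cruft_repack(repo_path, expiration=None):
    # Runs git inside repo_path; raises CalledProcessError if git fails.
    return subprocess.run(cruft_repack_argv(expiration), cwd=repo_path, check=True)
```

Calling `run_cruft_repack(path)` only accumulates unreachable objects in the
cruft pack, whereas `run_cruft_repack(path, "2.weeks.ago")` also lets the new
cruft pack omit objects that have already expired.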

It is linkgit:git-gc[1] that is typically responsible for removing expired
unreachable objects.

=== Alternatives

Notable alternatives to this design include:

  - The location of the per-object mtime data.

On the location of mtime data, a new auxiliary file tied to the pack was chosen
to avoid complicating the `.idx` format. If the `.idx` format were ever to gain
support for optional chunks of data, it may make sense to consolidate the
`.mtimes` format into the `.idx` itself.
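
Keeping the data in its own file also keeps it simple to read. The sketch below
parses such a file, assuming the version 1 layout: a 12-byte header (the
signature `MTME`, a version, and a hash function identifier), a table of
4-byte network-order mtimes in pack index order, and a trailer holding the
pack checksum followed by a checksum of the file itself. The function names are
hypothetical, and `build_mtimes` exists only to produce test input.

```python
import hashlib
import struct

MTIMES_SIGNATURE = b"MTME"
HASH_SIZES = {1: 20, 2: 32}  # assumed identifiers: 1 = SHA-1, 2 = SHA-256


def parse_mtimes(data: bytes):
    """Return the list of per-object mtimes stored in a .mtimes file."""
    signature, version, hash_id = struct.unpack(">4sII", data[:12])
    if signature != MTIMES_SIGNATURE or version != 1:
        raise ValueError("not a version 1 .mtimes file")
    hash_size = HASH_SIZES[hash_id]
    # The trailer holds two checksums: the pack's, then this file's.
    table = data[12:len(data) - 2 * hash_size]
    if len(table) % 4:
        raise ValueError("truncated mtime table")
    # Entry i is the mtime of the i-th object in pack index order.
    return list(struct.unpack(f">{len(table) // 4}I", table))


def build_mtimes(mtimes, pack_checksum: bytes):
    """Serialize mtimes the same way, for round-trip testing (SHA-1 only)."""
    body = struct.pack(">4sII", MTIMES_SIGNATURE, 1, 1)
    body += struct.pack(f">{len(mtimes)}I", *mtimes)
    body += pack_checksum
    return body + hashlib.sha1(body).digest()


sample = build_mtimes([1700000000, 1700003600], pack_checksum=b"\x00" * 20)
parsed = parse_mtimes(sample)
```

The parser does not verify either checksum. A real reader would also want to
confirm that the pack checksum in the trailer matches the pack the file sits
alongside.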

GIT
---
Part of the linkgit:git[1] suite
