gitformat-pack(5)
=================
NAME
----
gitformat-pack - Git pack format
SYNOPSIS
--------
[verse]
$GIT_DIR/objects/pack/pack-*.{pack,idx}
$GIT_DIR/objects/pack/pack-*.rev
$GIT_DIR/objects/pack/pack-*.mtimes
$GIT_DIR/objects/pack/multi-pack-index
DESCRIPTION
-----------
The Git pack format is how Git stores most of its primary repository
data. Over the lifetime of a repository, loose objects (if any) and
smaller packs are consolidated into larger pack(s). See
linkgit:git-gc[1] and linkgit:git-pack-objects[1].
The pack format is also used over-the-wire, see
e.g. linkgit:gitprotocol-v2[5], as well as being a part of
other container formats in the case of linkgit:gitformat-bundle[5].
== Checksums and object IDs
In a repository using the traditional SHA-1, pack checksums, index checksums,
and object IDs (object names) mentioned below are all computed using SHA-1.
Similarly, in SHA-256 repositories, these values are computed using SHA-256.
== pack-*.pack files have the following format:
- A header appears at the beginning and consists of the following:
4-byte signature:
The signature is: {'P', 'A', 'C', 'K'}
4-byte version number (network byte order):
Git currently accepts version number 2 or 3 but
generates version 2 only.
4-byte number of objects contained in the pack (network byte order)
Observation: we cannot have more than 4G versions ;-) and
more than 4G objects in a pack.
- The header is followed by a number of object entries, each of
which looks like this:
(undeltified representation)
n-byte type and length (3-bit type, (n-1)*7+4-bit length)
compressed data
(deltified representation)
n-byte type and length (3-bit type, (n-1)*7+4-bit length)
base object name if OBJ_REF_DELTA or a negative relative
offset from the delta object's position in the pack if this
is an OBJ_OFS_DELTA object
compressed delta data
Observation: the length of each object is encoded in a variable
length format and is not constrained to 32-bit or anything.
- The trailer records a pack checksum of all of the above.
=== Object types
Valid object types are:
- OBJ_COMMIT (1)
- OBJ_TREE (2)
- OBJ_BLOB (3)
- OBJ_TAG (4)
- OBJ_OFS_DELTA (6)
- OBJ_REF_DELTA (7)
Type 5 is reserved for future expansion. Type 0 is invalid.
=== Size encoding
This document uses the following "size encoding" of non-negative
integers: From each byte, the seven least significant bits are
used to form the resulting integer. As long as the most significant
bit is 1, this process continues; the byte with MSB 0 provides the
last seven bits. The seven-bit chunks are concatenated. Later
values are more significant.
This size encoding should not be confused with the "offset encoding",
which is also used in this document.
=== Deltified representation
Conceptually there are only four object types: commit, tree, tag and
blob. However to save space, an object could be stored as a "delta" of
another "base" object. These representations are assigned new types
ofs-delta and ref-delta, which is only valid in a pack file.
Both ofs-delta and ref-delta store the "delta" to be applied to
another object (called 'base object') to reconstruct the object. The
difference between them is, ref-delta directly encodes base object
name. If the base object is in the same pack, ofs-delta encodes
the offset of the base object in the pack instead.
The base object could also be deltified if it's in the same pack.
Ref-delta can also refer to an object outside the pack (i.e. the
so-called "thin pack"). When stored on disk however, the pack should
be self contained to avoid cyclic dependency.
The delta data starts with the size of the base object and the
size of the object to be reconstructed. These sizes are
encoded using the size encoding from above. The remainder of
the delta data is a sequence of instructions to reconstruct the object
from the base object. If the base object is