1st chunk of `Documentation/gitformat-pack.adoc`
gitformat-pack(5)
=================

NAME
----
gitformat-pack - Git pack format


SYNOPSIS
--------
[verse]
$GIT_DIR/objects/pack/pack-*.{pack,idx}
$GIT_DIR/objects/pack/pack-*.rev
$GIT_DIR/objects/pack/pack-*.mtimes
$GIT_DIR/objects/pack/multi-pack-index

DESCRIPTION
-----------

The Git pack format is how Git stores most of its primary repository
data. Over the lifetime of a repository, loose objects (if any) and
smaller packs are consolidated into larger pack(s). See
linkgit:git-gc[1] and linkgit:git-pack-objects[1].

The pack format is also used over the wire (see e.g.
linkgit:gitprotocol-v2[5]) and is part of other container formats,
as in the case of linkgit:gitformat-bundle[5].

== Checksums and object IDs

In a repository using the traditional SHA-1, pack checksums, index checksums,
and object IDs (object names) mentioned below are all computed using SHA-1.
Similarly, in SHA-256 repositories, these values are computed using SHA-256.
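
To illustrate how the repository's hash algorithm determines the width of
these values, here is a minimal Python sketch. The `object_id` helper is a
hypothetical name. The "type, space, decimal size, NUL" preamble it hashes
comes from general Git knowledge and is not described in this section, so
treat it as an assumption.

```python
import hashlib


def object_id(obj_type, data, algo="sha1"):
    """Return the hex object ID of `data` stored as an object of `obj_type`.

    Assumption (not stated in this section): Git hashes the preamble
    "<type> <size>\\0" followed by the raw object content.
    """
    preamble = f"{obj_type} {len(data)}".encode("ascii") + b"\x00"
    return hashlib.new(algo, preamble + data).hexdigest()


# SHA-1 object IDs are 20 bytes (40 hex digits),
# SHA-256 object IDs are 32 bytes (64 hex digits).
sha1_id = object_id("blob", b"", "sha1")
sha256_id = object_id("blob", b"", "sha256")
```

The same choice of algorithm applies to the pack and index checksums, so a
reader of these files needs to know the repository's hash algorithm before
it can locate the trailer.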

== pack-*.pack files have the following format:

   - A header appears at the beginning and consists of the following:

     4-byte signature:
         The signature is: {'P', 'A', 'C', 'K'}

     4-byte version number (network byte order):
         Git currently accepts version number 2 or 3 but
         generates version 2 only.

     4-byte number of objects contained in the pack (network byte order)

     Observation: we cannot have more than 4G versions ;-) and
     more than 4G objects in a pack.

   - The header is followed by a number of object entries, each of
     which looks like this:

     (undeltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     compressed data

     (deltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     base object name if OBJ_REF_DELTA or a negative relative
         offset from the delta object's position in the pack if this
         is an OBJ_OFS_DELTA object
     compressed delta data

     Observation: the length of each object is encoded in a variable
     length format and is not constrained to 32-bit or anything.

   - The trailer records a pack checksum of all of the above.
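
The header and trailer layout described above can be sketched in Python as
follows. This is an illustration, not Git's implementation: the function
names are hypothetical, and `build_empty_pack` exists only so that the
example has a well-formed pack (zero objects) to work on.

```python
import hashlib
import struct


def build_empty_pack(version=2, algo="sha1"):
    """Build a pack with zero objects: 12-byte header plus checksum trailer."""
    # Signature, then version and object count in network byte order.
    header = b"PACK" + struct.pack(">II", version, 0)
    return header + hashlib.new(algo, header).digest()


def parse_pack_header(pack):
    """Return (version, object_count) from the first 12 bytes of a pack."""
    if len(pack) < 12:
        raise ValueError("pack is shorter than its 12-byte header")
    signature, version, count = struct.unpack(">4sII", pack[:12])
    if signature != b"PACK":
        raise ValueError("bad pack signature")
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    return version, count


def verify_trailer(pack, algo="sha1"):
    """Check that the trailer is the checksum of everything before it."""
    size = hashlib.new(algo).digest_size
    body, trailer = pack[:-size], pack[-size:]
    return hashlib.new(algo, body).digest() == trailer


pack = build_empty_pack()
version, count = parse_pack_header(pack)
trailer_ok = verify_trailer(pack)
```

Because both header integers are unsigned 32-bit values, `struct` format
`">II"` matches the limits noted in the observation above.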

=== Object types

Valid object types are:

- OBJ_COMMIT (1)
- OBJ_TREE (2)
- OBJ_BLOB (3)
- OBJ_TAG (4)
- OBJ_OFS_DELTA (6)
- OBJ_REF_DELTA (7)

Type 5 is reserved for future expansion. Type 0 is invalid.
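
As an illustration of the "n-byte type and length" field, the sketch below
decodes one entry header in Python. The bit layout it assumes (continuation
flag in the most significant bit of every byte, the type in the next three
bits of the first byte, the low four size bits in the rest of the first
byte, and seven more size bits per following byte) is consistent with the
"3-bit type, (n-1)*7+4-bit length" description, but it is not spelled out in
this section. `parse_entry_header` is a hypothetical name.

```python
OBJ_TYPES = {
    1: "OBJ_COMMIT",
    2: "OBJ_TREE",
    3: "OBJ_BLOB",
    4: "OBJ_TAG",
    6: "OBJ_OFS_DELTA",
    7: "OBJ_REF_DELTA",
}


def parse_entry_header(buf, pos=0):
    """Decode the type-and-length field starting at buf[pos].

    Returns (type_name, size, next_pos), where next_pos is the index of
    the first byte after the field.
    """
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07   # 3-bit type
    size = byte & 0x0F              # low 4 bits of the length
    shift = 4
    while byte & 0x80:              # MSB set: another length byte follows
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    if obj_type not in OBJ_TYPES:   # 0 is invalid, 5 is reserved
        raise ValueError(f"invalid object type {obj_type}")
    return OBJ_TYPES[obj_type], size, pos


# 0x95 = continuation bit, type 1, low size bits 5; 0x0A adds 10 << 4.
entry = parse_entry_header(bytes([0x95, 0x0A]))
```

Python integers are unbounded, so the decoded length is not limited to 32
bits, in line with the observation about variable-length sizes.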

=== Size encoding

This document uses the following "size encoding" of non-negative
integers: From each byte, the seven least significant bits are
used to form the resulting integer. As long as the most significant
bit is 1, this process continues; the byte with MSB 0 provides the
last seven bits.  The seven-bit chunks are concatenated. Later
values are more significant.

This size encoding should not be confused with the "offset encoding",
which is also used in this document.
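
The size encoding described above can be sketched in Python as follows.
`encode_size` and `decode_size` are hypothetical helper names used only for
illustration.

```python
def encode_size(value):
    """Encode a non-negative integer, least significant 7 bits first."""
    if value < 0:
        raise ValueError("size must be non-negative")
    out = bytearray()
    while True:
        chunk = value & 0x7F
        value >>= 7
        if value:
            out.append(chunk | 0x80)  # MSB 1: more bytes follow
        else:
            out.append(chunk)         # MSB 0: last byte
            return bytes(out)


def decode_size(buf, pos=0):
    """Decode one size at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # later bytes are more significant
        shift += 7
        if not byte & 0x80:
            return result, pos


# Two sizes stored back to back can be read by chaining the positions.
data = encode_size(300) + encode_size(5)
first, pos = decode_size(data)
second, pos = decode_size(data, pos)
```

Returning the next position alongside the value makes it straightforward to
read consecutive sizes from the same buffer.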

=== Deltified representation

Conceptually there are only four object types: commit, tree, tag and
blob. However, to save space, an object could be stored as a "delta" of
another "base" object. These representations are assigned new types
ofs-delta and ref-delta, which are only valid in a pack file.

Both ofs-delta and ref-delta store the "delta" to be applied to
another object (called 'base object') to reconstruct the object. The
difference between them is that ref-delta directly encodes the base
object name. If the base object is in the same pack, ofs-delta encodes
the offset of the base object in the pack instead.

The base object could also be deltified if it's in the same pack.
Ref-delta can also refer to an object outside the pack (i.e. the
so-called "thin pack"). When stored on disk, however, the pack should
be self-contained to avoid cyclic dependencies.

The delta data starts with the size of the base object and the
size of the object to be reconstructed. These sizes are
encoded using the size encoding from above.  The remainder of
the delta data is a sequence of instructions to reconstruct the object
from the base object. If the base object is
