1st chunk of `Documentation/gitformat-chunk.adoc`
gitformat-chunk(5)
==================

NAME
----
gitformat-chunk - Chunk-based file formats

SYNOPSIS
--------

Used by linkgit:gitformat-commit-graph[5] and the "MIDX" format (see
the pack format documentation in linkgit:gitformat-pack[5]).

DESCRIPTION
-----------

Some file formats in Git use a common concept of "chunks" to describe
sections of the file. This allows structured access to a large file by
scanning a small "table of contents" for the remaining data. This common
format is used by the `commit-graph` and `multi-pack-index` files. See
the `multi-pack-index` format in linkgit:gitformat-pack[5] and
the `commit-graph` format in linkgit:gitformat-commit-graph[5] for
how they use the chunks to describe structured data.

A chunk-based file format begins with some header information custom to
that format. That header should include enough information to identify
the file type, format version, and number of chunks in the file. From this
information, a reader of the file can determine the start of the
chunk-based region.
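
As an illustration, the sketch below reads a header and derives where the chunk-based region begins. It assumes the 8-byte commit-graph header layout described in linkgit:gitformat-commit-graph[5] (signature, version, hash version, number of chunks, number of base commit-graphs); the function name `read_commit_graph_header` is made up for this example and is not part of Git.

```python
import struct

# Assumed layout, following the commit-graph format: 4-byte signature,
# then four 1-byte fields (version, hash version, number of chunks,
# number of base commit-graphs).
HEADER = struct.Struct(">4sBBBB")

def read_commit_graph_header(data: bytes):
    """Return (version, num_chunks, toc_start) for a commit-graph header."""
    signature, version, _hash_version, num_chunks, _num_bases = \
        HEADER.unpack_from(data, 0)
    if signature != b"CGPH":
        raise ValueError("not a commit-graph file")
    # The chunk-based region begins immediately after the header.
    toc_start = HEADER.size
    return version, num_chunks, toc_start

# A hand-built header: version 1, hash version 1, 3 chunks, 0 base graphs.
sample = b"CGPH" + bytes([1, 1, 3, 0])
version, num_chunks, toc_start = read_commit_graph_header(sample)
```

Other chunk-based formats define their own headers, so the field layout here should not be reused for them without checking the relevant format documentation.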

The chunk-based region starts with a table of contents describing where
each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
where C is the number of chunks. Consider the following table:

  | Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
  |--------------------|------------------------|
  | ID[1]              | OFFSET[1]              |
  | ...                | ...                    |
  | ID[C]              | OFFSET[C]              |
  | 0x00000000         | OFFSET[C+1]            |

Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset.
Each integer is stored in network byte order (big-endian).
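
The row layout can be sketched with Python's `struct` module. This is a minimal illustration rather than Git's implementation: the header bytes, the chunk IDs `AAAA` and `BBBB`, and the helper name `parse_toc` are all invented for the example.

```python
import struct

# One row: 4-byte chunk ID followed by an 8-byte unsigned offset,
# both in network (big-endian) byte order.
ROW = struct.Struct(">4sQ")

def parse_toc(data: bytes, toc_start: int, num_chunks: int):
    """Return the (num_chunks + 1) rows as (chunk_id, offset) pairs."""
    rows = []
    for i in range(num_chunks + 1):
        chunk_id, offset = ROW.unpack_from(data, toc_start + i * ROW.size)
        rows.append((chunk_id, offset))
    return rows

# Hypothetical 8-byte header (not a real Git format) and a table for
# two chunks of 16 and 4 bytes, plus the terminating row.
header = b"DEMO\x01\x02\x00\x00"
toc_start = len(header)
first_chunk = toc_start + 3 * ROW.size  # chunk data follows the 3 rows
toc = (
    ROW.pack(b"AAAA", first_chunk)
    + ROW.pack(b"BBBB", first_chunk + 16)
    + ROW.pack(b"\x00\x00\x00\x00", first_chunk + 16 + 4)
)
rows = parse_toc(header + toc, toc_start, num_chunks=2)
```

Because the format string starts with `>`, `struct` applies no padding, so each row occupies exactly 12 bytes.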

The chunk identifier `ID[i]` is a label for the data stored within this
file from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the
size of the `i`th chunk is equal to the difference between `OFFSET[i+1]`
and `OFFSET[i]`. This requires that the chunk data appears contiguously
in the same order as the table of contents.
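
The size computation can be sketched as follows, assuming the table of contents has already been decoded into a list of `(chunk_id, offset)` pairs that ends with the terminating row. The helper name `chunk_extents` and the sample IDs are hypothetical.

```python
def chunk_extents(rows):
    """Map each chunk ID to (start, size) using consecutive rows.

    `rows` is a list of (chunk_id, offset) pairs whose last element is
    the terminating row.
    """
    extents = {}
    # Pair every row with the row after it; the terminating row only
    # ever appears as the "next" row, so it yields no chunk of its own.
    for (chunk_id, start), (_next_id, end) in zip(rows, rows[1:]):
        if end < start:
            raise ValueError(f"offsets out of order at chunk {chunk_id!r}")
        extents[chunk_id] = (start, end - start)
    return extents

rows = [(b"AAAA", 44), (b"BBBB", 60), (b"\x00\x00\x00\x00", 64)]
extents = chunk_extents(rows)
# extents == {b"AAAA": (44, 16), b"BBBB": (60, 4)}
```

The ordering check reflects the contiguity requirement: if a later row had a smaller offset than an earlier one, the subtraction would give a negative size.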

The final entry in the table of contents must have a chunk ID of four zero
bytes. This marks the end of the table of contents and provides the offset
for the end of the chunk-based data.

Note: The chunk-based format expects that the file contains _at least_ a
trailing hash after `OFFSET[C+1]`.
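
A reader might check both conditions before trusting the table. The sketch below is an assumption-laden illustration, not Git's own validation: `check_table_end` is a made-up name, and `hash_len` is supplied by the caller (for example 20 bytes for SHA-1 or 32 bytes for SHA-256).

```python
def check_table_end(rows, file_size, hash_len):
    """Validate the terminating row and the room left for a trailing hash.

    Returns the offset at which the chunk-based data ends.
    """
    last_id, end_of_chunks = rows[-1]
    if last_id != b"\x00\x00\x00\x00":
        raise ValueError("table of contents does not end with a zero chunk ID")
    # At least `hash_len` bytes must remain after the final offset.
    if file_size - end_of_chunks < hash_len:
        raise ValueError("no room for a trailing hash after the last chunk")
    return end_of_chunks

rows = [(b"AAAA", 44), (b"BBBB", 60), (b"\x00\x00\x00\x00", 64)]
end = check_table_end(rows, file_size=84, hash_len=20)
```

The comparison uses "at least" rather than equality because the format only guarantees a minimum of a trailing hash after the final offset.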

Functions for working with chunk-based file formats are declared in
`chunk-format.h`. Using these functions provides extra checks that assist
developers when creating new file formats.

Writing chunk-based file formats
--------------------------------

To write a chunk-based file format, create a `struct chunkfile` by
calling `init_chunkfile()` and passing it a `struct hashfile` pointer. The
caller is responsible for opening the `hashfile` and writing header
information so the file format is identifiable
