Home Explore Blog CI



git

6th chunk of `Documentation/gitformat-pack.adoc`
3990733c63d21984d296f3631a5ee996b667dc6eb7659d410000000100000fa1
	will not exist and offsets are stored as in IDX v1.
		If there is at least one offset value larger than 2^32-1, then
		the large offset chunk must exist, and offsets larger than
		2^31-1 must be stored in it instead. If the large offset chunk
		exists and the 31st bit is on, then removing that bit reveals
		the row in the large offsets containing the 8-byte offset of
		this object.

	[Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
	    8-byte offsets into large packfiles.

	[Optional] Bitmap pack order (ID: {'R', 'I', 'D', 'X'})
	    A list of MIDX positions (one per object in the MIDX, num_objects in
	    total, each a 4-byte unsigned integer in network byte order), sorted
	    according to their relative bitmap/pseudo-pack positions.

TRAILER:

	Index checksum of the above contents.

== multi-pack-index reverse indexes

Similar to the pack-based reverse index, the multi-pack index can also
be used to generate a reverse index.

Instead of mapping between offset, pack-, and index position, this
reverse index maps between an object's position within the MIDX, and
that object's position within a pseudo-pack that the MIDX describes
(i.e., the ith entry of the multi-pack reverse index holds the MIDX
position of ith object in pseudo-pack order).

To clarify the difference between these orderings, consider a multi-pack
reachability bitmap (which does not yet exist, but is what we are
building towards here). Each bit needs to correspond to an object in the
MIDX, and so we need an efficient mapping from bit position to MIDX
position.

One solution is to let bits occupy the same position in the oid-sorted
index stored by the MIDX. But because oids are effectively random, their
resulting reachability bitmaps would have no locality, and thus compress
poorly. (This is the reason that single-pack bitmaps use the pack
ordering, and not the .idx ordering, for the same purpose.)

So we'd like to define an ordering for the whole MIDX based around
pack ordering, which has far better locality (and thus compresses more
efficiently). We can think of a pseudo-pack created by the concatenation
of all of the packs in the MIDX. E.g., if we had a MIDX with three packs
(a, b, c), with 10, 15, and 20 objects respectively, we can imagine an
ordering of the objects like:

    |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|

where the ordering of the packs is defined by the MIDX's pack list,
and then the ordering of objects within each pack is the same as the
order in the actual packfile.

Given the list of packs and their counts of objects, you can
naïvely reconstruct that pseudo-pack ordering (e.g., the object at
position 27 must be (c,1) because packs "a" and "b" consumed 25 of the
slots). But there's a catch. Objects may be duplicated between packs, in
which case the MIDX only stores one pointer to the object (and thus we'd
want only one slot in the bitmap).

Callers could handle duplicates themselves by reading objects in order
of their bit-position, but that's linear in the number of objects, and
much too expensive for ordinary bitmap lookups. Building a reverse index
solves this, since it is the logical inverse of the index, and that
index has already removed duplicates. But, building a reverse index on
the fly can be expensive. Since we already have an on-disk format for
pack-based reverse indexes, let's reuse it for the MIDX's pseudo-pack,
too.

Objects from the MIDX are ordered as follows to string together the
pseudo-pack. Let `pack(o)` return the pack from which `o` was selected
by the MIDX, and define an ordering of packs based on their numeric ID
(as stored by the MIDX). Let `offset(o)` return the object offset of `o`
within `pack(o)`. Then, compare `o1` and `o2` as follows:

  - If one of `pack(o1)` and `pack(o2)` is preferred and the other
    is not, then the preferred one sorts first.
+
(This is a detail that allows the MIDX bitmap to determine which
pack should be used by the pack-reuse mechanism, since it can ask
the MIDX

Title: Multi-Pack-Index Reverse Indexes and Pseudo-Pack Ordering
Summary
The multi-pack-index reverse index maps an object's position within the MIDX to its position within a pseudo-pack, allowing for efficient mapping between object positions and pseudo-pack order, which is necessary for compressing reachability bitmaps, and is achieved by concatenating packs in the MIDX and ordering objects based on pack ID and offset.