that's linear in the number of objects, and
much too expensive for ordinary bitmap lookups. Building a reverse index
solves this, since it is the logical inverse of the index, and that
index has already removed duplicates. But, building a reverse index on
the fly can be expensive. Since we already have an on-disk format for
pack-based reverse indexes, let's reuse it for the MIDX's pseudo-pack,
too.
Objects from the MIDX are ordered as follows to string together the
pseudo-pack. Let `pack(o)` return the pack from which `o` was selected
by the MIDX, and define an ordering of packs based on their numeric ID
(as stored by the MIDX). Let `offset(o)` return the object offset of `o`
within `pack(o)`. Then, compare `o1` and `o2` as follows:
- If one of `pack(o1)` and `pack(o2)` is preferred and the other
is not, then the preferred one sorts first.
+
(This is a detail that allows the MIDX bitmap to determine which
pack should be used by the pack-reuse mechanism, since it can ask
the MIDX for the pack containing the object at bit position 0).
- If `pack(o1) ≠ pack(o2)`, then sort the two objects in descending
order based on the pack ID.
- Otherwise, `pack(o1) = pack(o2)`, and the objects are sorted in
pack-order (i.e., `o1` sorts ahead of `o2` exactly when `offset(o1)
< offset(o2)`).
In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
objects in packs stored by the MIDX, laid out in pack order, and the
packs arranged in MIDX order (with the preferred pack coming first).
The MIDX's reverse index is stored in the optional 'RIDX' chunk within
the MIDX itself.
=== `BTMP` chunk
The Bitmapped Packfiles (`BTMP`) chunk encodes additional information
about the objects in the multi-pack index's reachability bitmap. Recall
that objects from the MIDX are arranged in "pseudo-pack" order (see
above) for reachability bitmaps.
From the example above, suppose we have packs "a", "b", and "c", with
10, 15, and 20 objects, respectively. In pseudo-pack order, those would
be arranged as follows:
|a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
When working with single-pack bitmaps (or, equivalently, multi-pack
reachability bitmaps with a preferred pack), linkgit:git-pack-objects[1]
performs ``verbatim'' reuse, attempting to reuse chunks of the bitmapped
or preferred packfile instead of adding objects to the packing list.
When a chunk of bytes is reused from an existing pack, any objects
contained therein do not need to be added to the packing list, saving
memory and CPU time. But a chunk from an existing packfile can only be
reused when the following conditions are met:
- The chunk contains only objects which were requested by the caller
(i.e. does not contain any objects which the caller didn't ask for
explicitly or implicitly).
- All objects stored in non-thin packs as offset- or reference-deltas
also include their base object in the resulting pack.
The `BTMP` chunk encodes the necessary information in order to implement
multi-pack reuse over a set of packfiles as described above.
Specifically, the `BTMP` chunk encodes three pieces of information (all
32-bit unsigned integers in network byte-order) for each packfile `p`
that is stored in the MIDX, as follows:
`bitmap_pos`:: The first bit position (in pseudo-pack order) in the
multi-pack index's reachability bitmap occupied by an object from `p`.
`bitmap_nr`:: The number of bit positions (including the one at
`bitmap_pos`) that encode objects from that pack `p`.
For example, the `BTMP` chunk corresponding to the above example (with
packs ``a'', ``b'', and ``c'') would look like:
[cols="1,2,2"]
|===
| |`bitmap_pos` |`bitmap_nr`
|packfile ``a''
|`0`
|`10`
|packfile ``b''
|`10`
|`15`
|packfile ``c''
|`25`
|`20`
|===
With this information in place, we can treat each packfile as
individually reusable in the same fashion as verbatim pack reuse is
performed on individual packs prior to the implementation of the `BTMP`
chunk.