Since repacking is safe to run alongside readers and writers,
run the repack in the background and let it finish when it finishes.
There is no reason to wait to explore your new Git project!

If you choose to wait for the repack, don't try to run benchmarks
or performance tests until repacking is completed.  fast-import outputs
suboptimal packfiles that are simply never seen in real use
situations.

Repacking Historical Data
~~~~~~~~~~~~~~~~~~~~~~~~~
If you are repacking very old imported data (e.g. older than the
last year), consider expending some extra CPU time and supplying
--window=50 (or higher) when you run 'git repack'.
This will take longer, but will also produce a smaller packfile.
You only need to expend the effort once, and everyone using your
project will benefit from the smaller repository.
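
A typical invocation might look like the following; note that `-f`
(discussed below under PACKFILE OPTIMIZATION) forces deltas to be
recomputed, so the larger window actually applies to the
already-packed objects:

----
$ git repack -a -d -f --window=50
----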

Include Some Progress Messages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Every once in a while have your frontend emit a `progress` message
to fast-import.  The contents of the messages are entirely free-form,
so one suggestion would be to output the current month and year
each time the current commit date moves into the next month.
Your users will feel better knowing how much of the data stream
has been processed.
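
For example, a frontend that tracks the month of the commit currently
being written might emit a line such as the following (the message
text after `progress` is arbitrary):

----
progress Importing March 2005
----

fast-import echoes the entire `progress` line to its standard output,
so anyone watching the import sees it immediately.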


PACKFILE OPTIMIZATION
---------------------
When packing a blob, fast-import always attempts to deltify against the last
blob written.  Unless specifically arranged for by the frontend,
this will probably not be a prior version of the same file, so the
generated delta will not be the smallest possible.  The resulting
packfile will be compressed, but will not be optimal.

Frontends which have efficient access to all revisions of a
single file (for example reading an RCS/CVS ,v file) can choose
to supply all revisions of that file as a sequence of consecutive
`blob` commands.  This allows fast-import to deltify the different file
revisions against each other, saving space in the final packfile.
Marks can be used to later identify individual file revisions during
a sequence of `commit` commands.
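
Below is a minimal sketch of this layout (the file name and committer
identity are illustrative): both revisions of `foo.c` are supplied as
consecutive `blob` commands, each labeled with a mark, and the
subsequent `commit` commands reference the marked blobs:

----
blob
mark :1
data 12
version one

blob
mark :2
data 12
version two

commit refs/heads/master
committer C O Mitter <committer@example.com> 1112912893 -0400
data 10
import v1
M 100644 :1 foo.c

commit refs/heads/master
committer C O Mitter <committer@example.com> 1112915893 -0400
data 10
import v2
M 100644 :2 foo.c
----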

The packfile(s) created by fast-import do not encourage good disk access
patterns.  This is caused by fast-import writing the data in the order
it is received on standard input, while Git typically organizes
data within packfiles to make the most recent (current tip) data
appear before historical data.  Git also clusters commits together,
speeding up revision traversal through better cache locality.

For this reason it is strongly recommended that users repack the
repository with `git repack -a -d` after fast-import completes, allowing
Git to reorganize the packfiles for faster data access.  If blob
deltas are suboptimal (see above) then also adding the `-f` option
to force recomputation of all deltas can significantly reduce the
final packfile size (30-50% smaller can be quite typical).
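
For instance, either of the following could be run after an import,
depending on whether repack time or final pack size matters more:

----
$ git repack -a -d       # reorganize packs for faster access
$ git repack -a -d -f    # additionally recompute all deltas
----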

Instead of running `git repack` you can also run `git gc
--aggressive`, which will also optimize other things after an import
(e.g. pack loose refs). As noted in the "AGGRESSIVE" section in
linkgit:git-gc[1], the `--aggressive` option will find new deltas with
the `-f` option to linkgit:git-repack[1]. For the reasons elaborated
on above, using `--aggressive` after a fast-import is one of the few
cases where it's known to be worthwhile.
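
In other words, the following is a reasonable single-command
alternative to the `git repack` invocations shown above:

----
$ git gc --aggressive
----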

MEMORY UTILIZATION
------------------
There are a number of factors which affect how much memory fast-import
requires to perform an import.  Like critical sections of core
Git, fast-import uses its own memory allocators to amortize any overheads
associated with malloc.  In practice fast-import tends to amortize any
malloc overheads to 0, due to its use of large block allocations.

per object
~~~~~~~~~~
fast-import maintains an in-memory structure for every object written
in this execution.
