Home Explore Blog CI



git

21th chunk of `Documentation/git-fast-import.adoc`
62d1c20ec4221e60a1d1706a7ffb58bcd030443e1283e94e0000000100000d00
 approximately 64 MiB of memory.

The object table is actually a hashtable keyed on the object name
(the unique SHA-1).  This storage configuration allows fast-import to reuse
an existing or already written object and avoid writing duplicates
to the output packfile.  Duplicate blobs are surprisingly common
in an import, typically due to branch merges in the source.

per mark
~~~~~~~~
Marks are stored in a sparse array, using 1 pointer (4 bytes or 8
bytes, depending on pointer size) per mark.  Although the array
is sparse, frontends are still strongly encouraged to use marks
between 1 and n, where n is the total number of marks required for
this import.

per branch
~~~~~~~~~~
Branches are classified as active and inactive.  The memory usage
of the two classes is significantly different.

Inactive branches are stored in a structure which uses 96 or 120
bytes (32 bit or 64 bit systems, respectively), plus the length of
the branch name (typically under 200 bytes), per branch.  fast-import will
easily handle as many as 10,000 inactive branches in under 2 MiB
of memory.

Active branches have the same overhead as inactive branches, but
also contain copies of every tree that has been recently modified on
that branch.  If subtree `include` has not been modified since the
branch became active, its contents will not be loaded into memory,
but if subtree `src` has been modified by a commit since the branch
became active, then its contents will be loaded in memory.

As active branches store metadata about the files contained on that
branch, their in-memory storage size can grow to a considerable size
(see below).

fast-import automatically moves active branches to inactive status based on
a simple least-recently-used algorithm.  The LRU chain is updated on
each `commit` command.  The maximum number of active branches can be
increased or decreased on the command line with --active-branches=.

per active tree
~~~~~~~~~~~~~~~
Trees (aka directories) use just 12 bytes of memory on top of the
memory required for their entries (see ``per active file'' below).
The cost of a tree is virtually 0, as its overhead amortizes out
over the individual file entries.

per active file entry
~~~~~~~~~~~~~~~~~~~~~
Files (and pointers to subtrees) within active trees require 52 or 64
bytes (32/64 bit platforms) per entry.  To conserve space, file and
tree names are pooled in a common string table, allowing the filename
``Makefile'' to use just 16 bytes (after including the string header
overhead) no matter how many times it occurs within the project.

The active branch LRU, when coupled with the filename string pool
and lazy loading of subtrees, allows fast-import to efficiently import
projects with 2,000+ branches and 45,114+ files in a very limited
memory footprint (less than 2.7 MiB per active branch).

SIGNALS
-------
Sending *SIGUSR1* to the 'git fast-import' process ends the current
packfile early, simulating a `checkpoint` command.  The impatient
operator can use this facility to peek at the objects and refs from an
import in progress, at the cost of some added running time and worse
compression.

CONFIGURATION
-------------

include::includes/cmd-config-section-all.adoc[]

include::config/fastimport.adoc[]

SEE ALSO
--------
linkgit:git-fast-export[1]

GIT
---
Part of the linkgit:git[1] suite

Title: Git Fast Import Performance and Configuration
Summary
This section explains how Git fast-import manages memory for objects, marks, branches, and files, including the use of sparse arrays, hashtables, and string pooling, and discusses configuration options and signals that can be used to control the import process, including checkpointing and ending the current packfile.