Git Fast Import Performance and Configuration

approximately 64 MiB of memory.

The object table is actually a hashtable keyed on the object name
(the unique SHA-1). This storage configuration allows fast-import to reuse
an existing or already written object and avoid writing duplicates
to the output packfile. Duplicate blobs are surprisingly common
in an import, typically due to branch merges in the source.

per mark
~~~~~~~~
Marks are stored in a sparse array, using 1 pointer (4 bytes or 8
bytes, depending on pointer size) per mark. Although the array
is sparse, frontends are still strongly encouraged to use marks
between 1 and n, where n is the total number of marks required for
this import.

per branch
~~~~~~~~~~
Branches are classified as active and inactive. The memory usage
of the two classes is significantly different.

Inactive branches are stored in a structure which uses 96 or 120
bytes (32 bit or 64 bit systems, respectively), plus the length of
the branch name (typically under 200 bytes), per branch. fast-import will
easily handle as many as 10,000 inactive branches in under 2 MiB
of memory.

Active branches have the same overhead as inactive branches, but
also contain copies of every tree that has been recently modified on
that branch. If subtree `include` has not been modified since the
branch became active, its contents will not be loaded into memory,
but if subtree `src` has been modified by a commit since the branch
became active, then its contents will be loaded in memory.

As active branches store metadata about the files contained on that
branch, their in-memory storage size can grow to a considerable size
(see below).

fast-import automatically moves active branches to inactive status based on
a simple least-recently-used algorithm. The LRU chain is updated on
each `commit` command. The maximum number of active branches can be
increased or decreased on the command line with --active-branches=.

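The active/inactive split described above can be pictured with a small toy model. The following Python sketch is illustrative only: fast-import itself is written in C and its internal data structures differ, and the class name `ActiveBranchCache` and its methods are hypothetical. It assumes that each `commit` marks its branch as most recently used, that modified subtrees are loaded only for active branches, and that the least recently used branch is demoted once the limit is exceeded.

```python
from collections import OrderedDict


class ActiveBranchCache:
    """Toy model of LRU demotion of active branches (hypothetical, not git's code)."""

    def __init__(self, max_active):
        self.max_active = max_active
        # Maps branch name -> set of subtrees loaded since the branch became active.
        # Insertion order doubles as the LRU chain: oldest entry first.
        self.active = OrderedDict()
        self.inactive = set()

    def commit(self, branch, modified_subtrees):
        # Remove the branch from wherever it currently lives.
        loaded = self.active.pop(branch, None)
        if loaded is None:
            # The branch was inactive or unknown: it starts with nothing loaded.
            self.inactive.discard(branch)
            loaded = set()
        # Only subtrees touched by a commit are loaded (lazy loading).
        loaded.update(modified_subtrees)
        # Re-inserting places the branch at the most-recently-used end.
        self.active[branch] = loaded
        # Demote least recently used branches, dropping their loaded trees.
        while len(self.active) > self.max_active:
            oldest, _ = self.active.popitem(last=False)
            self.inactive.add(oldest)


if __name__ == "__main__":
    cache = ActiveBranchCache(max_active=2)
    cache.commit("refs/heads/a", ["src"])
    cache.commit("refs/heads/b", ["src"])
    cache.commit("refs/heads/a", ["include"])  # refreshes branch a
    cache.commit("refs/heads/c", ["doc"])      # demotes branch b
    print(list(cache.active), sorted(cache.inactive))
```

In this model a frontend that interleaves commits across more branches than the limit would keep demoting and re-activating branches, which is the situation the `--active-branches=` option is meant to tune.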
per active tree
~~~~~~~~~~~~~~~
Trees (aka directories) use just 12 bytes of memory on top of the
memory required for their entries (see ``per active file'' below).
The cost of a tree is virtually 0, as its overhead amortizes out
over the individual file entries.

per active file entry
~~~~~~~~~~~~~~~~~~~~~
Files (and pointers to subtrees) within active trees require 52 or 64
bytes (32/64 bit platforms) per entry. To conserve space, file and
tree names are pooled in a common string table, allowing the filename
``Makefile'' to use just 16 bytes (after including the string header
overhead) no matter how many times it occurs within the project.

The active branch LRU, when coupled with the filename string pool
and lazy loading of subtrees, allows fast-import to efficiently import
projects with 2,000+ branches and 45,114+ files in a very limited
memory footprint (less than 2.7 MiB per active branch).

SIGNALS
-------
Sending *SIGUSR1* to the 'git fast-import' process ends the current
packfile early, simulating a `checkpoint` command. The impatient
operator can use this facility to peek at the objects and refs from an
import in progress, at the cost of some added running time and worse
compression.

CONFIGURATION
-------------

include::includes/cmd-config-section-all.adoc[]

include::config/fastimport.adoc[]

SEE ALSO
--------
linkgit:git-fast-export[1]

GIT
---
Part of the linkgit:git[1] suite

This section explains how Git fast-import manages memory for objects, marks, branches, and files, including the use of sparse arrays, hashtables, and string pooling, and discusses configuration options and signals that can be used to control the import process, including checkpointing and ending the current packfile.
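
The per-item figures quoted above can be combined into a rough back-of-the-envelope estimate. The sketch below is a hypothetical helper, not part of Git: the function name `estimate_bytes` and its parameters are assumptions made for illustration. It uses only the byte counts stated in this section, assumes marks are numbered densely from 1 to n (so one pointer per mark), and deliberately leaves out the object table, the filename string pool, and allocator overhead.

```python
# Per-item sizes in bytes as quoted in the text, keyed by pointer width in bits.
SIZES = {
    32: {"mark": 4, "inactive_branch": 96, "tree": 12, "file_entry": 52},
    64: {"mark": 8, "inactive_branch": 120, "tree": 12, "file_entry": 64},
}


def estimate_bytes(pointer_bits, marks=0, inactive_branches=0,
                   avg_branch_name_len=0, active_trees=0,
                   active_file_entries=0):
    """Rough lower-bound estimate; ignores objects, string pool, and malloc overhead."""
    s = SIZES[pointer_bits]
    return (
        # One pointer per mark, assuming marks run densely from 1 to n.
        marks * s["mark"]
        # Fixed branch structure plus the branch name itself.
        + inactive_branches * (s["inactive_branch"] + avg_branch_name_len)
        # Small fixed cost per active tree.
        + active_trees * s["tree"]
        # File entries and subtree pointers inside active trees.
        + active_file_entries * s["file_entry"]
    )


if __name__ == "__main__":
    total = estimate_bytes(64, inactive_branches=10_000, avg_branch_name_len=60)
    print(f"{total} bytes = {total / (1024 * 1024):.2f} MiB")
```

For example, 10,000 inactive branches with an assumed average name length of 60 bytes on a 64 bit system come to 1,800,000 bytes, or about 1.7 MiB, which is in line with the "under 2 MiB" figure given for inactive branches.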