Tips and Optimization for Git Fast Import

line. fast-import will dump a file which lists every mark and the Git object SHA-1 that corresponds to it. If the frontend can tie the marks back to the source repository, it is easy to verify the accuracy and completeness of the import by comparing each Git commit to the corresponding source revision. Coming from a system such as Perforce or Subversion, this should be quite simple, as the fast-import mark can also be the Perforce changeset number or the Subversion revision number. Freely Skip Around Branches ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Don't bother trying to optimize the frontend to stick to one branch at a time during an import. Although doing so might be slightly faster for fast-import, it tends to increase the complexity of the frontend code considerably. The branch LRU builtin to fast-import tends to behave very well, and the cost of activating an inactive branch is so low that bouncing around between branches has virtually no impact on import performance. Handling Renames ~~~~~~~~~~~~~~~~ When importing a renamed file or directory, simply delete the old name(s) and modify the new name(s) during the corresponding commit. Git performs rename detection after-the-fact, rather than explicitly during a commit. Use Tag Fixup Branches ~~~~~~~~~~~~~~~~~~~~~~ Some other SCM systems let the user create a tag from multiple files which are not from the same commit/changeset. Or to create tags which are a subset of the files available in the repository. Importing these tags as-is in Git is impossible without making at least one commit which ``fixes up'' the files to match the content of the tag. Use fast-import's `reset` command to reset a dummy branch outside of your normal branch space to the base commit for the tag, then commit one or more file fixup commits, and finally tag the dummy branch. For example since all normal branches are stored under `refs/heads/` name the tag fixup branch `TAG_FIXUP`. This way it is impossible for the fixup branch used by the importer to have namespace conflicts with real branches imported from the source (the name `TAG_FIXUP` is not `refs/heads/TAG_FIXUP`). When committing fixups, consider using `merge` to connect the commit(s) which are supplying file revisions to the fixup branch. Doing so will allow tools such as 'git blame' to track through the real commit history and properly annotate the source files. After fast-import terminates the frontend will need to do `rm .git/TAG_FIXUP` to remove the dummy branch. Import Now, Repack Later ~~~~~~~~~~~~~~~~~~~~~~~~ As soon as fast-import completes the Git repository is completely valid and ready for use. Typically this takes only a very short time, even for considerably large projects (100,000+ commits). However repacking the repository is necessary to improve data locality and access performance. It can also take hours on extremely large projects (especially if -f and a large --window parameter is used). Since repacking is safe to run alongside readers and writers, run the repack in the background and let it finish when it finishes. There is no reason to wait to explore your new Git project! If you choose to wait for the repack, don't try to run benchmarks or performance tests until repacking is completed. fast-import outputs suboptimal packfiles that are simply never seen in real use situations. Repacking Historical Data ~~~~~~~~~~~~~~~~~~~~~~~~~ If you are repacking very old imported data (e.g. older than the last year), consider expending some extra CPU time and supplying --window=50 (or higher) when you run 'git repack'. This will take longer, but will also produce a smaller packfile. You only need to expend the effort once, and everyone using your project will benefit from the smaller repository. Include Some Progress Messages ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Every once in a while have your frontend emit a `progress` message to fast-import. The contents of the messages are entirely free-form, so one suggestion would be to output

This section provides various tips and tricks for optimizing and improving the Git fast-import process, including handling renames, using tag fixup branches, importing tags, repacking repositories, and including progress messages to enhance the import experience and improve performance.