Home Explore Blog CI



git

9th chunk of `Documentation/MyFirstObjectWalk.adoc`
5df7832d9ad4dee783c500270455344b899854933d371c390000000100000a55

	if (0) {
		/* Unfiltered: */
		trace_printf(_("Unfiltered object walk.\n"));
	} else {
		trace_printf(
			_("Filtered object walk with filterspec 'tree:1'.\n"));

		parse_list_objects_filter(&rev->filter, "tree:1");
	}
	traverse_commit_list(rev, walken_show_commit,
			     walken_show_object, NULL);
----

The `rev->filter` member is usually built directly from a command
line argument, so the module provides an easy way to build one from a string.
Even though we aren't taking user input right now, we can still build one with
a hardcoded string using `parse_list_objects_filter()`.

With the filter spec "tree:1", we are expecting to see _only_ the root tree for
each commit; therefore, the tree object count should be less than or equal to
the number of commits. (For an example of why that's true: `git commit --revert`
points to the same tree object as its grandparent.)

=== Counting Omitted Objects

We also have the capability to enumerate all objects which were omitted by a
filter, like with `git log --filter=<spec> --filter-print-omitted`. To do this,
change `traverse_commit_list()` to `traverse_commit_list_filtered()`, which is
able to populate an `omitted` list.  Asking for this list of filtered objects
may cause performance degradations, however, because in this case, despite
filtering objects, the possibly much larger set of all reachable objects must
be processed in order to populate that list.

First, add the `struct oidset` and related items we will use to iterate it:

----
#include "oidset.h"

...

static void walken_object_walk(
	...

	struct oidset omitted;
	struct oidset_iter oit;
	struct object_id *oid = NULL;
	int omitted_count = 0;
	oidset_init(&omitted, 0);

	...
----

Replace the call to `traverse_commit_list()` with
`traverse_commit_list_filtered()` and pass a pointer to the `omitted` oidset
defined and initialized above:

----
	...

		traverse_commit_list_filtered(rev,
			walken_show_commit, walken_show_object, NULL, &omitted);

	...
----

Then, after your traversal, the `oidset` traversal is pretty straightforward.
Count all the objects within and modify the print statement:

----
	/* Count the omitted objects. */
	oidset_iter_init(&omitted, &oit);

	while ((oid = oidset_iter_next(&oit)))
		omitted_count++;

	printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
		commit_count, blob_count, tag_count, tree_count, omitted_count);
----

By running your walk with and without the filter, you should find that the total
object count in each case is identical. You can also time each invocation of
the `walken` subcommand, with and without `omitted` being passed in, to confirm
to

Title: Counting Omitted Objects in the Object Walk
Summary
This section explains how to modify the object walk to count and enumerate objects omitted by a filter, using `traverse_commit_list_filtered()` and an `oidset` to store the omitted objects, and how to measure the performance impact of this modification by running the walk with and without the filter.