# File set library
This is the internal contributor documentation.
The user documentation is [in the Nixpkgs manual](https://nixos.org/manual/nixpkgs/unstable/#sec-fileset).
## Goals
The main goal of the file set library is to be able to select local files that should be added to the Nix store.
It should have the following properties:
- Easy:
The functions should have obvious semantics, be low in number and be composable.
- Safe:
Throw early and helpful errors when mistakes are detected.
- Lazy:
Only compute values when necessary.
Non-goals are:
- Efficient:
If the abstraction proves itself worthwhile but too slow, it can be still be optimized further.
## Tests
Tests are declared in [`tests.sh`](./tests.sh) and can be run using
```
./tests.sh
```
## Benchmark
A simple benchmark against the HEAD commit can be run using
```
./benchmark.sh HEAD
```
This is intended to be run manually and is not checked by CI.
## Internal representation
The internal representation is versioned in order to allow file sets from different Nixpkgs versions to be composed with each other, see [`internal.nix`](./internal.nix) for the versions and conversions between them.
This section describes only the current representation, but past versions will have to be supported by the code.
### `fileset`
An attribute set with these values:
- `_type` (constant string `"fileset"`):
Tag to indicate this value is a file set.
- `_internalVersion` (constant `3`, the current version):
Version of the representation.
- `_internalIsEmptyWithoutBase` (bool):
Whether this file set is the empty file set without a base path.
If `true`, `_internalBase*` and `_internalTree` are not set.
This is the only way to represent an empty file set without needing a base path.
Such a value can be used as the identity element for `union` and the return value of `unions []` and co.
- `_internalBase` (path):
Any files outside of this path cannot influence the set of files.
This is always a directory and should be as long as possible.
This is used by `lib.fileset.toSource` to check that all files are under the `root` argument
- `_internalBaseRoot` (path):
The filesystem root of `_internalBase`, same as `(lib.path.splitRoot _internalBase).root`.
This is here because this needs to be computed anyway, and this computation shouldn't be duplicated.
- `_internalBaseComponents` (list of strings):
The path components of `_internalBase`, same as `lib.path.subpath.components (lib.path.splitRoot _internalBase).subpath`.
This is here because this needs to be computed anyway, and this computation shouldn't be duplicated.
- `_internalTree` ([filesetTree](#filesettree)):
A tree representation of all included files under `_internalBase`.
- `__noEval` (error):
An error indicating that directly evaluating file sets is not supported.
## `filesetTree`
One of the following:
- `{ <name> = filesetTree; }`:
A directory with a nested `filesetTree` value for directory entries.
Entries not included may either be omitted or set to `null`, as necessary to improve efficiency or laziness.
- `"directory"`:
A directory with all its files included recursively, allowing early cutoff for some operations.
This specific string is chosen to be compatible with `builtins.readDir` for a simpler implementation.
- `"regular"`, `"symlink"`, `"unknown"` or any other non-`"directory"` string:
A nested file with its file type.
These specific strings are chosen to be compatible with `builtins.readDir` for a simpler implementation.
Distinguishing between different file types is not strictly necessary for the functionality this library,
but it does allow nicer printing of file sets.
- `null`:
A file or directory that is excluded from the tree.
It may still exist on the file system.
## API design decisions
This section justifies API design decisions.
### Internal structure
The representation of the file set data type is internal and can be changed over time.