Advanced Nix Testing: Troubleshooting, Characterisation, and Integration

```

One can debug the Nix invocation in all the usual ways.
For example, enter `run` to start the Nix invocation.

### Troubleshooting

Sometimes running tests in the development shell may leave artefacts in the local repository.
To remove any traces of that:

```console
git clean -x --force tests
```

### Characterisation testing { #characterisation-testing-functional }

Occasionally, Nix uses a technique called [Characterisation Testing](https://en.wikipedia.org/wiki/Characterization_test) as part of the functional tests.
This technique records the exact output or behavior of a former version of Nix in a test, in order to check that Nix continues to produce the same behavior going forward.

For example, this technique is used for the language tests, to check both the printed final value if evaluation was successful, and any errors and warnings encountered.

It is frequently useful to regenerate the expected output.
To do that, rerun the failed test(s) with `_NIX_TEST_ACCEPT=1`.
For example:

```bash
_NIX_TEST_ACCEPT=1 meson test lang
```

This convention is shared with the [characterisation unit tests](#characterisation-testing-unit) too.

An interesting situation to document is the case when these tests are "overfitted".
The language tests are, again, an example of this.
The expected successful output of evaluation is supposed to be highly stable: we do not intend to make breaking changes to (the stable parts of) the Nix language.
However, the errors and warnings during evaluation (successful or not) are not stable in this way.
We are free to change how they are displayed at any time.

It may be surprising that we would test non-normative behavior like diagnostic outputs.
Diagnostic outputs are indeed not a stable interface, but they are still important to users.
By recording the expected output, the test suite guards against accidental changes, and ensures that the *result* of the diagnostic code paths (not just the code that implements it) is under code review.
Regressions are caught, and improvements always show up in code review.

To ensure that characterisation testing doesn't make it harder to intentionally change these interfaces, there must always be an easy way to regenerate the expected output, as we do with `_NIX_TEST_ACCEPT=1`.

### Running functional tests on NixOS

We run the functional tests not just in the build, but also in VM tests.
This helps us ensure that Nix works correctly on NixOS and in environments with similar characteristics that are hard to reproduce in a build environment.

These can be run with:

```shell
nix build .#hydraJobs.tests.functional_user
```

Generally, this build is sufficient, but in nightly or CI we also test the attributes `functional_root` and `functional_trusted`, in which the test suite is run with different levels of authorization.

## Integration tests

The integration tests are defined in the Nix flake under the `hydraJobs.tests` attribute.
These tests include everything that needs to interact with external services or run Nix in a non-trivial distributed setup.
Because these tests are expensive and require more than what the standard GitHub Actions setup provides, they only run on the master branch (on <https://hydra.nixos.org/jobset/nix/master>).

You can run them manually with `nix build .#hydraJobs.tests.{testName}` or `nix-build -A hydraJobs.tests.{testName}`.
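
When several of these attributes need to be built in one go, a small wrapper around the `nix build .#hydraJobs.tests.{testName}` form can help. The following is a minimal sketch, not part of the Nix repository: the function names `test_attr` and `run_vm_tests` and the `DRY_RUN` variable are hypothetical and exist only for illustration. It assumes it is run from the root of a Nix checkout.

```shell
# Hypothetical helper, not part of the Nix repository.

# Print the flake attribute path for a test name under hydraJobs.tests.
test_attr() {
  printf '.#hydraJobs.tests.%s\n' "$1"
}

# Build each named test in turn, stopping at the first failure.
# DRY_RUN=1 (an assumption of this sketch) prints the commands instead
# of running them.
run_vm_tests() {
  local name
  for name in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "nix build $(test_attr "$name")"
    else
      nix build "$(test_attr "$name")" || return 1
    fi
  done
}
```

For instance, `run_vm_tests functional_user functional_root functional_trusted` would build the three functional VM test attributes mentioned above, one after another.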
