Troubleshooting, Characterization Testing, and Running Tests on NixOS

```

One can debug the Nix invocation in all the usual ways.
For example, enter `run` to start the Nix invocation.

### Troubleshooting

Sometimes running tests in the development shell may leave artefacts in the local repository.
To remove any traces of that:

```console
git clean -x --force tests
```

### Characterisation testing { #characterisation-testing-functional }

Occasionally, Nix uses a technique called [Characterisation Testing](https://en.wikipedia.org/wiki/Characterization_test) as part of the functional tests.
The technique is to include the exact output/behavior of a former version of Nix in a test in order to check that Nix continues to produce the same behavior going forward.

For example, this technique is used for the language tests, to check both the printed final value if evaluation was successful, and any errors and warnings encountered.

It is frequently useful to regenerate the expected output.
To do that, rerun the failed test(s) with `_NIX_TEST_ACCEPT=1`.
For example:

```bash
_NIX_TEST_ACCEPT=1 meson test lang
```

This convention is shared with the [characterisation unit tests](#characterisation-testing-unit) too.

An interesting situation to document is the case when these tests are "overfitted".
The language tests are, again, an example of this.
The expected successful output of evaluation is supposed to be highly stable – we do not intend to make breaking changes to (the stable parts of) the Nix language.
However, the errors and warnings during evaluation (successful or not) are not stable in this way.
We are free to change how they are displayed at any time.

It may be surprising that we would test non-normative behavior like diagnostic outputs.
Diagnostic outputs are indeed not a stable interface, but they are still important to users.
By recording the expected output, the test suite guards against accidental changes and ensures that the *result* (not just the code that implements it) of the diagnostic code paths is under code review.
Regressions are caught, and improvements always show up in code review.

To ensure that characterisation testing doesn't make it harder to intentionally change these interfaces, there must always be an easy way to regenerate the expected output, as we do with `_NIX_TEST_ACCEPT=1`.

### Running functional tests on NixOS

We run the functional tests not just in the build, but also in VM tests.
This helps us ensure that Nix works correctly on NixOS and in environments with similar characteristics that are hard to reproduce in a build environment.

These can be run with:

```shell
nix build .#hydraJobs.tests.functional_user
```

Generally, this build is sufficient, but in nightly or CI we also test the attributes `functional_root` and `functional_trusted`, in which the test suite is run with different levels of authorization.

## Integration tests

The integration tests are defined in the Nix flake under the `hydraJobs.tests` attribute.
These tests include everything that needs to interact with external services or run Nix in a non-trivial distributed setup.
Because these tests are expensive and require more than what the standard GitHub Actions setup provides, they only run on the master branch (on <https://hydra.nixos.org/jobset/nix/master>).

You can run them manually with `nix build .#hydraJobs.tests.{testName}` or `nix-build -A hydraJobs.tests.{testName}`.
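When several of these attributes need to be built one after another, a small wrapper can save typing. The following is only a sketch: the function name `run_hydra_tests` and the `DRY_RUN` variable are hypothetical helpers invented for this example and are not part of Nix. The only Nix command it uses is the `nix build .#hydraJobs.tests.{testName}` form shown above.

```shell
# Sketch only: `run_hydra_tests` and `DRY_RUN` are hypothetical names,
# not something provided by Nix or its test suite.
run_hydra_tests() {
  local name attr
  for name in "$@"; do
    # Flake attribute path in the form documented above.
    attr=".#hydraJobs.tests.${name}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command instead of running it.
      echo "nix build ${attr}"
    else
      # Stop at the first test that fails to build.
      nix build "${attr}" || return 1
    fi
  done
}
```

For instance, `DRY_RUN=1 run_hydra_tests functional_user functional_root functional_trusted` prints the three `nix build` commands without starting any VM test, which is a cheap way to check the attribute names before committing to a long build.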

This section covers cleaning up test artifacts with `git clean` and describes characterization testing, including the use of `_NIX_TEST_ACCEPT=1` to regenerate expected outputs for both functional and unit tests. It addresses the potential for "overfitting" in characterization tests, particularly with diagnostic outputs, and explains how recording expected output guards against unintended changes while still allowing intentional interface updates. The section also describes running functional tests on NixOS using `nix build .#hydraJobs.tests.functional_user`, with the `functional_root` and `functional_trusted` variants covering different authorization levels. Finally, it introduces the integration tests defined in the Nix flake, which run only on the master branch because they are expensive and depend on external services, and which can be triggered manually via `nix build .#hydraJobs.tests.{testName}`.