.. _testrun:

===================
Session 1: Test run
===================

During most of the course, we will use the IMP3 pipeline to practice working with metagenomics data. IMP3 is implemented in Snakemake. It is user-friendly and can easily be configured to perform single-omic metagenomic or metatranscriptomic analyses, as well as single modules, e.g. only assembly or only binning.

IMP3 has two inputs:

* data
* a configuration file

In this session, we do a first test run of IMP3 on crunchomics.

.. admonition:: External users

   | If you're not on crunchomics when following this documentation, have a look at the IMP3 documentation to learn how to install IMP3.

The first setup
---------------

.. admonition:: Note - only do this once

   | The following two steps only need to be done before the first run.

If you've not used conda on crunchomics yet, you can initialize it like so:

.. code-block:: console

   /zfs/omics/projects/metatools/TOOLS/miniconda3/condabin/conda init bash

You need to log out of crunchomics and log in again for this to take effect. Then run

.. code-block:: console

   conda config --set auto_activate_base false

.. warning::

   | IMP3, like all Snakemake-based workflows that submit jobs to the compute nodes, does not cope well with the conda base environment being activated by default.
   | So you should run the last command even if you had conda set up previously.

Data for the first test run
---------------------------

IMP3 comes with a small test data set, which is already available on crunchomics. All you need to do is start IMP3 with an appropriate configuration file.

Configuration of the first test run
-----------------------------------

There is a config file for the first run, which you can copy to your ``~/personal`` folder.

.. code-block:: console

   cp /zfs/omics/projects/metatools/TOOLS/IMP3/config/config.imp_init_test.yaml ~/personal/my.config.yaml

This config will work as is. You can also change the configuration to suit your needs. See the IMP3 documentation for more information.

Dry-run
-------

First, you should always perform a dry run. This will detect problems in your configuration file.

.. admonition:: Dry run

   | "Dry running" means that we test a run without actually doing it. IMP3 only checks whether your config file is sound and the data is where we expect it to be. It will make an output folder, but this will be mostly empty.

You can start the dry run like so:

.. code-block:: console

   cd ~/personal
   /zfs/omics/projects/metatools/TOOLS/IMP3/runIMP3 -d my.config.yaml

The console will state "``Dryrun.``" and take some time. The output that follows should be quite long, ending with

.. code-block::

   This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

.. warning::

   | If there is an error message, do not ignore it. Only move to the next step once the error is resolved.

Submitting the test run
-----------------------

If the dry run was successful, you're set to submit the test run to the compute nodes.

.. admonition:: Compute nodes?

   | If you don't understand what this means, check out the crunchomics documentation.

First, check which compute nodes have free slots:

.. code:: console

   sinfo -o "%n %e %m %a %c %C"

This will return a table with the numbers of allocated, idle, other and total CPUs in the column ``CPUS(A/I/O/T)``. For example, ``omics-cn001 468565 514900 up 64 62/2/0/64`` means that omics-cn001 has 62 CPUs that are currently allocated to other jobs, 2 that are idle, none in another state, and a total of 64 CPUs. With only 2 idle CPUs, you're probably better off choosing another node.

Let's say you choose ``omics-cn002``. You can submit your job to the cluster like so:

.. code:: console

   /zfs/omics/projects/metatools/TOOLS/IMP3/runIMP3 -c -r -n TESTRUN -b omics-cn002 my.config.yaml

Now, you just have to wait. You can check the status in the output folder and by checking the slurm queue for your user name.

.. code:: console

   squeue -u YourUserID

.. admonition:: GOOD LUCK!

   | We'll check the output next time.
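
When picking a compute node before submission, reading the ``sinfo`` table by eye works fine for a handful of nodes. As a minimal sketch, the same output could also be filtered with ``awk`` to print the node with the most idle CPUs. The function name ``pick_idle_node`` is hypothetical and not part of IMP3 or crunchomics. The sketch assumes the six-column format requested with ``sinfo -o "%n %e %m %a %c %C"``, with the availability (``up``) in the fourth column and ``allocated/idle/other/total`` in the sixth.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper, not part of IMP3 or crunchomics.
   # Reads `sinfo -o "%n %e %m %a %c %C"` style lines from stdin and prints
   # the name and idle-CPU count of the available node with the most idle CPUs.
   # Prints nothing if no available node has an idle CPU.
   pick_idle_node() {
       awk '
           BEGIN { best = 0 }
           # Keep only rows that are "up" and whose 6th column looks like
           # A/I/O/T; this also skips the header row.
           $4 == "up" && $6 ~ /^[0-9]+\/[0-9]+\/[0-9]+\/[0-9]+$/ {
               split($6, cpus, "/")      # cpus[2] holds the idle CPUs
               if (cpus[2] + 0 > best) {
                   best = cpus[2] + 0
                   node = $1
               }
           }
           END { if (node != "") print node, best }
       '
   }

   # Usage on the cluster (assumes sinfo is on your PATH):
   #   sinfo -o "%n %e %m %a %c %C" | pick_idle_node

The node name printed this way can then be passed to ``runIMP3`` via ``-b``, as in the submission command in this session. The idle count is only a snapshot, so the node may fill up before your job starts.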