Session 1: Test run

During most of the course, we will use the IMP3 pipeline to practice working with metagenomics data. IMP3 is implemented in Snakemake. It is user-friendly and can be easily configured to perform single-omic metagenomic or metatranscriptomic analyses, as well as single modules, e.g. only assembly or only binning.

IMP3 has two inputs:

  • data

  • a configuration file

In this session, we do a first test run of IMP3 on crunchomics.

External users

If you’re not on crunchomics when following this documentation, have a look at the IMP3 documentation to learn how to install IMP3.

The first setup

Note - only do this once

The following two steps only need to be done before the first run.

If you haven’t used conda on crunchomics yet, initialize it like so:

/zfs/omics/projects/metatools/TOOLS/miniconda3/condabin/conda init bash

You need to log out of crunchomics and log in again for this to take effect.

Then stop conda from activating its base environment automatically at every login:

conda config --set auto_activate_base false

Warning

Like all Snakemake-based workflows that submit jobs to the compute nodes, IMP3 does not work well if the conda base environment is activated by default.
So you should run the last command even if you had already set up conda before.
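To confirm that the setting took effect, you can check after logging in again whether the base environment is active. This is a minimal sketch, assuming that conda's bash hook sets the variable CONDA_DEFAULT_ENV to the name of the active environment (and leaves it unset when none is active); the function name check_conda_base is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: warn if the conda base environment is active.
# Assumes conda's bash hook sets CONDA_DEFAULT_ENV to the active
# environment's name, and leaves it unset when no environment is active.
check_conda_base() {
    if [ "${CONDA_DEFAULT_ENV:-}" = "base" ]; then
        echo "WARNING: conda base is active; run 'conda deactivate' before starting IMP3" >&2
        return 1
    fi
    echo "OK: conda base is not active"
    return 0
}

# Run this after logging in again:
check_conda_base
```

If it still reports that base is active, `conda deactivate` switches it off for the current session.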

Data for the first test run

IMP3 comes with a small test data set, which is already stored on crunchomics. All you need to do is start IMP3 with an appropriate configuration file.

Configuration of the first test run

There is a config file for the first run, which you can copy to your ~/personal folder:

cp /zfs/omics/projects/metatools/TOOLS/IMP3/config/config.imp_init_test.yaml ~/personal/my.config.yaml

This config will work as is.

You can adapt the configuration to your needs. See the IMP3 documentation for more information.

Dry-run

First, you should always perform a dry run. This will detect problems in your configuration file.

Dry run

“Dry running” means that we test a run without actually executing it: IMP3 only checks whether your config file is sound and whether the data are where we expect them to be. It will create an output folder, but this will be mostly empty.

You can start the dry run like so:

cd ~/personal
/zfs/omics/projects/metatools/TOOLS/IMP3/runIMP3 -d my.config.yaml

The console will print “Dryrun.”, and the check may take some time. The output that follows should be quite long and end with:

This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

Warning

If there is an error message, do not ignore it. Only move on to the next step once the error is resolved.
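Because the dry-run output is long, it can help to save it to a log file and check for the closing message automatically. This is a sketch under assumptions: it assumes runIMP3 -d passes Snakemake's messages through to the terminal (Snakemake writes them to standard error, hence 2>&1), and the log name dryrun.log and the function dryrun_ok are hypothetical. It only checks for the final message, so still read any error messages in the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a saved dry-run log contains
# Snakemake's closing dry-run message.
# Example use on crunchomics (2>&1 also captures messages sent to stderr):
#   cd ~/personal
#   /zfs/omics/projects/metatools/TOOLS/IMP3/runIMP3 -d my.config.yaml 2>&1 | tee dryrun.log
#   dryrun_ok dryrun.log
dryrun_ok() {
    # -F: match the message literally; -q: print nothing, only set the exit status
    if grep -qF "This was a dry-run" "$1"; then
        echo "Dry run completed: $1"
        return 0
    fi
    echo "Dry run did not complete; look for error messages in $1" >&2
    return 1
}
```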

Submitting the test run

If the dry-run was successful, you’re set to submit the test run to the compute nodes.

Compute nodes?

If you don’t understand what this means, check out the crunchomics documentation.

First, check which compute nodes have free slots:

sinfo -o "%n %e %m %a %c %C"

This returns a table that lists, for each node, the node name, free and total memory (in MB), availability, the number of CPUs, and the numbers of allocated, idle, other and total CPUs (CPUS(A/I/O/T)). For example, the line

omics-cn001 468565 514900 up 64 62/2/0/64

means that 62 of the 64 CPUs on omics-cn001 are currently allocated to other jobs, 2 are idle and none are in the “other” state. With only 2 idle CPUs, you’re probably better off choosing another node.
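Instead of reading the table by eye, you can let awk pull the idle count out of the A/I/O/T column and sort the nodes by it. A sketch, assuming the column order of the sinfo command above (CPU state in field 6); rank_idle_nodes is a hypothetical name, and sinfo's -h flag only suppresses the header line. The output lists node name, idle CPUs and total CPUs, most idle first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list nodes by number of idle CPUs, most idle first.
# Expects sinfo lines in the format
#   NODE FREE_MEM TOTAL_MEM AVAIL CPUS A/I/O/T
rank_idle_nodes() {
    awk '
        # Print each node only once, in case it is listed more than once
        # (e.g. once per partition).
        !seen[$1]++ {
            # Field 6 is allocated/idle/other/total CPUs.
            split($6, cpu, "/")
            print $1, cpu[2], cpu[4]
        }
    ' | sort -k2,2nr
}

# On crunchomics:
#   sinfo -h -o "%n %e %m %a %c %C" | rank_idle_nodes
```

The node at the top of the list is a reasonable candidate for the -b option in the submission command.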

Let’s say you choose omics-cn002.

You can submit your job to the cluster like so:

/zfs/omics/projects/metatools/TOOLS/IMP3/runIMP3 -c -r -n TESTRUN -b omics-cn002 my.config.yaml

Now you just have to wait. You can follow the progress in the output folder and by checking the Slurm queue for your user (replace YourUserID with your user name):

squeue -u YourUserID
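A Snakemake-based workflow may keep several jobs in the queue at once, so a count per job state can be easier to read than the full list. A sketch, assuming squeue's format codes %i (job ID) and %T (job state) and its -h (no header) flag; $USER holds your login name, and summarise_jobs is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count your Slurm jobs per state.
# Expects lines of "JOBID STATE", e.g. from:
#   squeue -h -u "$USER" -o "%i %T"
summarise_jobs() {
    # Tally field 2 (the state), then sort the states alphabetically.
    awk '{ count[$2]++ } END { for (state in count) print state, count[state] }' | sort
}

# On crunchomics:
#   squeue -h -u "$USER" -o "%i %T" | summarise_jobs
```

An empty result means none of your jobs are queued or running any more.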

GOOD LUCK!

We’ll check the output next time.