Session 1: Test run
During most of the course, we will use the IMP3 pipeline to practice working with metagenomics data. IMP3 is implemented in Snakemake. It is user-friendly and can easily be configured to run single-omic metagenomic or metatranscriptomic analyses, or only individual modules, e.g. only assembly or only binning.
IMP3 has two inputs: the data and a configuration file.
In this session, we do a first test run of IMP3 on crunchomics.
The first setup
Note: only do this once.
If you have not used conda on crunchomics yet, initialize it like so:
/zfs/omics/projects/metatools/TOOLS/miniconda3/condabin/conda init bash
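You need to log out of crunchomics and log in again for this to take effect.
Then run the following command, which stops conda from activating its base environment automatically every time you log in: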
conda config --set auto_activate_base false
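To confirm after logging back in that both steps took effect, conda config --show auto_activate_base prints the current value. The shell function below is a small sketch that checks the files instead. It assumes conda's default behaviour, where conda init bash adds a block marked "# >>> conda initialize >>>" to ~/.bashrc and conda config --set writes to ~/.condarc. The name check_conda_setup is hypothetical, and other conda versions may store the setting differently, so treat a failed check as a hint rather than proof.

```bash
# Hypothetical helper (a sketch, not part of IMP3): check the one-time conda setup.
# Assumes conda's default file locations: conda init bash edits ~/.bashrc,
# conda config --set writes to ~/.condarc.
check_conda_setup() {
  local rc=0
  # conda init bash inserts a block that starts with this marker line
  if grep -q '>>> conda initialize >>>' "$HOME/.bashrc" 2>/dev/null; then
    echo "ok: conda init block found in ~/.bashrc"
  else
    echo "missing: conda init block in ~/.bashrc (run conda init bash, then log in again)"
    rc=1
  fi
  # conda config --set auto_activate_base false stores this line in ~/.condarc
  if grep -q '^auto_activate_base: false' "$HOME/.condarc" 2>/dev/null; then
    echo "ok: auto_activate_base is false in ~/.condarc"
  else
    echo "missing: auto_activate_base: false in ~/.condarc"
    rc=1
  fi
  return "$rc"
}
```

Paste or source the function in a fresh login shell and run check_conda_setup; a non-zero exit status means at least one of the two steps is not visible in the files.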
Data for the first test run
IMP3 comes with a small test data set, which is already on crunchomics. All you need to do is start IMP3 with an appropriate configuration file.
Configuration of the first test run
There is a config file for the first run, which you can copy to your ~/personal folder:
cp /zfs/omics/projects/metatools/TOOLS/IMP3/config/config.imp_init_test.yaml ~/personal/my.config.yaml
This config will work as is.
You can adapt the configuration to your needs. See here for more information.
Dry-run
First, always perform a dry run. It detects problems in your configuration file without running any jobs.
You can start the dry run like so:
cd ~/personal
/zfs/omics/projects/metatools/TOOLS/IMP3/runIMP3 -d my.config.yaml
The console will print “Dryrun.” and then take some time.
The output that follows is quite long and should end with:
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
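If you save the dry-run output to a file, you can check for that closing line instead of scrolling through the console. The sketch below assumes the message appears somewhere in the captured output; since it is not stated which stream runIMP3 writes to, the usage line collects both stdout and stderr with 2>&1. The function name dryrun_ok and the log file name dryrun.log are hypothetical.

```bash
# Hypothetical helper: succeed only if a saved dry-run log contains the
# closing message of a successful dry run.
# Usage on crunchomics (from ~/personal):
#   /zfs/omics/projects/metatools/TOOLS/IMP3/runIMP3 -d my.config.yaml 2>&1 | tee dryrun.log
#   dryrun_ok dryrun.log
dryrun_ok() {
  local log="$1"
  # -F: match the message as a fixed string, not as a regular expression
  if grep -qF 'This was a dry-run (flag -n).' "$log"; then
    echo "dry run completed: $log"
  else
    echo "closing message not found; check the end of $log for errors" >&2
    return 1
  fi
}
```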
Submitting the test run
If the dry run was successful, you are ready to submit the test run to the compute nodes.
Compute nodes?
First, check which compute nodes have idle CPUs:
sinfo -o "%n %e %m %a %c %C"
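This returns a table whose columns are the node name, free memory and total memory (in MB), availability, number of CPUs, and, in the last column CPUS(A/I/O/T), the numbers of allocated, idle, other and total CPUs.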
For example, the line
omics-cn001 468565 514900 up 64 62/2/0/64
means that omics-cn001 has 62 CPUs currently allocated to other jobs, 2 idle CPUs, no CPUs in another state, and 64 CPUs in total. With only 2 idle CPUs, you are probably better off choosing another node.
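Picking the node with the most idle CPUs can be scripted. The awk sketch below reads the sinfo output described above, keeps rows whose availability is up and whose CPUS(A/I/O/T) field has the A/I/O/T shape, and prints the name of the node with the largest idle count. The function name pick_idle_node is hypothetical. It ignores free memory, and a node that is idle now may be busy by the time your job starts.

```bash
# Hypothetical helper: read `sinfo -o "%n %e %m %a %c %C"` output on stdin and
# print the name of the available node with the most idle CPUs.
# Exits with status 1 if no available node has an idle CPU.
pick_idle_node() {
  awk '
    # field 6 is CPUS(A/I/O/T); the numeric pattern also skips the header line
    $4 == "up" && $6 ~ /^[0-9]+\/[0-9]+\/[0-9]+\/[0-9]+$/ {
      split($6, cpu, "/")   # cpu[1]=allocated, cpu[2]=idle, cpu[3]=other, cpu[4]=total
      if (cpu[2] + 0 > best) { best = cpu[2] + 0; node = $1 }
    }
    END {
      if (node == "") exit 1
      print node
    }'
}
# Usage on crunchomics, combining the two commands from this section:
#   node=$(sinfo -o "%n %e %m %a %c %C" | pick_idle_node)
#   /zfs/omics/projects/metatools/TOOLS/IMP3/runIMP3 -c -r -n TESTRUN -b "$node" my.config.yaml
```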
Let’s say you choose omics-cn002.
You can submit your job to the cluster like so:
/zfs/omics/projects/metatools/TOOLS/IMP3/runIMP3 -c -r -n TESTRUN -b omics-cn002 my.config.yaml
Now you just have to wait. You can follow the progress in the output folder and by checking the Slurm queue for your user name:
squeue -u $USER
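Instead of rerunning squeue by hand, you can let the shell poll it. The sketch below uses only the standard squeue options -u (user) and -h (omit the header line). It waits until none of your jobs are queued or running, so it also waits for any unrelated jobs you have. The function name wait_for_jobs and the default interval of 300 seconds are assumptions. An empty queue only means the jobs have ended; whether the run succeeded still has to be checked in the output folder.

```bash
# Hypothetical helper: block until the Slurm queue holds no jobs for a user.
# $1: seconds between checks (default 300, an arbitrary choice)
# $2: user name (default: the current user, $USER)
wait_for_jobs() {
  local interval="${1:-300}" user="${2:-$USER}" queued
  while :; do
    # -h drops the header line, so empty output means no jobs are left
    queued=$(squeue -u "$user" -h) || {
      echo "squeue failed; cannot tell whether jobs are still running" >&2
      return 1
    }
    [ -z "$queued" ] && break
    sleep "$interval"
  done
  echo "No jobs left in the queue for $user"
}
```

For example, wait_for_jobs 600 checks every ten minutes and returns once your queue is empty.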
GOOD LUCK!