.. _runTax:
==============================================
Session 4: Run mOTUs2, MetaPhlan3, and kraken2
==============================================
In this part of the course, we will use the taxonomic profilers in the `IMP3 `_ pipeline and also `MetaPhlan3 `_.
In this session, you can use the IMP3 installation on crunchomics, as well as a metaphlan3 conda environment to profile example data.
.. admonition:: External readers
| You can easily install MetaPhlan3 as described `here `_.
.. warning::
| If you haven't set up your conda environment yet, do so now as described in the :ref:`first session `.
Data
----
In this session, you can use the small files prepared on crunchomics:
.. code-block:: console
cd /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/READS
ls
Alternatively, you can use the preprocessed files from the previous sessions. You have them in your own output directory, or you can use the copies prepared here:
.. code-block:: console
cd /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/FILTERED_READS
ls
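If you want a quick sanity check of the read files, you can count the reads and look at the first record (a FASTQ record spans four lines). The sketch below uses the filtered file names that also appear in the MetaPhlan3 command further down; adapt them to whichever files you are looking at.
.. code-block:: console
# count lines; divide by 4 to get the number of reads per file
wc -l mg.r1.trimmed.phiX174_filtered.fq mg.r2.trimmed.phiX174_filtered.fq
# show the first FASTQ record (header, sequence, separator, qualities)
head -n 4 mg.r1.trimmed.phiX174_filtered.fq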
Running taxonomic profiling within IMP3
---------------------------------------
You could re-run all steps, including the taxonomy, on the test data. However, since you already have the preprocessed data, all you need to do is start IMP3 with an appropriate configuration file. In this config file, you specify that you don't want to do preprocessing, but that you do want taxonomic profiling, and, of course, you set the input to the already preprocessed files.
An example config file is provided, which you can copy to your ``~/personal`` folder:
.. code-block:: console
cd ~/personal
cp /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/04_taxonomy1/tax.config.yaml my.tax.config.yaml
This config will work as is, but you can change the input to your own data. You can also change the kraken2 database to one of the databases in ``/zfs/omics/projects/metatools/DB`` with the word kraken in their names (``UHGP_kraken2`` for human gut genomes, ``minikraken2`` for an even smaller database).
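If you are not sure which kraken2 databases are available, you can simply list them. The grep below is just a quick way to see which database your copied config currently points to; the exact key name depends on the IMP3 config layout.
.. code-block:: console
# list the available kraken2 databases
ls -d /zfs/omics/projects/metatools/DB/*kraken*
# check which database the copied config currently uses
# (the exact config key depends on the IMP3 version)
grep -in kraken ~/personal/my.tax.config.yaml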
First, you should always perform a dry run to detect potential problems in your configuration file.
.. code-block:: console
cd ~/personal
/zfs/omics/projects/metatools/TOOLS/IMP3/runIMP3 -d my.tax.config.yaml
If the dry run was successful, you're all set to submit the test run to the compute nodes. Remember, you can check the available nodes and submit your job to the cluster like so:
.. code-block:: console
sinfo -o "%n %e %m %a %c %C"
/zfs/omics/projects/metatools/TOOLS/IMP3/runIMP3 -c -r -n TESTTAX -b omics-cn002 my.tax.config.yaml
As always, you can check the status in the output folder and in the Slurm queue for your user name.
.. code-block:: console
squeue -u YourUserID
Once the run is done, you will have a directory ``Analysis/taxonomy`` in your new output directory, which holds the outputs from mOTUs2 and kraken2.
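To get a first look at the results, you can list the taxonomy folder inside the output directory you named in your config. The exact file names depend on the IMP3 version, so treat the listing below as a sketch; ``TESTTAX_output`` is only a placeholder.
.. code-block:: console
# replace TESTTAX_output with the output directory set in my.tax.config.yaml
cd ~/personal/TESTTAX_output
ls -R Analysis/taxonomy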
Taxonomic profiling using MetaPhlan3
------------------------------------
MetaPhlan3 is another profiler that works very well, especially on human samples. Its other advantage is its strain-level module, which we will use in a later session.
For MetaPhlan3, we have a conda environment that you can activate like so:
.. code-block:: console
conda activate /zfs/omics/projects/metatools/TOOLS/miniconda3/envs/metaphlan-3.0
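To check that the environment is active, you can ask metaphlan for its version:
.. code-block:: console
metaphlan --version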
To see the help, run:
.. code-block:: console
metaphlan -h
You could run metaphlan on the filtered test reads like this:
.. code-block:: console
metaphlan /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/FILTERED_READS/mg.r1.trimmed.phiX174_filtered.fq,/zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/FILTERED_READS/mg.r2.trimmed.phiX174_filtered.fq,/zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/FILTERED_READS/mg.se.trimmed.phiX174_filtered.fq --bowtie2out filtered_test.bowtie2.bz2 --input_type fastq -o profile.filtered_test.txt --bowtie2db /zfs/omics/projects/metatools/DB/metaphlan --unknown_estimation
You can, of course, also use your own filtered reads instead. If you want to run metaphlan on a real data set, put the commands to activate (and deactivate) conda and the metaphlan command into a script that you submit to the cluster, as in the sketch below. You can then also set the ``--nproc`` argument, e.g. ``--nproc 8``, to parallelize the profiling.
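Here is a minimal sketch of such a batch script; the ``#SBATCH`` settings and the input/output file names are only placeholders that you should adapt to your own data and resources.
.. code-block:: bash
#!/bin/bash
#SBATCH --job-name=metaphlan
#SBATCH --cpus-per-task=8
#SBATCH --time=04:00:00
# activate the shared metaphlan environment
# (if conda is not initialised in batch jobs, source your conda setup first)
conda activate /zfs/omics/projects/metatools/TOOLS/miniconda3/envs/metaphlan-3.0
# profile your own filtered reads (adapt the file names)
metaphlan my_sample.r1.fq,my_sample.r2.fq \
    --input_type fastq \
    --bowtie2out my_sample.bowtie2.bz2 \
    --bowtie2db /zfs/omics/projects/metatools/DB/metaphlan \
    --unknown_estimation \
    --nproc 8 \
    -o profile.my_sample.txt
conda deactivate
Save this, for example, as ``run_metaphlan.sh`` and submit it with ``sbatch run_metaphlan.sh``.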
For the purpose of this practical, you can also run metaphlan on the unfiltered reads:
.. code-block:: console
metaphlan /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/READS/test.mg.r1.fastq,/zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/READS/test.mg.r2.fastq --bowtie2out unfiltered_test.bowtie2.bz2 --input_type fastq -o profile.unfiltered_test.txt --bowtie2db /zfs/omics/projects/metatools/DB/metaphlan --unknown_estimation
To merge the profiles from several samples (or in this case, the same sample, but represented by filtered or unfiltered reads), you can use metaphlan's merge script:
.. code-block:: console
merge_metaphlan_tables.py profile*filtered_test.txt > merged_abundance.test.tsv
You can compare the results:
.. code-block:: console
less merged_abundance.test.tsv
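# to compare the two profiles more directly, keep only the species-level rows
# (assuming the usual MetaPhlan clade naming, where "s__" marks species)
grep "s__" merged_abundance.test.tsv | less -S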
conda deactivate
End of today's lesson. Have a nice day :-)