.. _runBinning:

=================================
Session 9: Run binning in IMP3
=================================

In this part of the course, we will use the binners in the `IMP3 <https://imp3.readthedocs.io/en/latest/>`_ pipeline to reconstruct genomes and refine them. 

In this session, you can use the IMP3 installation on crunchomics and the example data. 


.. warning:: 

   | If you haven't set up your conda environment yet, do the setup as described in the :ref:`first session <testrun>` now.


Data
----

In this session, you can use the small files prepared on crunchomics:

.. code-block:: console

   ls /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/ASSEMBLY
   ls /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/ALIGNMENTS/
   ls /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/ANNOTATION


Running binning within IMP3
----------------------------

We start with data that have already been assembled. The reads have been mapped back to the assemblies to produce alignment files, and the contigs have been annotated to locate open reading frames and, among them, essential single-copy genes. All you need is an appropriate configuration file. In this config file, you specify that you don't want to run preprocessing, assembly, annotation, or taxonomy, but that you do want binning and some visualisation. You also set the input to the already preprocessed files and choose an output directory.

An example config file is available, which you can copy to your ``~/personal`` folder:

.. code-block:: console

   cd ~/personal
   cp /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/09_binning/test_binning.config.yaml .


This config will work as is, but you could change the input to your own data. 
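
To see at a glance which settings the copied file actually contains, it can help to hide any comments in it. The following is a minimal sketch and not part of IMP3: ``show_settings`` is a hypothetical helper name, and it relies only on standard ``grep``.

.. code-block:: console

   # Hypothetical helper (not part of IMP3): print the active lines of a
   # YAML config file, skipping comment-only lines and blank lines
   show_settings() {
      grep -vE '^[[:space:]]*(#|$)' "$1"
   }

Running ``show_settings test_binning.config.yaml`` in ``~/personal`` should then print only the settings themselves, such as the selected steps, the input files, and the output directory.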

First, you should always perform a dry run to detect potential problems in your configuration file.

.. code-block:: console

   cd ~/personal
   /zfs/omics/projects/metatools/TOOLS/IMP3/runIMP3 -d test_binning.config.yaml


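If the dry run was successful, you're ready to submit the test run to the compute nodes. The ``sinfo`` call below lists each node's hostname, free memory, total memory, availability, and CPU counts, so you can pick a node with free resources (here ``omics-cn002``). Remember, you can submit your job to the cluster like so: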

.. code:: console

   sinfo -o "%n %e %m %a %c %C"
   /zfs/omics/projects/metatools/TOOLS/IMP3/runIMP3 -c -r -n TESTBinning -b omics-cn002 test_binning.config.yaml


As always, you can check progress by looking in the output folder and by checking the Slurm queue for your username.

.. code:: console

   squeue -u YourUserID


Once the run is done, your new output directory will contain a directory ``Binning``, which holds the output of all binners. IMP3 also runs DAS Tool to combine and refine the results from the binners. In addition, it runs GRiD, a tool that estimates the growth rate of each bin from its metagenomic coverage.
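
For a first overview of what the run produced, you can list the contents of the ``Binning`` directory. The following is a minimal sketch using only standard shell tools: ``binning_overview`` is a hypothetical helper name, and its argument is whatever output directory you chose in your config file.

.. code-block:: console

   # Hypothetical helper (not part of IMP3): list the top-level entries of
   # <outputdir>/Binning and, for each subdirectory, count the files inside
   binning_overview() {
      for entry in "$1"/Binning/*; do
         if [ -d "$entry" ]; then
            n=$(find "$entry" -type f | wc -l | tr -d ' ')
            printf '%s\t%s files\n' "$(basename "$entry")" "$n"
         else
            printf '%s\n' "$(basename "$entry")"
         fi
      done
   }

Call it as ``binning_overview /path/to/your/outputdir``. The names it prints depend on which binners were run.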


End of today's lesson. Next week, we will look at how to find the taxonomy of the reconstructed genomes.