.. _runBWA: ================================================= Session 3: Run BWA, inspect and filter the output ================================================= In this part of the course, we will not use the `IMP3 `_ pipeline. You have done some mapping already in the :ref:`second session `. Today, we will look at some of the commands IMP3 uses on their own. In this session, you can use the environments IMP3 provides on crunchomics to preprocess data with BWA-MEM. .. admonition:: External users | If you've got a working install of BWA and samtools, you can use those (We're using samtools version 1.9). .. warning:: | If you haven't set up your conda enviroment yet, do the set up as described in the :ref:`first session ` now. Data ---- For this session, there are a few small files prepared on crunchomics: .. code-block:: console cd /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/READS ls .. code-block:: console less COSMIC_vag.test.r1.fq In addition, you can use reference files from the IMP3 databases or the test set: .. code-block:: console ls /zfs/omics/projects/metatools/DB/filtering/ ls /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/REFERENCES Running BWA-MEM indexing ------------------------ Normally, you first need to build an index of you references. You can do this like so: .. code-block:: console cd ~/personal cp /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/REFERENCES/mg.assembly.merged.fa . conda activate /zfs/omics/projects/metatools/TOOLS/IMP3/conda/76e02d85877600c41ac84cb7bc27a189 bwa index mg.assembly.merged.fa .. admonition:: RESOURCES! | Don't do this on the frontend, if you are using a real genome (it's okay for the test data). .. warning:: | You need to copy the reference file to your personal space or scratch, because you can't write into the examples folder on metatools. You could also use the indices in the example data folder. Running mapping with BWA-MEM ---------------------------- In the next step, you run the actual mapping. .. code-block:: console bwa mem -v 1 -t 1 mg.assembly.merged.fa /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/READS/test.mg.1.fastq /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/READS/test.mg.2.fastq >> test_alignment.sam You can find the help for bwa mem by typing ``bwa mem``. You can run other data, for example, the ``COSMIC_vag.test.r1|2.fq`` with the ``hg38.fa``. Take a look at the alignment file --------------------------------- The output of the BWA-MEM step is an alignment file in .sam format. There is a long header, but after that (for example at the end of the file), you see the alignment information as discussed in the lecture. .. code-block:: console tail test_alignment.sam Converting the alignment file to binary (compressed) format ----------------------------------------------------------- Usually, we will not want to store the big .sam file on our disk. Instead, we convert it to .bam format. .. code-block:: console samtools view --threads 1 -bS test_alignment.sam > test_alignment.bam You could also do this in one step with the mapping: .. code-block:: console bwa mem -v 1 -t 1 /zfs/omics/projects/metatools/DB/filtering/hg38.fa /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/READS/COSMIC_vag.test.r1.fq /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/READS/COSMIC_vag.test.r2.fq |\ samtools view --threads 1 -bS - > COSMIC_alignment.bam You can always look at a .bam file with ``samtools view``: .. code-block:: console samtools view COSMIC_alignment.bam | less Filtering the .bam file ----------------------- You can use ``samtools view`` also to filter the reads. For example, you can use ``-f 4`` to find only reads which don't map your reference. .. code-block:: console samtools view -f 4 COSMIC_alignment.bam | less Likewise, ``-F 4`` only gives you reads which map. But be careful, there's also a second read. Consult `explain flags `_ to find other flags you may want to set. `This page `_ contains the manual of samtools. More complex filtering ---------------------- In IMP3, we use this command to remove reads mapping to a host genome: .. code-block:: console samtools merge --threads 1 -u - \ <(samtools view --threads 1 -u -f 4 -F 264 COSMIC_alignment.bam) \ <(samtools view --threads 1 -u -f 8 -F 260 COSMIC_alignment.bam) \ <(samtools view --threads 1 -u -f 12 -F 256 COSMIC_alignment.bam) | \ samtools view --threads 1 -bF 0x800 - | \ samtools sort --threads 1 -m 3G -n - | \ bamToFastq -i stdin -fq COSMIC_filtered.r1.fq -fq2 COSMIC_filtered.r2.fq conda deactivate Can you figure out what happens here?