.. _checkAnno:

=====================================
Session 7: Take a look at annotations
=====================================

In the last part of the course, you used the analysis module of the IMP3 pipeline. In this session, you can explore the annotations a bit further.

Data
----

In this session, you can use a .gff file prepared on crunchomics:

.. code-block:: console

   cd /zfs/omics/projects/metatools/SANDBOX/Metagenomics101/EXAMPLE_DATA/ANNOTATION
   ls

Quick look at the annotations
-----------------------------

Here are a couple of bash tricks you can use to get a quick overview of the annotations.

Take a look at the file:

.. code-block:: console

   head annotation_CDS_RNA_hmms.gff
   tail -n 5 annotation_CDS_RNA_hmms.gff
   less annotation_CDS_RNA_hmms.gff

(You exit `less` with `q`)

How many annotated sequences are there?

.. code-block:: console

   wc -l annotation_CDS_RNA_hmms.gff

How many are CDS? GFF columns are separated by tabs, so the pattern looks for ``CDS`` with a tab on either side:

.. code-block:: console

   grep $'\tCDS\t' -c annotation_CDS_RNA_hmms.gff

What fields are there in column 3?

.. code:: console

   cut -f 3 annotation_CDS_RNA_hmms.gff | sort | uniq -c

You could also have checked which fields other than CDS appear in column 3:

.. code:: console

   grep $'\tCDS\t' -v annotation_CDS_RNA_hmms.gff | cut -f 3 | sort | uniq -c

How many sequences are annotated with a KEGG KO?

.. code:: console

   grep "kegg_ko" -c annotation_CDS_RNA_hmms.gff

Which KO is most common? The counts are sorted in ascending order, so the most common KO is on the last line:

.. code:: console

   grep "kegg_ko" annotation_CDS_RNA_hmms.gff | sed 's#.*kegg_ko=\([^;]*\).*#\1#' | tr ',' '\n' | sort | uniq -c | sort -n

.. admonition:: Do it yourself (easy)

   | Try to adapt the code above to the Pfam domains.

.. admonition:: Do it yourself (not so easy)

   | Try to find the COG annotations of the genes that are annotated with the most common KO.

.. admonition:: Do it yourself (interesting)

   | Try to find the Pfam annotations of the genes that are annotated with the most common KO. Check the Pfam DB and KEGG DB for what these annotations mean. Do they correspond to a similar function?

End of today's lesson.
Next time, we will look at quantifying the functional annotations.
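
If you want to practise the commands from this session without access to crunchomics, the following is a minimal sketch that runs on a made-up GFF file. The contig names, gene IDs, source names and KO numbers are invented for illustration. Only the ``kegg_ko`` key is taken from the commands in this session, and the real file may be laid out differently.

```shell
# Build a hypothetical toy GFF (all values are made up for illustration).
# Columns are tab-separated, as in a real GFF file.
toy_gff="$(mktemp)"
printf 'contig_1\ttoy\tCDS\t1\t300\t.\t+\t0\tID=gene_1;kegg_ko=K00001,K00002\n' > "$toy_gff"
printf 'contig_1\ttoy\tCDS\t400\t900\t.\t-\t0\tID=gene_2;kegg_ko=K00001\n' >> "$toy_gff"
printf 'contig_2\ttoy\trRNA\t1\t1500\t.\t+\t.\tID=gene_3\n' >> "$toy_gff"
printf 'contig_2\ttoy\tCDS\t1600\t2000\t.\t+\t0\tID=gene_4\n' >> "$toy_gff"

# Count CDS rows by comparing column 3 exactly (an alternative to grep).
cds_count="$(awk -F '\t' '$3 == "CDS" {n++} END {print n+0}' "$toy_gff")"

# Count how often each KO occurs, using the same pipeline as in this session.
ko_counts="$(grep "kegg_ko" "$toy_gff" \
  | sed 's#.*kegg_ko=\([^;]*\).*#\1#' \
  | tr ',' '\n' | sort | uniq -c | sort -n)"

# The counts are sorted ascending, so the most common KO is on the last line.
top_ko="$(printf '%s\n' "$ko_counts" | tail -n 1 | awk '{print $2}')"

echo "CDS rows: $cds_count"
echo "Most common KO: $top_ko"

rm -f "$toy_gff"
```

Comparing column 3 with ``awk`` avoids accidental matches when the text ``CDS`` also appears in another column, for example inside the attributes.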