Data filtering and normalization
The first step in a metabarcoding analysis (or any other kind of analysis) is to check your data.
The Text Summary and Library Size Overview tabs of the Data Integrity Check page give some important summary statistics.
These can help you decide how to proceed with the analysis.

Library size overview for the mammalian gut example data in MicrobiomeAnalyst. The library size is the number of reads per sample.
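The library size defined in the caption above can be computed by hand from a count table. The following is a minimal sketch, not MicrobiomeAnalyst's own code: the table layout (sample, then feature, then read count), the sample and OTU names, and the function name `library_sizes` are all made up for illustration.

```python
# Hypothetical toy count table: sample -> feature (e.g. OTU) -> number of reads.
table = {
    "sample_A": {"OTU_1": 120, "OTU_2": 0, "OTU_3": 30},
    "sample_B": {"OTU_1": 3, "OTU_2": 57, "OTU_3": 0},
    "sample_C": {"OTU_1": 40, "OTU_2": 10, "OTU_3": 0},
}


def library_sizes(count_table):
    """Return the total number of reads per sample (the library size)."""
    return {sample: sum(counts.values()) for sample, counts in count_table.items()}


sizes = library_sizes(table)

# List samples from the smallest to the largest library, as in an overview plot.
for sample, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(sample, size)
```

Sorting by size makes it easy to spot samples with an unusually small library, which matters later when choosing a rarefaction depth.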
Question 4.
Question 5.
Let's >> Proceed.
Data Filtering
Data filtering aims to remove low-quality or uninformative features to improve downstream statistical analysis. Features with very small counts in very few samples should be excluded from the analysis because they are likely due to sequencing errors or low-level contamination.
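The idea of a low count filter can be sketched in a few lines. This is an illustration under stated assumptions, not the MicrobiomeAnalyst implementation: the function name `filter_low_count`, the parameters `min_count` and `min_prevalence`, and the toy data are hypothetical. A feature is kept only if it reaches at least `min_count` reads in at least a fraction `min_prevalence` of the samples.

```python
# Hypothetical toy count table: sample -> feature -> number of reads.
toy = {
    "sample_A": {"OTU_1": 120, "OTU_2": 0, "OTU_3": 0, "OTU_4": 8},
    "sample_B": {"OTU_1": 3, "OTU_2": 1, "OTU_3": 0, "OTU_4": 0},
    "sample_C": {"OTU_1": 40, "OTU_2": 0, "OTU_3": 5, "OTU_4": 9},
    "sample_D": {"OTU_1": 15, "OTU_2": 2, "OTU_3": 0, "OTU_4": 0},
}


def filter_low_count(count_table, min_count, min_prevalence):
    """Drop features that reach min_count reads in too few samples.

    min_prevalence is the minimum fraction of samples (0 to 1) in which
    a feature must have at least min_count reads to be kept.
    """
    samples = list(count_table)
    features = {f for counts in count_table.values() for f in counts}
    keep = set()
    for feature in features:
        n_passing = sum(
            1 for s in samples if count_table[s].get(feature, 0) >= min_count
        )
        if n_passing / len(samples) >= min_prevalence:
            keep.add(feature)
    # Rebuild the table with only the retained features.
    return {
        s: {f: n for f, n in count_table[s].items() if f in keep} for s in samples
    }


filtered = filter_low_count(toy, min_count=4, min_prevalence=0.5)
print(sorted(filtered["sample_A"]))
```

With these toy settings, OTU_2 (never above 2 reads) and OTU_3 (5 reads in a single sample) are removed, while OTU_1 and OTU_4 are retained. Raising or lowering `min_count` changes how many features survive.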
Disable data filtering and Submit.

Feature filtering settings to disable filtering in MicrobiomeAnalyst.
Question 6.
Systematically change the minimum count filtering option (disable low variance filtering; we will not focus on that in this course).
Question 7.
Let's disable data filtering and >> Proceed.
Data Normalization
The second important step in data preparation for metabarcoding analyses is data normalization. Normalization removes biases due to, for instance, unequal sampling depth, and thus allows us to directly compare the community composition of different samples.
Let's first disable data normalization, Submit, >> Proceed and select Rarefaction Curve.
This gives a set of rarefaction curves, where each curve represents a sample of the gut microbiome of one of the mammalian species.

Rarefaction curves for the mammalian gut samples in MicrobiomeAnalyst (parameter settings: data source = original, steps = 20, group based on = none).
The saturation (leveling off) of the rarefaction curve gives information on how well the species richness estimate based on the sample reflects the true species richness in the environment (here the mammalian gut).
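To see where such a curve comes from, here is a minimal sketch that computes one for a single sample. It uses the classic analytical rarefaction formula (the expected number of features in a random subsample drawn without replacement), which may differ from how MicrobiomeAnalyst computes its curves. The function names, the `steps` parameter (borrowed from the setting in the caption above) and the toy counts are assumptions for illustration.

```python
from math import comb


def expected_richness(feature_counts, depth):
    """Expected number of features seen in `depth` reads drawn without replacement."""
    total = sum(feature_counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the library size")
    all_draws = comb(total, depth)
    # For each feature: 1 - P(no read of this feature ends up in the subsample).
    return sum(
        1 - comb(total - n, depth) / all_draws for n in feature_counts if n > 0
    )


def rarefaction_curve(feature_counts, steps=20):
    """Return (depth, expected richness) pairs at evenly spaced depths."""
    total = sum(feature_counts)
    depths = sorted({round(total * k / steps) for k in range(1, steps + 1)})
    return [(d, expected_richness(feature_counts, d)) for d in depths]


# Hypothetical sample: read counts of 7 features, 100 reads in total.
sample = [50, 30, 10, 5, 3, 1, 1]
curve = rarefaction_curve(sample, steps=20)
for depth, richness in curve:
    print(depth, round(richness, 2))
```

The curve rises quickly while the abundant features are picked up and flattens as only the rare ones remain to be found. At the full library size it reaches the observed richness of the sample (7 features in this toy example).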
Question 8.
Especially when the rarefaction curves are not saturated, the sampling depth (sequence sample size) can have a major impact on the species richness estimate.
Question 9.
One data normalization method that can remove biases due to unequal sampling depth is to rarefy to the smallest sampling depth (i.e. the rarefaction depth). For samples with a higher sampling depth, this means taking a random subsample of the reads (without replacement) whose size equals the rarefaction depth.
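Rarefying as described above can be sketched with the Python standard library. This is a minimal illustration, not MicrobiomeAnalyst's implementation: the function names `rarefy` and `rarefy_table`, the `seed` parameter and the toy table are hypothetical.

```python
import random

# Hypothetical toy count table: sample -> feature -> number of reads.
table = {
    "sample_A": {"OTU_1": 120, "OTU_2": 0, "OTU_3": 30},
    "sample_B": {"OTU_1": 3, "OTU_2": 57, "OTU_3": 0},
    "sample_C": {"OTU_1": 40, "OTU_2": 10, "OTU_3": 0},
}


def rarefy(sample_counts, depth, seed=None):
    """Randomly subsample `depth` reads without replacement from one sample."""
    total = sum(sample_counts.values())
    if depth > total:
        raise ValueError("rarefaction depth exceeds the library size")
    rng = random.Random(seed)
    # Expand the counts to one entry per read, then draw without replacement.
    reads = [feature for feature, n in sample_counts.items() for _ in range(n)]
    rarefied = dict.fromkeys(sample_counts, 0)
    for feature in rng.sample(reads, depth):
        rarefied[feature] += 1
    return rarefied


def rarefy_table(count_table, seed=0):
    """Rarefy every sample to the smallest library size in the table."""
    depth = min(sum(counts.values()) for counts in count_table.values())
    return {s: rarefy(counts, depth, seed) for s, counts in count_table.items()}


rarefied = rarefy_table(table)
for sample, counts in rarefied.items():
    print(sample, counts, sum(counts.values()))
```

After rarefying, every sample has the same number of reads (here 50, the size of the smallest library). The sample that set the rarefaction depth is unchanged, whereas the larger samples lose reads and possibly some rare features.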
Using the results of the Library Size Overview on the Data Integrity Check page and the rarefaction curve, mentally draw a line at the rarefaction depth.
Question 10.
Because rarefying often results in the loss of a lot of data, it may be worth considering excluding certain samples from the analyses rather than accepting a very low rarefaction depth.
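The trade-off between dropping samples and accepting a low rarefaction depth can be made concrete with a small calculation. The sketch below is a hypothetical helper, not a MicrobiomeAnalyst feature, and the library sizes are invented. For each candidate depth, it counts how many samples would remain (those with at least that many reads) and how many reads would be retained in total after rarefying.

```python
# Hypothetical library sizes (reads per sample).
sizes = {
    "sample_A": 200,
    "sample_B": 5000,
    "sample_C": 5200,
    "sample_D": 6100,
    "sample_E": 4800,
}


def rarefaction_tradeoff(library_sizes):
    """Return (depth, samples kept, reads retained) for each candidate depth.

    Candidate depths are the observed library sizes; samples with fewer
    reads than the depth would have to be excluded.
    """
    rows = []
    for depth in sorted(set(library_sizes.values())):
        kept = [s for s, size in library_sizes.items() if size >= depth]
        rows.append((depth, len(kept), depth * len(kept)))
    return rows


rows = rarefaction_tradeoff(sizes)
for depth, n_kept, n_reads in rows:
    print(depth, n_kept, n_reads)
```

In this toy example, keeping all five samples forces a depth of 200 reads, whereas excluding the one small sample allows a depth of 4800. The total number of reads retained is only one possible criterion; how many samples you can afford to lose also depends on the study design.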
Question 11.
Using the Library Size Overview on the Data Integrity Check page, decide on the best strategy for sample filtering and normalization. Now implement this strategy on 1) the sample filtering tab of the Data filtering page and 2) the Data normalization page.
>> Proceed again to the Rarefaction curve.
Question 12.