Metaphlan2 vs vsearch for fungal metagenomic shotgun sequences

colinbrislawn · October 19, 2018, 10:56pm

Hello Hayden,

You are asking all the right questions!

I want to clarify how PCR amplicons differ from genomic reads, and why that difference means they need to be processed differently.

PCR amplicons (16S v4, 18S, ITS, etc.) are all from the same region of the same gene.

read1  ------------
read2 --------------
read3 -------------

Shotgun reads (metagenomes, metatranscriptomes) are from ALL regions of ALL genes.

read1        -----------
read2 ------------
read3         ----------------

Because these reads are so very, very different, they are also processed differently.
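If it helps to see the two pictures above as code, here is a minimal toy sketch. The genome is random made-up sequence, the function names are hypothetical, and there is no sequencing error, so treat it as an illustration of where reads come from, not a read simulator.

```python
import random

def amplicon_reads(genome, start, length, n):
    """Every amplicon read covers the same primer-defined window."""
    return [genome[start:start + length] for _ in range(n)]

def shotgun_reads(genome, length, n, rng):
    """Each shotgun read starts at a random position in the genome."""
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(n)]
    return [(s, genome[s:s + length]) for s in starts]

rng = random.Random(0)
# Made-up toy genome, 200 bp of random bases (an assumption for illustration).
genome = "".join(rng.choice("ACGT") for _ in range(200))

amp = amplicon_reads(genome, start=50, length=20, n=3)
shot = shotgun_reads(genome, length=20, n=3, rng=rng)

# With no sequencing error, amplicons from one template are all identical.
print(len(set(amp)))  # 1

# Shotgun reads land wherever they land.
for s, read in shot:
    print(s, read)
```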

Amplicons get clustered into OTUs or denoised into ASVs. Each OTU or ASV represents a unique region of the single gene that was targeted for PCR, and is ~90-300 bp long.
Shotgun reads get assembled into much longer contigs. Each contig holds many genes and is ~1000-50,000 bp long. Totally different, right?
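As a rough sketch of why the outputs look so different: collapsing amplicons gives you short unique sequences with counts, while joining overlapping shotgun reads gives you something longer than any single read. Both functions below are hypothetical toy stand-ins. Exact-match dereplication is far simpler than real OTU clustering or denoising, and a single suffix/prefix merge is far simpler than what a real assembler does.

```python
from collections import Counter

def dereplicate(reads):
    """Collapse identical amplicon reads into unique sequences with counts."""
    return Counter(reads)

def merge_overlap(left, right, min_overlap=3):
    """Join two reads if a suffix of `left` matches a prefix of `right`.

    Returns the merged sequence, or None if no overlap of at least
    `min_overlap` bases is found. Longest overlap is tried first.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None

# Amplicons: same region, so they collapse into a few short unique sequences.
print(dereplicate(["ACGT", "ACGT", "ACGA"]))

# Shotgun reads: overlapping neighbors extend into a longer sequence.
print(merge_overlap("ACGTAC", "TACGGA"))  # ACGTACGGA
```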

So which of these does Metaphlan2 do? None of them. Vsearch clusters sequences into OTUs, dada2 denoises sequences into ASVs, and programs like MEGAHIT and SPAdes assemble sequences into contigs. Metaphlan2 does not assemble your reads, it does not cluster your reads, and it does not denoise your reads.

So what does Metaphlan2 do?

Metaphlan2 "relies on ~1M unique clade-specific marker genes" for:

unambiguous taxonomic assignments;
accurate estimation of organismal relative abundance;
species-level resolution for bacteria, archaea, eukaryotes and viruses;
strain identification and tracking;
orders of magnitude speedups compared to existing methods;
metagenomic strain-level population genomics.

(That's from the metaphlan2 documentation.)
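To make the marker-gene idea concrete, here is a minimal toy sketch, not Metaphlan2's actual algorithm. The marker sequences and the `profile` function are invented for illustration, and exact substring matching stands in for real read alignment. The point is only that reads are compared against sequences assumed to be unique to one clade, and the hits are turned into relative abundances without any clustering or assembly.

```python
from collections import Counter

# Hypothetical marker sequences, each assumed to be unique to one clade.
# These are invented for illustration and are not real marker genes.
MARKERS = {
    "Saccharomyces_cerevisiae": "ACGTTGCAAGGCTA",
    "Candida_albicans": "TTGACCGATAGGCC",
}

def profile(reads, markers):
    """Count reads that match a clade's marker; return relative abundances."""
    hits = Counter()
    for read in reads:
        for clade, marker in markers.items():
            if read in marker:  # toy stand-in for aligning a read to a marker
                hits[clade] += 1
                break
    total = sum(hits.values())
    if total == 0:
        return {}
    return {clade: n / total for clade, n in hits.items()}

reads = ["ACGTTGCA", "GCAAGGCT", "GACCGATA", "GGGGGGGG"]
print(profile(reads, MARKERS))
```

The last read matches no marker, so it is simply ignored. That is the same reason most shotgun reads do not contribute to a marker-based profile: only reads that hit a marker count.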

Let me know if that helps. I think Metaphlan2 is the perfect Qiime 2 plugin for your metagenomic reads, so let us know what you find.

Have a good Friday,
Colin