constructing a consensus sequence from merged reads after denoising

This is a very basic question about the size of the gene of interest compared to the size of the merged reads. Please forgive my ignorance. My target gene/amplicon is COI and about 708 bp in size based on the primers used to build the libraries. Is there a QIIME 2 plugin that builds a consensus sequence to the full size of the amplicon, or does it work out the taxonomic assignment based on the merged reads?
Thank you

Hi @nietof,
Generally, following a workflow like the one outlined in the Moving Pictures tutorial, taxonomy assignment would be based on the merged reads. One thing to check, though, is whether your reads are actually merging (you can see this in the SampleData[DADA2Stats] artifact generated as output from qiime dada2 denoise-paired), as your target is long relative to typical Illumina read lengths.
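As a quick sanity check, you can work out the longest amplicon a given read pair can possibly span. This is just a back-of-envelope sketch: the 12 bp figure below is DADA2's default minimum overlap for merging, and the read and amplicon lengths are the ones mentioned in this thread.

```python
def max_mergeable_amplicon(read_len: int, min_overlap: int = 12) -> int:
    """Longest amplicon two reads of length read_len can span while
    still overlapping by at least min_overlap bases (DADA2 default: 12)."""
    return 2 * read_len - min_overlap

amplicon_len = 708   # COI target size from the primers, per the question
read_len = 150       # Illumina read length mentioned later in this thread

limit = max_mergeable_amplicon(read_len)
print(f"Max mergeable amplicon with 2x{read_len} reads: {limit} bp")
print(f"Target of {amplicon_len} bp fits: {amplicon_len <= limit}")
```

With 2x150 reads the limit is 288 bp, far short of 708 bp, so full-length merges of this target are not possible; even 2x300 MiSeq reads (588 bp limit) would fall short. Any "merged" sequences much shorter than the target are worth a closer look.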

Alternatively, you could try a workflow like closed-reference OTU clustering, if you have a reference dataset of COI that you want to map your reads onto. The downside with that, however, is that any reads that don't map onto your reference will be discarded.
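If you go the closed-reference route, q2-vsearch provides the clustering action. A minimal sketch, assuming you have already imported your COI reference as a FeatureData[Sequence] artifact; all filenames here are placeholders, and the 97% identity threshold is just a common starting point, not a recommendation for COI specifically:

```shell
# Closed-reference clustering with q2-vsearch (filenames are placeholders)
qiime vsearch cluster-features-closed-reference \
  --i-sequences rep-seqs.qza \
  --i-table table.qza \
  --i-reference-sequences coi-ref-seqs.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table table-cr-97.qza \
  --o-clustered-sequences rep-seqs-cr-97.qza \
  --o-unmatched-sequences unmatched.qza
```

The unmatched.qza output holds the reads that did not map to the reference, i.e. the ones that would otherwise be silently discarded, so it is worth inspecting.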

Hope this helps!

Thank you!
I ran it using feature-classifier sklearn and I think the assignments are correct, but there were lots of features that classified only at the order level. I built a classifier using RESCRIPt from NCBI sequences for the order Lepidoptera.
The DADA2 stats say I got between 46% and 77% non-chimeric sequences merged. The size range of the merged sequences is between 126 and 213 bp, with an average of 157 bp. The reads were originally 150 bp long. I got 3,582 features.
I will try the closed-reference OTU clustering.
Thank you
