How to add taxonomy to feature-table.qza from open-reference-otu-picking

Nicholas_Bokulich · March 22, 2018, 3:32pm

Sure, you could set a standard trimming length to see what effect that has, but I am not sure it will have much effect. Instead you could set a lower Q score filter, or a longer trim length (based on quality score profiles) to try to eke out a slightly longer read length with more information.

117 really is not a very long read length, so it is not a surprise if you do not get very deep classification, particularly for 18S (which is a relatively low-information marker gene). Trying to increase this read length would be most effective, so long as it does not lead to lower-quality reads that fail to pass denoising.

I misunderstood: I was under the impression that you were asking about the effect of read orientation on the naive-bayes-sklearn classifier (where I believe read orientation in the query could impact outcome). This is not an issue with classify-consensus-vsearch, as you indicated.

I do not believe that read orientation is an issue during denoising but perhaps @wasade can help us here. One issue I see is that reads that might effectively belong to the same organism are going to be dereplicated as two separate sequence variants, inflating alpha diversity and creating redundancy during taxonomy classification (though if reverse/forward reads do not overlap then there is not much you can do to correct this even by reorienting).
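To illustrate the redundancy described above, here is a minimal Python sketch. It is not how Deblur or any QIIME 2 plugin works internally; it only shows that a strand-unaware exact-match dereplication counts a read and its reverse complement as two variants. The toy reads and the helper names (`reverse_complement`, `dereplicate`, `canonical`) are made up for this example, and the `canonical` trick only collapses reads that cover exactly the same region.

```python
from collections import Counter

# Translation table for complementing DNA bases (hypothetical helper).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def dereplicate(reads):
    """Count exact duplicates, the way a strand-unaware step would see them."""
    return Counter(reads)


def canonical(seq: str) -> str:
    """Strand-independent key: the smaller of seq and its reverse complement."""
    return min(seq, reverse_complement(seq))


# Toy data (assumption): one organism seen twice forward and once in
# reverse complement, plus one unrelated read.
reads = [
    "ACGTTGCAAG",
    "ACGTTGCAAG",
    reverse_complement("ACGTTGCAAG"),
    "GGGTACCATA",
]

as_is = dereplicate(reads)
strand_aware = dereplicate(canonical(r) for r in reads)

print("unique sequences, strand-unaware:", len(as_is))
print("unique sequences, strand-aware:  ", len(strand_aware))
```

In the strand-unaware count the reverse-complemented read shows up as its own sequence, which is the inflation mentioned above. If the forward and reverse reads cover different parts of the amplicon, no reorientation step will collapse them.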

wasade · March 22, 2018, 4:11pm

Deblur does not currently attempt to orient the reads, so if the input data are reverse complemented, the output Deblur sequence variants will be as well. Does that make sense?

Best,
Daniel

MMC_northS · March 22, 2018, 4:29pm

Hi @wasade,

So, in that sense, Deblur will work, but it will probably duplicate any variant that is also present as its reverse complement, treating the second one as a different variant.

The taxonomy will be correct because the assignment is made in both directions, but each organism will contain more variants than it really has, because a forward sequence counts as one variant and the reverse complement of the same sequence (if present) counts as another variant, right?

Nicholas_Bokulich · March 22, 2018, 4:50pm

I answered this above:

If your reads are 117 nt long, they probably do not overlap 100% (depending on the amplicon you are targeting) so will be separate SVs and receive separate taxonomic assignments even if you reoriented the reverse reads.

BenKaehler · March 27, 2018, 6:14am

Hi @MMC_northS, sorry for the slow response, and sorry that I won't add much: Nick was right in what he said initially. classify-consensus-blast and classify-consensus-vsearch both accept "both" as a value for --p-strand, so you should rely on those if you don't know that all of your reads are from the same strand. classify-sklearn assumes that all of the reads are from the same strand, and attempts to autodetect which one. This can be overridden using the --p-read-orientation option, but that won't help if there is a mix of orientations in your sample.
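As a rough sketch of what that looks like in practice, the Python snippet below assembles (but does not run) a classify-consensus-vsearch command with --p-strand set to both. The file names and the helper `consensus_vsearch_command` are hypothetical, and --output-dir is used here only so the sketch does not depend on which individual outputs your QIIME 2 release expects; check `qiime feature-classifier classify-consensus-vsearch --help` for your version.

```python
import shlex


def consensus_vsearch_command(query, ref_reads, ref_taxonomy, output_dir,
                              strand="both"):
    """Build the argument list for a strand-agnostic vsearch classification.

    All paths are placeholders supplied by the caller (assumption).
    """
    return [
        "qiime", "feature-classifier", "classify-consensus-vsearch",
        "--i-query", query,                      # representative sequences
        "--i-reference-reads", ref_reads,        # reference sequences
        "--i-reference-taxonomy", ref_taxonomy,  # reference taxonomy
        "--p-strand", strand,                    # search both orientations
        "--output-dir", output_dir,
    ]


# Hypothetical file names, for illustration only.
cmd = consensus_vsearch_command(
    query="rep-seqs.qza",
    ref_reads="ref-seqs.qza",
    ref_taxonomy="ref-taxonomy.qza",
    output_dir="vsearch-taxonomy",
)

# Print a copy-pasteable shell command instead of executing it.
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)` in an environment where QIIME 2 is installed.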

MMC_northS · March 27, 2018, 7:04am

Hi @BenKaehler, thank you for your answer. OK, so based on your answer, I will use only BLAST and VSEARCH for my sequence analyses. Until now I have had better results with VSEARCH, so I will continue with that one.
Thank you again for your time.
