Keeping singletons after using DADA2.

termofilos · July 13, 2020, 8:38am

Hi,

Is there a way to keep singletons while running DADA2 through QIIME2? I noticed that more recent versions of DADA2 allow for a "DETECT_SINGLETONS=TRUE" setting that enables the detection of singleton sequences. Is there any way to enable singleton pass-through in QIIME2?

Thanks!

Mehrbod_Estaki · July 16, 2020, 8:10pm

Hi @termofilos,
The detect singletons parameter is not available in q2-dada2. However, with the 2020.6 release of QIIME 2 you can set --p-pooling-method to pseudo, which makes DADA2 better at detecting rare ASVs, the tradeoff being a longer run time. That still won't give you singletons, though you may end up with some as a result of merging at a later step.
With Deblur on the other hand you can tell it to include singletons.
That being said, there's a reason why both of these methods avoid singletons by default. Singletons are tricky business with error-prone sequencing data because it is very hard to determine whether they are real, and in reality most of them probably are not. So the conservative approach of excluding singletons is good practice in my opinion.

termofilos · July 16, 2020, 8:28pm

Hi @Mehrbod_Estaki ,

Thanks for the prompt response and suggestions! Does this mean that singleton inclusion is possible through Deblur on QIIME2?

The sequences being processed through this method are synthetic long reads built using UMIs, so quality-wise we're looking at almost error-free sequencing. It's been suggested to us to keep the singletons given this benefit, but we noticed it's a bit harder to get those results through QIIME2. We've definitely kept the usual DADA2 process as is for analyzing our amplicon data, as we also see keeping singletons as an issue with that type of data.

EDIT: I'd also like to add that since we're using these synthetic long-read sequences, we don't ever merge sequences per se. It sounds like any output of rarer ASVs may be omitted if it happens at the merging step?

Mehrbod_Estaki · July 16, 2020, 8:41pm

Hi @termofilos,

I'm not familiar with the UMI approach, so I'm not 100% sure if the error profile built into Deblur would work with that, but given that these are still Illumina runs it may be OK, though I'm assuming these are metatranscriptomic data? How long are these reads? Also, Deblur does require a positive filter; the default is Greengenes (16S), but it can be replaced manually with something else. So if you can find a suitable reference database for a positive filter, then it might work. If not, it might be easier for you to either use DADA2 standalone in R, or just use vsearch to do OTU picking if you think your sequences are basically error-free.

@wasade, any suggestions regarding use of Deblur here?

As for your edit: if there's no merging, then you shouldn't see any singletons. DADA2 gets rid of singletons by default as it initially denoises the forward and reverse reads separately; in some cases, however, some singletons can reappear after it merges them.
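To make the merging point concrete, here is a toy sketch of how a singleton can reappear after merging. It is a simplified illustration, not DADA2's actual code: the variant labels (`F1`, `R1`, ...) and the read assignments are made up, and the model simply assumes a merged variant is defined by the combination of one denoised forward variant and one denoised reverse variant.

```python
from collections import Counter

# Toy data (made up for illustration): each read pair is labelled with the
# denoised forward variant and the denoised reverse variant it was assigned to.
read_pairs = [
    ("F1", "R1"), ("F1", "R1"), ("F1", "R1"),
    ("F2", "R2"), ("F2", "R2"),
    ("F1", "R2"),  # a rare combination of two otherwise well-supported variants
]

# Abundance of each variant when the two read directions are denoised separately.
forward_counts = Counter(fwd for fwd, _ in read_pairs)
reverse_counts = Counter(rev for _, rev in read_pairs)

# Abundance of each forward/reverse combination after merging.
merged_counts = Counter(read_pairs)
merged_singletons = [pair for pair, n in merged_counts.items() if n == 1]

print(forward_counts)     # no forward variant is a singleton
print(reverse_counts)     # no reverse variant is a singleton
print(merged_singletons)  # but one merged combination occurs only once
```

With single long reads there is no second direction to combine, so in this simplified picture no new singletons can be created at a merging step.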

wasade · July 16, 2020, 8:55pm

If run directly, the positive filter can be omitted with Deblur. Singleton inclusion is possible with the --p-min-size parameter IIRC.
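As a rough sketch of what a minimum-size threshold does, the snippet below drops low-abundance features within each sample. It is not Deblur's implementation: the function name `filter_min_size`, the table layout, and the counts are hypothetical, and it assumes the threshold is applied per sample with a default of 2, which is worth confirming against the plugin's help text.

```python
def filter_min_size(table, min_size=2):
    """Within each sample, keep only features with at least min_size reads."""
    return {
        sample: {feat: n for feat, n in counts.items() if n >= min_size}
        for sample, counts in table.items()
    }


# Hypothetical per-sample feature counts.
table = {
    "sample_A": {"seq1": 120, "seq2": 1, "seq3": 7},
    "sample_B": {"seq1": 1, "seq4": 1},
}

default_filtered = filter_min_size(table)             # singletons are dropped
singletons_kept = filter_min_size(table, min_size=1)  # nothing is dropped

print(default_filtered)
print(singletons_kept)
```

Under these assumptions, setting the threshold to 1 is what lets singletons through.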

Best,
Daniel

termofilos · July 17, 2020, 12:45am

Thanks @wasade and @Mehrbod_Estaki !
I'll look into these options and see what works best. Sounds like using DADA2 as a standalone may be best.

The reads are mostly around 1400 bp (full-length 16S) and are not metatranscriptomic data, just long marker gene sequences from genomic DNA.

One last question:
If I use DADA2 or even just OTU picking using vsearch, is there a way to filter out any reads shorter than, say, 1000 bp? Most of my sequences are above 1000 bp, but there are a handful that are significantly shorter and I'd probably like to take those out. Thanks again!

Mehrbod_Estaki · July 17, 2020, 4:59am

Hi @termofilos,
On second thought, since these are long reads, I don't think Deblur would be a suitable choice here; as far as I'm aware, it has not been validated/benchmarked with long reads. DADA2 is also currently not suited for UMI long reads according to this recent thread. But there is a paper cited there that might be worth looking into to see what they have done.

Off the top of my head, I don't think you can set a minimum length with OTU clustering in QIIME 2, but I believe the standalone vsearch does allow this with the --fastq_minlen parameter.
Sorry I don't have any concrete solutions for you! Perhaps others can qiime in if they have more reasonable solutions.
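If a dedicated tool isn't handy, a minimum-length filter is also simple to sketch in plain Python. The snippet below is only an illustration: the helper names `read_fasta` and `filter_min_length`, the 1000 bp cutoff, and the two example records are assumptions, and it expects FASTA-formatted input rather than FASTQ.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_min_length(records, min_len=1000):
    """Keep only records whose sequence is at least min_len bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


# Two made-up records: one 1400 bases long, one 400 bases long.
example = io.StringIO(
    ">full_length\n" + "ACGT" * 350 + "\n>truncated\n" + "ACGT" * 100 + "\n"
)
kept = filter_min_length(read_fasta(example), min_len=1000)
print([header for header, _ in kept])
```

For real data, the `io.StringIO` example would be replaced by an open file handle, and the kept records written back out in FASTA format before clustering.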

termofilos · July 17, 2020, 5:10am

No worries @Mehrbod_Estaki ! A more recent paper used DADA2 with some modified parameters (some of which included the DETECT_SINGLETONS=TRUE change). I think sticking with DADA2 outside of QIIME2 may be best. Thank you so much for your help, greatly appreciated!

system · August 17, 2020, 11:10am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.