Low sequence retention after the DADA2 step

Hi! I know this has been asked many times, but I'm retaining very few sequences after DADA2.

These sequences correspond to the V3-V4 region (expected amplicon size ~460 bp).
I tried trimming (forward reads at 280, reverse reads at 220) and also no trimming, but I lost A LOT of reads in both cases. Any advice or help?

fulldenoising-stats.qzv (1.2 MB): denoising stats without trimming.

shortdenoising-stats.qzv (1.2 MB): denoising stats with trimming.

palmsequences.qzv (290.7 KB): the sequences themselves.

Hi @Francisco,

From a trimming perspective, I think the DADA2 results support using the shorter reads (you're merging about 75%, wow!). You're losing most of your reads after the merge, in the chimera detection step. So, are you working with COI or some other marker where standard chimera detection won't work? Or do you just have a lot of singletons because you have few samples and a lot of depth, so rare variants are identified as chimeric based on the abundance distribution? (This is a guess; I work less with DADA2 and chimera removal.) You could try running DADA2 without chimera removal and then passing the output through a secondary program to handle chimeras.
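To see exactly which step is eating the reads, you can compute per-step retention from the denoising stats instead of reading the percentages off the .qzv by eye. A minimal sketch, assuming the stats were exported to a TSV (for example with `qiime tools export`) and that the column names match the paired-end q2-dada2 output (`input`, `filtered`, `denoised`, `merged`, `non-chimeric`); check the header of your own export before relying on it:

```python
import csv
import io

# Count columns in the order DADA2 applies them (assumed to match the
# paired-end denoising-stats export; adjust if your header differs).
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def parse_stats_tsv(text):
    """Parse exported denoising stats into {sample_id: {step: read_count}}.

    A '#q2:types' row, if present, is skipped; the first column is
    assumed to hold the sample ID.
    """
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#q2:")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    id_column = reader.fieldnames[0]
    stats = {}
    for row in reader:
        stats[row[id_column]] = {step: int(float(row[step])) for step in STEPS}
    return stats


def step_retention(counts):
    """Fraction of reads kept at each step, relative to the previous step."""
    return {
        cur: (counts[cur] / counts[prev]) if counts[prev] else 0.0
        for prev, cur in zip(STEPS, STEPS[1:])
    }


def worst_step(counts):
    """Name of the step that keeps the smallest fraction of its incoming reads."""
    retention = step_retention(counts)
    return min(retention, key=retention.get)
```

Looking at step-to-step ratios rather than "percentage of input" makes it obvious when filtering and merging look fine but the `non-chimeric` step throws most of the merged reads away, which is the pattern described above.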

Best,
Justine

Hi!

I don't know what COI is, but I'm working with 12 samples divided into 4 groups of 3. I took the samples from coconut palm trunks (maybe that's the reason for the many singletons?).

Which secondary program would you recommend?

Hi @Francisco,

Okay, I came back and took a closer look at the quality plots, and your read lengths look a bit weird to me. What kind of sequencing did you do? How did you handle things like primer trimming and PhiX removal?

Best,
Justine

Hi @jwdebelius

As you say, the read lengths are weird.

I sequenced with Zymo Biomics (with the Quick-16S Primer Set V3-V4). When they send the data, they give you a report which contains tables generated with QIIME 2, and they also provide the sequences without primers or barcodes.

Why don't I use the report? Well, they don't share the primers used, and the report needs filtering, but no artifacts are provided.

About the sequencing, the report states this: "The final library was sequenced on Illumina® MiSeq™ with a v3 reagent kit (600 cycles). The sequencing was performed with 10% PhiX spike-in."

Hi @Francisco,

Based on a 2x300 kit, your read lengths are too long, so from what I'm seeing it seems unlikely to me that they've removed primers and barcodes. I'm not sure whether you could do PhiX removal manually… perhaps @Mehrbod_Estaki or @Nicholas_Bokulich can help more with that?
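One quick way to check the "were primers removed?" question yourself is to tabulate read lengths in the raw FASTQ files. On a 2x300 run, a pile-up of reads at exactly 300 nt suggests nothing was cut from the reads; trimmed reads usually fall a primer's length or more below that. This is a heuristic sketch, not a definitive test; the file name in the usage note is hypothetical:

```python
import gzip
from collections import Counter


def length_counts(lines, max_reads=None):
    """Count sequence lengths from an iterable of FASTQ lines.

    Assumes standard 4-line FASTQ records (header, sequence, '+', quality),
    so every line at index 1, 5, 9, ... is a sequence line.
    """
    counts = Counter()
    n_reads = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:
            counts[len(line.rstrip("\r\n"))] += 1
            n_reads += 1
            if max_reads is not None and n_reads >= max_reads:
                break
    return counts


def read_length_counts(path, max_reads=None):
    """Read-length histogram for a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return length_counts(handle, max_reads)


def fraction_at_length(counts, length):
    """Fraction of reads whose length equals `length` (e.g. the cycle count)."""
    total = sum(counts.values())
    return counts[length] / total if total else 0.0
```

For example, `fraction_at_length(read_length_counts("sample1_R1.fastq.gz", 10000), 300)` close to 1.0 would mean essentially every forward read still has its full 300 sequenced bases, primer included.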

Best,
Justine

DADA2 and Deblur both have built-in PhiX removal by default, so I doubt that is an issue here. But as @jwdebelius mentioned, the issue is coming at the chimera removal step (once you include a reasonable truncation parameter), so something about your reads is acting funky. I don't know much about Zymo's V3-V4 primers or their processing protocols, but when I see uneven forward vs reverse reads, it is usually a good indicator that something non-biological is in those reads, which is the fastest way to throw off DADA2 and chimera detection. I would consult with Zymo again and show them the results of your DADA2 run. If they can't give you the primer sequences (which I'm almost sure they can't), then at least they can give you the reads with all non-biological sequences, primers included, removed. A last-ditch effort could also be to simply trim something like 40 bp from the 5' end of your reads during DADA2 with trim-left and see how that affects chimera removal. If the results are good, you can start dialing back that 40 bp to find the optimal position where you retain the most reads without losing much resolution.
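The trim-left sweep suggested above can be scripted so that each run writes its own outputs. A sketch, assuming `qiime dada2 denoise-paired` with its `--p-trim-left-f/-r`, `--p-trunc-len-f/-r` and output options; the input name `demux.qza` and the output file names are hypothetical. It also estimates the merge overlap: with truncation at 280/220 and a ~460 bp fragment (measured from the first base of the forward read to the first base of the reverse read, primers included), the overlap is 280 + 220 - 460 = 40 nt. Because trim-left removes the same bases from both the read start and the fragment start, it does not change that overlap; it should stay above DADA2's default minimum overlap of 12 nt.

```python
import shlex


def merge_overlap(trunc_f, trunc_r, fragment_len):
    """Expected overlap (nt) of truncated forward and reverse reads.

    fragment_len counts every sequenced base, including any primer bases
    that trim-left will remove, so trim-left cancels out of this estimate.
    """
    return trunc_f + trunc_r - fragment_len


def sweep_commands(demux_qza, trims, trunc_f, trunc_r):
    """Build one denoise-paired command per trim-left value (same trim on F and R).

    Output file names are hypothetical; each run gets its own tag so the
    denoising stats can be compared side by side afterwards.
    """
    commands = []
    for trim in trims:
        if trim >= min(trunc_f, trunc_r):
            raise ValueError(f"trim-left {trim} must be shorter than the truncation length")
        tag = f"trim{trim}"
        commands.append([
            "qiime", "dada2", "denoise-paired",
            "--i-demultiplexed-seqs", demux_qza,
            "--p-trim-left-f", str(trim),
            "--p-trim-left-r", str(trim),
            "--p-trunc-len-f", str(trunc_f),
            "--p-trunc-len-r", str(trunc_r),
            "--o-table", f"table-{tag}.qza",
            "--o-representative-sequences", f"rep-seqs-{tag}.qza",
            "--o-denoising-stats", f"stats-{tag}.qza",
        ])
    return commands


if __name__ == "__main__":
    print("expected overlap:", merge_overlap(280, 220, 460), "nt")
    # Start at 40 bp and dial back in 10 bp steps: 40, 30, 20, 10, 0.
    for cmd in sweep_commands("demux.qza", range(40, -1, -10), 280, 220):
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch side-effect free; you can paste them into a shell, or pass each list to `subprocess.run` once the names match your files.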


Hi @Mehrbod_Estaki!

I wrote to Zymo asking about the DADA2 pipeline they used.
I also don't think I could get the primers (but I asked for them), so I hope to replicate their DADA2 pipeline or at least get some insight into what's happening.

I will also try cutting from the 5' end and see what happens.

I’ll give an update soon.

-Francisco


Hi,
I got in touch with a senior research associate and this was the answer:

“Unfortunately, the parameters of our bioinformatics pipeline, including those used in DADA2, are proprietary. Our service is unable to offer assistance for when you choose to analyze the raw data files yourself.”

EDIT: Apparently the raw sequences are demultiplexed but do contain primers/barcodes. They provided me with the trimming lengths, so I'll try with those.

Thanks for the help!!!


Hi @Francisco,
That is not surprising at all; these companies need to protect their proprietary products (whatever that means) to stay profitable, and this is by no means exclusive to Zymo. It's just part of the business model. What I was suggesting before was that if they can't give you that information, perhaps they can help by clarifying what is in those reads that makes them different lengths, and how they recommend you remove all non-biological sequences. They may prefer doing it themselves, or just give you the lengths so you can trim them off.

EDIT: Aha, sorry, you must have updated your post while I was reading. That's good to hear that you got some more info; I hope that works out. Keep us posted!


After processing with the trimming lengths provided for the primers/barcodes, I went from ~10% of sequences passing chimera removal to ~70% (a huge difference).

Thanks for the help! It wouldn't have crossed my mind that the sequences actually had those primers!


Glad it worked out @Francisco! Not removing primers/linkers etc. is much more common than you think; it's one of the downsides of not having standardized protocols. I guess it just needs to be emphasized more across the board.

