Ion Torrent PGM, 16S rRNA dada2 denoise

Harry · December 12, 2017, 7:11pm

Hi Qiime 2 Team,

I am using dada2 to denoise my 16S rRNA seqs from a PGM. The seqs were pre-checked with mothur to screen out low-quality seqs. They range from 300 to 400 bp, and I do not want to lose any base pairs.

1. Could I just deactivate the --p-trunc-len function by giving it the parameter 0?
2. Does dada2 require that all seqs be the same length, as deblur does?
3. If I set --p-trunc-len 350, will sequences shorter than 350 bp be deleted?

Thanks so much.

colinbrislawn · December 12, 2017, 9:14pm

Hello Harry,

I'm sure the dada2 developer can provide a recommendation for PGM data, but I can answer some of your specific questions.

1. Could I just deactivate the --p-trunc-len function by giving it the parameter 0?
Yes. "If 0 is provided, no truncation or length filtering will be performed."

2. Does dada2 require that all seqs be the same length, as deblur does?
The dada2 program itself supports variable-length sequences (see #55), but I'm not sure how it's configured in QIIME 2.

3. If I set --p-trunc-len 350, will sequences shorter than 350 bp be deleted?
Yes.
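
To make answers 1 and 3 concrete, here is a toy Python model of the rule as described above: a trunc-len of 0 leaves every read alone, while a positive value discards shorter reads and cuts the rest to that length. This is only a sketch of the behaviour, not dada2's or QIIME 2's actual code, and the function name `trim_and_truncate` is made up for illustration.

```python
def trim_and_truncate(reads, trim_left=0, trunc_len=0):
    """Toy model of trim-left / trunc-len filtering (hypothetical helper)."""
    kept = []
    for seq in reads:
        if trunc_len > 0:
            if len(seq) < trunc_len:
                continue              # shorter than trunc_len: read is discarded
            seq = seq[:trunc_len]     # longer reads are cut to trunc_len
        kept.append(seq[trim_left:])  # then drop trim_left bases from the start
    return kept


# Three fake reads of 300, 350 and 400 bp.
reads = ["A" * 300, "C" * 350, "G" * 400]

no_trunc = trim_and_truncate(reads, trunc_len=0)
trunc_350 = trim_and_truncate(reads, trunc_len=350)

print([len(s) for s in no_trunc])   # [300, 350, 400]
print([len(s) for s in trunc_350])  # [350, 350]
```

With trunc-len 350 the 300 bp read is lost entirely, which is why 0 is the setting to use if no read should be dropped for length.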

Check out this page for more info on the dada2 settings:
https://docs.qiime2.org/2017.11/plugins/available/dada2/denoise-paired/

I hope this helps,
Colin

ebolyen · December 13, 2017, 10:54pm

Hi @Harry!

Thanks for answering, @colinbrislawn! Here are a couple of the more nitty-gritty specifics:

The official recommendation is to set trim-left to 15 for Ion Torrent data.

Variable length is supported in q2-dada2 as well.

Harry · December 14, 2017, 4:33pm

Thank you so much @ebolyen @colinbrislawn!

The problem is that q2-dada2 takes a very long time, and my computer always crashes after running for about 30 hours...

I saw you also suggested running dada2 in R and importing the feature table and rep-seqs back into QIIME 2 to get core-metrics. But the tutorial is for paired-end seqs (Illumina). Would you mind advising how to handle Ion Torrent seqs? What should I do at steps like mergePairs?

Thank you so much.

BTW, how should I cite the suggestion "The official recommendation is to set trim-left to 15 for Ion Torrent data"? Is there any publication that I can cite?

colinbrislawn · December 14, 2017, 10:51pm

Hello Harry,

It shouldn't be crashing! How much RAM does your computer have? Which reference database are you using?

In this post, setting n-jobs to 1 is suggested for running dada2. How many jobs are you running?

Let us know how we can help,
Colin

Harry · December 15, 2017, 12:03am

I use a 2015 MacBook Pro. I think my computer is OK; it probably crashed because I had too many tasks open.

The problem is that dada2 takes too long: several days have passed and it is still running. I am thinking of running dada2 in R and importing the output back into QIIME 2. But the dada2 tutorial for R is for Illumina. Do you have any tutorials for Ion Torrent?

It looks like setting only trimLeft = 15 is not enough: filterAndTrim always gives errors, let alone the following steps that merge the paired reads. After all, Ion Torrent output is single-end.

Thank you so much.

colinbrislawn · December 15, 2017, 5:17am

Hello Harry,

You should be able to process your Ion Torrent data just like Illumina data, except you change a few settings and only use the forward read. So while you may pass trimLeft = 15 (or = 10) and skip the pairing, the rest should be the same.

How far did you get in R with your data? I think only the first steps would be different and all downstream is the same, but @benjjneb can confirm this.

Also, take a look at this dada2 paper that includes Ion Torrent data. Does that give you some ideas?

Keep in touch,
Colin

Harry · December 15, 2017, 5:23am

Thank you! @colinbrislawn!!

Yes, I asked some colleagues who are experts in R, and they found a problem with a special symbol in the strsplit function. Now it is running, but it is still in learnErrors and looks like it will take some time. I will post an update here afterwards.

Thanks again for your unreserved help and suggestions!

colinbrislawn · December 15, 2017, 5:38am

OK good!

btw you can run learnErrors() on a subset of your full data set, which should still be pretty good and a lot faster.

Basically, combine your reads with Linux cat, then use vsearch --fastx_subsample to take, say, 10% of them. With a tenth of the reads, this step should run roughly 10x faster while still giving a reasonably accurate error model.
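
If it helps to see what "pool the reads and take 10%" means in practice, here is a rough Python sketch of the same idea. It is a stand-in for the cat plus vsearch route, not a replacement for it, and it assumes plain (uncompressed) FASTQ files with exactly four lines per read. The function names and the demo file names are made up for illustration.

```python
import os
import random
import tempfile


def read_fastq(path):
    """Yield reads as 4-tuples of lines (assumes 4-line FASTQ records)."""
    with open(path) as handle:
        while True:
            lines = [handle.readline() for _ in range(4)]
            if not lines[0]:          # empty string means end of file
                return
            yield tuple(line.rstrip("\n") for line in lines)


def subsample_fastq(in_paths, out_path, fraction=0.1, seed=42):
    """Pool reads from several FASTQ files and write a random fraction."""
    records = [rec for path in in_paths for rec in read_fastq(path)]
    n_keep = min(len(records), max(1, round(len(records) * fraction)))
    chosen = random.Random(seed).sample(records, n_keep)
    with open(out_path, "w") as out:
        for rec in chosen:
            out.write("\n".join(rec) + "\n")
    return n_keep


# Demo on 100 fake reads written to a throwaway directory.
demo_dir = tempfile.mkdtemp()
demo_in = os.path.join(demo_dir, "all_reads.fastq")
with open(demo_in, "w") as fh:
    for i in range(100):
        fh.write(f"@read{i}\nACGT\n+\nIIII\n")

demo_out = os.path.join(demo_dir, "subset.fastq")
n_written = subsample_fastq([demo_in], demo_out, fraction=0.1)
print(n_written)  # 10
```

The subset is only meant for learning the error rates; the full set of reads would still go through the later denoising steps.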

Let us know how it goes!

Colin

Harry · December 15, 2017, 5:49am

Thank you Colin @colinbrislawn.

Hmm, I did not understand your suggestion; I am pretty weak at data processing.
My questions are:

Should I use the Linux terminal to run vsearch instead of R?
What does the 10% mean? What about the remaining 90%; is it OK to leave it out?
Could the output be imported back into QIIME 2 to get the core-metrics?

Sorry, I may have asked too muuuuch.

thermokarst · December 15, 2017, 12:30pm

Hi @Harry!!

Ah ha, we have seen something like this before!

macOS cleans up files in the temp dir that are about 3 days old, which can lead to all kinds of crazy issues, and it seems like a likely culprit for what you are describing here...

One workaround is to set your TMPDIR env var to anywhere else besides the default value (outside of the default value's structure), so something like $ export TMPDIR=/Users/Harry/qiime2-tmp could work for you.
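
As a quick sanity check, the small Python sketch below shows how the TMPDIR variable controls where Python's standard tempfile module puts its files. It only demonstrates the mechanism; it does not run QIIME 2, and the helper name `use_temp_dir` and the demo location are made up for illustration. On a real machine you would point it at something like the qiime2-tmp folder mentioned above.

```python
import os
import tempfile


def use_temp_dir(path):
    """Create `path` if needed and make it this process's temp dir."""
    os.makedirs(path, exist_ok=True)
    os.environ["TMPDIR"] = path   # same variable that `export TMPDIR=...` sets
    tempfile.tempdir = None       # drop the cached choice so TMPDIR is re-read
    return tempfile.gettempdir()


# Demo location only: a fresh folder inside a throwaway directory.
demo_tmp = os.path.join(tempfile.mkdtemp(), "qiime2-tmp")
active_tmp = use_temp_dir(demo_tmp)

print(active_tmp == demo_tmp)
```

Setting the variable in the shell before starting a long job, as suggested above, has the same effect for any program launched from that shell session.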

Please give that a shot and let us know!

colinbrislawn · December 15, 2017, 4:46pm

Hello Harry,

Try Matt's suggestion about macOS first. I bet that will fix your issue.

We can try subsampling later.

Colin
