I am using DADA2 to denoise my 16S rRNA sequences from an Ion Torrent PGM. The reads have already been screened with mothur to remove low-quality sequences. They range from 300 to 400 bp, and I do not want to lose any bases.
Could I simply disable truncation by passing --p-trunc-len 0?
Does DADA2 require all sequences to be the same length, as Deblur does?
If I set --p-trunc-len 350, will sequences shorter than 350 bp be deleted?
I'm sure the dada2 developer can provide a recommendation for PGM data, but I can answer some of your specific questions.
1. Could I simply disable truncation by passing --p-trunc-len 0?
Yes. "If 0 is provided, no truncation or length filtering will be performed."
2. Does DADA2 require all sequences to be the same length, as Deblur does?
DADA2 itself supports variable-length sequences (see #55), but I'm not sure how it's configured in QIIME 2.
3. If I set --p-trunc-len 350, will sequences shorter than 350 bp be deleted?
Yes.
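Since reads shorter than the truncation length are dropped, it can help to count how many reads a given --p-trunc-len value would discard before committing to it. Here is a minimal sketch, assuming a standard FASTQ file with four lines per record (no wrapped sequence lines). The function name count_short_reads and the file name in the usage note are hypothetical, and this only imitates the length filter described above; it is not DADA2's own code.

```shell
# Usage: count_short_reads READS.fastq MIN_LEN
# Counts reads shorter than MIN_LEN, i.e. reads that a truncation
# length of MIN_LEN would discard.
# Assumption: 4 lines per FASTQ record, sequence on line 2 of each record.
count_short_reads() {
  awk -v min="$2" '
    NR % 4 == 2 {                     # sequence line of each record
      total++
      if (length($0) < min) n_short++
    }
    END {
      printf "%d of %d reads are shorter than %d bp\n", n_short, total, min
    }
  ' "$1"
}
```

For example, `count_short_reads reads.fastq 350` reports how many reads would be lost with --p-trunc-len 350.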
The problem is that q2-dada2 takes a very long time, and my computer always crashes after running for about 30 hours.
I saw that you also suggested running DADA2 in R and importing the feature table and representative sequences back into QIIME 2 to run core-metrics. But that tutorial is for paired-end (Illumina) reads. Would you mind advising how to handle Ion Torrent reads? What should I do at steps like mergePairs?
Thank you so much.
BTW, how should I cite this suggestion: "The official recommendation is to set trim-left to 15 for Ion Torrent data."? Is there a publication I can cite?
I use a 2015 MacBook Pro. I think the computer itself is fine; the crashes probably happened because I had too many tasks open.
The problem is that DADA2 takes too long: after several days it is still running. I am thinking of running DADA2 in R and importing the output back into QIIME 2. But the DADA2 tutorial for R is written for Illumina data. Do you have any tutorials for Ion Torrent?
It looks like setting only trimLeft = 15 is not enough: filterAndTrim always gives errors, let alone the later steps that merge read pairs. After all, Ion Torrent output is single-end.
You should be able to process your Ion Torrent data just like Illumina data, except you change a few settings and only use the forward read. So while you may pass trimLeft = 15 (or = 10) and skip the pairing, the rest should be the same.
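To make that concrete, here is a rough sketch of a forward-read-only run, written as a shell snippet that saves a small R script. The directory names (fastq, filtered), the script name, and the output file are all hypothetical. The HOMOPOLYMER_GAP_PENALTY and BAND_SIZE settings are not from this thread; they are my recollection of the DADA2 FAQ advice for Ion Torrent data, so please check them against the DADA2 documentation before relying on them.

```shell
# Write a single-end DADA2 workflow to an R script (nothing is executed here).
cat > ion_torrent_dada2.R <<'EOF'
library(dada2)

# Hypothetical layout: one single-end FASTQ file per sample in ./fastq
fns <- sort(list.files("fastq", pattern = "\\.fastq(\\.gz)?$", full.names = TRUE))
sample.names <- sub("\\.fastq(\\.gz)?$", "", basename(fns))
filts <- file.path("filtered", basename(fns))
names(filts) <- sample.names

# trimLeft = 15 as suggested for Ion Torrent; truncLen keeps its default (0),
# so reads are not truncated or length-filtered.
out <- filterAndTrim(fns, filts, trimLeft = 15, multithread = TRUE)
filts <- filts[file.exists(filts)]  # skip samples that lost all reads

err <- learnErrors(filts, multithread = TRUE)

# Assumption: homopolymer-aware settings recalled from the DADA2 FAQ.
dd <- dada(filts, err = err, HOMOPOLYMER_GAP_PENALTY = -1, BAND_SIZE = 32,
           multithread = TRUE)

# No mergePairs(): the reads are single-end, so go straight to the table.
seqtab <- makeSequenceTable(dd)
seqtab.nochim <- removeBimeraDenovo(seqtab, method = "consensus",
                                    multithread = TRUE)
saveRDS(seqtab.nochim, "seqtab_nochim.rds")
EOF
```

You could then run it with `Rscript ion_torrent_dada2.R` once the paths match your data.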
How far did you get in R with your data? I think only the first steps would be different and all downstream is the same, but @benjjneb can confirm this.
Also, take a look at this dada2 paper that includes Ion Torrent data. Does that give you some ideas?
Yes, I asked some colleagues who are experts in R, and they found a problem with a special character in the strsplit call. Now it is running, but it is still in learnErrors, which looks like it will take some time. I will post an update here afterwards.
Thanks again for your generous help and suggestions!
BTW, you can run learnErrors() on a subset of your full data set, which should still give a pretty good error model and be a lot faster.
Basically, combine your reads with Linux cat, then use vsearch --fastx_subsample to take, say, 10% of them. That input should be roughly 10x faster to process while still giving an accurate error model.
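Something along these lines should work as a sketch of the cat-then-subsample step. The function name subsample_reads and all file names are hypothetical, and the fixed --randseed value is only there to make the subsample reproducible.

```shell
# Usage: subsample_reads OUT.fastq PERCENT IN1.fastq [IN2.fastq ...]
# Concatenates the input FASTQ files, then keeps a random PERCENT of the reads.
subsample_reads() {
  out=$1
  pct=$2
  shift 2

  combined=$(mktemp) || return 1
  # Combine all per-sample reads into one temporary file.
  cat "$@" > "$combined" || { rm -f "$combined"; return 1; }

  # Randomly sample PERCENT of the combined reads into OUT.fastq.
  vsearch --fastx_subsample "$combined" \
          --sample_pct "$pct" \
          --randseed 1 \
          --fastqout "$out"
  status=$?

  rm -f "$combined"
  return "$status"
}
```

For example, `subsample_reads subsample.fastq 10 filtered/*.fastq` would write about 10% of the reads to subsample.fastq, which you could then pass to learnErrors() in place of the full set.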
macOS cleans up files in the temp dir once they are about 3 days old, which can lead to all kinds of crazy issues, and it seems like a likely culprit for what you are describing here.
One workaround is to set your TMPDIR env var to anywhere else besides the default value (outside of the default value's structure), so something like $ export TMPDIR=/Users/Harry/qiime2-tmp could work for you.
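Spelled out a little more, the workaround might look like this. The directory name qiime2-tmp under your home directory is just an example; any writable location outside the default temp tree should do.

```shell
# Create a dedicated temp directory outside the macOS default temp location.
# "qiime2-tmp" is an example name, not something QIIME 2 requires.
mkdir -p "$HOME/qiime2-tmp"

# Point temp files at it for this shell session and anything started from it.
export TMPDIR="$HOME/qiime2-tmp"

echo "TMPDIR is now: $TMPDIR"
```

The setting only lasts for the current shell session, so run it in the same terminal before starting the long job (or add the export line to your shell profile).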