Errors in the pipeline of DADA2, feature-table merge-seqs, and phylogeny align-to-tree-mafft-fasttree.

Moon · October 28, 2020, 3:07pm

Dear technical staff,

I have four batches of 16S sequencing data sets for one project.

First, I used cutadapt demux-paired and cutadapt trim-paired to demultiplex and trim the seqs, and then used DADA2 to denoise them. Although most of the seqs were removed, I thought the remaining reads would be enough for the downstream analysis.

Second, I used feature-table merge to merge the tables produced by DADA2, and feature-table merge-seqs to merge the rep-seqs produced by DADA2.
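For readers following along, a merge of four per-batch DADA2 outputs can be sketched roughly as below. The file names (table-1.qza, rep-seqs-1.qza, and so on) and the merge_batches wrapper are placeholders for illustration, not the actual files from this project. Repeating --i-tables and --i-data is how the two merge actions accept multiple inputs.

```shell
# Sketch only: file names are hypothetical placeholders for the four batches.
merge_batches() {
  # Merge the four per-batch feature tables into one table.
  qiime feature-table merge \
    --i-tables table-1.qza \
    --i-tables table-2.qza \
    --i-tables table-3.qza \
    --i-tables table-4.qza \
    --o-merged-table merged_table.qza || return 1

  # Merge the matching per-batch representative sequences.
  qiime feature-table merge-seqs \
    --i-data rep-seqs-1.qza \
    --i-data rep-seqs-2.qza \
    --i-data rep-seqs-3.qza \
    --i-data rep-seqs-4.qza \
    --o-merged-data merged_rep-seqs.qza
}

# Usage (requires an activated QIIME 2 environment):
#   merge_batches
```

Merged features only line up across batches if every batch was trimmed and denoised with the same parameters, so it is worth checking that before merging.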

When I ran the following command:

qiime phylogeny align-to-tree-mafft-fasttree --i-sequences merged_rep-seqs.qza --p-n-threads 20 --o-alignment aligned-rep-seqs.qza --o-masked-alignment masked-aligned-rep-seqs.qza --o-tree unrooted-tree.qza --o-rooted-tree rooted-tree.qza

The process was always killed.

I have searched the forum, and a number of answers suggest that this might be due to a lack of memory.

I monitored the memory usage: the process used more than 200 GB of memory and kept increasing. When the memory was full, the process was killed.

Is it because my rep-seqs contain too many seqs?
I guess the merge-seqs step produced a file containing too many seqs?
However, merge-seqs has no parameter to reduce the number of seqs.
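A quick way to check the sequence count is sketched below, under the assumption that the merged artifact is first exported to FASTA with qiime tools export (which writes dna-sequences.fasta into the output directory). The helper name count_seqs and the directory name exported-rep-seqs are made up for illustration.

```shell
# Count FASTA records (each record's header line starts with '>').
count_seqs() {
  # grep -c prints 0 but exits non-zero when nothing matches;
  # '|| true' keeps the function's exit status at 0 in that case.
  grep -c '^>' "$1" || true
}

# Usage (requires an activated QIIME 2 environment):
#   qiime tools export --input-path merged_rep-seqs.qza --output-path exported-rep-seqs
#   count_seqs exported-rep-seqs/dna-sequences.fasta
```

Comparing this count for each batch and for the merged file shows whether the merge or an earlier step is inflating the number of seqs.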

How can I fix this error?
I just want to analyze these data sets together, as we have always done in QIIME 1.

Many thanks to you!

Moon · October 28, 2020, 4:02pm

My version of QIIME 2 is the newest:
2020.11.0.dev0

ChrisKeefe · October 28, 2020, 4:53pm

Thanks for putting together such a clear and detailed question, @Moon!

Good sleuthing! This is, indeed, an out-of-memory error. Here's an experiment you can try. Sometimes when parallelizing processes, each thread contributes a significant amount of memory usage.

If you're running out of memory, try decreasing the number of threads to 4. If it still gives you OOM errors, you can try dropping the number of threads even further. Overall, this will likely mean a longer run time, but that's probably OK if it runs!

If this doesn't work for you, you could experiment with the --p-parttree option. This algo estimates rather than fully calculating the tree, and is designed for use with large data sets. I have no idea whether this is actually going to help with memory usage, but it may be worth trying.
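As a rough sketch of both experiments, the original command can be wrapped so that the thread count and the --p-parttree switch are easy to vary. The build_tree name and its two arguments are invented here; the qiime options are the ones from the original command plus --p-parttree.

```shell
# Sketch: rerun the tree step with fewer threads. Pass "parttree" as the
# second argument to also add --p-parttree.
build_tree() {
  threads="${1:-4}"   # default to 4 threads when no argument is given
  extra=""
  if [ "${2:-}" = "parttree" ]; then
    extra="--p-parttree"
  fi
  # $extra is deliberately unquoted so that it disappears when empty.
  qiime phylogeny align-to-tree-mafft-fasttree \
    --i-sequences merged_rep-seqs.qza \
    --p-n-threads "$threads" \
    $extra \
    --o-alignment aligned-rep-seqs.qza \
    --o-masked-alignment masked-aligned-rep-seqs.qza \
    --o-tree unrooted-tree.qza \
    --o-rooted-tree rooted-tree.qza
}

# Usage (requires an activated QIIME 2 environment):
#   build_tree 4            # fewer threads
#   build_tree 1 parttree   # single thread plus --p-parttree
```

Watching memory with a tool such as top while each variant runs shows whether the change actually helps on a given machine.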

Alternately, you could try looking at another alignment tool. I've had a good experience with fragment-insertion, but only you know whether that approach is right for your study.

Good luck!
Chris

ChrisKeefe · October 29, 2020, 4:23pm

2 posts were split to a new topic: Cutadapt failing to remove all primers

ChrisKeefe · October 29, 2020, 4:24pm

A post was split to a new topic: DADA2 rep seqs have strange names. Why?

ChrisKeefe · October 29, 2020, 4:28pm

@Moon, your cutadapt and DADA2 questions were unrelated to your alignment question, and have been moved to separate topics. Please give unrelated questions their own topics - it helps keep things tidy.

Let us know how your experiments with parttree and fragment-insertion go.
Good luck!
Chris

Moon · November 2, 2020, 7:30am

Dear ChrisKeefe, you are right! There is not enough memory to deal with so many rep-seqs.
And the large number of rep-seqs was due to an error in the cutadapt step: I did not strip the primers completely.
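For anyone who wants to check for the same problem, here is a minimal sketch that counts how many exported rep-seqs still begin with a primer. It assumes the rep-seqs were exported to an uppercase FASTA file with one line per sequence. The helper names are made up, and the primer in the usage line is only an example (the commonly used 515F sequence), not necessarily the primer used in this project.

```shell
# Expand IUPAC degenerate bases in a primer into a bracket-style regex.
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Count sequence lines in a FASTA file that still start with the primer.
# Arguments: primer sequence, FASTA path.
count_primer_starts() {
  pattern="^$(iupac_to_regex "$1")"
  # Drop header lines, then count sequence lines matching the primer;
  # '|| true' keeps the exit status at 0 when the count is 0.
  grep -v '^>' "$2" | grep -c "$pattern" || true
}

# Usage (example primer and placeholder path):
#   count_primer_starts GTGYCAGCMGCCGCGGTAA exported-rep-seqs/dna-sequences.fasta
```

A count well above zero suggests the primers were not fully removed before denoising.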

ChrisKeefe · November 2, 2020, 4:40pm

Sounds like you were right too! Glad you got this all worked out.
