MAFFT crashes after about 1 hour

marty_pl · December 2, 2023, 2:41pm

I am running the latest version of QIIME2 (2023.09) in WSL. My 16S V3V4 data file (after quality check) contains >2M reads from 50 samples. To build the phylogenetic tree I've run the following command set:

qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences dereplicated_seqs.qza \
  --p-parttree \
  --o-alignment aligned_dereplicated_seqs.qza \
  --o-masked-alignment masked_aligned_dereplicated_seqs.qza \
  --o-tree unrooted-tree.qza \
  --o-rooted-tree rooted-tree.qza

After about an hour of calculations I got the following message:

Plugin error from phylogeny:
Command '['mafft', '--preservecase', '--inputorder', '--thread', '1', '--parttree', '/tmp/qiime2/marcin/data/60f0e223-4b0f-4539-bec5-1a0b9e7dbfb5/data/dna-sequences.fasta']' returned non-zero exit status 1.
Debug info has been saved to /tmp/qiime2-q2cli-err-net1s7ty.log

I have checked the log but frankly I have no idea what I should look for.

Do you have any ideas what could have happened?

Marcin

timanix · December 4, 2023, 8:24am

How much RAM do you have available?
I would try filtering the feature table to get rid of rare sequences (for example, those with a total count below 10) and sequences that are found in only 1-3 samples. Then I would filter the rep-seqs.qza file against the filtered feature table. This can significantly decrease computation time and the load on the machine.
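
A minimal sketch of that filtering, assuming the feature table is in table.qza and the sequences are in rep-seqs.qza (both file names are placeholders, and the function name is made up for illustration). The thresholds mirror the example numbers above: keep features with a total count of at least 10 that occur in at least 4 samples. Adjust them to your data.

```shell
# Hypothetical helper: filter_rare_features <table.qza> <rep-seqs.qza> <existing-output-dir>
# File names and the function name are assumptions, not part of QIIME 2.
filter_rare_features() {
  # Drop features with a total count below 10 or present in fewer than 4 samples
  qiime feature-table filter-features \
    --i-table "$1" \
    --p-min-frequency 10 \
    --p-min-samples 4 \
    --o-filtered-table "$3/filtered-table.qza" || return 1

  # Keep only the sequences whose features survived the table filter
  qiime feature-table filter-seqs \
    --i-data "$2" \
    --i-table "$3/filtered-table.qza" \
    --o-filtered-data "$3/filtered-rep-seqs.qza"
}
```

It could then be called as `filter_rare_features table.qza rep-seqs.qza filtered`, and filtered/filtered-rep-seqs.qza passed to the alignment step instead of the full sequence file.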

SoilRotifer · December 4, 2023, 3:23pm

Hi @marty_pl ,

I just wanted to add to @timanix's comments. Two million reads is quite a lot of sequences. Can you provide the details / commands you used for the quality control you've already run? Are these denoised sequences? I assume not, as these appear to be simply dereplicated. I'd suggest denoising your sequences prior to constructing alignments and phylogenies. This will help reduce the number of sequences too. Otherwise you'll likely need a substantial amount of RAM, i.e. anywhere from 24-64 GB.

marty_pl · December 4, 2023, 9:01pm

Hi @timanix and @SoilRotifer, thank you for your advice.

Just to provide you with some context:

I am a QIIME2 newbie and I am trying to figure out how it would work on my own data. I have a *.fna file that I got from another lab and, to the best of my knowledge, the sequences were quality checked. I did dereplication just in case and to see how it works (I am still learning). The file contains V3V4 data from 50 samples, which gives around 40,000 reads per sample. Those samples are divided into 2 equal groups. I am trying to compare those groups and look for any differences. I am using a computer with 32 GB of RAM and 8 processors, and running QIIME2 in WSL.

After I encountered the issue I looked through the forum and found a topic where somebody claimed that files containing >1M reads need >300 GB of RAM, which is completely out of my reach.

Filtering out rare features seems like a good idea to reduce the amount of RAM needed. However, as I am pretty new to QIIME, I haven't had a chance yet to learn how feature table filtering works. I would appreciate any hints.

Thanks once again

marty_pl

SoilRotifer · December 4, 2023, 9:12pm

This is very vague. I'd recommend that you obtain the raw FASTQ files from your colleagues. Then import the raw data into QIIME 2. Once there you can perform all of the quality control and analysis. Also, you'll need the raw FASTQ files if you would like to denoise your data. Otherwise you'll be limited to OTU clustering.
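
As a rough sketch of that workflow, assuming paired-end reads listed in a manifest file and denoising with DADA2 (the function name, the output file names, and the choice of DADA2 are illustrative assumptions). The truncation lengths are deliberately left as arguments, since they have to be chosen from your own quality plots.

```shell
# Hypothetical helper: import_and_denoise <manifest.tsv> <trunc-len-f> <trunc-len-r> <threads>
# Assumes paired-end FASTQ files described by a manifest; output names are placeholders.
import_and_denoise() {
  # Import the raw FASTQ files into a QIIME 2 artifact
  qiime tools import \
    --type 'SampleData[PairedEndSequencesWithQuality]' \
    --input-path "$1" \
    --input-format PairedEndFastqManifestPhred33V2 \
    --output-path demux.qza || return 1

  # Denoise; this yields a feature table and far fewer representative sequences
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs demux.qza \
    --p-trunc-len-f "$2" \
    --p-trunc-len-r "$3" \
    --p-n-threads "$4" \
    --o-table table.qza \
    --o-representative-sequences rep-seqs.qza \
    --o-denoising-stats denoising-stats.qza
}
```

The resulting rep-seqs.qza, rather than the dereplicated reads, would then be the input for alignment and tree building.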

I suggest you work through the tutorials, and the Cancer Microbiome Intervention Tutorial. For the latter, you'll occasionally have to select the Command Line (q2cli) option, to view the actual "command-line" commands.
