I am running the latest version of QIIME2 (2023.09) in WSL. My 16S V3V4 data file (after quality check) contains >2M reads from 50 samples. To build the phylogenetic tree I've run the following command set:
After about an hour of calculations I've got the following message:
Plugin error from phylogeny:
Command '['mafft', '--preservecase', '--inputorder', '--thread', '1', '--parttree', '/tmp/qiime2/marcin/data/60f0e223-4b0f-4539-bec5-1a0b9e7dbfb5/data/dna-sequences.fasta']' returned non-zero exit status 1.
Debug info has been saved to /tmp/qiime2-q2cli-err-net1s7ty.log
I have checked the log, but frankly I have no idea what I should look for.
How much RAM do you have available?
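Since you're on WSL, note that WSL often caps the memory visible to Linux below the host total (configurable via `.wslconfig`). One quick way to see what's actually available inside your WSL session:

```shell
# Show memory visible to the WSL instance in human-readable units.
# If "total" is well below your machine's physical RAM, raise the
# limit in your Windows-side .wslconfig file.
free -h
```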
I would try filtering your feature table to get rid of rare sequences (for example, those with a total count below 10) and sequences found in only 1-3 samples. Then filter your rep-seqs.qza file using the filtered feature table. This can significantly reduce computational time and the load on your machine.
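A minimal sketch of that two-step filter; the file names are assumptions, so substitute your own artifacts:

```shell
# Drop features with a total count below 10 or found in fewer
# than 4 samples (i.e. present in only 1-3 samples).
qiime feature-table filter-features \
  --i-table table.qza \
  --p-min-frequency 10 \
  --p-min-samples 4 \
  --o-filtered-table table-filtered.qza

# Keep only the representative sequences that survived the table filter.
qiime feature-table filter-seqs \
  --i-data rep-seqs.qza \
  --i-table table-filtered.qza \
  --o-filtered-data rep-seqs-filtered.qza
```

The thresholds (10 counts, 4 samples) just mirror the suggestion above; tune them to your data.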
I just wanted to add to @timanix's comments. Two million reads is quite a lot of sequences. Can you provide the details / commands for the quality control you've already run? Are these denoised sequences? I assume not, as they appear to be simply dereplicated. I'd suggest denoising your sequences prior to constructing alignments and phylogenies; this will also help reduce the number of sequences. Otherwise you'll likely need a substantial amount of RAM, i.e. anywhere from 24-64 GB.
I am a QIIME2 newbie and I am trying to figure out how it works on my own data. I have a *.fna file that I got from another lab, and to the best of my knowledge the sequences were quality checked. I did dereplication just in case, and to see how it works (I am still learning). The file contains V3V4 data from 50 samples, which gives around 40,000 reads per sample. The samples are divided into 2 equal groups, and I am trying to compare those groups and look for any differences. I am using a computer with 32 GB of RAM and 8 processors, running QIIME2 in WSL.
After I encountered the issue, I looked through the forum and found a topic where somebody claimed that files containing >1M reads need >300 GB of RAM, which is completely out of my reach.
Filtering out rare features seems like a good way to reduce the amount of RAM needed. However, as I am pretty new to QIIME, I haven't yet had a chance to learn how feature table filtering works. I would appreciate any hints.
This is very vague. I'd recommend that you obtain the raw FASTQ files from your colleagues. Then import the raw data into QIIME 2. Once there you can perform all of the quality control and analysis. Also, you'll need the raw FASTQ files if you would like to denoise your data. Otherwise you'll be limited to OTU clustering.
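To make the import + denoise route concrete, here is a hypothetical sketch. The manifest path and the trim/truncation positions are assumptions; you'd choose the truncation values from your own demux quality plots:

```shell
# Import paired-end FASTQ files listed in a tab-separated manifest.
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.tsv \
  --input-format PairedEndFastqManifestPhred33V2 \
  --output-path demux.qza

# Denoise with DADA2; truncation lengths below are placeholders,
# pick yours where read quality starts to drop.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 270 \
  --p-trunc-len-r 220 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
```

Denoising typically collapses millions of raw reads into a few thousand ASVs, which makes the downstream alignment and tree-building far less memory-hungry.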
I suggest you work through the tutorials, including the Cancer Microbiome Intervention Tutorial. For the latter, you'll occasionally need to select the Command Line (q2cli) option to view the actual command-line commands.