Hi there,
I’ve been following this amazing tutorial to build my own COI database for a gut-content metabarcoding project (which is in fact my first metabarcoding project, so I'm very new to the field). Everything went fine up until “Step 3 – Dereplicating”, but then I ran into some trouble.
When I run the qiime rescript dereplicate command, I don't get the expected output; the run just ends with a “Killed” message. Running the command with the --verbose option gave the following output, which didn't really help me figure out the source of the problem:
qiime rescript dereplicate --i-sequences bold_ambi_hpoly_length_filtd_seqs.qza --i-taxa bold_rawTaxa.qza --p-mode 'super' --p-derep-prefix --o-dereplicated-sequences bold_derep1_seqs.qza --o-dereplicated-taxa bold_derep1_taxa.qza --p-threads 3 --verbose
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.
Command: vsearch --derep_prefix /tmp/qiime2/martindogniez/data/056f2a80-816e-4d12-a1e6-2ad92a9e69d3/data/dna-sequences.fasta --output /tmp/tmpb5l1koss --uc /tmp/tmpznh2_er8 --xsize --threads 5
WARNING: The derep_prefix command does not support multithreading.
Only 1 thread used.
vsearch v2.22.1_linux_x86_64, 15.1GB RAM, 12 cores
https://github.com/torognes/vsearch
Reading file /tmp/qiime2/martindogniez/data/056f2a80-816e-4d12-a1e6-2ad92a9e69d3/data/dna-sequences.fasta 100%
6044907994 nt in 9446234 seqs, min 250, max 1600, avg 640
Sorting by length 100%
Dereplicating 100%
Sorting 100%
4032586 unique sequences, avg cluster 2.3, median 1, max 15293
Writing output file 100%
Writing uc file, first part 100%
Writing uc file, second part 100%
Killed
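For reference, here is a quick back-of-the-envelope calculation on the numbers from that log (this only counts the raw bases at roughly one byte per nucleotide, not the FASTA headers or whatever vsearch/RESCRIPt hold in memory on top of that):

```python
# Counts taken from the vsearch log above
total_nt = 6_044_907_994   # "6044907994 nt"
n_seqs = 9_446_234         # "in 9446234 seqs"

seq_gib = total_nt / 2**30   # raw sequence data, ~1 byte per base
avg_len = total_nt / n_seqs  # should match the "avg 640" in the log

print(f"raw sequence data: {seq_gib:.1f} GiB")
print(f"average length:    {avg_len:.0f} nt")
```

So the input alone is several GiB before any of the dereplication bookkeeping, which is why I suspect memory.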
My guess would be that I’m running into a memory issue, as the input sequence file is really large (9,446,234 sequences) and I’m running the analysis on my own laptop for now (32 GB of RAM). However, I’m planning to switch to my university's cluster for the classification part, so I wanted to know whether that would solve the problem, or whether the issue lies somewhere else.
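In case it matters, this is roughly the SLURM script I was planning to use on the cluster (the job name, memory request, time limit, and environment name are placeholders I made up; the qiime command itself is the same one as above):

```shell
#!/bin/bash
#SBATCH --job-name=rescript-derep
#SBATCH --cpus-per-task=3
#SBATCH --mem=64G           # placeholder; not sure yet how much is enough
#SBATCH --time=24:00:00     # placeholder

# Activate the QIIME 2 environment (environment name is a placeholder)
source activate qiime2

qiime rescript dereplicate \
  --i-sequences bold_ambi_hpoly_length_filtd_seqs.qza \
  --i-taxa bold_rawTaxa.qza \
  --p-mode 'super' \
  --p-derep-prefix \
  --p-threads 3 \
  --o-dereplicated-sequences bold_derep1_seqs.qza \
  --o-dereplicated-taxa bold_derep1_taxa.qza \
  --verbose
```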
Thanks in advance for any help!