Hi,
I am trying QIIME 2 on one of my wife’s experiments that she had processed with QIIME 1.9 in 2015. She used dual-indexed primers in her experiment as published in https://doi.org/10.1186/2049-2618-2-6 (Fadrosh, D.W., Ma, B., Gajer, P. et al.).
I didn’t run the whole preprocessing workflow in QIIME 2 because it was unclear to me how easy that would be. Instead, I used the joined paired-end reads generated by QIIME 1.9 (that is, a trimmed seqs.fna).
Here’s my setup:
% conda --version
conda 4.8.1
% conda env list
# conda environments:
#
base * /home/svcqiime/miniconda3
qiime2-2019.10 /home/svcqiime/miniconda3/envs/qiime2-2019.10
% uname -a
Linux qiimem 5.4.0-050400-generic #201911242031 SMP Mon Nov 25 01:35:10 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
For compatibility reasons, I installed Ubuntu 18.04 LTS on my computer. It’s running on bare metal, not under a hypervisor.
% cat /etc/issue
Ubuntu 18.04.4 LTS \n \l
The machine has an AMD Ryzen 9 3950X CPU (16 cores, 32 threads, not overclocked), 32GB RAM, 60GB swap, about 400GB of free space on the partition where the data is processed, and a /tmp partition of about 156GB.
The steps I took follow the OTU Clustering tutorial:
Import the QIIME 1.9 processed FASTA:
% qiime tools import --input-path input.fasta \
    --output-path imported.qza --type 'SampleData[Sequences]'
Dereplicate:
% qiime vsearch dereplicate-sequences \
    --i-sequences imported.qza \
    --o-dereplicated-table table.qza \
    --o-dereplicated-sequences rep-seqs.qza
Import SILVA:
% qiime tools import --input-path SILVA_132_QIIME_release/rep_set/rep_set_16S_only/97/silva_132_97_16S.fna --output-path silva_132_97 --type 'FeatureData[Sequence]'
% qiime tools import --input-path SILVA_132_QIIME_release/taxonomy/16S_only/97/taxonomy_7_levels.txt --input-format HeaderlessTSVTaxonomyFormat --output-path silva_132_97_ref_taxonomy --type 'FeatureData[Taxonomy]'
Do open-reference OTU clustering:
% qiime vsearch cluster-features-open-reference --i-table table.qza --i-sequences rep-seqs.qza \
    --i-reference-sequences silva_132_97.qza --p-perc-identity 0.97 \
    --o-clustered-table table-or-97.qza --o-clustered-sequences rep-seqs-or-97.qza \
    --o-new-reference-sequences new-ref-seqs-or-97.qza
Assign taxonomy:
% qiime feature-classifier classify-consensus-vsearch --i-query rep-seqs.qza \
    --i-reference-reads new-ref-seqs-or-97.qza --i-reference-taxonomy silva_132_97_ref_taxonomy.qza \
    --p-threads 20 --p-maxaccepts all --o-classification consensus-classified.qza
After 5 days and 23 hours, the last plugin filled up the /tmp partition with a 156GB temporary file. vsearch didn’t notice that it had failed and could no longer produce any meaningful output, since there was no space left to write to. I ran htop and saw that vsearch was still running, occupying 20 CPU threads. After a brief look at its source code, I saw that it doesn’t handle file system errors, except for fopen calls.
https://github.com/torognes/vsearch/search?q=fprintf&unscoped_q=fprintf
fclose calls aren’t checked either. That’s a vsearch bug, and I plan to open an issue with the vsearch project.
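Since vsearch kept running after /tmp filled up, a small free-space check run alongside a long job could have flagged the problem much earlier. The following is only a minimal sketch: the helper names `free_kb` and `has_space` and the 10 GiB threshold are my own assumptions, not part of QIIME 2 or vsearch.

```shell
# free_kb DIR: print the kilobytes available on the filesystem that holds DIR.
# Uses POSIX "df -P" output, where line 2, column 4 is the available space.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_space DIR MIN_KB: succeed only if at least MIN_KB kilobytes are free.
has_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Hypothetical threshold: warn when /tmp drops below 10 GiB free.
min_kb=$((10 * 1024 * 1024))
if has_space /tmp "$min_kb"; then
    echo "/tmp has at least ${min_kb} KiB free"
else
    echo "/tmp is running low on space" >&2
fi
```

Calling `has_space` from a loop or a cron job while the classifier runs would be one way to notice a filling partition before days of CPU time are lost.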
QIIME creates a temporary file name and passes it to vsearch. It might be worth documenting this behavior, along with the fact that the destination directory can be set externally through the TMPDIR, TEMP or TMP environment variables. I am planning to re-run this experiment with TMPDIR set, but I would like to know whether I have taken the right steps up to taxonomy assignment.
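For the re-run, this is roughly how I intend to redirect temporary files. It is a sketch under assumptions: `SCRATCH_DIR` and the `qiime2-tmp` directory name are hypothetical placeholders, and I am assuming that the plugin creates its temporary file through Python's `tempfile` module, which consults TMPDIR first.

```shell
# Hypothetical scratch directory; point SCRATCH_DIR at the partition with free space.
scratch="${SCRATCH_DIR:-${HOME:-/tmp}/qiime2-tmp}"
mkdir -p "$scratch"
export TMPDIR="$scratch"

# Show how much room the new location has.
df -h "$TMPDIR"

# Python's tempfile module checks TMPDIR, then TEMP, then TMP.
# If python3 is available, print the directory it would actually use.
if command -v python3 >/dev/null 2>&1; then
    python3 -c 'import tempfile; print(tempfile.gettempdir())'
fi
```

The qiime command would then be started from the same shell session so that it inherits the exported TMPDIR.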
% uptime
23:00:14 up 5 days, 3:59, 6 users, load average: 0.04, 0.36, 4.87
Here’s an excerpt from the file:
% head /tmp/tmp3iq6ls_e
5cc706cbf283249918f00da69c7e32a7a15cf45d LMPG01000001.429.1927 99.8 404 1 0 1 404 1 1472 -1 0
5cc706cbf283249918f00da69c7e32a7a15cf45d FPLS01001796.15.1482 99.5 404 2 0 1 404 1 1468 -1 0
5cc706cbf283249918f00da69c7e32a7a15cf45d GQ385284.1.1439 99.5 404 2 0 1 404 1 1433 -1 0
5cc706cbf283249918f00da69c7e32a7a15cf45d GQ389082.1.1461 99.5 404 2 0 1 404 1 1457 -1 0
5cc706cbf283249918f00da69c7e32a7a15cf45d HG005350.1.1480 99.5 404 2 0 1 404 1 1465 -1 0
5cc706cbf283249918f00da69c7e32a7a15cf45d JN196137.1.1414 99.5 404 2 0 1 404 1 1414 -1 0
5cc706cbf283249918f00da69c7e32a7a15cf45d JN628330.1.1446 99.5 404 2 0 1 404 1 1445 -1 0
5cc706cbf283249918f00da69c7e32a7a15cf45d KM263160.1.1349 99.5 404 2 0 1 404 1 1348 -1 0
5cc706cbf283249918f00da69c7e32a7a15cf45d EU835403.1.1351 99.3 405 2 1 1 404 1 1351 -1 0
5cc706cbf283249918f00da69c7e32a7a15cf45d AY328628.1.1479 99.3 404 3 0 1 404 1 1467 -1 0
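The excerpt shows the same query matched against many reference sequences, which probably explains why the file grows so quickly. To get a feel for this, hits per query can be counted with awk. The sketch below assumes the 12 whitespace-separated columns seen above (query ID first) and works on a small sample built from the first three lines of the excerpt rather than on the full 156GB file.

```shell
# Build a small sample from the first three lines of the excerpt.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
5cc706cbf283249918f00da69c7e32a7a15cf45d LMPG01000001.429.1927 99.8 404 1 0 1 404 1 1472 -1 0
5cc706cbf283249918f00da69c7e32a7a15cf45d FPLS01001796.15.1482 99.5 404 2 0 1 404 1 1468 -1 0
5cc706cbf283249918f00da69c7e32a7a15cf45d GQ385284.1.1439 99.5 404 2 0 1 404 1 1433 -1 0
EOF

# Count how many hit lines each query ID (column 1) has.
hits_per_query="$(awk '{ n[$1]++ } END { for (q in n) print n[q], q }' "$sample")"
echo "$hits_per_query"

rm -f "$sample"
```

On the real file, the same awk command could be pointed at a `head -n` slice of /tmp/tmp3iq6ls_e to estimate the average number of hits per query.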
I would be very grateful if you could help me with the steps I need to take.
Best regards,
Ismail