Thank you for your time,
I should preface this by saying that I am a novice with QIIME 2 and with bioinformatics tools in general; my ideas likely reflect this. However, I genuinely enjoy speeding up computational tasks and would be thrilled to help improve the speed of QIIME 2.
It strikes me that the ‘qiime feature-classifier extract-reads’ command uses only one CPU core and therefore takes a very long time to run. Could it be made to use multiple cores to speed up the process? I think this would be an excellent contribution to QIIME 2, especially if training classifiers improves the accuracy of taxonomic assignment (i.e., many researchers may want to apply custom-trained classifiers). I know read/write speeds can be a bottleneck for some operations, but they don’t seem to be one here. Additionally, read/write speeds keep increasing with advances in SSD technology such as NVMe.
My idea is as follows:
- Take the reference sequence file and divide it into multiple smaller parts, one per desired core. The split points would fall between sequence records (just before a ‘>’ header line), so that no reference sequence, and therefore no primer site, is cut in half. For a 4-core processor, this would produce 4 files of approximately equal size (e.g., ref-seq-1.fasta, ref-seq-2.fasta, ref-seq-3.fasta, and ref-seq-4.fasta).
- Process each file on a separate core by passing the file names to GNU ‘parallel’, which would run multiple ‘qiime feature-classifier extract-reads’ commands at once. The output would be 4 files whose names mirror the input names (e.g., trimmed-ref-seq-1.fasta, trimmed-ref-seq-2.fasta, trimmed-ref-seq-3.fasta, and trimmed-ref-seq-4.fasta).
- Concatenate the files in their original order. This could use the ‘cat’ command, redirecting its output to a single file (e.g., ‘cat trimmed-ref-seq-1.fasta trimmed-ref-seq-2.fasta trimmed-ref-seq-3.fasta trimmed-ref-seq-4.fasta’).
- Import the merged file back into QIIME 2 as a FeatureData[Sequence] artifact.
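The split and merge steps above can be sketched in Python. This is a minimal sketch, assuming plain (uncompressed) FASTA files that have already been exported from QIIME 2; the function names `read_fasta_records`, `split_fasta`, and `concatenate` are hypothetical, and the parts are balanced by number of records rather than by byte size. It does not run ‘extract-reads’ itself; that step would still be handled by GNU ‘parallel’ as described.

```python
import tempfile
from pathlib import Path


def read_fasta_records(path):
    """Yield each FASTA record (header line plus its sequence lines) as one string."""
    record = []
    with open(path) as handle:
        for line in handle:
            # Guarantee a trailing newline so concatenated parts never fuse lines.
            if not line.endswith("\n"):
                line += "\n"
            # A new header closes the previous record.
            if line.startswith(">") and record:
                yield "".join(record)
                record = []
            record.append(line)
    if record:
        yield "".join(record)


def split_fasta(path, n_parts, out_dir="."):
    """Split a FASTA file into n_parts files, cutting only between records.

    Records are assigned in contiguous blocks, so concatenating the parts in
    numeric order restores the original record order. Hypothetical helper.
    """
    records = list(read_fasta_records(path))
    per_part = -(-len(records) // n_parts)  # ceiling division
    out_paths = []
    for i in range(n_parts):
        chunk = records[i * per_part:(i + 1) * per_part]
        out_path = Path(out_dir) / f"ref-seq-{i + 1}.fasta"
        out_path.write_text("".join(chunk))
        out_paths.append(out_path)
    return out_paths


def concatenate(paths, out_path):
    """Equivalent of `cat part1 part2 ... > out_path`, preserving the given order."""
    with open(out_path, "w") as out:
        for p in paths:
            out.write(Path(p).read_text())


if __name__ == "__main__":
    # Round-trip check: split into 2 parts, re-join, and compare with the input.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "ref-seqs.fasta"
        src.write_text(">s1\nACGT\n>s2\nGGCC\nTTAA\n>s3\nAAAA\n")
        parts = split_fasta(src, 2, tmp)
        merged = Path(tmp) / "merged.fasta"
        concatenate(parts, merged)
        assert merged.read_text() == src.read_text()
```

Because each part is a contiguous block of records, joining the parts in numeric order reproduces the original order, which is the same property the ‘cat’ step relies on. If more parts are requested than there are records, the trailing files are simply empty.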
If this works, it should reduce the processing time substantially (ideally close to 4 times faster on 4 cores or 20 times faster on 20 cores, minus the overhead of splitting, importing, and merging the files). I would like to build this functionality. Am I missing something major?
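The 4x and 20x figures assume that every part of the job runs in parallel. Amdahl's law gives a rough upper bound when some fraction of the runtime stays serial (here, the splitting, import/export, and concatenation steps). The function name `amdahl_speedup` and the 95% parallel fraction used below are illustrative assumptions, not measurements of ‘extract-reads’.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Upper bound on speedup when only part of the runtime can be parallelised.

    parallel_fraction: share of the original runtime that splits across cores (0..1).
    n_cores: number of cores working on the parallel share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # Serial work is unchanged; parallel work is divided evenly across cores.
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


# With a hypothetical 95% parallel share:
print(round(amdahl_speedup(0.95, 4), 2))   # 3.48
print(round(amdahl_speedup(0.95, 20), 2))  # 10.26
```

Under that assumption, 20 cores would give roughly a 10x speedup rather than 20x, so timing the serial steps would be worth doing before estimating the gain.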
Thank you again for your time and consideration. I really appreciate the QIIME 2 resources. Without the tutorials and forum answers, I would have had a very hard time characterizing my microbial samples.