I have created a custom database that combines curated 16S reads for Bacteria and Archaea from SILVA and RDP with the fungal ITS reads from UNITE. I curated each source separately, extracted the common and unique reads based on family-level nomenclature, and removed those without proper nomenclature. The goal is to reduce the chance of false positives.
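A minimal sketch of that kind of family-level filter, assuming a two-column, tab-separated taxonomy file with `f__`-style rank prefixes. The file names, the example rows, and the list of placeholder family labels are all hypothetical and only illustrate the idea; this is not the actual curation script, and SILVA or RDP exports may use different rank prefixes.

```shell
# Hypothetical example input: feature ID <TAB> semicolon-separated taxon string.
printf 'seq1\tk__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Lactobacillaceae; g__Lactobacillus\n'  > example_taxonomy.tsv
printf 'seq2\tk__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__; g__\n'             >> example_taxonomy.tsv
printf 'seq3\tk__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__uncultured; g__\n'                   >> example_taxonomy.tsv
printf 'seq4\tk__Archaea; p__Euryarchaeota; c__Methanobacteria; o__Methanobacteriales; f__Methanobacteriaceae; g__Methanobrevibacter\n' >> example_taxonomy.tsv

# Keep only rows whose family rank (f__) is present, non-empty,
# and not one of a few assumed placeholder labels.
awk -F'\t' '
{
  family = ""
  n = split($2, ranks, /; */)
  for (i = 1; i <= n; i++)
    if (ranks[i] ~ /^f__/) family = substr(ranks[i], 4)   # strip the "f__" prefix
  if (family != "" && family !~ /^(uncultured|unidentified|Incertae_sedis)$/)
    print
}' example_taxonomy.tsv > filtered_taxonomy.tsv

# List the feature IDs that survived the filter.
cut -f1 filtered_taxonomy.tsv
```

The surviving IDs could then be used to subset the matching FASTA file, so that the sequence and taxonomy files stay in sync before training.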
I have formatted the FASTA and taxonomy files and ran the classifier successfully on the Archaea reads (SILVA + RDP) and on the UNITE reads individually.
However, when I run it on the combined bacterial reads from SILVA and RDP (2615591 reads) for testing purposes, the classifier takes too long to complete. The Archaea set also combines RDP and SILVA reads and ran properly, so formatting should not be the issue.
The job has been running on my organization's HPC high_mem node for the past 4 days. I am worried about how long it will take to run the complete custom DB with 3320193 reads.
Do you have any suggestions for getting it done in a reasonable time, or at least for checking whether it is running properly and whether this is a feasible approach?
The following is the command I used:
# -q highmem.q
module load qiime2/latest
qiime feature-classifier fit-classifier-naive-bayes \
--o-classifier Bac_all_classifier.qza
[Screenshot (46)]