Dear All - I'm attempting to BLAST my 18S dada2 results against the PR2 database. This is my understanding so far:
I need to get the PR2 sequences (supplied as a fasta file) and the associated taxonomy (supplied as a text file) into QIIME 2 as two separate qza files. These PR2 files are located here: Releases · pr2database/pr2database · GitHub. Using qiime tools import and specifying --type FeatureData[Sequence], I can import pr2_version_4.10.0_dada2.fasta into QIIME 2 (that said, I'm not sure how to check the result). I get an error message ("is not a(n) TSVTaxonomyFormat file") when I attempt to import pr2_version_4.10.0_merged.tsv as the taxonomy text file. I note that the merged file contains both the taxonomy and the sequences (hence, I assume, 'merged'). The taxonomy file also has headers.
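One way to sanity-check the fasta before (or after) importing it is to look at its header lines directly. The following is only a minimal sketch in Python, not anything taken from the PR2 or QIIME 2 documentation: the function names are hypothetical and the example records are made up. It counts records, lists duplicated IDs and shows the first few IDs. DADA2-style training fasta files conventionally carry the taxonomy string in the header rather than a per-sequence accession, so if the IDs printed here look like taxonomy strings or are heavily duplicated, that file is probably not suitable as-is for a FeatureData[Sequence] reference.

```python
from collections import Counter


def fasta_ids(fasta_text):
    """Return the ID (first whitespace-delimited token) of each header line."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            ids.append(header.split()[0] if header else "")
    return ids


def summarize_fasta(fasta_text):
    """Count records, list IDs that occur more than once, show the first IDs."""
    ids = fasta_ids(fasta_text)
    counts = Counter(ids)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    return {"n_records": len(ids), "duplicates": duplicates, "first_ids": ids[:3]}


# Made-up records for illustration only; real use would read the fasta file
# with open(path).read() and pass the text in.
example = ">seqA\nACGT\n>seqB\nGGCC\nTTAA\n>seqA\nACGA\n"
summary = summarize_fasta(example)
print(summary)
```

For the imported artifact itself, qiime tools peek and qiime tools validate report the type and format of a qza file, which complements this check of the source file.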
The advice here: Use PR2 in Qiime suggests that I assess the format of the SILVA text-taxonomy files and reformat the PR2 taxonomy file correspondingly. The response from 'ygao1' appears to involve importing the same file twice, calling the first 'seqeunce.qza' and the second 'ref-sequence.qza'. A link to a second location is given: Search results for 'pr2' - QIIME 2 Forum. However, this second location does not provide much more detail that I can see.
I note that the PR2 database is based on an 8-level taxonomy whereas SILVA is based on a 7-level taxonomy. I can (in R) reformat the PR2 taxonomy text file (which has one column per rank, Kingdom to Species) to combine the taxonomy into a single string (e.g. D_0__Kingdom_D_1__Phylum...) as per SILVA. However, I need to know what the cross-referencing identifier is between the fasta 'sequence' file and the text 'taxonomy' file so that BLAST can make the link. This identifier would logically be the 'pr2_accession' number, but I'm not sure what the respective fasta and txt file structures should be to ensure this (if correct) works. Guidance and/or sources of information on formatting the fasta and text files to enable BLASTing would be much appreciated.
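To make the question concrete, here is a minimal Python sketch of the file pair I think is needed, under several assumptions that should be checked against the real files. It assumes the merged table has columns named pr2_accession, sequence and the eight rank columns listed in RANK_COLS (adjust to the actual header row), that the accession is used as the shared ID in both files, and that the ranks are joined with semicolons using SILVA-like D_n__ prefixes. The function names are hypothetical. The output is a fasta whose headers are the accessions and a tab-separated, headerless, two-column taxonomy file (ID, then taxonomy string), which is the layout I understand QIIME 2's HeaderlessTSVTaxonomyFormat to expect.

```python
import csv
import io

# Assumed column names; check them against the header row of the actual file.
ID_COL = "pr2_accession"
SEQ_COL = "sequence"
RANK_COLS = ["kingdom", "supergroup", "division", "class",
             "order", "family", "genus", "species"]


def taxon_string(row):
    """Join the rank columns as D_0__...;D_1__...; one field per level."""
    return ";".join(f"D_{i}__{row[col]}" for i, col in enumerate(RANK_COLS))


def split_merged(merged_tsv_text):
    """Return (fasta_text, taxonomy_text) built from one merged table."""
    reader = csv.DictReader(io.StringIO(merged_tsv_text), delimiter="\t")
    fasta_lines, tax_lines = [], []
    for row in reader:
        acc = row[ID_COL]
        fasta_lines.append(f">{acc}\n{row[SEQ_COL]}")     # ID in the header
        tax_lines.append(f"{acc}\t{taxon_string(row)}")   # same ID, column 1
    return "\n".join(fasta_lines) + "\n", "\n".join(tax_lines) + "\n"


def unmatched_ids(fasta_text, taxonomy_text):
    """IDs present in only one of the two files (empty set means they agree)."""
    fasta = {l[1:].split()[0] for l in fasta_text.splitlines()
             if l.startswith(">") and l[1:].strip()}
    tax = {l.split("\t")[0] for l in taxonomy_text.splitlines() if l.strip()}
    return fasta ^ tax


# Made-up single-row table for illustration only.
example = (
    "pr2_accession\tkingdom\tsupergroup\tdivision\tclass\torder\tfamily"
    "\tgenus\tspecies\tsequence\n"
    "X1\tK\tS\tD\tC\tO\tF\tG\tG_sp\tACGT\n"
)
fasta, tax = split_merged(example)
print(fasta, tax, unmatched_ids(fasta, tax), sep="\n")
```

Because both files are written from the same rows, the IDs match by construction; unmatched_ids is there for the case where the fasta and the taxonomy come from separate downloads.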
Many thanks.