how do I import reference sequences for feature classification?

Nir_Friedman · April 30, 2019, 1:19pm

Hi again,
I am trying to import the Silva_132 DB but I can't find the right command for that. I found an answer that you gave before:

So, I downloaded the database and I am using this command:
qiime tools import --input-path /home/qiime2/Desktop/Datasets/Silva_132_release --output-path silva_132_db.qza --type FeatureData[Sequences]

But I am getting this error: Sequences is not a variant of FeatureData.field['type']..
What am I doing wrong? I want to use this DB as a BLAST+ taxonomy classifier.
After I import the DB (according to your suggestion), where can I find a good example of a feature-classifier command? (In QIIME 1 you gave us a few examples for each script; this was very useful for "slow" people like myself.)
Tx
Nir

Nicholas_Bokulich · April 30, 2019, 6:53pm

Did you follow the link to another post at the bottom of the forum topic you listed? Reposting here:

That user shows how to import the necessary files and use them for the BLAST classifier, answering both of your questions. (The errors they report are related to the custom database they made; the commands they report there are technically correct.)
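For the second question, a BLAST+ consensus classification could be sketched as below once the reference reads and taxonomy exist as .qza artifacts. This is a minimal sketch, not the linked post's exact command: the function name, the `QIIME` dry-run variable, and all file names are illustrative assumptions. Option names can differ between releases (newer ones may require an additional output or `--output-dir`), so check `qiime feature-classifier classify-consensus-blast --help` for your version.

```shell
# Set QIIME=echo to print the command instead of running it (dry run).
QIIME="${QIIME:-qiime}"

# Classify query sequences against an imported reference with BLAST+.
# $1: query sequences (FeatureData[Sequence]), e.g. DADA2 representative sequences
# $2: imported reference reads (.qza)
# $3: imported reference taxonomy (.qza)
# $4: output classification artifact (.qza)
classify_blast() {
  $QIIME feature-classifier classify-consensus-blast \
    --i-query "$1" \
    --i-reference-reads "$2" \
    --i-reference-taxonomy "$3" \
    --o-classification "$4"
}
```

A call might look like `classify_blast rep-seqs.qza silva_seqs.qza silva_taxonomy.qza taxonomy.qza`, where every file name is a placeholder.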

Good luck!

Nir_Friedman · May 1, 2019, 6:44am

Hi,
My problem is that the database I downloaded is composed of many files, not only the txt file he mentioned.
I downloaded the file from Silva according to your link, unzipped it, and now I need to import it.
Shouldn't I import all the files in it? According to Yer Lor's question, he imports only one txt file.
Tx
Nir

Nicholas_Bokulich · May 1, 2019, 12:27pm

Pick one file from inside the rep_set directory, e.g., silva_132_99_16S.fna if you are using 16S data.

Find the corresponding taxonomy in the taxonomy directory.
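The two imports can be sketched as follows. Note that the semantic type is `FeatureData[Sequence]` (singular), which is what the "Sequences is not a variant" error points at, and that `--input-path` takes a single file rather than the whole release directory. The function name, the `QIIME` dry-run variable, and the output names are illustrative assumptions; `HeaderlessTSVTaxonomyFormat` is assumed here because the taxonomy files in the release are tab-separated without a header line.

```shell
# Set QIIME=echo to print the commands instead of running them (dry run).
QIIME="${QIIME:-qiime}"

# Import one reference FASTA and its matching taxonomy file as artifacts.
# $1: reference sequences, e.g. silva_132_99_16S.fna from rep_set
# $2: matching taxonomy table from the taxonomy directory
# $3: prefix for the output .qza files (hypothetical naming scheme)
import_reference() {
  ref_fasta="$1"; ref_tax="$2"; prefix="$3"

  # Quote the type so the shell leaves the square brackets alone.
  $QIIME tools import \
    --type 'FeatureData[Sequence]' \
    --input-path "$ref_fasta" \
    --output-path "${prefix}_seqs.qza" || return 1

  $QIIME tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path "$ref_tax" \
    --output-path "${prefix}_taxonomy.qza"
}
```

For example, `import_reference silva_132_99_16S.fna <taxonomy file> silva_132_99`, where the taxonomy file is whichever one in the taxonomy directory matches the 99% 16S set.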

Nir_Friedman · May 3, 2019, 2:07pm

Hi,
Thanks a lot for your answers. Now I have all the raw data I need to create a good OTU (feature) table so that I can start the analysis.
In QIIME 1 the ideal outputs were:

1. A table for the 7 phylogeny levels (with read counts for each sample), and
2. An OTU table (99% or 100% identity) with the corresponding sequences.

Now,
for 1, I need to use qiime taxa collapse, right?
But for 2, which plugin and which script should I use?
Tx
Nir

Nicholas_Bokulich · May 3, 2019, 3:08pm

Correct; collapse N times where N = the number of taxonomic levels you want to report.
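Collapsing once per level can be wrapped in a small loop. This is a sketch under assumptions: the function name, the `QIIME` dry-run variable, and the `table-level-N.qza` naming are illustrative, and the inputs are a feature table plus the taxonomy classification for the same features.

```shell
# Set QIIME=echo to print the commands instead of running them (dry run).
QIIME="${QIIME:-qiime}"

# Run qiime taxa collapse for levels 1 through N.
# $1: feature table (.qza)
# $2: taxonomy classification (.qza)
# $3: deepest level to report (7 corresponds to species in SILVA and Greengenes)
collapse_levels() {
  level=1
  while [ "$level" -le "$3" ]; do
    $QIIME taxa collapse \
      --i-table "$1" \
      --i-taxonomy "$2" \
      --p-level "$level" \
      --o-collapsed-table "table-level-${level}.qza" || return 1
    level=$((level + 1))
  done
}
```

Calling `collapse_levels table.qza taxonomy.qza 7` would write one collapsed table per level, from `table-level-1.qza` to `table-level-7.qza`.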

You will need to export your feature table and the sequences and merge these outside of QIIME 2. It sounds like maybe you are planning to load these into R for making plots? In that case, phyloseq or whatever R package you are using should have methods for merging these data on the feature IDs.
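If R is not part of the pipeline, the export and merge can also be sketched in plain shell. The function names and the `QIIME`/`BIOM` dry-run variables are illustrative assumptions; the merge assumes the TSV layout written by `biom convert --to-tsv` (a comment line, then a `#OTU ID` header) and joins on the feature ID in the first column.

```shell
# Set QIIME=echo and BIOM=echo to print the commands instead of running them.
QIIME="${QIIME:-qiime}"
BIOM="${BIOM:-biom}"

# Export a feature table ($1) and representative sequences ($2) into directory $3.
export_table_and_seqs() {
  $QIIME tools export --input-path "$1" --output-path "$3" || return 1  # feature-table.biom
  $QIIME tools export --input-path "$2" --output-path "$3" || return 1  # dna-sequences.fasta
  $BIOM convert -i "$3/feature-table.biom" -o "$3/feature-table.tsv" --to-tsv
}

# Append the sequence of each feature as a last column of the TSV table.
# $1: feature-table.tsv, $2: dna-sequences.fasta
merge_table_with_seqs() {
  awk -F'\t' '
    NR == FNR {                       # first file: the FASTA
      if ($0 ~ /^>/) {
        id = substr($0, 2)            # drop the leading ">"
        sub(/[ \t].*/, "", id)        # drop any description after the ID
      } else {
        seq[id] = seq[id] $0          # join wrapped sequence lines
      }
      next
    }
    /^#OTU ID/ { print $0 "\tSequence"; next }   # table header
    /^#/       { next }                          # other comment lines
               { print $0 "\t" seq[$1] }         # data row plus its sequence
  ' "$2" "$1"
}
```

After `export_table_and_seqs table.qza rep-seqs.qza exported`, running `merge_table_with_seqs exported/feature-table.tsv exported/dna-sequences.fasta > table-with-seqs.tsv` would give one row per feature with its counts and sequence (all file names are placeholders).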

Nir_Friedman · May 3, 2019, 3:36pm

Hi,
The 7th level is species, right? And as far as I understand, one of the advantages of using DADA2 over the QIIME 1 procedure (pick OTUs, rep_seq, etc.) is that each sequence variant is now a 100% "OTU", so we no longer use 97% clustering, right?
So in this case, can I get the sequences for only the 7th phylogeny level without exporting to R?
(Since I am going to analyze many samples from many runs, I am trying to build a quick and convenient pipeline, so the fewer transformations/exports the better.)
Thanks again for your answer, I am not taking it for granted at all!
Nir

Nicholas_Bokulich · May 3, 2019, 4:10pm

Depends on the database. In Greengenes and SILVA, yes.

Unless you want to cluster your ASVs. I do not advise this, but some others do it. The point is that there are many different options in QIIME 2; you do not need to follow one path...

The sequences artifact output from dada2 contains what you want. It is not collapsed at level 7; there is no way to collapse at species level and take the corresponding sequences.
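Although the sequences cannot be collapsed, each ASV sequence can still be viewed next to its assigned taxonomy without leaving QIIME 2. The sketch below assumes that both the DADA2 representative sequences and the taxonomy classification can be read as metadata by `qiime metadata tabulate`, which should hold in recent releases but is worth confirming on yours; the function name, the `QIIME` dry-run variable, and the file names are illustrative.

```shell
# Set QIIME=echo to print the command instead of running it (dry run).
QIIME="${QIIME:-qiime}"

# Build one table with a row per ASV: feature ID, sequence, and taxonomy.
# $1: DADA2 representative sequences (.qza)
# $2: taxonomy classification (.qza)
# $3: output visualization (.qzv)
tabulate_seqs_with_taxonomy() {
  $QIIME metadata tabulate \
    --m-input-file "$1" \
    --m-input-file "$2" \
    --o-visualization "$3"
}
```

For example, `tabulate_seqs_with_taxonomy rep-seqs.qza taxonomy.qza seqs-with-taxonomy.qzv`; the resulting table is per ASV, so several rows may share the same species-level label.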
