--i-query-sequences ARTIFACT PATH FeatureData[Sequence]
Sequences to test for exclusion [required]
--i-reference-sequences ARTIFACT PATH FeatureData[Sequence]
Reference sequences to align against feature
sequences [required]
Is query-sequences the representative sequences file produced in the DADA2 step, or something else?
What is the reference sequence? Is it the SILVA or Greengenes database? I asked my colleague, who told me there are different reference sequences, so I do not know whether I should use SILVA or Greengenes, or whether it is a different story altogether!
@Mehrdad, this question could be answered by looking at your own data and reading the documentation carefully. All QIIME 2 artifacts have a specific type and format — you can determine the type and format by using QIIME 2 view or qiime tools peek, and then answer for yourself "is this an appropriate input to this action, given the artifact types listed in the documentation"? But here goes:
Yes, that is a valid input.
You should talk to your colleague a little more. SILVA and Greengenes are examples of reference databases that you could use here — it is up to you to decide what is appropriate here.
Your assumptions are all wrong. You are inputting the wrong type of artifact, and the error message makes it very clear what you did wrong. You should use the reference SEQUENCES, not a taxonomic classifier.
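The type check described above can be sketched as a small shell helper. This is only an illustration: the function name `is_sequence_artifact` is hypothetical, and it assumes `qiime tools peek` prints a `Type:` line for the artifact it is given.

```shell
# Sketch only: is_sequence_artifact is a hypothetical helper name.
# Succeeds (exit status 0) if the artifact's semantic type is
# FeatureData[Sequence], which is what exclude-seqs expects for both inputs.
is_sequence_artifact() {
    # "qiime tools peek" prints the UUID, Type, and Data format of an artifact.
    qiime tools peek "$1" | grep '^Type:' | grep -qF 'FeatureData[Sequence]'
}
```

For example, `is_sequence_artifact rep-seqs.qza && echo "OK to use"` would print the message only for a sequence artifact (the file name is a placeholder). A trained classifier reports the type TaxonomicClassifier, so the check fails for it.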
--i-query-sequences ARTIFACT PATH FeatureData[Sequence]
Sequences to test for exclusion [required]
--i-reference-sequences ARTIFACT PATH FeatureData[Sequence]
Reference sequences to align against feature
sequences [required]
I know, but I want to check whether my data contain contamination!
My major problem is that I do not know what the reference sequence is. I visited the NCBI website for RefSeq, but the files there are huge, while the reference sequence file here is tiny by comparison. Should I get it from NCBI or a similar site, or is it available on the QIIME 2 page? Thanks
It all depends on what your goal is. If you are trying to filter out sequences that do not resemble bacteria, you can use Greengenes, and even a small database (like Greengenes 97%) would do if you reduce the percent identity parameter. You are just trying to remove anything that does not resemble bacteria, so you do not necessarily need a very in-depth reference database like NCBI.
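A rough sketch of that kind of loose filtering is given here. The function name, the output file names, and the threshold values (0.65 identity, 0.50 query coverage) are illustrative assumptions, not recommendations from this thread; both inputs must be FeatureData[Sequence] artifacts.

```shell
# Sketch only: exclude_non_bacterial is a hypothetical helper name, and the
# thresholds and output names are placeholders to adjust for your own data.
#   $1: query sequences, e.g. the representative sequences from DADA2
#   $2: reference sequences, e.g. imported Greengenes 97% OTU sequences
exclude_non_bacterial() {
    # Hits resemble the reference; misses are candidates for removal.
    qiime quality-control exclude-seqs \
        --i-query-sequences "$1" \
        --i-reference-sequences "$2" \
        --p-method vsearch \
        --p-perc-identity 0.65 \
        --p-perc-query-aligned 0.50 \
        --o-sequence-hits hits.qza \
        --o-sequence-misses misses.qza
}
```

Calling `exclude_non_bacterial rep-seqs.qza ref-seqs.qza` (placeholder file names) would write the sequences that match the reference to hits.qza and everything else to misses.qza.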
I re-tested the SILVA and Greengenes databases, but I got the same error as before when I used the quality-control plugin, which is why I would like to know what the reference sequence for this parameter is. You already suggested the classifiers; however, you also told me not to use them.
**Parameter 'reference_sequences' received an argument of type TaxonomicClassifier. An argument of subtype FeatureData[Sequence] is required.**
Debug info has been saved to /tmp/qiime2-q2cli-err-4hubmz_q.log
My question is: if these kinds of files are wrong for this command, what file is required? Or, if you were me, what file would you use? Honestly, I do not know what the reference sequence is. Please clarify this for me.
I have unassigned microorganisms in the bar plot, so I need to remove them. In this step it is crucial to do that.
@Mehrdad I have never suggested using trained classifiers for this command. DO NOT USE TRAINED CLASSIFIERS. Those are only used for taxonomy classification, and that is what this error is persistently telling you. I have already described above how to pre-check your types.
I also directed you to the tutorial that explains how to do this. Use a DATABASE, not a classifier. You can find some links to eligible databases here. If those are unsatisfying to you, you can find any other FASTA file and import it as a FeatureData[Sequence] artifact to use as a reference here.
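Importing a FASTA file as a reference could look roughly like this. The function name `import_reference` is hypothetical and the file names in the usage note are placeholders; the FASTA file is assumed to contain plain, unaligned DNA sequences.

```shell
# Sketch only: import_reference is a hypothetical helper name.
#   $1: path to a FASTA file of reference sequences
#   $2: path of the FeatureData[Sequence] artifact (.qza) to create
import_reference() {
    qiime tools import \
        --type 'FeatureData[Sequence]' \
        --input-path "$1" \
        --output-path "$2"
}
```

For example, `import_reference reference.fasta ref-seqs.qza` would produce an artifact that can be passed to --i-reference-sequences.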
This command is not strictly necessary to do that. You could just filter out all features that are unassigned.
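One way to drop unassigned features directly is with the taxonomy-based table filter, sketched here. The function name and the output file name are hypothetical, and the sketch assumes your unclassified features carry the label "Unassigned" in the taxonomy artifact.

```shell
# Sketch only: drop_unassigned is a hypothetical helper name.
#   $1: feature table (FeatureTable[Frequency])
#   $2: taxonomy assignments (FeatureData[Taxonomy])
drop_unassigned() {
    # Remove every feature whose taxonomy string contains "Unassigned".
    qiime taxa filter-table \
        --i-table "$1" \
        --i-taxonomy "$2" \
        --p-exclude Unassigned \
        --o-filtered-table table-no-unassigned.qza
}
```

Running `drop_unassigned table.qza taxonomy.qza` (placeholder file names) would give a filtered table from which the bar plot can be regenerated.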