Two parameters in the Command

I am unsure what these two parameters are:

--i-query-sequences ARTIFACT PATH FeatureData[Sequence]
Sequences to test for exclusion [required]
--i-reference-sequences ARTIFACT PATH FeatureData[Sequence]
Reference sequences to align against feature
sequences [required]

  1. Is query-sequences the representative sequences file produced in the DADA2 step, or something else?

  2. What is the reference sequence? Is it the SILVA or Greengenes database? I asked my colleague, who told me there are different reference sequences, so I do not know whether I should use SILVA or Greengenes, or whether it is something else entirely.

https://docs.qiime2.org/2019.1/plugins/available/quality-control/exclude-seqs/

:tulip::blush:

If my assumptions are correct, I am still getting an error!

@Mehrdad, this question could be answered by looking at your own data and reading the documentation carefully. All QIIME 2 artifacts have a specific type and format — you can determine the type and format by using QIIME 2 view or qiime tools peek, and then answer for yourself "is this an appropriate input to this action, given the artifact types listed in the documentation"? But here goes:

Yes, that is a valid input.

You should talk to your colleague a little more. SILVA and Greengenes are examples of reference databases that you could use here — it is up to you to decide what is appropriate here.

Your assumptions are all wrong. You are inputting the wrong type of artifact, and the error message makes it very clear what you did wrong. You should use the reference SEQUENCES, not a taxonomic classifier.
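
For later readers, the type check mentioned above can be sketched roughly as follows. `qiime tools peek` is an existing QIIME 2 command; the wrapper function name and the file names in the usage note are only illustrative assumptions.

```shell
# Print the UUID, semantic type, and data format of a QIIME 2 artifact.
# show_artifact_type is a hypothetical helper name, not part of QIIME 2.
show_artifact_type() {
  qiime tools peek "$1"
}
```

Calling `show_artifact_type RepresenDenoisedLibA.qza` prints a `Type:` line for that file. Both inputs to exclude-seqs need to report `FeatureData[Sequence]` there; a file that reports `TaxonomicClassifier` will be rejected.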


Hi sir,
I am unfamiliar with two of the parameters of the QIIME 2 quality-control plugin. Could you tell me what they are? I mean, I do not know what kind of files the following parameters expect.
https://docs.qiime2.org/2019.1/plugins/available/quality-control/exclude-seqs/

--i-query-sequences ARTIFACT PATH FeatureData[Sequence]
Sequences to test for exclusion [required]
--i-reference-sequences ARTIFACT PATH FeatureData[Sequence]
Reference sequences to align against feature
sequences [required]

I could not find any discussion of this!

Thank you very much for your reply!

@Mehrdad,
Please do not cross-post. You already asked this question, and received an answer, above.

exclude-seqs is not an essential command. In addition to the help documentation, you may see this tutorial for more explanation.


Thanks for the reminder!

I know, but I want to check whether my data contain contamination.

My main problem is that I do not know what the reference sequence is. I visited the NCBI website for RefSeq, but the files there are huge, while the reference sequence file used here is very small. Should I get it from NCBI or a similar site, or is it available on the QIIME 2 website? Thanks.

It all depends on what your goal is. If you are trying to filter out sequences that do not resemble bacteria, you can use Greengenes, and even a small database (like Greengenes 97%) would do if you reduce the percent identity parameter. You are just trying to remove anything that does not resemble bacteria, so you do not necessarily need a very in-depth reference database like NCBI.


I re-tested the SILVA and Greengenes databases, but I got the same error as before when using the quality-control plugin, which is why I would like to know what the reference sequence in this parameter is. You already suggested the classifiers, yet you also told me not to use them.

The error is here:

--i-query-sequences RepresenDenoisedLibA.qza
--i-reference-sequences silva-132-99-nb-classifier.qza
--p-method blast
--p-perc-identity 0.97
--p-perc-query-aligned 0.97
--o-sequence-hits hits.qza
--o-sequence-misses misses.qza

Plugin error from quality-control:

**Parameter 'reference_sequences' received an argument of type TaxonomicClassifier. An argument of subtype FeatureData[Sequence] is required.**

Debug info has been saved to /tmp/qiime2-q2cli-err-4hubmz_q.log

My question is: if these kinds of files are wrong for this command, what file is required? Or, if you were me, what file would you use? Honestly, I do not know what the reference sequence is. Please clarify this for me.

I have unassigned microorganisms in the bar plot, so I need to remove them. It is crucial to do that at this step.

:tulip::pray:

@Mehrdad I have never suggested using trained classifiers for this command. DO NOT USE TRAINED CLASSIFIERS. Those are only used for taxonomy classification, and that is what this error is persistently telling you. I have already described above how to pre-check your types, using QIIME 2 View or qiime tools peek.

I also directed you to the tutorial that explains how to do this. Use a DATABASE, not a classifier. You can find some links to eligible databases here. If those are unsatisfying to you, you can find any other FASTA file and import it as a FeatureData[Sequence] artifact to use as a reference here.
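
A minimal sketch of that workflow, assuming a FASTA file of reference sequences has already been downloaded. The function names and file names are hypothetical; the exclude-seqs parameter values and output names are simply copied from the command posted earlier in this thread, so adjust them to your goal.

```shell
# Import a FASTA file of reference sequences as a FeatureData[Sequence] artifact.
# $1 = input FASTA path, $2 = output .qza path (both chosen by the caller).
import_reference_fasta() {
  qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path "$1" \
    --output-path "$2"
}

# Split the query sequences into hits and misses against the imported reference.
# $1 = query sequences (.qza), $2 = reference sequences (.qza, NOT a classifier).
exclude_against_reference() {
  qiime quality-control exclude-seqs \
    --i-query-sequences "$1" \
    --i-reference-sequences "$2" \
    --p-method blast \
    --p-perc-identity 0.97 \
    --p-perc-query-aligned 0.97 \
    --o-sequence-hits hits.qza \
    --o-sequence-misses misses.qza
}
```

For example, `import_reference_fasta reference.fasta reference-seqs.qza` followed by `exclude_against_reference RepresenDenoisedLibA.qza reference-seqs.qza`. The key difference from the failing command is that the second input is an imported sequence database rather than a trained classifier.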

This command is not strictly necessary to do that. You could just filter out all features that are unassigned.
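
One way to do that filtering is sketched below with `qiime taxa filter-table`. It assumes you have a feature table and a FeatureData[Taxonomy] artifact from your classification step, and that unclassified features carry the label "Unassigned" in that taxonomy. The function name and file names are hypothetical.

```shell
# Drop every feature whose taxonomy annotation contains "Unassigned".
# $1 = feature table (.qza), $2 = taxonomy (.qza), $3 = output table (.qza).
filter_unassigned() {
  qiime taxa filter-table \
    --i-table "$1" \
    --i-taxonomy "$2" \
    --p-exclude Unassigned \
    --o-filtered-table "$3"
}
```

For example, `filter_unassigned table.qza taxonomy.qza table-no-unassigned.qza`. The filtered table can then be used to rebuild the bar plot.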

