Evaluating and controlling data quality with q2-quality-control

I have several questions about the tutorial "Evaluating and controlling data quality with q2-quality-control" (QIIME 2 2021.8.0 documentation).
I have a barplot of my Zymo mock communities, and I want to evaluate the data quality of the mock and then use this known microbial composition to evaluate my sample set.
I was able to run the tutorial, but running my own mock data set was a different story.

I got confused by the two inputs: the query and reference sequences.

  • What are the query sequences? Are they the representative sequences output?
  • What about the reference sequences? At what step are they generated?

mockb1barplot.qzv (326.8 KB)

Hi @Imee19 ,
This tutorial might be a little clearer, since it shows how to integrate this plugin in a full analysis:

The query sequences can be anything you want, but yes, they would typically be the representative sequences for your ASVs/OTUs. The query sequences are the input that you want to compare against the reference.

The reference is usually one or more known sequences that you are comparing against. E.g., the exclude-seqs action aligns a set of query sequences against the reference sequences and separates hits and misses into different outputs. So you could align against a full reference database (e.g., SILVA 16S sequences) or against just a subset of representative sequences to remove non-target DNA (e.g., host genes or others that do not align to the reference).
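To make the query/reference distinction concrete, here is a minimal sketch of an exclude-seqs call. The file names rep-seqs.qza and reference-seqs.qza are placeholders (assumptions, not files from this thread), both must be FeatureData[Sequence] artifacts, and the 0.97 thresholds are only illustrative values. The guard simply skips the command if QIIME 2 or the inputs are not available.

```shell
# Placeholder file names (assumptions): substitute your own artifacts.
query=rep-seqs.qza            # FeatureData[Sequence]: your ASV/OTU representative sequences
reference=reference-seqs.qza  # FeatureData[Sequence]: known sequences to compare against

if command -v qiime >/dev/null 2>&1 && [ -f "$query" ] && [ -f "$reference" ]; then
  # Align the query against the reference; hits and misses go to separate outputs.
  qiime quality-control exclude-seqs \
    --i-query-sequences "$query" \
    --i-reference-sequences "$reference" \
    --p-method blast \
    --p-perc-identity 0.97 \
    --p-perc-query-aligned 0.97 \
    --o-sequence-hits hits.qza \
    --o-sequence-misses misses.qza
  outcome="ran"
else
  echo "qiime or the input artifacts were not found; nothing was run." >&2
  outcome="skipped"
fi
```

Note that a taxonomy artifact or a feature table will not be accepted for either input, since neither is a set of sequences.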

Good luck!

Hi Nick,
I actually did use the representative sequences as the query, together with classification.qza, which was trained with Greengenes. I also tried every other input I could think of for these two, and every time I got an error.


I also followed the tutorial you linked, but at the very end I got an error. I thought it must be my expected.tsv input file. My mock community is from Zymo, so I just typed in the microbial taxa present but did not add any relative abundances. Is the percent 16S the relative abundance? I think this should be easy to run if the inputs are in the correct format.
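Since the expected table is imported as FeatureTable[RelativeFrequency], it probably needs an abundance for every taxon. The sketch below assumes the vendor's 16S percentages are the right expectation for amplicon data (an assumption worth confirming) and rescales them to fractions that sum to 1. The taxon names, the sample name mock1, and the percentages are placeholders, not real Zymo values.

```shell
# Work in a scratch directory so nothing in the current folder is overwritten.
workdir=$(mktemp -d)

# Hypothetical expected composition in percent (names and values are placeholders).
# Real labels should match the taxonomy strings used in your observed table.
printf '#OTU ID\tmock1\n' >  "$workdir/expected-percent.tsv"
printf 'TaxonA\t50\n'     >> "$workdir/expected-percent.tsv"
printf 'TaxonB\t30\n'     >> "$workdir/expected-percent.tsv"
printf 'TaxonC\t20\n'     >> "$workdir/expected-percent.tsv"

# Divide each value by the column total so the abundances sum to 1.
awk -F'\t' '
  NR == 1 { header = $0; next }
  { taxon[NR] = $1; value[NR] = $2; total += $2 }
  END {
    print header
    for (i = 2; i <= NR; i++) printf "%s\t%.4f\n", taxon[i], value[i] / total
  }' "$workdir/expected-percent.tsv" > "$workdir/expected-taxonomy-mod.tsv"

cat "$workdir/expected-taxonomy-mod.tsv"
```

The resulting file has the same two-column layout as the input, with fractions in place of percentages.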

Also, I tried the other tutorial again and even followed the format of the supplied expected.tsv file.

Convert to biom and import

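# Convert the tab-separated expected composition to BIOM (JSON) format
biom convert \
  -i expected-taxonomy-mod.tsv \
  -o expected-taxonomy.biom \
  --table-type="OTU table" \
  --to-json

# Import the BIOM file as a relative-frequency feature table
# (the type is quoted so the shell does not treat the brackets as a glob)
qiime tools import \
  --type 'FeatureTable[RelativeFrequency]' \
  --input-path expected-taxonomy.biom \
  --input-format BIOMV100Format \
  --output-path expected-taxonomy.qza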

but I got an error in my biom convert step.
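Errors at the conversion step often come from the layout of the TSV file itself. As a rough pre-check, the sketch below defines a hypothetical helper, check_expected_tsv (not part of biom or QIIME 2), which assumes the classic layout: a header whose first field is "#OTU ID", tab-separated columns, and numeric abundances. It reports any line that breaks those assumptions.

```shell
# Hypothetical helper: report layout problems in a classic tab-separated table.
# Returns 0 if the file looks fine and non-zero otherwise.
check_expected_tsv() {
  awk -F'\t' '
    NR == 1 {
      if ($1 != "#OTU ID") { print "line 1: first header field should be #OTU ID"; bad = 1 }
      ncol = NF
      next
    }
    NF != ncol {
      printf "line %d: expected %d tab-separated fields, found %d\n", NR, ncol, NF
      bad = 1
    }
    {
      for (i = 2; i <= NF; i++)
        if ($i !~ /^[0-9.eE+-]+$/) {
          printf "line %d: non-numeric value \"%s\"\n", NR, $i
          bad = 1
        }
    }
    END { exit bad + 0 }
  ' "$1"
}

# Usage: only check the file if it is actually present.
if [ -f expected-taxonomy-mod.tsv ]; then
  check_expected_tsv expected-taxonomy-mod.tsv || echo "fix the file before running biom convert"
fi
```

Spaces used instead of tabs and Windows line endings are both caught by these checks, because they change the field count or leave stray characters in the numeric column.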

Screen Shot 2021-11-10 at 12.03.09 pm
MockExpected.tsv (879 Bytes)

As a follow-up on the second issue: I was able to get past the previous problem with the TSV file. I added an extra underscore and it worked, but at the final step I got an error again.
I used these input files: classification.qza as the taxonomy and dada_table_final as the table.

Thank you!

Hi @Imee19 ,
Glad you could solve it! The first error you were getting was because you were passing the wrong data type. If you are not sure what type a QIIME 2 artifact is, you can use qiime tools peek to inspect it.
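As a sketch of that check, the loop below runs qiime tools peek on a few artifacts and skips any that are missing. The file names are assumptions based on this thread (rep-seqs.qza in particular is a guess at the representative sequences file), and peek_type is just a small hypothetical wrapper.

```shell
# Hypothetical wrapper: print an artifact's UUID, type, and data format if possible.
peek_type() {
  if [ -f "$1" ] && command -v qiime >/dev/null 2>&1; then
    echo "== $1"
    qiime tools peek "$1"
  else
    echo "skipping $1 (file or qiime not found)"
  fi
}

# File names are assumptions; replace them with your own artifacts.
for artifact in rep-seqs.qza classification.qza dada_table_final.qza; do
  peek_type "$artifact"
done
```

Representative sequences should report FeatureData[Sequence], a taxonomy classification FeatureData[Taxonomy], and a DADA2 table FeatureTable[Frequency]. Comparing those types with what an action asks for usually shows which input is wrong.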

You can also look at the contents of the files in that tutorial before importing to see how your own files should be formatted.

Good luck!

Actually, I thought I was, but I got an error at the last step. What does this error mean? I thought that since I got the data in the correct format, it was going to work.

Hi @Imee19 - there is an issue with your screenshot: the text in it is not legible. Can you please re-submit your error message? Thanks!

Hi Matt,
I have attached the screenshot here.

Thank you,
Imelda

Hi @Imee19 ,
You are using the wrong types of artifacts as input. That action expects two different sequence files as input (query and reference).

Instead, it looks like you input a feature table and a taxonomy file. Are you trying to filter a feature table using taxonomy classifications? That would be a different action, in q2-taxa.
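If taxonomy-based filtering is the goal, a minimal sketch with q2-taxa could look like the following. It assumes the table is stored as dada_table_final.qza (the extension is a guess) and that classification.qza is a FeatureData[Taxonomy] artifact. The excluded terms are only an example, and the output name is a placeholder.

```shell
# File names are assumptions based on the inputs mentioned in this thread.
table=dada_table_final.qza     # FeatureTable[Frequency]
taxonomy=classification.qza    # FeatureData[Taxonomy]

if command -v qiime >/dev/null 2>&1 && [ -f "$table" ] && [ -f "$taxonomy" ]; then
  # Drop features whose taxonomy string contains any of the excluded terms.
  qiime taxa filter-table \
    --i-table "$table" \
    --i-taxonomy "$taxonomy" \
    --p-exclude mitochondria,chloroplast \
    --o-filtered-table table-filtered.qza
  outcome="ran"
else
  echo "qiime or the input artifacts were not found; nothing was run." >&2
  outcome="skipped"
fi
```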