I’m trying to follow the “Evaluating and controlling data quality with q2-quality-control” tutorial using data from our lab. We used the following MOCK community: HM-783D, which contains 20 bacterial strains. How can I obtain the reference-seqs.qza and qc-mock-3-expected.qza files for this MOCK community? I only know its composition.
At the same time, I received two files related to the MOCK sample (one for the forward reads and the other one for the reverse reads) from the sequencing center, along with the rest of the raw data.
So, I created a folder containing only the two MOCK files (forward and reverse reads). From there, I imported this sample into QIIME2 (using the Casava 1.8 paired-end demultiplexed fastq format) and denoised it with DADA2, which gave me a table.qza file and a rep-seqs.qza file. Is this approach correct? Would these files serve as the query-table.qza and the qc-mock-3-observed.qza that I need for this tutorial, respectively?
There is a growing database of mock community datasets and resources in mockrobiota that you should check out. In particular, it sounds like your mock community may have the same exact composition as mock-21 and/or mock-23. These were generated from (I think) the same mock community from BEI, but using the high concentration rather than low concentration (HM-783D) product. So check the expected taxonomies and expected sequences in the source directory of each of those mock communities — you may be able to just use those files as-is for your data.
(If you use any of the resources in mockrobiota, please make sure to cite mockrobiota and the original source publication for any datasets that you use, since these are not part of QIIME2.)
If mockrobiota does not have what you need:
qc-mock-3-expected.qza is essentially just a composition table converted to a biom file and then imported as a FeatureTable[RelativeFrequency] artifact. You can export the data from that file to take a look at the original tab-separated table, and figure out how to format/convert your own.
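As a rough illustration of the formatting step, here is a minimal Python sketch that turns a known composition into a tab-separated table of relative frequencies. The taxonomy strings, amounts, and sample ID are hypothetical placeholders, not the real HM-783D composition, and the "#OTU ID" header is assumed from the classic tab-separated BIOM layout. Compare the output against the table exported from qc-mock-3-expected.qza before converting and importing your own.

```python
import io


def write_expected_table(composition, sample_id, handle):
    """Write a taxon -> abundance mapping as a TSV of relative frequencies."""
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must contain a positive total abundance")
    # "#OTU ID" is the header used by classic tab-separated BIOM tables
    # (assumption: check it against the table exported from the tutorial file).
    handle.write(f"#OTU ID\t{sample_id}\n")
    for taxon in sorted(composition):
        # Divide by the total so that the column sums to 1.
        handle.write(f"{taxon}\t{composition[taxon] / total:.6f}\n")


# Hypothetical placeholder taxa and amounts; replace with the documented
# composition of your own mock community.
composition = {
    "k__Bacteria; p__Example; g__Genus_A": 1.0,
    "k__Bacteria; p__Example; g__Genus_B": 1.0,
    "k__Bacteria; p__Example; g__Genus_C": 2.0,
}

buffer = io.StringIO()
write_expected_table(composition, "mock-sample", buffer)
print(buffer.getvalue())
```

The taxonomy strings need to use the same format and reference database as the taxonomy assigned to your observed data, otherwise the expected and observed features will not match up.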
reference-seqs.qza is only used by the evaluate-seqs action, so you can still use evaluate-composition without it. This is a fasta of expected sequences that correspond to the members of the mock community; if you don’t have that, you are unable to use the evaluate-seqs action.
Hi, thanks for your reply. It helped me a lot. I checked out the mockrobiota website and downloaded the expected-sequences.fasta file for mock-21. Now, I’m trying to import this expected fasta file into QIIME2, using:
Either there is something wrong with the file you downloaded, or with your version of QIIME2. What version of QIIME2 are you running? Did you inspect the file that you downloaded to make sure it looks okay?
In any case, the file I attached should work for you.
Hi, thanks very much for sending me the qza file. I think I was actually using the wrong file, as I was able to import another fasta file with no issues. I’m using QIIME2 2018.2 (via VirtualBox). I was also able to run the qiime quality-control evaluate-seqs action (from the “Evaluating sequence quality” part of the tutorial). The results of the comparison between my query sequences and the expected sequences seemed good (from what I understood), although there were some mismatches. Is there a way of obtaining an overall sequencing error rate based on the eval-seqs-test.qzv file?
Great! Some mismatches are to be expected; no denoising method is perfect.
I suppose you could tally the total number of mismatches across all sequence variants, weighted by the abundance of each sequence. Would that accomplish what you need? To do so, you can download the results from the QZV as a TSV and analyze them in R or a Jupyter notebook.
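The abundance-weighted tally could look something like the Python sketch below. The column names (query_id, mismatches, alignment_length) and the example numbers are hypothetical, so rename them to match the headers in the TSV you actually download. The abundances are assumed to come from your feature table (table.qza), and gaps are not counted, only mismatches.

```python
import csv
import io


def weighted_mismatch_rate(alignment_rows, abundances):
    """Mismatches per aligned base, weighted by each sequence's abundance."""
    mismatched = 0
    aligned = 0
    for row in alignment_rows:
        # Sequences missing from the abundance mapping get zero weight.
        count = abundances.get(row["query_id"], 0)
        mismatched += count * int(row["mismatches"])
        aligned += count * int(row["alignment_length"])
    if aligned == 0:
        raise ValueError("no aligned bases found for the given abundances")
    return mismatched / aligned


# Hypothetical stand-in for the TSV downloaded from the QZV.
tsv_text = (
    "query_id\tmismatches\talignment_length\n"
    "seq1\t0\t250\n"
    "seq2\t2\t250\n"
)
rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Hypothetical read counts per sequence variant, e.g. taken from table.qza.
abundances = {"seq1": 900, "seq2": 100}

print(weighted_mismatch_rate(rows, abundances))
```

Keep in mind that this measures mismatches remaining after denoising, relative to the expected sequences, so it is better read as a residual error rate of the denoised sequence variants than as the raw error rate of the sequencing run.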