Mock community assessment and quality control?

bloman2 · March 8, 2021, 2:57pm

Hello,

We have sequenced mock communities as references for our sequenced samples, but beyond looking at whether or not the output is close to "expected", we aren't sure what else to do with this data.

Is there potential for a future QIIME2 tool that would help correct sample data based on mock community results, or give some sort of confidence value on the data as a whole? Not sure how this would work, since it would be impossible to include all bacteria present in samples in the mock community (e.g. can't assess primer/sequencing bias for ASVs that aren't in the mock community), but would there be some way to do this if you control for everything properly, as in take the mock community through the entire pre-sequencing pipeline (DNA extraction, amplicon library prep, etc)?

At the end of the day, we all need a good control to demonstrate to the community (and perhaps more importantly manuscript reviewers) that our sequencing pipeline is reliable.

Does anyone have thoughts on this?

Thank you!!!
Brett

Nicholas_Bokulich · March 8, 2021, 3:22pm

Welcome to the forum, @bloman2 !

This is a great General Discussion topic, and those are some great ideas.

How are you performing this comparison? If you are not already, you could use some quantitative methods to perform this comparison, e.g., to measure how accurately your mock community represents the expected results. See the "visualizers" section of this plugin:

We have a tutorial demonstrating how to use some of these actions here:

Mock communities never look perfect (because of the various sources of bias, including those you list), but these methods can be used to get a sense of overall quality. In practice, though, the first time you use a mock community you cannot tell whether its accuracy reflects the quality of the mock community itself or the quality of the run. So you put the same community on every run you do, and you can then detect when the quality degrades (indicating a run error), as shown in this paper:

This sort of process (using mock communities as a "canary in the coal mine") would accomplish this idea:

As for this idea:

There is definitely room for improvement. I plan to update q2-quality-control at some point very soon, including to add this method (which uses negative controls, not mock communities):

I think I saw a publication in the past 2 years that used mock communities in a similar way, but I cannot find it now. One issue with using mock communities for this is that they can have their own problems (misannotation, poor construction, poor quantification, contamination, etc.) that are separate from the sequencing run. These can of course be controlled and validated, and validated mock communities can be purchased commercially (maybe this is what you are using), but they still poorly reflect the diversity of real samples, so there may be issues with using mock communities for "denoising" data (data correction), as you describe.

Long story short, I recommend combining negative and positive controls as run standards: negative controls can certainly be used for decontaminating data, and mock communities for assessing data "quality" (with the caveats considered above).
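To make the "same mock community on every run" idea concrete, here is a minimal sketch in plain Python. It is not the q2-quality-control implementation; the function names, the choice of Bray-Curtis dissimilarity, and the `tolerance` value are all assumptions for illustration. It scores each run's observed mock composition against the expected one and flags runs that drift from the first (baseline) run.

```python
def normalize(composition):
    """Convert counts or abundances to relative abundances summing to 1."""
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must have a positive total")
    return {taxon: value / total for taxon, value in composition.items()}


def bray_curtis(expected, observed):
    """Bray-Curtis dissimilarity between two compositions (0 = identical)."""
    p, q = normalize(expected), normalize(observed)
    taxa = set(p) | set(q)
    # For relative abundances, Bray-Curtis reduces to half the L1 distance.
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)


def detection_summary(expected, observed):
    """Fraction of observed taxa that were expected, and of expected taxa observed."""
    exp_taxa = {t for t, v in expected.items() if v > 0}
    obs_taxa = {t for t, v in observed.items() if v > 0}
    shared = exp_taxa & obs_taxa
    fraction_observed_expected = len(shared) / len(obs_taxa) if obs_taxa else 0.0
    fraction_expected_observed = len(shared) / len(exp_taxa) if exp_taxa else 0.0
    return fraction_observed_expected, fraction_expected_observed


def flag_degraded_runs(expected, runs, tolerance=0.1):
    """Flag runs whose dissimilarity exceeds the first run's by more than `tolerance`.

    `runs` is an ordered list of (run_name, observed_composition) pairs.
    `tolerance` is an arbitrary illustrative threshold, not a recommended value.
    """
    scores = [(name, bray_curtis(expected, observed)) for name, observed in runs]
    if not scores:
        return []
    baseline = scores[0][1]
    return [name for name, score in scores[1:] if score - baseline > tolerance]


# Hypothetical even mock community of four taxa and three sequencing runs.
expected = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
runs = [
    ("run1", {"A": 30, "B": 20, "C": 25, "D": 25}),
    ("run2", {"A": 28, "B": 22, "C": 26, "D": 24}),
    ("run3", {"A": 70, "B": 10, "C": 10, "X": 10}),  # one taxon lost, one unexpected
]
flagged = flag_degraded_runs(expected, runs)
```

Comparing against the first run rather than against zero reflects the point above: a mock community never matches its expected composition perfectly, so the change between runs is more informative than the absolute score.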

P.S., here is some relevant discussion from the forum past:

bloman2 · March 11, 2021, 4:06pm

Wow, thanks for the very thorough response and for bringing all of this to my attention. I'll look through your suggestions and see what I can do.
Thanks again!
Brett

Midnighter · March 13, 2021, 11:52am

Some great links to dig into @Nicholas_Bokulich!

I have been pondering the same question and there is the following paper which I would like to apply to my work, however, I have not yet been able to implement the methods due to time constraints.

ahfitzpa · March 24, 2021, 4:31pm

I've been trying to incorporate decontam (in R, on an HPC) and the quality-control mock community plugin, following basic cleanup and classification in QIIME 2. However, I have an issue that other people commenting in this thread may have found a workaround for. The decontam outputs are a FeatureTable[RelativeFrequency], a taxonomy file, and the filtered rep-seqs file. Prior to decontam, I used to run qiime taxa collapse to level 7 on the DADA2 FeatureTable output and then convert it to a relative frequency table. How can I work with the decontam output in the mock community plugin if I cannot collapse the taxonomy for a relative frequency feature table?

Nicholas_Bokulich · March 25, 2021, 2:26pm

Hi @ahfitzpa ,

Ignore the first two outputs. You only need the rep-seqs file, containing the non-contaminant sequences (I presume).

1. Import that sequence file into QIIME 2 as a FeatureData[Sequence] artifact.
2. Use qiime feature-table filter-features to filter the original feature table (pre-decontam). This will remove the contaminant sequences (provided that you select the correct exclude or include option).
3. Collapse that feature table.
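The logic of the steps above can be sketched in plain Python. This is only an in-memory illustration of what filtering and collapsing do, not a replacement for the QIIME 2 actions; the function names and the nested-dict table layout are assumptions, and padding missing ranks with "__" is meant to mimic collapsing to a fixed level.

```python
from collections import defaultdict


def filter_features(table, keep_ids):
    """Keep only features whose IDs are in `keep_ids` (the non-contaminants).

    `table` maps sample ID -> {feature ID: count}.
    """
    return {
        sample: {fid: n for fid, n in features.items() if fid in keep_ids}
        for sample, features in table.items()
    }


def collapse(table, taxonomy, level=7):
    """Sum counts of features that share the same taxonomy down to `level`."""
    collapsed = {}
    for sample, features in table.items():
        sums = defaultdict(int)
        for fid, count in features.items():
            ranks = [rank.strip() for rank in taxonomy[fid].split(";")]
            # Truncate to `level` ranks, padding shallow annotations with "__".
            ranks = (ranks + ["__"] * level)[:level]
            sums[";".join(ranks)] += count
        collapsed[sample] = dict(sums)
    return collapsed


def relative_frequency(table):
    """Convert per-sample counts to relative frequencies."""
    result = {}
    for sample, features in table.items():
        total = sum(features.values())
        result[sample] = {
            name: (count / total if total else 0.0)
            for name, count in features.items()
        }
    return result


# Hypothetical data: the original (pre-decontam) table, its taxonomy, and the
# IDs of the sequences that decontam retained in the filtered rep-seqs file.
original_table = {"mock": {"seq1": 60, "seq2": 30, "seq3": 10}}
taxonomy = {
    "seq1": "k__Bacteria; p__Firmicutes",
    "seq2": "k__Bacteria; p__Firmicutes",
    "seq3": "k__Bacteria; p__Proteobacteria",
}
retained_ids = {"seq1", "seq3"}  # seq2 was flagged as a contaminant

filtered = filter_features(original_table, retained_ids)
collapsed = collapse(filtered, taxonomy, level=2)
relative = relative_frequency(collapsed)
```

The key point is the order: filtering happens on the original count table, collapsing comes next, and conversion to relative frequency is the last step, so the relative frequency table produced by decontam is never needed.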

Good luck!

P.S., some day (maybe later this year) we hope to add decontam to a QIIME 2 plugin, which will make this process much more streamlined. I do not have an ETA, but you can keep an eye on the release notes with each release; that is where new features like this will be announced.