Understanding redbiom "contexts" for q2-clawback


I’m checking this tutorial because it was mentioned in the October 2020 workshop. I’m having trouble understanding this quote:

Could anyone briefly explain what a “context” is and why “Deblur” was chosen? I have checked the redbiom GitHub page but I still can’t understand.

Thank you.

Hi @the_dummy,

A context is a logical partition of data. For the caching we do against Qiita, a context can be interpreted as a group of samples sequenced in a generally technically consistent manner, with identical bioinformatic processing applied. That is a loaded statement, so I’ll expand a little more.

The V4 16S contexts in Qiita are based on the preparation information users upload when they create their study. Specifically, the user has described the samples in that preparation as having a target_gene of 16S or 16S rRNA, and a target_subfragment of V4. However, we do not currently restrict on the exact primers used, and there are a variety of V4 primers. Assuming the user is accurate in the information they report, and assuming the molecular work and sequencing were reasonable, we would expect the data produced to correspond to 16S V4. But, because of possible issues in preparation and sequencing, and the possibility of some inconsistencies in the primers, it would be conservative to describe the context as “generally technically consistent.” That said, looking at the context as a whole (specifically the Deblur 16S V4 90nt one), we see broad patterns that make biological sense. And, from what we can tell, the majority of the data were generated with the EMP V4 primer set.
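To make the selection rule concrete, here is a minimal Python sketch of how preparations could be picked for a 16S V4 context from their metadata. The field names target_gene and target_subfragment come from the description above; the records, the prep IDs, and the helper is_16s_v4 are made up for illustration and are not Qiita's or redbiom's actual code.

```python
# Hypothetical preparation records. Only the field names target_gene and
# target_subfragment are taken from the post; IDs and values are invented.
preparations = [
    {"prep_id": "prep_a", "target_gene": "16S rRNA", "target_subfragment": "V4"},
    {"prep_id": "prep_b", "target_gene": "16S", "target_subfragment": "V4"},
    {"prep_id": "prep_c", "target_gene": "18S", "target_subfragment": "V9"},
    {"prep_id": "prep_d", "target_gene": "16S", "target_subfragment": "V3"},
]

# Both spellings are accepted, as described in the post.
ACCEPTED_GENES = {"16S", "16S rRNA"}


def is_16s_v4(prep):
    """Return True if the user-reported metadata describes 16S V4 data."""
    return (
        prep["target_gene"] in ACCEPTED_GENES
        and prep["target_subfragment"] == "V4"
    )


# Note that nothing here checks the primers, which is why the context is
# only "generally technically consistent".
v4_16s_preps = [p["prep_id"] for p in preparations if is_16s_v4(p)]
print(v4_16s_preps)
```

The point of the sketch is that membership depends only on what the user reported, so the grouping is only as reliable as the uploaded preparation information.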

For the bioinformatics, every single sample in a context was processed identically and independently. And this brings me to Deblur. Deblur uses a static error model and can operate in a sample-independent fashion (with --p-min-reads ). This is the ASV method Qiita uses to mitigate technical bias across sequencing runs. Closed-reference OTU contexts are also available.
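As a rough illustration of why sample independence matters, here is a toy Python sketch. It is not Deblur's code. It assumes a minimum-reads filter applied to feature counts summed over all samples processed together, and the function filter_pooled, the sample data, and the threshold values are all hypothetical (the post does not state which value Qiita uses).

```python
from collections import Counter


def filter_pooled(samples, min_reads):
    """Keep features whose count, summed over all samples, is >= min_reads.

    Toy stand-in for a pooled abundance filter; an assumption for
    illustration, not Deblur's implementation.
    """
    totals = Counter()
    for counts in samples.values():
        totals.update(counts)  # adds the per-feature counts
    return {
        sample_id: {f: c for f, c in counts.items() if totals[f] >= min_reads}
        for sample_id, counts in samples.items()
    }


# Hypothetical per-sample feature counts.
sample_a = {"seq1": 8, "seq2": 3}
sample_b = {"seq1": 5, "seq2": 9}

# With a pooled threshold, what survives in sample A depends on its neighbours.
alone = filter_pooled({"A": sample_a}, min_reads=10)["A"]
together = filter_pooled({"A": sample_a, "B": sample_b}, min_reads=10)["A"]
print(alone == together)

# With a threshold of 1 the filter never removes an observed feature,
# so sample A comes out the same however it is batched.
alone_1 = filter_pooled({"A": sample_a}, min_reads=1)["A"]
together_1 = filter_pooled({"A": sample_a, "B": sample_b}, min_reads=1)["A"]
print(alone_1 == together_1)
```

Under these assumptions, the result for a sample only becomes independent of its batch when the pooled filter is effectively switched off, which is the property a context needs if samples from many studies and runs are to be comparable.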