A context is a logical partition of data. For the caching we do against Qiita, a context can be interpreted as a group of samples sequenced in a generally technically consistent manner, with identical bioinformatic processing applied. That is a loaded statement, so I’ll expand a little more.
The V4 16S contexts in Qiita are based on the preparation information users upload when they create their study. Specifically, the user has described the samples in that preparation as having a target gene of 16S rRNA and a target subfragment of V4. However, we do not currently restrict on the exact primers used, and there are a variety of V4 primers. Assuming the user is accurate in the information they report, and assuming the molecular work and sequencing were reasonable, we would expect the data produced to correspond to 16S V4. But because of possible issues in preparation and sequencing, and the possibility of some inconsistencies in the primers, it is conservative to describe the context as “generally technically consistent.” That said, looking at the context as a whole (specifically the Deblur 16S V4 90nt one), we see broad patterns that make biological sense. And, from what we can tell, the majority of the data were generated with the EMP V4 primer set.
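As a rough sketch of that selection rule, the snippet below keeps any preparation record whose reported target gene is 16S rRNA and whose target subfragment is V4, without looking at the primers. The field names, sample IDs, primer labels, and case-insensitive matching are all assumptions made for illustration, not Qiita's actual schema or code.

```python
# Hypothetical preparation records; field names and values are illustrative only.
preps = [
    {"sample": "S1", "target_gene": "16S rRNA", "target_subfragment": "V4", "primer_set": "EMP"},
    {"sample": "S2", "target_gene": "16S rRNA", "target_subfragment": "V4", "primer_set": "other"},
    {"sample": "S3", "target_gene": "16S rRNA", "target_subfragment": "V3", "primer_set": "other"},
    {"sample": "S4", "target_gene": "ITS", "target_subfragment": "ITS1", "primer_set": "other"},
]


def in_v4_16s_context(prep):
    """Return True if the user-reported metadata says 16S rRNA, V4.

    The primer set is deliberately ignored, mirroring the point that
    the context does not restrict on the exact primers used.
    Case-insensitive matching is an assumption of this sketch.
    """
    gene = prep.get("target_gene", "").strip().lower()
    region = prep.get("target_subfragment", "").strip().lower()
    return gene == "16s rrna" and region == "v4"


selected = [p["sample"] for p in preps if in_v4_16s_context(p)]
print(selected)
```

Note that S2 is selected even though its primer set differs from S1's, which is exactly why the context is only “generally technically consistent.”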
For the bioinformatics, every single sample in a context was processed identically and independently, which brings me to Deblur. Deblur uses a static error model and can operate in a sample-independent fashion (with --p-min-reads). This is the ASV method Qiita uses to mitigate technical bias across sequencing runs. Closed-reference OTU contexts are also available.
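To make "sample independent" concrete, here is a toy sketch of a minimum-read filter applied to counts summed over every sample in a run. It is not Deblur; the function name, sequences, and thresholds are made up, and it assumes the minimum-reads option acts on totals across all samples, which is worth checking against the Deblur documentation. Under that assumption, a threshold above 1 makes a sample's result depend on which other samples were processed with it, while a threshold of 1 does not.

```python
from collections import Counter


def filter_min_reads(tables, min_reads):
    """Drop sequences whose count, summed over all samples, is below min_reads.

    tables maps a sample ID to a Counter of sequence -> read count.
    This is a toy stand-in for a run-wide abundance filter, not Deblur.
    """
    totals = Counter()
    for counts in tables.values():
        totals.update(counts)
    keep = {seq for seq, n in totals.items() if n >= min_reads}
    return {
        sid: {seq: n for seq, n in counts.items() if seq in keep}
        for sid, counts in tables.items()
    }


# Made-up counts for two samples.
sample_a = Counter({"ACGT": 6, "TTGA": 1})
sample_b = Counter({"ACGT": 5, "TTGA": 9})

# Threshold of 10: sample "a" looks different depending on its batch.
alone_10 = filter_min_reads({"a": sample_a}, 10)["a"]
together_10 = filter_min_reads({"a": sample_a, "b": sample_b}, 10)["a"]

# Threshold of 1: sample "a" is the same regardless of its batch.
alone_1 = filter_min_reads({"a": sample_a}, 1)["a"]
together_1 = filter_min_reads({"a": sample_a, "b": sample_b}, 1)["a"]

print(alone_10 == together_10, alone_1 == together_1)
```

The batch-independent case is what lets samples from different studies and sequencing runs be combined in one context without the result for any sample depending on the others.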