To finalize the output we pruned the taxa according to the decontam results.
However, this final output does not remove all taxa found in the negative control. Is this correct?
What are these taxa inside the negative control?
In general, we recommend that folks do not cross-post questions, as it's mostly the same people on the Qiime2 GitHub and the Qiime2 forums and we will see it either way.
Did you know that the Illumina platform has a barcode-hopping problem (which Illumina has written a paper about)?
Those taxa in your NTC (no-template control) could be the most common taxa in your real samples, misassigned to your NTC through barcode hopping / crosstalk. This is part of the reason that removing all taxa found in your NTC from your full study is a bad idea!
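To make that concrete, here is a minimal sketch with a made-up feature table. The taxon names, sample names, and counts are all hypothetical, and `blanket_removal` is just an illustrative helper, not anything from decontam or QIIME 2. It shows what happens if a few hopped reads from the dominant taxa land in the NTC and you then drop every taxon seen there.

```python
# Toy feature table (hypothetical counts, not real data): taxon -> reads per sample.
counts = {
    "Bacteroides":      {"S1": 5200, "S2": 4800, "S3": 5100, "NTC": 6},
    "Faecalibacterium": {"S1": 3100, "S2": 2900, "S3": 3300, "NTC": 3},
    "Ralstonia":        {"S1": 12,   "S2": 9,    "S3": 15,   "NTC": 240},
    "Akkermansia":      {"S1": 800,  "S2": 950,  "S3": 700,  "NTC": 0},
}
real_samples = ["S1", "S2", "S3"]


def blanket_removal(table, control="NTC"):
    """Drop every taxon that has at least one read in the control."""
    return {taxon: row for taxon, row in table.items() if row[control] == 0}


def fraction_of_real_reads_kept(kept, table, samples):
    """Share of reads in the real samples that survive the filtering."""
    total = sum(table[t][s] for t in table for s in samples)
    retained = sum(kept[t][s] for t in kept for s in samples)
    return retained / total


kept = blanket_removal(counts)
frac = fraction_of_real_reads_kept(kept, counts, real_samples)

print("Taxa kept:", sorted(kept))
print("Fraction of real-sample reads kept:", round(frac, 3))
```

In this toy table the two dominant taxa are thrown out because of a handful of reads in the NTC, while the one taxon that actually looks like a contaminant (abundant in the NTC, rare in the samples) is treated no differently from them.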
In addition, could this false contaminant in the negative control be caused by a low-mass library in the control? Since the adapters are probably much denser there, maybe within the sequencing lane they steal, or attach to, unbound strands?
Another addition (sorry that they keep coming): decontam uses two signatures to classify taxa as contaminants. Is it possible that there are other signatures? Probably yes.
Here is another question closely related to the "false" contaminant in the negative control. Speaking of negative controls, does the position of the negative control matter?
When processing samples, the common practice is to do it serially (one by one). Looking at the decontam guide page (at the library size plot), could the y=x shape formed by the blue dots (control samples) be caused by the controls being positioned at the end (tail) of the sample processing sequence?
Yes, that looks like a great tool! There's also uncross, though I have not used either of these tools.
As mentioned in that discussion, physical cross contamination between samples and optical index misassignment on the flow cell are two different sources of noise, and should be handled differently. decontam does not address barcode hopping, for example.
Possible, yes.
I'm not sure that would happen in the machine itself... Illumina's explanation for barcode hopping / crosstalk is that clusters on the flow cell are too close together physically, leading to overlap in the scan of the lane and getting the wrong barcode on the real read.
That's a great question for @benjjneb, the author of Decontam. Let's see what he has to say!
No, we've never observed anything like that. The plot you are looking at is ordered on the x-axis by library size; it is not ordered by the sequencing/sampling process.
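As a rough illustration of that ordering, here is a small sketch with invented samples. The names, processing positions, and read counts are hypothetical and not taken from the decontam guide. Sorting by library size and using the resulting rank as the x-axis puts the low-depth controls first, wherever they sat in the processing sequence.

```python
# Hypothetical samples: (name, processing_order, is_control, total_reads).
samples = [
    ("S1",   1, False, 41000),
    ("NTC1", 2, True,  350),
    ("S2",   3, False, 28000),
    ("S3",   4, False, 35500),
    ("NTC2", 5, True,  1200),
]

# Sort by library size only; processing order plays no role here.
by_size = sorted(samples, key=lambda s: s[3])

# The rank after sorting is what would go on the x-axis of such a plot.
ranked = [
    (rank, name, is_control, reads)
    for rank, (name, _order, is_control, reads) in enumerate(by_size, start=1)
]

for rank, name, is_control, reads in ranked:
    label = "control" if is_control else "sample"
    print(rank, name, label, reads)
```

Here the two controls take ranks 1 and 2 even though they were processed second and fifth, so a monotonic rising shape in a plot like this follows from the sorting itself rather than from the position of the controls in the run.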