Got reads from samples that were NOT in the flowcell

lca123 · April 2, 2019, 12:30pm

Hi there,
We planned to sequence 81 samples in our last MiSeq run, but a last-minute mistake meant a few were not loaded, so the final pool had 73 samples. However, we did not delete the names of the missing samples from the SampleSheet, so the MiSeq "didn't know" they were absent and treated them as if they were real samples. We expected them to have no reads, of course. But now that I have the FASTQ files, I was surprised to find a few reads assigned to those fake samples.

Negative-reaction-control_S82_R1.fastq.gz            34
Negative-extraction-control-1_S30_R1.fastq.gz        35
IG19-202_S2_R1.fastq.gz                              37    not run
IG19-233_S21_R1.fastq.gz                             39    not run
IG19-234_S22_R1.fastq.gz                             41    not run
IG19-208_S8_R1.fastq.gz                              44    not run
IG19-235_S23_R1.fastq.gz                             48    not run
Negative-extraction-control-2_S81_R1.fastq.gz        80
IG19-175_S50_R1.fastq.gz                             85    not run
IG19-174_S49_R1.fastq.gz                            110    not run
IG19-176_S51_R1.fastq.gz                            140    not run
IG19-196_S66_R1.fastq.gz                           1713
IG19-200_S70_R1.fastq.gz                           5184
IG19-195_S65_R1.fastq.gz                           5235
IG19-199_S69_R1.fastq.gz                           5346
IG19-198_S68_R1.fastq.gz                           5387
...
Positive-control_S80_R1.fastq.gz                  17621
Undetermined.fastq.gz                            318405
--Total number of reads on the run = 2540687

Has anyone experienced this, or does anyone have a clue why it happened?
I have two hypotheses:

Barcode cross-talk, even for samples that don't really exist. Somehow the machine decided those reads belonged to those samples...
Contamination. Even though these samples were not real, their read counts are close to those of the negative controls. Could that indicate there is actually (maybe also) contamination around? But that would also mean treating 34, 35 and 80 reads out of ~2.5 million as significant, and I really don't know whether they are.
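
To put those counts in proportion, here is a minimal Python sketch that expresses each "not run" sample and negative control as a fraction of the whole run. The read counts and the run total are copied from the listing above; the helper name `fraction_of_run` is only illustrative, and the script says nothing about whether such a fraction is statistically significant.

```python
# Read counts copied from the listing above.
TOTAL_READS = 2_540_687

# Samples listed in the SampleSheet but never loaded ("not run").
not_run = {
    "IG19-202": 37, "IG19-233": 39, "IG19-234": 41, "IG19-208": 44,
    "IG19-235": 48, "IG19-175": 85, "IG19-174": 110, "IG19-176": 140,
}
negative_controls = {
    "Negative-reaction-control": 34,
    "Negative-extraction-control-1": 35,
    "Negative-extraction-control-2": 80,
}


def fraction_of_run(reads, total=TOTAL_READS):
    """Return the share of all reads in the run taken by `reads`."""
    return reads / total


for label, group in (("not run", not_run), ("negative control", negative_controls)):
    for name, reads in group.items():
        print(f"{name:32s} {label:17s} {reads:5d}  {fraction_of_run(reads):.2e}")

not_run_total = sum(not_run.values())
controls_total = sum(negative_controls.values())
print(f"not-run total:          {not_run_total} reads "
      f"({100 * fraction_of_run(not_run_total):.3f}% of the run)")
print(f"negative-control total: {controls_total} reads "
      f"({100 * fraction_of_run(controls_total):.3f}% of the run)")
```

Taken together, the eight unloaded samples hold 544 reads, roughly 0.02% of the run.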

Any help?
Cheers, Leo.

Nicholas_Bokulich · April 2, 2019, 12:41pm

Hi @lca123,
This is the scary sort of thing that happens all the time whether we are looking or not. Now you've just had the opportunity to experience these errors first-hand!

An additional possibility is sequencing errors in the barcodes, causing one barcode to be misread as another.
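
One rough way to judge how exposed a barcode set is to this kind of misreading is to look at the smallest Hamming distance between any two barcodes: the closer two barcodes are, the fewer base-call errors it takes to turn one into the other. The sketch below assumes equal-length barcodes, and the three barcodes in it are made up for illustration; they are not from this run.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_pairs(barcodes):
    """Return (minimum distance, sorted list of sample-name pairs at that distance)."""
    distances = {
        (n1, n2): hamming(barcodes[n1], barcodes[n2])
        for n1, n2 in combinations(sorted(barcodes), 2)
    }
    smallest = min(distances.values())
    pairs = sorted(pair for pair, d in distances.items() if d == smallest)
    return smallest, pairs


# Hypothetical barcodes, for illustration only.
example_barcodes = {
    "sample_A": "ACGTACGT",
    "sample_B": "ACGTACGA",
    "sample_C": "TTGCAAGC",
}

min_distance, pairs = closest_pairs(example_barcodes)
print(f"minimum distance: {min_distance}, between: {pairs}")
```

In this made-up set, sample_A and sample_B differ at a single position, so one miscalled base in the index read would be enough to swap them.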

You have a few choices:

Ignore and move on. It is unlikely that you can really thoroughly remove these errors from your real data, but the read counts are so low that these inherent errors will likely have no impact. (Incidentally, this is another reason why we use abundance filters to exclude samples with low read counts, and similarly another reason why negative controls will often register a few reads, such as the negative control that I see in your run.)
You could use these fake samples as quasi-negative controls and try to see if there are trends in what ASVs/taxa are observed, and potentially use that to filter your real samples if these appear to be obvious contaminants and not cross-contaminants.
You can examine the barcode PHRED scores to see if there are any trends, e.g., lower quality scores as described in this paper that you can use as a threshold to exclude low-quality reads (not currently supported in QIIME 2).
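
For the last option, a minimal sketch of what "examining the barcode PHRED scores" could look like outside QIIME 2: compute the mean quality of each index read and flag the ones below a cutoff. It assumes plain four-line FASTQ records with Phred+33 encoding. The cutoff `MIN_MEAN_Q` and the two records are hypothetical, not taken from the paper or from this run.

```python
import io

MIN_MEAN_Q = 30  # hypothetical cutoff, chosen only for this illustration


def mean_phred(quality, offset=33):
    """Mean Phred score of one quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def read_fastq(handle):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def low_quality_barcodes(handle, min_mean_q):
    """Return headers of index reads whose mean quality is below the cutoff."""
    return [
        header
        for header, _, quality in read_fastq(handle)
        if mean_phred(quality) < min_mean_q
    ]


# Two made-up index reads: 'I' encodes Q40 and '#' encodes Q2.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGTACGA\n+\nII##I#I#\n"
)

flagged = low_quality_barcodes(example, MIN_MEAN_Q)
print("index reads below cutoff:", flagged)
```

For a real run, the handle would be the index-read FASTQ, opened with `gzip.open(path, "rt")` if it is compressed.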

lca123 · April 2, 2019, 6:17pm

Thank you, Nicholas.
I'll look into it and follow up once I have a conclusion.

lca123 · April 5, 2019, 1:31pm

Hi Nicholas,
I've looked a little more into these data and, fortunately, it seems to be no big deal. I got taxonomies assigned to them, but the ASVs are unique and not all of them even received an assignment. They also didn't match the ASVs from the negative controls, which makes contamination unlikely. So I am moving on, mainly because of that and because of the very low number of reads.
Thank you for the suggestions!
Leo
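
The comparison described above (do the fake samples share ASVs with the negative controls?) comes down to a set intersection. A minimal sketch, assuming the ASV identifiers per sample have already been exported from the feature table; all sample names and ASV IDs below are made up.

```python
def overlap_with_controls(sample_asvs, control_asvs):
    """Map each sample to the set of its ASVs that also occur in any control."""
    in_controls = set().union(*control_asvs.values())
    return {name: asvs & in_controls for name, asvs in sample_asvs.items()}


# Hypothetical ASV identifiers, for illustration only.
fake_samples = {
    "fake_1": {"asv_a", "asv_b"},
    "fake_2": {"asv_c"},
}
negative_controls = {
    "neg_1": {"asv_x", "asv_b"},
    "neg_2": {"asv_y"},
}

shared = overlap_with_controls(fake_samples, negative_controls)
for name, asvs in sorted(shared.items()):
    print(name, "shares", sorted(asvs) if asvs else "nothing", "with the controls")
```

An empty overlap for every fake sample would match what was observed here; a non-empty one would point toward a shared contaminant.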

Micro_Biologist · April 6, 2019, 7:19am

There are a few things that can cause this:

Contamination of the other samples by the samples you prepped but didn't run.
Carryover of sequences from the previous run into this one. Illumina estimates this at approximately 0.1% (sorry, I can't remember where I got that from).
Cross-talk contamination.

If you aren't using unique dual indexes, check them out, as they should reduce cross-talk considerably.
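
A quick way to tell whether a layout is unique dual or combinatorial is to check whether any i7 or i5 index is used by more than one sample. A minimal sketch, assuming the index pairs have already been pulled out of the sample sheet as `(sample, i7, i5)` tuples; the sample names and index sequences are made up.

```python
from collections import Counter


def reused_indexes(samples):
    """Return the i7 and i5 sequences that are assigned to more than one sample."""
    i7_counts = Counter(i7 for _, i7, _ in samples)
    i5_counts = Counter(i5 for _, _, i5 in samples)
    return {
        "i7": sorted(seq for seq, n in i7_counts.items() if n > 1),
        "i5": sorted(seq for seq, n in i5_counts.items() if n > 1),
    }


def is_unique_dual(samples):
    """True when every i7 and every i5 index appears exactly once in the pool."""
    reused = reused_indexes(samples)
    return not reused["i7"] and not reused["i5"]


# Hypothetical layouts, for illustration only.
combinatorial = [
    ("s1", "ATCACG", "TAGATC"),
    ("s2", "ATCACG", "CTCTCT"),  # same i7 as s1
    ("s3", "CGATGT", "TAGATC"),  # same i5 as s1
]
unique_dual = [
    ("s1", "ATCACG", "TAGATC"),
    ("s2", "CGATGT", "CTCTCT"),
    ("s3", "TTAGGC", "TATCCT"),
]

print("combinatorial layout:", is_unique_dual(combinatorial), reused_indexes(combinatorial))
print("unique dual layout:  ", is_unique_dual(unique_dual))
```

With unique dual indexes, a read has to carry the wrong index on both ends to land in another sample, which is why the reused entries reported for the first layout are worth eliminating.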
