Is there anything I can do about a large number of "undefined" reads?

I am running QIIME 2 2019.7 in a Conda environment. I am working with data that I got back already demultiplexed from an Illumina MiSeq 2x300 kit.

My dataset consists of 279 samples plus 1 sample that is “undefined”, which I believe contains all the reads that Illumina couldn’t properly demultiplex. I have two undefined FASTQ files (F and R), so I know this is an issue from the actual Illumina software, but I thought that someone here might be able to direct me to some useful information. In particular, I am wondering the following:

  1. What causes a read to be undefined? Is this an issue with the barcode getting messed up during the demultiplexing process?

  2. Is there any way to salvage even some of these reads? I’m including my demux.qzv and my table.qzv to show just how many reads are showing up as undefined. It is my single biggest sample, which is a little depressing.

Again, I realize that half of this issue is outside of QIIME2, but if anyone has any information I would be extremely appreciative!

Illumina-demux.qzv (313.4 KB)

table.qzv (1003.4 KB)


Hi @bpscherer,

The best way to get these answers is to contact your sequencing lab directly. I can name a few possibilities below, but this list isn’t exhaustive by any means:

  1. A few (certainly not all) scenarios where you might get undefined reads: a) barcode hopping, b) incorrect basecalling on the index, so that the read fails to map to any sample, c) low quality scores at the barcodes, leading to those reads being discarded, d) contamination/flowcell leakage, e) corrupted data files during processing/transfer

  2. Salvaging is not really possible… If you had self-correcting barcodes and for some reason the demultiplexing process didn’t account for this, you may be able to re-run with the correction. I doubt this is the case; I’m sure the sequencing folks would have done this for you already. Perhaps increasing the error tolerance on the index assignment could rescue some reads, but you need to be really careful, because that could lead to mis-assigning reads to the wrong samples. Again, I think this is something the sequencing facility would have taken into account themselves.
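
If you want a rough idea of how many undefined reads sit within one mismatch of an expected barcode before you talk to the facility, something like the sketch below could help. It is only an illustration, not what the Illumina software or QIIME 2 does internally. It assumes your FASTQ headers end with the index sequence (as in Casava 1.8+ style headers), that you have the expected barcode for each sample from your sample sheet, and it only counts a read as rescuable when exactly one barcode is within the tolerance. The function names, sample IDs, barcodes, and headers here are all made up for the example.

```python
from collections import Counter


def index_from_header(header: str) -> str:
    """Return the last colon-separated field of a FASTQ header line.

    Assumption: the header follows the Casava 1.8+ layout, where the
    index sequence is the final field (e.g. '... 1:N:0:ACGTACGT').
    """
    return header.strip().rsplit(":", 1)[-1]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_sample(index, barcodes, max_mismatches=1):
    """Return a sample ID only if exactly one barcode is close enough.

    Ambiguous indexes (two or more barcodes within tolerance) return None,
    which avoids assigning a read to the wrong sample.
    """
    hits = [
        sample
        for sample, barcode in barcodes.items()
        if len(barcode) == len(index)
        and hamming(index, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def summarize(headers, barcodes, max_mismatches=1):
    """Tally observed indexes and how many reads look rescuable per sample."""
    counts = Counter(index_from_header(h) for h in headers)
    rescued = Counter()
    for index, n in counts.items():
        sample = nearest_sample(index, barcodes, max_mismatches)
        if sample is not None:
            rescued[sample] += n
    return counts, rescued


# Toy data: hypothetical barcodes and header lines from an undefined file.
barcodes = {"sampleA": "ACGTACGT", "sampleB": "TTGCAAGC"}
headers = [
    "@M00001:1:000000000-ABCDE:1:1101:15000:1000 1:N:0:ACGTACGA",
    "@M00001:1:000000000-ABCDE:1:1101:15001:1001 1:N:0:ACGTACGA",
    "@M00001:1:000000000-ABCDE:1:1101:15002:1002 1:N:0:GGGGGGGG",
]
counts, rescued = summarize(headers, barcodes)
print(counts.most_common())
print(dict(rescued))
```

On real data you would feed in every fourth line (the header line) of the undefined forward FASTQ. If the most common indexes are far from every expected barcode, raising the tolerance will not rescue much, and that is useful information to bring to the sequencing facility.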

Best bet is to talk to your facility and see if there’s anything they recommend on their end.