Import Guidelines for NextSeq Data?

(benjamin w.) #1

NextSeq data isn’t demultiplexing well, and I'm not sure how to proceed. The barcodes in our MiSeq runs are usually reverse complemented, but in the NextSeq runs they’re on the forward reads. They are 12 base pairs, but not the usual Golay barcodes from the EMP guidelines. I was wondering if someone could point me in the right direction for getting these imported correctly. Thank you very much. Ben
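
If barcode orientation were the only difference between the MiSeq and NextSeq runs, one quick thing to try is reverse-complementing the mapping-file barcodes before demultiplexing. Below is a minimal Python sketch, assuming a QIIME 1-style tab-separated mapping file with a `BarcodeSequence` column; the function names are illustrative and not part of any QIIME API. (QIIME 2's `qiime demux emp-single` and `emp-paired` also accept a `--p-rev-comp-mapping-barcodes` flag for the same purpose.)

```python
import csv
import io

# Complement table covering plain bases and IUPAC ambiguity codes (both cases).
COMPLEMENT = str.maketrans("ACGTRYKMBVDHNacgtrykmbvdhn", "TGCAYRMKVBHDNtgcayrmkvbhdn")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def rev_comp_mapping_barcodes(mapping_tsv: str, column: str = "BarcodeSequence") -> str:
    """Return the mapping file text with the barcode column reverse-complemented.

    Assumption: a QIIME 1-style mapping file whose first row is the header
    (starting with '#SampleID') and whose other '#' lines are comments.
    """
    rows = list(csv.reader(io.StringIO(mapping_tsv), delimiter="\t"))
    header = rows[0]
    idx = header.index(column)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows[1:]:
        # Leave blank and comment lines untouched; flip barcodes on sample rows.
        if row and not row[0].startswith("#"):
            row[idx] = reverse_complement(row[idx])
        writer.writerow(row)
    return out.getvalue()
```

For example, a sample row with barcode `ACGTACGTAAAC` comes back with `GTTTACGTACGT`. If demultiplexing still fails with flipped barcodes, orientation is probably not the whole story.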

Forward read demultiplexed visualization

Paired reads demultiplexed visualization

We formatted the data so that we have multiplexed forward and reverse fastq.gz files from the NextSeq, with the different samples contained within these files. This is the way we processed the EMP MiSeq files, but I think there’s some incompatibility in the way QIIME 2 is reading the NextSeq headers/files.

Praying to the mods, unqueue me.

Thanks mods, may you be blessed with the rains down in Africa

(Mehrbod Estaki) #2

Hi @ben,
Just to get the ball rolling with troubleshooting this.

Is there a specific reason that led you to think this? Can you provide some examples of your reads?

You mention that the barcodes are on your forward reads, but are there any on the reverse reads as well, or only on the forward?
Did you run a dual-index run but only have barcodes in one direction?
Have you checked to see if all your reads are actually in the right orientation and not somehow mixed? This is probably not the case, but if so it could lead to some wonky demultiplexing.

(benjamin w.) #3

Sure. These are stool samples in the links, so read assignments of fewer than 1,000 seem very unlikely.

No barcodes on the reverse reads
Barcodes only on the forward reads
All sequences are processed by the center we send them to, and their results are consistently excellent

Pairing in QIIME 1.9.1 gives me 30,000,000 reads for the entire run (6 plates).
Assignments are similar in QIIME 1.9.1 and QIIME 2.

(benjamin w.) #5

So, I’ve also tried running the NextSeq data in QIIME 1.9.1. Here are the split_libraries_fastq.py results:

modifiers:
-barcode 12

Quality filter results
Total number of input sequences: 29073029
Barcode not in mapping file: 28580648
Read too short after quality truncation: 19635
Count of N characters exceeds limit: 144
Illumina quality digit = 0: 0
Barcode errors exceed max: 0
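
These counts already show where the reads went. A quick Python tally, assuming the rejection categories in this log are mutually exclusive so that whatever remains is what was written out:

```python
# Counts copied from the split_libraries_fastq.py "Quality filter results" above.
total_input = 29_073_029
rejected = {
    "Barcode not in mapping file": 28_580_648,
    "Read too short after quality truncation": 19_635,
    "Count of N characters exceeds limit": 144,
    "Illumina quality digit = 0": 0,
    "Barcode errors exceed max": 0,
}

# Assumption: each rejection reason is counted once per read, so the remainder passed.
passed = total_input - sum(rejected.values())
print(f"Passed filtering: {passed:,} ({passed / total_input:.2%})")
for reason, n in rejected.items():
    print(f"{reason}: {n:,} ({n / total_input:.2%})")
```

With these numbers, 472,602 reads (about 1.6% of the input) survive, while barcodes missing from the mapping file alone account for about 98% of all input reads, which points squarely at the barcodes rather than at read quality.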

Here are the top samples:

|BAR0188.stool.m02|11538|
|BAR0202.stool.baseline|11002|
|BAR0436.stool.baseline|8679|
|BAR0440.stool.baseline|8464|

(Mehrbod Estaki) #6

Thanks for the update, @ben. It’s obvious that you should be getting more read assignments and that the issue is at the demultiplexing step.

Did you happen to check for possible mixed orientation of your reads, as I mentioned above? Could you provide a subsample of your reads?
If you ran a dual-index run but only have barcodes in one direction, you may run into an issue as described here. That may be something you can also discuss with your sequencing facility if unsure.

(benjamin w.) #7

Agreed. I feel like this problem is similar to what that person on Biostars is having; I need to look into this further.

(Colin J Brislawn) #8

Did the QIIME 1 demultiplexing work well? If you are happy with it, you could demultiplex with QIIME 1 and then import into QIIME 2 for downstream analysis. https://docs.qiime2.org/2019.1/tutorials/importing/#sequences-without-quality-information-i-e-fasta
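
Before importing QIIME 1 output into QIIME 2, it can help to confirm how many reads each sample received. QIIME 1 writes demultiplexed reads to `seqs.fna` with headers of the form `>SampleID_N ...`, so a minimal per-sample tally could look like the sketch below (plain Python, not a QIIME command; `reads_per_sample` is a hypothetical helper):

```python
from collections import Counter


def reads_per_sample(fasta_lines) -> Counter:
    """Count reads per sample in QIIME 1 demultiplexed output (seqs.fna).

    Assumes headers like '>SampleID_0 ...', where the sample ID is everything
    before the last underscore of the first whitespace-delimited token.
    """
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            read_id = line[1:].split(maxsplit=1)[0]
            sample_id = read_id.rsplit("_", 1)[0]
            counts[sample_id] += 1
    return counts
```

Usage, streaming the file so a large `seqs.fna` never has to fit in memory: `with open("seqs.fna") as fh: print(reads_per_sample(fh).most_common(10))`.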

Just a thought! 🤷

Colin

(benjamin w.) #9

Hi Colin, thanks. No, it didn’t demultiplex correctly either. Using the read-joining command, I ended up with an approximately 16 GB file of joined reads and a 23 GB file of unjoined forward and reverse reads.

From this, I tried to demultiplex, which is what you see above:

Quality filter results
Total number of input sequences: 29073029
Barcode not in mapping file: 28580648
Read too short after quality truncation: 19635
Count of N characters exceeds limit: 144
Illumina quality digit = 0: 0
Barcode errors exceed max: 0

From the NextSeq data I was only able to get about 30,000,000 joined sequences, of which only a small fraction (roughly 470,000, going by the counts above) were demultiplexed correctly.

Ben

(Colin J Brislawn) #10

Hey Ben,

This seems to be a big problem:

Barcode not in mapping file: 28580648

Do you know why these barcodes are missing?

Also, do you know why the reads aren’t joining? I join my reads with vsearch and it tells me the reasons reads fail to join.

Give us more data so we can look for clues! 🧐

Colin

(benjamin w.) #11

Agreed, and yes, I saw that. I'm not sure why ~30,000,000 sequences have barcodes that aren't in the mapping file. I can provide the "cat" output from the three fastq files and the barcode file if that would help.

There’s also this interesting bit: https://www.biostars.org/p/317492/, where the index orientation seems to depend on the reverse complement of the adaptor. This is a rabbit hole; we’re speaking with our sequencing core. Ben
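
One way to test the orientation question from the Biostars thread is to compare how many observed barcodes match the mapping file as-is versus reverse-complemented. A minimal sketch, assuming you already have a `Counter` of observed barcode sequences (for example, from tallying the index reads) and a list of the expected 12 bp barcodes; the helper names are hypothetical:

```python
from collections import Counter

# Complement table for plain barcode bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a barcode sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def barcode_match_rates(observed: Counter, mapping_barcodes) -> tuple:
    """Return (fraction matching as-is, fraction matching reverse-complemented).

    `observed` maps each observed barcode sequence to its read count.
    """
    expected = {b.upper() for b in mapping_barcodes}
    expected_rc = {reverse_complement(b) for b in expected}
    total = sum(observed.values())
    if total == 0:
        return 0.0, 0.0
    as_is = sum(n for bc, n in observed.items() if bc.upper() in expected)
    rc = sum(n for bc, n in observed.items() if bc.upper() in expected_rc)
    return as_is / total, rc / total
```

If one of the two fractions is high, flipping the barcodes should fix demultiplexing; if both are low, the problem more likely lies in the index read itself than in its orientation.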

(benjamin w.) #12

Small update: we solved the problem. For some reason, the primers, which work excellently on the MiSeq, are not working on the NextSeq. We are contacting Illumina to address the issue. Thanks for your help.

I worked this out by "cat"ing the seq.fna file to make sure that barcodes were present, then "cat"ing the barcode file, where I found a lot of trash barcodes. I asked the core to run bcl2fastq and look at the most frequent barcodes, and it turns out there were a lot of issues with their barcodes. Essentially, we think the index read failed.
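
The barcode check described above can be sketched in Python: tally the most common sequences in the barcode (index) FASTQ and see whether they resemble the expected barcodes. This assumes a plain or gzip-compressed FASTQ with standard 4-line records; `top_barcodes` and the file name in the usage line are hypothetical.

```python
import gzip
from collections import Counter
from itertools import islice


def top_barcodes(path, n=20, max_reads=None):
    """Return the n most common sequences in a (optionally gzipped) barcode FASTQ."""
    counts = Counter()
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Each FASTQ record is 4 lines; the sequence is the 2nd line of each record.
        seq_lines = islice(handle, 1, None, 4)
        if max_reads is not None:
            # Optionally stop early to sample only the first max_reads records.
            seq_lines = islice(seq_lines, max_reads)
        for line in seq_lines:
            counts[line.strip()] += 1
    return counts.most_common(n)
```

Usage: `for bc, count in top_barcodes("barcodes.fastq.gz", n=10, max_reads=1_000_000): print(bc, count)`. Runs of `N`s, poly-G stretches, or sequences absent from the mapping file at the top of this list are a sign that the index read itself went wrong.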

(Mehrbod Estaki) #13

Thanks for updating us, @ben. Glad you figured it out, and hopefully it’ll all be sorted on your end as well.

(benjamin w.) #14

Thank you. We contacted Illumina, and there are slight differences between the NextSeq and MiSeq protocols that may lead to variations; these are being worked out now. I guess this is a cautionary tale for anyone switching between platforms. Illumina has been excellent in helping, and our core is great, so hopefully we will get to the bottom of this. Ben
