Nanopore Long Read Demultiplexed Full 16S rRNA Seq

Hello, I'm new here and have a few questions. The lab I'm working in doesn't have a bioinformatician, so I've taken on that role and have been learning on the fly. I did the sequencing using the 16S Barcoding Kit 24 V14 (SQK-16S114.24) protocol and the MinION MK1C sequencing device. I selected the super high accuracy basecaller, demultiplexed the reads, and trimmed the adapters/primers using guppy (v6.3.8). After that, I filtered based on length and quality using chopper. Everything up to here was fine. Now, how do I process everything in QIIME2? I used the "SampleData[SequencesWithQuality]" manifest import method and analyzed the data that way. I got my quality plot and taxonomy plot, but I don't think this was the right method, as this is the error I get when I open the quality plot page in QIIME: "Danger: Some of the forward PHRED quality values are out of range. This is likely because an incorrect PHRED offset was chosen on import of your raw data."

Does anybody have any suggestions on what I could do so I avoid that error and get accurate data? Another thing I wanted to do was truncate the noise at the beginning and past bp 1550, as there's lots of noise shown on the quality plot. I tried dada2, but it doesn't work for nanopore data.
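One way to narrow down the PHRED warning is to look at the raw quality characters in one of the FASTQ files. The sketch below is only an illustration: the function names are made up for this example, and it assumes plain four-line FASTQ records and a Phred+33 encoding. Any quality character below ASCII 64 rules out Phred+64, so if the minimum sits in the 30s the offset chosen at import is probably not the problem, and unusually high decoded scores would be the next thing to check.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def quality_char_range(path, max_reads=10000):
    """Return (min, max) ASCII codes seen on the quality lines.

    Assumes four lines per record (header, sequence, '+', quality).
    Only the first max_reads records are inspected.
    """
    lowest, highest = None, None
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 != 3:  # the quality string is the 4th line of a record
                continue
            codes = [ord(c) for c in line.rstrip("\n")]
            if not codes:
                continue
            lo, hi = min(codes), max(codes)
            lowest = lo if lowest is None else min(lowest, lo)
            highest = hi if highest is None else max(highest, hi)
    return lowest, highest


def phred_range(path, offset=33, max_reads=10000):
    """Decode the ASCII range into Phred scores for the assumed offset."""
    lo, hi = quality_char_range(path, max_reads)
    if lo is None:
        return None  # no quality lines found
    return lo - offset, hi - offset
```

Calling `phred_range("barcode01.fastq.gz")` (a hypothetical file name) returns the lowest and highest decoded score, which can then be compared against the range the quality plot expects.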

Any help would be greatly appreciated.

Hello Malek,

Welcome to the forums! :qiime2:

I found this thread, which has a lot of similar questions.

One key issue is that Nanopore produces long, low-quality, variable-length reads.

Variable-length reads break most assumptions about amplicons :upside_down_face:

Let us know what you try next!
Colin

So for me, I'm looking at the full length of the 16S rRNA gene (V1-V9). We're looking at animal feces as well as a few other things. Attached is a picture of the quality plot for one of the barcodes I had (10 total). The quality score is high, so I'm not too concerned about low-quality reads. Species-level classification is a finer resolution than what I'm looking for; I'm aiming to create taxonomy plots at the family level (like in the 2nd photo). I just want to trim off everything from 0-80 bp and everything after 1550 bp. I want the pipeline to not give me the PHRED error message I got. Finally, if I could get an ASV or OTU table, that would be beyond great.

P.S. I've looked at the MetONTIIME package. Would I insert my raw fastq files (directly from the MK1C) and start the analysis from the beginning? I also tried to follow the package, but it's a bit confusing.

What I've done so far:

  1. Files from MK1C (demultiplexed, super high accuracy option, adapter/primer trimmed)
  2. Filtered using chopper (1200-2000bp, Q=13)
  3. Imported into QIIME2, classifier was GreenGenes
  4. Exported taxonomy plots at family level
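
As a rough illustration of step 2 plus the 80-1550 bp crop mentioned above, here is a minimal Python sketch. It is not how chopper is implemented: the function names are invented for this example, Phred+33 is assumed, and the mean quality is computed by averaging error probabilities, which is an assumption about how the filter defines it.

```python
import math


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_quality(qual, offset=33):
    """Mean Phred score, averaging error probabilities rather than scores."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_and_crop(records, min_len=1200, max_len=2000, min_q=13,
                    start=80, end=1550):
    """Keep reads within the length and quality limits, then crop them.

    The defaults mirror the numbers quoted in the thread; the crop keeps
    bases start..end-1 of each surviving read (0-based).
    """
    for header, seq, qual in records:
        if not (min_len <= len(seq) <= max_len):
            continue
        if mean_quality(qual) < min_q:
            continue
        yield header, seq[start:end], qual[start:end]


def write_fastq(records, handle):
    """Write (header, sequence, quality) tuples back out as FASTQ."""
    for header, seq, qual in records:
        handle.write(f"{header}\n{seq}\n+\n{qual}\n")
```

Because the reads vary in length, a fixed crop like this does not guarantee that position 1550 lands on the same part of the gene in every read, so the result is worth checking on a few reads before relying on it.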


[Image: quality plot for Barcode 1]

There is lots to unpack here!

To start with, I'm not the developer of MetONTIIME, though I have built genomics pipelines.

If you try the MetONTIIME pipeline, it looks like it does work with fastq files:

  • concatenateFastq: in case workDir is the output directory generated by MinKNOW, this process concatenates all fastq files corresponding to each barcode (in workDir/barcode) and compresses them to fastq.gz; if workDir already contains fastq.gz files for each barcode, set the process to "false".

Does the MK1C make that file?

Thank you for showing the quality scores. Let's compare this to 16S V4 from an Illumina machine.

Your quality looks great, for Nanopore. It's still a very different machine.

I'm hesitant to comment on a pipeline for full-length amplicons, as I've not worked with Nanopore data before.

I'm glad you got the taxonomy assignment with GreenGenes working!
Does that match your expected community composition?

So the MK1C does separate all the fastq files by barcode, and each barcode gets its own folder. I then went ahead and concatenated all the files within each barcode folder, so I had 10 files total.
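
For anyone following along, the per-barcode concatenation can be sketched in Python as below. This is an illustration only: the function names and directory layout are hypothetical, and the manifest header (`sample-id`, `absolute-filepath`) reflects my understanding of QIIME 2's single-end V2 manifest format, so check it against the importing documentation before use.

```python
import gzip
import shutil
from pathlib import Path


def concatenate_barcode(barcode_dir, out_dir):
    """Merge every FASTQ file in one barcode folder into <barcode>.fastq.gz.

    Assumes each input file ends with a newline, so records from
    consecutive files do not run together.
    """
    barcode_dir = Path(barcode_dir)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{barcode_dir.name}.fastq.gz"
    parts = sorted(
        p for p in barcode_dir.iterdir()
        if p.name.endswith((".fastq", ".fastq.gz"))
    )
    with gzip.open(out_path, "wb") as out:
        for part in parts:
            # Read compressed and uncompressed inputs the same way
            opener = gzip.open if part.suffix == ".gz" else open
            with opener(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return out_path


def write_manifest(fastq_paths, manifest_path):
    """Write a tab-separated manifest with one row per merged file.

    The sample ID is taken from the file name, e.g. barcode01.fastq.gz
    becomes barcode01.
    """
    with open(manifest_path, "w") as fh:
        fh.write("sample-id\tabsolute-filepath\n")
        for p in fastq_paths:
            p = Path(p).resolve()
            sample_id = p.name[: -len(".fastq.gz")]
            fh.write(f"{sample_id}\t{p}\n")
```

Looping `concatenate_barcode` over the barcode folders and passing the returned paths to `write_manifest` gives one file per barcode plus a manifest listing them.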

It matches what I expected and read about, but I just want to make sure it's in the right amounts (relative frequency).

I'll take a deeper look at MetONTIIME. I'll have to see how to get the metontiime2.conf configuration file and set the desired options.

Why not just run wf-16s - an EPI2ME workflow that outputs abundance info and is built specifically to handle the output data from SQK-16S114.24?

I've never used epi2me before, and only recently did I find out about it. I got almost everything I wanted to work on QIIME, so I thought to stick with it. Have you used epi2me, especially the workflow you recommended to me? Is it difficult to use?

I can appreciate that my very short response lacks any helpful description of how to use an alternative analysis tool, but that was intentional. I tend not to judge whether a piece of software is easy; like a hammer, even the simplest tool can be difficult for some, as challenges can come from physical, economic, or other constraints. You might find EPI2ME is easy though! And that’s the whole point: that software is supposed to simplify a user’s experience in getting from a sample to an answer.

Nevertheless, I don’t want to turn this thread and this forum into a Nanopore forum (they have their own community and support pages for that). I merely wanted to suggest that, since you’ve already created Nanopore data, you might consider analyzing it with their tools, given that your specific analytical aims match a tool they have built. To my knowledge, there is no pipeline yet built to seamlessly integrate metabarcoding Nanopore reads in QIIME, but I could be wrong and would love to learn if such methods exist.

Good luck :slight_smile:
