Can we process Nanopore data using QIIME 2?

Anil_Kumar_Chauhan · August 18, 2020, 6:37am

devonorourke · August 18, 2020, 4:18pm

Define "process".
What type of Nanopore data? Using what chemistry and flowcell?

Anil_Kumar_Chauhan · August 19, 2020, 7:41am

By "process" I mean: can we gain insight into microbial diversity from 16S rRNA metagenomic data obtained by Nanopore MinION sequencing?

devonorourke · August 19, 2020, 10:44am

I mean technical processing. The technical steps you need to complete depend on how your data were generated, and you haven't answered those questions yet (a MinION isn't a flow cell; it's the sequencing hardware):

What specific kit chemistry (which version of which kit)?
What flow cell type (R9.4.1? the new R10.3?)?
Are you basecalling with the latest Guppy? Which version? Or are you using Bonito instead?

There are several sequence-processing steps that, to my knowledge, QIIME does not support for ONT data.

You likely need to do these on your own, outside of QIIME:

basecalling (fast5 to fastq)
binning reads by barcode (if multiplexed)
removing adapters and barcodes
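
To make the last two steps concrete, here is a toy Python sketch of barcode binning and trimming on already-basecalled FASTQ reads (basecalling itself is not shown). It assumes exact, error-free barcode matches within the first 100 bases, and the barcode `BC01` / `CCCCGGGG` is made up for illustration. Real ONT barcode and adapter sequences are kit-specific and Nanopore reads carry frequent errors, so in practice you would use a dedicated demultiplexing/trimming tool rather than anything like this.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# (read id, sequence, per-base quality string)
Record = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from plain 4-line FASTQ (no wrapped sequence lines)."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header[1:].split()[0], seq, qual


def bin_and_trim(records: Iterable[Record], barcodes: Dict[str, str],
                 window: int = 100) -> Dict[str, List[Record]]:
    """Assign each read to the first barcode found (exact match only) within
    the first `window` bases and cut off everything up to the end of that
    barcode, which also drops any adapter sequence in front of it.
    Reads without a barcode hit go to 'unclassified', untrimmed."""
    bins: Dict[str, List[Record]] = defaultdict(list)
    for read_id, seq, qual in records:
        for name, bc in barcodes.items():
            pos = seq.find(bc, 0, window)
            if pos != -1:
                cut = pos + len(bc)
                # trim sequence and quality together so they stay in sync
                bins[name].append((read_id, seq[cut:], qual[cut:]))
                break
        else:
            bins["unclassified"].append((read_id, seq, qual))
    return dict(bins)


# Hypothetical input: r1 carries the made-up barcode, r2 does not.
fastq_lines = [
    "@r1 runid=x", "AAAACCCCGGGGTTTTACGT", "+", "I" * 20,
    "@r2", "TTTTTTTTACGTACGT", "+", "I" * 16,
]
bins = bin_and_trim(parse_fastq(fastq_lines), {"BC01": "CCCCGGGG"})
```

Here `r1` ends up in `bins["BC01"]` trimmed to `TTTTACGT`, and `r2` lands in `bins["unclassified"]`. Only the 5' end is handled; a real pipeline would also check the 3' end and tolerate mismatches.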

These data could then be imported into QIIME, but none of the existing denoising tools support ONT data. You could cluster the reads, I suppose, but depending on your coverage, and whether or not you have matching Illumina data, you might instead consider polishing the reads to identify consensus sequence variants. Polishing tools do not exist in QIIME, but they are well documented on the Nanopore Community forum.
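
As a rough illustration of what clustering means here, the sketch below does greedy, abundance-sorted centroid clustering: unique sequences are visited from most to least abundant, and each joins the first existing centroid within an identity threshold or else becomes a new centroid. Identity is assumed to be 1 minus edit distance divided by the longer length. This is a toy in pure Python with quadratic cost, not how any particular QIIME plugin is implemented, and with raw Nanopore error rates a fixed threshold will merge or split true variants in ways you may not want.

```python
from collections import Counter
from typing import Dict, Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a rolling single-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Assumed identity definition: 1 - edit distance / longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_cluster(seqs: Iterable[str], threshold: float = 0.97) -> Dict[str, int]:
    """Return {centroid sequence: total number of reads assigned to it}."""
    counts = Counter(seqs)
    # most abundant first; ties broken alphabetically for determinism
    order = sorted(counts, key=lambda s: (-counts[s], s))
    centroids: List[str] = []
    clusters: Dict[str, int] = {}
    for s in order:
        for c in centroids:
            if identity(s, c) >= threshold:
                clusters[c] += counts[s]
                break
        else:
            centroids.append(s)
            clusters[s] = counts[s]
    return clusters


# Hypothetical reads: one likely error variant of an abundant sequence.
reads = ["ACGTACGTAC"] * 3 + ["ACGTACGTAA"] + ["TTTTTTTTTT"] * 2
clusters = greedy_cluster(reads, threshold=0.85)
```

With these inputs the single-mismatch read `ACGTACGTAA` is absorbed into the `ACGTACGTAC` cluster (4 reads total), while `TTTTTTTTTT` forms its own cluster of 2.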

Perhaps you have a set of sequence variants you want to classify. QIIME could support this, but how would you take advantage of the longer Nanopore reads? I don't know of any full-length 16S reference set available internally in QIIME, so you'd likely need to build your own classifier/database first. Fortunately, the tools to do that exist in QIIME. Alternatively, you could trim your long reads to an existing 16S region (say, V4) and then classify them using databases like SILVA or Greengenes that are already available.
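
Trimming a long read to the V4 region amounts to finding the two primer sites and keeping the sequence between them. A minimal sketch, assuming the commonly used 515F (`GTGYCAGCMGCCGCGGTAA`) / 806R (`GGACTACNVGGGTWTCTAAT`) pair and exact matching of the IUPAC-degenerate primers; it checks both read orientations, since Nanopore reads can come from either strand. Exact matching will miss many primer sites in error-prone reads, so treat this as an illustration rather than a working trimmer.

```python
import re
from typing import Optional

# IUPAC nucleotide codes -> regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# complement table that also handles the degenerate codes
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Assumed primer pair (515F / 806R); substitute the primers you actually used.
FWD_515F = "GTGYCAGCMGCCGCGGTAA"
REV_806R = "GGACTACNVGGGTWTCTAAT"


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> "re.Pattern[str]":
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def extract_amplicon(read: str, fwd: str = FWD_515F,
                     rev: str = REV_806R) -> Optional[str]:
    """Return the sequence between the primers (primers excluded), trying
    both orientations of the read; None if either primer is not found as
    an exact, degeneracy-aware match."""
    fwd_re = primer_regex(fwd)
    # on the forward strand the reverse primer appears reverse-complemented
    rev_re = primer_regex(revcomp(rev))
    for strand in (read, revcomp(read)):
        f = fwd_re.search(strand)
        if f is None:
            continue
        r = rev_re.search(strand, f.end())
        if r is not None:
            return strand[f.end():r.start()]
    return None


# Synthetic read: flank + concrete 515F + insert + concrete revcomp(806R) + flank
insert = "TTTTGGGGCCCCAAAA"
read = "ACGTACGT" + "GTGCCAGCAGCCGCGGTAA" + insert + "ATTAGAAACCCCAGTAGTCC" + "TTTT"
```

`extract_amplicon(read)` and `extract_amplicon(revcomp(read))` both return `insert`, which is the point of checking both strands.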

If you do all of this outside of QIIME, you can certainly import the frequency table, taxonomy, and sequences, and you should then have no difficulty using the available diversity tools. However, you'd still need to consider whether the tools themselves suit the data. There is no single answer to that, because it depends entirely on what you did to your raw data to get it to that stage of analysis.
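
For reference, per-sample alpha-diversity metrics computed on an imported frequency table are simple functions of the counts. A sketch of two of them (observed features and Shannon entropy) on a hypothetical two-sample table; implementations differ in the logarithm base, so the base is a parameter here. These numbers inherit every upstream choice that produced the table: if sequencing errors inflated the number of features, observed richness goes up with them.

```python
import math
from typing import Dict

# Hypothetical frequency table: sample -> {feature id -> read count}
table: Dict[str, Dict[str, int]] = {
    "sample_A": {"feat1": 5, "feat2": 5},
    "sample_B": {"feat1": 10, "feat2": 0},
}


def observed_features(counts: Dict[str, int]) -> int:
    """Number of features with a nonzero count in this sample."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts: Dict[str, int], base: float = math.e) -> float:
    """Shannon entropy H = -sum(p_i * log(p_i)) over nonzero features."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


# (observed features, Shannon entropy in bits) per sample
alpha = {s: (observed_features(c), shannon(c, base=2)) for s, c in table.items()}
```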

I'd recommend the Nanopore Community forum as a starting point if you have raw data but no training in how to process these reads. Loads of people there post about the most up-to-date issues with a product that is much more prone to technical change than what most QIIME users deal with. Still, as you work through these tasks, it would be great to hear about what you accomplish on this forum too! More tools could be built to work in QIIME, but that will require interested researchers like yourself to identify the current gaps and put forward solutions.

Good luck

Anil_Kumar_Chauhan · August 20, 2020, 7:26am

OK, thank you for the support.

Nicholas_Bokulich · August 20, 2020, 7:31am

No reference sets are internal to QIIME 2, but we do provide pre-trained classifiers for some commonly used 16S reference databases (including full-length SILVA SSU) on docs.qiime2.org, so yes, pre-trained classifiers for full-length 16S are available. And of course QIIME 2 can work with all sorts of user-provided data (e.g., other databases or marker genes), so it is flexible if the pre-trained classifiers are not quite what you are after.

system · September 20, 2020, 1:31pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.