MinION 16S data for diversity analysis?



I have 16S data from a MinION that has already been demultiplexed and basecalled, and I have the taxonomic classification results as well. I now have a taxa id, read id, run id and accuracy for each sample. Is it possible to perform diversity analysis in QIIME 2 using these data? If so, what format should they be imported in?



(Nicholas Bokulich) #2

You will need to obtain a biom-format feature table and import as a FeatureTable[Frequency] artifact. It is not clear to me what the shape of your data is, so I recommend checking out the biom-format documentation to determine how to format your data correctly.
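
For anyone unsure what "shape" means here: a feature table is a feature-by-sample matrix of counts. Below is a minimal sketch, assuming you already have per-sample counts in a Python dict, that writes the classic tab-delimited layout (a `#OTU ID` header row followed by one row per feature). The function name, sample names and taxid feature IDs are hypothetical, made up for illustration.

```python
import csv

def write_classic_table(counts, path):
    """Write {sample: {feature: count}} as a feature-by-sample TSV."""
    samples = sorted(counts)
    features = sorted({f for per_sample in counts.values() for f in per_sample})
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Header row: feature ID column, then one column per sample.
        writer.writerow(["#OTU ID"] + samples)
        for feature in features:
            # Features missing from a sample are written as zero counts.
            writer.writerow([feature] + [counts[s].get(feature, 0) for s in samples])
    return features, samples

# Hypothetical counts: NCBI taxids as feature IDs, two made-up samples.
example_counts = {
    "sampleA": {"562": 120, "1280": 30},
    "sampleB": {"562": 15, "1423": 60},
}
```

A file like this can then, as far as I know, be converted with `biom convert -i table.tsv -o table.biom --table-type="OTU table" --to-hdf5` and imported with `qiime tools import --input-path table.biom --type 'FeatureTable[Frequency]' --input-format BIOMV210Format --output-path table.qza`. Check both commands against the documentation for the versions you have installed.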

Sequences must be in fasta format. Taxonomy must be in tab-delimited format like this:

seq-id <tab> taxonomy

If that is unclear, I recommend exporting example data files from the tutorials to examine their contents and format your data accordingly.
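
As a rough illustration of the taxonomy layout, here is a small sketch that turns a mapping of feature ID to ranked lineage into tab-delimited rows. Two assumptions to verify against an exported tutorial file: that the importer expects a `Feature ID` / `Taxon` header row, and that ranks are joined with `; `. The function name and the example lineage are hypothetical. The important part is that the IDs in the first column match the feature IDs in your feature table.

```python
def taxonomy_rows(lineages):
    """Yield a header line, then one 'feature-id<TAB>lineage' line per feature."""
    yield "Feature ID\tTaxon"  # assumed header; compare with an exported example
    for feature_id, ranks in sorted(lineages.items()):
        yield feature_id + "\t" + "; ".join(ranks)

# Hypothetical lineage keyed by NCBI taxid, for illustration only.
example_lineages = {
    "562": [
        "k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria",
        "o__Enterobacterales", "f__Enterobacteriaceae",
        "g__Escherichia", "s__Escherichia coli",
    ],
}

taxonomy_text = "\n".join(taxonomy_rows(example_lineages)) + "\n"
```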

You can import your fasta sequences like this:

qiime tools import \
  --input-path sequences.fna \
  --output-path sequences.qza \
  --type 'FeatureData[Sequence]'

and tab-delimited taxonomy like this:

qiime tools import \
  --input-path taxonomy.tsv \
  --output-path taxonomy.qza \
  --type 'FeatureData[Taxonomy]'


Thanks for the response. The taxonomic classifications are working great!
However, for phylogenetic diversity analysis I need the feature sequences, which I do not have. My reads are only mapped to NCBI taxids. Is it possible to generate a sequence table from the NCBI taxids?



(Nicholas Bokulich) #4

I think not, unless you are able to extract that information from NCBI.

One possibility would be to pull the sequence for each taxon’s type strain, but that would be labor-intensive and inexact.

I’d recommend simply using non-phylogenetic methods for your analysis if you have no way to get the sequence information.
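
To make "non-phylogenetic" concrete: these metrics need only the counts, not sequences or a tree. The sketch below computes Shannon diversity for one sample and Bray-Curtis dissimilarity between two samples from plain dicts of counts. It illustrates the arithmetic and is not QIIME 2's implementation; the function names and the log base default are my own choices. In QIIME 2 itself, `qiime diversity core-metrics` (the non-phylogenetic counterpart of `core-metrics-phylogenetic`) would be the place to look.

```python
import math

def shannon(counts, base=math.e):
    """Shannon index H = -sum(p_i * log(p_i)) over features with non-zero counts."""
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log(n / total, base)
        for n in counts.values() if n > 0
    )

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / (sum(a) + sum(b))."""
    features = set(a) | set(b)
    numerator = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in features)
    denominator = sum(a.values()) + sum(b.values())
    return numerator / denominator
```

Two samples with no taxa in common get a Bray-Curtis value of 1, and identical samples get 0.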


(Devon O'rourke) #5

Are you talking about the ‘sequencing_summary.txt’ file output from Albacore?
How did you perform basecalling? Did you run Centrifuge for classification?

Post a few lines of the files you have. Assuming you have the demultiplexed fastq files, you can get a .biom file in a few steps.

Also, did you trim the adapters with Porechop (or equivalent)? Did you error correct with Nanopolish (or equivalent)? If not, did you classify the raw reads? That’s not going to be what you want to do if you’re interested in diversity - there’s way, way too much noise.




I demultiplexed with Deepbinner, basecalled with Guppy, and analysed the Guppy fastq files on the EPI2ME platform (16S analysis). I took the CSV files generated by the platform (both the QC and taxa files), merged them, filtered out the reads that were below the threshold, and kept only the columns required for constructing a feature table. (I have a set of Illumina data for the same samples; I exported its feature table to biom and then to a TSV file to inspect it, and used pandas to make my MinION read table look similar.) For the taxonomy table, I mapped each taxid to the NCBI database to pull the full taxonomic lineage and ordered the ranks using Python, so in the end it produced a table similar to the taxonomy table that QIIME 2 generates.

I hope that is understandable.
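
The filtering and counting step described above could look roughly like this with the standard library alone. This is a sketch under assumptions: the column names (`read_id`, `sample`, `taxid`, `accuracy`), the threshold of 80 and the example rows are all hypothetical and are not the actual EPI2ME export layout.

```python
import csv
import io
from collections import Counter, defaultdict

def tally_reads(rows, min_accuracy):
    """Count reads per (sample, taxid), skipping reads below the accuracy cutoff."""
    counts = defaultdict(Counter)
    for row in rows:
        if float(row["accuracy"]) >= min_accuracy:
            counts[row["sample"]][row["taxid"]] += 1
    return counts

# Hypothetical merged QC + classification table, one row per read.
example_csv = """read_id,sample,taxid,accuracy
r1,sampleA,562,92.1
r2,sampleA,562,88.0
r3,sampleA,1280,71.5
r4,sampleB,1423,90.3
"""

table = tally_reads(csv.DictReader(io.StringIO(example_csv)), min_accuracy=80.0)
```

The result is a per-sample count of reads assigned to each taxid, which is the information a taxon-level feature table holds.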


(Devon O'rourke) #7

Using any software from Ryan Wick is a good idea, in my opinion!

The last time I ran EPI2ME was over 8 months ago, and the classifier under the hood was Centrifuge. A quick glance at the Nanopore Community forum doesn’t suggest that anything’s changed, but that was just a cursory search. It might be worth posting your questions on that forum too.

In my humble opinion, there are a few things to consider before trying to shoehorn the Nanopore EPI2ME workflow into QIIME. First, note that EPI2ME is set up for speed, and that’s exactly why it’s using Centrifuge. It’s a read classifier that matches reads against a compressed (FM-index) database - it’s not a global alignment analogous to something like VSEARCH (see their paper for more details). This means you can rapidly classify loads of sequences; that’s great for real-time sequencing when you are shooting for a 30,000-foot view. I’m not so sure it’s what you want if you’re going to calculate alpha or beta diversity though, especially if you haven’t corrected your raw reads.

To further complicate matters, it’s important to note that Centrifuge (and therefore EPI2ME) is not using the database that most QIIME users typically use - it’s not Greengenes, it’s NCBI. Does that matter to you? Could your resulting classifications be different in part because of a database that is perhaps less well curated? Note that you can run Centrifuge on your .fastq files directly without using EPI2ME at all, and you can build whatever database you want for Centrifuge to work with. It might be interesting for you to test how their default NCBI database compares with something like Greengenes. I’d certainly like to know.

One other thing to circle back to: EPI2ME is probably not correcting your reads prior to classification, and this is absolutely something to resolve if that’s the case. It always seemed like the prepackaged workflows in EPI2ME were a few versions behind the standalone software, so I’d suspect that even though you ran the data through Guppy, you could probably improve your read and consensus accuracy with Nanopolish. I can’t say for sure though, because it’s not clear which version of Guppy you’re running - if you can post the specific versions of the software you’ve used, that’ll help. See Ryan’s preprint about basecaller comparisons - you’ll find that Guppy certainly is the way to go if you’re using the most recent version, but the larger improvements in cleaning up the noisy reads can also come from training the basecaller with your own data ahead of time.

Let’s circle back to your original question:

The short answer is that of course you can perform the tests; the question you’re going to wrestle with is whether the results are worth considering if you go about it the EPI2ME way using Centrifuge, or whether you want to take those fastq files and make more of a manual effort to classify things with a different approach. Both this dog study and this sludge paper use MinION 16S data, but both use something other than EPI2ME to get their data classified.

What you’d probably want to do - shout out to @Nicholas_Bokulich here - is to run a 16S experiment with one or several mock communities. Until you have a known community, you’re just guessing at which method is better. One step in that direction is this paper, which did this for the Zymo mock community, but they did full metagenomic sequencing, not just 16S, so you can’t really use it as a benchmark for what you’re doing. But hey, that’s good news - an opportunity for an experiment!
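
Once you do have a mock community, one simple way to score a workflow is to compare the set of taxa it reports against the set you know is there. The sketch below computes precision, recall and F1 on taxon presence only. It ignores abundances, the function name is my own, and the genus names are made up for illustration rather than taken from any real mock community.

```python
def mock_community_scores(expected, observed):
    """Return (precision, recall, F1) for detected taxa versus the known community."""
    expected, observed = set(expected), set(observed)
    true_positives = len(expected & observed)
    precision = true_positives / len(observed) if observed else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if true_positives == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical genera: what was put in the tube versus what a classifier reported.
expected_genera = {"Escherichia", "Bacillus", "Listeria", "Staphylococcus"}
observed_genera = {"Escherichia", "Bacillus", "Shigella"}

precision, recall, f1 = mock_community_scores(expected_genera, observed_genera)
```

Running the same comparison for each workflow (EPI2ME/Centrifuge versus an alternative) on the same mock data gives you something concrete to compare instead of a guess.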

I think you want to check in on the Nanopore forums first before QIIME to get a sense of how to tackle that question. There are hundreds of Nanopore users doing 16S work - connecting with those folk might be your best bet to get help with workflows tackling questions of diversity. A few Twitter folks to consider following: Arwyn Edwards (@arwynedwards), Devin Drown (@ArcticBiology), Mads Albertsen (@MadsAlbertsen85)… there are many others.

Good luck
