IonTorrent 16S Metagenomic Kit - considerations and problems

Hi you all! How are you?

I’ve been working with IonTorrent 16S Metagenomic Kit for a month now and, even if I have tried to find the answer to my questions among the forum, there are many thing I have some doubts about.

As you will probably know, the Kit amplifies 7 regions using two different primers (Pool 1: V2, V4, V8; and Pool 2: V3, V6–7, V9). Once the sequentiation is done, we get one .BAM file for each sample (so the data is already demultiplexed).

Searching for a pipeline to follow, I came up with the following one, described by @cjone228:

  1. Import sequences as SingleEndFastqManifestPhred33V2 format
  2. DADA2 ( denoise-pyro )
  • assigns ASVs

  • concern is that alpha diversity is artificially inflated due to multiple V regions

    qiime dada2 denoise-pyro
    –i-demultiplexed-seqs ./samples.qza
    –p-trunc-len 0
    –p-trim-left 15
    –p-n-threads 2
    –o-representative-sequences ./Dada2_output/rep-seqs-dada2.qza
    –o-table ./Dada2_output/table-dada2.qza
    –o-denoising-stats ./Dada2_output/stats-dada2.qza

*in order to visualize the DADA2 output I used qiime metadata tabulate for visualizing stats-dada2.qza, qiime feature-table summarize to visualize table-dada2.qza and qiime feature-table tabulate-seqs to visualize rep-seqs-dada2.qza.

  1. Closed reference OTU clustering ( qiime vsearch cluster-features-closed-reference )
  • this is in an attempt to deflate artificially inflated alpha diversity by “collapsing” multiple amplicons different variable regions corresponding to the same feature
qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path ./gg_13_8_otus/rep_set/99_otus.fasta \
--output-path ./ClosedReferenceOTU/99_otus_gg.qza

qiime vsearch cluster-features-closed-reference \
--i-table ./Dada2_output/table.qza \
--i-sequences ./Dada2_output/rep-seqs.qza \
--i-reference-sequences ./ClosedReferenceOTU/99_otus_gg.qza \
--p-perc-identity 0.99 \
--o-clustered-table ./ClosedReferenceOTU/table-cr-99.qza \
--o-clustered-sequences ./ClosedReferenceOTU/rep-seqs-cr-99.qza \
--o-unmatched-sequences ./ClosedReferenceOTU/unmatched-cr-99.qza

qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path ./gg_13_8_otus/taxonomy/99_otu_taxonomy.txt \
--output-path ./ClosedReferenceOTU/ref-99-taxonomy.qza

qiime taxa barplot \
--i-table ./ClosedReferenceOTU/table-cr-99.qza \
--i-taxonomy ./ClosedReferenceOTU/ref-99-taxonomy.qza \
--m-metadata-file ./samples-metadata.tsv \
--o-visualization ./ClosedReferenceOTU/taxa-bar-plots-cr-99.qzv

4 Diagostic alpha rarefaction curve

  • determine sampling depth
qiime tools import \
--input-path ./gg_13_8_otus/trees/99_otus.tree \
--output-path ./99_otus_rooted-tree.qza \
--type 'Phylogeny[Rooted]'

qiime diversity alpha-rarefaction \
  --i-table  ./ClosedReferenceOTU/table-cr-99.qza  \
  --i-phylogeny ./99_otus_rooted-tree.qza \
  --p-max-depth 70000 \
  --m-metadata-file samples-metadata.tsv \
  --o-visualization Alpha-rarefaction/alpha-rarefaction.qzv
  1. Download corresponding phylogenetic tree (rooted) and import
  2. Core metrics analysis
  3. Taxonomic classification using vsearch ( qiime feature-classifier classify-consensus-vsearch )
mkdir Taxonomic-Analysis

qiime feature-classifier classify-consensus-vsearch \ 
 --i-query ./Dada2_output/rep-seqs.qza \ 
 --i-reference-reads ./ClosedReferenceOTU/99_otus_gg.qza \ 
 --i-reference-taxonomy ./ClosedReferenceOTU/ref-99-taxonomy.qza \  
 --o-classification ./Taxonomic-Analysis/vsearch_gg_classification.qza 

**Here I get the following problems:

       There were some problems with the command: 

(1/5) Missing option ‘–i-query’.
(2/5) Missing option ‘–i-reference-reads’.
(3/5) Missing option ‘–i-reference-taxonomy’.
(4/5) Missing option ‘–o-classification’. ("–output-dir" may also be used)
(5/5) Got unexpected extra argument ( )

Besides this final error, I have some other doubts that make me think I’m not doing the analysis in a correct way:

  1. My FastaQ files have the following structure:

[email protected]?B6;<6<<<<<[email protected][email protected]@;;;6,666;;550555554;4/;6;…/-)-353773C?:;>C>A;800/+/).—)–
@[email protected]?CDEA?@DC@@@CCA<<//.+0.-)–)-)-36-0+.88)…8C??1>=<<+/±-5)–)[email protected]<9.44;;4;40<[email protected]<@C

How can I know if they contain barcodes or primers sequences? Is it possible to remove them? Is it completely necessary to remove them if I use this pipeline? (I read something that with dada2 pyro it was not necessary)

  1. Which is the difference between the qiime2 vsearch cluster-features-closed-reference and qiime feature-classifier classify-consensus-vsearch ? Sorry if the question is a fool one, but I am really new with this kind of analysis… ! :frowning:

  2. I have performed the qiime2 vsearch cluster-features-closed-reference and now I don’t know what we are supposed to do with the unmatched sequences . When I saw the taxa-bar-plot of my sequences I was really surprised at first, because I didn’t get any unassigned %, but this is not actually true, is it? Because that taxa-bar-plot doesn’t take into account the unmatched sequences, it just separates them. Then, which is the next step to be able to characterize that sequences? Or do we just “forget” about them for the taxonomy?

  3. Should be better to separate the sequences based on their 16S region? I was thinking that maybe my high % of unmapped sequences is due to having sequences from 7 regions and mapping them with Greengenes reference database that is focused only in region V4… I wonder if it would be a better idea to take just my V4 sequences and remake the analysis with them. If the answer is yes… How can I do that separation? (I read something about this in How to separate hypervariable regions sequenced with 16s Metagenomics Ion torrent , but I don’t know how to do ir exactly.

I know there’s so much information, doubts and problems here, but as I said at the beginning, I am really new in this kind of analysis and I have been using qiime2 just for a month now…

Thank you so much in advanced :slight_smile:

Best regards!

1 Like

A post was split to a new topic: Command “missing option” error

You ask many very valid questions here, @MiriamGorostidi , and I have been hesitant to reply because I can’t answer most of them. In the future, you might consider breaking separate questions out into separate posts, just to reduce the complexity of each post.

For now, I’ve extracted one Technical Support question from your topic above, to be discussed in a separate topic. I @ mentioned you there, so we can keep this thread clean.

The only other thing I can contribute here are some resources on question 2. You’ve likely read the docs already, but the feature-classifier paper might shed some light on your question #2.

With my limited experience, that’s the best I can do here. Anyone with more relevant experience is encouraged to chip in.

There’s many question about this type of data from ion torrent metagenomic kit.
Question 1: Raw sequences contain primers and they are about the 20 first and last bases of each sequence (with variants). Primers are not informed but it is possible to deduct them from raw sequences dataset. You can then separate each region with cutadapt.

The bigest question remaining for me is how to integrate the dada2 preprocessed dataset from each variable region? (like question question 4. @MiriamGorostidi) sum abondance of the same taxa taken from different regions seems not norrect to me. or the only way is to analyze each region separately?
I have read this interesting dicussion of @cjone228 but no real answer to this part.

1 Like