Hi you all! How are you?
I've been working with IonTorrent 16S Metagenomic Kit for a month now and, even if I have tried to find the answer to my questions among the forum, there are many thing I have some doubts about.
As you will probably know, the Kit amplifies 7 regions using two different primers (Pool 1: V2, V4, V8; and Pool 2: V3, V6–7, V9). Once the sequentiation is done, we get one .BAM file for each sample (so the data is already demultiplexed).
Searching for a pipeline to follow, I came up with the following one, described by @cjone228:
- Import sequences as
SingleEndFastqManifestPhred33V2
format - DADA2 (
denoise-pyro
)
-
assigns ASVs
-
concern is that alpha diversity is artificially inflated due to multiple V regions
qiime dada2 denoise-pyro
--i-demultiplexed-seqs ./samples.qza
--p-trunc-len 0
--p-trim-left 15
--p-n-threads 2
--o-representative-sequences ./Dada2_output/rep-seqs-dada2.qza
--o-table ./Dada2_output/table-dada2.qza
--o-denoising-stats ./Dada2_output/stats-dada2.qza
--verbose
*in order to visualize the DADA2 output I used qiime metadata tabulate
for visualizing stats-dada2.qza
, qiime feature-table summarize
to visualize table-dada2.qza
and qiime feature-table tabulate-seqs
to visualize rep-seqs-dada2.qza
.
- Closed reference OTU clustering (
qiime vsearch cluster-features-closed-reference
)
- this is in an attempt to deflate artificially inflated alpha diversity by “collapsing” multiple amplicons different variable regions corresponding to the same feature
qiime tools import \ --type 'FeatureData[Sequence]' \ --input-path ./gg_13_8_otus/rep_set/99_otus.fasta \ --output-path ./ClosedReferenceOTU/99_otus_gg.qza qiime vsearch cluster-features-closed-reference \ --i-table ./Dada2_output/table.qza \ --i-sequences ./Dada2_output/rep-seqs.qza \ --i-reference-sequences ./ClosedReferenceOTU/99_otus_gg.qza \ --p-perc-identity 0.99 \ --o-clustered-table ./ClosedReferenceOTU/table-cr-99.qza \ --o-clustered-sequences ./ClosedReferenceOTU/rep-seqs-cr-99.qza \ --o-unmatched-sequences ./ClosedReferenceOTU/unmatched-cr-99.qza qiime tools import \ --type 'FeatureData[Taxonomy]' \ --input-format HeaderlessTSVTaxonomyFormat \ --input-path ./gg_13_8_otus/taxonomy/99_otu_taxonomy.txt \ --output-path ./ClosedReferenceOTU/ref-99-taxonomy.qza qiime taxa barplot \ --i-table ./ClosedReferenceOTU/table-cr-99.qza \ --i-taxonomy ./ClosedReferenceOTU/ref-99-taxonomy.qza \ --m-metadata-file ./samples-metadata.tsv \ --o-visualization ./ClosedReferenceOTU/taxa-bar-plots-cr-99.qzv
4 Diagostic alpha rarefaction curve
- determine sampling depth
qiime tools import \ --input-path ./gg_13_8_otus/trees/99_otus.tree \ --output-path ./99_otus_rooted-tree.qza \ --type 'Phylogeny[Rooted]' qiime diversity alpha-rarefaction \ --i-table ./ClosedReferenceOTU/table-cr-99.qza \ --i-phylogeny ./99_otus_rooted-tree.qza \ --p-max-depth 70000 \ --m-metadata-file samples-metadata.tsv \ --o-visualization Alpha-rarefaction/alpha-rarefaction.qzv
- Download corresponding phylogenetic tree (rooted) and import
- Core metrics analysis
- Taxonomic classification using vsearch (
qiime feature-classifier classify-consensus-vsearch
)
mkdir Taxonomic-Analysis qiime feature-classifier classify-consensus-vsearch \ --i-query ./Dada2_output/rep-seqs.qza \ --i-reference-reads ./ClosedReferenceOTU/99_otus_gg.qza \ --i-reference-taxonomy ./ClosedReferenceOTU/ref-99-taxonomy.qza \ --o-classification ./Taxonomic-Analysis/vsearch_gg_classification.qza
**Here I get the following problems:
There were some problems with the command:
(1/5) Missing option '--i-query'.
(2/5) Missing option '--i-reference-reads'.
(3/5) Missing option '--i-reference-taxonomy'.
(4/5) Missing option '--o-classification'. ("--output-dir" may also be used)
(5/5) Got unexpected extra argument ( )
Besides this final error, I have some other doubts that make me think I'm not doing the analysis in a correct way:
- My FastaQ files have the following structure:
@YFMOF:00012:00014
GACGTCATCCCCCCACCTTCCTCCAGTTTATCACTGGCAGTCTCCTTTGAGTCCGGCCTAACCGCTGGCAACAAAGGATAGGTG
+
?@?B6;<6<<<<<666=6D@=@@;;;6,666;;550555554;4/;6;../-)-353773C?:;>C>A;800/+/).---)--
@YFMOF:00014:00022
GTTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGGTCAAGTCATCATGCCCTTACGACCAGGGCTACACACG
+
@?@?CDEA?@DC@@@CCA<<//.+0.-)--)-)-36-0+.88)..8C??1>=<<+/+--5)--)55@<9.44;;4;40<=@<@C4.44:;A:A6;;BB?;;@A;;;;;;;0;;;.4<?CADACCDC9222;./)---)--329:A999:/:5:@@@<000///---1--
How can I know if they contain barcodes or primers sequences? Is it possible to remove them? Is it completely necessary to remove them if I use this pipeline? (I read something that with dada2 pyro it was not necessary)
-
Which is the difference between the
qiime2 vsearch cluster-features-closed-reference
andqiime feature-classifier classify-consensus-vsearch
? Sorry if the question is a fool one, but I am really new with this kind of analysis… ! -
I have performed the
qiime2 vsearch cluster-features-closed-reference
and now I don’t know what we are supposed to do with the unmatched sequences . When I saw the taxa-bar-plot of my sequences I was really surprised at first, because I didn’t get any unassigned %, but this is not actually true, is it? Because that taxa-bar-plot doesn’t take into account the unmatched sequences, it just separates them. Then, which is the next step to be able to characterize that sequences? Or do we just “forget” about them for the taxonomy? -
Should be better to separate the sequences based on their 16S region? I was thinking that maybe my high % of unmapped sequences is due to having sequences from 7 regions and mapping them with Greengenes reference database that is focused only in region V4... I wonder if it would be a better idea to take just my V4 sequences and remake the analysis with them. If the answer is yes... How can I do that separation? (I read something about this in How to separate hypervariable regions sequenced with 16s Metagenomics Ion torrent - #2 by Lauren , but I don't know how to do ir exactly.
I know there's so much information, doubts and problems here, but as I said at the beginning, I am really new in this kind of analysis and I have been using qiime2 just for a month now...
Thank you so much in advanced
Best regards!