Hi QIIME2 community,
My colleague @Lauren and I have been using the Ion Torrent 16S Metagenomics Kit to sequence clinical samples. Both of our sample sets are longitudinal. The kit consists of two primer sets, one for V2-4-8 and one for V3-6,7-9. The products of the two reactions are pooled per sample, and then the rest of library prep is completed. Each sample gets its own barcode. Ion Torrent produces single-end reads.
The issue is that the reads from each of the 6 variable regions (in forward and reverse orientations) are all delivered to the user in one fastq file per sample on the sequencing machine's server. Each fastq file therefore contains essentially 12 amplicons. On top of this, the kit's primer sequences are proprietary, so we cannot easily separate the amplicons (we are not bioinformaticians by training).
We spoke with @ebolyen and others on the QIIME2 team (@gregcaporaso, @thermokarst, @Mehrbod_Estaki) at the QIIME2 Workshop at the NIH in early January and got some ideas on how we could analyze our data. We wanted to revisit those ideas here in case anyone has suggestions on how we could move forward. We are graduate students, and this is very valuable clinical data, so we would really like to be able to analyze it! We also know that others are having the same issues.
Possible Analysis Pipeline:
- Import fastq files using manifest file format
- Perform separate `DADA2` step for each sequencing run, then merge `DADA2` results
  - Use `denoise-pyro` with option `--p-trim-left 15`
  - Do not truncate
- Explore feature table
- Create phylogenetic tree using SEPP (fragment insertion)
- Alpha rarefaction and selection of rarefaction depth using diagnostic curves
- Core metrics analysis (only metrics which include phylogeny)
  - alpha diversity --> Faith’s PD only
  - beta diversity --> Weighted and Unweighted UniFrac only
- Taxonomic classification (from feature table in step 3) using pre-trained full-length 16S Greengenes classifier
  - @Nicholas_Bokulich in your reply to the thread here, you suggest this classifier for Ion Torrent data but later suggest `VSEARCH` instead. Could you please clarify?
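
To make the first two steps of the pipeline concrete, here is a minimal Python sketch that writes a single-end manifest and assembles (but does not run) the `qiime` commands for per-run denoising and merging. The run labels, file names, and helper function names are hypothetical. The manifest is assumed to use the tab-separated `SingleEndFastqManifestPhred33V2` layout, and `--p-trunc-len 0` is assumed to be how "do not truncate" is expressed; flags should be checked against `qiime dada2 denoise-pyro --help` for the installed QIIME 2 version.

```python
import csv


def write_manifest(samples, manifest_path):
    """Write a tab-separated single-end manifest.

    `samples` maps a sample ID to the absolute path of its fastq file.
    """
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath"])
        for sample_id, fastq in sorted(samples.items()):
            writer.writerow([sample_id, str(fastq)])


def import_cmd(run):
    """Command to import one run's manifest (file names are hypothetical)."""
    return ["qiime", "tools", "import",
            "--type", "SampleData[SequencesWithQuality]",
            "--input-path", f"{run}-manifest.tsv",
            "--input-format", "SingleEndFastqManifestPhred33V2",
            "--output-path", f"{run}-demux.qza"]


def denoise_cmd(run, trim_left=15, trunc_len=0):
    """Command to denoise one run; trunc_len=0 is assumed to mean no truncation."""
    return ["qiime", "dada2", "denoise-pyro",
            "--i-demultiplexed-seqs", f"{run}-demux.qza",
            "--p-trim-left", str(trim_left),
            "--p-trunc-len", str(trunc_len),
            "--o-table", f"{run}-table.qza",
            "--o-representative-sequences", f"{run}-rep-seqs.qza",
            "--o-denoising-stats", f"{run}-stats.qza"]


def merge_cmds(runs):
    """Commands to merge the per-run feature tables and representative sequences."""
    table = ["qiime", "feature-table", "merge"]
    seqs = ["qiime", "feature-table", "merge-seqs"]
    for run in runs:
        table += ["--i-tables", f"{run}-table.qza"]
        seqs += ["--i-data", f"{run}-rep-seqs.qza"]
    table += ["--o-merged-table", "table.qza"]
    seqs += ["--o-merged-data", "rep-seqs.qza"]
    return table, seqs


if __name__ == "__main__":
    runs = ["run1", "run2"]  # hypothetical run labels
    for run in runs:
        print(" ".join(import_cmd(run)))
        print(" ".join(denoise_cmd(run)))
    for cmd in merge_cmds(runs):
        print(" ".join(cmd))
```

Building the commands as lists keeps them easy to inspect before passing them to something like `subprocess.run`, and keeping one manifest per run mirrors the "denoise each run separately, then merge" step.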
Going off of another thread we saw on here, other people are also wondering how to align reads to the 16S gene to possibly separate the amplicons. One new method we saw was SMURF. Any thoughts on that, anyone? Again, we do not have the primer sequences.
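
To illustrate what "aligning reads to the 16S gene to separate the amplicons" could mean without primer sequences, here is a toy Python sketch. It is not SMURF and not a substitute for a real aligner: it places each read on a reference sequence by exact k-mer matches, tries both orientations, and bins the read by where its midpoint lands. The reference, the region coordinates, the function names, and the `k` and `min_votes` defaults are all hypothetical placeholders; real region boundaries would have to come from a full-length 16S reference.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_kmer_index(reference, k):
    """Map each k-mer that occurs exactly once in the reference to its position."""
    positions = {}
    for i in range(len(reference) - k + 1):
        positions.setdefault(reference[i:i + k], []).append(i)
    return {kmer: hits[0] for kmer, hits in positions.items() if len(hits) == 1}


def place_read(read, index, k):
    """Return (start, votes) for the best-supported start position on the reference."""
    offsets = Counter()
    for j in range(len(read) - k + 1):
        pos = index.get(read[j:j + k])
        if pos is not None:
            offsets[pos - j] += 1  # each matching k-mer votes for a read start
    if not offsets:
        return None, 0
    return offsets.most_common(1)[0]


def assign_region(read, index, regions, k=8, min_votes=3):
    """Guess (region name, strand) for a read, or None if it cannot be placed.

    `regions` maps a name to a half-open (start, end) interval on the reference.
    """
    best = (None, 0, "+")
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        start, votes = place_read(seq, index, k)
        if votes > best[1]:
            best = (start, votes, strand)
    start, votes, strand = best
    if start is None or votes < min_votes:
        return None
    midpoint = start + len(read) // 2
    for name, (lo, hi) in regions.items():
        if lo <= midpoint < hi:
            return name, strand
    return None


if __name__ == "__main__":
    import random

    random.seed(0)
    # Synthetic stand-in for a 16S reference; coordinates are made up.
    reference = "".join(random.choice("ACGT") for _ in range(300))
    regions = {"ampliconA": (0, 100), "ampliconB": (100, 200), "ampliconC": (200, 300)}
    index = build_kmer_index(reference, 8)
    print(assign_region(reference[110:190], index, regions))
    print(assign_region(reverse_complement(reference[210:290]), index, regions))
```

Exact k-mer voting tolerates very little divergence from the reference, so on real data with diverse taxa and Ion Torrent indel errors this would likely leave many reads unassigned. It is only meant to show the shape of the idea: once reads are binned by region and orientation, each bin could be written to its own fastq file and denoised separately.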
We just wanted to check A) whether this is an acceptable way to move forward and B) whether anyone has further ideas on how we could separate the amplicons so that we can do further analyses (e.g., longitudinal and ANCOM). Any and all input would be appreciated. Thank you!