Qiime Analysis pipeline

mallika · April 22, 2021, 9:16am

hello
I'm new to metagenomic pipelines. Currently, I have 500 fecal samples from Crohn's patients and 467 fecal samples from healthy patients. All of these sample files are in FASTA format, as follows:

>GVJ07HB01AWJS1_cs_nbp_rc cs_nbp=28-337 sample=C_4022_01_S1 rbarcode=GACTCTGA primer=V1-V2 subject=4022 body_site=stool center=UPENNBL barcode_mismatch=0 primer_mismatch=0
ACTAGGCGTTAACACATAGCAAGCGAGGGGACGAGCATCATCAAAGCTTGCTTTGATGGATGGCGACCGGCGGCACGGTGAGTAACACGTATCCAACCTGCCGACAACACTGGGATAGCCTTTCGAAAGAAAGATTAATACCGGATGGCATAATTTTCCCGCATGGGATAATTATTAAAGAATTTCGGTTGTCGATGGGGGATGCGTTCCATTAGGCAGTTGGCGGGGTAACGGCCCACCAAGACAACGATGGATAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACTGAGACACGGTCCAAACTCC
>GVJ07HB01C9SIH_cs_nbp_rc cs_nbp=28-381 sample=C_4022_01_S1 rbarcode=GACTCTGA primer=V1-V2 subject=4022 body_site=stool center=UPENNBL barcode_mismatch=0 primer_mismatch=0
TGGTAAGAAGTTTGTAGTCCTGGCGTCAGGATGAACGCTGCGGCGTGCCTAACACATGCAAGTCGAGCGTAAGCGGTTTTAGGAAGTTTTCGGATGGATTAAACTGACTGAGCGGCGGACGGGTGAGTAACGCGTGGGTACCTGCCTCATACAGGGGGATAACAGTTAGAAATGGCTGCTAATACCGCATAAGCACACAGCTTCGCATGGAGCAGTGTGAAAAACTCCGGTGGTATGAGATGGACCCGCGTCTGATTAGCTAGTTGGTAAGGTAACGGCTTACCAAGGCGACGATCAGTAGCCGACCTGAGAGGGTGACCGGCCACATTGGGACTGAGACACGGCCCAAACTCC

After reading up on the forum and the QIIME 2 website, I came to the conclusion that I can combine the 967 FASTA files into a single FASTA file using QIIME 1's add_qiime_labels.py, run in a virtual machine. The result looks as follows:

[screenshot]
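For readers without a QIIME 1 virtual machine, the relabel-and-combine idea can be sketched in plain Python. This is not the add_qiime_labels.py implementation; it is a minimal sketch that assumes the goal is labels of the form `<SampleID>_<n>` followed by the original header. The function names `read_fasta` and `relabel_fasta` are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def relabel_fasta(records, sample_id, start=0):
    """Prefix each header with '<sample_id>_<n>' (assumed label layout)."""
    for n, (header, seq) in enumerate(records, start=start):
        yield f"{sample_id}_{n} {header}", seq


# Tiny in-memory example; real input would be one file per sample.
example = io.StringIO(">readA primer=V1-V2\nACGT\nACGT\n>readB primer=V1-V2\nTTGG\n")
for label, seq in relabel_fasta(read_fasta(example), "C.4022.01.S1"):
    print(f">{label}\n{seq}")
```

Writing the relabeled records from every sample file to one output file gives a single combined FASTA file.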

Reading further on the forum, my plan is to analyze the combined file by following the steps in the "Clustering sequences into OTUs using q2-vsearch" tutorial (QIIME 2 2021.2.0 documentation).
This is where I have questions, so please let me know whether the logic behind the pipeline above is correct.
In the OTU clustering tutorial, the de novo step essentially clusters sequences based on similarity, but I believe this tutorial from UCLA (https://qcb.ucla.edu/wp-content/uploads/sites/14/2017/12/QCB_W11-Metagenomics-Analysis_BS_day2.pdf) asks me to know a few things, such as which region the data is based on, before I choose between the different OTU methods. I don't know how to find the region of my data (the data is from HMP).

Is there a filtering step that I'm missing?

The end product of the OTU step, via any of the three methods, would be a rep-seqs file and a table artifact, which would then be piped into the Moving Pictures tutorial at the "FeatureTable and FeatureData summaries" step? Would this pipeline be accurate for a foolproof analysis?
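The steps described above might look roughly like the following. This is a sketch only: it builds the QIIME 2 command lines as argument lists and prints them without running anything. The file names are placeholders, and the 0.99 identity threshold is just an example value, not a recommendation for this dataset.

```python
import shlex


def otu_clustering_commands(fasta="seqs.fna", identity=0.99):
    """Return QIIME 2 CLI calls as argument lists (nothing is executed)."""
    return [
        # Import the combined, QIIME 1-style labeled FASTA file.
        ["qiime", "tools", "import",
         "--input-path", fasta,
         "--output-path", "seqs.qza",
         "--type", "SampleData[Sequences]"],
        # Dereplicate to get an initial feature table and sequences.
        ["qiime", "vsearch", "dereplicate-sequences",
         "--i-sequences", "seqs.qza",
         "--o-dereplicated-table", "table.qza",
         "--o-dereplicated-sequences", "rep-seqs.qza"],
        # De novo clustering at the chosen percent identity.
        ["qiime", "vsearch", "cluster-features-de-novo",
         "--i-table", "table.qza",
         "--i-sequences", "rep-seqs.qza",
         "--p-perc-identity", str(identity),
         "--o-clustered-table", "table-dn.qza",
         "--o-clustered-sequences", "rep-seqs-dn.qza"],
        # Summaries, as in the Moving Pictures tutorial.
        ["qiime", "feature-table", "summarize",
         "--i-table", "table-dn.qza",
         "--o-visualization", "table-dn.qzv"],
        ["qiime", "feature-table", "tabulate-seqs",
         "--i-data", "rep-seqs-dn.qza",
         "--o-visualization", "rep-seqs-dn.qzv"],
    ]


for cmd in otu_clustering_commands():
    print(shlex.join(cmd))
```

Closed-reference and open-reference clustering would additionally need a reference sequence artifact, which is not shown here.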

Thanks!

thermokarst · May 10, 2021, 2:38pm

Hi @mallika, I am so sorry for the slow reply; I'm not sure how this post got missed by our moderation team (myself included). The UCLA docs you linked to are for QIIME 1, but much of the reasoning outlined there still holds true in QIIME 2.

Knowing which region was sequenced is really good information to have, and you will almost certainly want or need it in other aspects of your analysis. For any publications that might come out of this analysis, you'll most likely need to report that information as well. If you don't know it, please contact your sequencing center. In the case of the HMP, I think you'll need to consult the source publication(s) to learn about the specific samples you downloaded.
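As a possible starting point, the FASTA headers quoted in the question carry `key=value` fields, including `primer=V1-V2`. A minimal sketch for pulling those fields out is below. It assumes the `primer` field names the amplified 16S region, which should still be confirmed against the source publication(s); `parse_header_fields` is a hypothetical helper name.

```python
def parse_header_fields(header):
    """Split a FASTA header into its read ID and a dict of key=value fields."""
    parts = header.lstrip(">").split()
    read_id, fields = parts[0], {}
    for token in parts[1:]:
        key, sep, value = token.partition("=")
        if sep:  # keep only tokens that actually contain '='
            fields[key] = value
    return read_id, fields


header = (
    "GVJ07HB01AWJS1_cs_nbp_rc cs_nbp=28-337 sample=C_4022_01_S1 "
    "rbarcode=GACTCTGA primer=V1-V2 subject=4022 body_site=stool "
    "center=UPENNBL barcode_mismatch=0 primer_mismatch=0"
)
read_id, fields = parse_header_fields(header)
print(fields["primer"])  # V1-V2
print(fields["sample"])  # C_4022_01_S1
```

Running this over every header and collecting the distinct `primer` values would show whether all samples share the same region.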

Yes!

It's not really much of a pipeline, just one initial step along the path of a full analysis. Unfortunately there are many more places that things can go wrong - if you run into problems though, you know where to find us!


system · June 10, 2021, 8:39pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.