Hi, I am trying to analyze a previously published study in Qiita as part of a class. It appears to have six prep templates that together make up the 1500 samples. I assume they must have added each sequencing lane/run as a separate prep template.
I am using the 150bp deblur feature tables (the ref-hit outputs). I am a little unclear about how to merge seqs.fa files in order to build a phylogenetic tree.
I would like to clarify some things based on this previous thread.
Just to clarify, do the reference-hit.seqs.fa files include all sequences, or just the representative set?
Put another way: in the command that Daniel suggests, am I inputting a reference-hit.seqs.fa that includes all sequences in the entire feature table (not just the unique ones) and getting an output of rep seqs?
qiime tools import --input-path reference-hit.seqs.fa --output-path rep-seqs.qza --type 'FeatureData[Sequence]'
My main question is: how do I make a single phylogenetic tree using the six reference-hit.seqs.fa files that I will download from the six prep templates on Qiita?
Indeed, Qiita stores each different processing as a separate “prep”. This can be either a different sequencing run or different sequencing conditions (e.g. the use of different primers).
If you want to build your tree from the reference-hit.seqs.fa files alone, one way to go would be to merge all of the FASTA files together first.
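The merge step could be sketched roughly as follows, using only the Python standard library. This is a minimal sketch, not a Qiita or QIIME 2 tool: the function names `read_fasta` and `merge_fastas` are made up for illustration, and it assumes that a sequence shared by several preps carries the same header in each file, so keeping only its first occurrence is safe.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                # Sequence lines may be wrapped; collect them all.
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_fastas(input_paths, output_path):
    """Write each distinct sequence once; return how many were written."""
    seen = set()
    with open(output_path, "w") as out:
        for path in input_paths:
            for header, seq in read_fasta(path):
                key = seq.upper()
                if key in seen:
                    # Same sequence already written from an earlier file.
                    continue
                seen.add(key)
                out.write(f">{header}\n{seq}\n")
    return len(seen)
```

Calling `merge_fastas` with the six downloaded files (for example, hypothetical paths such as `prep1/reference-hit.seqs.fa` through `prep6/reference-hit.seqs.fa`) and an output name like `merged.seqs.fa` would give a single file that can then be passed to the `qiime tools import` command quoted above.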
Just to complement @tomasz’s reply: note that you can create a meta-analysis (combining all the BIOMs) and generate a single BIOM for all the studies. Once you have that file, you can follow this example to get a phylogenetic tree: Importing deblur data from QIITA.
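One possible way to get representative sequences out of a single merged table is sketched below. It does not reproduce the steps of the linked thread. It assumes, as is typical for Deblur tables, that each feature ID in the BIOM is the sequence itself; the function name `feature_ids_to_fasta` is hypothetical, and loading the IDs from the BIOM file is left to the biom-format package.

```python
def feature_ids_to_fasta(feature_ids, output_path):
    """Write a FASTA in which each record's ID and sequence are the same string.

    Assumes every feature ID is a DNA sequence (the usual case for Deblur
    output); raises ValueError otherwise so that hashed or OTU-style IDs
    are not silently written out as sequences.
    """
    valid = set("ACGTN")
    written = 0
    with open(output_path, "w") as out:
        for fid in feature_ids:
            fid = fid.strip()
            seq = fid.upper()
            if not seq or set(seq) - valid:
                raise ValueError(f"feature ID is not a DNA sequence: {fid!r}")
            # Keep the header identical to the table's feature ID so the
            # tree tips match the feature table.
            out.write(f">{fid}\n{seq}\n")
            written += 1
    return written


# With the biom-format package installed, the IDs could be obtained with:
#   import biom
#   table = biom.load_table("merged.biom")   # hypothetical file name
#   ids = table.ids(axis="observation")
```

The resulting FASTA can be imported as `FeatureData[Sequence]` in the same way as a reference-hit.seqs.fa file, and its IDs will match the merged feature table by construction.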