Hi,
I want to build a fungal ITS tree with ghost-tree, but I don't fully understand the author's tutorial. Below are the steps I put together myself; I'm confused about some of them. Could someone check and correct them? Any help is greatly appreciated.
#### 1. Build my own ghost-tree instead of using a pre-built one, because I don't know how to choose among the 0.80, 0.90 and 1.00 pre-built ghost-trees. Can I build a 0.99 ghost-tree?
Silva Aligned Sequences
qiime tools import \
--input-path SILVA_132_SSURef_Nr99_tax_silva_full_align_trunc.fasta \
--type 'FeatureData[AlignedSequence]' --input-format AlignedRNAFASTAFormat \
--output-path SILVA_132_SSURef_Nr99_tax_silva_full_align_trunc.qza
Silva Taxonomy File
qiime tools import \
--input-path tax_slv_ssu_132.txt \
--type SilvaTaxonomy \
--output-path tax_slv_ssu_132.qza \
--input-format SilvaTaxonomyFormat
Silva Accession ID Map
qiime tools import \
--input-path tax_slv_ssu_132.acc_taxid \
--type SilvaAccession \
--output-path tax_slv_ssu_132.acc_taxid.qza \
--input-format SilvaAccessionFormat
Extract fungi
qiime ghost-tree extract-fungi \
--i-aligned-silva-file SILVA_132_SSURef_Nr99_tax_silva_full_align_trunc.qza \
--i-accession-file tax_slv_ssu_132.acc_taxid.qza \
--i-taxonomy-file tax_slv_ssu_132.qza \
--o-aligned-seqs silva_fungi_only_full_aligned_132.qza
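To check that I understand what `extract-fungi` does, here is a toy Python sketch of the idea as I read it: look up each aligned SILVA record's taxon ID through the accession map, look up that ID's taxonomy path, and keep the record if the path contains `Fungi` as a rank. The data and the function name `extract_fungi` are made up for illustration; this is not ghost-tree's actual implementation.

```python
def extract_fungi(aligned, acc_to_taxid, taxid_to_path, clade="Fungi"):
    """Keep aligned records whose taxonomy path contains `clade` as a rank."""
    kept = {}
    for acc, seq in aligned.items():
        taxid = acc_to_taxid.get(acc)            # accession -> taxon ID
        path = taxid_to_path.get(taxid, "")      # taxon ID -> taxonomy path
        if clade in path.split(";"):             # ranks are ';'-separated
            kept[acc] = seq
    return kept


# Hypothetical toy data standing in for the three imported SILVA files.
aligned = {"AB001": "AC-GU", "AB002": "ACGGU", "AB003": "A--GU"}
acc_to_taxid = {"AB001": 10, "AB002": 20, "AB003": 10}
taxid_to_path = {
    10: "Eukaryota;Opisthokonta;Nucletmycea;Fungi;",
    20: "Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);",
}
fungi = extract_fungi(aligned, acc_to_taxid, taxid_to_path)
print(sorted(fungi))  # ['AB001', 'AB003']
```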
### Filter alignment positions (I don't understand how these two parameters should be set: 0.9 and 0.8)
qiime ghost-tree filter-alignment-positions \
--i-aligned-sequences-file silva_fungi_only_full_aligned_132.qza \
--p-maximum-gap-frequency 0.9 \
--p-maximum-position-entropy 0.8 \
--o-aligned-seqs silva_fungi_only_full_aligned_132_FILTERED.qza
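This is how I think the two parameters behave, as a toy Python sketch: for each alignment column, compute the fraction of gap characters (`-` or `.`) and the Shannon entropy of the residues, then keep the column only if the gap frequency is at most `--p-maximum-gap-frequency` and the entropy is at most `--p-maximum-position-entropy`. Normalizing entropy to the 0 to 1 range (log base 4) and ignoring gaps when computing entropy are my own assumptions; ghost-tree may compute both differently. If this is right, raising 0.9 keeps more gappy columns and lowering 0.8 removes more variable columns.

```python
import math
from collections import Counter

GAP_CHARS = "-."  # SILVA alignments use both '-' and '.' as gaps


def column_stats(column, alphabet_size=4):
    """Return (gap_frequency, normalized_entropy) for one alignment column."""
    gap_freq = sum(1 for c in column if c in GAP_CHARS) / len(column)
    # Assumption: entropy is computed over non-gap characters only and
    # normalized to 0..1 with log base `alphabet_size` (4 nucleotides).
    counts = Counter(c for c in column if c not in GAP_CHARS)
    total = sum(counts.values())
    if total == 0:
        return gap_freq, 0.0
    entropy = -sum((k / total) * math.log(k / total, alphabet_size)
                   for k in counts.values())
    return gap_freq, entropy


def filter_positions(seqs, max_gap_freq=0.9, max_entropy=0.8):
    """Keep columns with gap frequency <= max_gap_freq and entropy <= max_entropy."""
    keep = []
    for i, column in enumerate(zip(*seqs)):
        gap_freq, entropy = column_stats(column)
        if gap_freq <= max_gap_freq and entropy <= max_entropy:
            keep.append(i)
    filtered = ["".join(s[i] for i in keep) for s in seqs]
    return filtered, keep


# Toy alignment: columns 0-2 are conserved, columns 3 and 5 are maximally
# variable (entropy 1.0), column 4 is all gaps (gap frequency 1.0).
seqs = ["ACGU-A", "ACGA-C", "ACGC-G", "ACGG-U"]
filtered, keep = filter_positions(seqs)
print(keep)      # [0, 1, 2]
print(filtered)  # ['ACG', 'ACG', 'ACG', 'ACG']
```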
ghost-tree extensions group-extensions \
'sh_refs_qiime_ver8_97_10.05.2021.fasta' 0.97 \
'otu_map_97_qiime_ver8_97_10.05.2021.txt'
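My understanding of the `0.97` argument to `group-extensions` is that it is a similarity threshold for grouping the UNITE ITS sequences, which is why I am asking whether 0.99 is possible. The toy Python sketch below uses a simple greedy centroid clustering on equal-length sequences with a position-by-position identity; the function names and data are hypothetical, and the clustering program and identity definition that ghost-tree actually uses are not reproduced here. It only shows that a higher threshold gives more, tighter groups.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(records, threshold):
    """Greedy centroid clustering: join the first centroid with identity >= threshold."""
    centroids = []  # (id, sequence) of each cluster's first member
    clusters = {}   # centroid id -> list of member ids
    for sid, seq in records:
        for cid, cseq in centroids:
            if identity(seq, cseq) >= threshold:
                clusters[cid].append(sid)
                break
        else:  # no centroid was similar enough: start a new cluster
            centroids.append((sid, seq))
            clusters[sid] = [sid]
    return clusters


def mutate(seq, positions):
    """Hypothetical helper: replace the given positions with 'N'."""
    chars = list(seq)
    for p in positions:
        chars[p] = "N"
    return "".join(chars)


base = "ACGT" * 25  # 100 nt toy sequence
records = [
    ("sh1", base),
    ("sh2", mutate(base, [0, 1])),    # 98% identical to sh1
    ("sh3", mutate(base, range(5))),  # 95% identical to sh1
]
print(greedy_cluster(records, 0.97))       # {'sh1': ['sh1', 'sh2'], 'sh3': ['sh3']}
print(len(greedy_cluster(records, 0.99)))  # 3
```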
ghost-tree scaffold hybrid-tree-foundation-alignment \
'otu_map_97_qiime_ver8_97_10.05.2021.txt' \
'sh_taxonomy_qiime_ver8_97_10.05.2021.txt' \
'sh_refs_qiime_ver8_97_10.05.2021.fasta' \
'silva_fungi_only_full_aligned_132_FILTERED.fasta' \
'ghost_tree_97_qiime_ver8_97_10.05.2021'
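As I understand the ghost-tree design, `hybrid-tree-foundation-alignment` builds a foundation tree from the filtered SILVA 18S alignment and attaches ITS extension subtrees to it at the genus level. Below is a toy Python sketch of only that grafting idea, using regex substitution on a Newick string; the genus names, SH IDs, branch lengths and the `graft` function are invented, and this is not ghost-tree's code.

```python
import re

# A tip label follows '(' or ',' and ends before ':', ',', ')' or ';'.
TIP = re.compile(r"(?<=[(,])([^(),:;]+)(?=[:,);])")


def graft(foundation, extensions):
    """Replace each foundation tip that has an extension subtree with that subtree."""
    return TIP.sub(lambda m: extensions.get(m.group(1), m.group(1)), foundation)


# Hypothetical genus-level foundation tree and ITS extension subtrees.
foundation = "((Agaricus:0.10,Boletus:0.20):0.05,Candida:0.30);"
extensions = {
    "Agaricus": "(SH001:0.01,SH002:0.02)Agaricus",
    "Candida": "(SH010:0.03,SH011:0.01)Candida",
}
grafted = graft(foundation, extensions)
print(grafted)
# (((SH001:0.01,SH002:0.02)Agaricus:0.10,Boletus:0.20):0.05,(SH010:0.03,SH011:0.01)Candida:0.30);
```

Boletus has no extension subtree in this toy example, so it stays a single tip.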
### So I now have my own ghost-tree, ghost_tree_97_qiime_ver8_97_10.05.2021 (ghost_tree.nwk and ghost_tree_extension_accession_ids.txt)
#### 2. Re-cluster the DADA2 results (closed-reference, 97%):
time qiime dada2 denoise-paired \
--i-demultiplexed-seqs paired-end-demux-trimmed.qza \
--p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 0 --p-trunc-len-r 0 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza
qiime vsearch cluster-features-closed-reference \
--i-table table.qza \
--i-sequences rep-seqs.qza \
--i-reference-sequences sh_refs_qiime_ver8_97_10.05.2021.qza \
--p-perc-identity 0.97 \
--o-clustered-table table-cr-97.qza \
--o-clustered-sequences rep-seqs-cr-97.qza \
--o-unmatched-sequences unmatched-cr-97.qza
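For step 2, this is my understanding of closed-reference clustering, as a toy Python sketch: each DADA2 feature is compared with every reference sequence, its counts are added to the best-matching reference if the identity is at least `--p-perc-identity` (0.97), and otherwise it goes to the unmatched output. The single-sample count table, the equal-length identity function and all IDs are simplifications I made up; vsearch's real alignment-based identity and search heuristics are not modelled.

```python
def identity(a, b):
    """Toy identity: fraction of matching positions (equal-length sequences)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(counts, rep_seqs, refs, perc_identity=0.97):
    """Sum each feature's count onto its best reference hit at >= perc_identity."""
    clustered, unmatched = {}, []
    for fid, count in counts.items():
        best_ref, best_id = None, 0.0
        for rid, rseq in refs.items():
            pid = identity(rep_seqs[fid], rseq)
            if pid > best_id:
                best_ref, best_id = rid, pid
        if best_ref is not None and best_id >= perc_identity:
            clustered[best_ref] = clustered.get(best_ref, 0) + count
        else:
            unmatched.append(fid)  # would end up in the unmatched output
    return clustered, unmatched


base = "ACGT" * 25
refs = {"SH_A": base, "SH_B": "T" * 100}     # hypothetical UNITE refs
rep_seqs = {
    "asv1": base,                            # 100% to SH_A
    "asv2": "NN" + base[2:],                 # 98% to SH_A
    "asv3": "G" * 100,                       # no hit >= 97%
}
counts = {"asv1": 10, "asv2": 5, "asv3": 7}  # one sample, toy counts
clustered, unmatched = closed_reference(counts, rep_seqs, refs)
print(clustered, unmatched)  # {'SH_A': 15} ['asv3']
```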
#### 3. Finally, do I only need to filter my 0.97 ghost-tree so that it keeps just the IDs in table-cr-97.qza? Is the process above correct, and what should I change? Thank you very much!
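For the last step, this is what I mean by filtering the tree, as a toy Python sketch: keep only the tips whose IDs occur in table-cr-97.qza, and collapse any internal node left with a single child (adding its branch length to the child's). It assumes the tip labels in ghost_tree.nwk are the same IDs as the feature IDs in the clustered table, and uses a minimal Newick parser that does not handle quoted labels or comments; the tree, IDs and function names are made up.

```python
def parse_newick(text):
    """Parse a simple Newick string (no quoted labels or comments) into dicts."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip '('
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip ')'
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return node()


def prune(n, keep):
    """Return a pruned copy of node n holding only tips in `keep` (or None)."""
    if not n["children"]:
        return dict(n) if n["name"] in keep else None
    kids = [k for k in (prune(c, keep) for c in n["children"]) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        # Collapse a node left with one child; merge the two branch lengths.
        child = kids[0]
        if n["length"] is not None or child["length"] is not None:
            child["length"] = (n["length"] or 0.0) + (child["length"] or 0.0)
        return child
    return {"name": n["name"], "length": n["length"], "children": kids}


def to_newick(n):
    """Serialize a node dict back to Newick (without the trailing ';')."""
    s = ""
    if n["children"]:
        s = "(" + ",".join(to_newick(c) for c in n["children"]) + ")"
    s += n["name"]
    if n["length"] is not None:
        s += ":" + format(n["length"], "g")
    return s


tree = parse_newick("((SH001:0.1,SH002:0.2):0.05,(SH010:0.3,SH011:0.4):0.06);")
table_ids = {"SH001", "SH002", "SH010"}  # hypothetical feature IDs from the table
print(to_newick(prune(tree, table_ids)) + ";")
# ((SH001:0.1,SH002:0.2):0.05,SH010:0.36);
```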