What to do about a single non-corresponding tree tip?

EmerickL · August 1, 2018, 4:37pm

Greetings QIIME community. I typically do my analysis in R after classification, tree building, and BIOM export, but I am giving QIIME 2's other features a go.

I am starting with alpha and beta diversity metrics but have run into the following error:

All feature_ids must be present as tip names in phylogeny. feature_ids not corresponding to tip names (n=1): 429b5b12899efa00c5d61dc11c424c7f

I could not find this feature ID in my table.qza after performing a search.

I am not sure how to remedy this. My next approach was to look in tree.nwk, the Newick file that belongs to the midpoint-rooted tree (rooted-tree-filtered.qza) used in the diversity commands. I opened the .nwk in a text editor and searched there, but still didn't find this missing tip/feature ID.
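Rather than searching for one ID by eye, the two ID sets can be compared directly. The following is a minimal sketch, not part of the original scripts: the function name `ids_missing_from_tree` is hypothetical, and it assumes the feature IDs are 32-character lowercase hex strings (as in the error message above), that the table has been exported to a tab-separated file with IDs in the first column, and that the tree is a plain Newick file.

```shell
# Print feature IDs that appear in the table TSV but not as tips in the tree.
# Assumption: IDs are 32-character lowercase hex hashes.
ids_missing_from_tree() {
  table_tsv=$1
  tree_nwk=$2
  tmpdir=$(mktemp -d) || return 1

  # IDs from the first column of the table (header and comment lines are skipped
  # because they do not match the hash pattern).
  cut -f1 "$table_tsv" | grep -E '^[0-9a-f]{32}$' | LC_ALL=C sort -u > "$tmpdir/table_ids"

  # Tip names from the Newick text; branch lengths and support values
  # never form a 32-character hex run, so they are ignored.
  grep -oE '[0-9a-f]{32}' "$tree_nwk" | LC_ALL=C sort -u > "$tmpdir/tree_ids"

  # Lines unique to the first file: in the table, absent from the tree.
  LC_ALL=C comm -23 "$tmpdir/table_ids" "$tmpdir/tree_ids"

  rm -rf "$tmpdir"
}

# Hypothetical usage:
#   ids_missing_from_tree feature-table.tsv tree.nwk
```

An empty result would mean every feature in that particular table file has a tip in that particular tree, so it matters that the files inspected are the same ones passed to the diversity command.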

Here are the scripts I'm running on my University's cluster.

Import:

# ----------------Load Modules--------------------
module load qiime2/2018.4

# ----------------Housekeeping---------------------
#rm -r demux*.q*
cd data

# ----------------Commands------------------------

#Import Data in qiime2 artifact
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /ufrc/strauss/emerickl/WCT/data/raw_data \
  --source-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza

qiime demux summarize \
  --i-data demux-paired-end.qza \
  --o-visualization demux-paired-end.qzv

DADA2

 cp data/demux-paired-end.qza features/dada2input.qza

 qiime dada2 denoise-paired \
      --i-demultiplexed-seqs dada2input.qza \
      --output-dir output \
      --p-n-threads 14 \
      --o-table table.qza \
      --o-representative-sequences rep-seqs.qza \
      --p-trunc-len-f 251  --p-trunc-len-r 250

Feature table

qiime feature-table summarize \
  --i-table table.qza \
  --o-visualization table.qzv \
  --m-sample-metadata-file 	#PATH TO VALIDATED MAPPING FILE

qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv

qiime diversity alpha-rarefaction \
  --i-table table.qza \
  --o-visualization alpha-rarefaction.qzv \
  --p-max-depth 8200 \
  --p-metrics chao1 \
  --p-metrics simpson \
  --p-metrics shannon \
  --m-metadata-file			#PATH TO VALIDATED MAPPING FILE

Taxonomy (classifier trainer at bottom)

qiime feature-classifier classify-sklearn \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza \
  --i-classifier /ufrc/strauss/emerickl/SILVA_nb_99_V3-V4.qza

qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv

qiime taxa barplot \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --o-visualization taxa-bar-plots.qzv \
  --m-metadata-file WCTmetaData.tsv

BIOM export and other stuff

qiime tools export \
  table.qza \
  --output-dir ../biom

qiime tools export \
  taxonomy.qza \
  --output-dir ../biom

module load qiime/1.9.1

cd ../biom

biom convert \
  -i feature-table.biom \
  -o feature-json.biom \
  --table-type="OTU table" \
  --to-json

# Rename both header fields in one in-place pass (piping between two `sed -i` calls has no effect)
sed -i -e 's/Taxon/taxonomy/' -e 's/Feature ID/FeatureID/' taxonomy.tsv

biom add-metadata \
  -i feature-json.biom \
  -o feature_w_tax.biom \
  --observation-metadata-fp taxonomy.tsv \
  --observation-header FeatureID,taxonomy,Confidence \
  --sc-separated taxonomy --float-fields Confidence

filter_samples_from_otu_table.py \
  -i feature_w_tax.biom \
  -o filtered-table.biom \
  -n 5000	#Pay attention to this number, change it according to the table visualization

filter_taxa_from_otu_table.py \
  -i filtered-table.biom \
  -o table_wo_chl_mit.biom \
  -n D_2__Chloroplast,D_4__Mitochondria

normalize_table.py \
  -i table_wo_chl_mit.biom \
  -a DESeq2 \
  --DESeq_negatives_to_zero \
  -o DESeq2_table.biom

biom add-metadata \
  -i DESeq2_table.biom \
  -o DESeq2_w_tax.biom \
  --observation-metadata-fp taxonomy.tsv \
  --observation-header FeatureID,taxonomy,Confidence \
  --sc-separated taxonomy --float-fields Confidence

normalize_table.py \
  -i table_wo_chl_mit.biom \
  -a CSS \
  -o CSS_table.biom

biom convert \
 -i table_wo_chl_mit.biom \
 -o feature-table.tsv \
 --to-tsv \
 --table-type "OTU table"

sed -i s/"#OTU ID"/FeatureID/ feature-table.tsv
sed -i '1d' feature-table.tsv

Build Trees

qiime feature-table filter-seqs \
 --i-data ../features/rep-seqs.qza \
 --m-metadata-file feature-table.tsv \
 --p-no-exclude-ids \
 --o-filtered-data rep-seqs-filtered.qza

qiime alignment mafft \
  --i-sequences rep-seqs-filtered.qza \
  --p-n-threads 12 \
  --o-alignment aligned-rep-seqs-filtered.qza

qiime alignment mask \
  --i-alignment aligned-rep-seqs-filtered.qza \
  --o-masked-alignment masked-aligned-rep-seqs-filtered.qza

qiime phylogeny fasttree \
  --i-alignment masked-aligned-rep-seqs-filtered.qza \
  --o-tree unrooted-tree-filtered.qza

qiime phylogeny midpoint-root \
  --i-tree unrooted-tree-filtered.qza \
  --o-rooted-tree rooted-tree-filtered.qza

qiime tools export \
  rooted-tree-filtered.qza \
  --output-dir .

SciKit train

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path  SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fa \
  --output-path SILVA_132_99_otus.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --source-format HeaderlessTSVTaxonomyFormat \
  --input-path  SILVA_132_QIIME_release/taxonomy/16S_only/99/consensus_taxonomy_7_levels.txt \
  --output-path SILVA_132_99_tax.qza

qiime feature-classifier extract-reads \
  --i-sequences SILVA_132_99_otus.qza \
  --p-f-primer GTGYCAGCMGCCGCGGTAA  \
  --p-r-primer  GGACTACNVGGGTWTCTAAT \
  --p-trunc-len  300 \
  --o-reads SILVA_132_99_otus_515-926.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads SILVA_132_99_otus_515-926.qza \
  --i-reference-taxonomy SILVA_132_99_tax.qza \
  --o-classifier SILVA_nb_99_V3-V4.qza

EmerickL · August 3, 2018, 1:14am

I wasn't sure how to search for the ID directly in the feature-table.biom inside the compressed table.qza archive, so I converted it to a .tsv and still did not find this missing ID.
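Converting to TSV first is a reasonable route, since the BIOM file inside a QIIME 2 feature table is a binary HDF5 file and a plain text search of it is unreliable. As a minimal sketch (the function name `has_feature_id` is hypothetical, and it assumes a tab-separated table with feature IDs in the first column), an exact match on the ID column avoids both missed hits and partial matches elsewhere in the file:

```shell
# Succeed (exit status 0) if the given ID appears exactly in column 1 of the TSV.
has_feature_id() {
  feature_id=$1
  table_tsv=$2
  awk -F '\t' -v id="$feature_id" \
    '$1 == id { found = 1 } END { exit !found }' "$table_tsv"
}

# Hypothetical usage:
#   if has_feature_id 429b5b12899efa00c5d61dc11c424c7f feature-table.tsv; then
#     echo "ID is in the table"
#   else
#     echo "ID is not in the table"
#   fi
```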

thermokarst · August 3, 2018, 4:28am

Where is this file coming from? It looks different from the --o-representative-sequences rep-seqs.qza produced by your DADA2 step. Did you filter these seqs? If so, your tree was built from fewer features than your table contains, which would explain this error message. You can filter your table down to match the features present in your rep-seqs.
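One way to do that filtering, sketched here under assumptions rather than taken from the thread: collect the tip names from the exported Newick tree into an ID-only metadata file and pass it to `qiime feature-table filter-features`. The function name `tree_tips_to_id_list` and the output filename are hypothetical, the IDs are assumed to be 32-character lowercase hex hashes, and the `feature-id` header is assumed to be accepted as the ID column name by your QIIME 2 version.

```shell
# Write a one-column TSV (header "feature-id") listing every tip name in the tree.
# Assumption: tip names are 32-character lowercase hex hashes.
tree_tips_to_id_list() {
  tree_nwk=$1
  out_tsv=$2
  {
    printf 'feature-id\n'
    grep -oE '[0-9a-f]{32}' "$tree_nwk" | LC_ALL=C sort -u
  } > "$out_tsv"
}

# Hypothetical usage (the qiime command mirrors the flags used elsewhere in this thread):
#   tree_tips_to_id_list tree.nwk tree-tip-ids.tsv
#   qiime feature-table filter-features \
#     --i-table table.qza \
#     --m-metadata-file tree-tip-ids.tsv \
#     --o-filtered-table filtered-table.qza
```

The filtered table, not the original one, is then what would go to the phylogenetic diversity commands alongside the tree.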

EmerickL · August 3, 2018, 1:20pm

I ran:

qiime feature-table filter-features \
  --i-table features/table.qza \
  --m-metadata-file biom/feature-table.tsv \
  --o-filtered-table filtered-table.qza

After making sure feature-table.tsv (extracted from table.qza) didn't contain the one feature ID that is missing from the tree, I was able to successfully run:

qiime diversity core-metrics-phylogenetic \
  --i-phylogeny biom/rooted-tree-filtered.qza \
  --i-table features/table.qza \
  --p-sampling-depth 5590 \
  --m-metadata-file features/WCTmetaDataQ2.tsv \
  --output-dir core-metrics-results
