Trouble with alignment and taxonomy assignment for a large dataset

Hi, I am quite new to this kind of analysis.
I am running a microbiota analysis of gastric biopsy samples from 250 patients.
I first tried the workflow from the Moving Pictures tutorial (DADA2 denoising, etc.) on a small trial dataset (8 samples). However, the taxonomy barplot showed that 80% was unassigned.
So I tried closed-reference clustering with vsearch instead, and got fewer “unassigned” features in the taxonomy barplot.
I ran the 7 steps below on the small trial dataset and successfully got a taxonomy barplot. However, I ran into trouble when I ran the full dataset of 250 samples.

So my steps were as follows:

  1. Import the paired-end FASTQ files

  2. Dereplicate

  3. Closed-reference clustering with standalone vsearch (not the q2-vsearch plugin):
    vsearch --usearch_global seqs.fna --db ./database/SILVA_138_SSURef_NR99_tax_silva.fasta --strand plus --id 0.99 --blast6out blast_mapping_silva.blast6 --matched seq_mapping_silva.fasta --uc otu_table_mapping_silva.uc

  4. Convert the .uc file into a BIOM table.

  5. Import the BIOM table into QIIME 2 as featuretable.qza, and seq_mapping_silva.fasta as sequence.qza.

  6. Create a phylogenetic tree.
    When I run this, I get an error because I have more than 1 million sequences,
    so I cannot continue with the core-metrics (alpha and beta diversity) analysis.

  7. Then I tried to assign taxonomy by running:
    qiime feature-classifier classify-sklearn --i-classifier ./database/silva-138-99-nb-classifier.qza --i-reads seq_mapping_silva_TRIAL.qza --o-classification
    Taxonomy assignment took several days.
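With closed-reference clustering every matched sequence is assigned to an existing database entry, so the number of features should equal the number of distinct reference IDs that received hits, which can be far smaller than the number of sequences you start from. A quick way to check this before building anything is to summarise the .uc file. This is a minimal sketch, assuming vsearch's tab-separated .uc layout in which column 1 is the record type (`H` = hit, `N` = no hit) and column 10 is the target (reference) label; `summarize_uc` is a hypothetical helper name, not part of vsearch or QIIME 2.

```shell
#!/usr/bin/env bash
# summarize_uc UC_FILE
# Hypothetical helper: counts hits, misses and distinct reference IDs in a
# vsearch --usearch_global .uc file.
# Assumes tab-separated records: $1 = record type (H hit, N no hit), $10 = target label.
summarize_uc() {
  awk -F'\t' '
    $1 == "H" { hits++; refs[$10] = 1 }   # one H record per matched query
    $1 == "N" { misses++ }                # query with no hit above the identity threshold
    END {
      n = 0
      for (r in refs) n++                 # distinct reference IDs = closed-reference features
      printf "hits=%d misses=%d distinct_refs=%d\n", hits, misses, n
    }' "$1"
}

# Example: summarize_uc otu_table_mapping_silva.uc
```

If `distinct_refs` is in the thousands while the sequence file has over a million entries, the large file is not the set of OTUs.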

Do you have any suggestions for me?

Hi @kartika,

Welcome to the QIIME 2 forum!

The nice thing about closed-reference clustering is that you simply inherit your tree and taxonomy from the database. You don’t need to do a new taxonomic classification or build a tree. Just import the existing ones and continue with your analysis.
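To make “inheriting” the taxonomy concrete: each closed-reference feature ID is a database ID, so its taxonomy is a plain lookup in the database’s taxonomy table rather than a new classification. A minimal sketch, assuming a headerless two-column tab-separated taxonomy file (`ID<TAB>taxonomy`) whose IDs match the target labels in column 10 of the .uc file; `lookup_taxonomy` is a hypothetical helper, and in practice you would simply import the full taxonomy file into QIIME 2.

```shell
#!/usr/bin/env bash
# lookup_taxonomy TAXONOMY_TSV UC_FILE
# Hypothetical helper: prints "refID<TAB>taxonomy" once per reference hit in the .uc file.
# Assumes the taxonomy TSV is headerless with the reference ID in column 1,
# and .uc column 1 = record type (H = hit), column 10 = target label.
lookup_taxonomy() {
  awk -F'\t' '
    NR == FNR { tax[$1] = $2; next }          # first file: load ID -> taxonomy
    $1 == "H" && !($10 in seen) {             # second file: first hit per reference
      seen[$10] = 1
      print $10 "\t" (($10 in tax) ? tax[$10] : "Unassigned")
    }' "$1" "$2"
}

# Example: lookup_taxonomy taxonomy.tsv closedref.99.map.uc
```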

Best,
Justine


Thank you for your answer, Justine.

It is really nice to hear that I don't have to build a tree or assign taxonomy, but my next problem is that I do not know how to obtain them.


Just import the one that already exists and continue with your analysis.


I ran vsearch again to obtain an OTU table in BIOM format along with the matched sequences:

vsearch --usearch_global ./rep-seqs/dna-sequences.fasta --db ./database/gg_13_5_tax_with_hOTUs_99_reps.fasta \
  --strand plus --id 0.99 --threads 8 \
  --uc closedref.99.map.uc --biomout closedref.99.biom --matched checkedgg_otu.fna --blast6out checkedgg.blast6

So my output files are:

  1. a .uc file (closedref.99.map.uc)
  2. a BIOM file (closedref.99.biom)
  3. a sequence file (checkedgg_otu.fna)
  4. a BLAST6 file (checkedgg.blast6)
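According to the vsearch manual, `--matched` writes the *query* sequences that found a hit, so checkedgg_otu.fna holds your input sequences rather than one sequence per OTU. If you need a `FeatureData[Sequence]` whose IDs match the closed-reference feature IDs, one option is to pull the hit references out of the database FASTA. A minimal sketch, assuming .uc column 1 is the record type (`H` = hit) and column 10 the target label, and that the target label equals the first whitespace-delimited word of the FASTA header (vsearch truncates labels at the first whitespace by default); `extract_hit_refs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# extract_hit_refs UC_FILE DB_FASTA > hit_refs.fasta
# Hypothetical helper: writes the database records whose ID was hit at least once.
# Works for single-line and multi-line FASTA records.
extract_hit_refs() {
  awk -F'\t' '
    NR == FNR { if ($1 == "H") want[$10] = 1; next }  # first file: collect hit reference IDs
    /^>/ {
      split($0, parts, /[ \t]/)                      # header ID = first word after ">"
      id = substr(parts[1], 2)
      keep = (id in want)
    }
    keep                                             # print header and sequence lines of kept records
  ' "$1" "$2"
}

# Example: extract_hit_refs closedref.99.map.uc ./database/gg_13_5_tax_with_hOTUs_99_reps.fasta > hit_refs.fasta
```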

My question is: what steps do I need to follow to get

  1. a core-metrics analysis in QIIME 2
  2. a taxonomy barplot?

I would really appreciate it if you could help me with this.

Hi @kartika,

The Greengenes database you downloaded should have come with an accompanying taxonomy file and tree file. It looks like you may have modified the database, so you may need to track down those modifications.

Once you locate your taxonomy and tree files, the taxonomy can (probably) be imported as a headerless TSV taxonomy:

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path [input taxonomy path] \
  --output-path [output dir]/taxonomy.qza
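Before importing, it can save a round trip to check that the file really has the headerless layout: one feature ID and one taxonomy string per line, separated by a tab. A minimal sketch; `check_headerless_taxonomy` is a hypothetical helper, and the rules it enforces (at least two tab-separated columns, a non-empty ID, no `Feature ID` header row) are an assumption about the format, so the validation QIIME 2 performs during import remains the authority.

```shell
#!/usr/bin/env bash
# check_headerless_taxonomy FILE
# Hypothetical pre-import check: reports lines that do not look like "ID<TAB>taxonomy"
# (fewer than 2 columns, empty ID, or a header row) and exits non-zero if any are found.
check_headerless_taxonomy() {
  awk -F'\t' '
    NR == 1 && $1 == "Feature ID" { print "line 1: header row found"; bad++ }
    NF < 2 || $1 == ""            { print "line " NR ": expected ID<TAB>taxonomy"; bad++ }
    END { exit (bad > 0) }
  ' "$1"
}

# Example: check_headerless_taxonomy taxonomy.tsv && echo "looks headerless"
```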

Your tree is probably in Newick format, so the import would most likely be:

qiime tools import \
  --input-path [input path].tre \
  --output-path [output dir]/tree.qza \
  --type 'Phylogeny[Rooted]'
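The phylogenetic diversity methods need every feature in your table to appear as a tip in the tree, so a tree and table from different database versions will make the analysis fail. A crude pre-check, assuming a Newick file and a plain list of feature IDs (one per line, e.g. the distinct reference IDs from your .uc file); `missing_tips` is a hypothetical helper that only looks for each ID written as `(ID:` or `,ID:`, so quoted labels or tips without branch lengths would be reported as missing.

```shell
#!/usr/bin/env bash
# missing_tips TREE_NEWICK IDS_TXT
# Hypothetical pre-check: prints each feature ID that does not appear as a tip label
# of the form "(ID:" or ",ID:" in the Newick tree (tree lines are joined first).
missing_tips() {
  awk '
    NR == FNR { tree = tree $0; next }   # first file: join the Newick text into one string
    $1 != "" && !index(tree, "(" $1 ":") && !index(tree, "," $1 ":") { print $1 }
  ' "$1" "$2"
}

# Example: missing_tips 99_otus.tree feature_ids.txt
```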

Once you have your tree.qza and taxonomy.qza, you can follow the tutorial of your choice to generate taxonomy barplots and alpha and beta diversity metrics.
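Put together, the remaining steps might look roughly like this. It is a sketch, not a tested pipeline: it assumes the vsearch `--biomout` file is BIOM 1.0 JSON (hence `BIOMV100Format`), that you have a sample metadata file, and that you choose the rarefaction depth from your own data. `run` and `downstream` are hypothetical wrappers; setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# run: hypothetical wrapper; with DRY_RUN=1 it prints the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# downstream BIOM TAXONOMY_QZA TREE_QZA METADATA_TSV SAMPLING_DEPTH
downstream() {
  local biom=$1 taxonomy=$2 tree=$3 metadata=$4 depth=$5
  # Import the vsearch --biomout table (assumed BIOM 1.0 JSON) as a feature table.
  run qiime tools import \
    --input-path "$biom" \
    --type 'FeatureTable[Frequency]' \
    --input-format BIOMV100Format \
    --output-path table.qza || return
  # Taxonomy barplot using the taxonomy imported from the database.
  run qiime taxa barplot \
    --i-table table.qza \
    --i-taxonomy "$taxonomy" \
    --m-metadata-file "$metadata" \
    --o-visualization taxa-bar-plots.qzv || return
  # Alpha and beta diversity using the tree imported from the database.
  run qiime diversity core-metrics-phylogenetic \
    --i-phylogeny "$tree" \
    --i-table table.qza \
    --p-sampling-depth "$depth" \
    --m-metadata-file "$metadata" \
    --output-dir core-metrics-results
}

# Example: DRY_RUN=1 downstream closedref.99.biom taxonomy.qza tree.qza sample-metadata.tsv <depth>
```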

Best,
Justine