Taxonomy using QIIME2 at 97% similarity

Hi @colinbrislawn
I am using QIIME2 for amplicon sequences from a 454 GS Junior sequencer, and I would like to ask whether QIIME2 can do taxonomy assignment and OTU clustering with Greengenes at 97% similarity.
If so, which commands should I use?

Thanks in Advance
Saraswati Awasthi

Yes!

Qiime2 supports OTU clustering in a variety of ways, as shown in this part of the Overview Tutorial:
https://docs.qiime2.org/2024.10/tutorials/overview/#clustering

We have a full tutorial about all these classic OTU methods:
https://docs.qiime2.org/2024.10/tutorials/otu-clustering/
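
As a rough illustration of what that tutorial covers, here is a minimal sketch of closed-reference clustering at 97% followed by taxonomy assignment. Every file name is a placeholder I made up, and it assumes you already have a feature table, representative sequences, the Greengenes 97% OTU sequences imported as an artifact, and a classifier trained on Greengenes. The small `run` helper is only a dry-run convenience for this sketch, not part of QIIME 2.

```shell
#!/usr/bin/env bash
# Sketch only: every .qza name below is a placeholder, not a file from this thread.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"
PERC_IDENTITY="0.97"                 # 97% similarity threshold
REF_SEQS="gg_13_8_97_otus.qza"       # assumed: Greengenes 97% OTUs, already imported
CLASSIFIER="gg_13_8_classifier.qza"  # assumed: a classifier trained on Greengenes

# Print the command, and run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Closed-reference OTU clustering against the reference at 97% identity.
run qiime vsearch cluster-features-closed-reference \
  --i-table table.qza \
  --i-sequences rep_seqs.qza \
  --i-reference-sequences "$REF_SEQS" \
  --p-perc-identity "$PERC_IDENTITY" \
  --o-clustered-table table_cr_97.qza \
  --o-clustered-sequences rep_seqs_cr_97.qza \
  --o-unmatched-sequences unmatched_cr_97.qza

# Assign taxonomy to the clustered representative sequences.
run qiime feature-classifier classify-sklearn \
  --i-classifier "$CLASSIFIER" \
  --i-reads rep_seqs_cr_97.qza \
  --o-classification taxonomy.qza
```

The tutorial also covers de novo and open-reference clustering, so closed-reference is just one choice here; pick whichever is closest to what you did in QIIME1.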

There are many things you can do with Qiime2. Let us know if you have more questions.
(I've been pretty busy these days, but even if I can't answer, someone else on the forums will!)

Below is the pipeline I followed to analyze the GS Junior data with QIIME1. I now want to run the analysis at 97% similarity with QIIME2, using the same QC parameters (described below). Can you tell me which commands to use?
QC:

Raw sequencing reads were quality trimmed using the QIIME pipeline (v 1.8.0) with the default parameters and the following modifications: one mismatch was allowed in the forward primer sequence, two mismatches were allowed in the reverse primer sequence, and the barcode length was set to 11 bp. The minimum average quality score was set at 25 (default). The sequences were denoised using the denoiser program. The denoised reads were clustered into operational taxonomic units (OTUs) with a 97% cutoff using Usearch v 5.2.326. At this stage, potential chimeras were removed using de novo and reference-based chimera detection in Uchime. Singletons were removed from the subsequent analysis.

Diversity:

The representative sequences of OTUs were chosen with default parameters, taxonomy was assigned to the representative sequence of each OTU, and the representative sequences were aligned to the Greengenes core reference alignment (v 13_8) and filtered based on the pipeline's default parameters. The phylogenetic tree and an OTU table were constructed. OTUs classified as eukaryotic chloroplasts were discarded from the OTU table.

Alpha diversity was calculated for a random subset of sequences from QIIME. For analysis of taxa at each individual site, the OTU table was summarised at different taxonomic levels. All charts and bar graphs used to visualize diversity at higher taxa levels were generated in Excel. SIMPER analysis was carried out using PAST (v 3.0) to obtain OTUs differentiating the drain and river sites. Bubble graphs used to represent the result of the SIMPER analysis were generated in R. Season- or site-specific OTUs and the core of drains were deciphered using QIIME scripts. Seasonal variation in OTUs was visualized using heatmaps generated in R. Correlation studies and linearity studies were performed in R. Bioenv (or BEST) analysis was conducted using QIIME scripts to identify which environmental factors best described variations in the bacterial community structure. For ecological analyses, the number of sequences was normalised and analyzed using cluster analyses with Bray-Curtis similarity measurement in PAST using 1000 bootstraps.

Thanks in Advance
Saraswati

The commands to replicate OTU clustering in Qiime2 are outlined in the OTU clustering tutorial I linked above.

Were you able to run all the commands in that tutorial with the example data?
Were you able to import and run them with your real GS Junior 454 data?

Hey @colinbrislawn !
Thanks for responding!
I was able to import the files and then denoised them using dada2, but I have doubts about the denoising because most of the reads are not passing the filter. I am posting a screenshot of the denoising summary of the sequences. In only one of the samples did 95% of the reads pass the filter. Let me know what modifications are required in this command:
qiime dada2 denoise-pyro --i-demultiplexed-seqs /home/storage10TB/GARIMA_AMPLICON/Files/F/single_end_demux.qza --p-trunc-len 320 --o-representative-sequences /home/storage10TB/GARIMA_AMPLICON/Files/F/rep_seqs.qza --o-table /home/storage10TB/GARIMA_AMPLICON/Files/F/table.qza --o-denoising-stats /home/storage10TB/GARIMA_AMPLICON/Files/F/denoising_stats.qza --p-n-threads 0

Thanks
Saraswati

I think you might consider trying these options within qiime dada2 denoise-pyro:

  • Set --p-trunc-len 250 (or something shorter). It is hard to say without viewing the quality plot. Basically, you want to remove the poor-quality ends to help the denoiser.
    • Also, do you know if the primers are contained within the 454 sequences?
    • Both inappropriate quality trimming and inadvertently retaining primer sequences within the data can trick the denoiser into thinking that the reads are chimeric. If the primers are contained within the reads, try using cutadapt on these reads prior to denoising.
  • As a last resort, consider setting --p-min-fold-parent-over-abundance to 2 or 4. I'm not sure if you should go above this value for this data.
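
Put together, those suggestions might look roughly like the sketch below. The file names and the primer sequence are placeholders I made up (replace the N's with your real forward primer), and the values 250 and 2 are only starting points to adjust after looking at the quality plot. The `run` helper is just a dry-run convenience for this sketch, not part of QIIME 2.

```shell
#!/usr/bin/env bash
# Sketch only: file names and the primer below are placeholders, not values from this thread.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"
TRUNC_LEN=250                    # starting point; choose from the quality plot
MIN_FOLD=2                       # last-resort chimera setting (2 or 4)
FWD_PRIMER="NNNNNNNNNNNNNNNNNN"  # placeholder: your actual forward primer

# Print the command, and run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# 1. Inspect per-position quality to pick a truncation length.
run qiime demux summarize \
  --i-data single_end_demux.qza \
  --o-visualization single_end_demux.qzv

# 2. Remove the primer from the 5' end, if it is present in the reads.
run qiime cutadapt trim-single \
  --i-demultiplexed-sequences single_end_demux.qza \
  --p-front "$FWD_PRIMER" \
  --o-trimmed-sequences trimmed_demux.qza

# 3. Denoise with a shorter truncation length and a relaxed chimera threshold.
run qiime dada2 denoise-pyro \
  --i-demultiplexed-seqs trimmed_demux.qza \
  --p-trunc-len "$TRUNC_LEN" \
  --p-min-fold-parent-over-abundance "$MIN_FOLD" \
  --p-n-threads 0 \
  --o-table table.qza \
  --o-representative-sequences rep_seqs.qza \
  --o-denoising-stats denoising_stats.qza
```

Keep in mind that reads shorter than --p-trunc-len are discarded, and that primer trimming shortens the reads, so a value chosen before trimming may need to come down a little afterwards.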