Comparative analysis of 16S rRNA sequencing data from different runs

Harshita_Sharma · August 1, 2024, 2:54pm

Hi everyone,
I have 16S rRNA sequencing data from 4 runs: run 1: 2x150 bp paired-end sequencing on Illumina MiSeq; run 2: 2x300 bp paired-end on Illumina HiSeq X; run 3: 2x300 bp paired-end on MiSeq; run 4: single-end sequencing on Ion Torrent. I want to do a comparative analysis, for which I am using QIIME 2.
This is what I have done so far:
I downloaded the fastq.gz files from NCBI using their SRA accession IDs and ran FastQC, which showed the Illumina universal adapter in runs 1 and 2 only; I removed it with cutadapt. Then I merged the paired-end reads of runs 1, 2 and 3, followed by q-score quality filtering for all samples of the 4 runs. Next, I dereplicated the samples from all 4 runs, which gave me a feature table and feature data as output. Finally, I ran chimera removal using both de novo and reference-based approaches, which gave me 3 output files (chimeric sequences, non-chimeric sequences and a stats file).
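To make the dereplication step concrete, here is a minimal Python sketch of the idea, not of what QIIME 2 or vsearch do internally. It collapses identical reads into features and counts them per sample, which is why the step yields both a feature table and feature data. The function name, the SHA-1 based feature IDs, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import hashlib


def dereplicate(reads_by_sample):
    """Collapse identical reads into features.

    reads_by_sample maps a sample ID to a list of read sequences.
    Returns (feature_table, feature_data):
      feature_table[sample_id][feature_id] -> number of identical reads
      feature_data[feature_id] -> the unique sequence
    """
    feature_table = defaultdict(Counter)
    feature_data = {}
    for sample_id, reads in reads_by_sample.items():
        for seq in reads:
            # Hypothetical ID scheme: a hash of the sequence itself.
            feature_id = hashlib.sha1(seq.encode("ascii")).hexdigest()
            feature_data[feature_id] = seq
            feature_table[sample_id][feature_id] += 1
    return {s: dict(c) for s, c in feature_table.items()}, feature_data


# Toy reads, invented purely for illustration.
toy_reads = {
    "run1_sampleA": ["ACGT", "ACGT", "TTGA"],
    "run4_sampleB": ["ACGT", "GGCC"],
}
table, seqs = dereplicate(toy_reads)
```

In this sketch only exactly identical sequences collapse into one feature, so reads of different lengths from different runs would end up as separate features even when they come from the same organism.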
Now, my queries are:

If I proceed with taxonomic annotation, how do I generate a feature table for the non-chimeric sequences generated above?
This is what the stats file generated after chimera removal looks like:

[Screenshot of the stats file: qiime2query]

How do I interpret it?
Am I going in the right direction?
Someone please help me out!

cherman2 · August 7, 2024, 11:08pm

Hi @Harshita_Sharma,
Seems like you are doing really exciting research. Comparative analyses are tricky.

I am personally getting a little lost in all your steps.

Would you mind giving us a list of exact commands you ran? I think this might help our mods better understand what you are running and how to get you where you want to go!

I believe you need to filter your feature-table using the non-chimeric sequences like in this post: Error with vsearch uchime-denovo output - #8 by Oddant1
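Conceptually, that filtering step keeps only the features in the table whose IDs also appear among the non-chimeric sequences. Here is a minimal Python sketch of the idea, using plain dictionaries rather than QIIME 2 artifacts. The function name and the toy data are made up for illustration, and dropping samples that end up empty is a choice made in this sketch.

```python
def filter_table_to_nonchimeras(feature_table, nonchimeric_ids):
    """Keep only the features whose IDs are in nonchimeric_ids.

    feature_table maps sample ID -> {feature ID: count}.
    Samples left with no features are dropped.
    """
    keep = set(nonchimeric_ids)
    filtered = {}
    for sample_id, counts in feature_table.items():
        kept = {fid: n for fid, n in counts.items() if fid in keep}
        if kept:
            filtered[sample_id] = kept
    return filtered


# Toy data, invented for illustration only.
toy_table = {
    "sampleA": {"feat1": 10, "feat2": 3},
    "sampleB": {"feat2": 7},
    "sampleC": {"feat3": 5},
}
toy_nonchimeras = ["feat1", "feat3"]
filtered_table = filter_table_to_nonchimeras(toy_table, toy_nonchimeras)
```

Here `feat2` is treated as chimeric, so it disappears from `sampleA`, and `sampleB` is removed entirely because nothing else was left in it.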

Here is more info regarding the columns and what they mean: Galaxy |

Harshita_Sharma · August 9, 2024, 8:48am

@cherman2 Thank you for your valuable input.
This is the list of QIIME 2 methods that I have used for my analysis so far:

fastqc
cutadapt trim-paired
vsearch join-pairs
quality-filter q-score
vsearch dereplicate-sequences
feature-table filter-features
vsearch uchime-ref
feature-table merge-seqs
feature-table merge
feature-classifier: for this step I am unable to use the SILVA database as a reference. It always shows this error:

"Plugin error from feature-classifier: The scikit-learn version (0.24.1) used to generate this artifact does not match the current version of scikit-learn installed (1.4.2). Please retrain your classifier for your current deployment to prevent data-corruption errors."
What do you suggest I do about it?
Also, using the link you provided for generating the feature table of non-chimeric sequences, I was able to run that step properly. Thank you again for that.

colinbrislawn · August 9, 2024, 4:25pm

Hello Harshita,

Thank you for posting your full pipeline and full error. Here is the core of the error, which also tells us what to try next: "Please retrain your classifier for your current deployment to prevent data-corruption errors."

You can retrain as suggested, or download a copy of the SILVA classifier pre-trained with the matching version of scikit-learn from this page: https://resources.qiime2.org/
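The mismatch behind that error can be pictured with a small Python sketch that compares the two version strings from the message. This is only an illustration: the helper names are hypothetical, and comparing just the major and minor numbers is an assumption, not necessarily how the plugin performs its own check.

```python
def parse_version(version):
    """Turn a plain numeric version string such as '1.4.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_retraining(trained_with, installed):
    """Return True when the major.minor versions differ.

    Assumption: comparing major.minor is enough for this illustration;
    the real check inside the plugin may be stricter or looser.
    """
    return parse_version(trained_with)[:2] != parse_version(installed)[:2]


# Versions taken from the error message in this thread.
mismatch = needs_retraining("0.24.1", "1.4.2")
print(mismatch)  # True
```

In a real environment the installed version is available as `sklearn.__version__`, so the classifier you download or retrain should correspond to that value.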

Let us know if you are able to find a version of SILVA that matches and get this command running!

P.S. Yes, you are going in the right direction!
