Hi everyone,
I have 16S rRNA sequencing data from 4 runs: run 1: 2x150 bp paired-end sequencing on Illumina MiSeq; run 2: 2x300 bp paired-end on Illumina HiSeq X; run 3: 2x300 bp paired-end on MiSeq; run 4: single-end sequencing on Ion Torrent. I want to do a comparative analysis, for which I am using QIIME 2.
This is what I have done so far:
I downloaded the fastq.gz files from NCBI using their SRA accession IDs and ran FastQC, which showed the Illumina universal adapter in runs 1 and 2 only; I removed it using cutadapt. Then I merged the paired-end reads of runs 1, 2 and 3, and applied the quality-filter q-score method to all samples from the 4 runs. Next, I dereplicated the samples from all 4 runs, which gave me a feature table and feature data as output. I then proceeded with chimera removal using both de novo and reference-based approaches, which gave me 3 output files (chimeric sequences, non-chimeric sequences and a stats file).
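For readers who have not met dereplication before, here is a minimal Python sketch of the idea: identical reads are collapsed into one feature, and the number of copies seen in each sample becomes the feature table. This is only an illustration, not how QIIME 2 or vsearch implement it. The function name `dereplicate`, the dictionary layout, the toy reads, and the use of a SHA-1 hash as feature ID are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def dereplicate(reads_by_sample):
    """Collapse identical reads into features and count them per sample.

    reads_by_sample: dict mapping sample ID -> list of read sequences.
    Returns (feature_data, feature_table) where
      feature_data  maps feature ID -> sequence
      feature_table maps feature ID -> {sample ID: count}
    """
    feature_data = {}
    counts = defaultdict(lambda: defaultdict(int))
    for sample_id, reads in reads_by_sample.items():
        for read in reads:
            seq = read.upper()
            # Assumption for this sketch: the feature ID is a hash of the sequence.
            feature_id = hashlib.sha1(seq.encode("ascii")).hexdigest()
            feature_data[feature_id] = seq
            counts[feature_id][sample_id] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    feature_table = {fid: dict(per_sample) for fid, per_sample in counts.items()}
    return feature_data, feature_table


# Toy input (hypothetical sample names and reads).
toy_reads = {
    "sampleA": ["ACGT", "ACGT", "TTTT"],
    "sampleB": ["acgt"],
}
feature_data, feature_table = dereplicate(toy_reads)
for fid, seq in feature_data.items():
    print(fid[:8], seq, feature_table[fid])
```

The two outputs mirror the "feature table" and "feature data" mentioned above: one holds counts, the other holds the representative sequence for each feature.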
Now, my queries are:
If I proceed with taxonomic annotation, how do I generate a feature table for the non-chimeric sequences generated above?
This is what the stats file generated after chimera removal looks like:
Hi @Harshita_Sharma,
It seems like you are doing really exciting research. Comparative analyses are tricky.
I am personally getting a little lost in all your steps.
Would you mind giving us a list of the exact commands you ran? I think this might help our mods better understand what you are running and how to get you where you want to go!
@cherman2 Thank you for your valuable input.
This is the list of QIIME 2 methods that I have used for my analysis so far:
fastqc
cutadapt trim-paired
vsearch join-pairs
quality-filter q-score
vsearch dereplicate-sequences
feature-table filter-features
vsearch uchime-ref
feature-table merge-seqs
feature-table merge
feature-classifier: for this step I am unable to use the SILVA database as a reference; it always shows this error:
"Plugin error from feature-classifier: **The scikit-learn version (0.24.1) used to generate this artifact does not match the current version of scikit-learn installed (1.4.2). Please retrain your classifier for your current deployment to prevent data-corruption errors.**"
What do you suggest I should do about it?
Also, using the input you provided for generating the feature table of the non-chimeric sequences, I was able to run that step properly. Thank you again for that.
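For anyone following along, the idea behind building a feature table for only the non-chimeric sequences can be pictured as a simple filter: keep the rows of the table whose feature IDs appear among the non-chimeric sequences. Below is a minimal Python sketch of that idea. It is not QIIME 2's implementation, and I am assuming it corresponds roughly to the `feature-table filter-features` step in the list above. The function name `filter_table`, the dictionary layout, and the toy feature IDs are hypothetical.

```python
def filter_table(feature_table, keep_ids):
    """Return a new table containing only the features listed in keep_ids.

    feature_table: dict mapping feature ID -> {sample ID: count}
    keep_ids: iterable of feature IDs to retain (e.g. non-chimeric features)
    """
    keep = set(keep_ids)
    return {
        fid: dict(counts)
        for fid, counts in feature_table.items()
        if fid in keep
    }


# Toy data (hypothetical IDs): three features before chimera checking.
table = {
    "feat1": {"sampleA": 10, "sampleB": 3},
    "feat2": {"sampleA": 1},
    "feat3": {"sampleB": 7},
}
# Pretend chimera checking flagged feat2, so only these IDs are non-chimeric.
nonchimeric_ids = ["feat1", "feat3"]

nonchimeric_table = filter_table(table, nonchimeric_ids)
print(sorted(nonchimeric_table))
```

The original table is left unchanged, so the same starting table can be filtered separately for the de novo and the reference-based results if both are wanted.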
Thank you for posting your full pipeline and full error. Here is the core of the error that also tells us what to try next:
You can retrain as suggested, or download a copy of the SILVA classifier that was pre-trained with the version of scikit-learn matching your installation from this page: https://resources.qiime2.org/
Let us know if you are able to find a version of SILVA that matches and get this command running!
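If it helps when picking a classifier, here is a small Python sketch that pulls the two version numbers out of an error message like the one above and, separately, reports which scikit-learn version is installed in the active environment. This is just a convenience helper I am sketching, not part of QIIME 2; the function names `versions_from_error` and `installed_sklearn_version` are hypothetical, and the regular expression assumes the message wording shown in this thread.

```python
import re
from importlib import metadata


def versions_from_error(message):
    """Extract (artifact_version, installed_version) from the mismatch error.

    Returns None if the message does not match the expected wording.
    """
    pattern = r"scikit-learn version \(([\d.]+)\).*?installed \(([\d.]+)\)"
    match = re.search(pattern, message, flags=re.DOTALL)
    return (match.group(1), match.group(2)) if match else None


def installed_sklearn_version():
    """Return the installed scikit-learn version, or None if it is absent."""
    try:
        return metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return None


error_text = (
    "The scikit-learn version (0.24.1) used to generate this artifact does "
    "not match the current version of scikit-learn installed (1.4.2)."
)
parsed = versions_from_error(error_text)
print("classifier trained with / installed:", parsed)
print("scikit-learn in this environment:", installed_sklearn_version())
```

The first number is the version the downloaded classifier was trained with and the second is what your environment has, so the classifier you need is one trained with the second version.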