Training a classifier for archaea, but no archaea are left

gn_wang · September 8, 2024, 4:01pm

Hi, could you please help me out? Thank you!
I am using QIIME 2 2024.2 and need to analyze archaeal 16S data. I followed the protocol with specific primers, but no archaea are left in my results. Here are the commands I ran.

nohup qiime rescript get-silva-data \
  --p-version '138.1' \
  --p-target 'SSURef_NR99' \
  --o-silva-sequences silva-138.1-ssu-nr99-rna-seqs.qza \
  --o-silva-taxonomy silva-138.1-ssu-nr99-tax.qza &

nohup qiime rescript reverse-transcribe \
  --i-rna-sequences silva-138.1-ssu-nr99-rna-seqs.qza \
  --o-dna-sequences silva-138.1-ssu-nr99-seqs.qza > nr99.log 2>&1 &

nohup qiime rescript cull-seqs \
  --i-sequences silva-138.1-ssu-nr99-seqs.qza \
  --o-clean-sequences silva-138.1-ssu-nr99-seqs-cleaned.qza &

nohup qiime rescript filter-seqs-length-by-taxon \
  --i-sequences silva-138.1-ssu-nr99-seqs-cleaned.qza \
  --i-taxonomy silva-138.1-ssu-nr99-tax.qza \
  --p-labels Archaea Bacteria Eukaryota \
  --p-min-lens 900 1200 1400 \
  --o-filtered-seqs silva-138.1-ssu-nr99-seqs-filt.qza \
  --o-discarded-seqs silva-138.1-ssu-nr99-seqs-discard.qza &

nohup qiime rescript dereplicate \
  --i-sequences silva-138.1-ssu-nr99-seqs-filt.qza \
  --i-taxa silva-138.1-ssu-nr99-tax.qza \
  --p-mode 'uniq' \
  --o-dereplicated-sequences silva-138.1-ssu-nr99-seqs-derep-uniq.qza \
  --o-dereplicated-taxa silva-138.1-ssu-nr99-tax-derep-uniq.qza &

nohup qiime feature-classifier extract-reads \
  --i-sequences silva-138.1-ssu-nr99-seqs-derep-uniq.qza \
  --p-f-primer CAGCCGCCGCGGTAA \
  --p-r-primer GTGCTCCCCCGCCAATTCCT \
  --p-n-jobs 2 \
  --p-read-orientation 'forward' \
  --o-reads silva-138.1-ssu-nr99-seqs-519f-915r.qza &

nohup qiime rescript dereplicate \
  --i-sequences silva-138.1-ssu-nr99-seqs-519f-915r.qza \
  --i-taxa silva-138.1-ssu-nr99-tax-derep-uniq.qza \
  --p-mode 'uniq' \
  --o-dereplicated-sequences silva-138.1-ssu-nr99-seqs-519f-915r-uniq.qza \
  --o-dereplicated-taxa silva-138.1-ssu-nr99-tax-519f-915r-derep-uniq.qza &

nohup qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138.1-ssu-nr99-seqs-519f-915r-uniq.qza \
  --i-reference-taxonomy silva-138.1-ssu-nr99-tax-519f-915r-derep-uniq.qza \
  --o-classifier silva-138.1-ssu-nr99-519f-915r-classifier.qza &

Please help me find what is wrong. Thank you very much!

SoilRotifer · September 9, 2024, 5:17pm

Hi @gn_wang,

Can you please provide more details? Are you not seeing any archaea at all, or just fewer than you expect?

What samples are you sequencing from?

Note: You do not need to follow every step of that tutorial; as described there, it is just a set of examples. You can try skipping the rescript filter-seqs-length-by-taxon step (see this example and this example). I suggest this because that step might be removing sequences that you need for classification. Or, in the worst case, there are simply few archaea in your sample.
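
One way to test whether a filtering or extraction step is dropping the archaeal references is to export the artifacts and count how many sequences still carry an archaeal label. The sketch below assumes the taxonomy and sequence artifacts were already exported with qiime tools export to a tab-separated taxonomy file and a FASTA file, and that taxon strings start with a d__ domain prefix, as in RESCRIPt-formatted SILVA. The function name count_domain_in_fasta is hypothetical.

```shell
#!/usr/bin/env bash
# Count FASTA records whose taxonomy label begins with the given domain.
# Usage: count_domain_in_fasta TAXONOMY_TSV FASTA DOMAIN
# Assumption: TAXONOMY_TSV has two tab-separated columns (ID, taxon string)
# and taxon strings start with "d__<Domain>".
count_domain_in_fasta() {
  local taxonomy="$1" fasta="$2" domain="$3"
  awk -v prefix="d__${domain}" '
    # First file (taxonomy): remember IDs whose taxon starts with the prefix.
    FNR == NR {
      split($0, col, "\t")
      if (index(col[2], prefix) == 1) wanted[col[1]] = 1
      next
    }
    # Second file (FASTA): the ID is the header text up to the first blank.
    /^>/ {
      split(substr($0, 2), hdr, /[ \t]/)
      if (hdr[1] in wanted) n++
    }
    END { print n + 0 }
  ' "$taxonomy" "$fasta"
}
```

Running it with Archaea as the domain on the sequences before and after a given step (for example the length filter, then extract-reads) shows where the archaeal references disappear, if they do. A count of 0 after primer extraction would suggest looking at the primer sequences or the read orientation rather than at the samples.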

Have you tried GTDB or RDP? You can download both via RESCRIPt.

gn_wang · September 11, 2024, 4:18am

Thank you.
I am not seeing any archaea at all.
My data are 16S rRNA sequencing reads. The RNA was extracted from human stool, and I used the primer pair "519f-915r".
I am sure the sample has archaea.
Thank you.

gn_wang · September 11, 2024, 7:30am

Here is the code I used for DADA2:

echo -e 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath' > /data/gn/file01/manifest.tsv
ls *_R1.fq.gz | while read -r id;
do
  echo "${id%%_*},$PWD/$id,$PWD/${id%%_*}_R2.fq.gz" >> /data/file01/manifest.tsv;
done

sed 's/,/\t/g' /data/file01/manifest.tsv > /data/file01/manifest_pe.tsv
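
The manifest step above can also be written as a small function that emits tab-separated rows directly and skips samples with a missing reverse file. This is only a sketch: build_manifest is a hypothetical helper, and it assumes the reads are named <sample>_R1.fq.gz and <sample>_R2.fq.gz. Unlike the loop above, it takes everything before _R1.fq.gz as the sample ID, so underscores inside sample names are kept.

```shell
#!/usr/bin/env bash
# Write a paired-end manifest with the three columns QIIME 2 expects.
# Usage: build_manifest READS_DIR OUTPUT_TSV
# Assumption: files are named <sample>_R1.fq.gz / <sample>_R2.fq.gz.
build_manifest() {
  local reads_dir out="$2" r1 r2 sample
  reads_dir="$(cd "$1" && pwd)" || return 1   # manifest needs absolute paths
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$out"
  for r1 in "$reads_dir"/*_R1.fq.gz; do
    [ -e "$r1" ] || continue                  # the glob matched nothing
    sample="$(basename "$r1" _R1.fq.gz)"
    r2="$reads_dir/${sample}_R2.fq.gz"
    if [ ! -e "$r2" ]; then
      echo "warning: no reverse read for $sample, skipping" >&2
      continue
    fi
    printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2" >> "$out"
  done
}
```

Calling build_manifest with the reads directory and an output path produces a file that can be passed to qiime tools import without the comma-to-tab conversion.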

nohup time qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /data/file01/manifest_pe.tsv \
  --output-path /data/file01/paired-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2 > import.log 2>&1 &

nohup time qiime demux summarize \
  --i-data /data/file01/paired-demux.qza \
  --o-visualization /data/file01/paired-demux.qzv > paired-demux.log 2>&1 &

nohup time qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /data/file01/paired-demux.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table /data/file01/table-dada2.qza \
  --o-denoising-stats /data/file01/stats-dada2.qza \
  > dada2.log 2>&1 &

nohup qiime metadata tabulate \
  --m-input-file stats-dada2.qza \
  --o-visualization stats-dada2.qzv > stats-dada2.log 2>&1 &

mv rep-seqs-dada2.qza rep-seqs.qza
mv table-dada2.qza table.qza

qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv

SoilRotifer · September 11, 2024, 12:54pm

How good are your quality plots? I ask because it is quite rare to use --p-trunc-len-f 0 --p-trunc-len-r 0 unless the quality is quite high. I'd recommend specifying other truncation values. Can you share your dada2 stats QZV file?
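
When choosing truncation values for paired-end reads, the truncated forward and reverse reads still need to overlap enough to be merged. A rough arithmetic check is sketched below. The 12-nt minimum matches the DADA2 default; check_overlap is a hypothetical helper, and the amplicon length has to be supplied by the user as it appears in the reads (with primers, if they have not been trimmed).

```shell
#!/usr/bin/env bash
# Estimate the overlap left after truncating paired-end reads.
# Usage: check_overlap AMPLICON_LEN TRUNC_LEN_F TRUNC_LEN_R [MIN_OVERLAP]
# Prints the expected overlap in nt and returns non-zero if it is
# smaller than MIN_OVERLAP (default 12).
check_overlap() {
  local amplicon="$1" trunc_f="$2" trunc_r="$3" min_overlap="${4:-12}"
  local overlap=$(( trunc_f + trunc_r - amplicon ))
  echo "$overlap"
  [ "$overlap" -ge "$min_overlap" ]
}
```

For instance, with an assumed 400-nt amplicon (an illustrative figure, not measured from this dataset), truncating at 250 and 200 leaves 50 nt of overlap, while 200 and 205 leaves only 5 nt and the reads would likely fail to merge.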

-Mike

gn_wang · September 26, 2024, 7:17am

stats-dada2.qzv (1.2 MB)
Thank you, and here is my DADA2 stats QZV.

SoilRotifer · September 26, 2024, 6:16pm

That looks really good. Perhaps try playing with the --p-min-fold-parent-over-abundance parameter, to increase your "non-chimeric" read count. This is discussed here.
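
To see how much there is to gain from relaxing chimera removal, the share of merged reads lost at that step can be computed per sample. This sketch assumes the denoising stats were exported to a tab-separated file (for example with qiime tools export) and that the table has columns named merged and non-chimeric, which it looks up by header name. chimera_loss is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Report, per sample, the percentage of merged reads removed as chimeras.
# Usage: chimera_loss STATS_TSV
# Assumption: the header row contains "merged" and "non-chimeric" columns.
chimera_loss() {
  awk -F'\t' '
    NR == 1 {
      # Locate the two columns by name rather than by position.
      for (i = 1; i <= NF; i++) {
        if ($i == "merged") m = i
        if ($i == "non-chimeric") c = i
      }
      if (!m || !c) {
        print "missing merged/non-chimeric column" > "/dev/stderr"
        exit 1
      }
      next
    }
    /^#/ { next }   # skip directive rows such as "#q2:types"
    $m > 0 { printf "%s\t%.1f\n", $1, 100 * ($m - $c) / $m }
  ' "$1"
}
```

If a large share of merged reads is lost in most samples, rerunning denoise-paired with a higher --p-min-fold-parent-over-abundance and comparing the output of this function between runs is one way to judge the effect.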

gn_wang · October 27, 2024, 12:38pm

Thank you very much. I will try it and reply with the results.
