Training a classifier for archaea, but no archaea are left

Hi, could you please help me out? Thank you!
I am using QIIME 2 2024.2 and need to analyze 16S (archaea) data. I followed the protocol with specific primers, but no archaea are left after classification. Here are the commands I used.

nohup qiime rescript get-silva-data \
  --p-version '138.1' \
  --p-target 'SSURef_NR99' \
  --o-silva-sequences silva-138.1-ssu-nr99-rna-seqs.qza \
  --o-silva-taxonomy silva-138.1-ssu-nr99-tax.qza &

nohup qiime rescript reverse-transcribe \
  --i-rna-sequences silva-138.1-ssu-nr99-rna-seqs.qza \
  --o-dna-sequences silva-138.1-ssu-nr99-seqs.qza > nr99.log 2>&1 &

nohup qiime rescript cull-seqs \
  --i-sequences silva-138.1-ssu-nr99-seqs.qza \
  --o-clean-sequences silva-138.1-ssu-nr99-seqs-cleaned.qza &

nohup qiime rescript filter-seqs-length-by-taxon \
  --i-sequences silva-138.1-ssu-nr99-seqs-cleaned.qza \
  --i-taxonomy silva-138.1-ssu-nr99-tax.qza \
  --p-labels Archaea Bacteria Eukaryota \
  --p-min-lens 900 1200 1400 \
  --o-filtered-seqs silva-138.1-ssu-nr99-seqs-filt.qza \
  --o-discarded-seqs silva-138.1-ssu-nr99-seqs-discard.qza &

nohup qiime rescript dereplicate \
  --i-sequences silva-138.1-ssu-nr99-seqs-filt.qza \
  --i-taxa silva-138.1-ssu-nr99-tax.qza \
  --p-mode 'uniq' \
  --o-dereplicated-sequences silva-138.1-ssu-nr99-seqs-derep-uniq.qza \
  --o-dereplicated-taxa silva-138.1-ssu-nr99-tax-derep-uniq.qza &

nohup qiime feature-classifier extract-reads \
  --i-sequences silva-138.1-ssu-nr99-seqs-derep-uniq.qza \
  --p-f-primer CAGCCGCCGCGGTAA \
  --p-r-primer GTGCTCCCCCGCCAATTCCT \
  --p-n-jobs 2 \
  --p-read-orientation 'forward' \
  --o-reads silva-138.1-ssu-nr99-seqs-519f-915r.qza &

nohup qiime rescript dereplicate \
  --i-sequences silva-138.1-ssu-nr99-seqs-519f-915r.qza \
  --i-taxa silva-138.1-ssu-nr99-tax-derep-uniq.qza \
  --p-mode 'uniq' \
  --o-dereplicated-sequences silva-138.1-ssu-nr99-seqs-519f-915r-uniq.qza \
  --o-dereplicated-taxa silva-138.1-ssu-nr99-tax-519f-915r-derep-uniq.qza &

nohup qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138.1-ssu-nr99-seqs-519f-915r-uniq.qza \
  --i-reference-taxonomy silva-138.1-ssu-nr99-tax-519f-915r-derep-uniq.qza \
  --o-classifier silva-138.1-ssu-nr99-519f-915r-classifier.qza &

Please help me find what is wrong. Thank you very much!

Hi @gn_wang,

Can you please provide more details? Are you not seeing any archaea at all, or just fewer than you expect?

What samples are you sequencing from?

Note: You do not need to follow every step of that tutorial; as described there, it is just a set of examples. You can try skipping the rescript filter-seqs-length-by-taxon step (see this example and this example), because that step might be removing sequences that you need for classification. Or, in the worst case, there are simply few archaea in your samples.

Have you tried GTDB or RDP? You can download both via RESCRIPt.
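
To narrow down whether the archaea are lost while building the reference or later during classification, one option is to count domain-level labels in an exported taxonomy file. The sketch below assumes a tab-separated file with one header line and a SILVA-style taxon string (starting with d__) in the second column, which is what `qiime tools export` typically writes for a taxonomy artifact. The function name count_domains and the file name taxonomy.tsv are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count features per domain in an exported taxonomy TSV.
# Assumes column 2 holds strings such as "d__Archaea; p__Euryarchaeota; ...".
count_domains() {
  awk -F'\t' '
    NR > 1 {                        # skip the header line
      split($2, ranks, ";")         # first element is the domain label
      domain = ranks[1]
      gsub(/^ +| +$/, "", domain)   # trim stray spaces
      counts[domain]++
    }
    END { for (d in counts) printf "%s\t%d\n", d, counts[d] }
  ' "$1" | sort
}

# Example call (file name is an assumption):
#   count_domains taxonomy.tsv
```

Running this on the export of the primer-trimmed reference taxonomy and again on the classification output should show at which stage the d__Archaea count drops to zero.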

Thank you.
I am not seeing any archaea at all.
My data are from 16S rRNA sequencing; the RNA was extracted from human stool, and I used the primer pair "519f-915r".
I am sure the samples contain archaea.
Thank you.

Here is the code I used for DADA2:
echo -e 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath' > /data/file01/manifest.tsv
ls *_R1.fq.gz | while read id;
do
  echo "${id%%_*},$PWD/$id,$PWD/${id%%_*}_R2.fq.gz" >> /data/file01/manifest.tsv;
done

sed 's/,/\t/g' /data/file01/manifest.tsv > /data/file01/manifest_pe.tsv
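
The loop above writes commas and then converts them to tabs with sed. A minimal sketch of a variant that writes the tab-separated manifest directly is given here. It assumes paired files named <sample>_R1.fq.gz and <sample>_R2.fq.gz in a single directory; the function name build_manifest is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a paired-end manifest (tab-separated, V2 layout)
# for files named <sample>_R1.fq.gz / <sample>_R2.fq.gz in one directory.
# usage: build_manifest READS_DIR OUTPUT_TSV
build_manifest() {
  local dir out r1 r2 sample
  dir=$(cd "$1" && pwd) || return 1   # absolute path to the reads directory
  out=$2
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$out"
  for r1 in "$dir"/*_R1.fq.gz; do
    [ -e "$r1" ] || continue          # no matches: the glob stays literal
    sample=$(basename "$r1")
    sample=${sample%%_*}              # text before the first underscore
    r2="$dir/${sample}_R2.fq.gz"
    if [ ! -e "$r2" ]; then
      echo "missing reverse read for $sample" >&2
      continue
    fi
    printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2" >> "$out"
  done
}
```

Samples without a matching reverse file are reported on stderr and left out, so a truncated manifest is easier to spot before importing.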

nohup time qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /data/file01/manifest_pe.tsv \
  --output-path /data/file01/paired-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2 > import.log 2>&1 &

nohup time qiime demux summarize \
  --i-data /data/file01/paired-demux.qza \
  --o-visualization /data/file01/paired-demux.qzv > paired-demux.log 2>&1 &

nohup time qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /data/file01/paired-demux.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table /data/file01/table-dada2.qza \
  --o-denoising-stats /data/file01/stats-dada2.qza > dada2.log 2>&1 &

nohup qiime metadata tabulate \
  --m-input-file stats-dada2.qza \
  --o-visualization stats-dada2.qzv > stats-dada2.log 2>&1 &

mv rep-seqs-dada2.qza rep-seqs.qza
mv table-dada2.qza table.qza

qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv

How good are your quality plots? I ask because it is quite rare to use --p-trunc-len-f 0 --p-trunc-len-r 0 unless the quality is quite high. I'd recommend specifying other truncation values. Can you share your dada2 stats QZV file?
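
When picking truncation values for paired-end reads, the truncated forward and reverse reads still have to overlap enough to merge. The sketch below only does that arithmetic. The amplicon length, the candidate truncation values, and the 12-nt minimum overlap are illustrative assumptions, not values taken from this thread, and the function name check_overlap is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the overlap left after truncating both reads.
# usage: check_overlap AMPLICON_LEN TRUNC_F TRUNC_R [MIN_OVERLAP]
check_overlap() {
  local amplicon=$1 trunc_f=$2 trunc_r=$3 min_overlap=${4:-12}
  local overlap=$(( trunc_f + trunc_r - amplicon ))
  if [ "$overlap" -ge "$min_overlap" ]; then
    echo "ok: overlap ${overlap} nt"
  else
    echo "too short: overlap ${overlap} nt (need ${min_overlap})"
    return 1
  fi
}

# Example with assumed numbers: a 400-nt amplicon, reads cut to 240 and 200 nt.
check_overlap 400 240 200
```

If the amplicon length varies between taxa, the longest expected amplicon is the safer number to plug in.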

-Mike

stats-dada2.qzv (1.2 MB)
Thank you. Here is my DADA2 stats QZV.

That looks really good. Perhaps try playing with the --p-min-fold-parent-over-abundance parameter, to increase your "non-chimeric" read count. This is discussed here.
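
To compare runs with different --p-min-fold-parent-over-abundance values, it can help to summarize how many merged reads survive chimera removal in each sample. A minimal sketch, assuming the denoising stats have been exported to a tab-separated file with `merged` and `non-chimeric` columns and a `#q2:types` line after the header; the function name chimera_summary and the file name stats.tsv are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of merged reads kept as non-chimeric, per sample.
# usage: chimera_summary STATS_TSV
chimera_summary() {
  awk -F'\t' '
    NR == 1 {                         # locate the columns by header name
      for (i = 1; i <= NF; i++) {
        if ($i == "merged") m = i
        if ($i == "non-chimeric") n = i
      }
      next
    }
    /^#/ { next }                     # skip the "#q2:types" directive line
    m && n && $m > 0 { printf "%s\t%.1f\n", $1, 100 * $n / $m }
  ' "$1"
}

# Example call (file name is an assumption):
#   chimera_summary stats.tsv
```

A low percentage across most samples suggests the chimera filter is the step discarding reads, which is the case the parameter above is meant to address.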

Thank you very much. I will try it and reply with the results.
