Denoise paired issues

I found that with four samples (four groups) the taxonomy is assigned correctly, but with five samples (five groups) it is not.

Here is my code.

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs 03.import/demux.qza \
  --p-n-threads 20 \
  --p-trim-left-f 15 --p-trim-left-r 15 \
  --p-trunc-len-f 0 --p-trunc-len-r 0 \
  --o-table 04.Denoise/dada2-table.qza \
  --o-representative-sequences 04.Denoise/dada2-rep-seqs.qza \
  --o-denoising-stats 04.Denoise/denoising-stats.qza

cp 04.Denoise/dada2-table.qza 04.Denoise/table.qza
cp 04.Denoise/dada2-rep-seqs.qza 04.Denoise/rep-seqs.qza

qiime feature-classifier classify-sklearn \
  --i-classifier ${dbdir}/2022.10.backbone.full-length.nb.qza \
  --i-reads 04.Denoise/rep-seqs.qza \
  --o-classification 09.Taxonomy_Classify/taxonomy.qza

I do not know why.

Hi @bingli2019,

Although I am not an expert here, the issue might arise from --p-trunc-len-f and --p-trunc-len-r being set to 0. If you do not want to leave any bases unused towards the end, you can input the length of the read (e.g., 251 when using the 515F-806R primers). This should solve the problem.

Hope it helps.

@Mudit_Bhatia This is incorrect as noted in the help text:

If 0 is provided, no truncation or length filtering will be performed

@bingli2019,
Can you provide more details on what taxonomy you expect to observe and why?

Thank you Mike for this correction!!

Yes. When I have 1, 2, 3, or 4 samples, it's OK. But when I have 5 or more samples, the result is wrong.
Here is an example with five samples and one with a single sample.

With five samples in five groups, the taxonomy is not good:

index  k__Eukaryota  Unassigned  Group
KC144  0             79450       B1
DC114  0             78200       B2
JH165  81814         420         B3
SB017  0             81494       B4
TL137  0             80076       B5

With only one sample, or with fewer than five samples and groups, it's perfect.
Here is one sample:

index d__Bacteria;p__Firmicutes_D;c__Bacilli;o__Bacillales_B_306089;f__Bacillaceae_H_294103;g__Bacillus_P_294101 Group
KC144 81141 B1

Hi @bingli2019,

I am not sure what it is you are asking. Are you looking for Eukaryotes? Bacteria?

Based on the table... I assume you are asking why there are so many unassigned taxa?

Do you know if your sequences are in a mixed orientation? That is, how were your data sequenced? In order for the classifier to work, the sequences should be in the same orientation as the reference database / classifier. One quick sanity-check you can perform is to use feature-classifier classify-consensus-vsearch... as this approach does not care about sequence orientation. If you obtain reasonable classification, then this would imply that your sequences are not oriented correctly, which is why the sklearn classifier is not working.
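
As a rough sketch of the sanity check suggested above, the command could look something like the following. The reference reads and reference taxonomy artifacts are not named anywhere in this thread, so the function simply takes them as arguments; the `vsearch_check` name and the `QIIME` override variable (handy for a dry run with `QIIME=echo`) are hypothetical helpers for illustration, not part of QIIME 2.

```shell
# Sketch only: wraps classify-consensus-vsearch so the paths are explicit.
# $1 = query sequences (e.g. 04.Denoise/rep-seqs.qza)
# $2 = reference reads artifact      (assumption: you supply your own)
# $3 = reference taxonomy artifact   (assumption: you supply your own)
# $4 = output directory (must not exist yet)
vsearch_check() {
  # QIIME is a hypothetical override; it defaults to the real `qiime` binary.
  "${QIIME:-qiime}" feature-classifier classify-consensus-vsearch \
    --i-query "$1" \
    --i-reference-reads "$2" \
    --i-reference-taxonomy "$3" \
    --p-strand both \
    --output-dir "$4"
}
```

For example, `vsearch_check 04.Denoise/rep-seqs.qza ref-seqs.qza ref-tax.qza 09.Taxonomy_Classify/vsearch-check`, where the two reference file names are placeholders. With `--p-strand both`, both orientations of each query are searched, so a sensible result here alongside mostly "Unassigned" from sklearn would point to an orientation problem.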

Yes, you're right. The sklearn classifier is not working when the sequences are the same orientation.

Thanks so much!

I assume you mean that it is not working when they are not oriented the same way as the reference database. :slight_smile:

Keep this in mind when trying to generate ASVs / OTUs and constructing phylogenies. You might get spurious results if the reads are of mixed orientation. So, you'd need a way to orient all of the reads in the same direction.
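
To make "orientation" concrete: a read in the opposite orientation is the reverse complement of the same region, so it looks like a completely different string to anything that compares sequences strand-by-strand. A minimal illustration in plain shell (the `revcomp` helper is hypothetical and only handles unambiguous A/C/G/T bases):

```shell
# Print the reverse complement of a DNA string (A/C/G/T only, no IUPAC codes).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp ACGTT   # prints AACGT
```

Two reads covering the same locus, one in each orientation, would therefore be treated as unrelated sequences unless they are first brought into a common orientation.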

Sadly, there is currently no way to orient FASTQ files in QIIME 2, though it should be possible by running vsearch manually outside of QIIME 2 and then re-importing. See here for one possible approach.

You can use qiime rescript orient-seqs ... to orient FASTA sequences.
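
A hedged sketch of what that might look like for the representative sequences from this thread. The reference sequences artifact is an assumption (none is named above), and the `orient_rep_seqs` wrapper and `QIIME` override variable are hypothetical conveniences, not RESCRIPt features.

```shell
# Sketch only: orient FASTA-type sequences against a reference with RESCRIPt.
# $1 = sequences to orient (e.g. 04.Denoise/rep-seqs.qza)
# $2 = reference sequences artifact (assumption: you supply your own)
# $3 = output artifact for the oriented sequences
# $4 = output artifact for sequences that matched neither orientation
orient_rep_seqs() {
  # QIIME is a hypothetical override; it defaults to the real `qiime` binary.
  "${QIIME:-qiime}" rescript orient-seqs \
    --i-sequences "$1" \
    --i-reference-sequences "$2" \
    --o-oriented-seqs "$3" \
    --o-unmatched-seqs "$4"
}
```

For example, `orient_rep_seqs 04.Denoise/rep-seqs.qza ref-seqs.qza 04.Denoise/rep-seqs-oriented.qza 04.Denoise/rep-seqs-unmatched.qza` (all names other than `rep-seqs.qza` are placeholders). The oriented artifact could then be passed to classify-sklearn via `--i-reads` in place of the original rep-seqs.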

Thanks very much. I will try qiime rescript orient-seqs. :pray: