Denoise paired issues

bingli2019 · March 22, 2024, 4:29pm

I found that taxonomy is assigned correctly when I run four samples (four groups), but not when I run five samples (five groups).

Here is my code.

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs 03.import/demux.qza \
  --p-n-threads 20 \
  --p-trim-left-f 15 --p-trim-left-r 15 \
  --p-trunc-len-f 0 --p-trunc-len-r 0 \
  --o-table 04.Denoise/dada2-table.qza \
  --o-representative-sequences 04.Denoise/dada2-rep-seqs.qza \
  --o-denoising-stats 04.Denoise/denoising-stats.qza

cp 04.Denoise/dada2-table.qza 04.Denoise/table.qza
cp 04.Denoise/dada2-rep-seqs.qza 04.Denoise/rep-seqs.qza

qiime feature-classifier classify-sklearn \
  --i-classifier ${dbdir}/2022.10.backbone.full-length.nb.qza \
  --i-reads 04.Denoise/rep-seqs.qza \
  --o-classification 09.Taxonomy_Classify/taxonomy.qza

I do not know why.

Mudit_Bhatia · March 22, 2024, 5:41pm

Hi @bingli2019,

I'm not an expert here, but the issue might be caused by --p-trunc-len-f and --p-trunc-len-r being set to 0. If you do not want to leave any bases unused towards the end, you can enter the length of the read (e.g. 251 when using 515F-806R primers). This should solve the problem.

Hope it helps.

SoilRotifer · March 22, 2024, 6:11pm

@Mudit_Bhatia This is incorrect as noted in the help text:

If 0 is provided, no truncation or length filtering will be performed

@bingli2019,
Can you provide more details on what taxonomy you expect to observe and why?

Mudit_Bhatia · March 22, 2024, 6:37pm

Thank you Mike for this correction!!

bingli2019 · March 25, 2024, 2:17am

Yes. With 1, 2, 3, or 4 samples the result is OK, but with 5 or more samples it is wrong.
Here is an example with five samples and one with a single sample.

With five samples in five groups, the taxonomy is not good:

index	k__Eukaryota	Unassigned	Group
KC144	0	79450	B1
DC114	0	78200	B2
JH165	81814	420	B3
SB017	0	81494	B4
TL137	0	80076	B5

With only one sample, or with fewer than 5 samples and groups, it's perfect.
Here is one sample:

index	d__Bacteria;p__Firmicutes_D;c__Bacilli;o__Bacillales_B_306089;f__Bacillaceae_H_294103;g__Bacillus_P_294101	Group
KC144	81141	B1

SoilRotifer · March 25, 2024, 6:07pm

Hi @bingli2019,

I am not sure what it is you are asking. Are you looking for Eukaryotes? Bacteria?

Based on the table... I assume you are asking why there are so many unassigned taxa?

Do you know if your sequences are in a mixed orientation? That is, how were your data sequenced? In order for the classifier to work, the sequences should be in the same orientation as the reference database / classifier. One quick sanity-check you can perform is to use feature-classifier classify-consensus-vsearch... as this approach does not care about sequence orientation. If you obtain reasonable classification, then this would imply that your sequences are not oriented correctly, which is why the sklearn classifier is not working.
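As a rough sketch of the sanity check described above, the command could be wrapped like this. The function name and all file paths are placeholders (assumptions), and the reference reads and taxonomy must be the FeatureData artifacts matching your database, not the trained sklearn classifier. `--p-strand both` is already the default and is spelled out only to make the point explicit.

```shell
# Sanity check: classify the same representative sequences with VSEARCH,
# searching both strands so read orientation does not matter.
# The function name and every path passed to it are hypothetical placeholders.
vsearch_sanity_check() {
    # $1: representative sequences (.qza)
    # $2: reference sequences (.qza)
    # $3: reference taxonomy (.qza)
    # $4: output directory (must not exist yet; QIIME 2 creates it)
    qiime feature-classifier classify-consensus-vsearch \
        --i-query "$1" \
        --i-reference-reads "$2" \
        --i-reference-taxonomy "$3" \
        --p-strand both \
        --output-dir "$4"
}

# Example call (reference file names are assumptions):
# vsearch_sanity_check 04.Denoise/rep-seqs.qza ref-seqs.qza ref-tax.qza 09.Taxonomy_Classify/vsearch-check
```

If this gives sensible taxonomy where classify-sklearn returned mostly Unassigned, orientation is the likely culprit.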

bingli2019 · March 26, 2024, 1:37am

Yes, you're right. The sklearn classifier is not working when the sequences are the same orientation.

Thanks so much!

SoilRotifer · March 26, 2024, 2:12pm

I assume you mean that it is not working when they are not oriented the same way as the reference database.

Keep this in mind when generating ASVs / OTUs and constructing phylogenies. You might get spurious results if the reads are of mixed orientation. So, you'd need a way to orient all of the reads in the same direction.
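To make "mixed orientation" concrete, here is a toy illustration using only standard shell tools (it assumes `rev` and `tr` are available; the sequence is made up). A read and its reverse complement come from the same molecule, but only one of them matches the strand stored in the reference database.

```shell
# Reverse-complement a DNA string: reverse it, then swap complementary bases.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Toy example: the same fragment as seen from the opposite strand.
revcomp GATTACA   # prints TGTAATC
```

A classifier trained on one strand sees `GATTACA` and `TGTAATC` as unrelated sequences, which is why mixed-orientation reads can end up Unassigned.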

Sadly, there is currently no way to orient FASTQ files in QIIME 2, though it should be possible by running vsearch manually outside of QIIME 2 and then re-importing. See here for one possible approach.

You can use qiime rescript orient-seqs ... to orient FASTA sequences.
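A minimal sketch of that call, assuming the RESCRIPt plugin is installed. The function name and all paths are placeholders, and the reference sequences should be a FeatureData[Sequence] artifact in the orientation you want your reads to end up in.

```shell
# Orient FASTA sequences against a reference with RESCRIPt.
# The function name and every path passed to it are hypothetical placeholders.
orient_rep_seqs() {
    # $1: sequences to orient (.qza)
    # $2: reference sequences in the desired orientation (.qza)
    # $3: output artifact for oriented sequences
    # $4: output artifact for sequences that matched neither strand
    qiime rescript orient-seqs \
        --i-sequences "$1" \
        --i-reference-sequences "$2" \
        --o-oriented-seqs "$3" \
        --o-unmatched-seqs "$4"
}

# Example call (file names are assumptions):
# orient_rep_seqs 04.Denoise/rep-seqs.qza ref-seqs.qza 04.Denoise/rep-seqs-oriented.qza 04.Denoise/rep-seqs-unmatched.qza
```

It is worth inspecting the unmatched output as well, since sequences that fail to match the reference in either direction are written there rather than to the oriented set.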

bingli2019 · March 27, 2024, 1:13am

Thanks very much. I will try qiime rescript orient-seqs.