Losing high percentage of reads with good quality scores

I have the same problem. I tried increasing the value of max-ee, but it did not improve the results, and I don't know what to do next. My data had already been processed before I received it.

@ZHY,

Can you post all of the steps that you have performed on your data, as well as the command you are using to denoise? It would also be helpful if you could create and post some quality plots of your demultiplexed data using demux summarize.

I received the data already demultiplexed. Here are the import command and the result:
qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-format PairedEndFastqManifestPhred33V2 --input-path ./HBSmanifest.tsv --output-path ./HBSpaired-end-demux.qza

HBSpaired-end-demux.qzv (316.8 KB)
and the denoising:
qiime dada2 denoise-paired --i-demultiplexed-seqs HBSpaired-end-demux.qza --p-trim-left-f 18 --p-trim-left-r 21 --p-trunc-len-f 250 --p-trunc-len-r 232 --o-table tableHBS.qza --o-representative-sequences rep-seqs-HBS.qza --o-denoising-stats denoising-statsHBS.qza --verbose

Sorry, I see now that it was @ZHY you asked for the data. I don't know how to delete this post. :grimacing:

LH-7 is my single sample:
LH_7_dada2_data.qzv (345.2 KB)
rdl_data contains all samples:
rdl_data.qzv (345.9 KB)

To retain more reads, I tried keeping the chimeric sequences, but it did not help.
qiime dada2 denoise-single \
  --i-demultiplexed-seqs /home/rendongliang/dada2_data.qza \
  --p-trim-left 0 \
  --p-trunc-len 0 \
  --p-max-ee 20 \
  --p-chimera-method none \
  --o-table dada2/table.qza \
  --o-representative-sequences dada2/rep-seqs.qza \
  --o-denoising-stats dada2/denoising-stats.qza

When I denoise my single sample on its own, about half of the reads are retained, but when it is denoised together with all samples, far fewer are retained.
Keeping chimeric sequences (all samples):
LH-7 21580 21554 99.88 2079 2079 9.63
Keeping chimeric sequences (single sample):
LH-7 21580 19280 89.34 11995 11995 55.58
Removing chimeric sequences (single sample):
LH-7 21580 19280 89.34 11995 6928 32.1
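
As a quick sanity check on the LH-7 rows above, the percentages appear to be each count divided by the input read count. Here is a minimal sketch, assuming the columns follow the usual DADA2 stats order (input, filtered, percent passed filter, denoised, non-chimeric, percent non-chimeric). The `percent_retained` helper is hypothetical and not part of QIIME 2:

```python
def percent_retained(input_reads: int, retained_reads: int) -> float:
    """Return retained reads as a percentage of input reads, rounded to 2 places."""
    if input_reads <= 0:
        raise ValueError("input_reads must be positive")
    return round(100.0 * retained_reads / input_reads, 2)


# LH-7 rows copied from the stats above: (label, input, filtered, final count).
# The column meaning is an assumption based on the usual DADA2 stats layout.
rows = [
    ("chimeras kept, all samples", 21580, 21554, 2079),
    ("chimeras kept, single sample", 21580, 19280, 11995),
    ("chimeras removed, single sample", 21580, 19280, 6928),
]

for label, n_input, n_filtered, n_final in rows:
    passed = percent_retained(n_input, n_filtered)
    final = percent_retained(n_input, n_final)
    print(f"{label}: {passed}% passed filter, {final}% retained at the end")
```

If that column reading is right, almost all of the loss in the all-samples run happens at the denoising step (21554 reads pass the filter but only 2079 remain), not at quality filtering.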

@ZHY,

Could you post your import steps as well? Looking at the visualizations, it seems the PHRED scores are not being processed correctly, which I think we will be able to fix during import.

I did not do much processing on my data myself, because the sequencing company had already processed it before I imported it.
My import command is as follows:
qiime tools import \
  --input-path manifest.csv \
  --type 'SampleData[SequencesWithQuality]' \
  --input-format SingleEndFastqManifestPhred33 \
  --output-path rdl_data.qza

@ZHY,

Try importing again, first with --input-format set to SingleEndFastqManifestPhred33V2, and if that does not work, try SingleEndFastqManifestPhred64V2.

I tried both formats, but neither worked. SingleEndFastqManifestPhred33 is the only import format that works.

@ZHY,

Gahh, I forgot that the V2 signifies how the manifest file is built. Try keeping everything else the same but changing the 33 to 64. It still may not work, but let's try all the options here.
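
For context on why the 33-versus-64 choice matters: FASTQ stores each quality score as a single ASCII character, and the score is the character code minus an offset of 33 or 64. Below is a minimal sketch, using a made-up quality string and hypothetical helper names, of how one might check which offsets are even possible for a given read. It is simplified and ignores the old Solexa variant, which allowed slightly negative scores:

```python
def decode_quality(quality: str, offset: int) -> list[int]:
    """Convert a FASTQ quality string to integer PHRED scores for a given offset."""
    return [ord(ch) - offset for ch in quality]


def plausible_offsets(quality: str) -> list[int]:
    """Return the offsets (33 and/or 64) that produce no negative scores."""
    return [
        offset
        for offset in (33, 64)
        if min(decode_quality(quality, offset), default=0) >= 0
    ]


# Hypothetical quality string for illustration; not taken from the real data.
example = "II5+~"
print(decode_quality(example, 33))
print(plausible_offsets(example))
```

In this made-up example, characters such as `5` and `+` have ASCII codes below 64, so an offset of 64 would give negative scores and only 33 remains plausible.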

Yes, that did not work either, and it produced the error shown in the picture.
image
I think the underlying reason may be that my data really are not good.

Thanks for your help. In my data, I can see multiple sequences with at least 90% similarity, and some of these sequences differ by only a single nucleotide. But I am not sure this is the main reason.

An off-topic reply has been split into a new topic: Losing high percentage of reads in dada2 denoise-paired

Please keep replies on-topic in the future.

@ZHY,

I was able to take a closer look at your data, and it looks like it may not have been collected on an Illumina machine, based on the values of the PHRED scores present. The scores in your data span a wider range of values than any single Illumina variant would produce on its own, and the reads are longer than would be expected from standard Illumina sequencing. Do you know what technology was used to sequence your data? If not, could you ask your sequencing center?
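
For anyone who wants to run the same kind of check on their own files, here is a rough sketch that reports the read-length range and PHRED-score range of a FASTQ stream, so they can be compared with what the sequencing platform is expected to produce. It assumes plain four-line FASTQ records and a +33 offset; `summarize_fastq` is a hypothetical helper, not a QIIME 2 function:

```python
import io


def summarize_fastq(handle, offset: int = 33) -> dict:
    """Report read-length and PHRED-score ranges for a 4-line-per-record FASTQ stream."""
    reads = 0
    min_len = max_len = None
    min_score = max_score = None
    for i, line in enumerate(handle):
        line = line.rstrip("\r\n")
        if i % 4 == 1:  # sequence line
            reads += 1
            n = len(line)
            min_len = n if min_len is None else min(min_len, n)
            max_len = n if max_len is None else max(max_len, n)
        elif i % 4 == 3 and line:  # quality line
            scores = [ord(ch) - offset for ch in line]
            lo, hi = min(scores), max(scores)
            min_score = lo if min_score is None else min(min_score, lo)
            max_score = hi if max_score is None else max(max_score, hi)
    return {
        "reads": reads,
        "min_length": min_len,
        "max_length": max_len,
        "min_score": min_score,
        "max_score": max_score,
    }


# Tiny made-up record for illustration only.
demo = io.StringIO("@read1\nACGTACGT\n+\nII5+~~II\n")
print(summarize_fastq(demo))
```

For a real file, one could pass an open text handle instead, for example `gzip.open(path, "rt")` for a compressed FASTQ.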

Hi, thanks for your reply. I asked the sequencing center about the sequencing method, and the company used PacBio SMRT to sequence my data.