Which quality control method should I choose?

Hello, everyone.
I have some problems with quality control. The company provided me with clean data that had already been filtered with fastp (above Q20). But when I used "qiime demux summarize" to examine the data, I found that the quality in the middle segment of the sequences was very poor, falling to Q10.
Should I go ahead and discard everything after position 220? Or should I ignore the QIIME 2 quality control and go on to the next step?

Hi @Alan :smiley: :wave:

Welcome to the :qiime2: forum. Could you post a screenshot of your quality plots so people here can see what you are dealing with?

I wouldn't recommend ignoring quality control steps. They are there for a reason: to ensure the integrity and viability of your data. Basically, you need to know that what you find is real and not the result of error.


In my first analysis, I truncated the sequences at position 220, but that also seemed to remove the high-quality bases after position 300. The number of annotated sequences I ended up with was very small (only a few hundred to a few thousand).


So I wonder how to deal with the data?

Sure. This is the clean data after the fastp quality control steps (as provided by the company). I viewed it with FastQC, and the values are above Q20.

This is what I get after importing the clean data into QIIME 2. The values now drop to Q10.

Hi again,

thanks for that - that's really helpful for seeing what's going on.

Looks to me like the 'clean' data delivered by the sequencing company has already been merged/joined. That dip in the middle is actually where the quality fades towards the 3' ends of the forward and reverse reads; once the reads are merged, those low-quality ends sit in the middle of the sequence. Ideally, the low-quality ends should be trimmed off before merging.
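To make the "dip in the middle" concrete, here is a toy sketch in Python. It is not a model of any real merging tool: the read length, overlap, linear quality decay and the rule of keeping the higher score in the overlap are all assumptions chosen only for illustration.

```python
# Toy model only: per-base quality that decays linearly along each read.
def read_quality(length, start_q=36, end_q=10):
    """Quality falls from start_q at the 5' end to end_q at the 3' end."""
    step = (start_q - end_q) / (length - 1)
    return [round(start_q - step * i) for i in range(length)]


def merged_quality(read_len, overlap):
    """Quality profile of a merged read, in forward orientation.

    Assumption: in the overlap the merger keeps the higher of the two scores.
    """
    fwd = read_quality(read_len)
    rev = read_quality(read_len)[::-1]  # reverse read flipped to forward orientation
    merged_len = 2 * read_len - overlap
    offset = merged_len - read_len      # position where the flipped reverse read starts
    profile = []
    for pos in range(merged_len):
        scores = []
        if pos < read_len:
            scores.append(fwd[pos])
        if pos >= offset:
            scores.append(rev[pos - offset])
        profile.append(max(scores))
    return profile


profile = merged_quality(read_len=250, overlap=50)
lowest = min(profile)
print("merged length:", len(profile))
print("lowest quality:", lowest, "at position", profile.index(lowest))
print("quality at both ends:", profile[0], profile[-1])
```

Both ends of the merged sequence come from the high-quality 5' ends of the two reads, so the lowest scores land in the overlap region in the middle.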

Additionally, it looks like this was sequenced on either a NovaSeq or a NextSeq, as the quality scores are binned, which results in this blocky-looking quality graph.
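If you want to check for binning yourself, one rough way is to count how many distinct quality values appear in a FASTQ file; binned data will show only a handful. The sketch below assumes standard Phred+33 encoding and 4-line FASTQ records, and the example records are made up.

```python
from collections import Counter


def phred_scores(quality_line, offset=33):
    """Decode one FASTQ quality line (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]


def distinct_scores(fastq_lines):
    """Count how often each quality score occurs across 4-line FASTQ records."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # every fourth line holds the quality string
            counts.update(phred_scores(line.rstrip("\n")))
    return counts


# Tiny made-up records for illustration, not real data.
example = [
    "@read1", "ACGTACGT", "+", "FFFF:::,",
    "@read2", "TTGACCAA", "+", "FF:FF,,#",
]
counts = distinct_scores(example)
print("distinct quality values:", sorted(counts))
print("number of distinct values:", len(counts))
```

For a real file, the same function can be fed an open file handle, for example one from `gzip.open(path, "rt")` for a `.fastq.gz` file.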

Do you have access to the raw data, instead of this 'clean' data? Often sequencing companies provide both.


Hi, thanks for your answer.
The company also provided the raw data. Should I use the raw data and merge the sequences myself, then use DADA2 for trimming and quality control? Or is there another way to deal with it?

Hi again @Alan

Yes, I would recommend that you use the raw data and process it yourself. This also means you will know exactly what was done with it! :smiley:

I just want to flag that raw data will more than likely need adaptors/primers removing (check out cutadapt) and that you won't need to merge (and shouldn't) before using DADA2. DADA2 will do the denoising and merge your reads for you!
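As a rough outline of that order of steps (primer removal first, then DADA2 on the unmerged reads), here is a small Python sketch that only assembles and prints the two QIIME 2 commands without running them. The file names, primer sequences and truncation lengths are all placeholders: you would use your own primers and pick truncation lengths from your own quality plots, leaving enough overlap for DADA2 to merge the reads.

```python
import shlex

# Placeholders only: replace with the primers actually used for your amplicon.
FWD_PRIMER = "FORWARD_PRIMER_SEQUENCE"
REV_PRIMER = "REVERSE_PRIMER_SEQUENCE"


def cutadapt_cmd(demux_in, trimmed_out):
    """Arguments for removing primers from unmerged paired-end reads."""
    return [
        "qiime", "cutadapt", "trim-paired",
        "--i-demultiplexed-sequences", demux_in,
        "--p-front-f", FWD_PRIMER,
        "--p-front-r", REV_PRIMER,
        "--o-trimmed-sequences", trimmed_out,
    ]


def dada2_cmd(trimmed_in, trunc_f, trunc_r):
    """Arguments for denoising; DADA2 merges the read pairs itself."""
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", trimmed_in,
        "--p-trunc-len-f", str(trunc_f),
        "--p-trunc-len-r", str(trunc_r),
        "--o-table", "table.qza",
        "--o-representative-sequences", "rep-seqs.qza",
        "--o-denoising-stats", "denoising-stats.qza",
    ]


# Hypothetical file names and truncation lengths, for illustration only.
commands = [
    cutadapt_cmd("demux-paired-end.qza", "trimmed.qza"),
    dada2_cmd("trimmed.qza", trunc_f=220, trunc_r=200),
]
for cmd in commands:
    print(shlex.join(cmd))
```

The printed lines can be pasted into a terminal with QIIME 2 activated once the placeholders are filled in.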

Some useful links:
Importing data tutorial - Importing your data in all kinds of situations
Cutadapt instructions - how to use cutadapt to remove adaptors/primers
Soil microbiome tutorial - includes DADA2 section for paired reads

happy :qiime2:in'


Thanks a lot!! I will try it.
