I am using the latest version of QIIME 2, 2024.10 (amplicon distribution). I imported my paired-end sequences (2 × 250 bp). Before running the DADA2 workflow, I would like to apply Q-score filtering to this demux data. I remember that QIIME 2 used to have a separate qiime quality-filter q-score-joined command, which was later merged into qiime quality-filter q-score.
I assumed that qiime quality-filter q-score would automatically recognize my forward (F) and reverse (R) reads, but I noticed something odd. The command produces two outputs: the filtered sequences and the filtering stats.
When I visualize both outputs as .qzv files, the second one (the stats) looks normal: it reports the total number of reads retained and filtered, although it doesn't break them down by F and R.
The first output's .qzv, however, is an interactive quality plot that shows only the forward reads; the reverse-read information is missing. Without both directions, I cannot decide where to truncate the reads for the subsequent DADA2 workflow.
I am not sure whether this is a bug or whether I used the wrong command. Should I instead run this step (filtering out low-Q-score reads) after the DADA2 workflow?
Thank you for bringing this to our attention. Can you provide a list of the commands that you ran and attach the visualizations you're talking about?
You used this tool at the correct stage. It cannot be used after DADA2, because the sequences no longer carry quality scores at that point.
It's my understanding that DADA2 works best with unfiltered sequences because it uses the quality scores to build an error model, which it then uses for denoising. The more read positions it has, the better informed and more accurate that model will be. That said, we should still investigate your issue, because it may be a bug.
1> Here are the commands I ran (trimmed_sequences/demux-paired-end-trimmed.qza is the raw imported demux data with primers trimmed using cutadapt).
2> I tried Q-score cutoffs of 20, 25 and 30. Interestingly, the Q20 and Q25 results are identical, but the Q30 result is different. I don't know why. Could it be that all bases are above Q25?
3> PS: It's true that DADA2 is very good, and it also has its own Q-score cutoff (default 2). I am doing this to replicate someone else's data analysis.
I don't know how DADA2's Q-score cutoff relates to this QIIME Q-score filter cutoff. If I set the DADA2 cutoff to 20, I sometimes lose 90% of my reads. Do you usually change DADA2's default Q-score cutoff? If so, what value do you choose?
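To make the comparison in 2> and 3> concrete, here is a simplified Python sketch of the two cutoffs as the plugin help text describes them. For quality-filter q-score: a read is truncated once more than quality-window consecutive scores fall below min-quality, then dropped if it is shorter than min-length-fraction of its original length or has more than max-ambiguous N calls. For q2-dada2's trunc-q: a read is truncated at the first score less than or equal to the cutoff, then discarded if it is shorter than trunc-len. This is not the plugins' actual code; exact boundary handling, DADA2's other filters (e.g. max-ee) and all function names here are assumptions, and the toy reads are invented.

```python
from typing import List, Optional, Tuple


def quality_filter_q_score(seq: str, quals: List[int], min_quality: int = 4,
                           quality_window: int = 3,
                           min_length_fraction: float = 0.75,
                           max_ambiguous: int = 0) -> Optional[Tuple[str, List[int]]]:
    """Simplified window rule (hypothetical helper): truncate at the start of
    the first run of more than `quality_window` consecutive scores below
    `min_quality`; return None if the read is then too short or too ambiguous."""
    original_len = len(seq)
    trunc_at = original_len
    run_start = None
    for i, q in enumerate(quals):
        if q < min_quality:
            if run_start is None:
                run_start = i  # a run of low scores begins here
            if i - run_start + 1 > quality_window:
                trunc_at = run_start  # run too long: cut where it started
                break
        else:
            run_start = None  # a good score ends the current run
    seq, quals = seq[:trunc_at], quals[:trunc_at]
    if len(seq) < min_length_fraction * original_len:
        return None
    if seq.upper().count("N") > max_ambiguous:
        return None
    return seq, quals


def dada2_trunc_q(quals: List[int], trunc_q: int = 2,
                  trunc_len: int = 0) -> Optional[int]:
    """Simplified truncQ rule (hypothetical helper): cut at the first score
    <= trunc_q, then discard the read if it is shorter than trunc_len
    (0 disables trunc_len). Returns the retained length, or None if dropped."""
    keep = len(quals)
    for i, q in enumerate(quals):
        if q <= trunc_q:
            keep = i  # a single low base is enough to truncate
            break
    if trunc_len > 0:
        if keep < trunc_len:
            return None
        keep = trunc_len
    return keep


if __name__ == "__main__":
    # Toy read whose lowest scores are 26-27.
    seq = "ACGTACGT"
    quals = [38, 38, 38, 27, 26, 26, 26, 26]
    for q in (20, 25, 30):
        kept = quality_filter_q_score(seq, quals, min_quality=q)
        print(f"quality-filter min_quality={q}: {'kept' if kept else 'discarded'}")

    # Toy 250-bp read: Q35 everywhere except one Q18 base at position 150.
    long_quals = [35] * 250
    long_quals[150] = 18
    for tq in (2, 20):
        print(f"DADA2 trunc_q={tq}, trunc_len=240:",
              dada2_trunc_q(long_quals, trunc_q=tq, trunc_len=240))
```

Under these assumptions, the toy read is kept at min-quality 20 and 25 and discarded at 30, because no score falls between 20 and 25; that is one way two cutoffs can give identical results. The window rule also ignores the isolated Q18 base, whereas a trunc-q of 20 truncates the 250-bp read at position 150, and the 240-bp trunc-len then discards it, which illustrates how a high trunc-q can remove most reads.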
Could you attach the .qzv files themselves, not just their exported data/ directories? This will make it a little easier for me to view them, and will make it possible for me to see their provenance, so I can look for clues about what is going wrong.
This is indeed a bug, or a failure to adequately document, depending on how you look at it. Currently, whether you input paired-end or single-end reads, quality-filter q-score applies the quality filter only to the forward reads and returns only the forward reads. I'm now planning to update this action so that, when paired-end reads are input, it applies the quality filter to both read directions and returns both. Thank you for bringing this to our attention.
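Until that change lands, one way to see which read directions an artifact actually holds is to check its semantic type: paired-end demux data is SampleData[PairedEndSequencesWithQuality], while an output holding only forward reads would be expected to carry a single-end type such as SampleData[SequencesWithQuality]. qiime tools peek reports this directly. The sketch below reads the same field using only the Python standard library, assuming the .qza is a zip archive whose top-level <uuid>/metadata.yaml contains a "type:" line; that layout is an assumption about the archive format, not a documented API, and artifact_type is a hypothetical helper name.

```python
import sys
import zipfile


def artifact_type(qza_path: str) -> str:
    """Return the semantic type recorded in a .qza's top-level metadata.yaml.

    Assumes the layout <uuid>/metadata.yaml; the metadata.yaml copies deeper
    under <uuid>/provenance/ are skipped by the two-part path check below.
    """
    with zipfile.ZipFile(qza_path) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            if len(parts) == 2 and parts[1] == "metadata.yaml":
                text = zf.read(name).decode("utf-8")
                for line in text.splitlines():
                    if line.startswith("type:"):
                        return line.split(":", 1)[1].strip()
    raise ValueError(f"no top-level metadata.yaml with a type in {qza_path}")


if __name__ == "__main__":
    # Usage: python artifact_type.py input.qza [more.qza ...]
    for path in sys.argv[1:]:
        print(path, "->", artifact_type(path))
```

Running it on both the input demux artifact and the filtered output should show whether the reverse reads were dropped.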
The good news for you is that I don't think this action is desirable for your workflow, for the reasons I mentioned above. I would skip this action and move on to dada2 denoise-paired.