percentage of input passed filter

mohsen_ej · January 12, 2021, 1:16pm

Hello everyone,
I need to know: is there a threshold for “percentage of input passed filter” in denoising-stats.qzv? My results are between 34% and 52%. Is that enough? If not, what is the problem?
Thank you

ChrisKeefe · January 12, 2021, 9:53pm

Hi @mohsen_ej,
Please include more information about exactly what commands you have run, and what your results look like. Without basic information, it’s hard for others to help. In this case, consider sharing the commands you ran, your denoising-stats.qzv, and some basic information on your study (e.g. what kind of sequencing on what kind of samples).

Best,
Chris

mohsen_ej · January 13, 2021, 7:13am

Hi.
First, I have 13 paired-end meat samples. The sequencing is 16S ribosomal RNA gene amplicons on the Illumina MiSeq system. I imported them by creating a manifest file and got this file.

Then I trimmed the adapters and primers with this command:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences paired-end-demux.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --p-match-adapter-wildcards \
  --p-match-read-wildcards \
  --p-discard-untrimmed \
  --o-trimmed-sequences paired-end-demux-trimmed.qza
I based it on this discussion: cutadapt. It concerns the primers, in case you can check the command.

After cutadapt, these are the results: paired-end-demux-trimmed.qzv (315.7 KB)
Then I was unsure how to choose the dada2 parameters, but finally I decided to use just the forward reads (please correct me if I'm doing this wrong) with this command:
qiime dada2 denoise-single \
  --i-demultiplexed-seqs paired-end-demux-trimmed.qza \
  --p-trim-left 10 \
  --p-trunc-len 280 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza
After that, the mean percentage of reads that passed was about 50%, while in other studies I know of, about 70%-80% pass.
Sorry for the long message, and thank you in advance for your help.
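
For reference, the "percentage of input passed filter" value is the number of reads that survive DADA2's filtering step divided by the number of input reads, times 100. A minimal sketch of that arithmetic follows. The sample IDs and read counts are invented for illustration and are not taken from this run, and the column names are an assumption about how an exported stats table is laid out.

```python
import csv
import io

# Hypothetical stats table: sample IDs and counts are invented for illustration.
# The column names "input" and "filtered" are assumed, not read from this run.
STATS_TSV = """sample-id\tinput\tfiltered
meat-01\t120000\t61200
meat-02\t95000\t32300
meat-03\t101000\t80800
"""


def percent_passed_filter(tsv_text):
    """Return {sample-id: percentage of input reads that passed the filter}."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result = {}
    for row in rows:
        n_input = int(row["input"])
        n_filtered = int(row["filtered"])
        # Guard against empty samples to avoid dividing by zero.
        result[row["sample-id"]] = (
            round(100.0 * n_filtered / n_input, 2) if n_input else 0.0
        )
    return result


percentages = percent_passed_filter(STATS_TSV)
for sample_id, pct in percentages.items():
    print(f"{sample_id}: {pct:.2f}% of input passed filter")
```

The same ratio can be computed for any later column (for example non-chimeric reads over input) to see at which step reads are being lost.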

mohsen_ej · January 13, 2021, 7:15am

For your information, my dataset is now bigger than before. In the cutadapt discussion I mentioned above, it was the same type of sequencing but with far fewer reads per sample. I don’t know whether that is important.
Thank you very much

ChrisKeefe · January 14, 2021, 3:18am

Thanks for sharing more information. It will be much easier to understand why you're losing sequences if you share your denoising-stats.qzv as requested above.

In the meantime, here are a few random thoughts.

DADA2 parameters are challenging for many people at first, but it is absolutely worth learning how to set them appropriately. There are many great discussions of how to set dada2 trim and truncation parameters well on this forum. If you haven't already, consider spending an hour or two with the forum's search feature.

Did you sequence V3&V4? That's a pretty long amplicon, so that might have been the best choice, but given the cost of paired-end sequencing, it's probably worth checking how many paired-end sequences you can generate. Your quality scores look pretty good in general.
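
A rough way to reason about the paired-end option: paired reads can only be joined if, after truncation, the forward and reverse reads still overlap. DADA2's merging step requires a minimum overlap of 12 nt by default. Here is a minimal sketch of that arithmetic. The amplicon length of 425 nt is an assumed placeholder, not a measured value for this data (V3-V4 amplicon lengths vary between taxa), and the truncation lengths are hypothetical.

```python
def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Nucleotides shared by the forward and reverse reads after truncation."""
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(trunc_len_f, trunc_len_r, amplicon_len, min_overlap=12):
    """True if the truncated reads still overlap by at least min_overlap nt."""
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


# Assumed placeholder for the primer-free amplicon length (hypothetical value).
AMPLICON_LEN = 425

# Hypothetical forward/reverse truncation lengths to compare.
for trunc_f, trunc_r in [(270, 210), (240, 180), (280, 160)]:
    overlap = expected_overlap(trunc_f, trunc_r, AMPLICON_LEN)
    verdict = "can merge" if can_merge(trunc_f, trunc_r, AMPLICON_LEN) else "too short"
    print(f"trunc-len-f={trunc_f}, trunc-len-r={trunc_r}: overlap {overlap} nt, {verdict}")
```

If the quality of the reverse reads forces truncation so short that the overlap drops below the minimum, most pairs will fail to merge, and forward-only denoising may recover more reads.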

Why are you trimming these?

Chris

mohsen_ej · January 14, 2021, 6:18am

Thank you for your information.
So do you think there is no problem with the cutadapt command and I did it correctly?
I read many discussions about dada2 parameters, and they were helpful, but I think the right choice depends on the type of data and is specific to each dataset.
Sorry, I didn't understand what you meant in the second section. Yes, V3&V4 was sequenced, but how can I check how many paired-end sequences I can generate?
As for --p-trim-left 10, I used it because in one of my previous topics about dada2 parameters, somebody told me it would work because the quality scores are better after position 10. I would appreciate your suggestion on that.
denoising-stats.qzv (1.2 MB)
Thank you very much.

ChrisKeefe · January 15, 2021, 6:29pm

Thanks for sharing that data, @mohsen_ej. These are not the denoising stats from the command you posted above, which was initially confusing. They may be a useful starting point for exploration, though. First, a clerical note.

Please make an effort to keep your topics specific, detailed, and focused on one question. Failing to do so makes it harder for people to help you, and harder for others to learn from your experience. Based on the post you linked above, a number of other forum mods have already worked with you on selecting cutadapt parameters for this data. I'd recommend you take some time to learn how the tool works (documentation, paper), and if you have a specific question about cutadapt or how to use it, please create a separate topic.

Here, your question appears to be "what rate of sequence recovery is good enough". I'll focus on that.

Sorry if that was unclear. Running DADA2 with different trim and trunc parameters can change the number of reads recovered significantly. Learn how DADA2 works, choose good parameters, and you can maximize the amount of data recovered from a given sequencing run.
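
One reason these parameters matter so much: with --p-trunc-len, reads shorter than the truncation length are discarded rather than kept, so a value close to the full read length can remove a large share of primer-trimmed reads before denoising even starts. The sketch below illustrates only that length rule. The read lengths are made up, and DADA2's real filter also applies quality criteria (such as expected errors) that are not modeled here.

```python
def fraction_surviving_truncation(read_lengths, trunc_len):
    """Fraction of reads at least trunc_len long (shorter reads are discarded).

    trunc_len == 0 means no truncation or length filtering, so all reads survive.
    """
    if not read_lengths:
        return 0.0
    if trunc_len == 0:
        return 1.0
    kept = sum(1 for length in read_lengths if length >= trunc_len)
    return kept / len(read_lengths)


# Hypothetical read lengths after primer removal (invented for illustration).
lengths = [283, 283, 282, 279, 275, 283, 260, 281, 278, 283]

for trunc_len in (280, 250, 220):
    frac = fraction_surviving_truncation(lengths, trunc_len)
    print(f"trunc-len {trunc_len}: {frac:.0%} of reads are long enough")
```

With real data, the length distribution of the trimmed reads tells you how far the truncation length can go before this rule starts costing reads.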

This post is focused on "how many reads is enough". If you're still having trouble understanding how to set DADA2 parameters, maybe we can discuss what search terms you've used, what topics you've read, and what you didn't understand about them. I and many others have written about this topic extensively, and I'd be happy to help you learn to find the information you need more effectively.

I suspect miscommunication here. If you were running DADA2 on the data in the screen capture you shared, trimming 10 nt from the left might be useful. You're not. You're running DADA2 on the data shown in paired-end-demux-trimmed.qzv, in which the lowest mean q-score on the forward 5' end is 28, and the rest are all above 30, which is generally adequate. You've already removed some nt with cutadapt, so you may need different parameters. Take a moment with that visualization. If you were going to apply trim-left, how many nt would you trim?
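
To put those q-scores in perspective, a Phred score Q corresponds to an estimated per-base error probability of 10^(-Q/10). A short sketch of that conversion, using the two scores mentioned above:

```python
def phred_to_error_probability(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


for q in (28, 30):
    p = phred_to_error_probability(q)
    # 1 / p is the expected number of bases per erroneous call.
    print(f"Q{q}: error probability {p:.5f}, about 1 error in {round(1 / p)} bases")
```

So Q28 is roughly one expected error per 631 bases and Q30 is one per 1000, which is why a dip to 28 at the 5' end is not, on its own, a strong reason to trim.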

mohsen_ej · January 15, 2021, 8:02pm

Thank you SO much for your information, full of useful tips. The best thing I learned, if I'm not mistaken, is that about 10K reads is enough for downstream analysis. I chose just the forward reads with truncation at 220, and it gave me denoising-stats.qzv (1.2 MB)
So I probably don't need to worry about that, right?
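
A quick way to check that takeaway against a stats table is to list the samples whose retained read count falls below the chosen depth. This is only a sketch: the 10,000 default is the figure mentioned above and not a general rule, and the sample IDs and counts are hypothetical.

```python
def samples_below_depth(non_chimeric_counts, min_reads=10_000):
    """Return sorted sample IDs whose retained read count is below min_reads."""
    return sorted(
        sample_id
        for sample_id, n_reads in non_chimeric_counts.items()
        if n_reads < min_reads
    )


# Hypothetical non-chimeric read counts per sample (invented for illustration).
counts = {"meat-01": 48210, "meat-02": 9120, "meat-03": 15875}

low = samples_below_depth(counts)
print("Samples below threshold:", ", ".join(low) if low else "none")
```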

system · February 16, 2021, 2:02am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.