Deblur stats.qzv file meaning and interpretation

MMC_northS · April 12, 2018, 11:48am

Hi,

I have some questions about the file generated after the Deblur step, called "per-sample-deblur-stats". I am using deblur denoise-other because my 18S amplicon also amplifies some 16S fragments in prokaryotes, so I use both the 18S and 16S sequences from Silva 128 as the reference database.

In the (already closed) post "Deblur analysis merged sequence" I saw that the value in the column called "reads-derep" is very similar to "reads-raw". Is that normal? I also have high numbers in that column. I am not sure I understand all the columns in this file. Which column shows how many sequences are finally kept after Deblur?
Do the columns referring to "artifact" or "missed-reference" describe matches against the reference database that you use?

What is the meaning of "derep" in that step?
I know that hovering the mouse over a column name shows a brief explanation, but I still have doubts.

Separately, when I trimmed the sequences in my fastq files by quality and length with Cutadapt, 80-90% of the sequences were recovered using 100 bp. When I used the same trim length for Deblur, the filtered-by-min-length values were almost 0.50 (0.29-0.45, depending on the sample).
Could anyone help me, please? Any ideas or additional explanation?

wasade · April 17, 2018, 4:09pm

Hi @MMC_northS,

"reads-derep" indicates the number of reads remaining in a sample following dereplication and filtering by the threshold provided by --p-min-size. By default, this will omit singletons, so if your data do not have many singletons, "reads-derep" would be quite similar to "reads-raw".

The number of reads following deblur itself ("reads-deblur") may be misleading as that is upstream of negative and positive filtering. The output table from q2-deblur you will most likely be using should correspond to the results in "reads-hit-reference" which are the number of reads which passed the positive reference filter (which is applied after the negative).
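
As a rough illustration of reading the stats table, the sketch below computes what fraction of a sample's raw reads end up in the final table, taking "reads-hit-reference" as the final count as described above. The column names come from the stats file; the numbers and the helper name `retained_fraction` are made up for the example and do not come from any real run.

```python
def retained_fraction(stats_row):
    """Fraction of raw reads that passed the positive reference filter."""
    raw = stats_row["reads-raw"]
    if raw == 0:
        return 0.0
    return stats_row["reads-hit-reference"] / raw


# Hypothetical values for one sample (illustration only).
example_row = {
    "reads-raw": 10000,
    "reads-derep": 9000,
    "reads-deblur": 7000,
    "reads-hit-reference": 6500,
}

fraction = retained_fraction(example_row)
print(f"{fraction:.2%} of raw reads retained")
```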

The "artifact" and "reference" columns refer to the negative and positive filters, respectively. The negative filtering database is generally composed of adapters and the PhiX genome. The positive filter is generally composed of the target amplicon type.

"derep" indicates stats about the dereplication stage of the algorithm, and those two columns describe the number of unique reads observed after dereplication ("unique-reads-derep") and the number of total reads left after dereplication and filtering based off --p-min-size ("reads-derep").

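The two "derep" columns can be illustrated with a minimal sketch. This is plain Python and not Deblur's actual implementation; `derep_stats` is a hypothetical helper, `min_size=2` mirrors the default behaviour of omitting singletons, and counting the unique sequences after the minimum-size filter is an assumption.

```python
from collections import Counter


def derep_stats(reads, min_size=2):
    """Collapse identical reads, then drop sequences seen fewer than min_size times."""
    counts = Counter(reads)
    kept = {seq: n for seq, n in counts.items() if n >= min_size}
    return {
        "reads-raw": len(reads),
        # Assumption: unique sequences are counted after the min-size filter.
        "unique-reads-derep": len(kept),
        "reads-derep": sum(kept.values()),
    }


# Toy sample: two sequences seen more than once, plus one singleton.
example_reads = ["ACGT"] * 3 + ["TTGA"] * 2 + ["GGCC"]
stats = derep_stats(example_reads)
print(stats)
```
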
When you have a second, can you provide the exact commands used? The column "filtered-by-min-length" does not appear in the deblur stats; is this in reference to q2-quality-filter? If so, that plugin will truncate sequences using a sliding window over the PHRED scores when a minimum quality is observed, and after the truncation a minimum length filter is applied (the defaults are based off of Bokulich et al 2013).

Best,
Daniel

MMC_northS · April 18, 2018, 7:43am

Hi @wasade

My command line was the following:

qiime deblur denoise-other \
  --i-demultiplexed-seqs 18S_16Smock_demux.qza \
  --i-reference-seqs silva128_all.qza \
  --p-trim-length 100 \
  --p-sample-stats \
  --p-jobs-to-start 2 \
  --o-table 18S_16Smock_demux_Deblur100-table.qza \
  --o-representative-sequences 18S_16Smock_demux_Deblur100-seqs.qza \
  --o-stats 18S_16Smock_demux_Deblur100-stats.qza

I tried the same command with --p-trim-length values such as 105, 110, 120, 130 and 150, but one of my problems is that my fragment length varies by species over a range of about 90-186, with the majority at 130. That is why I would really prefer not to trim, but the option -1 gives me an error because it detects different sizes in my sequences.

Thank you very much for your explanation; it is very useful for interpreting my results and for deciding which method to use in QIIME2, or whether to continue with QIIME1 for my sequences for the moment.
Thanks,
MMC

wasade · April 18, 2018, 5:49pm

Hi @MMC_northS,

Happy to be able to help!

Deblur is only defined over sequences that are of an identical length within a sample. If variable length is necessary, then I'm not sure if Deblur will work for you. However, note that there is a lot of literature that shows short fragments can be highly useful (such as Yatsunenko et al Nature 2012, which used 90nt fragments), and samples with different sequence lengths can lead to a technical bias (Debelius et al Genome Biology 2016). I do want to caution that the utility of short fragments can depend on the variable region, and in particular whether the region affords phylogenetic and taxonomic differentiation near the forward primer.
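
To make the length constraint concrete, here is a small sketch of fixed-length trimming as I understand it: reads shorter than the trim length are discarded and longer reads are cut down to it, so every surviving read has the same length. This is plain Python rather than Deblur's code, `trim_to_length` is a hypothetical helper, and the toy reads simply span the 90-186 range mentioned above.

```python
def trim_to_length(reads, trim_length):
    """Keep reads at least trim_length long, truncated to exactly trim_length."""
    kept = [seq[:trim_length] for seq in reads if len(seq) >= trim_length]
    dropped = len(reads) - len(kept)
    return kept, dropped


# Toy reads of 90, 110, 130 and 186 nt (illustration only).
example_reads = ["A" * 90, "C" * 110, "G" * 130, "T" * 186]
kept, dropped = trim_to_length(example_reads, 120)
print(len(kept), "reads kept,", dropped, "reads dropped")
```

With a variable-length amplicon, the choice of trim length is therefore a trade-off: a longer value keeps more sequence per read but discards every read that falls short of it.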

Best,
Daniel

MMC_northS · April 19, 2018, 6:58am

Yes, that is right: the information depends on each DNA fragment that you analyze. For my fragment the best length was 120 bp, no less and no more, but I still do not get the same perfect results that I got with QIIME1. That is probably because of my fragment and because my sequences come from a PGM. For the moment Deblur is closer to reality (I use mock communities), but not close enough yet.
I will continue trying in my next experiments.
Thank you for your help!! I appreciate it so much!
MMC

wasade · April 19, 2018, 4:12pm

Glad to be able to help! Just as a heads up, the error model for Deblur is derived from Illumina platforms and may not be well suited for non-Illumina platforms.

Best,
Daniel
