Explanation of Deblur-stats output

pramesh_shakya · October 26, 2018, 9:30pm

I am working on fecal data from birds (forward reads only, 16S analysis) and have been going through the QIIME 2 pipeline. After the denoising step with Deblur, there is an output file, deblur-stats.qza, that records how many unique sequences were found, removed, and so on. I’m trying to make sense of this table, but there is very little information about it apart from the tooltips that appear when you hover over the column headers. I’ve also gone through the Deblur paper, but I still have a few questions:
Commands used:
qiime deblur denoise-16S \
  --i-demultiplexed-seqs 2_demux-filtered.qza \
  --p-trim-length 100 \
  --o-representative-sequences 3_rep-seqs-deblur_100.qza \
  --o-table 3_table-deblur_100.qza \
  --p-sample-stats \
  --o-stats 3_deblur-stats_100.qza

qiime deblur visualize-stats \
  --i-deblur-stats 3_deblur-stats_100.qza \
  --o-visualization 3_deblur-stats_100.qzv

1. According to the Deblur paper, there are two methods for filtering: positive (against a reference database) and negative (against PhiX). However, it looks like both filtering steps are performed, since QIIME 2 reports columns for both reads-hit-artifact and reads-hit-reference. I’m confused about the order in which these steps are taken, because it does not seem to follow the pipeline described in the paper.
2. According to the deblur-stats.qza output, the reads-hit-reference column is the total number of reads retained. Does that mean the positive filtering is done after the reads go through the Deblur algorithm and chimera removal? If so, the unique-reads-hit-reference column sums to 56,015 across samples, while the number of features is close to 9,993. I understand that unique sequences shared by different samples were merged, and that there was additional filtering with the min-size parameter set to 10. Is that the reason for such a large decrease in the number of features?
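To see why a summed per-sample column can be much larger than the feature count, here is a toy sketch with made-up sequences. It assumes each sample’s unique reference-hitting sequences are listed one per line in a file; the file names and sequences are hypothetical, not Deblur output. Adding up per-sample unique counts, as when summing unique-reads-hit-reference, counts a sequence once for every sample it appears in, whereas the feature table keeps each distinct sequence only once.

```shell
#!/bin/sh
# Hypothetical data: one file per sample, one unique sequence per line.
dir=$(mktemp -d)
printf 'AAAC\nAAAG\nAAAT\n' > "$dir/sample1.txt"
printf 'AAAC\nAAAG\n'       > "$dir/sample2.txt"
printf 'AAAC\nAAAT\nCCCA\n' > "$dir/sample3.txt"

# Summing per-sample unique counts: a sequence shared by several samples
# is counted once per sample (3 + 2 + 3).
summed=$(cat "$dir"/sample*.txt | wc -l | tr -d ' ')

# Merging across samples: each distinct sequence is counted once,
# which is what ends up as a feature before any abundance filtering.
features=$(cat "$dir"/sample*.txt | sort -u | wc -l | tr -d ' ')

echo "summed per-sample unique sequences: $summed"   # 8
echo "distinct features after merging: $features"    # 4

rm -r "$dir"
```

Any abundance filtering, such as min-size, would shrink the feature count further; it is not modeled in this sketch.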

Thank you.

wasade · November 1, 2018, 9:32pm

Dear @pramesh_shakya,

1. Both the positive and negative filters are applied when the command is run. Reads are first filtered against the negative reference and then against the positive reference.
2. Yes, that should be the case. To clarify, the positive filtering is applied after Deblur and after bimera filtering, but before the --p-min-size filter.
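A minimal sketch of per-step read accounting under the order described in this reply: negative filter, Deblur, bimera removal, positive filter, then min-size. Placing the negative filter before Deblur is an assumption of this sketch, and the step labels and read counts are invented for illustration; they are not Deblur’s column names or real output. Each step’s loss is simply the drop from the previous step’s count.

```shell
#!/bin/sh
# Hypothetical reads remaining after each step for one sample, listed in
# the order described above. Labels and counts are invented.
counts='input 10000
negative-filter 9800
deblur 9100
bimera-removal 8800
positive-filter 8500
min-size 8400'

# For every step after the first, the reads it removed are the drop
# from the previous line's count.
report=$(printf '%s\n' "$counts" | awk '
NR > 1 { printf "%s removed %d reads\n", $1, prev - $2 }
{ prev = $2 }')
printf '%s\n' "$report"
# negative-filter removed 200 reads
# deblur removed 700 reads
# bimera-removal removed 300 reads
# positive-filter removed 300 reads
# min-size removed 100 reads
```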

All the best,
Daniel

pramesh_shakya · November 2, 2018, 5:31pm

Thank you for the reply.

pramesh_shakya · November 7, 2018, 7:52pm

I’m trying to figure out how many sequences are discarded at each step of the Deblur pipeline, but I feel that the pipeline as described in the paper differs from how it actually runs. It would be really helpful if the output columns were in the order in which the steps are performed. Could you elaborate on that?

wasade · November 8, 2018, 10:22pm

I don’t believe it’s different from how it’s described in the paper. The columns of the stats table should be in the order in which the corresponding events occur within the Deblur workflow.
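Because the columns follow the workflow order, it can help to list one sample’s values top to bottom. This sketch assumes the stats have been exported to a comma-separated file with a header row; the table below is a hypothetical toy containing only a few of the columns mentioned in this thread, with invented values. Some columns count reads removed at a step (reads-hit-artifact) while others count reads retained (reads-hit-reference), so consecutive values should not simply be subtracted.

```shell
#!/bin/sh
# Hypothetical toy stats table (subset of columns, invented values).
stats='sample-id,reads-hit-artifact,unique-reads-hit-reference,reads-hit-reference
bird01,120,850,20000
bird02,95,640,15500'

# Print each column of one sample on its own line, numbered left to right,
# so the table reads top to bottom in column order.
sample=bird01
listing=$(printf '%s\n' "$stats" | awk -F, -v s="$sample" '
NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
$1 == s { for (i = 2; i <= NF; i++) printf "%d. %s = %s\n", i - 1, name[i], $i }')
printf '%s\n' "$listing"
# 1. reads-hit-artifact = 120
# 2. unique-reads-hit-reference = 850
# 3. reads-hit-reference = 20000
```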

Best,
Daniel

system · December 10, 2018, 4:59am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.