You posted some time ago about interpreting Deblur-stats. I'm having the same difficulty understanding the outputs and would like to track my sequences.
@wasade (Daniel McDonald) said the order of the steps presented in the Deblur-stats is the same as the order of the pipeline in the paper. Sorry, but I don't believe so, since ...
Paper: Amir et al, 2017
Supplemental material FIG S1 says:
" The Deblur pipeline. A demultiplexed and quality filtered fasta/fastq file (or a directory of per-sample FASTA/FASTQ files) is used as the input to the pipeline. Following initial splitting to per-sample fasta files, all processing is done independently on each sample. Sequences are trimmed and dereplicated with singletons removed. Reads are then depleted from sequencing artifacts either using a set of known sequencing artifacts (such as PhiX) (negative filtering) or using a set of known 16S sequences (positive filtering). Resulting nonartifact reads are then aligned for easy indel detection. This multiple sequence alignment is then used as the input for the Deblur algorithm. Each Deblurred sample is then checked for de novo chimeras, and the resulting sOTUs from all samples are combined into a single BIOM"
Note that according to the paper, positive and negative filtering are done and THEN the Deblur algorithm is run. This is different from the order presented by Deblur-stats, which shows the positive filtering (reads-hit-reference) as the last step.
Please, can somebody tell me what the real order is? Or am I reading the wrong paper?
You are correct.
The deblur implementation in qiime2 uses the deblur workflow script (see here and here). The script runs the deblur module in a tmp dir. The deblur module then does the following:
remove artifacts using known phix sequences (negative filtering)
multiple sequence alignment
Save the resulting table as ‘all.biom’
The sequences from this biom table are aligned to the 16S reference database, and two additional biom tables are created: ‘reference-hit.biom’ (the sequences aligning to the database) and ‘reference-non-hit.biom’ (the sequences not aligning to the database).
The qiime2 plugin then uses the resulting ‘reference-hit.biom’ from the tmp dir as the output for the qiime2 deblur denoise-16S output.
So the qiime2-deblur plugin does negative filtering with phiX, then deblurring, then positive filtering with the 16S database.
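To make the order concrete, here is a toy sketch of the three stages in Python. It is not the Deblur API: every function and sequence name below is a hypothetical stand-in, exact string matching replaces real alignment, and the "deblur" step only counts reads. It just shows that the reference split happens last, on the table that comes out of deblurring.

```python
# Toy illustration only: hypothetical stand-ins, NOT Deblur's real functions.
PHIX = {"PHIX_READ"}                 # stand-in for known artifact sequences
REFERENCE_16S = {"SEQ_A", "SEQ_B"}   # stand-in for the 16S reference database


def negative_filter(reads):
    """Step 1: drop reads that match a known artifact (e.g. PhiX)."""
    return [r for r in reads if r not in PHIX]


def deblur_placeholder(reads):
    """Step 2: placeholder for alignment + deblurring.

    Here it only counts reads per sequence to build a table.
    """
    table = {}
    for r in reads:
        table[r] = table.get(r, 0) + 1
    return table


def positive_filter(table):
    """Step 3: split the deblurred table into reference hits and non-hits."""
    hit = {s: n for s, n in table.items() if s in REFERENCE_16S}
    non_hit = {s: n for s, n in table.items() if s not in REFERENCE_16S}
    return hit, non_hit


reads = ["SEQ_A", "SEQ_A", "SEQ_B", "PHIX_READ", "SEQ_X"]

all_table = deblur_placeholder(negative_filter(reads))          # like all.biom
reference_hit, reference_non_hit = positive_filter(all_table)   # the two split tables

print(all_table)
print(reference_hit)
print(reference_non_hit)
```

In this toy, the counts in the hit and non-hit tables always add up to the counts in the combined table, which is the relationship to expect between 'all.biom' and the two reference tables.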
Does this make sense? Please let me know if you have any additional questions.
In row 1, we have 2000 reads coming out of Deblur. Subtracting the 8 chimeric reads and the 38 that missed the reference gives 1954, not 1603, so 351 reads are unaccounted for. Could you please help me work this out?
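Written out as a quick check in Python, using only the row 1 numbers quoted here (the variable names are my own labels, not the actual Deblur-stats column names):

```python
# Row 1 numbers as quoted in the post; variable names are illustrative labels.
reads_out_of_deblur = 2000
chimeric = 8
missed_reference = 38
reported_final = 1603

# What the final count would be if these were the only two losses.
expected_final = reads_out_of_deblur - chimeric - missed_reference

# Reads that the two subtractions above do not explain.
unaccounted = expected_final - reported_final

print(expected_final)  # 1954
print(unaccounted)     # 351
```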
Thank you very much!