Alternative methods of read-joining in QIIME 2

I have followed the "Alternative methods of read-joining in QIIME 2" tutorial. The issue is that my .qzv file after joining looks strange compared to the tutorial's example output, which makes me think something is wrong.

Could you please have a look at it and help me decide which trimming settings I should use for the following denoising step with Deblur? I have no clear idea where I should trim, so I am stuck at this step.

When I experimentally ran Deblur as follows:
qiime deblur denoise-16S \
    --i-demultiplexed-seqs demux-joined-filtered.qza \
    --p-trim-length 250 \
    --p-sample-stats \
    --o-representative-sequences rep-seqs.qza \
    --o-table table.qza \
    --o-stats deblur-stats.qza

I got this table.qzv

How can I judge whether my final result here makes sense or not?

Thanks in advance.

Hi @Sabrin, welcome to :qiime2:!

It'd help to see the quality score visualization of the imported unmerged reads, in addition to the deblur visualizations. Can you share the QZVs for these?



Hi @SoilRotifer
Thank you so much for your response, now I have hope to move on :slightly_smiling_face:
demux.qzv (309.9 KB)
demux-joined.qzv (294.9 KB)
demux-joined-filter-stats.qzv (1.2 MB)
rep-seqs.qzv (447.3 KB)
table.qzv (460.2 KB)


Hi @Sabrin,

Thank you for sharing these. I think the quality plots in demux.qzv look fine, as do those in demux-joined.qzv. The jump in quality you see at positions ~150-300 is due to the identical base calls observed from each read in the region of overlap. That is, you gain confidence in a base call when the two reads agree on a given base. When they do, the quality score gets a boost, which is why it is higher than in the rest of the merged read.

The quality scores on either side of the overlap region (the region with the elevated scores) belong to the portions of the individual reads that are not in the overlap, i.e. the forward read (left of the overlap region) and the reverse read (right of the overlap region).
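To make the boost concrete, here is a minimal sketch under a simplified model. The assumptions are mine and not necessarily what the read joiner does internally: errors in the two reads are independent, and a wrong call is equally likely to be any of the other three bases. The function name `merged_q` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: posterior Phred score for a position where both reads call the SAME base.
# Assumptions (not from the joiner's documentation): independent errors,
# and a miscall is uniformly one of the 3 other bases.
merged_q() {
    awk -v q1="$1" -v q2="$2" 'BEGIN {
        p1 = 10 ^ (-q1 / 10)   # error probability of the forward base call
        p2 = 10 ^ (-q2 / 10)   # error probability of the reverse base call
        # P(both reads wrong in the same way) / P(both reads agree)
        p = (p1 * p2 / 3) / (1 - p1 - p2 + 4 * p1 * p2 / 3)
        printf "%.1f\n", -10 * log(p) / log(10)
    }'
}

merged_q 30 30   # two Q30 calls that agree
merged_q 30 15   # a good call supported by a weaker one
```

In both cases the combined score comes out higher than either input score, which is the elevated plateau you see in the overlap. Real joiners may cap or compute the reported score differently, so I would not expect the plot to match these numbers exactly.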

I think this is good to go!

I am not sure why you lose so many reads for samples 28mg1 and 28mg2 after denoising. I suspect the loss has more to do with many "off-target" reads, that is, non-microbial or host DNA sequences. Sometimes I can do a better job of keeping eukaryotic sequences if I use SILVA as the reference. I do this just to confirm the off-targets, which I'd remove later. It's not perfect, but it can help narrow down why I am losing reads.

:candy: Bonus tip, you can use SILVA for deblur! :candy:

Download one of the SILVA reference sequence files from here, and use the command below:

qiime deblur denoise-other \
    --i-demultiplexed-seqs demux-joined.qza \
    --i-reference-seqs silva-138-99-seqs.qza \
    --p-trim-length 250 \
    --p-jobs-to-start 8 \
    --o-table deblur-pe-table.qza \
    --o-representative-sequences deblur-pe-repseqs.qza \
    --o-stats deblur-pe-stats.qza

What exactly is this command for, please? I am confused by it. So with this command I am filtering again, but at the same time supplying the reference sequences from SILVA, which confuses me with regard to the assign-taxonomy step? And why trim at 250?

Should I download a full-length reference [Silva 138 SSURef NR99 full-length sequences]?
I am actually stuck now at assigning taxonomy, as I sequenced the V3-V4 region and the pre-trained classifier in the Moving Pictures tutorial is for the V4 region! Please advise how to proceed.

No. The reference used by Deblur is simply there to identify and remove anything that does not appear to be similar to your intended amplicon target (e.g. 16S rRNA). That is, it is intended only to remove spurious artifacts. deblur denoise-16S uses the Greengenes reference database by default, and this cannot be changed. This means any sequence that does not match bacterial or archaeal 16S rRNA gene sequences is discarded. Some 16S rRNA gene primers can amplify 12S and 18S rRNA, or other off-targets, so these would be mostly removed from your data.

If you are using "universal" primers that are intended to amplify any small subunit (i.e. both the 16S and 18S rRNA gene sequences), or you simply want to keep these off-targets for further investigation, then you'd want to use deblur denoise-other, as this will allow you to retain the 18S rRNA sequences when using SILVA. The denoise-other approach lets you use any reference sequences for the basic filtering step I mentioned above, e.g. different 16S rRNA reference databases (SILVA, RDP, GTDB) or other marker genes (ITS, CO1, etc.).

As for the trim length of 250, I simply used the value from the example you provided in your original post.
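On what the trim length does: as far as I understand Deblur's trimming step, every read is cut to the `--p-trim-length` value and reads shorter than that are dropped. The sketch below only imitates that idea on toy sequences and does not call Deblur; the `trim_reads` helper and the sequences are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of how a fixed trim length acts on joined reads of different lengths.
# This imitates the idea only; it is not Deblur itself.
trim_reads() {
    # $1 = trim length; sequences are read from stdin, one per line
    awk -v len="$1" 'length($0) >= len { print substr($0, 1, len) }'
}

# Three toy "joined reads" of length 10, 6 and 12, trimmed to 8:
# the 6-base read is dropped, the other two are cut to 8 bases.
printf '%s\n' ACGTACGTAC ACGTAC ACGTACGTACGT | trim_reads 8
```

So a value much longer than your typical joined read would discard most of your data, while a very short value throws away sequence you could have kept. Comparing the length summary in demux-joined.qzv against the value you pick is probably the easiest check.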

You can make your own 16S rRNA V3V4 region classifier using RESCRIPt. If you do not want to make the classifier yourself you can search the forum for others that have already made a V3V4 classifier. Many are willing to share. :slight_smile:

4 off-topic replies have been split into a new topic: Training V3V4 classifier w/ rescript

Please keep replies on-topic in the future.

An off-topic reply has been merged into an existing topic: Training V3V4 classifier w/ rescript
