I have a question that seems to be relevant to this thread. I imported my raw data (16S V4V5, 515f/806r) using the Casava 1.8 paired-end demultiplexed fastq format, and the import seems fine. Then I tried to join the reads using the vsearch command, but I don't think they were joined. Here are the qzv files: csv18-joined.qzv (300.6 KB), csv18-pair-end.qzv (313.8 KB)
I used vsearch with default parameters:
qiime vsearch join-pairs --i-demultiplexed-seqs csv18-pair-end.qza --o-joined-sequences csv18-joined.qza
I also imported the R_1 reads as single-end seqs and ran DADA2, which is what I usually do, on both the single-end and paired-end seqs. I got about half as many features from the paired-end seqs as from the single-end seqs.
I don't know what is going on. Is this a problem with my sequencing data or some parameters I did not set right?
I did not join them when I ran DADA2. I imported the paired-end reads and the forward reads separately and ran each through the pipeline on its own. I got very different feature counts: about 220 from paired-end and 450 from single-end (forward).
I didn’t know what was going on so I was trying to join them and run Deblur yesterday.
Let’s take a closer look at your DADA2 denoising stats - I suspect your trim and truncation params when running denoise-paired are removing many features. Please share the denoising stats QZVs for both denoise-single and denoise-paired.
I agree. It isn't the worst I have seen, but a lot of reads are being discarded during the merge step. It also looks like a few of your samples are losing many reads at the chimera filtering step.
I'm pretty sure 515f/806r is V4 only (can @jwdebelius or @SoilRotifer confirm that?), not V4-V5. Can you confirm what your target region is, and what primers you used? These are important to keep in mind when choosing your DADA2 trim and trunc parameters, and they have a direct relationship to the merging step: if you pick trim and trunc params that are too aggressive, you run the risk of making the reads unmergeable. Also, q2-dada2 needs at least 12 nts of overlap in order to merge. @Mehrbod_Estaki has a few good posts floating around the forum demonstrating how to calculate the overlap; use the "Search" functionality here in the forum to find them.
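As a rough illustration of the overlap calculation mentioned above, here is a minimal Python sketch. It assumes the two truncated reads start at opposite ends of the amplicon, so the overlap is simply the two truncation lengths minus the amplicon length. The function names and the 400 nt amplicon length are hypothetical and only there for illustration; the 12 nt minimum is the figure quoted above.

```python
MIN_OVERLAP = 12  # minimum overlap q2-dada2 needs to merge, as quoted in the post


def expected_overlap(trunc_f: int, trunc_r: int, amplicon_len: int) -> int:
    """Bases shared by the truncated forward and reverse reads.

    amplicon_len must be measured the same way as the reads: include the
    primers if they are still attached to the reads.
    """
    return trunc_f + trunc_r - amplicon_len


def can_merge(trunc_f: int, trunc_r: int, amplicon_len: int,
              min_overlap: int = MIN_OVERLAP) -> bool:
    """True when the expected overlap reaches the minimum needed to merge."""
    return expected_overlap(trunc_f, trunc_r, amplicon_len) >= min_overlap


# Hypothetical 400 nt amplicon, for illustration only.
print(expected_overlap(220, 220, 400))  # 40
print(can_merge(220, 220, 400))         # True
print(can_merge(200, 205, 400))         # False: only 5 nt of overlap
```

Real amplicons vary in length between taxa, so it is probably safer to leave some margin above the minimum rather than aim for exactly 12.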
Thank you! Sorry I put the wrong primers, it was 515F and 926R.
I got demultiplexed raw data but I couldn’t remove the primers (somehow the tool they provided didn’t work), so I trimmed off 20 nts on the left side of R1 and R2. I set trunc len = 220, which should give me about 30 nts of overlap, but that may be too short. I’ll try a longer trunc length.
I think your forward reads look great @Hui_Yang! Based on the demux paired-end output, you should be able to go out further for each read.
Based on the co-ordinates of the demux qzv file, I’d try the following truncation point values (or combinations thereof) :
FW: 268 | 283
REV: 211 | 243
I’d try the 268 - 211 pair first.
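To see what those combinations would look like, here is a small Python sketch that enumerates the four forward/reverse pairs and estimates the overlap for each. The amplicon length of about 410 nt (primers still attached) is an assumption, back-calculated from the estimate earlier in the thread that 220/220 gives about 30 nts of overlap; the real length varies by taxon.

```python
from itertools import product

forward_options = [268, 283]  # FW truncation points suggested above
reverse_options = [211, 243]  # REV truncation points suggested above

# ASSUMPTION: ~410 nt amplicon including primers, implied by the
# "220/220 gives about 30 nts of overlap" estimate in this thread.
ASSUMED_AMPLICON_LEN = 410

candidates = [
    {"trunc_f": f, "trunc_r": r, "overlap": f + r - ASSUMED_AMPLICON_LEN}
    for f, r in product(forward_options, reverse_options)
]

for c in candidates:
    print(f"FW {c['trunc_f']}  REV {c['trunc_r']}  "
          f"estimated overlap {c['overlap']} nt")
```

Under that assumption every pair clears the 12 nt minimum comfortably, so the choice between them would come down to how much low-quality tail each one keeps.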
Why not try running q2-cutadapt on your demuxed data prior to dada2 / deblur? You can search the forum for many examples of running cutadapt. Then re-visualize your cutadapt output (quality plots) the same way you did your demuxed data. This will help you determine the appropriate truncation points after the primers have been removed, i.e. they’ll likely be 20-30 bp shorter. At this point there is likely no need to use the trim options.
Oh I thought trunc len was the length after the trimming. Good to know!
I tested a few params and here’s what I got:
Trim: f = r = 10, Trunc len: f = r = 220, 294 features
Trim: f = r = 10, Trunc len: f = r = 240, 254 features
Trim: f = r = 20, Trunc len: f = r = 260, 255 features
Trim: f = r = 20, Trunc len: f = r = 290, 207 features
Trim: f = r = 0, Trunc len: f = r = 280, 205 features
Trim: f = r = 20, Trunc len: f = 240 r = 220, 272 features
Seems trimming didn’t make too much difference, and trunc length shouldn’t be too long or too short. Is it ok to proceed with the setting that gives the most features?
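For comparison, the six runs above can be put side by side with a short Python sketch. Since trunc len is counted from the start of the original read rather than after trimming, each read that survives filtering keeps trunc len minus trim bases. The tuple layout and the helper name `kept_lengths` are hypothetical; the numbers are the ones reported above.

```python
# (trim_f, trim_r, trunc_f, trunc_r, features) for each run reported above
runs = [
    (10, 10, 220, 220, 294),
    (10, 10, 240, 240, 254),
    (20, 20, 260, 260, 255),
    (20, 20, 290, 290, 207),
    (0, 0, 280, 280, 205),
    (20, 20, 240, 220, 272),
]


def kept_lengths(trim_f, trim_r, trunc_f, trunc_r):
    """Read lengths left after trimming the 5' end and truncating the 3' end."""
    return trunc_f - trim_f, trunc_r - trim_r


# Order the runs from most to fewest features.
ranked = sorted(runs, key=lambda run: run[4], reverse=True)

for trim_f, trim_r, trunc_f, trunc_r, features in ranked:
    kept_f, kept_r = kept_lengths(trim_f, trim_r, trunc_f, trunc_r)
    print(f"trim {trim_f}/{trim_r}  trunc {trunc_f}/{trunc_r}  "
          f"kept {kept_f}/{kept_r} nt  features {features}")
```

This only orders the runs by feature count; it does not by itself say which setting is best, since the denoising stats (reads passing the filter, merge, and chimera steps) are not part of the table.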
to your command. The first two allow matches to IUPAC ambiguity codes (e.g. N, M, R…) while the last discards any pairs in which both primers are not found. This is why there are two drop-offs at the end of the quality plots, some are not being trimmed.