Differences between single-end and paired-end results

Hi

I’m Gyusik.

I sequenced the 16S rRNA V3-V4 region from an air-capture sample on a MiSeq (2 × 300 bp).

I analyzed the data in two ways.

1. The paired FASTQ files were merged into single reads with FLASH, and the merged reads were denoised with DADA2 in single-end mode (denoise-single).

The commands were as follows:

```bash
# FLASH (-m: minimum overlap, -M: maximum overlap)
flash %s_1.fastq.gz %s_2.fastq.gz -m 30 -M 140
```
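For orientation, the overlap of a read pair is the sum of the two read lengths minus the length of the sequenced fragment. The sketch below assumes 2 × 300 bp reads and a few hypothetical fragment lengths around 460 bp (an assumption: the real V3-V4 amplicon length depends on the primers and varies between taxa) and checks where the expected overlap falls relative to the -m/-M values used above.

```python
# Assumed values: 2 x 300 bp MiSeq reads; fragment lengths below are hypothetical.
READ_LEN_F = 300
READ_LEN_R = 300
MIN_OVERLAP = 30   # FLASH -m used in the post
MAX_OVERLAP = 140  # FLASH -M used in the post


def expected_overlap(fragment_len: int, len_f: int = READ_LEN_F, len_r: int = READ_LEN_R) -> int:
    """Number of bases covered by both the forward and the reverse read."""
    return len_f + len_r - fragment_len


for fragment_len in (440, 460, 480):
    ov = expected_overlap(fragment_len)
    within = MIN_OVERLAP <= ov <= MAX_OVERLAP
    print(f"fragment {fragment_len} bp -> overlap {ov} bp, within -m..-M: {within}")
```

As we read the FLASH documentation, -M describes the overlap expected in most read pairs rather than a hard cutoff, so an overlap above -M does not by itself mean the pair is rejected.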

```bash
qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 13 \
  --p-trunc-len 460 \
  --p-n-threads 16 \
  --p-chimera-method 'pooled' \
  --p-max-ee 2.0 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats stats.qza \
  --verbose
```
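The q2-dada2 documentation states that reads shorter than --p-trunc-len are discarded. Because FLASH-merged reads vary in length, it may be worth checking how many merged reads fall below 460 bp. The toy sketch below mimics that length filtering on made-up reads; it is a simplification for illustration, not DADA2's code, and the function name is hypothetical.

```python
# Toy model of --p-trim-left / --p-trunc-len on single-end reads
# (a simplification of DADA2's length filtering, not its implementation).
from typing import List


def trim_and_truncate(reads: List[str], trim_left: int, trunc_len: int) -> List[str]:
    """Drop reads shorter than trunc_len, cut the rest at trunc_len,
    then remove the first trim_left bases."""
    kept = []
    for read in reads:
        if len(read) < trunc_len:
            continue  # too short: discarded, as documented for --p-trunc-len
        kept.append(read[trim_left:trunc_len])
    return kept


# Hypothetical merged reads of different lengths (FLASH output length varies).
merged = ["A" * 430, "C" * 455, "G" * 460, "T" * 470]
survivors = trim_and_truncate(merged, trim_left=13, trunc_len=460)
print(len(survivors), [len(r) for r in survivors])  # 2 [447, 447]
```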

2. The paired-end FASTQ files were imported directly and denoised with DADA2 in paired-end mode (denoise-paired).

The command was as follows:

```bash
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 13 \
  --p-trim-left-r 13 \
  --p-n-threads 16 \
  --p-chimera-method 'pooled' \
  --p-trunc-len-f 280 \
  --p-trunc-len-r 220 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats stats.qza \
  --verbose
```
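With paired-end denoising, the truncation lengths determine how much overlap is left for merging. DADA2's mergePairs uses a default minimum overlap of 12 nt with no mismatches allowed. The sketch below checks the overlap left by --p-trunc-len-f 280 and --p-trunc-len-r 220 for a few hypothetical fragment lengths (assumptions, not measured values). Trimming with --p-trim-left removes bases at the outer ends of the fragment, so it shortens the merged sequence but does not change the overlap.

```python
# Fragment lengths below are hypothetical; 12 nt is DADA2 mergePairs' default minOverlap.
DADA2_MIN_OVERLAP = 12


def overlap_after_truncation(trunc_f: int, trunc_r: int, fragment_len: int) -> int:
    """Overlap between truncated forward and reverse reads of one fragment."""
    return trunc_f + trunc_r - fragment_len


def merged_length(fragment_len: int, trim_f: int, trim_r: int) -> int:
    """Length of the merged sequence once both outer ends are trimmed."""
    return fragment_len - trim_f - trim_r


for fragment_len in (460, 480, 490):
    ov = overlap_after_truncation(280, 220, fragment_len)
    # Meeting the minimum overlap is necessary but not sufficient:
    # the overlapping bases must also agree for the pair to merge.
    enough = ov >= DADA2_MIN_OVERLAP
    print(f"fragment {fragment_len} bp: overlap {ov} bp, meets minimum: {enough}, "
          f"merged length {merged_length(fragment_len, 13, 13)} bp")
```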

The two pipelines give completely different results: the dominant bacteria in the sample, the total number of ASVs, and the number of detected features all differ.

Which method would you recommend?
And please let me know if I made any mistakes.

I have combined the taxonomy, BLAST, and count results for each method into one TSV file each and attached them:
pair.tsv single.tsv

Thanks

Hi @Gyusik_Hwang,
Welcome to the forum and happy 2020 to you!

If you want to use DADA2, use method 2. That is, you should not pre-join your reads before importing. DADA2 builds separate error models for the forward and reverse reads, uses those models to correct errors, and only then merges the corrected forward and reverse reads. By pre-joining your reads you interfere with the error-model building, so the results are unreliable.
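A toy sketch of the final merging step described above: reverse-complement the reverse read and join it to the forward read on an exact overlap of at least 12 nt (mirroring DADA2's mergePairs defaults of 12 nt minimum overlap and no mismatches). It is an illustration on a made-up sequence, not DADA2's implementation. It also shows why correcting errors first matters: a single uncorrected error in the overlap prevents an exact merge.

```python
# Toy illustration of merging a read pair on an exact overlap
# (mimics the idea behind DADA2's merge step, not its implementation).
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd: str, rev: str, min_overlap: int = 12) -> Optional[str]:
    """Return the merged sequence, or None if there is no exact overlap >= min_overlap."""
    rev_rc = reverse_complement(rev)
    # Try the longest possible overlap first, down to min_overlap.
    for ov in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        if fwd[-ov:] == rev_rc[:ov]:
            return fwd + rev_rc[ov:]
    return None


fragment = "ACGTAGCTTGACCGATGCATGGTACCAGTTCAGG"  # hypothetical 34 bp fragment
fwd = fragment[:24]                          # forward read
rev = reverse_complement(fragment[10:])      # reverse read (opposite strand)
print(merge_pair(fwd, rev) == fragment)      # True: 14 nt exact overlap

fwd_err = fwd[:-1] + "C"                     # one sequencing error in the overlap
print(merge_pair(fwd_err, rev))              # None: exact merge fails
```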
If for some reason you do need to use FASTQ files pre-joined with FLASH, use Deblur in QIIME 2 instead. Deblur operates only on single-end reads and uses a static error model, so it can handle pre-joined reads.

For V3-V4 reads, I find that DADA2 retains more reads than Deblur, because Deblur becomes more conservative as read length increases (see here for a more detailed explanation). With shorter regions such as V4, I find the two to be very comparable.
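One rough intuition for why longer reads are harder for a denoiser that relies on a fixed error profile: under a simple model of independent errors at a uniform per-base rate, the share of reads that are completely error-free shrinks exponentially with length. The 0.5% rate below is a made-up value, and this model is an illustrative assumption, not Deblur's algorithm or the explanation in the linked post.

```python
# Back-of-the-envelope only: assumes independent errors at a uniform,
# hypothetical per-base error rate. Real MiSeq error rates vary along the read.
PER_BASE_ERROR = 0.005  # hypothetical 0.5% error rate


def error_free_fraction(length: int, p: float = PER_BASE_ERROR) -> float:
    """Probability that a read of `length` bases contains no errors."""
    return (1.0 - p) ** length


for length in (150, 250, 400):
    print(f"{length} bp: {error_free_fraction(length):.2f} of reads error-free")
```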
Hope this helps!
