Dear Qiime team,
I am trying to merge paired-end reads but only few of my reads actually merge and I don’t know if it is due to my script or is it because my reads are too short? My problem is similar to this post ('joining-2x150-bp-v4-paired-end-reads-from-iseq100') but I also have question about the trimming process.
I have used 515f and 806r primers for the V4 region. I have 2x150 reads
Qiime 2020.2 installed on Virtual box
Here is what I am doing
qiime tools import
--type 'SampleData[PairedEndSequencesWithQuality]'
--input-path metadata.csv
--output-path paired-end-demux.qza
--input-format PairedEndFastqManifestPhred33V2
#to reduce computing time I have subsampled 10% of my sequences
qiime demux subsample-paired
--i-sequences paired-end-demux.qza
--p-fraction 0.10
--o-subsampled-sequences sub-paired-demux.qza
--verbose
sub-paired-demusub-paired-demux.qzv (291.6 KB) x.qzv
qiime cutadapt trim-paired
--i-demultiplexed-sequences sub-paired-demux.qza
--p-front-f GTGCCAGCMGCCGCGGTA
--p-front-r GACTACHVGGGTATCTAATCC
--o-trimmed-sequences trim-sub-paired-demux.qza
--verbose
(only the last samples, all present similar results)
=== Summary ===
Total read pairs processed: 93,436
Read 1 with adapter: 91,279 (97.7%)
Read 2 with adapter: 90,113 (96.4%)
Pairs that were too short: 0 (0.0%)
Pairs written (passing filters): 93,436 (100.0%)
Total basepairs processed: 28,030,800 bp
Read 1: 14,015,400 bp
Read 2: 14,015,400 bp
Total written (filtered): 23,960,704 bp (85.5%)
Read 1: 12,100,395 bp
Read 2: 11,860,309 bp
=== First read: Adapter 1 ===
Sequence: GTGCCAGCMGCCGCGGTA; Type: regular 5'; Length: 18; Trimmed: 91279 times; Reverse-complemented: 0 times
No. of allowed errors:
0-9 bp: 0; 10-18 bp: 1
Overview of removed sequences
length count expect max.err error counts
3 42 1459.9 0 42
10 1 0.1 1 0 1
11 31 0.0 1 30 1
14 4 0.0 1 2 2
15 16 0.0 1 12 4
16 16 0.0 1 5 11
17 197 0.0 1 168 29
18 304 0.0 1 241 63
19 1250 0.0 1 902 348
20 6432 0.0 1 3587 2845
21 79589 0.0 1 77967 1622
22 3332 0.0 1 18 3314
23 1 0.0 1 0 1
25 2 0.0 1 2
56 1 0.0 1 1
120 2 0.0 1 2
121 7 0.0 1 7
122 5 0.0 1 3 2
123 10 0.0 1 10
124 16 0.0 1 16
141 4 0.0 1 4
142 1 0.0 1 1
146 6 0.0 1 6
147 10 0.0 1 10
=== Second read: Adapter 2 ===
Sequence: GACTACHVGGGTATCTAATCC; Type: regular 5'; Length: 21; Trimmed: 90113 times; Reverse-complemented: 0 times
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2
Overview of removed sequences
length count expect max.err error counts
3 47 1459.9 0 47
4 2 365.0 0 2
12 1 0.0 1 1
15 1 0.0 1 1
16 4 0.0 1 2 2
17 1 0.0 1 0 1
18 6 0.0 1 1 5
19 14 0.0 1 3 9 2
20 111 0.0 2 24 61 26
21 214 0.0 2 42 81 91
22 763 0.0 2 139 270 354
23 3960 0.0 2 579 1873 1508
24 84848 0.0 2 31532 50828 2488
25 139 0.0 2 19 49 71
26 1 0.0 2 0 0 1
27 1 0.0 2 0 0 1
trim-sub-paired-demux.qzv (296.8 KB)
At this step it seems to me that the forward and reverse reads are correctly trimmed of the primers, is it right ? However the quality-score plot warns me that the minimal sequence length for the forward reads is 3bp… is there something wrong?
For the merging, I use a minimal overlap length of 7 because the length of my reads minus my primers barely overlap
qiime vsearch join-pairs
--i-demultiplexed-seqs trim-sub-paired-demux.qza
--p-minovlen 7
--p-maxdiffs 25
--o-joined-sequences joined-paired-demux.qza
--verbose
(for the last sample)
Merging reads 100%
93436 Pairs
217 Merged (0.2%)
93219 Not merged (99.8%)
Pairs that failed merging due to various reasons:
56626 too few kmers found on same diagonal
9 potential tandem repeat
36535 alignment score too low, or score drop to high
49 staggered read pairs
Statistics of all reads:
128.22 Mean read length
Statistics of merged reads:
239.70 Mean fragment length
36.02 Standard deviation of fragment length
0.32 Mean expected error in forward sequences
0.45 Mean expected error in reverse sequences
0.50 Mean expected error in merged sequences
0.15 Mean observed errors in merged region of forward sequences
0.28 Mean observed errors in merged region of reverse sequences
0.43 Mean observed errors in merged region
What means “too few kmers found on same diagonal” and “alignment score too low, or score drop to high” ?
Beside, using dada2 after the trimming also resulted in only few merged reads
Thanks a lot for your help