Thank you!
Here's what I did after reading your comment:
I read through all the links and I also had a look at my sequences.
My sequences show that the primers are still there, and from the links I understand that around 75% non-chimeric reads should be the goal. Am I right?
I thought one way to remove the primers would be to set `--p-trim-left-f 19` (the forward primer length) and `--p-trim-left-r 20` (the reverse primer length) in the DADA2 denoising step. With those settings, however, the non-chimeric percentage is only around 50%.
I couldn't get `trimLeft = c(19, 20)` from Benjamin Callahan's FAQ to work, so I don't know whether that code would give different results.
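Instead of hard-coding 19 and 20, the trim values can be derived from the primer sequences used in the cutadapt command further down. This is a minimal POSIX shell sketch: it only prints the q2-dada2 options instead of running DADA2, and it assumes every read starts exactly with the primer (a fixed-position trim cannot handle primers that begin at variable positions).

```shell
#!/bin/sh
# Primer sequences as used in the cutadapt command of this thread (IUPAC codes included).
primer_f="GTGCCAGCMGCCGCGGTAA"
primer_r="CCGYCAATTYMTTTRAGTTT"

# ${#var} is the string length, i.e. the number of bases to trim from the 5' end.
trim_left_f=${#primer_f}
trim_left_r=${#primer_r}

# Dry run: print the matching q2-dada2 options instead of executing anything.
printf '%s\n' "--p-trim-left-f ${trim_left_f} --p-trim-left-r ${trim_left_r}"
```

The lengths agree with the "Length: 19" and "Length: 20" lines in the cutadapt reports.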
I will also try cutadapt, but that will take some time.
No, I don't understand why I get "better" results when trimming 100 bp from both the forward and reverse reads. Wouldn't cutting off that much of the sequence give a worse outcome?
I also read a lot about truncation, which led me to choose 220 for the forward and 210 for the reverse reads: with an expected amplicon length of about 400 bp, that leaves around 30 bp of overlap for merging. I also tried 210 and 210, but then almost nothing merged, which suggests the overlap was not sufficient. Truncating at 210 and 240 kept longer reads, but longer reads carry a higher probability of errors and a lower probability of being error-free.
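The overlap reasoning above is simple arithmetic: overlap = trunc-len-f + trunc-len-r - amplicon length. A minimal POSIX shell sketch, assuming a fixed 400 bp amplicon measured in the same coordinates as the reads (i.e. excluding the primers if they were already removed with cutadapt) and using DADA2's default `mergePairs` minimum overlap of 12 bp:

```shell
#!/bin/sh
# Expected overlap (bp) of forward and reverse reads after truncation.
overlap() {
  echo $(( $1 + $2 - $3 ))
}

amplicon_len=400  # assumption: the ~400 bp expected length from this thread
min_overlap=12    # DADA2 mergePairs() default minOverlap

# Truncation pairs discussed above (the overlap depends only on their sum).
for pair in "220 210" "210 210" "210 240"; do
  set -- $pair    # split "F R" into $1 and $2
  ov=$(overlap "$1" "$2" "$amplicon_len")
  if [ "$ov" -ge "$min_overlap" ]; then status="enough"; else status="too short"; fi
  echo "trunc-len-f=$1 trunc-len-r=$2 -> overlap ${ov} bp (${status})"
done
```

With a strict 400 bp amplicon even 210/210 clears the 12 bp default, so if almost nothing merged at 210/210, one possible explanation is that the real amplicon is longer or variable in length; mismatches in the low-quality read ends can also block merging. Keeping a margin, as with 220/210, is the safer choice.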
So my truncation settings should be fine, right?
Again, thank you so much! Your reply was very helpful, and I will try cutadapt now.
cutadapt update:
```shell
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences 16_1_1.qza \
  --p-adapter-f GTGCCAGCMGCCGCGGTAA \
  --p-adapter-r CCGYCAATTYMTTTRAGTTT \
  --o-trimmed-sequences cutadapt_1.qza
```
Then DADA2, with settings based on the quality plot:
```shell
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs cutadapt_1.qza \
  --p-trim-left-f 5 \
  --p-trim-left-r 5 \
  --p-trunc-len-f 250 \
  --p-trunc-len-r 200 \
  --o-table table_15.qza \
  --o-representative-sequences rep_15.qza \
  --o-denoising-stats denoise_15.qza
```
Verbose output of the cutadapt step:
=== Summary ===
Total read pairs processed: 23,742
Read 1 with adapter: 22,526 (94.9%)
Read 2 with adapter: 22,510 (94.8%)
Pairs written (passing filters): 878 (3.7%)
Total basepairs processed: 14,292,684 bp
Read 1: 7,146,342 bp
Read 2: 7,146,342 bp
Total written (filtered): 517,919 bp (3.6%)
Read 1: 253,706 bp
Read 2: 264,213 bp
=== First read: Adapter 1 ===
Sequence: GTGCCAGCMGCCGCGGTAA; Type: regular 3'; Length: 19; Trimmed: 22526 times
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1
Bases preceding removed adapters:
A: 0.0%
C: 0.0%
G: 0.3%
T: 0.0%
none/other: 99.7%
Overview of removed sequences
length count expect max.err error counts
3 4 371.0 0 4
4 1 92.7 0 1
6 1 5.8 0 1
12 1 0.0 1 0 1
133 1 0.0 1 1
152 1 0.0 1 1
194 3 0.0 1 3
195 1 0.0 1 1
196 3 0.0 1 3
197 34 0.0 1 31 3
198 2 0.0 1 1 1
199 14 0.0 1 10 4
200 3 0.0 1 3
299 1 0.0 1 0 1
300 6 0.0 1 5 1
301 22450 0.0 1 18777 3673
=== Second read: Adapter 2 ===
Sequence: CCGYCAATTYMTTTRAGTTT; Type: regular 3'; Length: 20; Trimmed: 22510 times
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20 bp: 2
Bases preceding removed adapters:
A: 0.0%
C: 0.1%
G: 0.0%
T: 0.1%
none/other: 99.8%
Overview of removed sequences
length count expect max.err error counts
3 10 371.0 0 10
4 14 92.7 0 14
5 3 23.2 0 3
269 1 0.0 2 0 0 1
300 18 0.0 2 8 8 2
301 22464 0.0 2 14428 6759 1277
=== Summary ===
Total read pairs processed: 17,794
Read 1 with adapter: 17,003 (95.6%)
Read 2 with adapter: 16,956 (95.3%)
Pairs written (passing filters): 581 (3.3%)
Total basepairs processed: 10,711,988 bp
Read 1: 5,355,994 bp
Read 2: 5,355,994 bp
Total written (filtered): 342,705 bp (3.2%)
Read 1: 167,872 bp
Read 2: 174,833 bp
=== First read: Adapter 1 ===
Sequence: GTGCCAGCMGCCGCGGTAA; Type: regular 3'; Length: 19; Trimmed: 17003 times
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1
Bases preceding removed adapters:
A: 0.0%
C: 0.0%
G: 0.3%
T: 0.0%
none/other: 99.6%
Overview of removed sequences
length count expect max.err error counts
3 5 278.0 0 5
4 4 69.5 0 4
9 1 0.1 0 1
126 1 0.0 1 1
127 1 0.0 1 0 1
128 1 0.0 1 1
132 2 0.0 1 1 1
152 1 0.0 1 1
194 4 0.0 1 4
195 3 0.0 1 2 1
196 1 0.0 1 1
197 26 0.0 1 20 6
198 4 0.0 1 3 1
199 1 0.0 1 1
200 5 0.0 1 3 2
300 3 0.0 1 0 3
301 16940 0.0 1 14137 2803
=== Second read: Adapter 2 ===
Sequence: CCGYCAATTYMTTTRAGTTT; Type: regular 3'; Length: 20; Trimmed: 16956 times
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20 bp: 2
Bases preceding removed adapters:
A: 0.0%
C: 0.0%
G: 0.0%
T: 0.0%
none/other: 99.9%
Overview of removed sequences
length count expect max.err error counts
3 7 278.0 0 7
4 10 69.5 0 10
11 1 0.0 1 0 1
300 3 0.0 2 0 2 1
301 16935 0.0 2 10581 5279 1075
=== Summary ===
Total read pairs processed: 15,656
Read 1 with adapter: 14,964 (95.6%)
Read 2 with adapter: 14,894 (95.1%)
Pairs written (passing filters): 538 (3.4%)
Total basepairs processed: 9,424,912 bp
Read 1: 4,712,456 bp
Read 2: 4,712,456 bp
Total written (filtered): 318,505 bp (3.4%)
Read 1: 156,921 bp
Read 2: 161,584 bp
=== First read: Adapter 1 ===
Sequence: GTGCCAGCMGCCGCGGTAA; Type: regular 3'; Length: 19; Trimmed: 14964 times
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1
Bases preceding removed adapters:
A: 0.0%
C: 0.0%
G: 0.2%
T: 0.0%
none/other: 99.7%
Overview of removed sequences
length count expect max.err error counts
3 4 244.6 0 4
5 1 15.3 0 1
127 1 0.0 1 1
128 1 0.0 1 1
147 1 0.0 1 1
194 6 0.0 1 6
195 2 0.0 1 2
197 15 0.0 1 14 1
198 1 0.0 1 1
199 1 0.0 1 1
200 6 0.0 1 6
300 5 0.0 1 1 4
301 14920 0.0 1 12796 2124
=== Second read: Adapter 2 ===
Sequence: CCGYCAATTYMTTTRAGTTT; Type: regular 3'; Length: 20; Trimmed: 14894 times
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20 bp: 2
Bases preceding removed adapters:
A: 0.0%
C: 0.0%
G: 0.0%
T: 0.1%
none/other: 99.8%
Overview of removed sequences
length count expect max.err error counts
3 13 244.6 0 13
4 6 61.2 0 6
5 1 15.3 0 1
7 1 1.0 0 1
300 4 0.0 2 0 1 3
301 14869 0.0 2 9463 4533 873
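In each report above the adapter type is "regular 3'", the reads are 301 bp long (7,146,342 bp / 23,742 pairs in the first report), and almost every trimmed read had 301 bp removed, i.e. the whole read. A minimal, exact-match shell sketch of why that can happen: cutadapt removes a regular 3' adapter together with everything after it, so a primer sitting at the very start of a read takes the entire read with it, whereas a 5' adapter is removed together with everything before it. The primer variant and insert sequence below are hypothetical, and real cutadapt additionally handles IUPAC wildcards, mismatches and partial matches.

```shell
#!/bin/sh
# Read length implied by the first report: total bp of read 1 / read pairs.
echo "read length: $(( 7146342 / 23742 )) bp"

# Simplified model of cutadapt's regular adapter types, exact match only.
# Regular 3' adapter: drop the first occurrence of the adapter and everything after it.
trim_3prime() { printf '%s\n' "${1%%"$2"*}"; }

# Regular 5' ("front") adapter: drop everything up to and including the first occurrence.
trim_5prime() { printf '%s\n' "${1#*"$2"}"; }

# Hypothetical read starting with one concrete variant of the forward primer
# (GTGCCAGCMGCCGCGGTAA with M written as A), followed by an arbitrary insert.
primer="GTGCCAGCAGCCGCGGTAA"
read_seq="${primer}ACGTTGCAACGTTGCA"

echo "3' trimming keeps: '$(trim_3prime "$read_seq" "$primer")'"
echo "5' trimming keeps: '$(trim_5prime "$read_seq" "$primer")'"
```

If this reading is right, the q2-cutadapt options for primers at the start of the reads would be `--p-front-f` and `--p-front-r` rather than `--p-adapter-f` and `--p-adapter-r`; `--p-discard-untrimmed` can additionally drop read pairs in which no primer was found.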