Reduced number of reads after cutadapt trimming

Hi all,

I'm relatively new to QIIME 2, so I'm sorry if this is a simple thing I could have solved myself.
I am using qiime2-2021.4 and I have imported my demultiplexed data using a manifest. So far so good.

However, when trying to remove the primers using the cutadapt plugin, I end up with far fewer reads per sample.

This is the command I've used:

qiime cutadapt trim-paired \
--i-demultiplexed-sequences demux.qza \
--p-adapter-f 'GTGYCAGCMGCCGCGGTAA' \
--p-adapter-r 'CCGYCAATTYMTTTRAGTTT' \
--o-trimmed-sequences demux_trimmed.qza \
--verbose

I attached the verbose results, and below I will paste screenshots of the read visualizations from before and after primer removal:

Before primer removal:


After primer removal


Any idea about what could have happened?
Thank you so much for your help!
Tania

verbose_cutadapt.txt (49.2 KB)

Hi @Tania_Aires,

Thank you for sharing the verbose output. This was quite helpful! :pray:

Cutadapt is doing a great job finding and removing your primers, as evidenced by the lines that look like this:

Total read pairs processed: 41,863
Read 1 with adapter: 39,783 (95.0%)
Read 2 with adapter: 40,782 (97.4%)

Your loss of sequences mainly has to do with these lines of the verbose output:

Pairs that were too short: 40,897 (97.7%)
Pairs written (passing filters): 966 (2.3%)

This is odd, as the default is --p-minimum-length 1. These appear to be the 515F-Y (5'-GTGYCAGCMGCCGCGGTAA) and 926R (5'-CCGYCAATTYMTTTRAGTTT) primers. Are you sure your sequencing approach actually sequences through the primer region? Some sequencing protocols do not, in which case there are no primer sequences to remove, and anything that is being trimmed is the result of spurious matching. Ask your sequencing facility which sequencing protocol they used. Though I could be wrong about the matches being spurious, given that most of the reads were trimmed.
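One quick sanity check you can do yourself: because these primers contain IUPAC degenerate bases (Y, M, R), a literal string search will miss most reads. The small Python helper below (purely illustrative, not part of QIIME 2 or cutadapt) expands the degeneracies into a regular expression so you can test whether the forward primer actually appears at the start of your raw reads:

```python
import re

# Map IUPAC degenerate bases to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "M": "[AC]",
         "K": "[GT]", "S": "[CG]", "W": "[AT]"}

def primer_regex(primer: str) -> "re.Pattern":
    """Compile a degenerate primer into a regex anchored at the read start."""
    return re.compile("".join(IUPAC[base] for base in primer))

fwd = primer_regex("GTGYCAGCMGCCGCGGTAA")  # 515F-Y

# A read that begins with one concrete variant of the primer:
print(bool(fwd.match("GTGTCAGCAGCCGCGGTAAACGT")))  # True: primer present
# A read that does not begin with the primer:
print(bool(fwd.match("ACGTACGTACGTACGTACGTACG")))  # False: primer absent
```

If most of your raw reads do not match at position 0, the protocol likely does not sequence through the primer region.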

It could also be that these primers are not correct and you are trimming internally into the sequence, resulting in excessive trimming and loss of data because the final sequences are too short. That is, you may essentially be trimming out a subregion (using these primers) that is contained within another primer set. :man_shrugging:

I also highly recommend that you enable --p-discard-untrimmed, as any sequence in which no primer is found will otherwise remain in your output, which is not what you want.
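Conceptually, --p-discard-untrimmed drops read pairs in which no primer match was found instead of passing them through unchanged. A minimal Python sketch of that idea (a hypothetical helper using exact string matching, not actual QIIME 2 or cutadapt code, which also tolerates mismatches):

```python
def trim_or_none(read: str, primer: str):
    """Return the read with the primer (and anything before it) removed,
    or None if the primer was not found -- i.e., 'discard untrimmed'."""
    i = read.find(primer)
    return None if i == -1 else read[i + len(primer):]

primer = "GTGACAGCAGCCGCGGTAA"  # one concrete variant of 515F-Y
reads = ["GTGACAGCAGCCGCGGTAACCCC",  # primer found -> kept and trimmed
         "TTTTTTTT"]                  # no primer -> discarded

kept = [t for r in reads if (t := trim_or_none(r, primer)) is not None]
print(kept)  # ['CCCC']
```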


Hi Mike
Thank you so much for your quick reply. Yes, these are the primers, and I think that was the sequencing protocol used here... I guess we would have been told... or not. I'll ask :slight_smile:
In the meanwhile, after asking for your help, I dug deeper into this forum and found this slightly different cutadapt command:
qiime cutadapt trim-paired \
--i-demultiplexed-sequences demux.qza \
--p-front-f GTGYCAGCMGCCGCGGTAA \
--p-front-r CCGYCAATTYMTTTRAGTTT \
--o-trimmed-sequences trimmed_removed_primers.qza \
--verbose
So, basically, I replaced --p-adapter-f and --p-adapter-r with --p-front-f and --p-front-r. Then, I did:
qiime tools extract \
--output-path trimmed_remove_primers \
--input-path trimmed_removed_primers.qza
to check the reads and see if the primers were still at the beginning of the sequences. Apparently, they were successfully removed.
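The difference between the two flag pairs explains the earlier losses: --p-adapter (cutadapt's -a) treats the primer as a 3' adapter and removes it together with everything after it, so a primer at the very start of a read leaves a zero-length read that fails the length filter; --p-front (cutadapt's -g) removes the primer and anything before it, keeping the biological sequence. A toy sketch of the two behaviors (exact string matching only; real cutadapt also allows mismatches and degenerate bases):

```python
def trim_adapter_3prime(read: str, primer: str) -> str:
    """-a / --p-adapter: the primer and everything AFTER it are removed."""
    i = read.find(primer)
    return read if i == -1 else read[:i]

def trim_front_5prime(read: str, primer: str) -> str:
    """-g / --p-front: the primer and everything BEFORE it are removed."""
    i = read.find(primer)
    return read if i == -1 else read[i + len(primer):]

primer = "GTGACAGCAGCCGCGGTAA"       # one concrete variant of 515F-Y
read = primer + "ACGTACGTACGTACGT"   # primer at the very start of the read

print(len(trim_adapter_3prime(read, primer)))  # 0  -> "pair too short"
print(len(trim_front_5prime(read, primer)))    # 16 -> biological sequence kept
```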
The reads before cutadapt:
Forward

Reverse

And after cutadapt:

Forward

Reverse

Anyway, I am again sending the verbose output from this new cutadapt command, just in case I am missing something.
I will definitely enable --p-discard-untrimmed, thank you so much for the tip.
Thanks a lot for your help
Tania
verbose_cutadapt_Final.txt (47.3 KB)

Great detective work @Tania_Aires! :100:
Even a :qiime2: veteran like me missed this!

Based on your output, these lines tell me everything worked perfectly:

Total read pairs processed: 39,482
Read 1 with adapter: 37,096 (94.0%)
Read 2 with adapter: 37,984 (96.2%)
Pairs that were too short: 0 (0.0%)
Pairs written (passing filters): 39,482 (100.0%)

No worries. I always suggest this for the following reason: you run the risk of retaining sequences that either contain spurious sequence or primers that could not be detected due to sequencing errors. This would inflate ESV counts, etc.

Keep up the good work!
Cheers!


Thanks a lot for your help!!
Just finished the analysis and everything went perfectly!
Thanks!
Tania


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.