q2-cutadapt output

Hi,

Firstly, thank you to the moderators for always responding so promptly and being so helpful with queries.

My question is in relation to the q2-cutadapt output (attached). I find that about 50% of reads from all my samples are removed after this step. If I understand correctly, this is because the primers were not detected in these removed reads. I work on 2 x 300bp (Illumina MiSeq V3) 16S reads. Is this ‘normal’ for Illumina reads or am I losing too much?

The parameters I use are:

qiime cutadapt trim-paired
–i-demultiplexed-sequences # filepath of demultiplexed artifact
–p-cores 16 \
–p-front-f Forward primer sequence
–p-front-r Reverse primer sequence
–p-overlap 10
–p-error-rate 0.2
–o-trimmed-sequences filepath/primer-trimmed-demux.qza \
–verbose

Please find attached the demux.qzv file and the output report.

Thank you and I look forward to your response!

all-demux.qzv (290.3 KB)
primer-trimming.zip (21.7 KB)

I haven’t done too much with cutadapt yet, but I’ll do some research and see if I can point you in the right direction, @jessica.song.

Hi @andrewsanchez,

Any help would be very much appreciated :slight_smile: Thanks again for taking the time to respond.

I have recently also trimmed the primers off a cDNA dataset using the same parameters and primer sets and yielded very similar results (~50% reads without primers detected and removed). I wonder if it is an issue with the parameters I have chosen or if it could be the quality of my dataset (both of which were Illumina MiSeq V3 2x300bp reads). Without a proper reference, it’s hard to tell what is acceptable. What concerns me is that after both adapter removal and trim/filter steps, I am left with only 30 - 40% of the total reads I started with. I understand that there is no golden rule for all of this but am I losing too many sequences here?

Looking forward to hearing what you think. Thanks!

Jessica

Jessica

1 Like

One simple thing you can try is to go with the default maximum allowed error rate (--p-error-rate) instead of increasing it, as you have done. And if you haven’t done this already, I would also suggest taking a deeper look at cutadapt’s documentation regarding the minimum overlap parameter to make sure you’re choosing a number there that makes sense.

Let’s see how that goes. Not sure how big of a difference this will make. There is some more relevant discussion here as well:

I have actually tried running the adapter trimming also on default parameters which if I am not mistakened sets the max. error rate at 0.1 and minimum overlap at 3 nt which appears to make a marginal difference (see below: attached).

trimmed-output-report(default).zip (21.1 KB)

I, of course, have also tried other combinations of parameters but found that I always yielded a similar result. I have also tried primer trimming just using dada2. However, having executed both pipelines and comparing them, it seems that the reads trimmed with dada2 are comprised of much more ‘Unidentified Bacteria’ in downstream taxonomic analysis as opposed to those trimmed with cutadapt (44% vs. 0.01% of total communities, respectively).

As I am still very new to all of this, I am not equipped to make any sort of educated guess as to why this might be the case. I wonder if perhaps you could shed some light on this.

I am not quite sure how to proceed with my data or if it is okay to do so having lost so many sequences. Could the quality of my sequences be the root of my problem?

I realize that I pose quite the open question and apologize for this. As you have probably already gathered, I need a lot of help in this department but I thank you for sharing your experience and taking the time!

1 Like

Hello @jessica.song,
Sorry for the late response on our part!
Andrew is out of the office this week, so he has been un able to answer your questions. He will be back soon.

1 Like

Hi @jessica.song,

Thank you for your patience! :pray:

Are you able to provide the SampleData[PairedEndSequencesWithQuality] artifact output by q2-cutadapt? I’d like to take a look at the provenance for cutadapt’s output if possible.

After looking at your logs again, I’m curious if disallowing indels is appropriate in your case (https://cutadapt.readthedocs.io/en/stable/guide.html#error-tolerance). Same question for discarding reads in which no adapter was found.

I’m also wondering if you should specify an anchored 5’ adapter as seems to be suggested in the cutadapt docs (https://cutadapt.readthedocs.io/en/stable/guide.html#anchored-5-adapters).

There are also some more fine grained parameters exposed for paired end data which may be worth taking a look at (https://cutadapt.readthedocs.io/en/stable/guide.html#filtering-paired-end-reads).

Unless your adapter may also occur in a degraded form, you probably want to use an anchored 5’ adapter.

This is certainly a possibility. Sometimes it is possible to recover lost data, but it usually takes a deeper understanding of your data and the tools like cutadapt. It seems to me that there is simply no way around digging deep into cutadapt’s documentation to ensure your cutadapt command accurately reflects your adapter layout.

I’m happy to try and assist further on this after you get back me about where you are at now, since it has been awhile. Looking forward to hearing back. Also, take a look at this forum search for related topics.

1 Like