Losing a ton of reads post demultiplexing with cutadapt paired end

msport469 · April 30, 2020, 8:26pm

I am losing a ton of reads post demultiplexing my paired end reads. I have three runs I want to merge (and plan on merging feature tables post-dada2). I import the runs with:

qiime tools import \
--type MultiplexedPairedEndBarcodeInSequence \
--input-path '/scratch/msportie/rprc1/run1' \
--output-path run1import.qza

qiime cutadapt demux-paired \
  --i-seqs '/scratch/msportie/rprc1/run1import.qza'  \
  --m-forward-barcodes-file '/scratch/msportie/rprc1/run1_Mapping.tsv' \
  --m-forward-barcodes-column BarcodeSequence \
  --p-batch-size 0 \
  --o-per-sample-sequences 4.28.20.demultiplexed-seqs.run1.qza \
  --o-untrimmed-sequences 4.28.20.untrimmed.run1.qza \
  --verbose

and I get this attached file in the verbose output. myout-demux.run1.4.28.20.txt (75.2 KB)

The problem is, I'm missing a ton of reads. Here is an Excel sheet that shows what I'm missing: https://rochester.box.com/s/ai8l53ojiilsvpbceb79g3x0rdnsa32v . I've confirmed that all the samples are in the mapping file (run1_Mapping.tsv (39.8 KB) )

This is happening for all three runs I want to eventually combine. What's the deal? Thanks in advance for your help. I should note, I still keep many of the files, 77 out of the desired 187.

thermokarst · April 30, 2020, 9:03pm

Hey @msport469, I wasn't able to download the excel file you linked to - I don't have the necessary permissions to read.

Have you had a close read of the log file you attached? Most of the adapters don't appear to be matching. Now would be a good time to review the cutadapt docs. They will guide you on the specifics of how to specify the adapter labels, which you have included in your sample metadata file:

https://cutadapt.readthedocs.io/en/stable/

msport469 · May 1, 2020, 3:30am

Sorry about that, I've corrected that problem:

https://rochester.box.com/s/ai8l53ojiilsvpbceb79g3x0rdnsa32v

I'll take a look at the docs, but I'm using the cutadapt plugin within QIIME, as I've shown above. Am I missing something? Thanks.

msport469 · May 1, 2020, 1:50pm

I guess @thermokarst I'm still confused. Should qiime be able to remove these adapters with the script I provided?

thermokarst · May 1, 2020, 2:15pm

Probably - if cutadapt can, then q2-cutadapt most likely can. q2-cutadapt is just a thin wrapper around cutadapt. Take a look at the cutadapt docs to learn more about how cutadapt works, then apply that knowledge to q2-cutadapt. You probably need to experiment with the different adapter types available in cutadapt. Keep us posted.

PS - I just want to remind you about the QIIME 2 Forum Code of Conduct, in particular, this section:

https://forum.qiime2.org/faq#patience

Sending multiple messages just bogs down our moderator queue, slowing down the time in which we can respond to you, and others. Thanks!

msport469 · May 1, 2020, 2:45pm

Sorry, I was just trying to give an update that I had read through the user docs you provided and still don't quite understand.

I guess my question is more basic than that:

I have raw multiplexed sequences. The above script will demultiplex them, but if I understand you correctly, the linker primer sequences will still be on the reads and need to be removed? I see that the QIIME 2 cutadapt plugin can do this if I provide it a set sequence, but my linker primer sequences are of varying lengths and differ slightly. Do you know why this would be, or how QIIME would handle it?

Hopefully I phrased my question better this time, I apologize for any confusion I caused.

thermokarst · May 5, 2020, 2:27pm

Hi @msport469!

I haven't commented on any of the specifics of your situation, so there might be a misunderstanding here - from my interpretation of the cutadapt docs, this will vary depending on the arrangement of the adapters, and the types of adapters you tell cutadapt to search for (linked, anchored, etc).

Depending on the adapter arrangement, you might need to run the demuxed reads through cutadapt trim-paired to remove the remaining adapters, but, that really depends on how the adapters are arranged, and how you "declare" the different adapters when running q2-cutadapt/cutadapt (linked, anchored, etc).
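To make "declaring" the adapter type a bit more concrete, here is a rough sketch that only prints two variants of a trim-paired command rather than running them. It assumes the primers really are 5' adapters: a plain sequence passed to --p-front-f / --p-front-r is a regular 5' adapter that may be found anywhere in the read, while a leading ^ anchors it to the very start of the read. The primer strings and the output file name are placeholders, not values from this thread.

```shell
# Assumptions: placeholder primers and a hypothetical output name; substitute
# the real sequences from the sequencing center before running anything.
fwd_primer="FORWARD_PRIMER_SEQUENCE"
rev_primer="REVERSE_PRIMER_SEQUENCE"
demuxed="4.28.20.demultiplexed-seqs.run1.qza"
trimmed="trimmed-seqs.run1.qza"   # hypothetical output name

# Print (not run) the trim-paired command for one adapter style:
#   regular  -> 5' adapter that may occur anywhere in the read
#   anchored -> leading ^, adapter must sit at the start of the read
trim_cmd() {
  prefix=""
  if [ "$1" = "anchored" ]; then
    prefix="^"
  fi
  printf 'qiime cutadapt trim-paired --i-demultiplexed-sequences %s --p-front-f "%s%s" --p-front-r "%s%s" --o-trimmed-sequences %s --verbose\n' \
    "$demuxed" "$prefix" "$fwd_primer" "$prefix" "$rev_primer" "$trimmed"
}

trim_cmd regular
trim_cmd anchored
```

Comparing the verbose logs from the two variants on a single run should show which declaration actually matches the read layout.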

The cutadapt docs will be your best source of information, as well as ascertaining the specific adapter layout and design with your sequencing center. I'm not a cutadapt expert - I always find myself re-reading the cutadapt docs any time I am doing anything with q2-cutadapt. Good luck!

msport469 · May 5, 2020, 2:40pm

Thanks for your time,

I've trimmed everything using cutadapt trim-paired after the cutadapt demux-paired command.

Sorry, I seem to have really gotten off track from my main question. That is my fault. My main question is this: after the demuxing step, a lot of reads drop out. When I change the error rate from the default of 0.1 to 0.5 (out of 1), I keep basically all reads. Can you point me to a resource that discusses deviating from the default error rate? The cutadapt docs describe what the error rate actually means, but not what is acceptable/recommended practice.

Sorry for the confusion, and thank you!

thermokarst · May 5, 2020, 2:52pm

I don't think so - this is all related! You appear to have very low matching for some of your adapters:

So, looping back into what I just shared earlier today:

Making sure that you have specified your adapters in a way that accurately reflects the adapter layout will significantly improve the recovery. You will likely not need to adjust the error rate, once you have dialed in your adapters.
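For a sense of what raising the error rate actually permits: as I read the cutadapt docs, the number of allowed errors is the length of the match times the error rate, rounded down, which agrees with the "0-9 bp: 0; 10-19 bp: 1; 20-24 bp: 2" line in the log at the default of 0.1. A small sketch of that arithmetic for a 24 bp barcode (the helper name is made up for illustration):

```shell
# allowed_errors LENGTH RATE -> floor(LENGTH * RATE)
# Hypothetical helper; the tiny epsilon guards against floating-point
# results that land just below a whole number.
allowed_errors() {
  awk -v len="$1" -v rate="$2" 'BEGIN { print int(len * rate + 1e-9) }'
}

for rate in 0.1 0.2 0.5; do
  printf 'error rate %s: up to %s errors in a 24 bp barcode\n' \
    "$rate" "$(allowed_errors 24 "$rate")"
done
```

That works out to 2, 4, and 12 allowed errors. At 0.5, half of the barcode may mismatch, so a "match" says very little about which sample a read belongs to.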

msport469 · May 5, 2020, 5:02pm

Hi,

Yeah, I really thought I understood my adapters correctly to be both 5', and therefore I used the --p-front-f and --p-front-r options, but maybe I'm just wrong about this... I'll read again and stay in touch. Thanks.

msport469 · May 11, 2020, 6:01pm

Hi, I've looked more into it and still have no idea what's going on. Can you explain what you mean by "most adapters don't appear to be matching"? The trimming seems fine; it's the cutadapt demux command where I'm losing reads. When I increase the error rate from the default of 0.1 to 0.2, everything passes through, but then later during DADA2 basically everything gets dropped again.

Is there anyone you can point to who has expertise in the demux step of cutadapt?

thermokarst · May 11, 2020, 9:35pm

Ah bummer

Sure! Open up the log you shared in your original post:

If you review that file, you will see the cutadapt verbose log, explaining how many times your various adapters were found. There are many adapters that have very low "trimmed" counts, for example:

=== First read: Adapter D160MZ1P.05 ===

Sequence: ACTTTAAGGGTGGTGGTATGGGAG; Type: regular 5'; Length: 24; Trimmed: 8 times; Reverse-complemented: 0 times

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-24 bp: 2

Overview of removed sequences
length	count	expect	max.err	error counts
14	4	0.1	1	0 4
15	2	0.0	1	2
16	2	0.0	1	0 2

and

=== First read: Adapter D160MZ5W.05 ===

Sequence: ATGTCCGACCAATGTTGCGTTTCT; Type: regular 5'; Length: 24; Trimmed: 0 times; Reverse-complemented: 0 times

and

=== First read: Adapter K160N8FF.04 ===

Sequence: CGCGGTTACTAATTAACTGGAAGC; Type: regular 5'; Length: 24; Trimmed: 1 times; Reverse-complemented: 0 times

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-24 bp: 2

Overview of removed sequences
length	count	expect	max.err	error counts
15	1	0.0	1	0 1

Just scrolling through I see at least two dozen adapters that have < 100 reads matching, which is very, very low. Usually when I see recovery like that, it is traced back to some kind of issue with how you are telling cutadapt to search for those adapters. It can be a typo, or it can be that you aren't specifying the right kind of adapter (anchored, linked, etc).
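If scrolling through a 75 KB log is tedious, something like the following sketch can list the low-count adapters. It assumes the log keeps the layout shown in the excerpts above (an "=== First read: Adapter NAME ===" header followed by a line containing "Trimmed: N times"); the function name and the default threshold of 100 are arbitrary choices for illustration.

```shell
# Hypothetical helper: print "adapter<TAB>count" for every adapter whose
# "Trimmed: N times" count in a cutadapt verbose log is below a threshold.
# Usage: low_count_adapters LOG_FILE [THRESHOLD]   (default threshold: 100)
low_count_adapters() {
  awk -v t="${2:-100}" '
    /^=== First read: Adapter / { name = $5 }   # remember the adapter label
    /Trimmed: [0-9]+ times/ {
      if (match($0, /Trimmed: [0-9]+/)) {
        # "Trimmed: " is 9 characters; the rest of the match is the count
        n = substr($0, RSTART + 9, RLENGTH - 9) + 0
        if (name != "" && n < t) print name "\t" n
        name = ""
      }
    }
  ' "$1"
}

# Example on a small sample built from two of the excerpts above.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
=== First read: Adapter D160MZ1P.05 ===
Sequence: ACTTTAAGGGTGGTGGTATGGGAG; Type: regular 5'; Length: 24; Trimmed: 8 times; Reverse-complemented: 0 times
=== First read: Adapter D160MZ5W.05 ===
Sequence: ATGTCCGACCAATGTTGCGTTTCT; Type: regular 5'; Length: 24; Trimmed: 0 times; Reverse-complemented: 0 times
EOF
low_count_adapters "$sample_log" 100
rm -f "$sample_log"
```

Pointing it at the real log (for example myout-demux.run1.4.28.20.txt) gives a list of sample labels to take to the sequencing center.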

I would start with consulting with your sequencing center - they should be able to help you clear up any confusion about the sequencing design.
