Which are the correct parameters to set for this run?

Hello qiime users,

I apologize if any information is missing, wrong, or confusing.
I am using qiime2-2021.4 on Linux.
I am processing COI data from a 2x250 Illumina MiSeq run. The samples were never multiplexed, so after importing I went straight to primer removal with cutadapt. The run used the Leray (2013) primers, which should give an amplicon of 313 bp.
Below is my sequence quality profile and the parameters that seem to retain the most sequences at each step, but they also reduce the sequence length to 234 bp.

Do you have any suggestions for the DADA2 filtering/merging/denoising step that could fix the problem?
Can you see any problem with the sequencing run or library preparation? Is it necessary to rerun those samples?
I used these commands:
qiime cutadapt trim-paired --i-demultiplexed-sequences paired-end-COI.qza --p-cores 8 --p-front-f GGWACWGGWTGAACWGTWTAYCCYCC --p-front-r TAAACTTCAGGGTGACCAAAAAATCA --o-trimmed-sequences COI-trimmed.qza --verbose

qiime dada2 denoise-paired --p-n-threads 8 --i-demultiplexed-seqs COI-trimmed.qza --p-trunc-len-f 205 --p-trunc-len-r 179 --o-table table-dada.qza --o-representative-sequences rep-seqs-dada.qza --o-denoising-stats denoising-stats-dada.qza --verbose


The last sample (T) is a mock community of known species, and it never yields any merged sequences.

I’ve tried adjusting the parameters (varying the truncation lengths, increasing --p-max-ee-f and --p-max-ee-r up to 15, trimming off 36 bp).

Are these data good enough to process?
Could you suggest something to improve my results? Is this the best I can expect from my data?

Thank you in advance for your reply. I hope this will be useful for all qiime2 users.

denoising-stats-dada.qzv (1.2 MB) table-dada.qzv (521.8 KB) rep-seqs-dada.qzv (343.8 KB)

Hi @Stefano,

Your cutadapt command needs to be updated to include --p-discard-untrimmed and --p-match-adapter-wildcards; otherwise you'll end up with mixed-length output. See:

After you've fixed and rerun the cutadapt command, if you still have low-quality base calls at the beginning of the reads, I'd suggest adding the --p-trim-left-f and --p-trim-left-r flags to the DADA2 command. It is quite baffling that the quality scores are so low at the beginning, but this might partly be an artifact of the cutadapt output. :man_shrugging:
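
When adjusting trimming and truncation, it can help to check that the truncated reads are still long enough to overlap. Here is a rough sketch of that arithmetic, assuming the 313 bp figure is the amplicon length after primer removal and using the minimum overlap of 12 nt that DADA2's mergePairs defaults to. Both numbers are assumptions to verify for your own data:

```shell
# Assumption: 313 nt is the amplicon length once primers are removed.
amplicon_len=313
trunc_f=205      # --p-trunc-len-f from the original command
trunc_r=179      # --p-trunc-len-r from the original command
min_overlap=12   # assumed default minimum overlap for merging in DADA2

# Bases shared by the forward and reverse reads after truncation.
overlap=$(( trunc_f + trunc_r - amplicon_len ))
echo "expected overlap: ${overlap} nt"

if [ "$overlap" -ge "$min_overlap" ]; then
    echo "truncated reads should still overlap enough to merge"
else
    echo "truncation is too aggressive for the reads to merge"
fi
```

If the expected overlap comes out comfortably above the minimum and reads still fail to merge, the truncation lengths alone are probably not the cause.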


Hi Mike, thank you for your message!
I have tried to update the command as you suggested:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences paired-end-COI.qza \
  --p-cores 8 \
  --p-front-f GGWACWGGWTGAACWGTWTAYCCYCC \
  --p-front-r TAAACTTCAGGGTGACCAAAAAATCA \
  --p-match-adapter-wildcards \
  --p-discard-untrimmed \
  --o-trimmed-sequences COI-trimmed.qza

But the results are not encouraging :frowning:

What do you think? Thank you again!

That is very strange. Can you share the QZV files for both the initial dada2 output you showed above and the recent cutadapt output? You can DM me privately if you do not want to share publicly.

-Mike

Thank you for sharing the QZV files @Stefano! I do not see anything out of the ordinary in the provenance. Looking through your rep-seqs-dada.qzv file, I noticed a lot of length variation in the sequences. I also ran BLAST on a few random sequences via the visualizer, and many of them are not "hitting" anything very well. When they do, they are short fragment hits to the same reference. I think one of the best hits I've seen is an Annelid.

I am wondering if much of this is contamination, or a bad run? These data contain a bit more off-target sequence than I've seen with other COI data.

One sanity check you can do is to forgo cutadapt (on the off-chance that it is causing problems) and just use DADA2's trim options to trim the region where the primer would reside. :thinking:
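
For that check, the number of bases to trim would presumably be the primer lengths. This is a minimal sketch, assuming every read starts exactly at the first primer base (no spacers or leading bases), which has not been confirmed for this run:

```shell
fwd_primer="GGWACWGGWTGAACWGTWTAYCCYCC"   # --p-front-f from the cutadapt command
rev_primer="TAAACTTCAGGGTGACCAAAAAATCA"   # --p-front-r from the cutadapt command

# Fixed-length trimming only works if the primer sits at position 1 of each read.
trim_f=${#fwd_primer}
trim_r=${#rev_primer}

echo "--p-trim-left-f ${trim_f} --p-trim-left-r ${trim_r}"
```

Both primers are 26 nt long, so these would be the values to pass to qiime dada2 denoise-paired when running it on the untrimmed paired-end-COI.qza. The truncation lengths would need to be re-checked against the quality plot of the untrimmed reads.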

Perhaps @devonorourke may have some suggestions?

Quite the mystery here @SoilRotifer and @Stefano :face_with_raised_eyebrow:

Generally, I've found it easier to first try running Cutadapt in its native form (outside of the bundled QIIME commands), just to make sure that when I'm reading the Cutadapt documentation, I know exactly what to expect. Because it's so quick, I ultimately end up doing the work with QIIME, but I tend to test it outside of QIIME first.
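
An even cruder check that needs no Cutadapt at all is to count how many reads begin with the forward primer. The sketch below expands the IUPAC codes into a regular expression and matches it exactly at the read start. The helper names and the toy reads are made up for illustration, and unlike Cutadapt this tolerates no mismatches, so treat the count as a lower bound:

```shell
# Expand IUPAC ambiguity codes into a regular expression (exact matching only).
iupac_to_regex() {
    printf '%s\n' "$1" | sed \
        -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
        -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
        -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Read FASTQ on stdin (4 lines per record) and count reads starting with the pattern.
count_primer_starts() {
    awk 'NR % 4 == 2' | grep -c -E "^$1" || true
}

fwd_regex=$(iupac_to_regex "GGWACWGGWTGAACWGTWTAYCCYCC")

# Toy FASTQ (made-up reads): two start with the primer, one does not.
toy_fastq=$(printf '%s\n' \
    '@read1' 'GGTACTGGATGAACAGTATACCCTCCACGT' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIII' \
    '@read2' 'GGAACAGGTTGAACTGTTTATCCCCCTTTT' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIII' \
    '@read3' 'ACGTACGTACGTACGTACGTACGTACGTAC' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIII')

n_with_primer=$(printf '%s\n' "$toy_fastq" | count_primer_starts "$fwd_regex")
echo "reads starting with the forward primer: ${n_with_primer}"
```

On real data, the decompressed R1 file would be piped into count_primer_starts in place of the toy reads. If only a small fraction of reads start with the primer, that would explain why --p-discard-untrimmed leaves so little behind.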

Because you have a mock community, I'd start there. How many different taxa are expected? What happens if you take the untrimmed R1 data for the mock samples and try dereplicating? If you look at the most prevalent sequences, are they generally from the taxa you expected? You might even try a faster alternative such as Mash Screen (see docs here too), especially if you have a database of known sequences.
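
The dereplication step can be roughed out with standard command-line tools. This is only a sketch: it assumes a standard four-line-per-record FASTQ, it counts exact duplicates only, and the function name and toy reads are made up for illustration:

```shell
# Read FASTQ on stdin and print unique sequences with counts, most abundant first.
derep_counts() {
    awk 'NR % 4 == 2' | sort | uniq -c | sort -rn
}

# Toy FASTQ (made-up reads) standing in for the mock community R1 file.
toy_fastq=$(printf '%s\n' \
    '@read1' 'ACGTACGT' '+' 'IIIIIIII' \
    '@read2' 'ACGTACGT' '+' 'IIIIIIII' \
    '@read3' 'TTGCAAGG' '+' 'IIIIIIII' \
    '@read4' 'ACGTACGT' '+' 'IIIIIIII')

toy_counts=$(printf '%s\n' "$toy_fastq" | derep_counts)
printf '%s\n' "$toy_counts"
```

On real data, the decompressed R1 file for the mock sample would be piped into derep_counts, and the top handful of sequences could then be BLASTed to see whether they belong to the expected species.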

Admittedly, that's not an ideal approach, and it doesn't address your question about adapter trimming. However, if you can at least start by confirming that you have the kinds of taxa that you expect (and not a lot of sequences from things you know shouldn't be there), it might help guide your next step.

Looking forward to hearing how this turns out.
