Qiime2 NCBI adapters / primers filter

Marco · July 28, 2025, 9:00pm

Hello to all,

I searched for this topic and couldn't find it, so I hope it is not a duplicate.

I have the following question:

I'm downloading microbiome data from NCBI (SRR accessions). I understand (from the Moving Pictures tutorial) that NCBI data from Illumina sequencing includes adapters and primers in the downloaded reads.

I understand that fastp removes those adapters and primers, but I would like to know whether QIIME 2 also does this when running DADA2, whether I need another QIIME 2 command for it, or whether it is not supported.

Thanks =)

colinbrislawn · July 29, 2025, 4:33pm

Hello Marco,

Great question!

Yes, sequencing reads often include non-biological sequence, so removing it is a good idea.
You can use the QIIME 2 cutadapt plugin for this.
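
As a minimal sketch of what such a call might look like, the Python snippet below only assembles the argument list for `qiime cutadapt trim-paired` and prints it. The file names and the 515F/806R (V4) primer sequences are assumptions for illustration; substitute the primers actually used for your dataset.

```python
import shlex


def build_trim_command(demux_qza, primer_f, primer_r, trimmed_qza,
                       discard_untrimmed=False):
    """Assemble a `qiime cutadapt trim-paired` call as an argument list."""
    cmd = [
        "qiime", "cutadapt", "trim-paired",
        "--i-demultiplexed-sequences", demux_qza,
        "--p-front-f", primer_f,   # 5' primer expected in the forward reads
        "--p-front-r", primer_r,   # 5' primer expected in the reverse reads
        "--o-trimmed-sequences", trimmed_qza,
    ]
    if discard_untrimmed:
        # Drops reads in which no primer was found. Leave this off if the
        # reads may not contain primers at all.
        cmd.append("--p-discard-untrimmed")
    return cmd


# Hypothetical file names; 515F/806R primers are assumed, not taken from any dataset.
cmd = build_trim_command(
    "demux.qza", "GTGYCAGCMGCCGCGGTAA", "GGACTACNVGGGTWTCTAAT", "trimmed.qza"
)
print(shlex.join(cmd))
```

To actually run it, you could pass `cmd` to `subprocess.run(cmd, check=True)` from an environment where QIIME 2 is installed.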

Here is the twist: amplicon reads often include only the hypervariable region, with no adapters or primers at all!

How is this possible?

Well, we can reuse the PCR primers as sequencing primers, as popularized by the EMP protocol. In this case, the sequence matches the PCR product perfectly and we can run DADA2 directly on raw reads.

Marco · July 29, 2025, 5:24pm

Thanks!!

I read about cutadapt, but if I understood correctly, it needs the user to enter the exact sequence to cut. What if I don't have that information?

Is there a way to automatically detect those sequences and cut them?

Or should that information be specified in NCBI?

Thanks again =D

colinvwood · July 29, 2025, 5:51pm

Hello @Marco,

There are tools, such as fastp and FastQC, that can detect such adapters for you. They're usually called "overrepresented sequences" in the report. Neither of these tools is available through QIIME 2 at the moment, but there are plans to add fastp into a plugin in the next release.
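
If you only want a quick look before running either tool, a crude stand-in is to count how often each read start occurs. This is not how fastp or FastQC work internally; the function name, the prefix length `k`, and the toy reads below are made up for illustration.

```python
from collections import Counter


def top_prefixes(reads, k=8, n=3):
    """Return the n most common k-base read starts and their share of reads."""
    counts = Counter(r[:k] for r in reads if len(r) >= k)
    total = sum(counts.values())
    return [(prefix, count / total) for prefix, count in counts.most_common(n)]


# Toy reads (invented): three begin with the same leading bases, one does not.
reads = [
    "GTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCG",
    "GTGCCAGCCGCCGCGGTAATACGTAGGTGGCAAGCG",
    "GTGCCAGCAGCCGCGGTAATACGAAGGGGGCTAGCG",
    "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGC",
]
for prefix, share in top_prefixes(reads):
    print(prefix, round(share, 2))
```

A prefix shared by most reads suggests a primer or other constant sequence at the 5' end. A short `k` is used here because degenerate primer positions split longer prefixes into several variants.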

Marco · July 29, 2025, 6:17pm

Hello @colinvwood,

Perfect.
In the meantime I will add fastp to my pipeline, and I'll be waiting for the next release to test it as a QIIME 2 plugin =D

Thank you for all this information.

SoilRotifer · July 29, 2025, 7:18pm

Hi @Marco,

I'd like to add that all you need is the PCR primer sequence. You do not need the adapter or other sequences, because much of the adapter sequence lies upstream of the PCR primer. Thus, if you find and remove the primer with cutadapt, any sequence before that primer (i.e. towards the 5' end) will be removed along with it.
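
To make that concrete, here is a minimal Python sketch of the idea only: find the primer, then keep what follows it. It uses exact matching with IUPAC degenerate bases, whereas cutadapt itself also tolerates mismatches, so treat this as an illustration rather than a replacement. The upstream bases and the amplicon fragment in the example read are invented.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a primer with degenerate bases into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_front(read, primer):
    """Drop the first primer match and everything 5' of it; None if no match."""
    match = primer_regex(primer).search(read)
    return read[match.end():] if match else None


upstream = "ACGTACGTAC"            # invented stand-in for adapter-derived bases
primer_site = "GTGCCAGCAGCCGCGGTAA"  # one concrete instance of the degenerate primer
amplicon = "TACGGAGGGTGCAAGCG"     # invented biological sequence
trimmed = trim_front(upstream + primer_site + amplicon, "GTGCCAGCMGCCGCGGTAA")
print(trimmed)
```

Only the primer sequence is supplied, yet the upstream bases disappear as well, because everything up to the end of the primer match is discarded.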

Marco · July 29, 2025, 10:16pm

Hi @SoilRotifer,

That sounds great, but I don't see that information in NCBI. For example, the following link is a microbiome BioProject. If I go to the metadata section, I don't see the primer sequence or anything related to it.

https://www.ncbi.nlm.nih.gov/Traces/study/?query_key=1&WebEnv=MCID_68894065bdae8f995c1806d9&o=acc_s%3Aa

That's my actual issue. =D

SoilRotifer · July 29, 2025, 10:54pm

I am assuming there is a manuscript associated with this data, and it should detail the primers used. I'll just assume it is likely V4 or V3V4. Also, depending on the approach they used, there may or may not be primer sequence contained within the read.
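
Once you have a candidate primer from the manuscript, one rough way to check whether it is actually present in the reads is to measure how many reads begin with it. The sketch below is only an assumption-laden illustration: the function names, the one-mismatch tolerance, and the toy reads are mine, and the primer shown is the original V4 515F as a guess.

```python
# IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(read, primer, max_mismatches=1):
    """True if the read begins with the primer, allowing a few mismatches."""
    if len(read) < len(primer):
        return False
    mismatches = sum(base not in IUPAC[code] for base, code in zip(read, primer))
    return mismatches <= max_mismatches


def primer_fraction(reads, primer, max_mismatches=1):
    """Fraction of reads whose 5' end matches the primer."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(starts_with_primer(r, primer, max_mismatches) for r in reads)
    return hits / len(reads)


primer = "GTGCCAGCMGCCGCGGTAA"  # assumed V4 forward primer (515F)
with_primer = [
    "GTGCCAGCAGCCGCGGTAATACGG",
    "GTGCCAGCCGCCGCGGTAATACGT",
    "GTGCCAGCAGCCGCGGTTATACGA",  # one mismatch inside the primer region
]
without_primer = ["TACGGAGGGTGCAAGCGTTAATCGG", "TACGTAGGTGGCAAGCGTTGTCCGG"]
print(primer_fraction(with_primer, primer))
print(primer_fraction(without_primer, primer))
```

A fraction near 1 suggests the primers are still in the reads and should be trimmed; a fraction near 0 suggests they were never sequenced or that a different primer was used.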

Check out these threads / posts:

Trimming and joining short reads resulted in few merged read on v4 region - #2 by SoilRotifer
Issues with Classifiers in QIIME 2 - Unusual Assignments Over 99% - #10 by SoilRotifer

-Mike

Marco · July 31, 2025, 3:53pm

Hello @SoilRotifer

Thanks for the response. I'll check the links.
