Qiime2 NCBI adapters / primers filter

Hello to all,

I searched for this topic and couldn't find it, so I hope this is not a duplicate.

I have the following question:

I'm downloading microbiome data from NCBI (SRR). I understand (from the Moving Pictures tutorial) that NCBI data that come from Illumina sequencing have adapters and primers included in the downloaded reads.

When using fastp, I understand that those adapters and primers are removed, but I would like to know whether QIIME 2 also does this when running DADA2, whether I need another QIIME 2 command for it, or whether it is not supported.

Thanks =)

Hello Marco,

Great question!

Yes, sequencing reads often include non-biological sequence, so removing it is a good idea.
You can use the QIIME 2 cutadapt plugin for this.
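As a rough sketch of what that call could look like, here is a small Python helper that builds a `qiime cutadapt trim-paired` command. The file names are placeholders, and the primers are the EMP 515F/806R pair, which is only an assumption here; substitute the primers your study actually used. It also assumes the reads are paired-end and already imported as a demultiplexed artifact.

```python
import shlex

def build_trim_command(demux_qza, trimmed_qza, forward_primer, reverse_primer):
    """Build the argument list for `qiime cutadapt trim-paired`.

    All file names and primers passed in are the caller's choice;
    nothing here is specific to any one dataset.
    """
    return [
        "qiime", "cutadapt", "trim-paired",
        "--i-demultiplexed-sequences", demux_qza,
        "--p-front-f", forward_primer,   # 5' primer on forward reads
        "--p-front-r", reverse_primer,   # 5' primer on reverse reads
        "--p-discard-untrimmed",         # drop read pairs where no primer was found
        "--o-trimmed-sequences", trimmed_qza,
    ]

# Placeholder file names; EMP 515F/806R primers are an assumption.
cmd = build_trim_command(
    "demux.qza",
    "trimmed.qza",
    "GTGYCAGCMGCCGCGGTAA",
    "GGACTACNVGGGTWTCTAAT",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal with QIIME 2 activated, or run from Python with `subprocess.run(cmd, check=True)`.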

Here is the twist: amplicon reads often include only the hypervariable region, with no adapters or primers at all!

How is this possible? :thinking:

Well, we can reuse the PCR primers as sequencing primers, as popularized by the EMP protocol. In this case, sequencing starts right after the primer, so the reads contain only the amplified region and we can run DADA2 directly on the raw reads.

Thanks!!

I read about cutadapt, but, if I understood correctly, it needs the user to enter the exact sequence to cut. What if I don't have that information?

Is there a way to automatically detect those sequences and cut them?

Or should that information be specified in NCBI?

Thanks again =D

Hello @Marco,

There are tools, such as fastp and FastQC, that can detect such adapters for you. They're usually called "overrepresented sequences" in the report. Neither of these tools is available through QIIME 2 at the moment, but there are plans to add fastp into a plugin in the next release.
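To give a feel for what "overrepresented" means, here is a toy Python sketch that counts the first bases of each read and reports the most common prefixes. This is not how fastp or FastQC work internally, just a simplified illustration; the reads are made up, and the function name `common_prefixes` is hypothetical.

```python
from collections import Counter

def common_prefixes(reads, length=20, top=3):
    """Return the most common read prefixes with the fraction of reads sharing each."""
    counts = Counter(r[:length] for r in reads if len(r) >= length)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(seq, n / total) for seq, n in counts.most_common(top)]

# Made-up reads for illustration only.
reads = [
    "GTGCCAGCAGCCGCGGTAATACGGAGGGTGC",
    "GTGCCAGCAGCCGCGGTAATACGTAGGTGGC",
    "GTGTCAGCCGCCGCGGTAATACGAAGGGGGC",
    "GTGCCAGCAGCCGCGGTAATACGGAGGATCC",
]

for seq, frac in common_prefixes(reads, length=19):
    print(seq, f"{frac:.0%}")
```

If nearly all reads share one or a few very similar prefixes, that is a hint that a primer is still present at the 5' end. Degenerate primers show up as several closely related prefixes rather than a single one.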

Hello @colinwood,

Perfect..
In the meantime I will add fastp to my pipeline, and I'll be waiting for the next release to test it as a QIIME 2 plugin =D

Thank you for all this information.

Hi @Marco,

I'd like to add that all you need is the PCR primer sequence. You do not need the adapter or other sequences. This is because much of the adapter sequence lies upstream of the PCR primer sequence. Thus, if you find and remove the primer with cutadapt, any sequence before that primer (i.e. towards the 5' end) will be removed along with it.
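Here is a minimal Python sketch of that idea, not cutadapt's actual implementation: it finds the first exact match of a (possibly degenerate) primer and keeps only what follows it. The read and the leading adapter-like bases are made up, the 515F primer is an assumption, and unlike cutadapt this toy version tolerates no mismatches.

```python
import re

# IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))

def trim_front(read, primer):
    """Return the read after the first primer match, or None if the primer is absent."""
    match = primer_regex(primer).search(read)
    return read[match.end():] if match else None

FORWARD_PRIMER = "GTGYCAGCMGCCGCGGTAA"  # EMP 515F, assumed for illustration

# Made-up read: adapter-like bases, then the primer, then the amplicon.
read = "ACACTCTTTCC" + "GTGCCAGCAGCCGCGGTAA" + "TACGGAGGGTGCAAGCG"
print(trim_front(read, FORWARD_PRIMER))
```

Everything to the left of the primer goes away together with the primer itself, which is why the adapter sequence does not need to be known separately.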

Hi @SoilRotifer ,

That sounds great. The thing is that I don't see that information in NCBI. For example, the following link is a microbiome BioProject. If I go to the metadata section, I don't see the primer sequence or anything related to it.

https://www.ncbi.nlm.nih.gov/Traces/study/?query_key=1&WebEnv=MCID_68894065bdae8f995c1806d9&o=acc_s%3Aa

That's my actual issue. =D

I am assuming there is a manuscript associated with this data, and it should detail the primers used. I'll just assume it is likely V4 or V3V4. Also, depending on the approach they used, there may or may not be primer sequence contained within the read.
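One quick way to check whether the primer is in the reads at all is to count how many reads begin with it. The sketch below does this in plain Python on a tiny made-up FASTQ; the 515F primer is an assumption, the function names are hypothetical, and matching is exact (no mismatches or offsets allowed).

```python
import io

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def starts_with_primer(read, primer):
    """True if the read begins with the (possibly degenerate) primer."""
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for base, code in zip(read, primer))

def fraction_with_primer(fastq_handle, primer):
    """Fraction of reads in a 4-line-per-record FASTQ that start with the primer."""
    hits = total = 0
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # second line of each record holds the sequence
            total += 1
            hits += starts_with_primer(line.strip().upper(), primer)
    return hits / total if total else 0.0

# Two made-up records: one starts with the primer, one does not.
records = [
    "@r1", "GTGCCAGCAGCCGCGGTAATACGG", "+", "I" * 24,
    "@r2", "TACGGAGGGTGCAAGCGTTAATCG", "+", "I" * 24,
]
example = io.StringIO("\n".join(records) + "\n")
print(fraction_with_primer(example, "GTGYCAGCMGCCGCGGTAA"))
```

For a real file, something like `gzip.open("reads.fastq.gz", "rt")` could be passed in place of the `StringIO` object. A fraction near 1 suggests the primers are still in the reads; a fraction near 0 suggests they were never sequenced or were already removed, or that a different primer was used.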

Check out these threads / posts:

-Mike

Hello @SoilRotifer

Thanks for the response. I'll check the links. :slightly_smiling_face:
