Removing non-biological sequences from raw reads

Parix · February 5, 2020, 10:27am

Hi,

This is a very simple question. I have received some reads, and I am not sure whether they still contain primers, adapters, or any other non-biological sequences. I want to first check whether these sequences are present, and then remove them before downstream analysis. Is there a straightforward way to do this in QIIME 2?

timanix · February 5, 2020, 10:42am

Hi!
In case nobody provides a better answer: I checked the dataset after importing, before the primer removal step, with this command:

qiime demux summarize \
    --i-data demux-paired-end.qza \
    --o-visualization demux-paired-end.qzv

then opened it in a browser and searched for partial sequences of possible primers / adaptors manually.

If you find primers, you can remove them with cutadapt:
https://docs.qiime2.org/2019.10/plugins/available/cutadapt/
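
For reference, a minimal sketch of what the removal step could look like for paired-end data. The wrapper function name, the file names, and the primer sequences (the commonly used 515F/806R V4 pair) are placeholders chosen for illustration, not something from this dataset; substitute the primers that were actually used.

```shell
# Placeholder primers (assumed here: 515F / 806R for the 16S V4 region).
FWD_PRIMER="GTGYCAGCMGCCGCGGTAA"
REV_PRIMER="GGACTACNVGGGTWTCTAAT"

# Hypothetical helper wrapping q2-cutadapt for paired-end reads.
# $1 = demultiplexed input artifact, $2 = trimmed output artifact
trim_primers() {
    qiime cutadapt trim-paired \
        --i-demultiplexed-sequences "$1" \
        --p-front-f "$FWD_PRIMER" \
        --p-front-r "$REV_PRIMER" \
        --o-trimmed-sequences "$2"
}
```

It would be called as `trim_primers demux-paired-end.qza demux-trimmed.qza`, and the output can then be summarized with `qiime demux summarize` in the same way as the input.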

Parix · February 5, 2020, 3:39pm

Hi and thanks for your answer.
I was wondering how you searched for primers/adapters manually?

timanix · February 5, 2020, 3:44pm

I knew the region, so I just took commonly used primers and searched the .qzv file, opened in a browser, for partial sequences of those primers. In my case I confirmed that the primers had already been removed and proceeded with the analysis.

Parix · February 5, 2020, 4:10pm

But how do you see the sequences in demux.qzv?
Because all I see is quality scores.

timanix · February 5, 2020, 4:20pm

Sorry, I think I copied the wrong command. I have already deleted this part of the analysis from my laptop, and I can’t access my work machine this week.
You probably need this one:

qiime feature-table tabulate-seqs \
    --i-data rep-seqs.qza \
    --o-visualization rep-seqs.qzv

I remember that I was able to run it on my .qza file before DADA2 in an older version of QIIME 2.

Nicholas_Bokulich · February 5, 2020, 7:37pm

The easiest one-step procedure may just be to run q2-cutadapt without looking. If primers or adapters are present, you will see a reduction in read length. If not, then there should be no effect!

To check for them yourself first, you would need to export the sequences and look at them directly.
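
One way to see the reduction in read length is to compare a simple length summary of the FASTQ files exported before and after trimming. The sketch below is only an illustration: the function name and the directory names in the usage comments are made up, and it assumes standard four-line FASTQ records.

```shell
# Print the read count and mean read length for FASTQ text on stdin.
# Sequence lines are lines 2, 6, 10, ... of a four-line-per-record FASTQ.
fastq_length_summary() {
    awk 'NR % 4 == 2 { n++; total += length($0) }
         END {
             if (n > 0) {
                 printf "%d reads, mean length %.1f\n", n, total / n
             } else {
                 print "0 reads"
             }
         }'
}

# Usage (hypothetical export directories):
#   gzip -dc exported-before/*_R1_001.fastq.gz | fastq_length_summary
#   gzip -dc exported-after/*_R1_001.fastq.gz  | fastq_length_summary
```

If the mean length drops by roughly the primer length after trimming, primers were present; if it is unchanged, nothing was trimmed.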

timanix:

qiime feature-table tabulate-seqs \
    --i-data rep-seqs.qza \
    --o-visualization rep-seqs.qzv

I remember that I was able to run it with my .qza file before DADA2 in older version of Qiime2

tabulate-seqs can't help you here, since it requires a FeatureData[Sequence] artifact as input. You can only examine the fastq sequences by exporting them from QIIME 2. Nothing wrong with that... just export and then search for your primer sequences. One reason to just use q2-cutadapt for this is that it will perform a search for degenerate primers so that you don't need to.
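
If you do want to search by hand, here is a rough sketch of the export-and-search approach that also handles degenerate primers by expanding IUPAC codes into a regular expression. The helper names are made up for this example, the primer in the usage comment is just a placeholder (515F), and the search matches the primer anywhere in the read and on the given strand only.

```shell
# Translate an IUPAC-degenerate primer into a grep -E pattern,
# e.g. Y becomes [CT]. Plain A/C/G/T are left untouched.
iupac_to_regex() {
    printf '%s\n' "$1" | sed \
        -e 's/R/[AG]/g'  -e 's/Y/[CT]/g'  -e 's/S/[CG]/g' \
        -e 's/W/[AT]/g'  -e 's/K/[GT]/g'  -e 's/M/[AC]/g' \
        -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' \
        -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Count reads containing the primer. Reads FASTQ text on stdin and
# looks only at the sequence line of each four-line record.
count_primer_hits() {
    pattern=$(iupac_to_regex "$1")
    awk 'NR % 4 == 2' | grep -E -c -- "$pattern" || true
}

# Usage (output directory name is a placeholder):
#   qiime tools export \
#       --input-path demux-paired-end.qza \
#       --output-path exported-demux
#   gzip -dc exported-demux/*_R1_001.fastq.gz | count_primer_hits GTGYCAGCMGCCGCGGTAA
```

A count close to the total number of reads suggests the primer is still attached; a count near zero suggests it was already removed (or that the wrong primer or strand was searched).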

ben · February 5, 2020, 7:58pm

Also, a question: do you know how these reads were processed in general? Were they sequenced on Illumina, and which V-region was sequenced?
