extract merged reads containing primer sites

splaisan · November 27, 2020, 2:08pm

I would like to extract the merged reads (from paired-end Illumina shotgun sequencing) that contain the priming sites for the two 16S primers 341F and 805R.

I thought of using the command that produces a reference subset for classifier training, but the data type of my reads is a problem.

Here is what I tried:

1. Merge the paired reads using bbmap to get longer sequences encompassing the two 16S primer sites (V3-4).
2. Import the reads as merged reads:

qiime tools import \
  --input-path reads/merged_manifest.csv \
  --output-path merged_demux.qza \
  --type 'SampleData[JoinedSequencesWithQuality]' \
  --input-format SingleEndFastqManifestPhred33

3. Feed the qza to extraction:

qiime feature-classifier extract-reads \
  --i-sequences merged_demux.qza \
  --p-f-primer "CCTACGGGNGGCWGCAG" \
  --p-r-primer "GACTACHVGGGTATCTAATCC" \
  --p-n-jobs 24 \
  --p-read-orientation 'forward' \
  --o-reads 314f-805r-merged_demux-seq.qza

I get:

There was a problem with the command:
(1/1) Invalid value for '--i-sequences': Expected an artifact of at least
type FeatureData[Sequence]. An artifact of type
SampleData[JoinedSequencesWithQuality] was provided.

Any idea how I could achieve my goal using qiime?
Should I import my reads using another format? (can I even have qualities here?)

jwdebelius · November 29, 2020, 12:30am

Hi @splaisan,

The feature-classifier plugin expects reads without quality scores (FeatureData[Sequence]), which is why your SampleData artifact is rejected. You might try trimming the paired-end reads with cutadapt before joining them. As a disclaimer, I'm not sure this will work, but it's worth a try on at least a subset.
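A minimal sketch of what the cutadapt step could look like, assuming the unmerged reads were imported as SampleData[PairedEndSequencesWithQuality]. The wrapper function name and the .qza file names are hypothetical placeholders; the primers are the ones from the question. With shotgun data the primer sites can sit anywhere in a read and in either orientation, so this may discard most pairs.

```shell
# Hypothetical wrapper around q2-cutadapt; the file names passed in are placeholders.
# $1 = paired-end demultiplexed artifact, $2 = output artifact with trimmed reads
trim_primers_paired() {
  local demux_qza="$1" trimmed_qza="$2"
  # --p-front-f / --p-front-r look for the primers in the forward / reverse reads;
  # --p-discard-untrimmed drops reads in which no primer was found.
  qiime cutadapt trim-paired \
    --i-demultiplexed-sequences "$demux_qza" \
    --p-front-f CCTACGGGNGGCWGCAG \
    --p-front-r GACTACHVGGGTATCTAATCC \
    --p-discard-untrimmed \
    --o-trimmed-sequences "$trimmed_qza"
}

# Example call on a small subset (placeholder file names):
# trim_primers_paired paired_demux_subset.qza paired_trimmed_subset.qza
```

The trimmed pairs would then still need to be joined in a separate step.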

Best,
Justine
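For readers who only need the subset of merged reads that carry both primer sites, a rough sketch outside QIIME is to filter the merged FASTQ directly before importing it. This is not a QIIME feature; the function names are made up for illustration. It assumes uncompressed four-line FASTQ records, uppercase bases, exact matches (no mismatches allowed), and reads in the forward orientation only, as in the extract-reads attempt above.

```shell
# Turn an IUPAC primer into a regular expression (e.g. N -> [ACGT], W -> [AT]).
iupac_to_regex() {
  printf '%s' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Reverse-complement a primer, keeping IUPAC codes consistent.
revcomp() {
  printf '%s\n' "$1" | tr 'ACGTRYSWKMBDHVN' 'TGCAYRSWMKVHDBN' \
    | awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }'
}

# Print the FASTQ records whose sequence contains the forward primer followed,
# further downstream, by the reverse complement of the reverse primer.
# $1 = forward primer, $2 = reverse primer, $3 = merged FASTQ file
filter_fastq_by_primers() {
  local fwd_re rev_re
  fwd_re=$(iupac_to_regex "$1")
  rev_re=$(iupac_to_regex "$(revcomp "$2")")
  awk -v pat="${fwd_re}.*${rev_re}" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { bases = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 { if (bases ~ pat) print header "\n" bases "\n" plus "\n" $0 }
  ' "$3"
}

# Example call (placeholder file names):
# filter_fastq_by_primers CCTACGGGNGGCWGCAG GACTACHVGGGTATCTAATCC merged.fastq > with_primers.fastq
```

Because shotgun reads can come from either strand, running the filter a second time with the two primers swapped would be needed to catch reads in the reverse orientation.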
