I have a question about how to import my sequences. We have single fastq files that are already demultiplexed, and the adapters have been removed. What is the correct protocol to import the sequences and remove the primers when five primers were used during the PCR?
Good morning Sergio,
That’s a lot of primers! How do you plan to process all of them?
I have no idea. Do you have any suggestions?
Working with multiple marker genes is hard, which is why I usually use different primers to target different kingdoms of organisms (bacteria vs fungi), but try not to mix and match primers when working with a single group of microbes.
The most direct approach is to use closed-ref database search to match reads against known microbes. This ‘old school’ method is imperfect… but fast and easy. I would start there.
There’s a technique called Smurf for scaffolding reads using k-mer-based alignment, in which they do an iterative demultiplexing.
The pro is that it is (in theory) a really cool method and is probably better than closed-reference picking. (Insert obligatory skeptical comment about species-level resolution, known databases, and 16S rRNA sequencing in general.) In theory, you should be able to run it on any database with any set of primers you’re interested in. In practice, it only runs in MATLAB, which is proprietary, and it has proven difficult to run on anything other than their example data.
So, I think @colinbrislawn’s recommendation is probably a good one. Or, I might look at a single hypervariable region.
Since you still have the primers attached, it would be straightforward to use q2-cutadapt to trim primers and split your data by primer set all in one go, as @colinbrislawn suggested. Use qiime cutadapt trim-paired (or trim-single if your reads are not paired). See the --help documentation for more details.
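To make the "trim and split by primer set" idea concrete, here is a toy Python sketch of the logic. It is not how q2-cutadapt is implemented: it only accepts an exact match at the very start of the read (cutadapt also tolerates mismatches), and the primer names, primer sequences, and reads are made up for illustration.

```python
# Toy illustration of 5' primer trimming and per-primer binning.
# NOT the q2-cutadapt implementation; primer names and sequences are hypothetical.

# IUPAC nucleotide codes, so degenerate primer positions can match several bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches(primer, read):
    """Return True if the read starts with the primer (IUPAC codes allowed in the primer)."""
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for code, base in zip(primer, read))


def split_by_primer(reads, primers):
    """Trim the matching 5' primer from each read and bin the read by primer name.

    reads:   iterable of (read_id, sequence) tuples
    primers: dict mapping primer-set name -> primer sequence
    Reads that match no primer are kept unmodified in the "untrimmed" bin.
    """
    bins = {name: [] for name in primers}
    bins["untrimmed"] = []
    for read_id, seq in reads:
        for name, primer in primers.items():
            if primer_matches(primer, seq):
                bins[name].append((read_id, seq[len(primer):]))
                break
        else:
            # No primer matched this read.
            bins["untrimmed"].append((read_id, seq))
    return bins


# Hypothetical primers and reads, for illustration only.
primers = {"set_A": "GTGYCAGC", "set_B": "CCTACGGGN"}
reads = [
    ("r1", "GTGCCAGCAAAATTTT"),
    ("r2", "CCTACGGGAGGCAGCA"),
    ("r3", "TTTTTTTTTTTTTTTT"),
]
bins = split_by_primer(reads, primers)
```

Each bin then holds the primer-free reads for one primer set, so every region can be processed separately downstream. With five primers you would end up with five bins plus the untrimmed leftovers, which are worth inspecting before you discard them.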