taxonomic classification with two marker genes in one fasta file

Alfred.Burian · February 9, 2021, 6:51am

I am analysing soil samples for two different marker genes (bacteria and fungi) and my sequencing service provider has some unusual technical procedures: Instead of running bacteria and fungi samples separately, they combined - for each sample - fungi and bacteria first-stage PCR products and then used both together for Illumina library preparation.

In a nutshell, I ended up with FASTA files that contain two different marker genes. Does anybody have tips on how I can deal with that during my taxonomic classification (using two databases, SILVA and UNITE)?

Would it work to first extract reference sequences from both SILVA and UNITE for the respective primers, then merge the reference sequences (using merge-seqs?) and the reference taxonomy files (any idea how?), and then train the classifier on the joint files?
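To make the merging idea concrete, here is a minimal sketch in plain Python of combining two reference sets (sequences plus taxonomy) while checking that the IDs do not collide. It is only an illustration: the toy records, IDs, and taxonomy strings are invented, the helper names (`read_fasta`, `read_taxonomy`, `merge_references`) are hypothetical, and it assumes a two-column, tab-separated taxonomy file with a `Feature ID` header. For QIIME 2 artifacts, the feature-table plugin's merge-seqs and merge-taxa actions are probably the more natural route. Whether a jointly trained classifier performs well is a separate question that this sketch does not address.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {sequence_id: sequence}."""
    records = {}
    seq_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            seq_id = line[1:].split()[0]
            records[seq_id] = ""
        elif seq_id is not None:
            records[seq_id] += line
    return records


def read_taxonomy(text):
    """Parse tab-separated 'ID<TAB>taxon' text into {sequence_id: taxon}."""
    taxonomy = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("Feature ID"):
            continue  # skip blank lines and the (assumed) header row
        seq_id, taxon = line.split("\t")[:2]
        taxonomy[seq_id] = taxon
    return taxonomy


def merge_references(seqs_a, tax_a, seqs_b, tax_b):
    """Merge two reference sets, refusing to merge if any ID is shared."""
    shared = set(seqs_a) & set(seqs_b)
    if shared:
        raise ValueError(f"IDs present in both references: {sorted(shared)}")
    merged_seqs = {**seqs_a, **seqs_b}
    merged_tax = {**tax_a, **tax_b}
    missing = set(merged_seqs) - set(merged_tax)
    if missing:
        raise ValueError(f"Sequences without taxonomy: {sorted(missing)}")
    return merged_seqs, merged_tax


# Toy stand-ins for the extracted SILVA and UNITE references (invented data).
silva_fasta = ">silva_1\nACGTACGT\nACGT\n>silva_2\nTTGGCCAA\n"
silva_tax = (
    "Feature ID\tTaxon\n"
    "silva_1\td__Bacteria; p__Proteobacteria\n"
    "silva_2\td__Bacteria; p__Firmicutes\n"
)
unite_fasta = ">unite_1\nGGGTTTAA\n"
unite_tax = "Feature ID\tTaxon\nunite_1\tk__Fungi;p__Ascomycota\n"

merged_seqs, merged_tax = merge_references(
    read_fasta(silva_fasta), read_taxonomy(silva_tax),
    read_fasta(unite_fasta), read_taxonomy(unite_tax),
)

for seq_id in merged_seqs:
    print(seq_id, merged_tax[seq_id], merged_seqs[seq_id], sep="\t")
```

The ID-collision check matters because a duplicated ID would silently overwrite one record with the other when the two dictionaries are combined.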

Thanks for any help!!

jwdebelius · February 9, 2021, 3:55pm

Hi @Alfred.Burian,

I don't have a good database recommendation, but you might look at using cutadapt to split the sequences by primer, using the --p-discard-untrimmed parameter so that reads without the given primer are dropped. (Note that this only works if the sequencing center hasn't already trimmed your primer sequences, but it's worth trying.)
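The idea behind splitting by primer can be sketched in a few lines of plain Python. This is not cutadapt and not how cutadapt is implemented: it only checks whether a read starts with a (possibly degenerate) primer, allowing a fixed number of mismatches and no indels. The primers shown (the commonly used 515F and ITS1F forward primers) are placeholders, since the thread does not say which primers were used, and the reads and the function names `starts_with_primer` and `split_by_primer` are invented for illustration.

```python
# IUPAC nucleotide codes mapped to the bases each code can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(read, primer, max_mismatches=1):
    """Return True if the read begins with the primer, within the mismatch limit."""
    if len(read) < len(primer):
        return False
    mismatches = sum(
        base not in IUPAC[code]
        for base, code in zip(read.upper(), primer.upper())
    )
    return mismatches <= max_mismatches


def split_by_primer(reads, primers, max_mismatches=1):
    """Assign each (id, sequence) read to the first matching primer and trim it."""
    bins = {name: [] for name in primers}
    bins["unassigned"] = []
    for read_id, seq in reads:
        for name, primer in primers.items():
            if starts_with_primer(seq, primer, max_mismatches):
                # Keep the read with the primer removed from its 5' end.
                bins[name].append((read_id, seq[len(primer):]))
                break
        else:
            # No primer matched: the analogue of an "untrimmed" read.
            bins["unassigned"].append((read_id, seq))
    return bins


# Placeholder forward primers (assumption: substitute the ones actually used).
primers = {
    "16S": "GTGYCAGCMGCCGCGGTAA",
    "ITS": "CTTGGTCATTTAGAGGAAGTAA",
}

# Invented toy reads: one per marker gene and one matching neither primer.
reads = [
    ("r1", "GTGCCAGCAGCCGCGGTAA" + "TACGGAGGG"),
    ("r2", "CTTGGTCATTTAGAGGAAGTAA" + "AAGTCGTAAC"),
    ("r3", "ACGTACGTACGTACGTACGTACGTACGT"),
]

bins = split_by_primer(reads, primers)
for name, assigned in bins.items():
    print(name, [read_id for read_id, _ in assigned])
```

With cutadapt itself, the equivalent would be one run per primer pair with untrimmed reads discarded, so that each run keeps only the reads belonging to one marker gene.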

One example of that discussion is here:

You might also try searching for cutadapt on the forum for more discussion.

Best,
Justine
