How to remove bacterial seqs from archaeal seq analyses?

Hi all,

I am working on 16S seqs that were amplified with an archaea-specific primer set. However, the taxonomic analysis (with seqs prepared as described in the QIIME 2 tutorials) showed that the amplicons also include a large proportion of bacterial seqs. This bacterial contamination will get in the way of diversity analyses that are meant to cover archaea only.

Is there any way to remove the bacterial seqs from the rep-seqs and table generated in the DADA2 step, so that I can build a tree from the bacteria-free seqs and continue with the alpha and beta diversity analyses?
Thanks for your help.

You can assign taxonomy, then run qiime taxa filter-seqs (see "filter-seqs: Taxonomy-based feature sequence filter" in the QIIME 2 2021.4.0 documentation) with --p-include Archaea.

If you assign taxonomy against the full SILVA database, the classifier will include Archaea.
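Before filtering, it can help to see how many features were assigned to each domain. One way is to export the taxonomy (qiime tools export --input-path taxonomy.qza --output-path exported writes exported/taxonomy.tsv) and tally the first rank. The function below is a minimal sketch; the name count_domains is hypothetical, and it assumes SILVA-style labels such as "d__Archaea; p__..." (SILVA 138) or "D_0__Archaea;D_1__..." (SILVA 132).

```shell
# Hypothetical helper: count features per domain in an exported QIIME 2
# taxonomy.tsv (columns: Feature ID, Taxon, Confidence; line 1 is a header).
# Assumes SILVA-style labels where the first ";"-separated rank is the domain.
count_domains() {
  awk -F'\t' 'NR > 1 {
      split($2, ranks, ";")            # first rank, e.g. "d__Archaea"
      domain = ranks[1]
      gsub(/^ +| +$/, "", domain)      # trim surrounding spaces
      sub(/^(d|D_0)__/, "", domain)    # drop SILVA 138 / 132 rank prefix
      n[domain]++
    }
    END { for (d in n) print d "\t" n[d] }' "$1" | LC_ALL=C sort
}
```

Usage: count_domains exported/taxonomy.tsv prints one "domain, feature count" line per domain, including "Unassigned" if present.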


Thanks for the info.

I ran the following command for this:

qiime taxa filter-seqs \
  --i-sequences arch-seqs.qza \
  --i-taxonomy arch_taxonomy.qza \
  --p-include Archaea \
  --o-filtered-sequences seqs-no-bact.qza
which resulted in the following error:
(1/1) Invalid value for '--i-sequences': Expected an artifact of at least type FeatureData[Sequence]. An artifact of type SampleData[PairedEndSequencesWithQuality] was provided.

Could you tell me how to solve this problem?

Hi! It looks like you are running it on paired-end reads. Are you sure that arch-seqs.qza is an output from DADA2? If not, you should provide the rep-seqs.qza file that you obtained from DADA2.

I had not used rep-seqs.qza, just the imported seqs. Now I see that it works with rep-seqs.qza. Thanks for your comments.

Quick question:
why use --p-include Archaea rather than --p-exclude Bacteria if we want to remove bacterial seqs from a mix of archaeal and bacterial seqs?

Because it is shorter: most probably you would also like to remove fungi and other eukaryotes.
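The difference between the two options can be sketched on an exported taxonomy.tsv. The hypothetical select_features function below imitates what I understand to be the default "contains" matching of qiime taxa filter-seqs and filter-table: a feature matches if its annotation contains any of the comma-separated terms, ignoring case. It is an illustration under that assumption, not the QIIME 2 implementation. With these rules, --p-exclude Bacteria keeps eukaryotic and "Unassigned" features, and it can also drop archaea whose lineage contains the class Halobacteria, because "Halobacteria" contains "bacteria" as a substring.

```shell
# Hypothetical helper: print the Feature IDs an include/exclude rule keeps.
#   $1 = exported taxonomy.tsv (header line, then Feature ID<TAB>Taxon<TAB>...)
#   $2 = "include" or "exclude"
#   $3 = comma-separated search terms
# Assumption: case-insensitive substring ("contains") matching.
select_features() {
  awk -F'\t' -v mode="$2" -v terms="$3" '
    BEGIN { nt = split(tolower(terms), t, ",") }
    NR > 1 {
      hit = 0
      for (i = 1; i <= nt; i++)
        if (index(tolower($2), t[i]) > 0) { hit = 1; break }
      if ((mode == "include" && hit) || (mode == "exclude" && !hit))
        print $1
    }' "$1"
}
```

For example, select_features exported/taxonomy.tsv include Archaea lists only archaeal features, while select_features exported/taxonomy.tsv exclude Bacteria still lists eukaryotic and unassigned ones.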

Thanks for the quick response.

It sounds like my seqs may also contain fungal and other eukaryotic seqs.
Let me make sure: does "--p-include Archaea" specifically select the seqs annotated as Archaea from seqs annotated as Archaea, Bacteria, fungi and other eukaryotes? Am I understanding this correctly?

Yes, only sequences whose annotations include "Archaea" will be retained.

Thanks for your answer.

I also need to remove bacterial seqs from the table.qza generated by DADA2. For that, I found a possible command in the tutorial:

qiime taxa filter-table \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-include Archaea \
  --o-filtered-table table-archaea.qza

Do you think it would work for me?

It should. Give it a try and let us know if you run into any problems with it :+1:
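After filtering the table, it is worth checking how many archaeal reads each sample keeps, since that limits the sampling depth you can use for alpha and beta diversity. Within QIIME 2, qiime feature-table summarize on the filtered table reports this. Outside QIIME 2, a minimal sketch is below; it assumes the table was exported and converted to TSV (qiime tools export, then biom convert -i feature-table.biom -o feature-table.tsv --to-tsv) and that you have a non-empty file listing the archaeal Feature IDs, one per line. The function name sample_depths is hypothetical.

```shell
# Hypothetical helper: per-sample read totals over a chosen set of features.
#   $1 = file with one Feature ID per line (e.g. the archaeal features)
#   $2 = feature table as TSV from "biom convert --to-tsv"
#        (banner line "# Constructed from biom file", then "#OTU ID<TAB>samples...")
sample_depths() {
  awk -F'\t' '
    FILENAME == ARGV[1] { keep[$1] = 1; next }   # first file: IDs to keep
    /^# Constructed/ { next }                    # skip biom banner line
    /^#OTU ID/ { for (j = 2; j <= NF; j++) name[j] = $j; ns = NF; next }
    ($1 in keep) { for (j = 2; j <= NF; j++) tot[j] += $j }
    END { for (j = 2; j <= ns; j++) print name[j] "\t" tot[j] + 0 }
  ' "$1" "$2"
}
```

Samples that end up with very few reads here are the ones that would be dropped or would force a low sampling depth in the diversity analyses.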

Thanks for all your answers; they helped me get this done.

Let me shift to a more basic question.

Do you know why, in several samples, more of my seqs are affiliated with bacteria (13-90% of total seqs) than with archaea, even though they were amplified with archaea-specific primers and I extracted the SILVA reference reads with the primer sequences? Do you think it is a primer issue or a method problem (e.g., the SILVA-based classifier)?

Usually you would expect archaea to account for a relatively small share of both taxa and abundance. Bacteria-specific primers reduce archaeal amplicons even further, while archaea-specific primers help to amplify more archaeal reads. But the primers are probably just not specific enough to exclude bacteria and eukaryotes at the PCR step.
I am currently analyzing an archaeal dataset, and its distribution by domain is no better.
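To put numbers on the domain split in each sample, you can compute the percentage of reads per domain. Within QIIME 2, qiime taxa collapse with --p-level 1 followed by qiime feature-table relative-frequency (or a qiime taxa barplot) gives this view. The hypothetical domain_fractions function below is a plain-shell sketch of the same arithmetic, assuming an exported taxonomy.tsv with SILVA-style labels and a feature table converted to TSV with biom convert.

```shell
# Hypothetical helper: percent of reads per domain in each sample.
#   $1 = exported taxonomy.tsv (header, then Feature ID<TAB>Taxon<TAB>...)
#   $2 = feature table as TSV from "biom convert --to-tsv"
# Features absent from the taxonomy are reported as "(no taxonomy)".
domain_fractions() {
  awk -F'\t' '
    FILENAME == ARGV[1] {
      if (FNR == 1) next                         # skip taxonomy header
      split($2, r, ";"); d = r[1]                # first rank = domain
      gsub(/^ +| +$/, "", d); sub(/^(d|D_0)__/, "", d)
      dom[$1] = d; next
    }
    /^# Constructed/ { next }                    # biom banner line
    /^#OTU ID/ { for (j = 2; j <= NF; j++) name[j] = $j; ns = NF; next }
    {
      d = ($1 in dom) ? dom[$1] : "(no taxonomy)"
      seen[d] = 1
      for (j = 2; j <= NF; j++) { c[j, d] += $j; tot[j] += $j }
    }
    END {
      for (j = 2; j <= ns; j++)
        for (d in seen)
          printf "%s\t%s\t%.1f\n", name[j], d, (tot[j] > 0 ? 100 * c[j, d] / tot[j] : 0)
    }
  ' "$1" "$2" | LC_ALL=C sort
}
```

This only quantifies how much of each sample is non-archaeal; it does not by itself tell a primer-specificity problem apart from a classification problem.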

That makes sense to me. Thanks for your explanation.