Rescript filtering for uncultured/environmental sequences and taxa

Hi all,
Could you please tell me whether the Rescript plugin can remove uncultured/environmental sequences and taxonomy lineages from the database it generates?
I checked, but it seems to offer only a few filters, such as by length and by primer.
I would appreciate it if you could let me know whether such a feature exists.
Thanks

You can use the commands as outlined in the Filtering data tutorial to filter by taxonomy, etc…


Thank you very much, dear @SoilRotifer.
I did not mean any modification of DADA2 artifacts. I mean the taxonomy and sequences (the database) extracted by the Rescript plugin. I need to remove uncultured/environmental taxa/lineages and their sequences.
The filtering methods in the QIIME 2 tutorial do not work in this situation because one of their requirements is a feature table. In my case I do not need to remove anything from my feature table or my rep-seqs; I just need to remove the uncultured entries from the database obtained from Rescript.

As an example, I need to remove taxa like these, along with their sequences (with accession numbers), from my database. Please look at the photo below:

  • I have no problem with my feature table and repseq at the moment.

Thank you very much indeed in advance.

The page I linked shows you can filter sequences based on taxonomy (w/o a table) via qiime taxa filter-seqs.

You should be able to use qiime rescript filter-taxa to filter your taxonomy.
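
A minimal sketch of the sequence-filtering step, assuming the input file names that appear elsewhere in this thread and assuming that short substrings such as uncultured and environmental are enough to match the unwanted lineages under the default "contains" mode. The term list and the output file name are hypothetical. The command only runs if qiime is on the PATH; otherwise the script just reports what it would pass.

```shell
# Substrings to drop, joined by the default delimiter (a comma, no spaces).
# This term list is an assumption; adjust it to the labels in your taxonomy.
exclude_terms="uncultured,environmental"

run_filter_seqs() {
  # $1 is passed to --p-exclude as a single quoted value.
  qiime taxa filter-seqs \
    --i-sequences mcrA-Unfiltered-RefSeq.qza \
    --i-taxonomy mcrA-Unfiltered-Tax-Ref.qza \
    --p-exclude "$1" \
    --o-filtered-sequences mcrA-Filtered-RefSeq.qza  # hypothetical output name
}

if command -v qiime >/dev/null 2>&1; then
  run_filter_seqs "$exclude_terms"
else
  # Dry run when QIIME 2 is not installed.
  echo "qiime not found; would pass --p-exclude '$exclude_terms'"
fi
```

Using --p-exclude rather than --p-include matters here: include keeps the matching lineages, while exclude drops them.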


Thank you @SoilRotifer

I executed this command but got an error:
qiime taxa filter-seqs \

--i-sequences mcrA-Unfiltered-RefSeq.qza
--i-taxonomy mcrA-Unfiltered-Tax-Ref.qza
--p-include "k__Archaea; p__Archaea; c__Archaea; o__Archaea; f__Archaea; g__uncultured; s__archaeon"; "k__Archaea; p__Archaea; c__Archaea; o__Archaea; f__Archaea; g__methanogenic; s__archaeon enrichment culture"; "k__Archaea; p__Euryarchaeota; c__Euryarchaeota; o__Euryarchaeota; f__Euryarchaeota; g__uncultured; s__methanogenic archaeon
"; "k__Archaea; p__Euryarchaeota; c__Euryarchaeota; o__Euryarchaeota; f__Euryarchaeota; g__uncultured; s__methanogenic archaeon
"; "k__Archaea; p__Archaea; c__Archaea; o__Archaea; f__Archaea; g__uncultured; s__archaeon
"
--o-filtered-sequences Seq-Database-Seq.qza

Here is the full command with the error output:
qiime taxa filter-seqs \

--i-sequences mcrA-Unfiltered-RefSeq.qza
--i-taxonomy mcrA-Unfiltered-Tax-Ref.qza
--p-include "k__Archaea; p__Archaea; c__Archaea; o__Archaea; f__Archaea; g__uncultured; s__archaeon"; "k__Archaea; p__Archaea; c__Archaea; o__Archaea; f__Archaea; g__methanogenic; s__archaeon enrichment culture"; "k__Archaea; p__Euryarchaeota; c__Euryarchaeota; o__Euryarchaeota; f__Euryarchaeota; g__uncultured; s__methanogenic archaeon
"; "k__Archaea; p__Euryarchaeota; c__Euryarchaeota; o__Euryarchaeota; f__Euryarchaeota; g__uncultured; s__methanogenic archaeon
"; "k__Archaea; p__Archaea; c__Archaea; o__Archaea; f__Archaea; g__uncultured; s__archaeon
"
--o-filtered-sequences Seq-Database-Seq.qza
Usage: qiime taxa filter-seqs [OPTIONS]

This method filters sequences based on their taxonomic annotations.
Features can be retained in the result by specifying one or more include
search terms, and can be filtered out of the result by specifying one or
more exclude search terms. If both include and exclude are provided, the
inclusion critera will be applied before the exclusion critera. Either
include or exclude terms (or both) must be provided.

Inputs:
--i-sequences ARTIFACT FeatureData[Sequence]
Feature sequences to be filtered. [required]
--i-taxonomy ARTIFACT FeatureData[Taxonomy]
Taxonomic annotations for features in the provided
feature sequences. All features in the feature
sequences must have a corresponding taxonomic
annotation. Taxonomic annotations for features that
are not present in the feature sequences will be
ignored. [required]
Parameters:
--p-include TEXT One or more search terms that indicate which taxa
should be included in the resulting sequences. If
providing more than one term, terms should be
delimited by the query-delimiter character. By
default, all taxa will be included. [optional]
--p-exclude TEXT One or more search terms that indicate which taxa
should be excluded from the resulting sequences. If
providing more than one term, terms should be
delimited by the query-delimiter character. By
default, no taxa will be excluded. [optional]
--p-query-delimiter TEXT
The string used to delimit multiple search terms
provided to include or exclude. This parameter
should only need to be modified if the default
delimiter (a comma) is used in the provided
taxonomic annotations. [default: ',']
--p-mode TEXT Choices('exact', 'contains')
Mode for determining if a search term matches a
taxonomic annotation. "contains" requires that the
annotation has the term as a substring; "exact"
requires that the annotation is a perfect match to a
search term. [default: 'contains']
Outputs:
--o-filtered-sequences ARTIFACT FeatureData[Sequence]
The taxonomy-filtered feature sequences. [required]
Miscellaneous:
--output-dir PATH Output unspecified results to a directory
--verbose / --quiet Display verbose output to stdout and/or stderr
during execution of this action. Or silence output
if execution is successful (silence is golden).
--examples Show usage examples and exit.
--citations Show citations and exit.
--help Show this message and exit.

There was a problem with the command:

(1/1) Missing option '--o-filtered-sequences'. ("--output-dir" may also be
used)
k__Archaea; p__Archaea; c__Archaea; o__Archaea; f__Archaea; g__methanogenic; s__archaeon enrichment culture: command not found
k__Archaea; p__Euryarchaeota; c__Euryarchaeota; o__Euryarchaeota; f__Euryarchaeota; g__uncultured; s__methanogenic archaeon
: command not found
k__Archaea; p__Euryarchaeota; c__Euryarchaeota; o__Euryarchaeota; f__Euryarchaeota; g__uncultured; s__methanogenic archaeon
: command not found
k__Archaea; p__Archaea; c__Archaea; o__Archaea; f__Archaea; g__uncultured; s__archaeon
--o-filtered-sequences: command not found

How can I solve this?

I refer you to the documentation again for taxa filter-seqs. More importantly, please make sure you also read the --help text associated with this command and the options available.

Note that the example uses , as the default delimiter, and that there are no spaces around the delimiter. If there are spaces within taxonomy labels, then you should place these taxonomy strings within quotes, but without spaces between them. Also, there is no need to use the entire taxonomy string, as per the example in the documentation.
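
To see why the spacing matters, it can help to count the arguments the shell actually hands to a command. The sketch below uses a hypothetical helper, count_args, which is not part of QIIME 2, and shortened versions of the taxonomy strings from this thread.

```shell
# count_args is a hypothetical helper (not part of QIIME 2): it prints how
# many arguments the shell passes to a command.
count_args() { echo "$#"; }

# Adjacent quoted strings joined by a bare comma form ONE shell word, so the
# whole list would reach --p-include as a single value.
one_word=$(count_args "g__uncultured; s__archaeon","g__methanogenic; s__archaeon enrichment culture")

# A space after the comma starts a new shell word, so the second term is no
# longer part of the option's value.
two_words=$(count_args "g__uncultured; s__archaeon", "g__methanogenic; s__archaeon enrichment culture")

# Note: a ";" outside the quotes would end the command entirely, and the
# shell would then try to run the next quoted string as a program, which
# produces "command not found".
echo "no space: $one_word argument(s); with space: $two_words argument(s)"
```

The semicolons inside the quotes are harmless because the shell treats them as ordinary characters there.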


Thank you very much dear @SoilRotifer,

I believe I have removed the intended items from the reference sequences with the parameter below:
--p-include "k__Archaea; p__Archaea; c__Archaea; o__Archaea; f__Archaea; g__methanogenic; s__archaeon enrichment culture","k__Archaea; p__Archaea; c__Archaea; o__Archaea; f__Archaea; g__methanogenic; s__archaeon enrichment culture","k__Archaea; p__Euryarchaeota; c__Euryarchaeota; o__Euryarchaeota; f__Euryarchaeota; g__uncultured; s__methanogenic archaeon" \

I used the same --p-include items for the taxonomy filtering but got a new error. I have checked the method's help with qiime rescript filter-taxa --help; there are no details about a delimiter or anything else that would solve this problem. Please give your valuable suggestion on this.

The Error:

I appreciate it very much.

In this case, for qiime rescript filter-taxa, you’ll see: List[STR]. This means that the command expects a list of strings separated by whitespace, unlike the qiime taxa filter-seqs command, which is delimited by , by default.
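
As a rough illustration of the difference, the sketch below passes each search term to qiime rescript filter-taxa as its own quoted shell word, separated by spaces rather than commas. The search terms and the output file name are assumptions; the input file name is the one used earlier in this thread. count_args is a hypothetical helper, not part of QIIME 2, and the command only runs if qiime is on the PATH.

```shell
# Each quoted string is one list item; items are separated by spaces.
# The output name below is hypothetical.
run_filter_taxa() {
  qiime rescript filter-taxa \
    --i-taxonomy mcrA-Unfiltered-Tax-Ref.qza \
    --p-exclude "$@" \
    --o-filtered-taxonomy mcrA-Filtered-Tax-Ref.qza
}

# Hypothetical helper: count the list items the shell would pass on.
count_args() { echo "$#"; }
n_terms=$(count_args "g__uncultured" "archaeon enrichment culture")

if command -v qiime >/dev/null 2>&1; then
  run_filter_taxa "g__uncultured" "archaeon enrichment culture"
else
  # Dry run when QIIME 2 is not installed.
  echo "qiime not found; would pass $n_terms terms to --p-exclude"
fi
```

Quoting is still needed for any term that contains a space; otherwise the shell would split it into several list items.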


Thank you very much. Have a great time ahead.
