how can I visualize sequence data?

Nizar_Goda · May 30, 2019, 9:04am

Thank you so much, I have already solved it. I have another question regarding taxonomy. I used the Greengenes database, and approximately 43.271% is not assigned at the family and genus levels. Is there a command to export the specific OTUs, with their sequences, for such a taxon so that I can BLAST them? I have attached a screenshot of the taxonomy. Thank you.

timanix · May 30, 2019, 11:27am

Hi!
I am not 100% sure, but I guess that what you see here is all the pooled hashes (feature IDs) that were assigned to this taxonomy. You can open the taxonomy.tsv file (exported from taxonomy.qza) and find all the hashes assigned to this bacterium, then look up the corresponding representative sequences by hash in dna-sequences.fasta (exported from rep-seqs.qza). After that, you can BLAST them.
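As a rough sketch of the lookup described above, using only the Python standard library: it assumes the exported taxonomy.tsv has a header row with "Feature ID" and "Taxon" columns and that FASTA record IDs in dna-sequences.fasta are the same hashes. The function names and the output path are hypothetical, chosen just for illustration.

```python
import csv


def feature_ids_for_taxon(taxonomy_tsv, taxon):
    """Return feature IDs (hashes) whose Taxon column equals `taxon` exactly."""
    ids = []
    with open(taxonomy_tsv, newline="") as handle:
        # Assumed header: "Feature ID<TAB>Taxon<TAB>..."
        for row in csv.DictReader(handle, delimiter="\t"):
            if row["Taxon"].strip() == taxon:
                ids.append(row["Feature ID"])
    return ids


def read_fasta(fasta_path):
    """Parse a FASTA file into a dict of {record ID: sequence}."""
    records = {}
    current_id = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The record ID is the first token after ">"
                current_id = line[1:].split()[0]
                records[current_id] = []
            elif current_id is not None:
                records[current_id].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}


def write_taxon_fasta(taxonomy_tsv, fasta_path, taxon, out_path):
    """Write sequences assigned to `taxon` to out_path; return how many were written."""
    wanted = feature_ids_for_taxon(taxonomy_tsv, taxon)
    seqs = read_fasta(fasta_path)
    written = 0
    with open(out_path, "w") as out:
        for fid in wanted:
            if fid in seqs:
                out.write(f">{fid}\n{seqs[fid]}\n")
                written += 1
    return written
```

The taxon string has to match the Taxon column exactly, including rank prefixes and separators, so it is easiest to copy it straight out of taxonomy.tsv.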

Nizar_Goda · May 30, 2019, 12:08pm

Unfortunately, that does not work. Is there no other way to solve this problem?

Nicholas_Bokulich · May 30, 2019, 2:29pm

@Nizar_Goda,
Please read the tutorials. The moving pictures tutorial describes exactly what you are trying to do.

Note: you are not getting family-level classification because there is no perfect match in the database, or because there are too many matches with different family-level taxonomies. Using BLAST to find the top hit can give you some more clarity on what this might be, but I would discourage you from relying on that BLAST result, especially for publication/reporting.
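To see which features stop short of family level, a minimal sketch like the following may help. It assumes Greengenes-style taxonomy strings with rank prefixes (e.g. "f__" for family) separated by semicolons, and treats both a bare "f__" and a missing family rank as unclassified. The function names and the plain dict input are hypothetical, for illustration only.

```python
def lacks_rank(taxon, prefix="f__"):
    """True if the taxonomy string has no non-empty label for the given rank prefix."""
    for level in taxon.split(";"):
        level = level.strip()
        if level.startswith(prefix):
            # A bare prefix such as "f__" means the rank is present but empty
            return level == prefix
    # The rank does not appear in the string at all
    return True


def features_lacking_rank(taxonomy, prefix="f__"):
    """Given {feature ID: taxonomy string}, list the IDs without a label at that rank."""
    return [fid for fid, taxon in taxonomy.items() if lacks_rank(taxon, prefix)]
```

Passing prefix="g__" would do the same check at genus level.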

jwchen · June 3, 2019, 11:37am

Hi @Nizar_Goda,

This is how I export ASV sequences for a specific taxon using the Python API.

import pandas as pd
from qiime2 import Artifact

def taxon2fasta(taxonomy, sequences, taxon, path):
    '''
    taxonomy: artifact of type FeatureData[Taxonomy]
    sequences: artifact of type FeatureData[Sequence]
    taxon: full taxonomy string of the taxon we are interested in (str)
    path: directory to export the FASTA file to (str)
    '''
    # convert FeatureData[Taxonomy] to a pandas DataFrame indexed by feature ID
    df_taxon = taxonomy.view(pd.DataFrame)

    # keep only ASVs whose annotation is exactly 'taxon' (full-string match)
    df_taxon = df_taxon[df_taxon['Taxon'] == taxon]

    # convert FeatureData[Sequence] to a pandas Series indexed by feature ID
    ser = sequences.view(pd.Series)

    # keep only the sequences that were annotated as 'taxon'
    ser_taxon = ser.loc[df_taxon.index]

    # convert the filtered sequences back to an artifact
    taxon_seq = Artifact.import_data('FeatureData[Sequence]', ser_taxon)

    # export the FASTA file to the given path
    taxon_seq.export_data(path)

And you can load your .qza files using the Artifact class like this:

seqs = Artifact.load('path_of_your_qza_file.qza')

Hope this helps.

By the way @Nicholas_Bokulich, could you explain a bit more about why you discourage relying on the BLAST result, or is there a reference for this?
Thank you.

Jiung-Wen

Nicholas_Bokulich · June 3, 2019, 11:48am

Most microbiome sequencing methods rely on rather short DNA fragments (e.g., the V4 region of the 16S rRNA gene), which contain limited taxonomic information on their own. This is very frequently insufficient to truly classify to species level, and using NCBI BLAST can give misleading results if you BLAST, take the top hit, and move on without checking whether there are other equally (or nearly equally) good hits. BLAST is fine if you carefully consider the other hits: LCA methods, like the classify-consensus-* methods in QIIME 2's q2-feature-classifier, use BLAST or other aligners for database searching, but then consider what taxonomic consensus there is among the top hits to determine, e.g., whether multiple species are hit and whether that sequence can truly be classified to species level.
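To make the consensus idea concrete, here is a simplified sketch of one way to take a consensus across top hits. This is an illustration only, not the actual q2-feature-classifier implementation: the function name is hypothetical, the 0.51 threshold is an illustrative choice, and the hits are assumed to be semicolon-separated taxonomy strings.

```python
from collections import Counter


def consensus_taxonomy(hits, min_consensus=0.51):
    """Return the deepest lineage on which at least `min_consensus` of all hits agree."""
    lineages = [[level.strip() for level in hit.split(";")] for hit in hits]
    if not lineages:
        return "Unassigned"

    total = len(lineages)
    candidates = lineages
    consensus = []
    depth = 0
    while True:
        # Labels at this rank among hits that still follow the consensus lineage
        labels = [lin[depth] for lin in candidates if len(lin) > depth]
        if not labels:
            break
        label, count = Counter(labels).most_common(1)[0]
        if count / total < min_consensus:
            # Too much disagreement at this rank: stop at the previous one
            break
        consensus.append(label)
        candidates = [lin for lin in candidates
                      if len(lin) > depth and lin[depth] == label]
        depth += 1

    return "; ".join(consensus) if consensus else "Unassigned"
```

If three equally good hits share a genus but each names a different species, this sketch stops at genus level, whereas taking only the top BLAST hit would report one of the three species as if it were certain.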

jwchen · June 4, 2019, 5:19am

Thanks for your prompt and detailed explanation!!
