changing colors of arrows in rPCA biplot graphs, show arrow legend on graph

Hello, I was just wondering if the emperor biplot user can change the color of the vector arrows from default gray to black on the actual rPCA biplot graph generated with DEICODE. Thanks!

Best wishes,

Carla

I don't know if it's possible to actually change the default color of the arrows, but a (kinda cumbersome) workaround is coloring the arrows by an arbitrary feature metadata field (say, their taxonomy annotations) and then manually changing each value's color to black:

... Of course, this is a pretty silly way of doing things (I imagine it'd be really annoying if you have more than a handful of arrows in the plot). Not sure if someone else here knows a better way of doing this :slight_smile:

2 Likes

Hi Marcus thank you! I am going to try that. I am trying to download qiime2R (with difficulty) to see if I can try ggplot2 or something. I will probably ask for help with installing qiime2R next LOL.

Kindly,
Carla

And plus, I think it is pretty amazing that we were able to get to the rPCA plot (LOL) just by following the tutorials. Your help is SUPER appreciated! I hope you are doing well.

Best wishes,

Carla

1 Like

And actually, now that I have your attention, how do we isolate just the species or just the genus in our graphs? I tried filtering a bunch of ways, but I only am able to get to the species level along with all of the other taxonomic categories, which is cumbersome for the graph. Is this what the tutorial means by “binning” according to taxonomy? I read we weren’t actually supposed to do that, but, for graphing it is kind of essential. Again thank you!

Hi!
You want to make it like this?

1 Like

Oh yes! What a nice graph! But I need my graph without the feature ID :slight_smile:
How did you do that? I tried filtering my data with this command:

qiime taxa filter-table
–i-table table_115k.qza
–i-taxonomy /local/ifs2_projdata//curanga/16s/ /reads/v6v4_taxonomy.qza
–p-include s__
–p-exclude eukaryota,unassigned,metagenome
–o-filtered-table v6v4f.qza**

But of course I get the whole taxonomic lineage. I am only interested in genus and species, both together on the graph and separate on the graph. Thank you!

So, you are working with collapsed tables, not with ASVs/OTUs?
I wrote a custom script in Python which modifies a feature table, rep-seq file and taxonomy file. Then one can create a new tree with modified feature names and use this inputs for biplots.
But it will not work for you if you are working with collapsed tables, it should be tweaked in a different way. But it is possible anyway, just never tried it

1 Like

Yes, that is correct. I am working with collapsed tables. I know that is not ideal, but for publication purposes we need to make the graph readable and understandable. Thanks for your help.

Best,
Carla

The biplot I posted above is performed on ASV level, it is why it contains a ASV ID and last available taxon in the name. We published it like this, and I think it is readable enough plus you are keeping higher resolution. If you are interested I will post code here

2 Likes

Really? Ok cool! I would love to read your code. By the way, I am trying to install Qiime2R and it seems that the packages “pheatmap” and “qiimer” need updating for compatibility with R version 4.0. Do you know who we would talk to about this?

Best,

Carla

I don’t have a lot of experience with R and QiimeR, so I don’t know. Mostly I used Python to tweak graphs.

The code below should be executed after DADA2 and taxonomy assignment. Written for Silva annotated files, for GreenGenes maybe you will need to modify it a little bit.
To execute it, you need to install Jupyter lab inside of Qiime environment, then launch Jupyter lab in activated Qiime environment

First cell

#import
import pandas as pd
from Bio import SeqIO

#input
table    = 'Data/Tables/table.qza'
taxa     = 'Data/taxonomy.qza'
rep_seq  = 'Data/rep-seq.qza'

#export rep-seqs.qza, table.qza and taxonomy.qza
!mkdir Biom Taxa Rep-seqs

!qiime tools export \
    --input-path $table \
    --output-path Biom/
!biom convert -i Biom/feature-table.biom -o Biom/feature-table.tsv --to-tsv

!qiime tools export \
    --input-path $taxa \
    --output-path Taxa

!qiime tools export \
    --input-path $rep_seq \
    --output-path Rep-seqs/


#replacing hashes with combination of taxonomy and beginings of the hashes
taxa_df = pd.read_csv('Taxa/taxonomy.tsv', sep='\t', skiprows=[1])
biom_df = pd.read_csv('Biom/feature-table.tsv', sep='\t', skiprows=1)

taxa_df['Taxon'] = pd.DataFrame(taxa_df['Taxon'].astype(str).str.replace(';__','').str.replace('[','').str.replace(']','')
                                .str.replace('.','').str.replace('/','_').str.replace("'",'').str.replace(' ','_')
                                .str.replace('uncultured_bacterium','unc_b').str.replace('uncultured_organism','unc_org')
                                .str.replace('uncultured','unc'))
taxa_df['Combo'] =  taxa_df['Taxon'].str.split("__").str[-1].str.split(";").str[-1]
taxa_df.loc[taxa_df['Combo'].str[:]=='unc_b','Combo']=taxa_df['Taxon'].str.split("__").str[-2].str.split(';').str[0] + '_unc_b'
taxa_df.loc[taxa_df['Combo'].str[:]=='unc_org','Combo']=taxa_df['Taxon'].str.split("__").str[-2].str.split(';').str[0] + '_unc_org'
taxa_df.loc[taxa_df['Combo'].str[:]=='unc','Combo'] = taxa_df['Taxon'].str.split("__").str[-2].str.split(';').str[0] + '_unc'
taxa_df.loc[taxa_df['Combo'].str[:]=='unc_marine','Combo'] = taxa_df['Taxon'].str.split("__").str[-2].str.split(';').str[0] + '_unc_mar'

for n in range(3,6):
    taxa_df.loc[taxa_df['Combo'].str[:]=='unc_b_unc_b','Combo']=taxa_df['Taxon'].str.split("__").str[-n].str.split(';').str[0]+'_unc_b'
    taxa_df.loc[taxa_df['Combo'].str[:]=='unc_unc_b','Combo']=taxa_df['Taxon'].str.split("__").str[-n].str.split(';').str[0] + '_unc_b'
    taxa_df.loc[taxa_df['Combo'].str[:]=='unc_b_unc','Combo']=taxa_df['Taxon'].str.split("__").str[-n].str.split(';').str[0] + '_unc'
    taxa_df.loc[taxa_df['Combo'].str[:]=='unc_unc','Combo']=taxa_df['Taxon'].str.split("__").str[-n].str.split(';').str[0] + '_unc'
    taxa_df.loc[taxa_df['Combo'].str[:]=='unc_unc_mar','Combo']=taxa_df['Taxon'].str.split("__").str[-n].str.split(';').str[0] + '_unc_mar'
    
biom_df['#OTU ID'] = taxa_df['Combo']+'|'+taxa_df['Feature ID']
taxa_df['Feature ID'] = biom_df['#OTU ID']
taxa_df = taxa_df[['Feature ID', 'Taxon', 'Confidence']]

### writing all files back
biom_df.to_csv('Biom/feature-table.tsv', sep='\t', index=False)
taxa_df.to_csv('Taxa/taxonomy.tsv', sep='\t', index=False)
fasta_hash  = r"Rep-seqs/dna-sequences.fasta"
fasta_combo = r"Rep-seqs/dna-sequences.fa"
hlist = biom_df['#OTU ID'].tolist()
with open(fasta_hash) as hashes, open(fasta_combo, 'w') as combo:
    for record in SeqIO.parse(fasta_hash, 'fasta'):
        for h in hlist:
            if str(record.id) in h:
                combo.write('>'+h+'\n'+str(record.seq)+'\n')
!rm $fasta_hash
!mv $fasta_combo $fasta_hash

Second cell

### creating new rep-seqs.qza, table.qza and taxonomy.qza with modified hashes
!biom convert -i Biom/feature-table.tsv -o Biom/feature-table.biom --table-type="OTU table" --to-hdf5
!qiime tools import \
    --input-path Biom/feature-table.biom \
    --type 'FeatureTable[Frequency]' \
    --input-format BIOMV210Format \
    --output-path Data/Tables/combo_table.qza
!qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-path Taxa/taxonomy.tsv \
    --output-path Data/combo_taxonomy.qza
!qiime tools import \
    --input-path $fasta_hash \
    --output-path Data/combo_rep-seqs.qza \
    --type 'FeatureData[Sequence]'

!rm -r Biom Taxa Rep-seqs

After it, one can use the tweaked files to create a biplot as usual, or any other analysis in Qiime2, but it will contain modified annotations instead of standard ID hashes.

2 Likes

Neither pheatmap or qiimer are required by qiime2R or any of its dependencies to my knowledge. There should not be an any issues installing for R 4.0. Feel free to message me with the error you are getting if you need help installing.

1 Like

It looks like everyone has already addressed things pretty thoroughly :), but just to sum up -- as @timanix's code shows, it's possible to get the best of both worlds and avoid collapsing the table while still showing the taxonomy assignments in the biplot. Ideally this would be doable through Emperor (there's an open issue on it here), but for now I believe it'll require some external work (either relabeling the features before running DEICODE/Emperor, or moving the biplot into something like QIIME2R).

One caveat with the "relabeling features" solution is that their IDs need to be kept unique -- otherwise things will probably start breaking or behaving unexpectedly. @timanix's code handles this by including the original feature ID, which is a nice way to make sure that the feature IDs remain unique after including taxonomy stuff (since it's very plausible that multiple features could have the same taxonomic annotation, especially when "features" are ASVs). There's also been some discussion here before about how to handle including feature IDs in publications, in case you're interested. :qiime2:

2 Likes

Hi I was able to install the latest pheatmap and qiimer versions before installing Qiime2R and then it installed nicely, thank you!

Aha! Thank you Marcus! I am playing with Qiime2R as we speak. I guess R is my comfort zone, although I think I like the idea of moving outside of my comfort zone and looking into python a little more. I super appreciate your help!

Carla

1 Like

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.