Merging per-sample BLASTn results to metadata

UnevenCuttlefish · February 16, 2024, 1:05am

This might be an easy question to answer, but I figured I'd ask anyway.

I have two local BLASTn databases that I've successfully blasted my post-deblur sequences against, so I now know which of my sequences are similar to entries in each database. What I would like to do next is get a per-sample count of how many unique ASVs from each BLAST result list appear in each sample.

Basically, what I want is this:

Sample ID | Database 1 | Database 2
Sample 1 | 23 | 4
Sample 2 | 76 | 39

I'm not sure if this is possible within QIIME2 but I figured I would ask!

-UC

Nicholas_Bokulich · February 16, 2024, 7:11am

Hi @UnevenCuttlefish,

It is possible to merge metadata column-wise with QIIME 2, e.g., by using qiime metadata tabulate with multiple metadata inputs to make a merged visualization, or by using the Artifact API to merge Metadata objects.

But what you want is quite specific and is not a simple merge, as it sounds like you want to merge multiple blast6 hit tables generated outside of QIIME 2, which would contain redundant columns (and you probably just want to merge specific columns, not all of them). Such an operation is best done with Python or R, so that you can trim out the information that you want and merge it together. QIIME 2 could for sure be part of that process (e.g., to pass the trimmed data to tabulate as described above), but it does not have any actions for trimming a metadata file in the way that you need or renaming metadata columns to give custom names. So I recommend doing that outside of QIIME 2, and then just passing those files as input to the tabulate action to view as a QZV.
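As a rough illustration of the trimming step described above, here is a minimal sketch in bash/awk. It assumes the hit tables are tab-separated blast6 files with the standard 12 columns (qseqid, sseqid, pident, ...) and that the query IDs are the feature IDs. The function name trim_blast6 and all file names are hypothetical, and keeping only the first-listed hit per query is an assumption about what you want, not a rule.

```shell
#!/bin/bash
# trim_blast6 FILE LABEL   (hypothetical helper, not part of QIIME 2)
# Prints a TSV with one row per query sequence: the feature ID plus the
# subject ID and percent identity of the first hit listed for that query.
# The LABEL prefix keeps column names unique when several tables are merged.
trim_blast6() {
    local blast_file="$1" label="$2"
    printf 'id\t%s_sseqid\t%s_pident\n' "$label" "$label"
    awk -F'\t' -v OFS='\t' '!seen[$1]++ { print $1, $2, $3 }' "$blast_file"
}

# Possible usage (file names are placeholders):
#   trim_blast6 db1_hits.tsv db1 > db1_metadata.tsv
#   trim_blast6 db2_hits.tsv db2 > db2_metadata.tsv
#   qiime metadata tabulate \
#     --m-input-file db1_metadata.tsv \
#     --m-input-file db2_metadata.tsv \
#     --o-visualization merged-hits.qzv
```

The header uses "id" as the identifier column so that the output can be read as a metadata file; adjust the printed columns to whichever blast6 fields you actually care about.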

Good luck!

UnevenCuttlefish · February 16, 2024, 4:37pm

Hi @Nicholas_Bokulich

Thank you for your reply! I figured this would fall outside QIIME2's purview, but figured I'd ask anyway. If I could ask one more question: does QIIME2 offer a way to see a per-sample breakdown of ASV frequency? As in, would I be able to pick sample 2 and get the frequency of ASV01 found within that sample? I would imagine not. My worry is that going back to my original fastq.gz files and running a custom script to filter through the matches by sequence would overestimate the abundance, since that would be before the pipeline's filtering steps, and it would then be difficult to match the results back to the hex-code feature names assigned later in the pipeline.

Edit (because it's early): I realize that a collapsed FeatureTable[Frequency] could likely work for what I need. Silly me.

-

Nicholas_Bokulich · February 16, 2024, 6:49pm

Yes, this information is found in the feature table. The best way to get such specific information would be to view the table as a data frame (e.g., a pandas.DataFrame), where you can index specific rows/columns.

The collapsed feature table would work for sure if you want to see the overall frequency of an ASV or the number of sequences found in a sample (information also found in the output of feature-table summarize). But it would not tell you the frequency of ASV X in Sample Y. For that you need to look at the feature table itself.
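For anyone who prefers the command line to a data frame, the "frequency of ASV X in Sample Y" lookup can be sketched in bash/awk. This assumes the feature table has been exported to a tab-separated text file (for example with qiime tools export followed by biom convert --to-tsv), with feature IDs in the first column and sample IDs in the header row. The function name asv_frequency is hypothetical.

```shell
#!/bin/bash
# asv_frequency TABLE FEATURE_ID SAMPLE_ID   (hypothetical helper)
# Prints the frequency of one feature in one sample from a TSV feature table.
# Prints nothing if the feature ID is absent; exits non-zero if the sample
# ID is not in the header row.
asv_frequency() {
    local table="$1" feature="$2" sample="$3"
    awk -F'\t' -v feature="$feature" -v sample="$sample" '
        # Skip the comment line that biom convert writes, if present
        /^# Constructed from biom file/ { next }
        # The first remaining line is the header: find the sample column
        !col {
            for (i = 2; i <= NF; i++) if ($i == sample) col = i
            if (!col) { print "sample not found: " sample > "/dev/stderr"; exit 1 }
            next
        }
        # Data rows: print the value for the requested feature and stop
        $1 == feature { print $col; exit }
    ' "$table"
}

# Possible usage (file name and IDs are placeholders):
#   asv_frequency feature-table.tsv <feature-id> <sample-id>
```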

Maybe of interest: you could transpose your feature table (so that feature IDs are the row labels) and then merge it with your blast results if you want to look at frequency per sample side-by-side with the hits.
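A minimal sketch of that side-by-side merge in bash/awk, under a few assumptions: the BLAST results are a tab-separated blast6 file whose first column holds the feature IDs, the feature table is a tab-separated text export with feature IDs as row labels, and only the first hit listed per feature is kept. The function name merge_hits and the added column names are hypothetical.

```shell
#!/bin/bash
# merge_hits BLAST6 TABLE   (hypothetical helper)
# Prints the feature-table rows that have a BLAST hit, with the subject ID
# and percent identity of the first hit appended as two extra columns.
# Assumes the BLAST file is non-empty.
merge_hits() {
    local blast_file="$1" table="$2"
    awk -F'\t' -v OFS='\t' '
        # First file (BLAST results): remember the first hit per feature ID
        NR == FNR { if (!($1 in hit)) hit[$1] = $2 OFS $3; next }
        # Second file (feature table): skip the biom comment line, if present
        /^# Constructed from biom file/ { next }
        # Header row: add names for the two appended columns
        !header_done { print $0, "sseqid", "pident"; header_done = 1; next }
        # Data rows: keep only features that had a hit
        $1 in hit { print $0, hit[$1] }
    ' "$blast_file" "$table"
}

# Possible usage (file names are placeholders):
#   merge_hits db1_hits.tsv feature-table.tsv > table_with_db1_hits.tsv
```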

Good luck!

UnevenCuttlefish · February 16, 2024, 8:06pm

@Nicholas_Bokulich

Thank you again! This is in fact the way to go for the specific questions I have in mind. I went ahead and threw it into a bash script to grab the matching entries. While not a QIIME2 solution, I will go ahead and post my code below for anyone else wondering.

All you need are your BLAST result file and your FeatureTable[Frequency] file.

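#!/bin/bash

# file1: BLAST results, with the matching feature (ASV) IDs in the first column
# file2: FeatureTable[Frequency] as text, with feature IDs in the first column
file1="/your/file/path/file1.csv"
file2="/your/file/path/file2.csv"

# Field separator; use sep=$'\t' instead if your files are tab-separated
sep=","

# Grab the first row (the header) from the FeatureTable[Frequency] file
awk 'NR==1' "$file2" > new_file.csv

# Keep only the feature-table rows whose first column also appears in the BLAST results
awk -F"$sep" 'NR==FNR{a[$1]; next} FNR>1 && $1 in a' "$file1" "$file2" >> new_file.csv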

All that's left to do is to sum the columns, transpose, import the result into QIIME2, and then merge the tables themselves, which should be easy enough and falls under a new topic.
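For the summing and transposing step, a minimal bash/awk sketch might look like the following. It assumes new_file.csv has the layout produced by the script above (header row of sample IDs, one row per matched feature). Note that summing a column gives the total reads from matched ASVs, while the number of unique ASVs per sample asked about at the top of the thread is the count of non-zero entries, so this sketch reports both. The function name per_sample_counts and the output column names are hypothetical.

```shell
#!/bin/bash
# per_sample_counts FILTERED_TABLE LABEL [SEP]   (hypothetical helper)
# Prints one row per sample (i.e., already transposed) with:
#   LABEL_asvs  = number of matched features with a non-zero frequency
#   LABEL_reads = summed frequency of those features
# SEP is the input field separator and defaults to a comma.
per_sample_counts() {
    local table="$1" label="$2" sep="${3:-,}"
    awk -F"$sep" -v OFS='\t' -v label="$label" '
        # Header row: remember the sample ID for each column
        NR == 1 { n = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
        # Data rows: tally non-zero entries and sum frequencies per column
        { for (i = 2; i <= n; i++) if ($i + 0 > 0) { asvs[i]++; reads[i] += $i } }
        END {
            print "sample-id", label "_asvs", label "_reads"
            for (i = 2; i <= n; i++) print name[i], asvs[i] + 0, reads[i] + 0
        }
    ' "$table"
}

# Possible usage (output file name is a placeholder):
#   per_sample_counts new_file.csv db1 > db1_counts.tsv
```

Running this once per database gives two small per-sample tables with distinct column names, which could then be viewed together by passing both files to qiime metadata tabulate.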

Thanks for the help!

-UC
