Excluding sequences by alignment

Sarah_McGrath · October 10, 2018, 4:38pm

Hello all,

Currently, I am working on a project where we are trying to identify bacterial biomarkers that indicate whether a predator ate a fresh prey item (preyed upon or freshly killed) or a decaying one (i.e., carrion). First, we determined a set of ASVs that were associated exclusively with either fresh or decaying prey. Now, I would like to match those ASVs to the ones from our predator (stomach and fecal samples) to see how easily we can determine whether that predator consumed fresh or decayed prey.

To do this, I used the "excluding sequences by alignment" command (qiime quality-control exclude-seqs) to obtain the subset of sequences ("hits") that match the ASVs from the prey data set.

(qiime2-2018.8) mcgratse@DESKTOP-0PO1GR1:/mnt/c/Users/Sarah/Desktop/SI-Rat-Project$ qiime quality-control exclude-seqs \
  --i-query-sequences /mnt/c/Users/Sarah/Desktop/SI-Rat-Project/si-rats/rep-seqs-rat-seqs-5.qza \
  --i-reference-sequences /mnt/c/Users/Sarah/Desktop/SI-Rat-Project/chicken_seqs_to_keep.qza \
  --p-method blast \
  --p-perc-identity 0.97 \
  --p-perc-query-aligned 0.97 \
  --o-sequence-hits hits_1.qza \
  --o-sequence-misses misses_1.qza

Saved FeatureData[Sequence] to: hits_1.qza
Saved FeatureData[Sequence] to: misses_1.qza

Then, I filtered my original table to obtain a feature table containing only the features that hit, for use in continued analysis.

(qiime2-2018.8) mcgratse@DESKTOP-0PO1GR1:/mnt/c/Users/Sarah/Desktop/SI-Rat-Project$ qiime feature-table filter-features \
  --i-table /mnt/c/Users/Sarah/Desktop/SI-Rat-Project/si-rats/table-rat-seqs-5.qza \
  --m-metadata-file hits_1.qza \
  --o-filtered-table only-hits-filtered-table.qza

Saved FeatureTable[Frequency] to: only-hits-filtered-table.qza

Here is my original table and my resulting table after filtering.
table-rat-seqs-5.qza (116.4 KB)
only-hits-filtered-table.qza (340.2 KB)

I am still unsure what this resulting table is really showing me. Is there any output where I can see how the sequences aligned? How do I go from this table to one showing which ASVs matched where (e.g., ASV42 from the prey data set matched ASV57 in the predator data set)? I chose a 0.97 threshold and the blast method based on the online QIIME 2 documentation. Does this make the most sense for what I am trying to do?

Any assistance and/or advice is greatly appreciated!

Thanks,

Sarah

Nicholas_Bokulich · October 10, 2018, 7:08pm

Hi @Sarah_McGrath,
Good questions and very interesting use of this tool! Long story short: this might be an appropriate and interesting use of this method, but it might also not be the best tool for the job.

These tables will contain the features that do or do not align to your reference database at or above the chosen % similarity. The tables themselves do not show anything beyond that, though you could use them downstream, e.g., in an alpha diversity analysis to count the number of carrion-associated features detected in each group.
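
As a rough illustration of that counting idea, here is a minimal Python sketch. It assumes the filtered table has already been exported and loaded into a plain mapping of sample ID to {feature ID: count}; the function name, sample IDs, and feature IDs are all made up for the example, and none of this is a QIIME 2 API.

```python
def count_detected_features(table, marker_ids):
    """Count how many marker features have a non-zero count in each sample."""
    counts = {}
    for sample_id, feature_counts in table.items():
        counts[sample_id] = sum(
            1
            for feature_id, n in feature_counts.items()
            if feature_id in marker_ids and n > 0
        )
    return counts


# Hypothetical data: two predator samples and two carrion-associated ASVs.
table = {
    "rat-stomach-1": {"ASV1": 12, "ASV2": 0, "ASV3": 5},
    "rat-feces-1": {"ASV1": 0, "ASV2": 7, "ASV4": 3},
}
carrion_markers = {"ASV1", "ASV2"}

detected = count_detected_features(table, carrion_markers)
```

A feature only counts as detected when its frequency in that sample is above zero, so a marker that is present in the table but absent from a given sample is not counted for it.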

No, scores for each individual sequence are not shown. You will need to use BLASTn directly to get the full report. However, since you are setting the percent identity parameter, you know that everything in your "keep" table must exceed that threshold.
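
If you do run BLASTn yourself, its tabular report (`-outfmt 6`) has one line per alignment, with the query ID, subject ID, and percent identity in the first three columns. Below is a minimal Python sketch for pulling the best hit per query out of such a report. It assumes the default 12-column layout and that identity is reported as a percentage (0-100), unlike the 0.97 fraction passed to exclude-seqs; the function name and the ASV IDs and scores in the sample report are invented for illustration.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle, min_identity=97.0):
    """Return {query ID: (subject ID, percent identity, bit score)}.

    Keeps, for each query, the highest-bit-score alignment whose percent
    identity is at least min_identity.
    """
    best = {}
    reader = csv.DictReader(handle, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        pident = float(row["pident"])
        bitscore = float(row["bitscore"])
        if pident < min_identity:
            continue
        current = best.get(row["qseqid"])
        if current is None or bitscore > current[2]:
            best[row["qseqid"]] = (row["sseqid"], pident, bitscore)
    return best


# Invented report: predator ASVs as queries, prey ASVs as subjects.
sample_report = io.StringIO(
    "ASV57\tASV42\t98.8\t250\t3\t0\t1\t250\t1\t250\t1e-120\t440\n"
    "ASV57\tASV10\t97.2\t250\t7\t0\t1\t250\t1\t250\t1e-110\t410\n"
    "ASV60\tASV11\t91.0\t250\t22\t1\t1\t250\t1\t250\t1e-80\t300\n"
)

hits = best_hits(sample_report)
```

With a real report you would pass an open file handle instead of the in-memory sample. Keeping only the top hit hides ties, so it may be worth inspecting queries that match several references equally well.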

This will not tell you which sequences were matched... that's a job for taxonomy classification! You can use classify-consensus-blast to do this, but that will require building a taxonomy file to match your sequences.
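
One possible way to build that taxonomy file is to label each reference ASV with its group and its own ID, so that a classification points back to a specific prey ASV. The Python sketch below writes such a file. It assumes a two-column, tab-separated layout with a `Feature ID` / `Taxon` header, which should be checked against the QIIME 2 import documentation for your version; the function name, the `group; ASV` label scheme, and the IDs are hypothetical.

```python
import csv
import io


def write_taxonomy(labels, handle):
    """Write a two-column TSV mapping each reference ASV to a label.

    labels maps feature ID -> group (e.g. "fresh" or "carrion"). The taxon
    string is "<group>; <feature ID>" so the ASV itself is the finest level.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Feature ID", "Taxon"])
    for feature_id in sorted(labels):
        group = labels[feature_id]
        writer.writerow([feature_id, f"{group}; {feature_id}"])


# Hypothetical prey ASVs and the group each one was associated with.
labels = {"ASV42": "carrion", "ASV7": "fresh"}

buffer = io.StringIO()
write_taxonomy(labels, buffer)
taxonomy_tsv = buffer.getvalue()
```

To produce a file on disk, pass an open file handle (opened with `newline=""`) instead of the in-memory buffer.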

Using BLAST at 97% sounds reasonably appropriate, but one thing worries me: what if sequences in your "fresh" database are similar (e.g., 97%+) to sequences in your "carrion" database? If you know that these sequences are very dissimilar, then that's fine. But if you don't know, then I'd encourage you either to test this or to use a different tool for the job.

E.g., you could instead perform taxonomy classification, and then filter the classifications/tables based on taxa associated with fresh/carrion. That would not work if there is taxonomic overlap between your fresh/carrion references, or if you need reliable species-level classifications to differentiate these groups (taxonomy classification gets iffy at species level and is often not reported when the classifier is uncertain).
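
A minimal Python sketch of that filtering step, assuming the classifications have already been reduced to a mapping of feature ID to group ("fresh" or "carrion") and the table to a mapping of sample ID to {feature ID: count}. The function name and all names in the example data are hypothetical, and features without a classification are pooled as "unassigned".

```python
from collections import defaultdict


def counts_by_group(table, feature_groups):
    """Sum feature counts per sample within each group label."""
    totals = {}
    for sample_id, feature_counts in table.items():
        per_group = defaultdict(int)
        for feature_id, n in feature_counts.items():
            # Features with no fresh/carrion label are pooled together.
            group = feature_groups.get(feature_id, "unassigned")
            per_group[group] += n
        totals[sample_id] = dict(per_group)
    return totals


# Hypothetical classifications and predator table.
feature_groups = {"ASV1": "carrion", "ASV2": "fresh", "ASV3": "carrion"}
table = {
    "rat-stomach-1": {"ASV1": 12, "ASV2": 4, "ASV3": 5, "ASV9": 30},
    "rat-feces-1": {"ASV2": 7, "ASV9": 3},
}

group_totals = counts_by_group(table, feature_groups)
```

Comparing the fresh and carrion totals per sample gives a first look at which signal dominates, though it does nothing about taxa that occur in both references.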

I hope that helps!
