Summarizing "misses" table help

dupremarye · February 7, 2019, 4:56pm

Hi all,

I need a bit of assistance summarizing my "misses" OTU table after filtering out sequences that matched the UNITE database at less than 80%. This was recommended to me as a way to check the total number of sequences that were filtered out of my samples. Below is the code that generated the "misses" file, as well as a screenshot that includes the summarized format. How can I summarize this table in QIIME 2 to visualize the total number of sequences filtered out?

Thank you,
Mary Ellyn

# --i-query-sequences takes a FeatureData[Sequence] artifact (the representative sequences)
qiime quality-control exclude-seqs \
  --i-query-sequences table.qza \
  --i-reference-sequences sh_refs_qiime_ver7_dynamic_01.12.2017_NO_CONTAMS.qza \
  --p-method blast \
  --p-perc-identity 0.8 \
  --p-perc-query-aligned 0.8 \
  --o-sequence-hits hits-8-8.qza \
  --o-sequence-misses misses-8-8.qza
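For readers following along: as far as I understand the blast method, a query sequence ends up in the hits if at least one alignment to the reference meets both --p-perc-identity and --p-perc-query-aligned, and in the misses otherwise. The Python below is only a conceptual sketch of that decision rule; the Alignment records, feature IDs, and numbers are made up and are not QIIME 2 objects.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """A hypothetical BLAST alignment of one query feature against the reference."""
    query_id: str
    perc_identity: float       # fraction of identical aligned positions (0-1)
    perc_query_aligned: float  # fraction of the query covered by the alignment (0-1)


def split_hits_misses(query_ids, alignments, perc_identity=0.8, perc_query_aligned=0.8):
    """Return (hits, misses): a query is a hit if ANY alignment passes BOTH thresholds."""
    passing = {
        a.query_id
        for a in alignments
        if a.perc_identity >= perc_identity and a.perc_query_aligned >= perc_query_aligned
    }
    hits = [q for q in query_ids if q in passing]
    misses = [q for q in query_ids if q not in passing]
    return hits, misses


queries = ["feat1", "feat2", "feat3"]
alns = [
    Alignment("feat1", 0.97, 1.00),  # passes both thresholds
    Alignment("feat2", 0.85, 0.50),  # identity OK, but too little of the query aligned
    Alignment("feat2", 0.75, 0.90),  # enough coverage, but identity too low
]                                    # feat3 has no alignment at all
hits, misses = split_hits_misses(queries, alns)
print(hits, misses)  # ['feat1'] ['feat2', 'feat3']
```

Because the split is made per feature (unique sequence), the hits and misses together contain one entry per input feature, not one per read.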

Nicholas_Bokulich · February 7, 2019, 6:44pm

You have two options:

1. Export the hits and misses sequences and count the number of sequences (FASTA records) in each file.
2. Use qiime feature-table filter-features --i-table table.qza --m-metadata-file hits-8-8.qza --o-filtered-table hits-table.qza to filter your feature table so that it only contains "hits". Then use qiime feature-table summarize to summarize the filtered and unfiltered tables. Each visualization reports the number of features and the total frequency (total sequence count) of the table, which you can use to compare the two.
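For the first option, note that a FASTA record often spans more than one line, so counting header lines is safer than counting all lines. A minimal Python sketch, assuming the artifacts were exported with qiime tools export (which writes a dna-sequences.fasta file for FeatureData[Sequence]); the helper names are hypothetical:

```python
def count_fasta_records(lines):
    """Count FASTA records by their '>' header lines; sequence lines may wrap."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fasta_file(path):
    """Count the records in a FASTA file on disk, e.g. an exported dna-sequences.fasta."""
    with open(path) as handle:
        return count_fasta_records(handle)


example = [">feat1", "ACGTACGT", "ACGT", ">feat2", "TTGCA"]
print(count_fasta_records(example))  # 2
```

Keep in mind that this counts features (unique sequences), not the reads they represent; for read counts, the table totals from qiime feature-table summarize are the relevant numbers.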

I hope that helps!

dupremarye · February 12, 2019, 7:16pm

Maybe I'm just confused by the hits/misses output, so if you could explain this process to me that would be very helpful.

I start out with my table.qza file, in which (when visualized) each sample has between at least 2,000 and about 15,000 sequences. When I run the code I posted above, it tabulates the sequences into features (about 3,500 hits and misses in total), which I don't think is what I'm interested in. I want to see the whole dataset and the number of sequences that have been filtered out, so that I can better understand the quality of my filtering process.

Let me know if you need anything from me to better understand my confusion. Thanks!

Nicholas_Bokulich · February 13, 2019, 3:41am

Perhaps the confusion is on my end. exclude-seqs just splits your sequences (the features, i.e. unique sequences, not individual reads) into those that hit the reference sequences and those that miss, which is why the hits and misses add up to your ~3,500 features. Using filter-features will then remove the misses from your feature table so that it only contains hits. Running summarize on both the original and the filtered tables will tabulate the total number of sequences in each, so you can see how filtering affects sequencing depth; the difference between the two totals is the number of sequences that were removed. That sounds like the comparison you want to make. If it is not, please share these summary files and describe in more detail how this differs from your goals.
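If you would rather get the per-sample numbers programmatically, the same before/after comparison can be done on exported data. This is a minimal Python sketch, assuming the feature table was exported (qiime tools export, then biom convert --to-tsv) to a tab-separated file whose header line starts with "#OTU ID", and that the hit IDs come from the header lines of the exported hits FASTA; the function names and toy numbers are hypothetical.

```python
import csv


def fasta_ids(lines):
    """Collect feature IDs from FASTA header lines of the form '>id [description]'."""
    return {line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].strip()}


def read_biom_tsv(lines):
    """Parse a feature table in the TSV layout written by `biom convert --to-tsv`.

    Lines starting with '#' are skipped, except the '#OTU ID' header,
    which supplies the sample IDs. Returns (sample_ids, {feature_id: counts}).
    """
    samples, table = [], {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        if row[0].startswith("#OTU ID"):
            samples = row[1:]
        elif row[0].startswith("#"):
            continue  # e.g. "# Constructed from biom file"
        else:
            table[row[0]] = [float(value) for value in row[1:]]
    return samples, table


def reads_removed(samples, table, hit_ids):
    """Per sample: (reads before filtering, reads kept as hits, reads removed)."""
    report = {}
    for i, sample in enumerate(samples):
        before = sum(counts[i] for counts in table.values())
        after = sum(counts[i] for fid, counts in table.items() if fid in hit_ids)
        report[sample] = (before, after, before - after)
    return report


# Toy data standing in for the exported files.
tsv_lines = [
    "# Constructed from biom file",
    "#OTU ID\tS1\tS2",
    "f1\t100\t50",
    "f2\t20\t0",
    "f3\t5\t30",
]
hit_ids = fasta_ids([">f1", "ACGT", ">f3 some description", "GGCC"])
samples, table = read_biom_tsv(tsv_lines)
for sample, (before, after, removed) in reads_removed(samples, table, hit_ids).items():
    print(sample, before, after, removed)
# S1 125.0 105.0 20.0
# S2 80.0 80.0 0.0
```

Summed over all samples, the removed column should correspond to the difference in total frequency between the summaries of the unfiltered and filtered tables.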

system · March 16, 2019, 9:41am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.