Sequence filtering

swillyb · June 6, 2018, 12:48pm

Hello,

I was wondering if there is a way to remove any ESVs that do not appear in at least 50% of the samples. I feel like there is a way, perhaps through the contingency filtering, but I am unsure.

I will then be taking this data and making correlation heat maps. I want these heat maps to contain only abundant correlations, not solely statistically significant ones, so I am hoping there is a way to do this!

Thanks so much!!!

Mehrbod_Estaki · June 6, 2018, 4:35pm

Hi @swillyb,
You're right in that you can use the contingency filtering for this task, though with a small tweak. You can't use a percentage as an input with this command, but rather will have to manually calculate the number of samples yourself.

# manually calculate 50% of your samples and pass that number to --p-min-samples
qiime feature-table filter-features \
  --i-table table.qza \
  --p-min-samples 2 \
  --o-filtered-table sample-contingency-filtered-table.qza
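
If it helps, the arithmetic for the threshold can be sketched in the shell. This is only a rough sketch: the sample count of 12 and the variable names are placeholders, and it assumes "50% of the samples" should round up when the sample count is odd.

```shell
# Hypothetical values: replace n_samples with the number of samples in your table.
n_samples=12
percent=50

# Integer ceiling of n_samples * percent / 100, so that odd sample counts
# round up (for example, 13 samples at 50% gives 7).
min_samples=$(( (n_samples * percent + 99) / 100 ))

echo "Use --p-min-samples $min_samples"
```

The resulting number is what would go after --p-min-samples in the command above.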

swillyb · June 6, 2018, 5:49pm

Great, thanks!
Do you mean just to enter the number of samples there, so if I had 100 samples, I would just put 50 there? I did this with a small sample set of 12, and put 6 in (--p-min-samples 6), but still got the same number of OTUs as in the unfiltered data set.
Scott

Mehrbod_Estaki · June 6, 2018, 6:37pm

Hmm, that is a bit odd, unless all your features do indeed occur in at least 50% of your samples in which case no filtering would be performed. Would you mind sharing the feature table you are using or better yet the visualization artifact of your feature table?

swillyb · June 6, 2018, 7:01pm

table.qza (102.7 KB)
sample-contingency-filtered-table6samples.qza (11.1 KB)

I used this table.qza to make the contingency table. I can't seem to upload the table.biom I made from it. I'm sorry, I'm new to QIIME, so please let me know what else I can give you.

Also, would you happen to know why I'm having such a hard time converting the .biom to .tsv? I have used this command: biom convert -i otu_table.biom -o otu_table.txt --to-tsv, but sometimes it works and sometimes it errors. It's really confusing. Thank you so much for your help.

Mehrbod_Estaki · June 6, 2018, 7:36pm

Hi @swillyb,

There is certainly something not right with the feature table you are using. Here is a visualization summary of it. Under the Feature Detail tab you'll see that almost all of your features occur in only 1 or 2 samples. You'll want to resolve this before going any further, because based on this all of your features would be discarded by the filtering. Prior to running dada2, did you check to make sure your barcodes/primers/adapters were all removed from your reads? If I had to guess, I'd bet the barcodes at least are still intact, which would lead to each feature being labelled as unique. You'll want to remove those and re-run dada2 before filtering.
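
To make the filtering logic concrete, here is a small sketch of what a minimum-samples filter does, run on a made-up table rather than the real one. The feature names, counts, and threshold are all hypothetical, and the layout assumes the TSV form that biom convert --to-tsv writes (header lines starting with '#', then one row per feature). It illustrates the idea and is not QIIME 2's actual implementation.

```shell
# Toy feature table: two '#' header lines, then one row per feature
# and one column per sample (all values are made up).
table_tsv=$(mktemp)
printf '# Constructed from biom file\n' > "$table_tsv"
printf '#OTU ID\tS1\tS2\tS3\tS4\n' >> "$table_tsv"
printf 'featA\t5\t0\t0\t0\n' >> "$table_tsv"
printf 'featB\t2\t3\t0\t0\n' >> "$table_tsv"
printf 'featC\t1\t4\t7\t2\n' >> "$table_tsv"

min_samples=2

# For each feature row, count the samples with a nonzero count and keep
# the feature only if that count reaches min_samples.
kept=$(awk -F '\t' -v min="$min_samples" '
  /^#/ { next }
  {
    n = 0
    for (i = 2; i <= NF; i++) if ($i > 0) n++
    if (n >= min) print $1
  }' "$table_tsv")

echo "$kept"
rm -f "$table_tsv"
```

In this toy case featA is observed in one sample only and is dropped. A table where nearly every feature looks like featA would come out empty at any threshold above 2.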

As for the biom conversion issue, could you please post it separately as a new topic with a bit more information, especially the exact error message you receive? That way someone with the proper expertise can help you troubleshoot.

swillyb · June 6, 2018, 8:14pm

I thought I had removed the primers and whatnot. I have Casava paired-end demultiplexed samples, and I used this:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 13 \
  --p-trim-left-r 13 \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 150 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

Not these numbers, of course. I was under the impression that this would remove those; I apologize if that's not the case.

If the features are only present in 1 or 2 samples, why would I have more than 3000 OTUs when I run
qiime feature-table filter-features \
  --i-table table.qza \
  --p-min-samples 6 \
  --o-filtered-table sample-contingency-filtered-table.qza

Wouldn't this remove all of the features? I was able to create the .biom file from this sample-contingency-filtered-table.qza, and was then successful in converting it to .tsv, and when I open that file in Excel I have a list of 3000 OTUs. I'm confused. Thank you so much for your help, it's very much appreciated.

Mehrbod_Estaki · June 7, 2018, 6:50am

Here is what your parameters were for dada2 based on your table's provenance.

trunc_len_f:275
trunc_len_r:275
trim_left_f:0
trim_left_r:0

It looks as though you didn't trim anything from the 5' end of your reads, which is where the adapters/barcodes would be. You'll want to have a look at your reads directly and figure out what is still left that needs to be trimmed. You could also ask your sequencing facility, as they likely have this information for you as well.
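
One rough way to look at the reads directly is to tally the first few bases of each read and see which prefixes dominate. The sketch below runs on a made-up three-read FASTQ; the prefix length of 8 is arbitrary, and it assumes standard four-line FASTQ records. For a real file you would feed in your own data instead, for example through gzip -dc on one of your .fastq.gz files.

```shell
# Toy FASTQ with three reads (sequences and names are made up).
fastq=$(mktemp)
printf '@r1\nACGTACGTGGCCTTAA\n+\nIIIIIIIIIIIIIIII\n' > "$fastq"
printf '@r2\nACGTACGTGGAATTCC\n+\nIIIIIIIIIIIIIIII\n' >> "$fastq"
printf '@r3\nTTTTACGTGGAATTCC\n+\nIIIIIIIIIIIIIIII\n' >> "$fastq"

prefix_len=8

# Sequence lines are every 4th line starting at line 2. Tally the first
# prefix_len bases of each read and take the most frequent prefix.
top_prefix=$(awk 'NR % 4 == 2' "$fastq" | cut -c "1-$prefix_len" \
  | sort | uniq -c | sort -rn | head -n 1 | awk '{ print $2 }')

echo "$top_prefix"
rm -f "$fastq"
```

If most reads in a sample start with the same stretch of bases, that stretch is a candidate for something non-biological that still needs trimming, and its length suggests a value for the trim-left parameters.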

Your sample-contingency-filtered-table.qza is in fact empty and doesn't have any features in it, as we expected. Here is the summary visualization of that table.

You may have accidentally created your biom table from the unfiltered file which has upwards of 3000 features in it.
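
A quick sanity check on which file ended up in Excel is to count the feature rows in each exported TSV. The sketch below uses two made-up files as stand-ins for the unfiltered and filtered exports. It assumes the TSV layout written by biom convert --to-tsv, where header lines start with '#', and it assumes an empty table would export with header lines only.

```shell
# Count feature rows: every non-blank line that does not start with '#'.
count_features() {
  awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Toy stand-ins for the two exports (hypothetical contents).
unfiltered_tsv=$(mktemp)
filtered_tsv=$(mktemp)
printf '# Constructed from biom file\n#OTU ID\tS1\tS2\n' > "$unfiltered_tsv"
printf 'featA\t5\t0\nfeatB\t0\t3\n' >> "$unfiltered_tsv"
printf '# Constructed from biom file\n#OTU ID\tS1\tS2\n' > "$filtered_tsv"

n_unfiltered=$(count_features "$unfiltered_tsv")
n_filtered=$(count_features "$filtered_tsv")

echo "unfiltered: $n_unfiltered features, filtered: $n_filtered features"
rm -f "$unfiltered_tsv" "$filtered_tsv"
```

If the count for the file you opened matches the unfiltered table, the export was most likely run on the wrong artifact.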

Make sure your barcodes, adapters, etc. are all removed, then run dada2 again; after that your filtering should work fine. Hope that clarifies the matter. Let us know if you run into any other issues.
