Thank you very much for your support @Nicholas_Bokulich @wasade. The samples are 16S Illumina reads, and I am running subsets to test the workflow first.
I used `qiime feature-table filter-features` to remove singletons and doubletons, and my sequence counts shrank drastically. Before applying this command to all my samples, I wanted to make sure my overall workflow looks right:
- I imported paired-end reads that were already merged and quality-filtered, in .fna format.
- Dereplication with `qiime vsearch dereplicate-sequences`, which produced a feature table and dereplicated sequences.
- I would then integrate `qiime vsearch cluster-features-open-reference` with q2-2017.11, which generates a clustered table and a clustered sequence file. A question here: could I, in theory, already run `taxa barplot` at this point, with SILVA's otu.qza as the taxonomy file?
- To remove the singletons and doubletons, I applied `qiime feature-table filter-features`. I then downloaded the per-feature frequency .csv for the filtered table, reformatted it as a .tsv, and used it as input for `qiime feature-table filter-seqs`, which produced a filtered sequence file. For both commands I used the clustered table and sequence file from the open-reference clustering; is that right? I did this instead of running Deblur.
- Using `qiime feature-classifier classify-consensus-vsearch`, I classified against 16Sonly_consensus_taxonomy_7_levels.qza from SILVA. As the reference reads I used the same otu.qza file as in the clustering step. My query was the dereplicated, open-reference-clustered, filtered sequence file, which by this point contained considerably fewer sequences.
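For the dereplication step listed above, the core idea is that identical reads collapse into a single feature whose count is tracked per sample. A minimal stdlib sketch of that idea, assuming reads are already merged and quality-filtered and given as hypothetical `(sample_id, sequence)` pairs; `dereplicate` is an illustrative name, and using SHA-1 digests as feature IDs is an assumption here, not a claim about the exact IDs q2-vsearch emits:

```python
import hashlib
from collections import defaultdict


def dereplicate(reads):
    """Collapse identical sequences into features (illustrative helper).

    reads: iterable of (sample_id, sequence) pairs.
    Returns (table, rep_seqs):
      table    -> {feature_id: {sample_id: count}}
      rep_seqs -> {feature_id: sequence}
    Feature IDs are SHA-1 hex digests of the sequence (an assumption).
    """
    table = defaultdict(lambda: defaultdict(int))
    rep_seqs = {}
    for sample_id, seq in reads:
        fid = hashlib.sha1(seq.encode("ascii")).hexdigest()
        rep_seqs[fid] = seq          # one representative per unique sequence
        table[fid][sample_id] += 1   # count occurrences per sample
    # Convert nested defaultdicts to plain dicts for easier inspection
    return {f: dict(c) for f, c in table.items()}, rep_seqs


reads = [
    ("S1", "ACGTACGT"),
    ("S1", "ACGTACGT"),
    ("S2", "ACGTACGT"),
    ("S2", "TTGCAACG"),
]
table, rep_seqs = dereplicate(reads)
for fid, counts in table.items():
    print(rep_seqs[fid], counts)
```

Four reads become two features, which is why the dereplicated table is much smaller than the raw read count while the total frequency is unchanged.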
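To illustrate what the open-reference step above does conceptually: each feature is first matched against the reference (here, the role otu.qza plays), and only features with no reference hit above the identity threshold are clustered de novo among themselves. This is a toy sketch under loud assumptions: `open_reference` and `identity` are hypothetical names, identity is a naive position-by-position match on equal-length sequences (vsearch uses real alignments), and "first hit above the threshold" stands in for vsearch's own hit-selection rules.

```python
def identity(a, b):
    """Toy identity: fraction of matching positions, equal-length seqs only."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference(features, reference, perc_identity=0.97):
    """Toy open-reference clustering (hypothetical helper).

    features:  {feature_id: sequence}, ideally ordered most abundant first.
    reference: {ref_id: sequence}.
    Returns {feature_id: cluster_id}; cluster_id is a ref_id for
    reference hits, or the ID of a new de novo centroid otherwise.
    """
    assignment = {}
    new_centroids = {}
    for fid, seq in features.items():
        # Closed-reference pass: first reference sequence above threshold
        hit = next((rid for rid, rseq in reference.items()
                    if identity(seq, rseq) >= perc_identity), None)
        if hit is None:
            # De novo pass: compare only against earlier non-reference centroids
            hit = next((cid for cid, cseq in new_centroids.items()
                        if identity(seq, cseq) >= perc_identity), None)
        if hit is None:
            # No match anywhere: this feature seeds a new cluster
            new_centroids[fid] = seq
            hit = fid
        assignment[fid] = hit
    return assignment


reference = {"R1": "AAAAAAAAAA"}
features = {"f1": "AAAAAAAAAT", "f2": "CCCCCCCCCC", "f3": "CCCCCCCCCG"}
print(open_reference(features, reference, perc_identity=0.85))
```

The point of the toy is that clusters keyed by reference IDs and newly seeded clusters end up in the same clustered table, which is why the clustered table and clustered sequences are the pair that later filtering steps operate on.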
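For the singleton/doubleton step above, `filter-features` with `--p-min-frequency` keeps features by their total frequency across all samples in the table, so removing singletons and doubletons means a minimum total of 3. The sketch below mirrors that logic plus the follow-up of keeping only sequences whose IDs survive, and writing those IDs as a tab-separated file like the one used as `filter-seqs` input. The function names are hypothetical, and the `feature-id` header is an assumption; check which ID headers your QIIME 2 release accepts for metadata files.

```python
def filter_low_frequency(table, min_frequency=3):
    """Keep features whose total count across all samples is >= min_frequency.

    With min_frequency=3, features seen once (singletons) or twice
    (doubletons) in the whole table are dropped. (Hypothetical helper.)
    """
    return {
        fid: counts
        for fid, counts in table.items()
        if sum(counts.values()) >= min_frequency
    }


def filter_seqs(rep_seqs, table):
    """Keep only sequences whose feature ID is still present in the table."""
    return {fid: seq for fid, seq in rep_seqs.items() if fid in table}


def ids_to_tsv(table):
    """Render retained IDs and their total frequency as a TSV body.

    The 'feature-id' header is an assumption; check your QIIME 2 release.
    """
    lines = ["feature-id\tfrequency"]
    for fid, counts in sorted(table.items()):
        lines.append(f"{fid}\t{sum(counts.values())}")
    return "\n".join(lines) + "\n"


table = {
    "f1": {"S1": 1},              # singleton
    "f2": {"S1": 1, "S2": 1},     # doubleton, split across two samples
    "f3": {"S1": 5, "S2": 2},
}
rep_seqs = {"f1": "AAAA", "f2": "CCCC", "f3": "GGGG"}

kept = filter_low_frequency(table)
print(sorted(kept))                 # ['f3']
print(filter_seqs(rep_seqs, kept))  # {'f3': 'GGGG'}
print(ids_to_tsv(kept), end="")
```

Because the threshold applies to the total across the whole table, a feature kept in a subset of samples is always kept in the full dataset, but a singleton in a test subset may not be a singleton once all samples are included. Counts from subset runs will therefore differ from the final run.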
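For the classification step above, `classify-consensus-vsearch` assigns each query a consensus of the taxonomies of its top reference hits, controlled by a minimum-consensus fraction (`--p-min-consensus`). The sketch below shows a simplified majority-consensus rule, assuming the hits for one query have already been found (the vsearch search itself is not reproduced) and that taxonomy strings are semicolon-delimited. `consensus_taxonomy` is a hypothetical name, and its details may differ from the plugin's implementation; the SILVA-style labels are illustrative only.

```python
from collections import Counter


def consensus_taxonomy(hit_taxa, min_consensus=0.51, unassigned="Unassigned"):
    """Return the deepest lineage shared by >= min_consensus of the hits.

    hit_taxa: taxonomy strings of one query's top hits (hypothetical helper).
    Walks rank by rank and stops at the first rank where the most common
    lineage prefix is carried by fewer than min_consensus of all hits.
    Assumes min_consensus > 0.5, so each rank's winner extends the last one.
    """
    if not hit_taxa:
        return unassigned
    split = [t.split(";") for t in hit_taxa]
    n = len(split)
    consensus = []
    for level in range(max(len(r) for r in split)):
        # Compare full lineage prefixes so identical names under different
        # parents do not count as agreement; shorter hits cast no vote here
        prefixes = [tuple(r[: level + 1]) for r in split if len(r) > level]
        best, count = Counter(prefixes).most_common(1)[0]
        if count / n < min_consensus:
            break
        consensus = list(best)
    return ";".join(consensus) if consensus else unassigned


hits = [
    "D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Lactobacillales",
    "D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Bacillales",
    "D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia",
]
print(consensus_taxonomy(hits))
```

Here two of three hits agree down to Bacilli, but the order-level labels split, so the query is classified only to class level.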
Next, I would use the output taxonomy and the filtered feature table to make a taxa barplot, correct? I can run this pipeline and check the resulting sequence/feature counts, but I wanted to verify the workflow beforehand to avoid making a naive mistake somewhere.
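A taxa barplot roughly shows, for each sample, the relative abundance of taxa at a chosen taxonomic level, which requires joining the feature table and the taxonomy on feature ID. A minimal sketch of that join-and-collapse, assuming plain dicts in place of the real artifacts; `collapse_relative` is a hypothetical name, and lumping features missing from the taxonomy into `Unassigned` is a choice made here for illustration:

```python
from collections import defaultdict


def collapse_relative(table, taxonomy, level, unassigned="Unassigned"):
    """Sum counts per taxon truncated to `level` ranks, then normalize per sample.

    table:    {feature_id: {sample_id: count}}
    taxonomy: {feature_id: "rank1;rank2;..."}; features without an entry
              are lumped into `unassigned` (illustrative choice).
    Returns {sample_id: {taxon: fraction}}. (Hypothetical helper.)
    """
    per_sample = defaultdict(lambda: defaultdict(int))
    for fid, counts in table.items():
        ranks = taxonomy.get(fid, unassigned).split(";")
        taxon = ";".join(ranks[:level])  # truncate lineage to the chosen level
        for sample_id, count in counts.items():
            per_sample[sample_id][taxon] += count
    result = {}
    for sample_id, taxa in per_sample.items():
        total = sum(taxa.values())
        # Skip division for empty samples instead of raising ZeroDivisionError
        result[sample_id] = {t: c / total for t, c in taxa.items()} if total else {}
    return result


table = {
    "f1": {"S1": 6, "S2": 1},
    "f2": {"S1": 2},
    "f3": {"S2": 3},
}
taxonomy = {
    "f1": "k__Bacteria;p__Firmicutes;c__Bacilli",
    "f2": "k__Bacteria;p__Firmicutes;c__Clostridia",
    "f3": "k__Bacteria;p__Proteobacteria",
}
print(collapse_relative(table, taxonomy, level=2))
```

The join only makes sense when the taxonomy was produced from the same representative sequences as the table's features, so the IDs line up; otherwise most features fall through to `Unassigned`.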