After filtering out low-frequency sequences and samples with low total reads, I want to cluster the features. However, I get this error:
Plugin error from vsearch:
Feature 4f35c3683b188fd809311fd3843ab3dd is present in sequences, but not in table. The set of features in sequences must be identical to the set of features in table.
I understand that one feature has been filtered out of the table but is still present in the sequences. However, I don't know how to fix this.
Can anybody suggest a command to 'synchronise' the two data sets? Is this a common step that needs to be performed? I have never had to do this before.
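As far as I can tell from the message, the check behind this error is a plain set comparison of feature IDs. A toy sketch of that comparison, using made-up IDs (featA, featB, featC) and temporary files rather than anything exported from my artifacts:

```shell
# Toy feature IDs, made up for illustration; not taken from real artifacts.
tmpdir=$(mktemp -d)
printf '%s\n' featA featB featC | sort > "$tmpdir/seq_ids.txt"    # IDs in the sequences
printf '%s\n' featA featC | sort > "$tmpdir/table_ids.txt"        # IDs left in the table

# comm -23 prints lines found only in the first file:
# features present in the sequences but missing from the table.
only_in_seqs=$(comm -23 "$tmpdir/seq_ids.txt" "$tmpdir/table_ids.txt")
echo "$only_in_seqs"
```

Any ID printed here plays the role of the feature named in the error: it is still in the sequences but no longer in the table.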
This is my workflow:
Step 3: Importing into QIIME2
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /Users/mvanhul/Documents/Registry/Raw_data \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path output/demux-paired-end.qza
To validate the output format, use this command:
qiime tools validate output/demux-paired-end.qza
To visualize the file, use this command:
qiime demux summarize \
  --i-data output/demux-paired-end.qza \
  --o-visualization output/demux-paired-end.qzv
qiime tools view output/demux-paired-end.qzv
###########################################################################################
Step 4: Removing primers
Use cutadapt to trim your primers from your sequences with the following command:
Specify the forward and reverse primer sequences
use 'adapter-removed.qza' instead of 'demux-paired-end.qza' if adapters were removed
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences output/demux-paired-end.qza \
  --p-cores 5 \
  --p-front-f GTGYCAGCMGCCGCGGTAA \
  --p-front-r GGACTACNVGGGTWTCTAAT \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --quiet \
  --o-trimmed-sequences output/primers-removed.qza \
  &> output/primer_trimming.log
qiime tools validate output/primers-removed.qza
###########################################################################################
Step 5: Checking the quality of the sequence after removing adapters/primers:
qiime demux summarize \
  --i-data output/primers-removed.qza \
  --o-visualization output/primers-removed.qzv
qiime tools view output/primers-removed.qzv
Press q to exit view mode
###########################################################################################
Step 6: Denoising
Use the DADA2 algorithm to denoise and merge your paired-end reads into Amplicon Sequence Variants (ASVs)
Based on the interactive Quality Plot:
- specify the number of bases to trim from the 5' end of the reverse/forward reads (can be 0)
- specify the maximum length to which the forward/reverse reads will be truncated
qiime dada2 denoise-paired \
  --p-n-threads 5 \
  --i-demultiplexed-seqs output/primers-removed.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 233 \
  --p-trunc-len-r 230 \
  --o-table output/table.qza \
  --o-representative-sequences output/rep-seqs.qza \
  --o-denoising-stats output/denoising-stats.qza
qiime tools validate output/table.qza
qiime tools validate output/rep-seqs.qza
qiime tools validate output/denoising-stats.qza
###########################################################################################
Step 7: Generate Feature Table and Feature Data summaries
qiime feature-table summarize \
  --i-table output/table.qza \
  --m-sample-metadata-file metadata.tsv \
  --o-visualization output/table-summary.qzv
qiime feature-table tabulate-seqs \
  --i-data output/rep-seqs.qza \
  --o-visualization output/rep-seqs.qzv
qiime tools view output/table-summary.qzv
qiime tools view output/rep-seqs.qzv
These tables will provide basic statistics such as the number of samples, the number of features, and the distribution of features across samples.
###########################################################################################
Step 8: Filter out sequences with low frequencies (total count)
specify the minimum frequency (i.e. the minimum number of times a feature must occur across all samples)
qiime feature-table filter-features \
  --i-table output/table.qza \
  --p-min-frequency 2 \
  --o-filtered-table output/table-filtered-seq.qza
qiime feature-table summarize \
  --i-table output/table-filtered-seq.qza \
  --o-visualization output/table-filtered-seq.qzv
qiime tools view output/table-filtered-seq.qzv
###########################################################################################
Step 9: Filter out bad samples (samples with low total reads)
specify the lowest number of reads a sample must have to be retained for further analysis
qiime feature-table filter-samples \
  --i-table output/table-filtered-seq.qza \
  --p-min-frequency 911 \
  --o-filtered-table output/filtered-table-samples.qza
qiime feature-table summarize \
  --i-table output/filtered-table-samples.qza \
  --o-visualization output/filtered-table-samples-summary.qzv
qiime tools view output/filtered-table-samples-summary.qzv
###########################################################################################
Step 10: Cluster features (i.e. sequences) based on their similarity (to reduce the computational burden and increase the accuracy of chimera detection)
The clustering is done de novo, which means that sequences are clustered based on their similarity without reference to any external database
specify the percent identity threshold for clustering sequences
specify the number of threads (i.e., CPU cores) to use for the clustering process
qiime vsearch cluster-features-de-novo \
  --i-table output/filtered-table-samples.qza \
  --i-sequences output/rep-seqs.qza \
  --p-perc-identity 0.99 \
  --p-threads 6 \
  --o-clustered-table output/table-filtered-seq-clustered.qza \
  --o-clustered-sequences output/rep-seqs-clustered.qza
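This is the command that fails: the table has been through Steps 8 and 9, while `output/rep-seqs.qza` still holds every sequence from DADA2. One possible way to bring the two in line, as a sketch I have not confirmed on this data set, is `qiime feature-table filter-seqs`, which keeps only the sequences whose IDs are still present in a given table. The output name `output/rep-seqs-filtered.qza` is a placeholder chosen for illustration, and the `command -v` guard is only there so the snippet does nothing harmful when the QIIME 2 environment is not active.

```shell
# Inputs come from Steps 6 and 9; the output file name is hypothetical.
TABLE=output/filtered-table-samples.qza
SEQS=output/rep-seqs.qza
SEQS_FILTERED=output/rep-seqs-filtered.qza

if command -v qiime >/dev/null 2>&1; then
  # Keep only the representative sequences whose feature IDs remain in the table.
  qiime feature-table filter-seqs \
    --i-data "$SEQS" \
    --i-table "$TABLE" \
    --o-filtered-data "$SEQS_FILTERED"
else
  echo "qiime not found on PATH; activate the QIIME 2 environment first" >&2
fi
```

If this works as assumed, the filtered file would be passed as `--i-sequences` to the clustering command in place of `output/rep-seqs.qza`.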
###########################################################################################
Step 11: Remove chimeras using the UCHIME algorithm (implemented in QIIME 2's vsearch plugin)
qiime vsearch uchime-denovo \
  --i-table output/table-filtered-seq-clustered.qza \
  --i-sequences output/rep-seqs-clustered.qza \
  --o-chimeras output/chimeras.qza \
  --o-nonchimeras output/rep-seqs-nonchimeric.qza \
  --o-stats output/chimeras-stats.qza
To visualize the chimeras:
qiime metadata tabulate \
  --m-input-file output/chimeras-stats.qza \
  --o-visualization output/chimeras-stats.qzv
To visualize the non-chimeric sequences:
The first command will generate a summary visualization of the filtered and clustered table, and the second command will generate a tabular visualization of the non-chimeric representative sequences
qiime feature-table summarize \
  --i-table output/table-filtered-seq-clustered.qza \
  --m-sample-metadata-file metadata.tsv \
  --o-visualization output/table-filtered-seq-clustered.qzv
qiime feature-table tabulate-seqs \
  --i-data output/rep-seqs-nonchimeric.qza \
  --o-visualization output/rep-seqs-nonchimeric.qzv
qiime tools view output/table-filtered-seq-clustered.qzv
qiime tools view output/rep-seqs-nonchimeric.qzv
qiime tools view output/chimeras-stats.qzv
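The table summarized in this step, `output/table-filtered-seq-clustered.qza`, still contains any features that uchime-denovo flagged as chimeric, because only the sequences are filtered. A possible sketch for restricting the table to the non-chimeric features, assuming `filter-features` accepts the non-chimeric sequences as feature metadata (the pattern shown in the QIIME 2 chimera-filtering tutorial); the output name `output/table-nonchimeric.qza` is hypothetical:

```shell
# Inputs come from Steps 10 and 11; the output file name is hypothetical.
CLUSTERED_TABLE=output/table-filtered-seq-clustered.qza
NONCHIMERAS=output/rep-seqs-nonchimeric.qza
TABLE_NONCHIMERIC=output/table-nonchimeric.qza

if command -v qiime >/dev/null 2>&1; then
  # Keep only the table features whose IDs appear among the non-chimeric sequences.
  qiime feature-table filter-features \
    --i-table "$CLUSTERED_TABLE" \
    --m-metadata-file "$NONCHIMERAS" \
    --o-filtered-table "$TABLE_NONCHIMERIC"
else
  echo "qiime not found on PATH; activate the QIIME 2 environment first" >&2
fi
```

With the table and sequences holding the same feature IDs again, the same mismatch should not reappear in later steps that take both as input.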
###########################################################################################
Step 12: Assign taxonomy to the representative sequences.
Here we are using a pre-trained classifier (SILVA 16S rRNA gene database) and the scikit-learn classifier:
qiime feature-classifier classify-sklearn \
  --i-classifier Classifier/silva138_AB_V4_classifier.qza \
  --i-reads output/rep-seqs-nonchimeric.qza \
  --o-classification output/taxonomy.qza
qiime metadata tabulate \
  --m-input-file output/taxonomy.qza \
  --o-visualization output/taxonomy.qzv
qiime tools view output/taxonomy.qzv