Synchronising features in sequences and tables

After filtering out sequences with low frequencies and samples with low total reads, I want to cluster the features. However, I get this error:

Plugin error from vsearch:

Feature 4f35c3683b188fd809311fd3843ab3dd is present in sequences, but not in table. The set of features in sequences must be identical to the set of features in table.

I understand that one feature has been filtered out of the table but is still present in the sequences. However, I don't know how to fix this.

Can anybody suggest a command to 'synchronise' both data sets? And is this a common step that needs to be performed? (I never had to do this before...)
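For reference, the check behind that error comes down to comparing two sets of feature IDs. Below is a minimal Python sketch of that comparison. It is a toy model, not QIIME 2's actual code; the function name `feature_set_mismatch` and the IDs f1, f2 and f3 are hypothetical.

```python
def feature_set_mismatch(table_ids, sequence_ids):
    """Return (only_in_sequences, only_in_table) as sorted lists.

    Both lists must be empty for the table and the sequences to be 'in sync'.
    """
    table_ids, sequence_ids = set(table_ids), set(sequence_ids)
    return sorted(sequence_ids - table_ids), sorted(table_ids - sequence_ids)


# Hypothetical IDs: "f3" was dropped from the table by filtering,
# but the sequences artifact still contains it.
table_ids = {"f1", "f2"}
sequence_ids = {"f1", "f2", "f3"}
only_in_seqs, only_in_table = feature_set_mismatch(table_ids, sequence_ids)
print(only_in_seqs, only_in_table)  # ['f3'] []
```

A non-empty first list corresponds to the situation in the error message: a feature is present in the sequences but not in the table.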

This is my workflow:

Step 3: Importing into QIIME2

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /Users/mvanhul/Documents/Registry/Raw_data \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path output/demux-paired-end.qza

To validate the output format, use this command:

qiime tools validate output/demux-paired-end.qza

To visualize the file, use this command:

qiime demux summarize \
  --i-data output/demux-paired-end.qza \
  --o-visualization output/demux-paired-end.qzv

qiime tools view output/demux-paired-end.qzv

###########################################################################################

Step 4: Removing primers

Use cutadapt to trim your primers from your sequences with the following command:

Specify the forward and reverse primer sequences

Use 'adapter-removed.qza' instead of 'demux-paired-end.qza' if adapters were removed.

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences output/demux-paired-end.qza \
  --p-cores 5 \
  --p-front-f GTGYCAGCMGCCGCGGTAA \
  --p-front-r GGACTACNVGGGTWTCTAAT \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --quiet \
  --o-trimmed-sequences output/primers-removed.qza \
  &> output/primer_trimming.log

qiime tools validate output/primers-removed.qza
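The two wildcard flags control whether IUPAC codes such as Y, M or N are treated as degenerate bases. Below is a simplified Python sketch of that idea. It assumes an exact match anchored at the very start of the read. Real cutadapt tolerates errors and indels, and a 5' adapter can also be found further into the read, so treat this only as an illustration. The primer's own wildcards are always expanded here, roughly like `--p-match-adapter-wildcards`. The `match_read_wildcards` argument loosely mirrors `--p-match-read-wildcards`. All helper names are hypothetical.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compatible(primer_base, read_base, match_read_wildcards):
    """True if read_base is acceptable where the primer has primer_base."""
    allowed = set(IUPAC[primer_base])  # primer wildcards are always expanded
    if match_read_wildcards:
        # Also expand wildcards in the read (e.g. an N in the read matches anything).
        return bool(allowed & set(IUPAC.get(read_base, read_base)))
    return read_base in allowed


def trim_front(read, primer, match_read_wildcards=True):
    """Toy 5' trim: return the read without the primer, or None if the primer
    is not found at the very start (exact match, no errors or indels)."""
    if len(read) < len(primer):
        return None
    if all(compatible(p, r, match_read_wildcards) for p, r in zip(primer, read)):
        return read[len(primer):]
    return None


primer_f = "GTGYCAGCMGCCGCGGTAA"
print(trim_front("GTGCCAGCAGCCGCGGTAATACG", primer_f))         # TACG
print(trim_front("GTGNCAGCAGCCGCGGTAATACG", primer_f))         # TACG
print(trim_front("GTGNCAGCAGCCGCGGTAATACG", primer_f, False))  # None
```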

###########################################################################################

Step 5: Checking the quality of the sequences after removing adapters/primers:

qiime demux summarize \
  --i-data output/primers-removed.qza \
  --o-visualization output/primers-removed.qzv

qiime tools view output/primers-removed.qzv

Press q to exit view mode

###########################################################################################

Step 6: Denoising

Use the DADA2 algorithm to denoise and merge your paired-end reads into Amplicon Sequence Variants (ASVs)

Based on the interactive quality plot:

- specify the number of bases to trim from the 5' end of the reverse/forward reads (can be 0)

- specify the maximum length to which the forward/reverse reads will be truncated

qiime dada2 denoise-paired \
  --p-n-threads 5 \
  --i-demultiplexed-seqs output/primers-removed.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 233 \
  --p-trunc-len-r 230 \
  --o-table output/table.qza \
  --o-representative-sequences output/rep-seqs.qza \
  --o-denoising-stats output/denoising-stats.qza

qiime tools validate output/table.qza
qiime tools validate output/rep-seqs.qza
qiime tools validate output/denoising-stats.qza
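To make the trim and truncation parameters concrete, here is a toy Python sketch of what they do to a single read. It is based on my reading of the q2-dada2 parameter descriptions: trim-left removes bases from the 5' end; trunc-len cuts the 3' end and discards reads shorter than that length; 0 disables truncation. It ignores DADA2's quality filtering and the merging of read pairs. `trim_and_truncate` is a hypothetical helper.

```python
def trim_and_truncate(read, trim_left, trunc_len):
    """Toy model of trim-left / trunc-len applied to one read.

    - trunc_len > 0: reads shorter than trunc_len are discarded (None);
      longer reads are cut after position trunc_len (3' end).
    - trunc_len == 0: no truncation or length filtering.
    - trim_left: number of bases removed from the 5' end.
    """
    if trunc_len > 0:
        if len(read) < trunc_len:
            return None
        read = read[:trunc_len]
    return read[trim_left:]


read = "ACGTACGTAC"  # 10 bases
print(trim_and_truncate(read, 2, 8))   # GTACGT
print(trim_and_truncate(read, 0, 12))  # None (read shorter than trunc_len)
print(trim_and_truncate(read, 0, 0))   # ACGTACGTAC
```

In this model, a surviving read ends up trunc_len minus trim_left bases long.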

###########################################################################################

Step 7: Generate Feature Table and Feature Data summaries

qiime feature-table summarize \
  --i-table output/table.qza \
  --m-sample-metadata-file metadata.tsv \
  --o-visualization output/table-summary.qzv

qiime feature-table tabulate-seqs \
  --i-data output/rep-seqs.qza \
  --o-visualization output/rep-seqs.qzv

qiime tools view output/table-summary.qzv
qiime tools view output/rep-seqs.qzv

These tables will provide basic statistics such as the number of samples, the number of features, and the distribution of features across samples.
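As a rough illustration of where those numbers come from, here is a toy Python sketch. It treats a feature table as a dict mapping each sample ID to its feature counts. That representation and the `summarize` helper are assumptions for illustration only; the real visualization reports much more.

```python
from statistics import median


def summarize(table):
    """table maps sample ID -> {feature ID: count}."""
    sample_totals = {sample: sum(counts.values()) for sample, counts in table.items()}
    features = {f for counts in table.values() for f, n in counts.items() if n > 0}
    return {
        "samples": len(table),
        "features": len(features),
        "total_frequency": sum(sample_totals.values()),
        "median_sample_frequency": median(sample_totals.values()),
    }


# Hypothetical toy table with three samples.
table = {"S1": {"f1": 5, "f2": 1}, "S2": {"f1": 3}, "S3": {"f2": 0, "f3": 10}}
print(summarize(table))
# {'samples': 3, 'features': 3, 'total_frequency': 19, 'median_sample_frequency': 6}
```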

###########################################################################################

Step 8: Filter out sequences with low frequencies (total count)

Specify the minimum frequency (i.e., the minimum total count of a feature summed across all samples).

qiime feature-table filter-features \
  --i-table output/table.qza \
  --p-min-frequency 2 \
  --o-filtered-table output/table-filtered-seq.qza

qiime feature-table summarize \
  --i-table output/table-filtered-seq.qza \
  --o-visualization output/table-filtered-seq.qzv

qiime tools view output/table-filtered-seq.qzv
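Here is a minimal Python sketch of what `--p-min-frequency` means in this step. It uses a hypothetical dict mapping each sample ID to its feature counts. Each feature's counts are summed over all samples, and the feature is dropped if the total is below the threshold. This toy version keeps samples even if they end up empty. `filter_features` is a hypothetical helper, not the plugin's code.

```python
def filter_features(table, min_frequency):
    """Drop features whose count summed over all samples is below min_frequency.

    table maps sample ID -> {feature ID: count}.
    """
    totals = {}
    for counts in table.values():
        for feature, n in counts.items():
            totals[feature] = totals.get(feature, 0) + n
    keep = {f for f, total in totals.items() if total >= min_frequency}
    return {s: {f: n for f, n in counts.items() if f in keep}
            for s, counts in table.items()}


table = {"S1": {"f1": 1, "f2": 5}, "S2": {"f1": 1, "f3": 1}}
print(filter_features(table, 2))  # {'S1': {'f1': 1, 'f2': 5}, 'S2': {'f1': 1}}
```

Note that f1 survives even though no single sample has more than one read of it, because the threshold applies to the total across samples.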

###########################################################################################

Step 9: Filter out bad samples (samples with low total reads)

Specify the lowest number of reads a sample must have to be retained for further analysis.

qiime feature-table filter-samples \
  --i-table output/table-filtered-seq.qza \
  --p-min-frequency 911 \
  --o-filtered-table output/filtered-table-samples.qza

qiime feature-table summarize \
  --i-table output/filtered-table-samples.qza \
  --o-visualization output/filtered-table-samples-summary.qzv

qiime tools view output/filtered-table-samples-summary.qzv
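This step is where the table and the sequences can drift apart. If a feature only occurs in samples that fall below the threshold, it disappears from the table, while rep-seqs.qza is not touched. As far as I know, filter-samples drops features that end up with zero counts everywhere by default. Below is a toy Python sketch of that effect. It assumes the table is a hypothetical dict of sample ID to feature counts; the helper names and IDs are made up.

```python
def filter_samples(table, min_frequency):
    """Drop samples whose total count is below min_frequency."""
    return {s: counts for s, counts in table.items()
            if sum(counts.values()) >= min_frequency}


def feature_ids(table):
    """Features that still have a non-zero count in at least one sample."""
    return {f for counts in table.values() for f, n in counts.items() if n > 0}


table = {"S1": {"f1": 900, "f2": 50}, "S2": {"f1": 10, "f3": 5}}
rep_seq_ids = {"f1", "f2", "f3"}  # the sequences are untouched by table filtering

filtered = filter_samples(table, 100)               # S2 (15 reads) is removed
print(sorted(feature_ids(filtered)))                # ['f1', 'f2']
print(sorted(rep_seq_ids - feature_ids(filtered)))  # ['f3']
```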

###########################################################################################

Step 10: Cluster features (i.e., sequences) based on their similarity (to reduce the computational burden and increase the accuracy of chimera detection)

The clustering is done de novo, which means that sequences are clustered based on their similarity without reference to any external database.

Specify the percent identity threshold for clustering sequences (as a fraction, e.g. 0.99 for 99%).

Specify the number of threads (i.e., CPU cores) to use for the clustering process.

qiime vsearch cluster-features-de-novo \
  --i-table output/filtered-table-samples.qza \
  --i-sequences output/rep-seqs.qza \
  --p-perc-identity 0.99 \
  --p-threads 6 \
  --o-clustered-table output/table-filtered-seq-clustered.qza \
  --o-clustered-sequences output/rep-seqs-clustered.qza
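For intuition, de novo clustering can be pictured as greedy, abundance-ordered centroid clustering. The sketch below is a heavy simplification, not vsearch's algorithm. Identity here is just the fraction of matching positions between equal-length sequences, whereas vsearch uses alignments. Each sequence joins the first existing centroid that passes the threshold, whereas vsearch uses its own heuristics to pick candidate centroids. Feature IDs, sequences, abundances and helper names are all hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; a toy stand-in that only
    handles equal-length sequences (vsearch aligns them instead)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_de_novo(abundances, sequences, perc_identity):
    """Greedy centroid clustering, most abundant features first.

    Returns a dict mapping each feature ID to the centroid ID it joined.
    """
    centroids, assignment = [], {}
    for fid in sorted(abundances, key=lambda f: (-abundances[f], f)):
        for c in centroids:
            if identity(sequences[fid], sequences[c]) >= perc_identity:
                assignment[fid] = c
                break
        else:  # no centroid was similar enough: start a new cluster
            centroids.append(fid)
            assignment[fid] = fid
    return assignment


abundances = {"f1": 100, "f2": 10, "f3": 5}
sequences = {"f1": "AAAAAAAAAA", "f2": "AAAAAAAAAT", "f3": "TTTTTTTTTT"}
print(cluster_de_novo(abundances, sequences, 0.85))  # {'f1': 'f1', 'f2': 'f1', 'f3': 'f3'}
print(cluster_de_novo(abundances, sequences, 0.99))  # {'f1': 'f1', 'f2': 'f2', 'f3': 'f3'}
```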

###########################################################################################

Step 11: Remove chimeras using the UCHIME algorithm (implemented in QIIME's vsearch plugin)

qiime vsearch uchime-denovo \
  --i-table output/table-filtered-seq-clustered.qza \
  --i-sequences output/rep-seqs-clustered.qza \
  --o-chimeras output/chimeras.qza \
  --o-nonchimeras output/rep-seqs-nonchimeric.qza \
  --o-stats output/chimeras-stats.qza

To visualize the chimeras:

qiime metadata tabulate \
  --m-input-file output/chimeras-stats.qza \
  --o-visualization output/chimeras-stats.qzv

To visualize the non-chimeric sequences:

The first command generates a summary visualization of the filtered and clustered table, and the second generates a tabular visualization of the non-chimeric representative sequences.

qiime feature-table summarize \
  --i-table output/table-filtered-seq-clustered.qza \
  --m-sample-metadata-file metadata.tsv \
  --o-visualization output/table-filtered-seq-clustered.qzv

qiime feature-table tabulate-seqs \
  --i-data output/rep-seqs-nonchimeric.qza \
  --o-visualization output/rep-seqs-nonchimeric.qzv

qiime tools view output/table-filtered-seq-clustered.qzv
qiime tools view output/rep-seqs-nonchimeric.qzv
qiime tools view output/chimeras-stats.qzv

###########################################################################################

Step 12: Assign taxonomy to the representative sequences.

Here we use a pre-trained scikit-learn classifier (trained on the SILVA 16S rRNA gene database):

qiime feature-classifier classify-sklearn \
  --i-classifier Classifier/silva138_AB_V4_classifier.qza \
  --i-reads output/rep-seqs-nonchimeric.qza \
  --o-classification output/taxonomy.qza

qiime metadata tabulate \
  --m-input-file output/taxonomy.qza \
  --o-visualization output/taxonomy.qzv

qiime tools view output/taxonomy.qzv

Hello!
This can be done in QIIME 2. Check out the documentation for filter-seqs (Filter features from sequences, QIIME 2 2023.5.1 documentation) and the Filtering data tutorial (QIIME 2 2023.5.1 documentation).

Cheers,
Valentyn

Hi,
thanks for helping.
I have checked the tutorial, but I'm still stuck.
I have tried different commands, but basically I'm always asked for an --i-taxonomy ARTIFACT, and I'm not at that stage yet.

This is what I have tried:

Option 1:
qiime taxa filter-seqs
--i-sequences output/rep-seqs.qza
--i-table output/filtered-table-samples.qza
--o-filtered-sequences output/filtered-seqs.qza

Option 2:
qiime feature-table filter-features
--i-data output/rep-seqs.qza
--m-metadata-file output/filtered-table-samples.qza
--o-filtered-table output/filtered-seqs.qza

Perhaps this is a good moment to mention that I am new at this and that I am still confused by the different input tables/data/sequences...

Hi @dr_hulk,

This sounds to me like what you want. It filters your sequences (your rep-seqs) down to your table (which you filtered for low-frequency features). What did this command produce, and why is it not what you expected?

Hi Chloe,

I get this error message:

There was a problem with the command:
(1/1?) No such option: --i-data
zsh: command not found: --o-filtered-table

If I change --i-data to --i-table, as suggested by the examples, I get this error:

There was a problem with the command:
(1/1) Invalid value for '--i-table': Expected an artifact of at least type
FeatureTable[Frequency]. An artifact of type FeatureData[Sequence] was
provided.

Hi @dr_hulk,
You will actually want to use this command: qiime feature-table filter-seqs
Here are the docs.

Hi Chloe,

Thanks for the help. I really appreciate it. I'm still learning how to get things done with QIIME.

I changed the command and I got this message:

There was a problem with the command:

(1/1?) No such option: --o-filtered-table Did you mean --o-filtered-data?

So I changed 'table' to 'data' (see below)

qiime feature-table filter-seqs
--i-data output/rep-seqs.qza
--m-metadata-file output/filtered-table-samples.qza
--o-filtered-data output/filtered-seqs.qza

And I got this error:

Plugin error from feature-table:

All features were filtered out of the data.

Debug info has been saved

Did I just delete everything??

Hi @dr_hulk,

Don't worry! QIIME 2 doesn't edit your input file! Nothing is deleted; however, it seems that when you filtered your sequences down to your filtered table, there were no features left to keep.

This makes me wonder whether something went wrong in the feature-table filtering step! Will you send me a .qza of this table so I can look at it?

filtered-table-samples.qzv (455.5 KB)
filtered-table-samples.qza (65.2 KB)
table-filtered-seq.qzv (450.0 KB)
table-filtered-seq.qza (59.7 KB)

Hi Chloe,

Again, thank you for your help.

My filtered table (filtered-table-samples.qza) contains 865 features, whereas my sequences (rep-seqs.qza) contain 866.

I understand that I must remove the sequence that is not present in the filtered table from the sequences, but I can't figure out how...

Hi @dr_hulk,
Thanks for the data! I thought your table might have been filtered down to nothing, but that doesn't seem to be the case. The only other thing I can think of is that the features in your rep-seqs and your table don't match. Can you send me your rep-seqs file?

Also, what is the difference between those two tables?

Hope this helps!
:turtle:

rep-seqs.qzv (324.3 KB)
rep-seqs.qza (70.5 KB)

After dada2 denoising I obtain two files:

  • the representative sequences (rep-seqs.qza)
  • the table of features.

Both have the same number of features/sequences, but then I filter out features with low frequency and samples with low reads, and I obtain a filtered table (filtered-table-samples.qza) with one feature fewer than rep-seqs.qza.

This is where the mismatch happens...

This is the workflow

###########################################################################################

Step 6: Denoising

Use the DADA2 algorithm to denoise and merge your paired-end reads into Amplicon Sequence Variants (ASVs)

Based on the interactive quality plot:

- specify the number of bases to trim from the 5' end of the reverse/forward reads (can be 0)

- specify the maximum length to which the forward/reverse reads will be truncated

qiime dada2 denoise-paired \
  --p-n-threads 5 \
  --i-demultiplexed-seqs output/primers-removed.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 233 \
  --p-trunc-len-r 230 \
  --o-table output/table.qza \
  --o-representative-sequences output/rep-seqs.qza \
  --o-denoising-stats output/denoising-stats.qza

qiime tools validate output/table.qza
qiime tools validate output/rep-seqs.qza
qiime tools validate output/denoising-stats.qza

###########################################################################################

Step 7: Generate Feature Table and Feature Data summaries

qiime feature-table summarize \
  --i-table output/table.qza \
  --m-sample-metadata-file metadata.tsv \
  --o-visualization output/table-summary.qzv

qiime feature-table tabulate-seqs \
  --i-data output/rep-seqs.qza \
  --o-visualization output/rep-seqs.qzv

qiime tools view output/table-summary.qzv
qiime tools view output/rep-seqs.qzv

These tables will provide basic statistics such as the number of samples, the number of features, and the distribution of features across samples.

###########################################################################################

Step 8: Filter out sequences with low frequencies (total count)

Specify the minimum frequency (i.e., the minimum total count of a feature summed across all samples).

qiime feature-table filter-features \
  --i-table output/table.qza \
  --p-min-frequency 2 \
  --o-filtered-table output/table-filtered-seq.qza

qiime feature-table summarize \
  --i-table output/table-filtered-seq.qza \
  --o-visualization output/table-filtered-seq.qzv

qiime tools view output/table-filtered-seq.qzv

###########################################################################################

Step 9: Filter out bad samples (samples with low total reads)

Specify the lowest number of reads a sample must have to be retained for further analysis.

qiime feature-table filter-samples \
  --i-table output/table-filtered-seq.qza \
  --p-min-frequency 911 \
  --o-filtered-table output/filtered-table-samples.qza

qiime feature-table summarize \
  --i-table output/filtered-table-samples.qza \
  --o-visualization output/filtered-table-samples.qzv

qiime tools view output/filtered-table-samples.qzv

###########################################################################################

Step 10: Filter out the features table based on the filtering of the samples

The set of features in sequences must be identical to the set of features in table.

Hi @dr_hulk,
I have figured it out. Instead of passing your table as a metadata input (--m-metadata-file), it should be passed as a table input (--i-table).

The command should look like this:
qiime feature-table filter-seqs --i-data rep-seqs.qza --i-table table-filtered-seq.qza --o-filtered-data filtered-seqs.qza
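Here is a toy Python sketch of why the input type matters. It assumes (my reading, not confirmed in the thread) that a table passed via --m-metadata-file is read with sample IDs as its rows. filter-seqs with --i-table keeps the sequences whose IDs are feature IDs in the table, whereas matching against sample IDs keeps nothing. The helper and the IDs are hypothetical. Since vsearch requires identical feature sets, the table given to filter-seqs should presumably be the same table later given to cluster-features-de-novo.

```python
def filter_seqs(sequences, table):
    """Keep only sequences whose feature ID occurs in the table (toy model).

    table maps sample ID -> {feature ID: count}.
    """
    table_features = {f for counts in table.values() for f, n in counts.items() if n > 0}
    return {fid: seq for fid, seq in sequences.items() if fid in table_features}


table = {"S1": {"f1": 5, "f2": 2}, "S2": {"f1": 1}}
sequences = {"f1": "ACGT", "f2": "ACGA", "f3": "TTGA"}

print(sorted(filter_seqs(sequences, table)))  # ['f1', 'f2']

# Treating the table as metadata instead: its row IDs are sample IDs,
# which never match feature IDs, so no sequence survives.
sample_ids = set(table)
print({fid for fid in sequences if fid in sample_ids})  # set()
```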

Hope that helps!
:turtle:


It worked!

Thanks Chloe!

