Missing features in rep.seq.qzv + exported fasta file

Hi! Hoping for some thoughts on why I might have missing features. After running the QIIME 2 DADA2 wrapper, I have a final ASV table and a final taxonomy table. The ASV and taxonomy .qza files have the same features (when exported and compared). However, when I examined the rep-seqs .qzv file and the associated sequences.fasta file that QIIME 2 produces for the representative sequences of the ASVs in the table, some features are missing from both files despite being present in the original input table and taxonomy. I know of 4, since I was specifically looking for them. Any ideas why that might have happened?

Here are the commands I used to make the rep.seqs.qza and the .fasta output files:
qiime dada2 denoise-paired --i-demultiplexed-seqs demux_short.qza --p-trim-left-f 0 --p-trunc-len-f 240 --p-trim-left-r 0 --p-trunc-len-r 240 --p-chimera-method consensus --o-representative-sequences rep-seqs-short.qza --o-table table-short.qza --p-n-threads 16 --o-denoising-stats denoising-stats-dada2-short.qza

qiime tools export rep-seqs-short.qzv --output-dir short_tabulate_output

I just find this so strange and can't seem to put my finger on why that might happen.

Hello @reige012,

What taxonomy files are you referring to? DADA2 does not assign taxonomy to features, and I don't see any files named with "taxonomy" in your commands here, so I'm a little confused.

Hi @colinvwood,
Thanks for following up. The way I wrote that was confusing. I know that QIIME 2 assigns taxonomy in a separate step following DADA2. I just meant that the output of that step is a taxonomy file that DOES have the feature IDs I'm interested in. So the table (produced by the DADA2 wrapper) and the taxonomy table from the subsequent QIIME 2 step both have the same feature IDs, but the rep-seqs file from the same DADA2 command and the .fasta file from the export command do not contain those 4 feature IDs. It's very strange.

Here is the exact set of commands (in order - including the same as in the previous message just to clarify things a little bit):
qiime tools import --type EMPPairedEndSequences --input-path /project/reads/ --output-path emp-paired-end-sequences.qza

qiime demux emp-paired --i-seqs emp-paired-end-sequences.qza --m-barcodes-file Ch1_Short_Metadata.txt --m-barcodes-column BarcodeSequence --o-per-sample-sequences demux_short.qza

qiime dada2 denoise-paired --i-demultiplexed-seqs demux_short.qza --p-trim-left-f 0 --p-trunc-len-f 240 --p-trim-left-r 0 --p-trunc-len-r 240 --p-chimera-method consensus --o-representative-sequences rep-seqs-short.qza --o-table table-short.qza --p-n-threads 16 --o-denoising-stats denoising-stats-dada2-short.qza

qiime feature-table summarize --i-table table-short.qza --o-visualization table-short.qzv --m-sample-metadata-file Ch1_Short_Metadata.txt

qiime feature-classifier classify-sklearn --i-classifier /project/Qiime/silva-132-99-nb-classifier.qza --i-reads rep-seqs-short.qza --o-classification taxonomy.short.silva.qza

qiime tools export taxonomy.short.silva.qza --output-dir short_feature_taxonomy_out

qiime feature-table tabulate-seqs --i-data rep-seqs-short.qza --o-visualization rep-seqs-short.qzv

qiime tools export rep-seqs-short.qzv --output-dir short_tabulate_output

So the features can be found in these files: table-short.qzv and the taxonomy file produced by exporting this file taxonomy.short.silva.qza. They cannot be found in rep-seqs-short.qzv or the .fasta file produced by exporting the rep-seqs-short.qzv file.

I hope this clarifies the questions/issue a little bit.

@reige012, at this step it looks like you're exporting a .qzv file:

qiime tools export rep-seqs-short.qzv --output-dir short_tabulate_output

You should export the rep-seqs-short.qza file instead. Was that a typo in the commands that you shared? If not, could you see if the issue persists if you export rep-seqs-short.qza? If you're still experiencing issues it would be helpful to have your table-short.qza and rep-seqs-short.qza files. Would you be willing to share those so we can try to reproduce the issue you're having?
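
As a quick way to confirm which IDs are actually present, the sequence IDs can be read straight out of the .qza, since a QIIME 2 artifact is a zip archive. The following is a minimal Python sketch, not an official QIIME 2 tool. It assumes the sequences are stored at <artifact-uuid>/data/dna-sequences.fasta inside the archive, and the function names fasta_ids_in_qza and missing_ids are hypothetical helpers made up for this example.

```python
import zipfile


def fasta_ids_in_qza(qza_path):
    """Return the set of sequence IDs found in a sequences .qza.

    Assumption: the archive contains <artifact-uuid>/data/dna-sequences.fasta.
    """
    ids = set()
    with zipfile.ZipFile(qza_path) as archive:
        fasta_members = [name for name in archive.namelist()
                         if name.endswith("/data/dna-sequences.fasta")]
        if not fasta_members:
            raise ValueError(f"no data/dna-sequences.fasta in {qza_path}")
        with archive.open(fasta_members[0]) as handle:
            for raw_line in handle:
                line = raw_line.decode("utf-8").strip()
                if line.startswith(">"):
                    # The ID is the first whitespace-delimited token of the header.
                    tokens = line[1:].split()
                    if tokens:
                        ids.add(tokens[0])
    return ids


def missing_ids(expected_ids, qza_path):
    """Return the expected IDs that are absent from the sequences artifact."""
    return sorted(set(expected_ids) - fasta_ids_in_qza(qza_path))
```

Calling missing_ids(ids_of_interest, "rep-seqs-short.qza") with a list of feature IDs taken from the table would return the ones that have no sequence in the artifact. An empty list means every ID was found.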

Hi @gregcaporaso, thanks for following up on this. That wasn't a typo, but I did try exporting just the .qza file instead and I have the same issue. Those few (that I know of) features aren't there, but the others are.

In any case, I've attached the .qza version of both the table and the rep-seqs files. Let me know if anything comes up.

Also, in case it's helpful, these are the feature IDs (that I know of) that are in the table, but not in the rep-seqs: 3491c41b3491b3a9cb78ab9370a70b06, d9a6faa3f2e71a181f364609ae60b60b, c0e9d22af9f7d8131093bce9b3567f04, 1533c97d77541e59164429fba8edc02c, 8e44d6063b90610ef91a03a7bd761804, d256fed1263c92137e17d9dfa86d4d9e.

Thanks so much!
rep-seqs-short.qza (1.6 MB)
table-short.qza (546.4 KB)

Hi @reige012,
From looking at the data provenance I can see that the feature table and sequences that you shared were generated from two different runs of qiime dada2 denoise-paired. In the following two diagrams, look at the execution uuid toward the top right. You can also see that different truncation parameters were used in the two different runs.

An execution uuid (universally unique identifier) is assigned every time you run a command (in this case qiime dada2 denoise-paired), so you can compare those across provenance graphs to confirm that two results were generated from the same run of a command. In this case the uuids differ, so I know that they weren't.
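
The same comparison can be scripted if you have many files to check. This is a rough Python sketch rather than an official QIIME 2 API. It assumes the action record for the result itself is stored at <artifact-uuid>/provenance/action/action.yaml inside the archive and contains an execution: block with a uuid: entry. It uses a regular expression instead of a YAML parser because provenance files can contain custom YAML tags. The names execution_uuid and same_run are hypothetical, chosen for this example.

```python
import re
import zipfile

# Matches the uuid entry nested under the top-level "execution:" key.
_EXECUTION_UUID = re.compile(
    r"^execution:\n(?:[ \t]+.*\n)*?[ \t]+uuid:[ \t]*(\S+)", re.MULTILINE)


def execution_uuid(archive_path):
    """Return the execution uuid recorded for a .qza or .qzv file."""
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            # Only the result's own action record, not those of its ancestors
            # (assumed to live under provenance/artifacts/<uuid>/action/).
            if name.split("/")[1:] == ["provenance", "action", "action.yaml"]:
                text = archive.read(name).decode("utf-8")
                break
        else:
            raise ValueError(f"no provenance/action/action.yaml in {archive_path}")
    match = _EXECUTION_UUID.search(text)
    if match is None:
        raise ValueError(f"no execution uuid found in {archive_path}")
    return match.group(1).strip("'\"")


def same_run(path_a, path_b):
    """True if both results were produced by the same execution of a command."""
    return execution_uuid(path_a) == execution_uuid(path_b)
```

For the two files in this thread, same_run("table-short.qza", "rep-seqs-short.qza") would be expected to return False, matching what the provenance graphs show.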

If you're not familiar with viewing QIIME 2 data provenance, you can see the graphs I shared above by loading a .qza or .qzv at QIIME 2 View and then clicking the provenance tab toward the top right of the page. You can load different files there to identify which sequence data was generated from that run of DADA2, and then you should have all of the relevant feature ids.
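
If you would rather check from a script than from QIIME 2 View, the parameters recorded in provenance (for example the truncation lengths) can also be read from the archive. This is only a sketch under assumptions: it expects the result's action record at <artifact-uuid>/provenance/action/action.yaml, with a parameters: key followed by one "- name: value" line per parameter, and it keeps values as raw strings. The helper names recorded_parameters and differing_parameters are hypothetical.

```python
import zipfile


def recorded_parameters(archive_path):
    """Return {parameter_name: raw_value_string} for a .qza or .qzv file."""
    with zipfile.ZipFile(archive_path) as archive:
        root_actions = [
            name for name in archive.namelist()
            if name.split("/")[1:] == ["provenance", "action", "action.yaml"]
        ]
        if not root_actions:
            raise ValueError(f"no provenance/action/action.yaml in {archive_path}")
        lines = archive.read(root_actions[0]).decode("utf-8").splitlines()

    params = {}
    in_block = False
    for line in lines:
        stripped = line.strip()
        if stripped == "parameters:":
            in_block = True
            continue
        if in_block:
            if not stripped.startswith("-"):
                break  # first non-list line ends the parameters block
            key, sep, value = stripped.lstrip("- ").partition(":")
            if sep:
                params[key.strip()] = value.strip()
    return params


def differing_parameters(path_a, path_b):
    """Return {name: (value_in_a, value_in_b)} for parameters that differ."""
    a = recorded_parameters(path_a)
    b = recorded_parameters(path_b)
    return {name: (a.get(name), b.get(name))
            for name in sorted(set(a) | set(b))
            if a.get(name) != b.get(name)}
```

An empty result from differing_parameters means the two files record identical parameters. That alone does not prove they came from the same run, so the execution uuid remains the thing to compare.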

Let me know if you have other questions. Good luck!
