I am using QIIME 2 with the BOLD database for my sequenced COI amplicons. I was not able to detect a spiked species in one of my samples with dada2, although I can detect it with a BLAST search or by mapping my trimmed reads. Now I would like to inspect the ASVs identified by dada2, but I could not find a way to extract a FASTA of representative sequences for each sample. Instead, I can only get a combined one for all samples, which does not tell me which sequence comes from which sample.
Is there any way to see which sequences were found in which sample? I already tried running dada2 on a single sample, but it gives me an error.
Thank you in advance for your help!
Welcome to the forum @Josephine!
The feature table is what maps the individual sequences to each sample. It sounds like what you probably want to do is use
qiime feature-table filter-seqs to grab the sequences found in that one sample.
Let me know if that’s what you are after!
Thank you for your fast reply!
I’m not sure that is what I asked for. I would like to get a FASTA file with the sequences for each sample instead of a feature table. With this command, the identifier in the FASTA header does not tell me which sample the sequence is derived from:
qiime feature-table tabulate-seqs \
--i-data $analysisdir/rep-seqs-dada2.qza \
The equivalent in QIIME 1 was a FASTA file of representative sequences (pick_rep_set.py).
Thanks again for your help!
Do you want a FASTA for each individual sample? The command I listed above lets you grab all sequences found in a single sample, e.g., the spike-in sample you used. Run it, and every sequence in the output is, implicitly, found in that sample. Since it sounds like you are looking for the sequences in that specific sample, this should be a good solution. It would be cumbersome if you want to see which sequences are found in a whole set of samples, but in principle you could loop over the command to pull out the sequences belonging to each of several samples.
Well… I’m not sure that’s what you are looking for either. In QIIME 1 the rep_set FASTA file did contain sample IDs in the header, but a rep seq could be found in any number of samples and only one of them was listed in the header, so it was not a reliable way to determine which sequences are found in a given sample.
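If the goal is literally one FASTA per sample, as in the original question, one option outside QIIME 2 is to export the feature table and the representative sequences and join them yourself. The sketch below assumes the table was exported with `qiime tools export` and converted with `biom convert --to-tsv` into a TSV whose header line starts with `#OTU ID`, and that the sequences were exported to a FASTA whose headers are the feature IDs. The function name `per_sample_fasta` and the header format `>featureID sample=ID count=N` are hypothetical choices for illustration. Unlike the QIIME 1 rep_set headers, an ASV shared by several samples is written to the file of every sample that contains it.

```shell
# Hypothetical helper (not part of QIIME 2): write one FASTA per sample.
#   $1 = feature table as TSV (biom convert --to-tsv output, assumed layout)
#   $2 = representative sequences FASTA (headers assumed to be feature IDs)
#   $3 = output directory; files are named <sample>.fasta
per_sample_fasta() {
  table_tsv=$1
  seqs_fasta=$2
  outdir=$3
  mkdir -p "$outdir"
  awk -F '\t' -v outdir="$outdir" '
    # Pass 1 (the FASTA): remember the sequence for each feature ID.
    FNR == NR {
      if (substr($0, 1, 1) == ">") {
        id = substr($0, 2)
        sub(/[ \t].*/, "", id)          # drop any description after the ID
      } else {
        seq[id] = seq[id] $0            # join wrapped sequence lines
      }
      next
    }
    # Pass 2 (the table): skip the BIOM comment line.
    /^# Constructed/ { next }
    # Header row: remember which column belongs to which sample.
    /^#OTU ID/ { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
    # Data rows: write the feature to every sample with a nonzero count.
    {
      for (i = 2; i <= n; i++)
        if ($i > 0) {
          f = outdir "/" name[i] ".fasta"
          printf ">%s sample=%s count=%s\n%s\n", $1, name[i], $i, seq[$1] > f
        }
    }
  ' "$seqs_fasta" "$table_tsv"
}
```

For example, `per_sample_fasta table.tsv dna-sequences.fasta per-sample` would write `per-sample/1L.fasta`, `per-sample/2L.fasta`, and so on. Some awk implementations limit how many output files can be open at once, which could matter with very many samples.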
Thank you! This is what I was looking for. However, so far I have not succeeded with
qiime feature-table filter-seqs. Could you give me an example? I already tried
--m-metadata-file, which gave me the error "All features were filtered out of the data.", and
--p-exclude-ids 1L,2L, which gave me a value error (ValueError: Could not coerce value based on expression provided).
My samples are named 1S, 2S, 3S, 4S, 5S, 6S, 7S, 8S, 9S, 10S, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L.
Hi @Josephine! @Nicholas_Bokulich is out of the office right now, so I can lend a hand. You’re very close, but a few things need to be adjusted. I would suggest checking out the Filtering tutorial for more help.
First, transpose the table (this flips the sample and feature axes, which is necessary for the next step). Then, use the transposed table as feature metadata, and keep only the features found in samples 1L or 2L:
qiime feature-table transpose \
--i-table table.qza \
--o-transposed-feature-table transposed-table.qza
qiime feature-table filter-seqs \
--i-data rep-seqs.qza \
--m-metadata-file transposed-table.qza \
--p-where '[1L] > 0 OR [2L] > 0' \
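In this `--p-where` expression each sample ID is a column of the transposed table, and the square brackets are what allow column names such as `1L` that start with a digit. Filtering against the sample metadata file likely removed everything because that file is indexed by sample IDs rather than feature IDs, and `--p-exclude-ids` appears to be a true/false switch for inverting the filter rather than a list of IDs, which would explain the coercion error. For more than two samples, the clause can be generated rather than typed. This is a hypothetical helper (`where_any_sample` is not part of QIIME 2), sketched under the assumption that sample IDs contain no `]` characters:

```shell
# Hypothetical helper: build a --p-where clause that keeps features with a
# nonzero count in ANY of the given samples, e.g. "[1L] > 0 OR [2L] > 0".
where_any_sample() {
  clause=""
  for s in "$@"; do
    # Join the per-sample conditions with OR.
    if [ -n "$clause" ]; then clause="$clause OR "; fi
    # Brackets quote column names that start with a digit.
    clause="${clause}[${s}] > 0"
  done
  printf '%s\n' "$clause"
}
```

It could then be passed to filter-seqs as `--p-where "$(where_any_sample 1L 2L 3L)"`.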
An alternative (visual) approach, which filters your table to just the two samples in question and labels the features with their nucleotide sequences:
qiime feature-table filter-samples \
--i-table table.qza \
--m-metadata-file sample-metadata.txt \
--p-where "id IN ('1L', '2L')" \
--o-filtered-table table-1L-2L.qza
qiime feature-table heatmap \
--i-table table-1L-2L.qza \
--m-feature-metadata-file rep-seqs.qza \
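When more than two samples are involved, the `id IN (...)` list for filter-samples can be built from the sample names rather than typed by hand. This is a hypothetical helper (`where_sample_in` is not a QIIME 2 command), assuming, as in the command above, that the ID column of sample-metadata.txt is referred to as `id` and that the sample IDs contain no single quotes:

```shell
# Hypothetical helper: build an "id IN ('1L', '2L')" style clause for
# qiime feature-table filter-samples from a list of sample IDs.
where_sample_in() {
  list=""
  for s in "$@"; do
    # Separate quoted IDs with ", ".
    if [ -n "$list" ]; then list="$list, "; fi
    list="${list}'${s}'"
  done
  printf "id IN (%s)\n" "$list"
}
```

It could be used as `--p-where "$(where_sample_in 1L 2L 5S)"`.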
Hope that helps!
It worked! Thank you very much for your help!