List of removed sequences after DADA2

Is it possible to get a list of the names of the sequences removed by the denoising step (DADA2)?

Hi @Joanna,
The q2-dada2 plugin does not retain filtered reads, unlike the native version of dada2 in R, which is more flexible and lets you keep those reads. The idea of including this in q2-dada2 has been tossed around here before, though. Personally I think it’s a good idea too, but I’m not sure if it’s a high priority on the devs’ to-do list.

Thank you @Mehrbod_Estaki.
I guess the same applies to getting a list of the names of the remaining sequences, am I right?

Well, the remaining sequences are actually what you would find in your dada2 output, both in your feature-table.qza and rep-seqs.qza.

I am sorry, I wasn’t clear enough. Unfortunately (for me in this case), identical sequences have been merged after the DADA2 step, and the names are not the same as in the input. I would like to (somehow) get the list of the initial names.

Sorry, I’m not sure I’m following, @Joanna.
When you say the same sequences have been merged, do you mean that feature tables have been merged, or just rep-seqs? Can you provide more details on how and with what commands this was done?
When feature tables or rep-seqs are merged, the original names, which are the hashed IDs, do not change. In other words, the names would be the same as in the input.
If I’m off target :dart: here, please provide a bit more detail!

Probably, I shouldn’t use the word “merge”. Let me try again.

As far as I can see, in the DADA2 input (I extracted the contents of the .qza with qiime tools extract --input-path demux.qza --output-path fasta_demux), each sequence stands on its own and the names look like: @HISEQ:531:H7VF7BCXY:2:1101:13362:2196 1:N:0:GAGATTCCCAGGACGT

After DADA2, I get three types of files: representative-sequences, table and denoising-stats.
The representative-sequences file contains sequences with different names from the input; the names now look like: fb4495f8daa8ff7399518c0b21963596
I guess this is because identical sequences (ASVs) are grouped under one (new) name, and the frequencies of these sequences are shown in the table.qzv file under “feature detail”.
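
For context, an ID like the one above is 32 hexadecimal characters long, which is what an MD5 digest looks like. As far as I know, q2-dada2 by default names each feature with the MD5 hash of its sequence (this can be switched off with the --p-no-hashed-feature-ids option). Here is a minimal Python sketch of that idea; the function name and the example sequence are hypothetical and only for illustration:

```python
import hashlib


def feature_id(sequence: str) -> str:
    """Return the MD5 hex digest of a sequence string.

    Assumption: this mirrors how hashed feature IDs are derived
    from ASV sequences; it is a sketch, not the plugin's own code.
    """
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


# Hypothetical ASV sequence, made up for this example.
asv = "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAG"
print(feature_id(asv))  # a 32-character hexadecimal string
```

The same sequence always gives the same ID, so the ID identifies the ASV sequence itself and carries no information about the individual reads that contributed to it.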

So, my problem is that the names are not the same before and after DADA2, and I would like to retrieve the initial names of the remaining sequences.
I hope I have explained myself well. Please let me know if anything is unclear, and thank you so much for your willingness to help me out :blush:

Hi @Joanna,
Indeed, the names change because the sequences are dereplicated and denoised, so a read might not even have the same sequence as the ASV it ends up in! Such is the stuff of dada2. What you are looking for is an “OTU map” (using antiquated parlance; “ASV map” may be more appropriate). This is not available in q2-dada2, and I am not sure whether it is available in the dada2 R package.
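
In the meantime, a rough approximation of such a map can be built outside QIIME 2 by exact matching. The sketch below is only that: it assumes single-end reads, that you have exported the input reads as FASTQ and the representative sequences as plain strings, and that feature IDs are MD5 hashes of the ASV sequences. The names parse_fastq, approximate_asv_map, trim_left and trunc_len are hypothetical helpers written for this example, not part of any plugin. Because DADA2 corrects errors, a read that does not match exactly may still have been assigned to an ASV, so the “unmatched” list is not the same thing as the list of removed reads.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, sequence) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between or after records
        seq = next(it, "").strip()
        next(it, "")  # '+' separator line
        next(it, "")  # quality line
        yield header.strip()[1:], seq  # drop the leading '@'


def approximate_asv_map(
    reads: Iterable[Tuple[str, str]],
    asv_seqs: Iterable[str],
    trim_left: int = 0,
    trunc_len: int = 0,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Map hashed ASV IDs to the names of reads that match exactly.

    Reads are trimmed/truncated first (hypothetical parameters that
    should match whatever was used for denoising). Returns the map
    and the names of reads with no exact match.
    """
    id_by_seq = {s: hashlib.md5(s.encode("utf-8")).hexdigest() for s in asv_seqs}
    mapping: Dict[str, List[str]] = defaultdict(list)
    unmatched: List[str] = []
    for name, seq in reads:
        seq = seq[trim_left:trunc_len] if trunc_len else seq[trim_left:]
        fid = id_by_seq.get(seq)
        if fid is None:
            unmatched.append(name)
        else:
            mapping[fid].append(name)
    return dict(mapping), unmatched


# Toy data, made up for this example.
fastq = """@read1 1:N:0:AAAA
ACGTACGT
+
IIIIIIII
@read2 1:N:0:AAAA
ACGTACGA
+
IIIIIIII
@read3 1:N:0:AAAA
ACGTACGT
+
IIIIIIII
""".splitlines()

asv_map, unmatched = approximate_asv_map(parse_fastq(fastq), ["ACGTACGT"])
for fid, names in asv_map.items():
    print(fid, names)
print("no exact match:", unmatched)
```

With paired-end data the representative sequences are merged reads, so this kind of exact matching against the forward reads alone would not work as written.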

One day such OTU/ASV maps might be available for some QIIME 2 plugins, but we need to add support for this data type. See also this related issue for OTU maps:

Thank you so much @Nicholas_Bokulich, it really helps a lot.