Add readId-fastqId map file to dada2 output


(Kevin McCormick) #1

I’m familiar with DADA2 and how it denoises reads, and I understand why its output uses hashed sequence identifiers. However, I often find myself wanting to compare results between DADA2 (QIIME 2) and other analysis pipelines, which often involves tracking individual reads through the pipeline. It would be incredibly useful if the DADA2 step (optionally) produced an output table that links FASTQ IDs to the hashed sequence IDs used in downstream analysis. E.g.:
[hashed sequence id 1],[FASTQ id 1]
[hashed sequence id 1],[FASTQ id 2]

[hashed sequence id 2],[FASTQ id 101]
Having such a file would allow me to query specific read IDs to see what taxonomy they were ultimately classified as in the QIIME2 pipeline.
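
As a rough illustration of the lookup such a file would enable, here is a minimal Python sketch. Everything in it is an assumption made for the example: the two-column CSV layout follows the format proposed above, and the feature IDs, read IDs, and taxonomy strings are invented placeholders. DADA2 and QIIME 2 do not currently produce a file like this.

```python
import csv
import io

# Hypothetical mapping file: one "hashed sequence ID,FASTQ read ID" pair per line.
# All IDs here are invented placeholders, not real DADA2 output.
MAPPING_CSV = """\
featA,read_001
featA,read_002
featB,read_101
"""

# Hypothetical taxonomy assignments keyed by hashed sequence ID.
TAXONOMY = {
    "featA": "k__Bacteria; p__Firmicutes",
    "featB": "k__Bacteria; p__Proteobacteria",
}


def load_read_to_feature(handle):
    """Build a dict from FASTQ read ID to hashed sequence ID."""
    read_to_feature = {}
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        feature_id, read_id = row
        read_to_feature[read_id] = feature_id
    return read_to_feature


def taxonomy_for_read(read_id, read_to_feature, taxonomy):
    """Return the taxonomy for a read, or None if the read is not in the map."""
    feature_id = read_to_feature.get(read_id)
    if feature_id is None:
        return None
    return taxonomy.get(feature_id)


read_to_feature = load_read_to_feature(io.StringIO(MAPPING_CSV))
print(taxonomy_for_read("read_002", read_to_feature, TAXONOMY))
```

A read that was filtered out during denoising would simply be absent from the map, so the lookup returns `None` for it.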


(Matthew Ryan Dillon) #3

This is an interesting idea, @kevinmcc21 - it looks like it might fit under this open issue: https://github.com/qiime2/q2-types/issues/92.

It sounds like this might specifically be a DADA2 feature request, though. Maybe you should open an issue on the official DADA2 tracker?


(Kevin McCormick) #5

Thanks for the input. It seems it is already an open issue for DADA2:

The developer discussion is much more involved than what I’m looking for, but the general idea seems the same.

Regarding the open QIIME 2 issue you mentioned, I am not sure the two are that similar. That one seems more focused on creating a certain data structure or file format, and I see no mention of FASTQ IDs. I am not that familiar with the overall QIIME 2 workflow, though, so I could be wrong.