Duplicate sample IDs: a way to catch them early?

John · June 23, 2017, 4:39pm

Hi,

I was wondering whether it would be possible to implement a check for duplicate sample IDs before DADA2 starts processing samples, for users who don't demultiplex in QIIME and so don't run any steps that might catch the problem sooner. I ask because I submitted hundreds of already-demultiplexed paired samples to "qiime dada2 denoise-paired", and it didn't error out until 2 days later, when it came across a duplicate sample ID. I had used regular expressions up front to strip some bad characters, and that happened to create a duplicate, so perhaps karma. Nonetheless, if it's a fairly simple check to implement, it would be great.

FYI, I am using the latest and greatest: QIIME 2 2017.6.

John

jairideout · June 23, 2017, 9:12pm

Hi @John! Thanks for catching this. That is an annoying bug, and we'll fix it in an upcoming release (I created a bug report to track it). Once the fix is implemented, QIIME 2 will automatically detect and disallow duplicate sample IDs whenever demultiplexed sequence data is read or written. Analyses with duplicate IDs should then fail much earlier, avoiding what happened here with dada2.
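
In the meantime, a pre-flight check outside QIIME 2 can catch this kind of collision before a long run. The following is only a minimal sketch in Python, not how QIIME 2 implements its check: the `sanitize` rule and the example IDs are hypothetical, so substitute whatever cleanup regex you actually applied to your sample IDs.

```python
import re
from collections import defaultdict


def sanitize(sample_id):
    # Hypothetical cleanup rule: replace every character that is not a
    # letter, digit, or period with a period. Swap in your own regex.
    return re.sub(r"[^A-Za-z0-9.]", ".", sample_id)


def find_collisions(sample_ids, clean=sanitize):
    """Return {cleaned_id: [original_ids]} for every cleaned ID that
    more than one original sample ID maps to."""
    groups = defaultdict(list)
    for original in sample_ids:
        groups[clean(original)].append(original)
    return {
        cleaned: originals
        for cleaned, originals in groups.items()
        if len(originals) > 1
    }


if __name__ == "__main__":
    # Made-up IDs for illustration only.
    ids = ["sample_1", "sample-1", "sample.2"]
    for cleaned, originals in find_collisions(ids).items():
        print(f"duplicate after cleanup: {cleaned} <- {originals}")
```

Running this on the full list of sample IDs before submitting to denoise-paired takes seconds, and an empty result means the cleanup step did not merge any two IDs.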

jairideout · July 25, 2017, 10:22pm

QIIME 2 2017.7 is now live and includes the bug fix I mentioned above.