I was wondering whether it would be possible to implement a check for duplicate sample IDs that runs before DADA2 starts processing samples, for users who don’t demultiplex in QIIME, i.e. don’t run any earlier steps that might catch the problem sooner. I am asking because I submitted hundreds of already demultiplexed paired samples to “qiime dada2 denoise-paired”, and it didn’t error out until 2 days later, when it came across a duplicate sample ID. I had used regular expressions up front to get rid of some bad characters, and that happened to create a duplicate, so perhaps karma. Nonetheless, if it’s a fairly simple check to implement, it would be great.
FYI, I am using the latest and greatest: QIIME 2 2017.6.
Hi @John! Thanks for catching this. That is an annoying bug, and we’ll fix it in an upcoming release (I created a bug report to track it). When the fix is implemented, QIIME 2 will automatically detect and disallow duplicate sample IDs whenever demux sequence data is being read or written. This means that if there are duplicates, the failure should happen much earlier in the analysis, avoiding what happened here with DADA2.
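Until that fix lands, a rough pre-flight check outside of QIIME 2 could catch this kind of collision before a long run starts. The sketch below is not part of QIIME 2 or DADA2: the function names and the cleanup regex are hypothetical stand-ins for whatever regular expression was actually used to strip bad characters. It groups the raw IDs by their cleaned form and reports any cleaned ID that more than one raw ID maps to.

```python
import re
from collections import defaultdict


def sanitize_id(raw_id):
    # Hypothetical cleanup rule (an assumption, not the original regex):
    # drop every character that is not a letter, digit, or period.
    return re.sub(r"[^A-Za-z0-9.]", "", raw_id)


def find_duplicate_ids(raw_ids, sanitize=sanitize_id):
    # Map each cleaned ID to the list of raw IDs that produced it.
    groups = defaultdict(list)
    for raw_id in raw_ids:
        groups[sanitize(raw_id)].append(raw_id)
    # Keep only cleaned IDs that more than one raw ID collapsed into.
    return {clean: raws for clean, raws in groups.items() if len(raws) > 1}


if __name__ == "__main__":
    # Made-up sample IDs for illustration only.
    sample_ids = ["sample_1", "sample-1", "sample2"]
    duplicates = find_duplicate_ids(sample_ids)
    for clean, raws in duplicates.items():
        print(f"Duplicate sample ID {clean!r} produced by: {', '.join(raws)}")
    if duplicates:
        raise SystemExit(1)
```

Running something like this on the list of sample IDs right after the regex cleanup, and before calling “qiime dada2 denoise-paired”, would surface a duplicate in seconds rather than days.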