Catching duplicate IDs, redux

@jairideout
I am writing to follow up on a previous issue I encountered when running denoise-paired, in which it caught a duplicate ID but didn’t report it until after running for a day or so (link below). I wanted to let you know that the same issue also seems to affect denoise-single: it caught a duplicate, but didn’t report it until it had run for about 24 hours. I’ve pasted version info below, along with the link to the previous discussion. Thanks for your help!

https://forum.qiime2.org/t/duplicate-sample-ids-a-way-to-catch-them-early/738

System versions
Python version: 3.5.3
QIIME 2 release: 2017.7
QIIME 2 version: 2017.7.0
q2cli version: 2017.7.0

Installed plugins
alignment 2017.7.0
composition 2017.7.1
dada2 2017.7.0
deblur 2017.7.0
demux 2017.7.0
diversity 2017.7.0
emperor 2017.7.0
feature-classifier 2017.7.0
feature-table 2017.7.0
gneiss 2017.7.0
metadata 2017.7.0
phylogeny 2017.7.0
quality-filter 2017.7.0
taxa 2017.7.0
types 2017.7.0

Hi @John! Thanks for getting in touch, sorry to hear things aren’t working for you.

The fix that @jairideout mentioned in the linked post specifically prevents duplicates from occurring in demux data on read or write. We took this strategy (rather than implementing some kind of one-off fix in denoise-paired) because duplicate sample IDs can be a problem in all kinds of other places, not just in dada2. Did you re-import your sequences using 2017.7, or did you use your previously imported sequences with 2017.7? If the latter, the duplicate sample IDs are still going to be a problem, and it would make sense to reimport them using the latest version of QIIME 2. If the former, we will get to the bottom of this! Thanks! :palm_tree:


Hi @thermokarst,

I think it makes total sense to prevent duplicates. My issue is that it doesn’t catch the duplicates and fail until it has spent 24 or more hours analyzing the data. I was wondering if you could implement a check that runs before denoising actually starts, so that it fails immediately?
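To make the request concrete, here is a minimal sketch of the kind of fail-fast check being asked for. It is not QIIME 2's implementation. It assumes Casava 1.8-style file names (`<sample>_<barcode>_L<lane>_R<read>_<set>.fastq.gz`), and the function names are hypothetical.

```python
from collections import Counter
from pathlib import Path


def sample_id_from_filename(name):
    # Assumed layout: <sample>_<barcode>_L<lane>_R<read>_<set>.fastq.gz
    # Peel the four trailing fields off from the right so that any
    # underscores inside the sample ID itself are preserved.
    return name.rsplit("_", maxsplit=4)[0]


def find_duplicate_sample_ids(filenames):
    # Count how many files map to each sample ID and report repeats.
    counts = Counter(sample_id_from_filename(n) for n in filenames)
    return sorted(sid for sid, n in counts.items() if n > 1)


def check_directory(input_dir):
    # Hypothetical helper: run this before starting a long denoising job.
    names = [p.name for p in Path(input_dir).glob("*.fastq.gz")]
    dupes = find_duplicate_sample_ids(names)
    if dupes:
        raise ValueError("Duplicate sample IDs: {}".format(", ".join(dupes)))
```

Running something like `check_directory("/scratch/files/single/")` before import or denoising would surface a collision in seconds rather than after a day of compute.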

On a tangent, I don’t know the culprit yet, but the samples that are being flagged as duplicates aren’t actually duplicates. I believe that because all of my sample names start with “00”, something is being thrown off. If I figure out what it is about the names, I’ll let you know (FYI, I’ve already removed all special characters, so it’s something else).
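One hypothetical way that distinct files could be reported as duplicates is sketched below. This is an illustration, not a diagnosis: the thread does not show the real file names, and how QIIME 2 derived sample IDs at the time is an assumption here. The file names are invented. If a sample ID is taken as everything before the first underscore, names that share a prefix such as "00" collapse to the same ID.

```python
def id_by_first_underscore(filename):
    # Naive rule (assumed for illustration): keep text before the first "_".
    return filename.split("_", 1)[0]


def id_by_trailing_fields(filename):
    # Alternative rule: strip the four trailing Casava fields from the right.
    return filename.rsplit("_", maxsplit=4)[0]


# Invented file names that share the "00" prefix.
names = [
    "00_A_S1_L001_R1_001.fastq.gz",
    "00_B_S2_L001_R1_001.fastq.gz",
]

naive = [id_by_first_underscore(n) for n in names]
careful = [id_by_trailing_fields(n) for n in names]

print(naive)    # ['00', '00'] -> looks like a duplicate
print(careful)  # ['00_A', '00_B'] -> distinct
```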

Does this make sense?

John

Thanks @John. It still isn't clear to me, though: did you reimport your data using the latest version of QIIME 2? The quote above reads like maybe you didn't (sorry if I am misreading or misunderstanding!).

Can you provide the following, to help us track this down:

  1. What was your import command?
  2. What does a listing of your files look like? (Could be a screenshot, ls, or something else)

Thanks! We will get to the bottom of this! :tada:

Hi @thermokarst,

I apologize for not addressing your previous inquiry. All steps were conducted using 2017.7, including the import step. I will address your latest 2 questions below:

1. Import command:

qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path /scratch/files/single/ --source-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-single-end.qza

2. I will paste a screenshot, but it will not reflect the run that failed. I renamed the files, reimported, and am trying again to see if the name changes help.

As a quick reminder from the thread I opened a couple months ago, this is data that was already demultiplexed before starting QIIME.

Thanks for your help,

John

Hi @John, thanks so much!

I was able to reproduce the duplicate IDs error here locally, so I think we have what we need to keep working on this. Sorry for the false hopes in the last round — we fixed this problem in the Manifest formats, but it looks like the fix didn’t make it into the CasavaOneEightSingleLanePerSampleDirFmt format :grimacing:

One of us is going to dig into this a bit more, hopefully as part of the 2017.9 development cycle. In the meantime, you could use one of our Manifest formats to import these data (in case the renaming you mentioned above doesn’t work).
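For anyone trying the Manifest workaround, here is a rough sketch of generating a manifest while refusing duplicate sample IDs up front. The header columns (`sample-id`, `absolute-filepath`, `direction`) are what I recall from the importing documentation of that era, so treat them as an assumption and check the docs for your release. `build_manifest` and the example path are hypothetical.

```python
import csv
import io


def build_manifest(rows):
    """Return manifest CSV text for (sample_id, absolute_path) pairs.

    Raises ValueError as soon as a sample ID repeats, so the problem
    shows up before any import or denoising is attempted.
    """
    seen = set()
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Assumed header for single-end manifests; verify against the docs.
    writer.writerow(["sample-id", "absolute-filepath", "direction"])
    for sample_id, path in rows:
        if sample_id in seen:
            raise ValueError("Duplicate sample ID: {}".format(sample_id))
        seen.add(sample_id)
        writer.writerow([sample_id, path, "forward"])
    return buf.getvalue()


manifest_text = build_manifest([
    ("001", "/scratch/files/single/001.fastq.gz"),  # hypothetical file
])
```

The resulting text can be written to a file and passed to `qiime tools import` with a manifest source format. As far as I remember, the single-end Phred 33 variant was named `SingleEndFastqManifestPhred33`, but confirm the exact name for your release.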

Thanks so much for your patience, and for taking the time to report these kinds of issues!! :tada:


Okay @John, we overhauled the validation used when importing these data — as they say in the biz, “third time’s a charm!” (well, maybe “they” don’t say that, but we will today). Check out the latest release of QIIME 2 (2017.9) to learn more! Thanks! :tada:
