When demux performs Golay error correction, it outputs a file with “Detail about the barcode error corrections.” This file is specified with --o-error-correction-details. My question is: how do I view the error correction details? Or, if this is not something that can be visualized, what is the file format?
I thought maybe the output would be the sequences that were thrown out or corrected, but it doesn’t look like it is formatted as fastq files anymore:
# try to visualize it the same way as the other demux output:
$ qiime demux summarize --i-data p1_golay_errors.qza --o-visualization p1_golay_errors
...
There was a problem with the command:
(1/1) Invalid value for "--i-data": Expected an artifact of at least type
SampleData[SequencesWithQuality | PairedEndSequencesWithQuality |
JoinedSequencesWithQuality]. An artifact of type ErrorCorrectionDetails was
provided.
Since I still don’t know what “type ErrorCorrectionDetails” is, I tried dragging and dropping it into the qiime object viewer: https://view.qiime2.org
But this also just tells me that the file is:
type:“ErrorCorrectionDetails”
format:“ErrorCorrectionDetailsDirFmt”
Thanks for your help understanding what this file type is!
Hi @rrohwer,
Give qiime metadata tabulate a try. I believe that artifact should be viewable as metadata. (In general, if you aren’t sure how to visualize an artifact, e.g., it’s not shown in a tutorial, metadata tabulate is usually a pretty safe bet.)
So it looks like if the sequence was assigned, the sample name (those "rr" numbers are my sample names) is listed. And then it also lists the uncorrected and corrected barcode sequences (again, with corrected left blank if it was tossed), and then it lists the number of errors present in each barcode, including the unassigned ones.
Now 2 more questions!
Is the number of errors counted from the nearest Golay code, or from the nearest barcode in my mapping file?
Where can I find all the Golay code options? For example, to check whether these unmatched sequences exactly match other Golay barcodes and might be from cross-contamination with a previous run?
Thank you for checking this out! We may need to develop a custom visualization – as you note, the number of rows is quite large as this is a per-sequence report. I’ve opened an issue on q2-demux about this.
You are correct that if the sequence was assigned, then it gets the associated sample ID. Corrected is blank if the number of “errors” is such that the barcode cannot be corrected, meaning either the index read was not actually a Golay barcode, or there were too many errors for it to be correctable. Importantly, the number of errors is counted in bits, not nucleotides.
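To make the bits-versus-nucleotides point concrete, here is a minimal Python sketch. It assumes each nucleotide is encoded as 2 bits; the specific mapping below is an illustrative assumption and the function names are hypothetical, so check the decoder source for the mapping it actually uses. It also only compares two barcodes directly and does not search for the nearest Golay codeword the way the real decoder does.

```python
# Assumed 2-bit encoding per nucleotide, for illustration only.
# The real decoder defines its own mapping; verify it in the source.
NT_TO_BITS = {"A": "11", "C": "00", "T": "10", "G": "01"}


def to_bits(barcode):
    """Translate a nucleotide barcode into a string of 0/1 characters."""
    return "".join(NT_TO_BITS[nt] for nt in barcode.upper())


def nucleotide_errors(a, b):
    """Count positions where two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def bit_errors(a, b):
    """Count differing bits between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(to_bits(a), to_bits(b)))


reference = "AGCTAGCTAGCT"
one_bit = "GGCTAGCTAGCT"   # A (11) -> G (01): one nucleotide, one bit
two_bit = "CGCTAGCTAGCT"   # A (11) -> C (00): one nucleotide, two bits

print(nucleotide_errors(reference, one_bit), bit_errors(reference, one_bit))  # 1 1
print(nucleotide_errors(reference, two_bit), bit_errors(reference, two_bit))  # 1 2
```

So a single wrong base can count as one or two errors depending on which substitution it is, which is why the error column can look higher than the number of mismatched nucleotides.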
The full list of Golay codes should be accessible from here but may be in one of the supplementals. I’m unfortunately on rather limited internet at the moment so am hindered on tracking that down. The exact code used for the Golay decoding is here and includes some pretty pleasant documentation compiled by the prior developers that we ported over from the QIIME1 codebase.