What is the format of the Golay error correction details file output by demux?

When demux performs golay error correction, it outputs a file with “Detail about the barcode error corrections.” This file is specified with --o-error-correction-details. My question is, how do I view the error correction details? Or, if this is not something that is visualized, what is the file format?

Example demux command:

# perform demux using golay error correction
$ qiime demux emp-paired --i-seqs data.qza --m-barcodes-file barcodes.txt --m-barcodes-column BarcodeSequence --o-per-sample-sequences data_golay.qza --p-golay-error-correction --o-error-correction-details data_golay_errors --p-rev-comp-barcodes --p-rev-comp-mapping-barcodes
Saved SampleData[PairedEndSequencesWithQuality] to: data_golay.qza
Saved ErrorCorrectionDetails to: data_golay_errors.qza

I thought maybe the output would be the sequences that were thrown out or corrected, but it doesn’t look like it is formatted as fastq files anymore:

# try to visualize it the same way as the other demux output:
$ qiime demux summarize --i-data p1_golay_errors.qza --o-visualization p1_golay_errors
...
There was a problem with the command:
 (1/1) Invalid value for "--i-data": Expected an artifact of at least type
  SampleData[SequencesWithQuality | PairedEndSequencesWithQuality |
  JoinedSequencesWithQuality]. An artifact of type ErrorCorrectionDetails was
  provided.

Since I still don’t know what “type ErrorCorrectionDetails” is, I tried dragging and dropping it into the QIIME 2 viewer: https://view.qiime2.org
But this also just tells me that the file is:

  • type:“ErrorCorrectionDetails”
  • format:“ErrorCorrectionDetailsDirFmt”

Thanks for your help understanding what this file type is!


(qiime2 Version q2cli 2019.4.0 in conda on mac)

Hi @rrohwer,
Give qiime metadata tabulate a try. I believe that artifact should be viewable as metadata. In general, if you aren’t sure how to visualize an artifact (e.g., it’s not shown in a tutorial), metadata tabulate is usually a pretty safe bet.

Let me know if that works!
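
On the file format side of the question: a .qza artifact is a zip archive, so its payload can also be inspected directly. The sketch below lists whatever sits under the archive's data/ directory. The helper name `list_data_files` is made up for illustration, and the demo runs on a stand-in archive with placeholder names, since the actual file name inside an ErrorCorrectionDetails artifact is not stated in this thread.

```python
import os
import tempfile
import zipfile


def list_data_files(qza_path):
    """Return archive members stored under the artifact's data/ directory."""
    with zipfile.ZipFile(qza_path) as archive:
        return [
            name
            for name in archive.namelist()
            if "/data/" in name and not name.endswith("/")
        ]


# Demo on a stand-in archive. The top-level directory name and the file
# names are placeholders, not the real contents of data_golay_errors.qza.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.qza")
    with zipfile.ZipFile(demo, "w") as archive:
        archive.writestr("0000-demo/metadata.yaml", "type: ErrorCorrectionDetails\n")
        archive.writestr("0000-demo/data/placeholder.tsv", "id\tsample\n")
    found = list_data_files(demo)

print(found)
```

Pointing `list_data_files` at the real artifact (for example data_golay_errors.qza) should show which file or files carry the per-read table.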


I tried metadata tabulate as you suggested, and it did run without errors (although it took almost as long as the demultiplexing itself!):

# view golay error file with metadata tabulate
$ qiime metadata tabulate --m-input-file p1_golay_errors.qza --output-dir tabulated_errors
Saved Visualization to: tabulated_errors/visualization.qzv

I then tried to view the resulting qzv, but it resulted in an error message in the Chrome browser:

$ qiime tools view tabulated_errors/visualization.qzv
# error message:
SyntaxError: Unexpected end of JSON input

Instead, I downloaded the tsv file from the visualization page. It looks like this:

# too big to view in excel:
$ wc tabulated_errors/metadata.tsv
 15794054 107398501 1637408764 tabulated_errors/metadata.tsv
# hard to distinguish these columns in the terminal:
$ head tabulated_errors/metadata.tsv
id	sample	barcode-sequence-id	barcode-uncorrected	barcode-corrected	barcode-errors
#q2:types	categorical	categorical	categorical	categorical	numeric
record-00000001		@M02149:240:000000000-C8JKY:1:1101:15515:1788 1:N:0:0	ACTTTGGAACAG		4
record-00000002	rr180	@M02149:240:000000000-C8JKY:1:1101:15744:1797 1:N:0:0	GGTTCACCATAG	GGTTCACCATAG	0
record-00000003	rr151	@M02149:240:000000000-C8JKY:1:1101:14836:1802 1:N:0:0	GGCACAGAGCAG	GGCACAGAGCAC	1
record-00000004	rr28	@M02149:240:000000000-C8JKY:1:1101:15588:1805 1:N:0:0	CTATGTGAACCG	CTATGTGAACCG	0
record-00000005	rr120	@M02149:240:000000000-C8JKY:1:1101:16520:1812 1:N:0:0	CACAAAGAAGTG	CACAAAGAAGTG	0
record-00000006	rr149	@M02149:240:000000000-C8JKY:1:1101:15206:1815 1:N:0:0	ACCCTATCGGTC	TCCCTATCGGTC	1
record-00000007		@M02149:240:000000000-C8JKY:1:1101:15258:1818 1:N:0:0	GACTAGTTGATG		4
record-00000008		@M02149:240:000000000-C8JKY:1:1101:14571:1818 1:N:0:0	AGAAACGAGGGG		4
# make a small file to open and view easily in excel (attached):
$ head tabulated_errors/metadata.tsv > smallfile.tsv

smallfile.tsv (939 Bytes)

So it looks like the sample name is listed if the sequence was assigned (those "rr" numbers are my sample names). The file also lists the uncorrected and corrected barcode sequences (with corrected left blank if the read was tossed), and the number of errors present in each barcode, including the unassigned ones.
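
Since the full table has roughly 15.8 million rows, one way to get an overview without a spreadsheet is to stream it and tally the columns described above. This is only a sketch: the function name `summarize_error_details` is invented here, the read IDs in the demo rows are shortened placeholders, and the values are kept as strings exactly as they appear in the file.

```python
import csv
import io
from collections import Counter


def summarize_error_details(lines):
    """Tally records per barcode-errors value and count unassigned reads.

    `lines` is any iterable of text lines, such as an open file handle.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    error_counts = Counter()
    unassigned = 0
    for row in reader:
        if row["id"].startswith("#q2:"):  # skip the column-type directive row
            continue
        error_counts[row["barcode-errors"]] += 1
        if not row["sample"]:  # blank sample means the read was not assigned
            unassigned += 1
    return error_counts, unassigned


# Small in-memory sample in the same layout as the `head` output above;
# the read IDs (read-a, read-b, read-c) are placeholders.
sample = (
    "id\tsample\tbarcode-sequence-id\tbarcode-uncorrected\t"
    "barcode-corrected\tbarcode-errors\n"
    "#q2:types\tcategorical\tcategorical\tcategorical\tcategorical\tnumeric\n"
    "record-00000001\t\tread-a\tACTTTGGAACAG\t\t4\n"
    "record-00000002\trr180\tread-b\tGGTTCACCATAG\tGGTTCACCATAG\t0\n"
    "record-00000003\trr151\tread-c\tGGCACAGAGCAG\tGGCACAGAGCAC\t1\n"
)
counts, unassigned = summarize_error_details(io.StringIO(sample))
print(dict(counts), unassigned)
```

For the real file, the same function can be fed an open handle, for example `with open("tabulated_errors/metadata.tsv", newline="") as fh: summarize_error_details(fh)`, which reads one row at a time instead of loading everything into memory.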

Now 2 more questions!

  • Is the number of errors counted from the nearest Golay code, or from the nearest barcode in my mapping file?
  • Where can I find all the Golay code options? For example, I would like to check whether these unmatched reads exactly match other Golay barcodes and might be cross-contamination from a previous run.

Thanks!

Robin

Hi @rrohwer,

Thank you for checking this out! We may need to develop a custom visualization – as you note, the number of rows is quite large as this is a per-sequence report. I’ve opened an issue on q2-demux about this.

You are correct that if the sequence was assigned, it gets the associated sample ID. Corrected is blank when the barcode cannot be corrected, meaning either the index read was not actually a Golay barcode or it had too many errors to be correctable. Importantly, the number of errors is counted in bits, not nucleotides.
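
To illustrate what counting in bits means, here is a small sketch that encodes each nucleotide as two bits and counts the differing bits between the uncorrected and corrected barcode. The specific nucleotide-to-bit mapping is an assumption (it is not given in this thread), and `bit_errors` is a name made up for this example; the real decoder works on the full 24-bit Golay codeword rather than by comparing two strings like this.

```python
# Assumed 2-bit encoding of nucleotides; not confirmed in this thread.
NT_TO_BITS = {"A": "11", "C": "00", "T": "10", "G": "01"}


def bit_errors(uncorrected, corrected):
    """Count the bits that differ between two equal-length barcodes."""
    if len(uncorrected) != len(corrected):
        raise ValueError("barcodes must have the same length")
    bits_a = "".join(NT_TO_BITS[nt] for nt in uncorrected)
    bits_b = "".join(NT_TO_BITS[nt] for nt in corrected)
    return sum(x != y for x, y in zip(bits_a, bits_b))


print(bit_errors("GGCACAGAGCAG", "GGCACAGAGCAC"))  # record-00000003
print(bit_errors("ACCCTATCGGTC", "TCCCTATCGGTC"))  # record-00000006
print(bit_errors("A", "C"))  # one nucleotide change, two bits here
```

Under this assumed mapping, both corrected records from the table above come out at one bit error, matching their barcode-errors values, while a single A to C substitution would count as two. A 12-nucleotide barcode is 24 bits, and the extended binary Golay code can correct up to 3 bit errors and detect 4, which would be consistent with the value 4 on the rows that were left uncorrected.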

The full list of Golay codes should be accessible from here but may be in one of the supplementals. I’m unfortunately on rather limited internet at the moment so am hindered on tracking that down. The exact code used for the Golay decoding is here and includes some pretty pleasant documentation compiled by the prior developers that we ported over from the QIIME1 codebase.

All the best,
Daniel

