How to use barcodes without quality information

Hello QIIME users, I have worked through some of the tutorials, but I am not yet doing very well applying QIIME to PacBio full-length metagenomics.
So I want to practice using QIIME with data from a published paper.

The paper’s DOI: 10.1186/s12866-016-0891-4
In this paper, we can find the metagenome data (ERR1447468) and barcode data without quality information (Additional file 1: Table S1), as follows:

  1. "metagenome data"
    https://trace.ddbj.nig.ac.jp/DRASearch/experiment?acc=ERX1517843
    (an .sra file, which can be converted to .fastq for import into QIIME)

    For example, one spot from the FASTQ data looks like this:

@ERR1447466.3 length=83
CTGTGTTTTTATATATTTATTTTTTTTTTTAGTATTCTTTTTTAACGTTTTGTATTTTTTAGTTGATTTATTTTGTTTTTGAG
+ERR1447466.3 length=83
!!!"!!!!!!"""#""""""!""!"!!!!"!"!""#!!!!"""""!""!!"!""#!""!"""!"!""""!!!"!""!!"!"!!
  2. "barcode data without quality information"
    (only the Primer Name, Barcode Sequence, and Primer Sequence columns)
    https://static-content.springer.com/esm/art%3A10.1186%2Fs12866-016-0891-4/MediaObjects/12866_2016_891_MOESM1_ESM.doc
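The spot shown in item 1 is an ordinary four-line FASTQ record. As a minimal sketch, assuming the Phred+33 encoding that SRA's fastq-dump normally writes, the hypothetical `read_fastq` helper below decodes each quality character as its ASCII code minus 33. By that rule, the quality line in the spot above, which uses only `!`, `"` and `#`, corresponds to Phred scores of 0 to 2 throughout, which suggests the per-base qualities in this run may be placeholders rather than real scores.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) for each 4-line FASTQ record.

    Hypothetical helper for illustration; assumes Phred+33 quality encoding.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Phred+33: score = ASCII code of the quality character minus 33
        scores = [ord(ch) - 33 for ch in qual]
        # keep only the ID part of the header (drop "length=..." etc.)
        yield header[1:].split()[0], seq, scores


# Made-up record: '!' -> 0, '"' -> 1, '#' -> 2, 'I' -> 40
example = '@read1 length=4\nACGT\n+read1 length=4\n!"#I\n'
records = list(read_fastq(io.StringIO(example)))
for read_id, seq, scores in records:
    print(read_id, seq, scores)  # read1 ACGT [0, 1, 2, 40]
```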

How do I import these two data sets into QIIME to perform a metagenomic analysis?

Hi @Coke_Lin,

If your goal is just to practice processing data in QIIME 2, you might want to consider using a different dataset (e.g., with quality scores in the barcode) that should be less difficult to process in QIIME 2. However, I recognize you may want a PacBio dataset specifically...

It looks like the link to the metagenome data actually contains both a forward read and an index (barcode) read. Those will be the fastq sequence files that you want to import into QIIME as described below.

Check out this tutorial to learn about importing different data types. This is the specific example you want.

However, note that the barcodes need to have quality scores to be imported with this command. This is why I suggest using a dataset with quality scores in the barcodes for practice purposes. Otherwise, you need to generate fake quality scores for your barcodes (this would need to be done outside of QIIME 2 before attempting to import).
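Generating fake quality scores outside of QIIME 2 can be sketched as below. This is an illustration under assumptions, not a QIIME 2 tool: the helper name `write_fake_quality_fastq` is hypothetical, the constant score of 40 is an arbitrary placeholder, and the output name `barcodes.fastq.gz` follows what, as far as I know, the EMP-protocol import layout expects. The scores carry no real information, so any quality-based filtering of the barcodes afterwards would be meaningless.

```python
import gzip
import os
import tempfile
from typing import Iterable, Tuple


def write_fake_quality_fastq(records: Iterable[Tuple[str, str]], path: str,
                             fake_score: int = 40) -> int:
    """Write (read_id, sequence) pairs as gzipped FASTQ with a constant score.

    Hypothetical helper: the placeholder score only satisfies importers that
    require a quality line. Returns the number of records written.
    """
    if not 0 <= fake_score <= 93:
        raise ValueError("Phred+33 scores must be in the range 0..93")
    qual_char = chr(fake_score + 33)  # Phred+33: 40 -> 'I'
    count = 0
    with gzip.open(path, "wt") as out:  # "wt" = gzip in text mode
        for read_id, seq in records:
            out.write(f"@{read_id}\n{seq}\n+\n{qual_char * len(seq)}\n")
            count += 1
    return count


# Usage with made-up barcode reads; real ones would come from the run itself.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "barcodes.fastq.gz")
written = write_fake_quality_fastq([("r1", "ACGT"), ("r2", "GGC")], out_path)
with gzip.open(out_path, "rt") as fh:
    contents = fh.read()
print(written)
print(contents)
```

The read IDs must match those in the sequence file, in the same order, since the barcode and sequence files are paired record by record.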

The Table S1 file is actually a list of the barcode used for each sample, not the barcode sequences of individual reads. Add these barcodes to your sample metadata file (see this file for an example format), and do not try to import this file as barcode sequences.
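Copying the Table S1 barcodes into a metadata file can be sketched as follows. Assumptions: the three columns have already been copied out of the .doc file as rows of (Primer Name, Barcode Sequence, Primer Sequence); Primer Name is used as the sample ID; the column names `barcode-sequence` and `primer-sequence` are illustrative choices, not required by QIIME 2. The rows in the usage example are made up, not taken from the paper.

```python
import csv
import io
from typing import Iterable, Sequence


def barcode_table_to_metadata(rows: Iterable[Sequence[str]]) -> str:
    """Turn (primer_name, barcode, primer_seq) rows into a tab-separated
    sample metadata file, one sample per row. Hypothetical helper."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # "sample-id" is one of the ID headers QIIME 2 metadata accepts
    writer.writerow(["sample-id", "barcode-sequence", "primer-sequence"])
    seen = set()
    for primer_name, barcode, primer_seq in rows:
        if primer_name in seen:
            raise ValueError(f"duplicate sample id: {primer_name}")
        seen.add(primer_name)
        writer.writerow([primer_name, barcode.upper(), primer_seq.upper()])
    return out.getvalue()


# Made-up rows for illustration only
tsv = barcode_table_to_metadata([
    ("sample-A", "acgtacgt", "gtcagtcagtca"),
    ("sample-B", "TGCATGCA", "GTCAGTCAGTCA"),
])
print(tsv)
```

Writing the result to a `.tsv` file gives something that can be passed to QIIME 2 as sample metadata during demultiplexing.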

I hope that helps!


Thanks for the reply, it’s very helpful!
So the MS Word file containing the barcodes in this paper is like a mapping file of barcodes (.tsv)? And the real barcode sequences can be separated from the ERR1447468 data with split_libraries.py?

Hi @Coke_Lin,

Exactly! It is not in the same format as the mapping files used in QIIME 2 (see the link above for an example), but you can copy the barcodes out of this file and place them in your metadata mapping file to get started.

Yes, the actual barcode for each sequence will be contained within that sequence file. Whether the barcodes are in-line within each read or in a separate read, I do not know (I am not familiar with PacBio data), so you may want to get in touch with the study authors to be sure. If the barcodes are in a second sequence file, you should be able to import those files directly into QIIME 2 for demultiplexing. If the barcodes are contained in-line in each sequence, you will need to use a method outside of QIIME 2 to extract those barcodes into a separate fastq file, e.g., the QIIME 1 script extract_barcodes.py, and then import the resulting fastq files into QIIME 2. That functionality is planned for the next QIIME 2 release (end of this month) but is not available in the current release.
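What extracting in-line barcodes means can be sketched as below: the first `bc_len` bases of each read (and their quality characters) go to a barcode FASTQ, and the rest go to a reads FASTQ with the same header. This is a simplified stand-in, not extract_barcodes.py itself; the helper name and the fixed barcode length are assumptions, and it assumes every barcode sits at the 5' end in the same orientation, which may not hold for this PacBio dataset.

```python
import io
from typing import TextIO


def split_inline_barcodes(fastq_in: TextIO, reads_out: TextIO,
                          barcodes_out: TextIO, bc_len: int) -> int:
    """Move the first bc_len bases of each read (with their qualities) into a
    separate barcode FASTQ; write the remaining bases to the reads FASTQ.

    Hypothetical helper. Returns the number of reads written.
    """
    written = 0
    while True:
        header = fastq_in.readline().rstrip("\n")
        if not header:
            return written  # end of input
        seq = fastq_in.readline().rstrip("\n")
        fastq_in.readline()  # '+' separator line, not needed
        qual = fastq_in.readline().rstrip("\n")
        if len(seq) <= bc_len:
            continue  # too short to hold a barcode plus sequence; skip it
        barcodes_out.write(f"{header}\n{seq[:bc_len]}\n+\n{qual[:bc_len]}\n")
        reads_out.write(f"{header}\n{seq[bc_len:]}\n+\n{qual[bc_len:]}\n")
        written += 1


# Made-up read with a hypothetical 4-base barcode
source = io.StringIO("@r1\nACGTTTGGCC\n+\nIIIIHHHHGG\n@r2\nACG\n+\nIII\n")
reads, barcodes = io.StringIO(), io.StringIO()
n = split_inline_barcodes(source, reads, barcodes, bc_len=4)
print(n)
print(barcodes.getvalue())
print(reads.getvalue())
```

Keeping the headers identical in both outputs preserves the record-by-record pairing that barcode-based demultiplexing relies on.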

I hope that helps!


Thanks for the quick reply and the hints! I'm currently trying to use split_libraries.py to search for the barcodes in the FASTQ sequence data. Hope everything goes well with you :)


QIIME 2 2017.12 has a new cutadapt plugin which provides demux-single and demux-paired for demultiplexing reads where the barcodes are included within your reads!
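To illustrate what demultiplexing reads with in-line barcodes involves, here is a minimal sketch that assigns each read to a sample by an exact barcode match at the start of the read and trims the barcode off. It is not how cutadapt works internally (cutadapt can tolerate sequencing errors in the barcode, while this sketch requires an exact match); the function name and the barcode-to-sample mapping are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str]  # (read_id, sequence)


def demux_by_prefix(reads: Iterable[Read],
                    barcode_to_sample: Dict[str, str]
                    ) -> Tuple[Dict[str, List[Read]], List[Read]]:
    """Assign reads to samples by exact 5' barcode match, trimming the barcode.

    Hypothetical helper. Returns (reads per sample, unmatched reads); samples
    that received no reads are absent from the first dict.
    """
    by_sample: Dict[str, List[Read]] = defaultdict(list)
    unmatched: List[Read] = []
    # Try longer barcodes first so a barcode that is a prefix of another
    # cannot capture the other sample's reads.
    ordered = sorted(barcode_to_sample, key=len, reverse=True)
    for read_id, seq in reads:
        for bc in ordered:
            if seq.startswith(bc):
                by_sample[barcode_to_sample[bc]].append((read_id, seq[len(bc):]))
                break
        else:  # no barcode matched this read
            unmatched.append((read_id, seq))
    return dict(by_sample), unmatched


# Made-up barcodes and reads
samples, leftover = demux_by_prefix(
    [("r1", "ACGTGGG"), ("r2", "TTAACCC"), ("r3", "GGGGGGG")],
    {"ACGT": "sample-A", "TTAA": "sample-B"},
)
print(samples)
print(leftover)
```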