Importing and Demultiplexing Sequence Data Quick Reference

Nicholas_Bokulich · March 11, 2020, 9:14pm

The following is a "pocket guide" to choosing the appropriate methods for importing and demultiplexing FASTA/FASTQ sequences (primarily from marker-gene sequencing experiments). Many samples are often "multiplexed" (pooled and sequenced together) on a single sequencing run, so the first step in analyzing these data is to demultiplex them, i.e., assign each sequence to its sample of origin. Often, demultiplexing is performed automatically by the sequencing center/service before the data reach the user. How can you tell if you have demultiplexed data? You will have one sequence file (or pair of files) per sample. The steps below are meant to help you find the appropriate steps and tutorials for importing (and demultiplexing) your sequence data.

If you find errors in this guide, or want to provide steps for importing or demultiplexing other formats, get in touch!

Do you have demultiplexed FASTQ sequences? (i.e., one file per sample)
- Yes
  - Are the sequences in CASAVA 1.8 format? (see descriptions of CASAVA 1.8 and other FASTQ formats for details)
    - Yes: Use the CASAVA format to import
    - No: Use the appropriate Manifest format to import
    - I don't know! Use the appropriate Manifest format to import
- No
  - If you followed the Earth Microbiome Project (EMP) protocol: import as an EMP format (single-end or paired-end) and demux with the corresponding method in q2-demux...
  - If you have a separate barcodes/index FASTQ file: import as an EMP format (single-end or paired-end) and demux with the corresponding method in q2-demux...
  - If barcodes are in-line with the sequences: see the q2-cutadapt tutorial for importing and demux instructions.
  - Do you have dual-index barcodes? If yes, use q2-cutadapt, but see the demux-paired manual for details on using dual-index barcodes. Note: combinatorial dual-index barcodes are not yet supported.
  - See also: how to demultiplex without a barcode fastq?
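
For the demultiplexed FASTQ branch above, here is a minimal Python sketch of the two checks involved: deciding whether file names look like CASAVA 1.8 output, and writing a manifest when they do not. It rests on two assumptions that should be verified against the importing tutorial: that CASAVA 1.8 names follow the pattern `<sample>_<barcode>_L<lane>_R<1|2>_001.fastq.gz`, and that the paired-end manifest is tab-separated with the column headers shown. The function names `is_casava_18` and `paired_end_manifest` are hypothetical helpers, not part of QIIME 2.

```python
import os
import re

# Assumed CASAVA 1.8 file name layout, e.g. L2S357_15_L001_R1_001.fastq.gz:
# sample ID, barcode or index ID, lane number, read direction, set number.
CASAVA_18 = re.compile(r"^.+_[^_]+_L\d{3}_R[12]_001\.fastq\.gz$")


def is_casava_18(filenames):
    """Return True only if at least one name is given and every name matches."""
    names = [os.path.basename(str(name)) for name in filenames]
    return bool(names) and all(CASAVA_18.match(name) for name in names)


def paired_end_manifest(samples):
    """Build the text of a tab-separated paired-end manifest.

    `samples` maps a sample ID to a (forward_path, reverse_path) tuple.
    The header names are an assumption; check them against the tutorial.
    """
    header = "sample-id\tforward-absolute-filepath\treverse-absolute-filepath"
    rows = [
        f"{sample_id}\t{os.path.abspath(fwd)}\t{os.path.abspath(rev)}"
        for sample_id, (fwd, rev) in sorted(samples.items())
    ]
    return "\n".join([header, *rows]) + "\n"
```

If `is_casava_18` returns `False` for your files, or you are unsure, the guide's advice is the same: use a Manifest format, which only requires listing each sample ID next to its file path(s).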
Do you have demultiplexed FASTA data?
- Follow the OTU clustering tutorial
- Note: the denoising methods in QIIME 2 require FASTQ data, so your only options are to use OTU clustering or, if possible, to convert your FASTA data to FASTQ.
Having trouble answering the questions above?
- If in doubt, contact the sequencing center/service or the original source of the data.
Don't have FASTQ or demultiplexed FASTA data?
- Check the importing tutorial to see whether a description exists for your data type. If not, get in touch on the QIIME 2 forum.
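
The questions above can also be read as a small decision function. The sketch below is only a restatement of this guide in Python; the function name `recommend_import`, its parameters, and the returned strings are hypothetical and exist purely for illustration.

```python
def recommend_import(file_type, demultiplexed, casava_18=None,
                     emp_or_separate_barcodes=False, inline_barcodes=False):
    """Return this guide's recommendation as a short string.

    casava_18 may be True, False, or None (meaning "I don't know").
    """
    file_type = file_type.lower()
    if file_type == "fastq" and demultiplexed:
        # "No" and "I don't know" both lead to a Manifest format.
        return "CASAVA format" if casava_18 is True else "Manifest format"
    if file_type == "fastq":
        if emp_or_separate_barcodes:
            return "EMP format, then q2-demux"
        if inline_barcodes:
            return "q2-cutadapt"
        return "ask the sequencing center or data source"
    if file_type == "fasta" and demultiplexed:
        return "OTU clustering tutorial"
    # Anything else: look for the data type in the importing tutorial.
    return "importing tutorial or QIIME 2 forum"


# Example: demultiplexed FASTQ files whose format is unknown.
print(recommend_import("fastq", demultiplexed=True, casava_18=None))
```

The example call prints `Manifest format`, matching the "I don't know!" branch above.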
