I have different data than the QIIME2 tutorials and am struggling to create a manifest file/import my data.

Dear Colleagues,

I have been going through the QIIME2 tutorials (from "moving pictures" to "Parkinson's mouse") and honestly, they are great. However, each tutorial starts with me downloading the required data.

My problem is that I have finally received my sequencing data, but I do not understand how to prepare it for import into QIIME2. Specifically, my data looks quite different from the data used in the tutorials or online videos. We sequenced 16S rRNA (V4-V5) through a company (Novogene) that provided cleaned reads in a few files (see below). Why are there three different sequence files in one folder? Are these paired or single reads? How do these files differ from those in the tutorials (e.g. 10483.recip.220.WT.OB1.D7_30_L001_R1_001.fastq.gz)? Lastly, how would I go about creating a manifest and importing the data?

Any help would be greatly appreciated!

Thank you kindly.




Hello Johann,

Welcome to the forums! :qiime2:

It sounds like you have already found the Fastq manifest format, which is what I would try first. You could make the manifest in Google Docs or Excel.
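If a spreadsheet gets tedious with many samples, the manifest can also be generated with a short script. The following is only a minimal sketch: it assumes the reads are single-end, that each sample is one file ending in `.fq.gz` (as with C7.fq.gz), and that the part of the file name before that suffix is a usable sample ID. The function name `write_single_end_manifest` and its parameters are made up for illustration. The header row follows the V2 single-end manifest layout (`sample-id`, `absolute-filepath`).

```python
import csv
from pathlib import Path


def write_single_end_manifest(reads_dir, manifest_path, suffix=".fq.gz"):
    """Write a tab-separated single-end manifest for every file ending in `suffix`.

    Assumption: one file per sample, named <sample-id><suffix>.
    """
    reads_dir = Path(reads_dir).resolve()  # the manifest needs absolute paths
    rows = []
    for fastq in sorted(reads_dir.glob(f"*{suffix}")):
        sample_id = fastq.name[: -len(suffix)]  # e.g. "C7.fq.gz" -> "C7"
        rows.append((sample_id, str(fastq)))

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath"])
        writer.writerows(rows)
    return rows
```

Calling `write_single_end_manifest("CleanData", "manifest.tsv")` (the folder name is a guess) should give a file that can be passed to `qiime tools import` with `--type 'SampleData[SequencesWithQuality]'` and `--input-format SingleEndFastqManifestPhred33V2`, assuming the quality scores are Phred33-encoded.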

  • I'm guessing the .fna files are the same as the .fq files, but in the fasta format that does not include the quality scores of the fastq file format.
  • ExtendedFrags is something (semi-)custom Novogene is doing. They should provide docs for this.
  • These are probably single-end reads, but there is a slight chance these are interlaced paired-end reads.
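One rough way to tell single-end from interlaced paired-end is to look at the read IDs: in an interlaced file, records come in consecutive pairs that share the same ID (sometimes with a trailing `/1` and `/2`). The sketch below is a heuristic, not a definitive test. It assumes standard four-line FASTQ records in a gzip-compressed file, and the helper names are hypothetical.

```python
import gzip
from itertools import islice


def read_ids(fastq_gz_path, max_reads=1000):
    """Return the IDs (first token of each header line) of the first reads."""
    ids = []
    with gzip.open(fastq_gz_path, "rt") as handle:
        # Assumption: each record spans exactly four lines, header first.
        for line in islice(handle, 0, max_reads * 4, 4):
            if line.strip():  # skip a stray blank line at the end of the file
                ids.append(line[1:].split()[0])
    return ids


def strip_mate_suffix(read_id):
    """Drop a trailing /1 or /2 so both mates of a pair compare equal."""
    return read_id[:-2] if read_id.endswith(("/1", "/2")) else read_id


def looks_interleaved(ids):
    """Heuristic: every consecutive pair of records shares one base ID."""
    if len(ids) < 2 or len(ids) % 2:
        return False
    bases = [strip_mate_suffix(read_id) for read_id in ids]
    return all(a == b for a, b in zip(bases[0::2], bases[1::2]))
```

For example, `looks_interleaved(read_ids("C7.fq.gz"))` returning `False` would suggest that the file holds one read per ID, that is, single-end or already-merged reads.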

It's possible that C7.fq.gz is just like that example file, but with a much shorter file name because other details such as lane (L001) and direction (R1) have been removed.

While we are at it, what does your Rawdata look like? 'CleanData' sounds like it's been preprocessed somehow and starting from raw data can be advantageous in a number of ways.


Thank you kindly for the help. I created a manifest and was able to import the data. Unfortunately, the data seems to be paired-end reads. If I may, would you perchance have any idea which files (in the above post) are the forward and reverse reads? Or are they already combined in one folder?

Lastly, please find the raw data folder below.


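By convention, forward reads are labeled 1 and reverse reads are labeled 2, so for two files whose names differ only in that digit, the file marked 1 should hold the forward reads.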
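With that convention, a paired-end manifest can be built by matching each forward file to its reverse partner. This is a sketch under assumptions: the suffixes `_1.fq.gz` and `_2.fq.gz` are guesses at how the raw files are named and should be adjusted to the real names, and the function name is hypothetical. The header row follows the V2 paired-end manifest layout.

```python
import csv
from pathlib import Path


def write_paired_end_manifest(reads_dir, manifest_path,
                              fwd_tag="_1.fq.gz", rev_tag="_2.fq.gz"):
    """Pair <sample><fwd_tag> with <sample><rev_tag> and write a manifest.

    Assumption: the two tags are placeholders for the actual file suffixes.
    """
    reads_dir = Path(reads_dir).resolve()  # the manifest needs absolute paths
    rows = []
    for fwd in sorted(reads_dir.glob(f"*{fwd_tag}")):
        sample_id = fwd.name[: -len(fwd_tag)]
        rev = fwd.with_name(sample_id + rev_tag)
        if not rev.exists():
            raise FileNotFoundError(f"no reverse reads for sample {sample_id}: {rev}")
        rows.append((sample_id, str(fwd), str(rev)))

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "forward-absolute-filepath",
                         "reverse-absolute-filepath"])
        writer.writerows(rows)
    return rows
```

The resulting file should be importable with `--type 'SampleData[PairedEndSequencesWithQuality]'` and `--input-format PairedEndFastqManifestPhred33V2`, again assuming Phred33 quality scores.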