Can I start the DADA2 ASV workflow with this format?

Hello, our lab has some old amplicon sequencing data (from a couple of years ago). We would like to do a meta-analysis using the latest QIIME 2 DADA2 workflow, and we want to use ASVs this time. However, I checked the old data, which was sequenced by a former student a long time ago.

The file names look like this: all fastq files, with a forward and a reverse file for each sample.



I was told by the former student that the sequencing center had already demultiplexed the data and removed the barcodes, so he doesn't have the barcodes for each sample. However, when I check the fastq files, the barcode sequence seems to be in each file name (e.g. GGAGACAAGGGA). I also searched for these barcode sequences inside each fastq file, and they are still there, so I don't think the barcodes have been removed from the fastq files.

If so, can I run the QIIME 2 ASV workflow on my old data? If I can, how should I import it? I checked the “Moving Pictures” tutorial (QIIME 2 2021.8.0 documentation), and there seems to be no way to do this.


These appear to be Casava 1.8 reads, which is good news! You will need to compress them with gzip (here are the gzip docs) to get them as a fastq.gz file, but then you can follow this tutorial!
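In case it helps, here is a minimal Python sketch of the compression step. It assumes all of the uncompressed reads sit in one directory and end in `.fastq`; the function name and directory layout are just for illustration. Each file is compressed into its own `.fastq.gz`, and the originals are left in place.

```python
import gzip
import shutil
from pathlib import Path


def gzip_each_fastq(directory):
    """Compress every *.fastq file in `directory` into its own *.fastq.gz file."""
    written = []
    for fastq in sorted(Path(directory).glob("*.fastq")):
        target = fastq.with_name(fastq.name + ".gz")
        # Stream the data so large files are never loaded fully into memory.
        with open(fastq, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(target)
    return written
```

Calling `gzip_each_fastq("old_run")` (a hypothetical directory name) returns the list of `.fastq.gz` paths it wrote. Running plain `gzip` on each file from the shell should do the same job.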

Thank you, I will try this. I read the tutorial, and the files do look like Casava format. I have some follow-up questions.

1> When I prepare the input data, I need to gzip each file into its own fastq.gz file, right? I can't gzip all of the files into one fastq.gz, and I can't use the individual fastq files directly?

2> As I mentioned, this data format comes from the old sequencing center and the old data. I also have recent data in the usual format, with three files: a barcode file, forward reads, and reverse reads. Basically, I need to analyze all of the data.

What should I do? Can I import all of my data at one time? Or must I run QIIME twice, build two ASV tables, and combine the two tables later? Is there a way to import the old and new data together and run QIIME just once?


Exactly! The other option you have is to create and use a manifest file to import your older data.
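If you go the manifest route, a rough sketch of building the file in Python could look like the one below. It assumes the paired-end manifest layout (tab-separated, with the header `sample-id`, `forward-absolute-filepath`, `reverse-absolute-filepath`) and a hypothetical `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming scheme. Your real file names will differ, so the regular expression needs adjusting.

```python
import csv
import re
from pathlib import Path

# Assumed naming scheme (hypothetical): <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def write_manifest(fastq_dir, manifest_path):
    """Write a paired-end manifest and return the sorted sample IDs."""
    pairs = {}
    for path in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        match = READ_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        pairs.setdefault(match["sample"], {})[match["read"]] = path.resolve()

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(
            ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
        )
        for sample, reads in sorted(pairs.items()):
            if set(reads) != {"1", "2"}:
                raise ValueError(f"sample {sample!r} is missing a forward or reverse file")
            writer.writerow([sample, reads["1"], reads["2"]])
    return sorted(pairs)
```

The resulting file can then be passed to `qiime tools import` with `--type 'SampleData[PairedEndSequencesWithQuality]'` and `--input-format PairedEndFastqManifestPhred33V2`, assuming your quality scores are Phred33.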

You will have to import, remove adapters, and denoise each dataset separately. The import and adapter removal steps will need to be tailored to each dataset.

Since you are using DADA2 for the denoising, you need to make sure the trunc and trim parameters are set the same for all data sets that you plan to combine.

For an example of this in action check out the FMT tutorial!

For importing your newer data, simply follow the same steps that you have used before, stopping right before the denoising. Then find parameters for all of your datasets that will work for denoising with DADA2 and then run DADA2 on each set individually but with the same settings.
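One way to keep the settings identical is to define them once and reuse them for every run. Here is a small sketch that only builds the command lines (it does not run QIIME). The dataset names, artifact file names, and truncation values are placeholders, not recommendations. Pick the real values after looking at the quality plots for all of your datasets.

```python
import shlex

# Shared trim/truncation settings; the numbers are placeholders, not recommendations.
SHARED_PARAMS = {
    "--p-trim-left-f": 0,
    "--p-trim-left-r": 0,
    "--p-trunc-len-f": 240,
    "--p-trunc-len-r": 200,
}


def denoise_command(name):
    """Return the `qiime dada2 denoise-paired` argument list for one dataset."""
    cmd = ["qiime", "dada2", "denoise-paired",
           "--i-demultiplexed-seqs", f"{name}-demux.qza"]
    for flag, value in SHARED_PARAMS.items():
        cmd += [flag, str(value)]
    cmd += [
        "--o-table", f"{name}-table.qza",
        "--o-representative-sequences", f"{name}-rep-seqs.qza",
        "--o-denoising-stats", f"{name}-stats.qza",
    ]
    return cmd


def merge_tables_command(names):
    """Return the `qiime feature-table merge` argument list for the per-run tables."""
    cmd = ["qiime", "feature-table", "merge"]
    for name in names:
        cmd += ["--i-tables", f"{name}-table.qza"]
    return cmd + ["--o-merged-table", "merged-table.qza"]


if __name__ == "__main__":
    datasets = ["old-run", "new-run"]  # hypothetical dataset names
    for dataset in datasets:
        print(shlex.join(denoise_command(dataset)))
    print(shlex.join(merge_tables_command(datasets)))
```

The representative sequences can be combined in a similar way with `qiime feature-table merge-seqs`, as the FMT tutorial does.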
