How to import an extendedFrags format

Hi everyone, I'm Carolina (a Master's student) and this is my first time working with QIIME 2 on a real case.

I did the tutorials and all the examples that I found on the web, and they were very useful, but I have a problem with my sequences.

I'm working with bowel content from rats and sent my DNA extractions for 16S rRNA gene amplicon sequencing in order to do a taxonomic classification, as in the tutorial. The problem is that when the sequencing results came back, they contained several different folders:

  • fastq folder: containing all samples already demultiplexed as they come out of the sequencer without any additional quality filtering and trimming applied.
    • You will find two files R1 and R2 respectively for each sample. These files contain forward and reverse reads for each sequence.
  • cleaned folder: For each sample you will find trimmed and cleaned reads according to the parameters described in the Methods section. When a sequence does not fit the minimum standards of length and/or quality, it is removed from the dataset.
  • stats folder: by-samples stats txt files and figures
  • joined folder: merged R1 and R2 reads after passing QC (sample.extendedFrags.fastq.gz); notCombined reads are also reported.
  • reports: interactive by-sample html files for deeper knowledge on sequencing quality.
  • QCReport_LEA22-070.html: This report.
  • LEA22-070.md5sum: an MD5sum file to check the integrity of the sequences. In case of a problem while downloading or reading the files, you can check the integrity of the downloaded file by running the following command:

I worked with the fastq folder without problems, because the import format is CASAVA 1.8 and the command is clear, but I need to work with the joined folder (the marked one), and I've never seen that format (sample.extendedFrags.fastq.gz) before. How can I import this format into QIIME 2 and work with it?

Thanks for your attention, Carolina.

Hello Carolina,

Welcome to the forums! :qiime2:

I would recommend starting with the folder of fastq files:

This is great because you get to choose how to process your data and make decisions about how to join your reads, instead of being stuck with the decisions made by your sequencing core. You can do all this processing within QIIME 2 as well, and all the settings used will be recorded in detail inside the artifact provenance.

You can import these using the fastq manifest format!
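If it helps, here is a minimal sketch of how a paired-end manifest could be generated from the fastq folder. The function name `build_paired_manifest` and the file naming pattern (`<sample>_R1...fastq.gz` / `<sample>_R2...fastq.gz`) are assumptions for illustration, since the thread does not show the real file names; adjust the pattern to match your files. The header columns follow the V2 paired-end manifest layout described on the importing page.

```python
import csv
import re
from pathlib import Path

# Assumed naming: "<sample>_R1<optional suffix>.fastq.gz" and the matching R2 file.
READ_PATTERN = re.compile(r"^(?P<sample>.+?)_(?P<read>R[12])(?:_.*)?\.fastq\.gz$")


def build_paired_manifest(fastq_dir, manifest_path):
    """Write a paired-end manifest TSV and return the sorted sample IDs."""
    pairs = {}
    for path in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        match = READ_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming
        # Manifests need absolute paths, so resolve each file.
        pairs.setdefault(match["sample"], {})[match["read"]] = path.resolve()

    # Every sample must have both a forward (R1) and a reverse (R2) file.
    incomplete = [s for s, reads in pairs.items() if set(reads) != {"R1", "R2"}]
    if incomplete:
        raise ValueError(f"samples missing R1 or R2: {incomplete}")

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(
            ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
        )
        for sample in sorted(pairs):
            writer.writerow([sample, pairs[sample]["R1"], pairs[sample]["R2"]])
    return sorted(pairs)
```

The resulting file can then be passed to `qiime tools import` with `--type 'SampleData[PairedEndSequencesWithQuality]'` and `--input-format PairedEndFastqManifestPhred33V2`, assuming your quality scores are Phred 33 (typical for recent Illumina data, but worth confirming with your sequencing core).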

(It's possible to import joined data, as you requested. However, plugins that expect to join the reads themselves, like DADA2, don't work as well with pre-joined data. Instructions are on that importing page, if you want to take a look.)
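If you do decide to go the joined route, a manifest for the joined folder could be sketched like this. The function name `build_joined_manifest` is hypothetical, and I am assuming the sample ID is whatever precedes `.extendedFrags.fastq.gz` in each file name; the notCombined files are skipped because they do not carry that suffix.

```python
import csv
from pathlib import Path

# Suffix taken from the sequencing report: sample.extendedFrags.fastq.gz
SUFFIX = ".extendedFrags.fastq.gz"


def build_joined_manifest(joined_dir, manifest_path):
    """Write a single-end style manifest for joined reads; return sample IDs."""
    # Assumption: the sample ID is the file name with the suffix removed.
    joined = {
        path.name[: -len(SUFFIX)]: path.resolve()
        for path in sorted(Path(joined_dir).glob(f"*{SUFFIX}"))
    }
    if not joined:
        raise FileNotFoundError(f"no *{SUFFIX} files found in {joined_dir}")

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["sample-id", "absolute-filepath"])
        for sample in sorted(joined):
            writer.writerow([sample, joined[sample]])
    return sorted(joined)
```

As far as I know, joined reads are usually imported with `--type 'SampleData[JoinedSequencesWithQuality]'` together with a single-end manifest format such as `SingleEndFastqManifestPhred33V2`, but please check the importing page for the exact command for your QIIME 2 version.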


You may have found this tutorial already, but the Parkinson’s Mouse Tutorial also works with an animal model which you may find useful. :mouse2:


Thanks! I was finally able to do all the steps of the mouse tutorial, thank you very much :smile:
