Qiita download with no raw data

I have downloaded the folder ‘All QIIME Maps and BIOM’ from Qiita. However, I do not understand the structure of the downloaded folders. It has 3 subfolders:

BIOM - many BIOM and seqs.fa files (with different numeric IDs)
mapping_files - many mapping files (with different numeric IDs)
processed_data

What do these files represent? Which ones do I use to do analyses like the ‘Moving Pictures’ tutorial (mainly with sequences, barcodes, and mapping file/metadata)?

This is where I downloaded the data (it is not a very big folder):
https://qiita.ucsd.edu/study/description/10376

Many thanks for your help!


Hi @YinXun,
QIITA is separate from QIIME 2 and has its own support venue; I believe an FAQ section and a support email address are provided on the QIITA website. Since nobody has answered your questions here yet, I recommend contacting the official QIITA support.


Hi YinXun,

For this study (as for many in Qiita), the raw data is not available as separate sequences, barcodes, and metadata files. So if you want to apply the 'Moving Pictures' tutorial in this case, you need to modify the first steps of the tutorial (at least up to the demultiplexed sequences) and download other files from Qiita.

After opening the link you posted above and going to the study description, follow these steps:

  1. Once you've downloaded the file "741_seqs.fastq", open it and look at its structure: you'll see that the barcode sequences are inside the reads themselves. The next step is therefore to demultiplex the samples, but before that, compress this raw file with gzip and rename it, so that you end up with something like forward.fastq.gz.

  2. Now import this raw file into a QIIME 2 artifact using the type "MultiplexedSingleEndBarcodeInSequence". Your command should look like this:

qiime tools import \
  --type MultiplexedSingleEndBarcodeInSequence \
  --input-path forward.fastq.gz \
  --output-path multiplexed-seqs.qza
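Step 1 (looking at 741_seqs.fastq and producing the forward.fastq.gz that the import above expects) could be done from the shell roughly like this. It is a minimal sketch, assuming the downloaded file is plain, uncompressed FASTQ with 4 lines per record; `preview_fastq` and `compress_fastq` are hypothetical helper names, not QIIME 2 or Qiita commands.

```shell
#!/usr/bin/env bash
# Sketch for step 1. Assumes an uncompressed FASTQ with 4 lines per record.
# preview_fastq and compress_fastq are hypothetical helpers.

# Print the first N records (default 2) so you can see the barcode
# at the start of each read.
preview_fastq() {
  local fastq="$1" n="${2:-2}"
  head -n "$((n * 4))" "$fastq"
}

# gzip -c writes to stdout, so the original file is kept and the
# compressed copy gets its new name in one step.
compress_fastq() {
  local src="$1" dest="$2"
  gzip -c "$src" > "$dest"
}

# Usage with the file from this study:
#   preview_fastq 741_seqs.fastq 2
#   compress_fastq 741_seqs.fastq forward.fastq.gz
```

Using `gzip -c` instead of plain `gzip` followed by `mv` keeps the original 741_seqs.fastq around in case you want to inspect it again.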

  3. Download the metadata from the "Sample information" section in Qiita and format it following the QIIME 2 developers' recommendations for metadata files, or use the example they provide as a template: Moving Pictures sample-metadata (QIIME 2 2019.7) - Google Sheets. The file needs a column with each sample's barcode; the command in the next step reads it from a column named barcode-sequence.

  4. Now demultiplex your raw file with the cutadapt plugin and its method "demux-single" ("Demultiplex single-end sequence data with barcodes in-sequence"), using the artifact created previously and the formatted metadata. Your command should look like this:

qiime cutadapt demux-single \
  --i-seqs multiplexed-seqs.qza \
  --m-barcodes-file metadata.tsv \
  --m-barcodes-column barcode-sequence \
  --p-error-rate 0 \
  --o-per-sample-sequences demultiplexed-seqs.qza \
  --o-untrimmed-sequences untrimmed.qza \
  --verbose
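Because demux-single takes the barcodes from the barcode-sequence column of metadata.tsv, a quick check of that file before running the command can save a failed or misassigned run. The sketch below assumes a tab-separated file with the header on the first line, the sample ID in the first column, and barcodes in a column named barcode-sequence (override with a second argument); `check_barcodes` is a hypothetical helper, not a QIIME 2 command. It also tolerates Windows line endings, which spreadsheet exports sometimes produce.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a QIIME 2 metadata TSV before demux-single.
# Assumes: tab-separated, header on line 1, sample ID in column 1.
# check_barcodes is a hypothetical helper; it prints problems to stderr
# and returns non-zero if any barcode is missing, invalid or duplicated.
check_barcodes() {
  local tsv="$1" col="${2:-barcode-sequence}"
  awk -F'\t' -v col="$col" '
    { sub(/\r$/, "") }                 # tolerate CRLF line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "missing column: " col > "/dev/stderr"; bad = 1; exit }
      next
    }
    $0 == "" || $1 ~ /^#/ { next }     # skip blanks, comments, #q2:types
    $c !~ /^[ACGTacgt]+$/ {
      print "invalid barcode for " $1 ": " $c > "/dev/stderr"; bad = 1
    }
    seen[$c]++ == 1 { print "duplicate barcode: " $c > "/dev/stderr"; bad = 1 }
    END { exit bad }
  ' "$tsv"
}

# Usage: check_barcodes metadata.tsv && echo "metadata looks OK"
```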

For more details, see this post: Demultiplexing and Trimming Adapters from Reads with q2-cutadapt

Once the process finishes, look inside the output artifact and you should see your demultiplexed samples.
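One way to look inside demultiplexed-seqs.qza from the command line is to export it with `qiime tools export` and count the reads in each per-sample file. This is a sketch, assuming the export produces one gzipped FASTQ per sample (`*.fastq.gz`) in the output directory; `count_reads_per_sample` and the `demux-export` directory name are hypothetical. For an interactive overview, `qiime demux summarize`, the step the Moving Pictures tutorial uses next, is the usual route.

```shell
#!/usr/bin/env bash
# Sketch: count reads per sample in an exported demultiplexed artifact.
# Assumes one gzipped FASTQ per sample (*.fastq.gz), 4 lines per record.
# count_reads_per_sample is a hypothetical helper.
count_reads_per_sample() {
  local dir="$1" f lines
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue          # glob matched nothing
    lines=$(gzip -dc "$f" | wc -l)
    printf '%s\t%d\n' "$(basename "$f" .fastq.gz)" "$((lines / 4))"
  done
}

# Usage (the export step needs QIIME 2 installed):
#   qiime tools export --input-path demultiplexed-seqs.qza --output-path demux-export
#   count_reads_per_sample demux-export
```

Samples with zero or very few reads here usually point back to a barcode problem in the metadata.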

After this, you should be able to follow the Moving Pictures tutorial normally from the step "Sequence quality control and feature table construction".

Hope this helps!

Kind regards,
Alejandro.

