How to import cleaned multi-sample reads?

romain · September 13, 2018, 12:42pm

Dear all,

We have a multi-sample dataset (34 samples) of paired-end reads.
I have demultiplexed and cleaned all reads using a pipeline of my own, ending up with 34 FASTA files (one per sample).

I would like to import all these cleaned reads into QIIME for downstream analyses. To do so, I could combine all the sequences into a single FASTA file and import it using the 'qiime tools import' command.

In that case, how would QIIME know the origin of each sequence (i.e. which sample it comes from)? I assume this information should be contained in the sequence headers and in a metadata file, but I couldn't find a clear description of how to proceed in the QIIME documentation.

Any help would be highly appreciated.
Thanks.

Nicholas_Bokulich · September 13, 2018, 9:33pm

No need to combine! QIIME 2 handles many formats of demultiplexed data perfectly well.

The one problem is that it sounds like you have fasta data, not fastq... this means:

- You can't import using one of the demultiplexed data formats (e.g., this).
- You will not be able to denoise your data with dada2 or deblur. You will be forced to use OTU picking.

So if you really do have fasta data and can't tack the quality scores back on to make fastq, you will need to:

1. Concatenate your fasta files, and yes, sample information should be included in the headers following this format.
2. Use this tutorial to import and cluster your data. At the end of that tutorial you will have a feature table and representative sequences, which you can use as described in any of the other tutorials.
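
The concatenation step could look roughly like the sketch below. It is only an illustration, not QIIME code: the function name `concatenate_fasta` is made up, it assumes each file is named after its sample (e.g. `sampleA.fasta`), and it renames every record to `<sample><sep><running number>`. The separator is a parameter because the exact header format was linked rather than quoted in this reply, so the `_` default is an assumption to check against the format the importer expects.

```python
from pathlib import Path


def concatenate_fasta(fasta_paths, out_path, sep="_"):
    """Merge per-sample FASTA files, renaming each record to <sample><sep><n>.

    Hypothetical helper: `sep` defaults to "_" as an assumption only.
    Returns the total number of records written.
    """
    total = 0
    with open(out_path, "w") as out:
        for path in map(Path, fasta_paths):
            # Assumption: the file name without its extension is the sample ID.
            sample = path.stem
            count = 0
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # Replace the original header with sample ID + counter.
                        count += 1
                        out.write(f">{sample}{sep}{count}\n")
                    elif line.strip():
                        # Copy sequence lines, making sure each ends in a newline.
                        out.write(line if line.endswith("\n") else line + "\n")
            total += count
    return total
```

For 34 files in one directory, something like `concatenate_fasta(sorted(Path("cleaned").glob("*.fasta")), "seqs.fna")` would produce the single file to import (the directory and output names here are placeholders).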

I hope that helps!

romain · September 14, 2018, 6:02pm

Thanks for your help Nicholas.

I have managed to load my data into QIIME using the following format for sequence headers (I didn't need a metadata file):

XX.YY
where XX is the name of the sample and YY is the (unique) name of the sequence.
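
A quick way to sanity-check such a file before importing is to tally reads per sample from the headers. The sketch below is a standalone check, not part of QIIME: `reads_per_sample` is a hypothetical name, and it assumes the sample name is everything before the last `.` in the header (so sequence names must not contain the separator).

```python
from collections import Counter


def reads_per_sample(fasta_path, sep="."):
    """Count records per sample in a FASTA file with <sample><sep><seq> headers.

    Assumption: the header is split on the LAST occurrence of `sep`.
    """
    counts = Counter()
    with open(fasta_path) as handle:
        for line in handle:
            if not line.startswith(">"):
                continue
            # Keep only the identifier, dropping any description after a space.
            fields = line[1:].split()
            header = fields[0] if fields else ""
            sample, found, _seq_name = header.rpartition(sep)
            if not found or not sample:
                raise ValueError(f"header without a sample prefix: {line.strip()!r}")
            counts[sample] += 1
    return counts
```

With 34 samples, the returned counter should have 34 keys; a different number suggests a naming problem in the headers.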

As for the denoising step, I actually simplified the dataset by clustering at 99% with VSEARCH in QIIME.

best.

Nicholas_Bokulich · September 14, 2018, 6:04pm

Excellent! Glad you were able to get your data in the correct format and do the clustering you needed.
