How can I process MD5sum-format sequence data in QIIME 2?

I am very new to bioinformatics tools. I have paired-end Illumina sequence data as R1 and R2 fastq.gz files.
For instance, the format is
ad68a39a27ea01cfa085b1a28517a84b MI.M03992_0595.001.FLD0062.190391_R1.fastq.gz
7da2ce774619ae53d46c84d301bc372e MI.M03992_0595.001.FLD0062.190391_R2.fastq.gz
I have QIIME 2 version 2021.2 and I am running it on Windows 10 with Anaconda. Any suggestions would be a great help. Thank you!

Hi @laxmi_sharma, welcome to :qiime2:!

Is this multiplexed or demultiplexed data? See the Importing Data documentation to help determine which you have. Assuming this is demultiplexed data, I'd suggest importing your data via a manifest, in which you assign a sample name to each of your sequence files.
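
As a rough illustration of the manifest approach, here is a minimal Python sketch that pairs R1/R2 files in a folder and writes a tab-separated paired-end manifest. It assumes the `_R1.fastq.gz` / `_R2.fastq.gz` naming quoted in this thread and the `sample-id`, `forward-absolute-filepath`, `reverse-absolute-filepath` header used by the paired-end manifest format in the Importing Data documentation. The function names and the rule for deriving sample IDs are hypothetical choices, not part of QIIME 2.

```python
import csv
import re
from pathlib import Path

# Assumed naming pattern, based on the file names quoted in this thread:
# <prefix>_R1.fastq.gz and <prefix>_R2.fastq.gz
READ_PATTERN = re.compile(r"^(?P<stem>.+)_R(?P<read>[12])\.fastq\.gz$")


def build_manifest_rows(fastq_dir):
    """Pair the R1/R2 files found in fastq_dir and return manifest rows."""
    pairs = {}
    for path in sorted(Path(fastq_dir).iterdir()):
        match = READ_PATTERN.match(path.name)
        if match is None:
            continue  # skip anything that is not an R1/R2 fastq.gz file
        pairs.setdefault(match["stem"], {})[match["read"]] = path.resolve()

    rows = []
    for stem, reads in sorted(pairs.items()):
        if set(reads) != {"1", "2"}:
            raise ValueError(f"missing R1 or R2 file for {stem}")
        # Hypothetical choice: use the last two dot-separated fields of the
        # file name (e.g. "FLD0062.190391") as the sample ID.
        sample_id = ".".join(stem.split(".")[-2:])
        rows.append((sample_id, str(reads["1"]), str(reads["2"])))
    return rows


def write_manifest(rows, manifest_path):
    """Write the rows as a tab-separated paired-end manifest."""
    header = ("sample-id", "forward-absolute-filepath", "reverse-absolute-filepath")
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
```

Calling `write_manifest(build_manifest_rows("path/to/fastq_dir"), "manifest.tsv")` (both paths are placeholders) should give a file that `qiime tools import` can read with `--type 'SampleData[PairedEndSequencesWithQuality]'` and `--input-format PairedEndFastqManifestPhred33V2`. The script should be run in the same environment as QIIME 2 so that the absolute paths it records are valid there.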

Thank you @SoilRotifer!
Sorry for the late response, but can you explain how I can create a manifest file for my data?

So far, I have learned that my data are compressed, demultiplexed, paired-end sequences. For my own understanding, I also viewed the FASTQ content of the compressed files with Notepad++.
MI.M03992_0595.001.FLD0001.190376_R1.fastq.gz
MI.M03992_0595.001.FLD0001.190376_R2.fastq.gz

I tried to create a manifest file as instructed in the "Importing data" page of the QIIME 2 2021.2.0 documentation, but it seems I am making some mistakes.
It would be a great help if you could help me create a manifest file and explain the further steps for running DADA2.
FYI, I am using Windows to run QIIME 2.
Thank you!

Can you let us know what issues / errors you are running into?

Often users have trouble getting :qiime2: (running within WSL) to find their own files that exist in a Windows path, which is not easily understood by WSL. I'd recommend reading through the following posts:

-Mike
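
To make the path issue concrete: with default settings, WSL exposes Windows drives under `/mnt/<drive letter>`, so a file at `C:\Users\me\seqs` appears inside WSL as `/mnt/c/Users/me/seqs`. The following is only a small sketch of that translation, assuming the default mount point. The function name is hypothetical, and WSL also provides a `wslpath` command for the same purpose.

```python
from pathlib import PureWindowsPath


def windows_to_wsl_path(windows_path):
    """Translate a Windows path such as C:/Users/me/data into /mnt/c/Users/me/data.

    Assumes WSL's default drive mounting under /mnt.
    """
    path = PureWindowsPath(windows_path)
    drive = path.drive  # e.g. "C:"
    if len(drive) != 2 or drive[1] != ":" or not path.root:
        raise ValueError(
            f"expected an absolute path with a drive letter, got {windows_path!r}"
        )
    # path.parts[0] is the anchor (drive plus root); the remaining parts are
    # the directory and file names.
    return "/".join(["/mnt", drive[0].lower(), *path.parts[1:]])
```

Paths written into a manifest file need to be in the translated form if QIIME 2 itself is running inside WSL.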

Thank you @SoilRotifer. I was able to format my sequences and perform a taxonomy assignment. I used Greengenes for practice. Is it possible to use Greengenes and RDP classifiers for taxonomic classification?
If yes, can you please direct me on how I can do it?
Thank you again!

Yay! :tada:

There are several options. You can use the pre-made Greengenes or SILVA classifiers that are available on the Data resources page. You can also use RESCRIPt to compile your own SILVA or NCBI reference database.

We have plans to make it easier to import and format RDP and GTDB reference sequence and taxonomy data for use within QIIME 2, but at this time we cannot provide a time frame for when this will be completed. If you search the forum you'll find several threads in which others may have compiled RDP and GTDB files for use. They may be willing to share those files so that you can train your own classifier.

That is perfectly fine. @SoilRotifer, thank you very much for responding to me with such patience. I will try to use SILVA to classify my sequences.

I realized that what I have is a manifest file, not a metadata file. However, I need to perform alpha and beta diversity analyses based on the biological features of my samples. If you are familiar with this process, can you please help me find a way to do this?

I’d suggest working your way through the tutorials. The most full-featured one is the PD-Mice tutorial. When you’ve gone through that and/or the other tutorials, I’d recommend working through the Atacama-Soil tutorial as an example of analyzing data from paired-end reads.
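
The tutorials all rely on a sample metadata file, which is separate from the manifest: a tab-separated table whose first column holds the sample IDs and whose other columns describe each sample. Below is a minimal Python sketch that starts such a file from an existing manifest. It assumes the manifest has a `sample-id` column and uses the optional `#q2:types` row to declare each column as `categorical` or `numeric`. The function name and the example column names are hypothetical, and the empty cells are meant to be filled in by hand.

```python
import csv


def write_sample_metadata(manifest_path, metadata_path, columns):
    """Write a metadata TSV skeleton with one row per sample in the manifest.

    columns maps a (hypothetical) column name to its QIIME 2 column type,
    either "categorical" or "numeric".
    """
    for name, column_type in columns.items():
        if column_type not in ("categorical", "numeric"):
            raise ValueError(f"unsupported type {column_type!r} for column {name!r}")

    # Reuse the sample IDs from the manifest so both files agree.
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        sample_ids = [row["sample-id"] for row in reader]

    with open(metadata_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id"] + list(columns))
        writer.writerow(["#q2:types"] + list(columns.values()))
        for sample_id in sample_ids:
            # Empty cells are placeholders to be filled in for each sample.
            writer.writerow([sample_id] + [""] * len(columns))
    return sample_ids
```

For example, `write_sample_metadata("manifest.tsv", "metadata.tsv", {"treatment": "categorical", "ph": "numeric"})` would produce a table with two columns to fill in, where `treatment` and `ph` stand in for whatever biological features were recorded.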
