Raw Files and fastq files

mohsen_ej · October 28, 2020, 11:20am

I have some raw files, specifically FASTQ files, and I need to run alpha and beta diversity analyses. I don't know how to start the project. I think I first need to write the metadata, but I don't know how. Is there a tutorial or anything else that you think would be useful for me?
I have about 24 FASTQ files from 3 different animals.
Thank you

cherman2 · October 28, 2020, 9:13pm

Hey @mohsen_ej,
Great question! We do have a metadata tutorial. I have linked it here. I also recommend Keemei, which validates your metadata file so you can be sure QIIME 2 can use it. I hope that helps! Please let me know if you have any other questions!

mohsen_ej · October 28, 2020, 11:55pm

Thank you for your help.
Is there any way to convert demux.qza files to metadata?
I mean, I imported my data into one demux.qza. Can I get metadata from it, and is that reasonable?
Thank you

cherman2 · October 29, 2020, 6:55pm

Hello @mohsen_ej,
Unfortunately, no. We need a metadata file in order to get information about the sequences in your demux.qza.
I would really suggest reading that metadata tutorial, especially the first couple of paragraphs. It explains what the metadata is used for in QIIME 2.
In a previous question, Li gave a great example of what metadata should look like: here

mohsen_ej · October 30, 2020, 10:02pm

Thank you. I read the links you provided, but I still couldn't work out how to start writing the metadata. I mean, how do I convert something like the picture, which is all alphabetic characters, into numbers or something similar, and then into metadata? Sorry if this is simple for you, but I am new to QIIME :)).
Thank you

cherman2 · November 2, 2020, 5:20pm

Hello @mohsen_ej,
Getting your data set up in QIIME 2 is one of the hardest parts. The thing is that you can't really get anything for your metadata file from the demux or the FASTQ files. You will need the sample IDs that match the sequences, plus information about your study. The sample IDs and barcodes can't be gathered from the FASTQ files; that mapping should be provided by the sequencing center.

An example would be: I think that humans have a different microbiome than cats. So, I would collect samples from humans and samples from cats and sequence them. I would then create a metadata file that has the sample ID for each sample and information on whether that sample is a “cat” or a “human”. As you can see from my example, all the information really comes from my experimental design and not from the sequence files.
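
A minimal sketch of the cat/human example as a metadata file, written in Python. The sample IDs (`cat1`, `human1`, and so on) and the column name `host` are made up for illustration; real sample IDs must match the ones used when the sequences were imported. The sketch assumes the usual layout of a tab-separated file whose first column holds the sample IDs under the header `sample-id`.

```python
import csv
import io

# Hypothetical samples: the IDs and the "host" column are illustrative only.
samples = [
    ("cat1", "cat"),
    ("cat2", "cat"),
    ("human1", "human"),
    ("human2", "human"),
]

def build_metadata(rows):
    """Return tab-separated metadata text with the sample-id column first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample-id", "host"])  # header row
    writer.writerows(rows)                  # one row per sample
    return buffer.getvalue()

metadata_text = build_metadata(samples)

if __name__ == "__main__":
    # Write the file that would later be passed to QIIME 2 as metadata.
    with open("metadata.tsv", "w", newline="") as handle:
        handle.write(metadata_text)
```

Every value in the `host` column comes from the experimental design, not from the sequence files. The resulting file can still be checked with Keemei before use.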

Let me know if that helps.

mohsen_ej · November 3, 2020, 10:48pm

Yes, your guidance was really useful, thank you. The issue is that the data is already demultiplexed, so there are no barcodes. As for the sample IDs, there are only the sample file names assigned by the sequencer, for example Sy-Bi1B_S39_L001_R1_001. I think the data is paired-end. Should I merge R1 and R2 before writing the metadata? And should I include something like a barcode in the metadata when there is no barcode file?
Another question: I see that all the samples' sequences start with a specific sequence. What are these repeated sequences? For example, all the sequences start with CCGATCA.
Really sorry if my questions are so simple. I am new to QIIME.

cherman2 · November 4, 2020, 6:23pm

Hello @mohsen_ej,
Your questions are great so no need to apologize!

I think I understand the situation better with the added context. You are working with demultiplexed sequences, awesome! My follow-up question would be: have you imported your data into QIIME 2 yet? If not, I would suggest the Parkinson’s Mouse Tutorial, which works through importing demultiplexed sequences. That tutorial uses single-end reads, so you will have to adjust it for your paired-end data. You will need a manifest file in order to import demultiplexed sequences. I would look at this tutorial section because it will help you create one. After I have created a manifest file, I usually turn it into a metadata file (when needed) by adding experimental design info, for example (referring to the example in my last answer) whether the sample came from a cat or a human.
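
For paired-end files named in the Illumina style mentioned earlier in this thread (for example `Sy-Bi1B_S39_L001_R1_001`), building a manifest can be sketched in a few lines of Python. The assumptions here are mine, not from the tutorial: the files end in `.fastq.gz`, the sample ID is everything before the `_S<number>_L<lane>_R<read>_001` suffix, and the manifest uses the paired-end columns `sample-id`, `forward-absolute-filepath`, and `reverse-absolute-filepath`. The `/data/...` paths are hypothetical.

```python
import re
from pathlib import Path

# Assumed Illumina-style name: <sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz
NAME_PATTERN = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d+_R(?P<read>[12])_001\.fastq\.gz$"
)

def build_manifest(paths):
    """Pair R1/R2 files by sample ID and return tab-separated manifest text."""
    pairs = {}
    for path in map(Path, paths):
        match = NAME_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming
        pairs.setdefault(match["sample"], {})[match["read"]] = path
    lines = ["sample-id\tforward-absolute-filepath\treverse-absolute-filepath"]
    for sample in sorted(pairs):
        reads = pairs[sample]
        if set(reads) != {"1", "2"}:
            raise ValueError(f"sample {sample} is missing R1 or R2")
        lines.append(
            f"{sample}\t{reads['1'].absolute()}\t{reads['2'].absolute()}"
        )
    return "\n".join(lines) + "\n"

# Hypothetical file paths, used only to show the shape of the output.
example_paths = [
    "/data/Sy-Bi1B_S39_L001_R1_001.fastq.gz",
    "/data/Sy-Bi1B_S39_L001_R2_001.fastq.gz",
]
print(build_manifest(example_paths))
```

For real data, the list of paths could come from something like `Path("your_directory").glob("*.fastq.gz")`, and the returned text would be saved to a file such as `manifest.tsv`. Adding experimental design columns to the same sample IDs then gives the metadata file.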

You can also look at the Atacama soil tutorial, which uses paired-end reads for data analysis. It will give you a good idea of how to adjust the Parkinson's mouse tutorial, but I would not suggest importing your data the way it does, given that you don't have a barcode file.

As for the repeated sequences, they may be PCR primers, but I am not sure without more information. I would ask whoever gave you the data whether the primers are still in the sequences. If the data is raw, it may still contain them. Having primers in the sequences is not a big deal; they can be removed later.
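
One way to check how consistently a prefix such as `CCGATCA` appears at the start of the reads is to count the leading bases of each read. A minimal Python sketch, assuming standard four-line FASTQ records; the two reads in the example are made up, and the prefix length of 7 is simply the length of the sequence mentioned above.

```python
import gzip
from collections import Counter
from itertools import islice

def count_prefixes(lines, length=7, max_reads=10000):
    """Count the first `length` bases of each read in an iterable of FASTQ lines."""
    # In four-line FASTQ, the sequence is the second line of every record.
    sequences = islice(lines, 1, None, 4)
    counts = Counter()
    for sequence in islice(sequences, max_reads):
        counts[sequence.strip()[:length]] += 1
    return counts

def open_fastq(path):
    """Open a plain or gzipped FASTQ file in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path)

# Made-up records, used only to show how the counting works.
example = """@read1
CCGATCAGGTT
+
IIIIIIIIIII
@read2
CCGATCATTAC
+
IIIIIIIIIII
""".splitlines()

print(count_prefixes(example).most_common(1))  # [('CCGATCA', 2)]
```

For a real file, the same function can be given an open handle, for example `with open_fastq("your_file.fastq.gz") as handle: count_prefixes(handle)`. A single prefix dominating the counts is consistent with a primer or other adapter-like sequence, but it does not identify which one; the sequencing center can confirm that.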

I hope this helps!

thermokarst · November 5, 2020, 1:10am

A post was split to a new topic: Trouble importing demultiplexed FASTQ files
