How do I import combined R1.fastq and R2.fastq files into QIIME 2?

Aqleem12 · October 2, 2017, 8:38pm

The sequencing company sent me R1.fastq and R2.fastq files for each sample. I want to import these files into QIIME 2 and start my analysis. However, I do not know how to import them or where to begin the analysis.

thermokarst · October 2, 2017, 9:56pm

Hi @Aqleem12! This sounds like the perfect use-case for our Fastq Manifest Format! This format is specifically for sequences that are already demultiplexed, but don't necessarily adhere to the Casava 1.8 naming convention. Since you have paired-end data, you will want to pay close attention to the paired-end portions of that section. There are also a few examples of this sprinkled throughout the forum.

Let us know if you get stuck!
Thanks!

Aqleem12 · October 3, 2017, 3:44am

Dear Sir,

I ran the following command for my combined R1.fastq and R2.fastq files and got the result below. There was a problem importing pe-64-manifest. If someone could help me make a new directory and import the paired-end sequence data, I would be thankful.

(qiime2-2017.9) [sclerotinia@jianglab Abbas]$ qiime tools import \
>   --type 'SampleData[PairedEndSequencesWithQuality]' \
>   --input-path pe-64-manifest \
>   --output-path paired-end-demux.qza \
>   --source-format PairedEndFastqManifestPhred64
There was a problem importing pe-64-manifest:

  pe-64-manifest is not a file.
(qiime2-2017.9) [sclerotinia@jianglab Abbas]$

thermokarst · October 3, 2017, 3:53am

Hi @Aqleem12, I have a few questions to help us with diagnosis:

1. It sounds like you are using the PHRED 64 offset format. Are you sure your data is in this format? PHRED 33 seems to be much more common now, but we do still see PHRED 64 with legacy data. If you don’t know, I would suggest starting with PHRED 33.
2. Can you provide the file you named pe-64-manifest (either attach it or copy and paste its contents)?
3. Can you provide a screenshot of the directory (or directories) with your sequence data? That way we have a general idea of the file layout and naming. These values are very important when using the Manifest Formats.
4. What is the location of your sequences directory on your computer? You can run the command pwd when your terminal is open in that location and provide the results here.
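
If you are not sure which offset your files use, one rough check is to look at the lowest quality character in a file. The Python sketch below is only a heuristic and is not part of QIIME 2; the function name guess_phred_offset and the max_records limit are made up for illustration. It assumes plain 4-line FASTQ records. Characters with ASCII codes below 59 cannot occur in 64-offset data, so seeing one points to PHRED 33; if nothing below 64 shows up, PHRED 64 is likely; anything in between is reported as unknown.

```python
import gzip
from itertools import islice


def guess_phred_offset(fastq_path, max_records=10000):
    """Guess the PHRED offset of a FASTQ file: '33', '64', or 'unknown'."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    lowest = None
    with opener(fastq_path, "rt") as handle:
        # Assumes 4-line records, so the quality string is every 4th line.
        for quality in islice(handle, 3, max_records * 4, 4):
            quality = quality.rstrip("\r\n")
            if quality:
                code = min(map(ord, quality))
                lowest = code if lowest is None else min(lowest, code)
    if lowest is None:
        return "unknown"  # no quality lines were read
    if lowest < 59:
        return "33"  # too low for any 64-offset encoding
    if lowest >= 64:
        return "64"  # heuristic: 33-offset data rarely lacks low scores
    return "unknown"
```

Calling guess_phred_offset on one of your R1 files and getting "33" would suggest the PairedEndFastqManifestPhred33 source format is the one to try first.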

Thanks!

Aqleem12 · October 3, 2017, 1:36pm

[Attached image: PF data]

Dear Sir, the image is attached to this reply. I am connecting from my laptop to a server. The main directory I created is Abbas, and for the analysis I used the paired-end sequence format as described on the QIIME 2 website. Maybe I am unable to create the input path for this data. If the main directory is Abbas, what should the input path be, and what format should I use for the attached files in QIIME 2? I mean, where do I start? I just made a main directory named Abbas, then directly ran the command given on your website, and it failed.

thermokarst · October 3, 2017, 1:39pm

Thanks for the screenshot, @Aqleem12! Can you also provide answers to question 1 & question 2 above? Thanks!

Aqleem12 · October 3, 2017, 2:09pm

I created a directory (using Linux) named Abbas, then created a subdirectory named pe-64-manifest, and then copied the files shown in the screenshot I sent you into that subdirectory (pe-64-manifest). After copying, I ran the commands shown in this screenshot.

Maybe I am making mistakes in creating the subdirectories, or even in the format.

thermokarst · October 3, 2017, 2:20pm

Ah, I see! Your manifest file should be a file, not a directory. The contents of the file will be a CSV, with your sample IDs, the paths to your fastq files, and the read directions. You will need to make the manifest file yourself, since QIIME 2 isn’t able to infer most of that information on its own. Please review the section Fastq Manifest Formats (which appears to be what you have printed out in your attached photo); it provides some background and guidance on how to prepare your manifest file.

Aqleem12 · October 3, 2017, 2:51pm

Dear Sir,

How can I make a manifest file? I have read the manifest format section several times, but without success. Are there any specific commands for creating a manifest file? What is the file path in the manifest format section? I would be thankful for your help.

thermokarst · October 4, 2017, 3:14pm

Well, like I said earlier, you will need to make this file yourself, but here is an attempt at a starting point, based on what you have provided:

sample-id,absolute-filepath,direction
D4,/path/to/your/files/D4_combined_R1.fastq,forward
D4,/path/to/your/files/D4_combined_R2.fastq,reverse
D8,/path/to/your/files/D8_combined_R1.fastq,forward
D8,/path/to/your/files/D8_combined_R2.fastq,reverse
D12,/path/to/your/files/D12_combined_R1.fastq,forward
D12,/path/to/your/files/D12_combined_R2.fastq,reverse
D16,/path/to/your/files/D16_combined_R1.fastq,forward
D16,/path/to/your/files/D16_combined_R2.fastq,reverse
# ... fill in the rest of your files here

This is assuming that R1 are your forward reads and R2 are reverse. Also, you will need to update /path/to/your/files to match the file path to your files, on your computer.
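
If you have many samples, typing the manifest by hand gets tedious. Here is a rough Python sketch that builds the same CSV from a directory of files. It assumes every file is named like <sample>_combined_R1.fastq or <sample>_combined_R2.fastq, as in the example above. The function name write_manifest and the naming pattern are my own placeholders, not anything QIIME 2 provides, so adjust them to match your files.

```python
import csv
import re
from pathlib import Path

# Assumed naming pattern: <sample-id>_combined_R1.fastq / _R2.fastq
PATTERN = re.compile(r"^(?P<sample>.+)_combined_R(?P<read>[12])\.fastq$")
DIRECTIONS = {"1": "forward", "2": "reverse"}


def write_manifest(fastq_dir, manifest_path):
    """Write a paired-end manifest CSV; return the number of samples found."""
    rows = []
    for path in sorted(Path(fastq_dir).resolve().iterdir()):
        match = PATTERN.match(path.name)
        if match:  # files that do not fit the pattern are skipped
            direction = DIRECTIONS[match["read"]]
            rows.append((match["sample"], str(path), direction))

    # Every sample needs exactly one forward and one reverse file.
    samples = {}
    for sample, _, direction in rows:
        samples.setdefault(sample, []).append(direction)
    for sample, directions in samples.items():
        if sorted(directions) != ["forward", "reverse"]:
            raise ValueError(f"sample {sample} has reads: {directions}")

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        writer.writerows(rows)
    return len(samples)
```

For example, write_manifest("/path/to/your/files", "aqleem12-manifest.csv") would write one forward and one reverse row per sample, using absolute paths, and stop with an error if a sample is missing one of its two files.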

Assuming you save that CSV manifest as aqleem12-manifest.csv, you could run the following import command, from the same directory as the manifest:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path aqleem12-manifest.csv \
  --output-path aqleem12-paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred33

Also, I noticed your screenshot of your files appears to be from a Windows computer. It sounds like you might be running QIIME 2 on a different server, but just to make sure it is clear: QIIME 2 is not supported natively on Windows, so you will need to use a virtual machine if you want to use QIIME 2 on Windows.

Hope that helps!

Aqleem12 · October 5, 2017, 12:28am

Dear Sir,

Thank you for your support and timely help. I will follow the format that you sent me. If something goes wrong, I will contact you again. Thanks.

Aqleem12 · October 5, 2017, 7:46am

Dear Sir,
I have made a manifest file covering my samples. I have two groups of samples: the R samples come from rice fields and the D samples come from dry fields. I made only one manifest file for both the R and D samples, as seen in the screenshot. I do not know whether the format is the same for both, but I got a .qza file. Please take the time to look at the screenshot: is this right or not? I have put everything in the screenshot so that you can understand and guide me further. I created the manifest file in Notepad. Thanks for your support.

thermokarst · October 5, 2017, 3:42pm

Looks like you are off to a great start, @Aqleem12! You can begin to explore your data to get a sense of whether or not this import step was successful. You can pick up at the Demux Step in the Moving Pictures Tutorial, but skip the first command (qiime demux emp-single): you don’t need to demultiplex your data, because it was already demultiplexed prior to importing into QIIME 2. So, the next step would be to run the qiime demux summarize command and evaluate your results. If they appear reasonable (this is really up to your interpretation; I would recommend looking at the sample included in the tutorial to get a sense of what “typical” data might look like here), you can continue on with Sequence quality control and feature table construction. Hope that helps!

Aqleem12 · October 6, 2017, 12:47am

Dear Sir,

Thanks for your help. Following your suggestions and format, I ran qiime demux summarize and got a .qzv file. What is the next step? Could you also explain why we do demux, and why we start from a manifest file? Can we make the manifest file in Notepad or WordPad? I have attached a screenshot of the latest step. If it is right, what would be the next step?

Thanks for your regular suggestions and help!!

thermokarst · October 6, 2017, 10:13pm

Hi @Aqleem12!

You're welcome!

I highly recommend reviewing the Moving Pictures Tutorial to learn about some of the common steps in a typical QIIME 2 analysis. In fact, we recommend that you perform the Moving Pictures Tutorial with the Moving Pictures data, prior to attempting to run your own data through QIIME 2.

Take a look at the Getting Started Guide; it provides a high-level overview of QIIME 2 and of how to analyze your microbiome data. Answers to these kinds of questions exist throughout the QIIME 2 docs! Specifically, in this case, we run the summarize method of the demux plugin to get a high-level understanding of our sequences, prior to quality control and read merging.

You start from a manifest file because your data is already demultiplexed (see the EMP protocol format for an example of data that is multiplexed). The manifest file is a general-purpose way of telling QIIME 2 which sample ID to assign to each set of sequences and, in your case, whether your reads are in forward or reverse orientation.

You can write your manifest file in any text editor or spreadsheet software! The requirements for QIIME 2 are specified in the docs, so as long as your file meets that specification, you are good to go!
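
Whichever editor you use, it can help to sanity-check the file before importing. The Python sketch below is my own checklist, not the official QIIME 2 validation, and the function name check_manifest is hypothetical. It checks that the manifest is a file rather than a directory, that the header row matches, that each path is absolute and points at an existing file, and that each sample has one forward and one reverse read. It opens the file with the utf-8-sig codec so that a byte-order mark added by some Windows editors does not trip the header check; that is an assumption about the editor, not a statement about what QIIME 2 itself accepts.

```python
import csv
import os

EXPECTED_HEADER = ["sample-id", "absolute-filepath", "direction"]


def check_manifest(manifest_path):
    """Return a list of problems in a paired-end manifest (empty if none)."""
    if not os.path.isfile(manifest_path):
        return [f"{manifest_path} is not a file"]

    with open(manifest_path, newline="", encoding="utf-8-sig") as handle:
        # Skip blank lines and '#' comment lines before parsing the CSV.
        lines = [ln for ln in handle if ln.strip() and not ln.startswith("#")]
    rows = list(csv.reader(lines))
    if not rows or rows[0] != EXPECTED_HEADER:
        return [f"header should be {','.join(EXPECTED_HEADER)}"]

    problems = []
    directions = {}
    for row in rows[1:]:
        if len(row) != 3:
            problems.append(f"expected 3 columns, got: {row}")
            continue
        sample, path, direction = (field.strip() for field in row)
        if not os.path.isabs(path):
            problems.append(f"{path} is not an absolute path")
        elif not os.path.isfile(path):
            problems.append(f"{path} does not exist or is not a file")
        directions.setdefault(sample, []).append(direction)

    # Each sample should have exactly one forward and one reverse entry.
    for sample, seen in directions.items():
        if sorted(seen) != ["forward", "reverse"]:
            problems.append(f"{sample}: expected forward and reverse, got {seen}")
    return problems
```

An empty list from check_manifest("aqleem12-manifest.csv") means none of these particular checks failed; it does not guarantee that the import itself will succeed.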

Please take a look at the Getting Started Guide. It provides a suggested order in which to review the QIIME 2 docs, so that you can familiarize yourself with the software and with some general microbiome bioinformatics information!

The screenshot you have provided indicates that your initial problem has been solved! If you need additional support (after you have read and reviewed the resources available to you at https://docs.qiime2.org), feel free to create a new topic here on the forum. The original problem posed in this topic appears to be solved, and we ask that users of the forum create new topics for new (or follow-up) questions. Thanks!

Aqleem12 · October 6, 2017, 10:59pm

Dear Sir, thanks for your regular support. I will follow the Getting Started Guide. Thanks again.

system · November 7, 2017, 5:00am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.