Importing FASTQ


(Lena Lapidot) #1

Hello, and thank you for the great QIIME2 and for the support in this community.
I have a question regarding fasta files.
I am used to doing the analysis with all the fasta files combined into one file for the forward reads and one file for the reverse reads.
Recently I received data with separate F and R files for each sample, resulting in a large number of individual fasta files…
How do I begin the analysis in this case?

Thank you,
Lena


(Mehrbod Estaki) #3

Hi @LenaLapidot,
It sounds like you have demultiplexed files which are easily importable into qiime2 with the manifest approach. Have a look through the importing tutorial to see how you can import these.


(Mehrbod Estaki) #7

Just a follow-up on that, @LenaLapidot: you mentioned you had FASTA files. Did you mean FASTQ, or do you actually have FASTA files, which lack quality scores? The manifest approach is meant to work with FASTQ files, and there currently isn’t a way to import demultiplexed FASTA files into qiime2. Your best bet in that scenario would be to backtrack and get either the raw reads or FASTQ files to import into qiime2. Sorry if this caused confusion!


(Lena Lapidot) #8

Hi,
Thank you for the answer.
I did mean that I have fastq files. Sorry for the confusion.

I thought that the manifest approach is the one…but I have over 100 fastq files, with an F and an R for each sample.
I have some more questions (I’m relatively new to the world of bioinformatics)…

  1. Do I choose F or R for each file? Do I use both?
  2. Do I build the csv file manually, adding each fastq file one by one?
    In the example in the tutorial there was only a small number of fastq files…

Thank you!
Lena


(Mehrbod Estaki) #10

Hi @LenaLapidot,
No problem!
You can use both F and R reads for each sample, or you can import just one of them. The forward reads usually have better quality than the reverse, so if you were choosing only one, the forward reads are probably your best bet. That being said, I would recommend starting with both F and R, referred to as paired-end, and following the pipeline for those.
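For reference, a paired-end import could look roughly like the sketch below. It mirrors the single-end command used later in this thread, with the paired-end type and manifest format swapped in. The function name `import_paired_end` and the file names are placeholders, and Phred33 is an assumption about how the quality scores are encoded.

```shell
# Hypothetical wrapper around the paired-end manifest import.
# $1: path to the manifest file, $2: name of the .qza artifact to write.
import_paired_end() {
    local manifest="$1" output="$2"
    qiime tools import \
      --type 'SampleData[PairedEndSequencesWithQuality]' \
      --input-path "$manifest" \
      --output-path "$output" \
      --input-format PairedEndFastqManifestPhred33   # assumes Phred33 quality scores
}
```

It would be called as `import_paired_end manifest.csv paired-end-demux.qza`, assuming the manifest lists both a forward and a reverse file for every sample.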
And yes, you would have to build your manifest file manually unless you write some custom code to generate it automatically. For example, see one approach to automatic manifest file creation using R posted here.
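If R is not your thing, a small shell function can do the same job. This is only a sketch: it assumes Illumina-style file names ending in `_R1_001.fastq.gz` or `_R2_001.fastq.gz`, sample IDs that contain no underscores (everything before the first `_` is taken as the ID), and the comma-separated manifest layout with a `sample-id,absolute-filepath,direction` header. `build_manifest` is a made-up name.

```shell
# Print a manifest for every *_R1_001.fastq.gz / *_R2_001.fastq.gz file in a directory.
# $1: directory holding the demultiplexed FASTQ files.
build_manifest() {
    local fastq_dir f base sample direction
    # The manifest needs absolute paths, so resolve the directory first.
    fastq_dir="$(cd "$1" && pwd)" || return 1
    printf 'sample-id,absolute-filepath,direction\n'
    for f in "$fastq_dir"/*_R[12]_001.fastq.gz; do
        [ -e "$f" ] || continue          # nothing matched the pattern
        base="$(basename "$f")"
        sample="${base%%_*}"             # assumption: ID is the text before the first "_"
        case "$base" in
            *_R1_001.fastq.gz) direction=forward ;;
            *)                 direction=reverse ;;
        esac
        printf '%s,%s,%s\n' "$sample" "$f" "$direction"
    done
}
```

Running `build_manifest /path/to/fastq-folder > manifest.csv` would then write one line per file. It is worth opening the result and checking a few sample IDs by eye before importing.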

Overall, if this is your first go at analyzing this type of data, I would strongly recommend doing some additional reading first to understand the process; otherwise it might get a bit overwhelming. For example, the qiime2 for dummies tutorial is a good start, and this documentation is even more comprehensive and theoretical. Good luck!


(Lena Lapidot) #15

Thank you for the explanation and for the links.
I’ll start digging in :slight_smile:


(Lena Lapidot) #16

Hi again,
I completed the manifest file. Here are the first lines from the file:
J-01,/Users/lenalapidot/Documents/PSC-IBD-J_Sabino/wetransfer-df0ab3/J-01_S105_L001_R1_001.fastq.gz,forward
J-02,/Users/lenalapidot/Documents/PSC-IBD-J_Sabino/wetransfer-df0ab3/J-02_S106_L001_R1_001.fastq.gz,forward

I now have a manifest.txt file with all the samples.
When I try to run the following command, I get an error:
(qiime2-2018.8.0) ➜ wetransfer-df0ab3 qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path manifedt.txt \
  --output-path single-end-demux.qza \
  --input-format SingleEndFastqManifestPhred33
There was a problem importing manifedt.txt:

manifedt.txt is not a(n) SingleEndFastqManifestPhred33 file

Is there a problem with the manifest?
Thank you,
Lena


(Mehrbod Estaki) #17

Hi @LenaLapidot,
Is it possible that you’re just mistyping the manifest.txt since it shows as manifedt.txt in your input commands?
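Beyond the file name, a few quick checks on the manifest contents may help narrow things down. The sketch below assumes the comma-separated layout from the importing tutorial (a `sample-id,absolute-filepath,direction` header, then one line per file) and does not replicate QIIME 2's own validation. `check_manifest` is a hypothetical helper, and paths written with environment variables such as `$PWD` are not expanded, so they would be reported as missing.

```shell
# Report obvious problems in a CSV manifest; returns non-zero if any were found.
# $1: path to the manifest file.
check_manifest() {
    local manifest="$1" line_no=0 problems=0 header_seen=0
    local sample filepath direction extra
    while IFS=, read -r sample filepath direction extra || [ -n "$sample" ]; do
        line_no=$((line_no + 1))
        # Skip comment lines and completely empty lines.
        case "$sample" in '#'*) continue ;; esac
        if [ -z "$sample$filepath$direction$extra" ]; then continue; fi
        # The first remaining line should be the header.
        if [ "$header_seen" -eq 0 ]; then
            header_seen=1
            if [ "$sample,$filepath,$direction" = "sample-id,absolute-filepath,direction" ]; then
                continue
            fi
            echo "line $line_no: expected header 'sample-id,absolute-filepath,direction'"
            problems=$((problems + 1))
        fi
        if [ -n "$extra" ]; then
            echo "line $line_no: more than three comma-separated fields"
            problems=$((problems + 1))
        fi
        if [ ! -e "$filepath" ]; then
            echo "line $line_no: file not found: $filepath"
            problems=$((problems + 1))
        fi
        case "$direction" in
            forward|reverse) ;;
            *) echo "line $line_no: direction is '$direction', expected forward or reverse"
               problems=$((problems + 1)) ;;
        esac
    done < "$manifest"
    [ "$problems" -eq 0 ]
}
```

Running `check_manifest manifest.txt` prints one line per problem found and prints nothing when these basic checks pass.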


(Lena Lapidot) #18

I also tried this:
wetransfer-df0ab3 qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path manifest.csv \
  --output-path single-end-demux.qza \
  --input-format SingleEndFastqManifestPhred64
There was a problem importing manifest.csv:

manifest.csv is not a(n) SingleEndFastqManifestPhred64 file


(Matthew Ryan Dillon) #19

Hey there @LenaLapidot! If you upgrade to QIIME 2 2018.11, you will get a more detailed error message for this error; hopefully it will indicate exactly what is wrong. Any chance you can install 2018.11 and try again? If you are still stuck, please share the 2018.11-generated error message. Thanks! :t_rex:


(Lena Lapidot) #20

I updated to qiime2-2018.11 and it worked. Thank you!