Unknown input file format

mshelomi · May 22, 2018, 12:54pm

Hello,

I have clean, quality-controlled FASTQ files named like A02.T0.fastq that start like this:

@A02.T0_1 HISEQ:722:H5WK5BCX2:1:1101:20594:3175 1:N:0:ACCTCCAA orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0

There is no barcode file; these FASTQ files are all I have.

I was told to import them as SampleData[SequencesWithQuality] or SampleData[JoinedSequencesWithQuality], but I get this error:

Missing one or more files for SingleLanePerSampleSingleEndFastqDirFmt: '.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz'
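The quoted pattern is a regular expression for Casava 1.8-style file names (for example SampleA_S1_L001_R1_001.fastq.gz), so a file called A02.T0.fastq can never satisfy it. A minimal Python sketch of that check, for illustration only: the second file name is a hypothetical rename, and matching the whole name with re.fullmatch is an assumption about how the pattern is applied (a name without _L001_R1_001 fails either way).

```python
import re

# Pattern copied verbatim from the error message (Casava 1.8-style names).
CASAVA_PATTERN = re.compile(r".+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz")


def matches_casava(filename):
    """Return True if the whole file name matches the expected pattern."""
    return CASAVA_PATTERN.fullmatch(filename) is not None


if __name__ == "__main__":
    # The first name is from this thread; the second is a hypothetical
    # Casava-style rename of the same sample.
    for name in ["A02.T0.fastq", "A02-T0_S1_L001_R1_001.fastq.gz"]:
        print(name, matches_casava(name))
```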

I am using "qiime tools import" and have tried every --type listed by "qiime tools import --show-importable-types". None of them work.

What am I missing?

Thanks

mshelomi · May 22, 2018, 12:54pm

What is a "manifest"?
What format does a manifest use?
How can I make a manifest for my fastq.gz files in a text editor?

Mehrbod_Estaki · May 22, 2018, 5:42pm

Hi @mshelomi,

The manifest file is just a simple CSV file with 3 specific columns and is used to import already-demultiplexed files. Have you had a chance to go over the manifest import tutorial yet? I think it answers all your questions pretty well.
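For reference, here is a sketch of generating such a manifest with Python's csv module. It assumes the comma-separated layout used by the 2018.x manifest formats: a header line sample-id,absolute-filepath,direction, then one row per FASTQ file with its absolute path and either forward or reverse. The file names in the usage part are hypothetical, and the same text can just as well be typed in a text editor.

```python
import csv
import io
import os

VALID_DIRECTIONS = {"forward", "reverse"}


def build_manifest(rows):
    """Return the text of a comma-separated manifest.

    rows: iterable of (sample_id, filepath, direction) tuples.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample-id", "absolute-filepath", "direction"])
    for sample_id, path, direction in rows:
        if direction not in VALID_DIRECTIONS:
            raise ValueError(
                f"direction must be forward or reverse, got {direction!r}"
            )
        # The second column must hold an absolute path.
        writer.writerow([sample_id, os.path.abspath(path), direction])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical file names: one sample with forward and reverse reads.
    print(build_manifest([
        ("A02T0", "A02.T0_1.fastq", "forward"),
        ("A02T0", "A02.T0_2.fastq", "reverse"),
    ]), end="")
```

Building the text rather than writing the file directly makes it easy to inspect before redirecting it into MANIFEST.csv.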

mshelomi · May 23, 2018, 4:31am

Thanks, but the link did not help.

The error I am getting says that my file is not a "FastqManifestFormat".

The link you sent, Importing data — QIIME 2 2018.4.0 documentation, lists only four possible formats:
SingleEndFastqManifestPhred33
SingleEndFastqManifestPhred64
PairedEndFastqManifestPhred33
PairedEndFastqManifestPhred64

My CSV is in the format you described; it is attached: MANIFEST.csv (19.8 KB)

The import page also doesn't mention metadata requirements, or that the metadata must (sometimes) be in YAML format. Nor does it say what format the file names or sample names must take.

Mehrbod_Estaki · May 23, 2018, 5:05am

Hi @mshelomi,

Sorry you're still stuck on this and thanks for providing the manifest file!

There are 2 things I can see that would get in the way of a successful import at the moment.

If your files are named as in your first post (e.g. A02.T0.fastq), then they are not gzipped; however, the file paths in your manifest point to gzipped (.fastq.gz) files. Either gzip your individual files first, or remove the .gz extensions from the file paths in the manifest. The manifest import will then gzip the files itself.
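To see which situation applies, it can help to check each manifest path on disk. A minimal Python sketch, assuming the paths have already been read out of the manifest; diagnose and gzip_file are hypothetical helper names, and the check relies on the gzip magic bytes (1f 8b) rather than the file extension.

```python
import gzip
import os
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def is_gzipped(path):
    """Check the file's magic bytes instead of trusting its extension."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def diagnose(path):
    """Return a short description of a problem with one manifest path, or None."""
    if not os.path.exists(path):
        if path.endswith(".gz") and os.path.exists(path[:-3]):
            return "listed as .gz, but only the uncompressed file exists"
        return "file not found"
    if path.endswith(".gz") and not is_gzipped(path):
        return "named .gz but not gzip-compressed"
    if not path.endswith(".gz") and is_gzipped(path):
        return "gzip-compressed but missing the .gz extension"
    return None


def gzip_file(src, dst):
    """Write a gzip-compressed copy of src to dst, e.g. A02.T0.fastq -> A02.T0.fastq.gz."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

gzip_file is one way to take the "gzip first" route; the command-line gzip tool does the same job.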

The second thing I noticed was that in your manifest file your sample-ids are all unique instead of being paired. For example, these two rows:

A02T0-1 <- forward reads
A02T0-2 <- reverse reads

need to share an identical sample-id (e.g. A02T0) so the importer knows they belong to the same sample.
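One way to catch this before importing is to group the manifest rows by sample-id and require exactly one forward and one reverse file per sample. A sketch, assuming the sample-id,absolute-filepath,direction header; strip_read_suffix is a hypothetical fix-up that only fits ids following the -1/-2 convention seen in this thread.

```python
import csv
import io
import re
from collections import defaultdict


def check_pairing(manifest_text):
    """Return {sample_id: directions} for every sample that does not have
    exactly one forward and one reverse file."""
    directions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(manifest_text)):
        directions[row["sample-id"]].append(row["direction"])
    return {
        sample_id: dirs
        for sample_id, dirs in directions.items()
        if sorted(dirs) != ["forward", "reverse"]
    }


def strip_read_suffix(sample_id):
    """Hypothetical fix-up: drop a trailing -1 or -2 read marker,
    e.g. 'A02T0-1' -> 'A02T0'."""
    return re.sub(r"-[12]$", "", sample_id)


if __name__ == "__main__":
    manifest = (
        "sample-id,absolute-filepath,direction\n"
        "A02T0-1,/data/A02.T0_1.fastq,forward\n"
        "A02T0-2,/data/A02.T0_2.fastq,reverse\n"
    )
    # Both ids are flagged because each has only one direction.
    print(check_pairing(manifest))
```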

Once these are fixed, the following command should work:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path MANIFEST.csv \
  --output-path pe-demux.qza \
  --source-format PairedEndFastqManifestPhred33

Here I am guessing that your quality scores are Phred33-encoded, as they are on most recent Illumina sequencers, but you could double-check that with your sequencing facility or operator.
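Besides asking the facility, a rough heuristic is to look at the lowest quality character in the file: characters below ';' (ASCII 59) cannot occur in Phred+64 data, while a minimum of '@' (64) or higher suggests Phred+64. This Python sketch is only a heuristic, assuming plain 4-line FASTQ records; uniformly high-quality Phred+33 reads can also have a minimum of 64 or above, so treat the answer as a hint, not a verdict.

```python
def guess_phred_offset(lines, max_records=10000):
    """Heuristically guess the FASTQ quality offset (33 or 64).

    Assumes plain 4-line records (no wrapped sequences). Returns None when
    the observed characters fit both encodings.
    """
    lowest = None
    for i, line in enumerate(lines):
        if i >= 4 * max_records:
            break
        if i % 4 == 3:  # the 4th line of each record holds the quality string
            qual = line.rstrip("\r\n")
            if qual:
                low = min(ord(c) for c in qual)
                lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        raise ValueError("no quality lines found")
    if lowest < 59:   # below ';' (59): impossible in Phred+64/Solexa data
        return 33
    if lowest >= 64:  # '@' (64) and up: probably Phred+64, but not certain
        return 64
    return None       # 59-63: ambiguous
```

For a gzipped file, pass gzip.open(path, "rt") instead of open(path); only the first max_records records are examined.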

The metadata file, which provides environmental information about your samples, is not required at the import step. Instead, it is provided separately downstream as needed, depending on the analysis you want to do. More details, including its formatting and validation, are explained here.

Hope this helps!

mshelomi · May 23, 2018, 12:49pm

Thanks.
I can now import the files.
Unfortunately, the .qza made from the raw files cannot be demultiplexed because it "is not a subtype of EMPPairedEndSequences | EMPSingleEndSequences | RawSequences".
And the .qza made from the clean files cannot be aligned because it "is not a subtype of FeatureData[Sequence]".

But that is a separate question. Thanks for the help so far!

system · June 23, 2018, 6:49pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.