Input File Type (casava-18-paired-end-demultiplexed)

ninaxhua · May 22, 2018, 1:56pm

Would this file be categorized as casava-18-paired-end-demultiplexed format if it starts with this:

@M02825:29:000000000-AT791:1:1101:21434:2058 1:N:0:22

There are 2 files (forward and reverse) and no barcode file.

Mehrbod_Estaki · May 22, 2018, 5:51pm

Are there 2 files per sample or 2 files in total that have all the samples within them? If the former, then I would say yes. If there are only 2 files in total for all of your samples then the reads are not demultiplexed and you would have to use a different importing format.
The first line you are displaying just holds some information about the run itself.
See here for a more detailed description of what that line shows.
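
As a rough illustration of what that header line encodes, here is a minimal Python sketch that splits it into the fields Illumina documents for Casava 1.8+ headers (instrument, run number, flowcell ID, lane, tile, x, y, then read number, filter flag, control number, and index or sample number). The class and function names are made up for this example and are not part of QIIME 2.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    read: int            # 1 = forward read, 2 = reverse read
    is_filtered: bool    # True when the flag is "Y" (read failed the filter)
    control_number: int
    index: str           # sample number or index sequence


def parse_casava18_header(line: str) -> IlluminaHeader:
    """Split a Casava 1.8+ FASTQ header into its named fields (hypothetical helper)."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    # The header has two space-separated halves, each colon-delimited.
    left, right = line[1:].strip().split(" ", 1)
    instrument, run, flowcell, lane, tile, x_pos, y_pos = left.split(":")
    read, filtered, control, index = right.split(":")
    return IlluminaHeader(
        instrument, int(run), flowcell, int(lane), int(tile),
        int(x_pos), int(y_pos), int(read), filtered == "Y", int(control), index,
    )


header = parse_casava18_header("@M02825:29:000000000-AT791:1:1101:21434:2058 1:N:0:22")
print(header)
```

For the line posted above, this reports read 1 (forward), not filtered, with 22 in the index field. None of it tells you whether the files are demultiplexed; that depends on how the files are organized.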

ninaxhua · May 22, 2018, 6:05pm

There are 2 files per sample, so then it would be casava-18-paired-end-demultiplexed formatted?

Mehrbod_Estaki · May 22, 2018, 6:35pm

It would be if the individual sample file names match the formatting described here, as per the Casava 1.8 format. Otherwise, I believe you can import with a paired-end manifest format in the appropriate Phred variant.
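
If it helps, here is a small Python sketch for checking file names. It assumes the Casava 1.8 naming convention looks like L2S357_15_L001_R1_001.fastq.gz (sample ID, barcode or barcode ID, lane, read direction, set number); the regular expression is my own approximation of that convention, not something copied from QIIME 2, so treat it as a sanity check only.

```python
import re

# Assumed pattern: <sample>_<barcode>_L<lane>_R<1|2>_<set>.fastq.gz
CASAVA18_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d{3})_R(?P<read>[12])_(?P<set>\d{3})\.fastq\.gz$"
)


def looks_like_casava18(filename: str) -> bool:
    """Return True if the file name fits the assumed Casava 1.8 layout."""
    return CASAVA18_NAME.match(filename) is not None


for name in ("L2S357_15_L001_R1_001.fastq.gz", "samplename_R1.fastq.gz"):
    print(name, looks_like_casava18(name))
```

A name such as samplename_R1.fastq.gz lacks the barcode, lane, and set fields, so it would not pass this check.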

ninaxhua · May 22, 2018, 6:53pm

The file name doesn’t match the format. It’s just “samplename_R1.fastq.gz” and “samplename_R2.fastq.gz”. How can we determine if the format variant was 33 or 64?

Mehrbod_Estaki · May 22, 2018, 7:06pm

No problem. Once you’ve created your manifest file as per the linked tutorial earlier, try this:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path pe-33-manifest.csv \
  --output-path paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred33
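
For the manifest itself, here is a minimal Python sketch that builds one from files named samplename_R1.fastq.gz and samplename_R2.fastq.gz. It assumes the CSV manifest layout with the header sample-id,absolute-filepath,direction and the values forward and reverse in the last column; the function names and the directory you pass in are hypothetical, so adjust them to your own setup and check the result against the importing tutorial.

```python
import csv
from pathlib import Path


def build_manifest_rows(fastq_dir):
    """Pair each *_R1.fastq.gz file with its *_R2.fastq.gz partner."""
    rows = []
    for forward in sorted(Path(fastq_dir).glob("*_R1.fastq.gz")):
        # Everything before "_R1.fastq.gz" is taken as the sample ID.
        sample_id = forward.name[: -len("_R1.fastq.gz")]
        reverse = forward.with_name(f"{sample_id}_R2.fastq.gz")
        if not reverse.exists():
            raise FileNotFoundError(f"no reverse read file for sample {sample_id}")
        rows.append((sample_id, str(forward.resolve()), "forward"))
        rows.append((sample_id, str(reverse.resolve()), "reverse"))
    return rows


def write_manifest(fastq_dir, manifest_path):
    """Write the manifest CSV with the assumed three-column header."""
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        writer.writerows(build_manifest_rows(fastq_dir))
```

Calling write_manifest("path/to/fastq_dir", "pe-33-manifest.csv") would then produce the file passed to --input-path above.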

The Phred33 vs. Phred64 variant can be a bit tricky to figure out, but most current Illumina machines use the 33 variant, so if I had to guess, I would start there. You can also ask your sequencing facility; they should have this information as well.
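
If you want to check for yourself, one common heuristic is to look at the range of characters in the quality lines. Below is a minimal Python sketch of that idea, assuming gzipped FASTQ files with exactly four lines per record. The function name and the thresholds are my own choices: any quality character with an ASCII code below 59 cannot occur in a 64-offset encoding, while codes above 74 are outside the usual Phred33 range. It is a guess from the data, not a guarantee.

```python
import gzip


def guess_phred_offset(fastq_path, max_reads=10000):
    """Guess the quality encoding from the first max_reads records (heuristic)."""
    lowest, highest = 255, 0
    with gzip.open(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number >= 4 * max_reads:
                break
            if line_number % 4 == 3:  # every fourth line is the quality string
                codes = [ord(ch) for ch in line.rstrip("\r\n")]
                if codes:
                    lowest = min(lowest, min(codes))
                    highest = max(highest, max(codes))
    if highest == 0:
        return "unknown"      # no quality characters were read
    if lowest < 59:
        return "phred33"      # characters this low cannot be 64-offset
    if lowest >= 64 and highest > 74:
        return "phred64"      # nothing low, and values above the usual Phred33 range
    return "ambiguous"        # not enough evidence either way
```

Running guess_phred_offset("samplename_R1.fastq.gz") on one or two of your files should be enough; an "ambiguous" result usually just means the reads sampled were all high quality, in which case asking the sequencing facility is the safer route.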
