Paired-end, demultiplexed Illumina 16S

I am trying to import my amplicon dataset (paired-end, demultiplexed Illumina 16S). The files are named like so (sample):

(qiime2-2018.8) [[email protected] raw_data]$ ls
LALL-WLM-001-v10-16S-V3V4_S3_L001_R1.fastq.gz   LALL-WLM-006-v2-16S-V3V4_S21_L001_R1.fastq.gz  
LALL-WLM-001-v10-16S-V3V4_S3_L001_R2.fastq.gz   LALL-WLM-006-v2-16S-V3V4_S21_L001_R2.fastq.gz  
LALL-WLM-001-v17-16S-V3V4_S4_L001_R1.fastq.gz   LALL-WLM-006-v9-16S-V3V4_S22_L001_R1.fastq.gz  
LALL-WLM-001-v17-16S-V3V4_S4_L001_R2.fastq.gz   LALL-WLM-006-v9-16S-V3V4_S22_L001_R2.fastq.gz  
LALL-WLM-001-v2-16S-V3V4_S1_L001_R1.fastq.gz    LALL-WLM-007-v10-16S-V3V4_S27_L001_R1.fastq.gz 

And my ‘manifest.csv’ (for which I tried different names, like ‘MANIFEST’ because of error messages…) looks like this:

(qiime2-2018.8) [[email protected] raw_data]$ head manifest.csv 
(qiime2-2018.8) [[email protected] raw_data]$ tail manifest.csv 

(I have verified the path with: zcat /home/augerjer/Projects/qiime_WLM/raw_data/LALL-WLM-001-v10-16S-V3V4_S3_L001_R1.fastq.gz | head )
And my manifest was generated with this code:

echo "sample-id,absolute-filepath,direction" > manifest.csv
# The format lists 'forward' and 'reverse' reads on separate rows, so iterate over R1 first, then run the same loop for R2
for sampleID in $(ls ${raw_data}/*gz | cut -d'-' -f2-4 | sort | uniq)
do
    path=$(find $raw_data -name "*$sampleID*R1*")
    echo "$sampleID,$path,forward" >> manifest.csv
done
# Iterating for R2
for sampleID in $(ls ${raw_data}/*gz | cut -d'-' -f2-4 | sort | uniq)
do
    path=$(find $raw_data -name "*$sampleID*R2*")
    echo "$sampleID,$path,reverse" >> manifest.csv
done

I did it this way to make sure that all the names were there and nothing more (no extra characters from Excel or anything).
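For anyone who wants to double-check a manifest like this before importing, here is a minimal sketch. It assumes the three-column header shown above and literal file paths (it does not expand $PWD), and `check_manifest` is a hypothetical helper name, not something QIIME 2 provides.

```shell
#!/usr/bin/env bash
# check_manifest FILE: hypothetical helper, not part of QIIME 2.
# Verifies the header, that every listed path exists, and that the
# direction column is either "forward" or "reverse".
check_manifest() {
    local manifest="$1" status=0 header id path direction

    IFS= read -r header < "$manifest"
    if [[ "$header" != "sample-id,absolute-filepath,direction" ]]; then
        echo "unexpected header: $header" >&2
        status=1
    fi

    # Skip the header row, then inspect each data row.
    while IFS=, read -r id path direction || [[ -n "$id" ]]; do
        if [[ -z "$id" ]]; then
            continue
        fi
        if [[ ! -f "$path" ]]; then
            echo "missing file for $id: $path" >&2
            status=1
        fi
        if [[ "$direction" != "forward" && "$direction" != "reverse" ]]; then
            echo "bad direction for $id: $direction" >&2
            status=1
        fi
    done < <(tail -n +2 "$manifest")

    return "$status"
}

# Example call (commented out so the sketch has no side effects):
# check_manifest manifest.csv && echo "manifest looks consistent"
```

A non-zero return status means at least one row needs attention; the messages on stderr say which one.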

The manifest file is in the directory with the reads (I also tried with it one level up, but no luck there either). I try to generate the artefact file, but keep getting the same error:

(qiime2-2018.8) [[email protected] raw_data]$ qiime tools import  \
  --type 'SampleData[PairedEndSequencesWithQuality]'  \
  --input-path /home/augerjer/Projects/qiime_WLM/raw_data/  \
  --output-path demux-paired-end.qza

There was a problem importing /home/augerjer/Projects/qiime_WLM/raw_data/:
  Missing one or more files for SingleLanePerSamplePairedEndFastqDirFmt: '.+_.+_L[0-9][0-9][0-9]_R[12]_001\\.fastq\\.gz'
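
The pattern in that message suggests the importer was checking the file names in the directory against a fixed naming scheme rather than reading the manifest. As a rough sketch, the pattern can be tested against a file name directly in bash. The regex below is copied from the error message with the doubled backslashes reduced to single ones, and `matches_expected_name` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Pattern taken from the error message, anchored at both ends.
pattern='^.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz$'

# matches_expected_name NAME: succeeds if NAME fits the pattern above.
# Hypothetical helper for illustration only.
matches_expected_name() {
    [[ "$1" =~ $pattern ]]
}

# The first name is one of the files listed above; the second is the same
# name with an added "_001" suffix, shown only for comparison.
for name in \
    LALL-WLM-001-v10-16S-V3V4_S3_L001_R1.fastq.gz \
    LALL-WLM-001-v10-16S-V3V4_S3_L001_R1_001.fastq.gz
do
    if matches_expected_name "$name"; then
        echo "match:    $name"
    else
        echo "no match: $name"
    fi
done
```

The files as named lack the `_001` part before `.fastq.gz`, so they do not fit the pattern, which is consistent with the "Missing one or more files" message.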

I tried modifying the file names, redoing the manifest file and the metadata file, etc.

I don’t know what to try next!

Thanks a lot in advance!
Jérémie Auger

Hi @Jeremieauger,
Assuming the rest of your manifest file has been constructed properly, I believe you need to make the following changes in your command:

  1. Make sure your --input-path is directed to the manifest.csv file instead of its folder
  2. Add --input-format PairedEndFastqManifestPhred64 to your command to tell the import tool that the input is in the manifest format.

See the example in the importing tutorial.


Hi @Mehrbod_Estaki, thanks a lot for the fast answer!

I’m baffled! Haha, it worked!! But it is essentially the first version of the import command that I tried from the tutorial:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path OLD_mapping_file.csv \
  --output-path sequences.qza \
  --source-format PairedEndFastqManifestPhred33

and it returned an error saying something like “--source-format PairedEndFastqManifestPhred33” was not a valid option! Maybe my manifest was not formatted properly when I tried it and it gave me a misleading error message. (I also substituted $PWD for the first part of the absolute path, i.e. changed /home/augerjer/Projects/qiime_WLM/raw_data/… -> $PWD/…)

Anyways, thanks a lot, seems to be working now :slight_smile: It created a ‘.qza’ file with a size equivalent to the size of all the reads.
Have a great one,

Hi @Jeremieauger,
Glad you got it sorted.
Your error was probably due to the Phred33 vs. Phred64 setting, which refers to the encoding of the quality scores. Your manifest file was probably fine the way it was, and using $PWD instead of the full path from root was not the issue. In fact, I prefer giving the path from root myself, but meh, it works :stuck_out_tongue:
I accidentally copied the wrong --input-format in my answer above. I should have assumed Phred33, as Phred64 comes from much older machines and by far most data are Phred33. Oops, my bad on that one! But glad you figured it out anyway!
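
If in doubt about which encoding a file uses, a rough heuristic is to look at the lowest quality character: anything below ASCII 59 can only occur with a +33 offset. Here is a minimal sketch of that check. It assumes plain four-line FASTQ records on stdin, and `guess_phred_offset` is a hypothetical helper name, not a QIIME 2 command.

```shell
#!/usr/bin/env bash
# guess_phred_offset: read FASTQ text on stdin and print 33 or 64.
# Heuristic only: a quality character below ASCII 59 implies Phred+33;
# otherwise the data is reported as 64, which can be wrong for a small
# sample of uniformly high-quality Phred+33 reads.
guess_phred_offset() {
    awk '
        BEGIN {
            # Build a character-to-ASCII-code lookup table.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {
            # Every fourth line holds the quality string.
            for (j = 1; j <= length($0); j++) {
                c = substr($0, j, 1)
                if ((c in ord) && ord[c] < min) min = ord[c]
            }
        }
        END { print (min < 59 ? 33 : 64) }
    '
}

# Example call on the first 1000 reads of a file (commented out):
# zcat some_reads_R1.fastq.gz | head -n 4000 | guess_phred_offset
```

Checking more than a handful of reads makes the guess more reliable, since low-quality bases are what give the +33 encoding away.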
