Demultiplexed paired-end sequence import problem

Dear All,

I am new to QIIME2. I have read and practised most of the QIIME2 tutorials and tried my best to solve my problem.

I have demultiplexed data with barcodes and primers already trimmed, from a HiSeq PE250 run, but I failed to import it into QIIME2. I am not sure what the format names mean. On my Mac I ran:

qiime tools import --show-importable-formats

I checked the importable formats, but some of them are not mentioned in the tutorial, so I am not sure how to import my data. Is there anywhere I can look up the details of each format?

My data has one R1 file and one R2 file per sample. The forward sequences look like:

@HISEQ:788:HCGH3BCX2:2:1101:3850:2219 1:N:0:AACAACCA
TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCAACGCCGCGTGAGCGATGAAGGCCTTCGGGTCGTAAAGCTCTGTCCTCAAGGAAGATAATGACGGTACTTGAGGAGGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCTAGCGTTATCCGGATTTACTGGGCGTAAAGGGTGCGTAGGTGGTTTTTCAAGTCAGGAGTGAAA
+
HIIIHHIIIIIIIIIIIIIIIGIHHHIIIIIIIHIIHIIIIIIIIIIIIIIIIHIIIIHIIIHFHIIGHHHIIHHIIFFEHIIGIIIIIIIIIGHHIHGIIHIDGHFHIIIHHI[email protected][email protected]@CEH>DHFHGI-@EH45CH[email protected]@C@@-6-@F#######
@HISEQ:788:HCGH3BCX2:2:1101:5174:2079 1:N:0:AACAACCA
TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCCATGCCGCGTGTGTGAAGAAGGCCTTCGGGTTGTAAAGCACTTTCAGTTGTGAGGAAGGCAGTGTCGTTAATAGCGGCATTGTTTGACGTTAGCAACAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGCATGCAG

I tried the format that seemed most likely and followed the instructions in the tutorial on my Mac:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path test \
  --source-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza

but the error is:

There was a problem importing test:
Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: '.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz'

Any help or suggestions would be greatly appreciated!

Hi @Lennon_Lee,

Have you tried using a manifest file, as suggested in "Importing paired end demultiplexed MiSeq data"?

Cheers,

Thanks for the reply!!

Actually, I changed the names of my files to the required format, like

forward:

NTPM1_4_24_L001_R1_001.fastq.gz

reverse:

NTPM1_4_24_L001_R2_001.fastq.gz

and it works!

But I still don’t know the meaning of this naming format.

I kept moving forward with my analysis and nothing seems weird. May I know whether anything will go wrong if I import my demultiplexed paired-end data with the CasavaOneEightSingleLanePerSampleDirFmt format?

Thanks

Hey there @Lennon_Lee!

Us too! It is a product of the casava tool typically used with Illumina sequencing platforms. This file naming convention is so popular that we made it its own special format, just to make it as easy as possible for folks to get their data into QIIME 2.

The manifest formats are much more general purpose - they let you map files to samples and declare their orientation at the same time. Technically you can import casava-formatted files this way, but it does require creation of a manifest.
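As a rough illustration of what "creation of a manifest" could look like, here is a minimal Python sketch that builds a paired-end manifest as CSV text. It assumes the older comma-separated layout with the header `sample-id,absolute-filepath,direction`; newer QIIME 2 releases use a different, tab-separated layout, so check the importing documentation for your version. The helper names and the file paths are hypothetical, made up for this example.

```python
import csv
import io


def manifest_rows(samples):
    """Build manifest rows from {sample_id: (forward_path, reverse_path)}.

    Assumes the older CSV manifest layout: one row per file, with the
    read orientation declared in the third column.
    """
    rows = [("sample-id", "absolute-filepath", "direction")]
    for sample_id, (forward_path, reverse_path) in sorted(samples.items()):
        rows.append((sample_id, forward_path, "forward"))
        rows.append((sample_id, reverse_path, "reverse"))
    return rows


def manifest_text(samples):
    """Render the manifest rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(manifest_rows(samples))
    return buffer.getvalue()


# Hypothetical sample ID and absolute paths, for illustration only.
example = {
    "NTPM1": (
        "/data/test/NTPM1_R1.fastq.gz",
        "/data/test/NTPM1_R2.fastq.gz",
    ),
}

print(manifest_text(example))
```

The point of the manifest is visible in the rows: the sample ID and the orientation are stated explicitly, so the file names themselves can be anything.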

TLDR: you should be fine - the casava format has nothing to do with the contents of the data, just the source file-naming convention.
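If you want to check file names before importing, the pattern quoted in the error message earlier in this thread can be tested directly. This is only a sketch: the regular expression is copied from that error message, but whether QIIME 2 applies it as a full match against the bare file name is an assumption here, and the second file name in the example is hypothetical.

```python
import re

# Pattern copied from the import error message in this thread.
CASAVA_PATTERN = re.compile(r".+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz")


def matches_casava(filename):
    """Return True if the whole file name fits the casava-style pattern.

    Assumption: the pattern must match the entire file name.
    """
    return CASAVA_PATTERN.fullmatch(filename) is not None


for name in [
    "NTPM1_4_24_L001_R1_001.fastq.gz",  # renamed file from this thread
    "NTPM1_R1.fastq.gz",                # hypothetical name before renaming
]:
    print(name, matches_casava(name))
```

Only the first name fits, which is consistent with the import succeeding after the rename.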

Hope that helps set your mind at ease! :t_rex:

Thank you very much!!!
