Import sequence data that contains R1, R2, and the corresponding barcode

I am a first-time user. I have 16S sequence data in this format:

LZ_CA01_S210_l1_001.fastq.gz
LZ_CA01_S210_R1_001.fastq.gz
LZ_CA01_S210_R2_001.fastq.gz

What is the best way to import these sequence data into QIIME 2?

Thank you.
EZ

Hi @zandi,
Have you had a chance to check out our tutorials? The importing tutorials describe how to import and use various common sequence data types.

Welcome! You may also find other sections of the QIIME 2 docs useful to help you get started.

Hi all, I had the same questions, you can check my thread:

Thank you for your help. I believe this is what everyone does before asking for help!
Yes I did.
The barcode file is what confused me! There is no mention in the tutorial of how to deal with the barcode file, when you have the R1 and R2 for each file.

Hi, I think we have the exact same question. I think that our data fit “Casava 1.8 paired-end demultiplexed fastq” in the importing guide. I am a new user too. Not sure if this is correct. Still working on it!

Correct! If your filenames have that same format, it is most likely CASAVA 1.8. The most important feature is if they are actually already demultiplexed (i.e., there are separate files for each sample).

Usually no barcode file is given if the samples are already demultiplexed, so this is a little unusual.

So you have 3 files per sample? Sounds like CASAVA 1.8

It sounds like you probably just have an extra index file per sample. You can just set those aside for now and import the R1 and R2 reads as CASAVA 1.8

Let me know if that works!
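
As a rough sketch of the "set the extra files aside" step, the snippet below sorts a list of filenames into R1/R2 read files and everything else. It assumes that read files end in `_R1_001.fastq.gz` or `_R2_001.fastq.gz`, as in the filenames posted above; the function name `split_reads_from_extras` is made up for illustration and is not part of QIIME 2.

```python
import re

# Assumption: read files carry an _R1_ or _R2_ field right before _001.fastq.gz.
READ_FIELD = re.compile(r"_R[12]_001\.fastq\.gz$")


def split_reads_from_extras(filenames):
    """Return (reads, extras): R1/R2 read files and everything else."""
    reads = [name for name in filenames if READ_FIELD.search(name)]
    extras = [name for name in filenames if not READ_FIELD.search(name)]
    return reads, extras


# Filenames copied from the first post in this thread.
files = [
    "LZ_CA01_S210_l1_001.fastq.gz",
    "LZ_CA01_S210_R1_001.fastq.gz",
    "LZ_CA01_S210_R2_001.fastq.gz",
]
reads, extras = split_reads_from_extras(files)
print("import these:", reads)
print("set aside:", extras)
```

Only the files in `reads` would go into the directory that gets imported; the ones in `extras` can be kept elsewhere in case they are needed later.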

I am doing a test with two R1 and R2 fastq.gz files but getting an error message:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path casava-18-paired-end-demultiplexed --source-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-paired-end.qza
There was a problem importing casava-18-paired-end-demultiplexed:

Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: ‘.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz’

These are the two files for which I am getting the error message:

Zandi_CABA01_S210_R1_001.fastq.gz Zandi_CABA01_S210_R2_001.fastq.gz

Everything looks OK to me!

Any suggestions?

Looks like the problem is the third underscore-delimited field in your filenames.

Looks like standard CASAVA should be something like L210. Instead, you have, e.g., S210.

I'm not really sure whether L vs. S has any importance, but for practical purposes here it does not matter.

If you rename your files to replace the S with an L in that third field, that error should go away.

Let me know if that works!
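
To check filenames before importing, one option is to test them against the pattern quoted in the error message. This is only a sketch: the pattern is copied from the message above, but treating it as a full-filename match is an assumption about how the check is applied, and `suggest_rename` is a made-up helper that only computes the new name and does not touch any files. It also assumes a three-digit number after the S, as in the filenames posted here.

```python
import re

# Pattern copied from the error message in this thread.
CASAVA_PATTERN = re.compile(r".+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz")


def matches_casava(name):
    """True if the whole filename fits the pattern from the error message."""
    return CASAVA_PATTERN.fullmatch(name) is not None


def suggest_rename(name):
    """Swap the S in a field like _S210_ for an L, as suggested above."""
    return re.sub(r"_S(\d{3})(_R[12]_001\.fastq\.gz)$", r"_L\1\2", name)


for original in ["Zandi_CABA01_S210_R1_001.fastq.gz",
                 "Zandi_CABA01_S210_R2_001.fastq.gz"]:
    renamed = suggest_rename(original)
    print(original, matches_casava(original), "->", renamed, matches_casava(renamed))
```

The original names fail the check because no field starts with L followed by three digits; the renamed ones pass.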

Thank you for the tip! Never thought that would be an issue!

I changed the S to L and it works now!

Maybe this can be fixed in QIIME 2.

Hi @zandi!

This isn't "broken" in QIIME 2. What @Nicholas_Bokulich was saying is that your filenames don't actually follow the CASAVA naming format (they are really, really close, but not quite there).

The CASAVA import format is provided as a convenience in QIIME 2, since so many users have data formatted like this. In case your data isn't formatted this way though, we offer a general-purpose manifest format that has no filename naming requirements. I suspect that if the filename fixes @Nicholas_Bokulich identified above were any more involved he would've recommended just using the manifest format, instead. Hope that helps!

It’s not a bug, it’s a feature.

Hah! Just to clarify, that was not my intent in the post. Allow me to rephrase:

  • QIIME 2 didn’t define the CASAVA spec (it is part of the Illumina family)
  • QIIME 2 offers a “convenience” format for users that have CASAVA-formatted data. This works within the limits that CASAVA is known to be specified in. The spec isn’t detailed, though.
  • @zandi did not have CASAVA-formatted data.
  • By reformatting, @zandi made their data CASAVA-formatted.
  • If reformatting wasn’t an option, this whole situation could’ve been avoided by using a general-purpose “manifest format”.
  • The only reason I commented was to clarify that there wasn’t a bug related to QIIME 2 with respect to this, but rather, a mismatch between @zandi’s data and the CASAVA format (which again, isn’t a QIIME 2 creation).

Thanks!
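
For anyone who would rather use the manifest route than rename files, here is a minimal sketch that writes out a paired-end manifest from a list of file paths. The column names follow the tab-separated paired-end manifest layout described in the importing tutorial for newer QIIME 2 releases; older releases used a different, comma-separated layout, so check the docs for your version. The `/data/reads` directory, the `build_manifest` helper, and the way the sample ID is derived from the filename are all made up for illustration.

```python
import re
from pathlib import PurePosixPath

# Assumption: read files end in _R1_001.fastq.gz or _R2_001.fastq.gz.
READ_NAME = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])_001\.fastq\.gz$")


def build_manifest(paths):
    """Pair R1/R2 files by sample ID and return tab-separated manifest text."""
    pairs = {}
    for path in paths:
        match = READ_NAME.match(PurePosixPath(path).name)
        if match is None:
            continue  # not an R1/R2 read file, e.g. an index file
        pairs.setdefault(match["sample"], {})[match["read"]] = path
    lines = ["sample-id\tforward-absolute-filepath\treverse-absolute-filepath"]
    for sample in sorted(pairs):
        reads = pairs[sample]
        if set(reads) != {"1", "2"}:
            raise ValueError(f"sample {sample} is missing R1 or R2")
        lines.append(f"{sample}\t{reads['1']}\t{reads['2']}")
    return "\n".join(lines) + "\n"


# Hypothetical absolute paths; the filenames are the ones from this thread.
paths = [
    "/data/reads/Zandi_CABA01_S210_R1_001.fastq.gz",
    "/data/reads/Zandi_CABA01_S210_R2_001.fastq.gz",
]
print(build_manifest(paths), end="")
```

Because the manifest lists each file explicitly, the original S210 filenames can stay as they are.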

Thank you so much! I figured out what was wrong. My samples were sent to two different facilities to be sequenced, so the data were quite different. I found the barcode sequences eventually. Thanks a lot!
