Importing sequence reads (paired-end, bcl2fastq2)

anirban.mcgill · June 24, 2020, 6:09pm

Hi!
I am requesting help with command usage.

So, my reads were already demultiplexed when I received them from the sequencing center. This is what they wrote after I asked them about quality filtering steps:

Reply: <<Cluster filtering and base calling were performed with MiSeq Control Software v3.1 and Real-Time Analysis (RTA) software v1.18.54. The program bcl2fastq2 v2.20 was then used to demultiplex samples and generate FASTQ reads.>>

A sample read file is named as follows:
MI.M03992_0508.001.FLD0187.YIK_new_Day30_100_65_R2.fastq.gz
The quality offset reported in my results is 33.

So, here's how I imported the sequence reads:
- Created a manifest file named YIK-manifest with 3 columns: sample-id (I supplied my own custom IDs), absolute-filepath ($path/name-of-read), and direction (forward or reverse).
- Ran the following command:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path YIK-manifest \
  --output-path YIK-paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred33

Is this correct?

Two more (small) questions:
a. The sequencing center says they did not use the Casava demultiplexed format; however, the command above, taken from the QIIME 2 tutorial page (the "Importing data" page of the QIIME 2 2018.11.0 documentation), is for the Casava paired-end demultiplexed format. I used this command because my reads are demultiplexed, paired-end reads. Instead of renaming the files, I simply assigned them sample IDs and otherwise did everything as in the Casava format. Am I missing anything here? Did I do anything wrong? (Note: I got a demux.qza output and went all the way to a taxonomy file, but I'm just checking.) I also attached a screenshot of my manifest file.

b. What is the difference between Phred33 and Phred33V2? My command gives an error message when I use Phred33V2. The sequencing center said that they used the MiSeq Reagent Kit v2 paired-end 250 (2 x 250 bp). Is Phred33V2 linked to the "v2" in the kit name? If so, why is it not working?

I know it's a lot of questions, and I'm sorry for taking up everyone's time here! I would really appreciate it if someone could shed some light on this. Thanks a ton.

ChrisKeefe · June 24, 2020, 6:58pm

@anirban.mcgill, by talking with your sequencing center and using the import tutorial, you're working with the right resources. It looks like you just have a few things mixed up.

You'll have a better experience if you make sure the docs you're reading match the version of QIIME 2 you're using. The link you shared in your post doesn't discuss phred33V2 formats, because they did not exist yet in November of 2018. QIIME 2 has come a long way since then, and I'd recommend using a current release (with its relevant documentation) if possible. We're on 2020.2 right now, and 2020.6 should be released within the week.
The difference between Phred33[V1] and Phred33V2 is described in the relevant docs (V1 vs V2); it concerns the manifest format itself and is not related to the reagent kit at all. Though V2 has replaced V1, QIIME 2 has maintained backwards compatibility and will happily read a Phred33[V1] manifest when it's told to.

Computers are notoriously stupid, and I suspect yours is producing an error either because your version of QIIME 2 is older and it doesn't know what a V2-formatted manifest is yet, or because you have written a V1-formatted manifest, and then told QIIME 2 that it's looking at a V2-formatted manifest. It took you at your word.
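To make the V1/V2 difference concrete, here is a minimal shell sketch that rewrites a paired-end V1 manifest (comma-separated, header `sample-id,absolute-filepath,direction`, one row per file) as a V2 manifest (tab-separated, one row per sample with `forward-absolute-filepath` and `reverse-absolute-filepath` columns). The function name `v1_to_v2` is hypothetical, it is not part of QIIME 2, and it assumes there are no commas inside file paths and exactly one forward and one reverse row per sample.

```shell
# Hypothetical helper (not part of QIIME 2): convert a paired-end V1 manifest
# (comma-separated) into a V2 manifest (tab-separated).
# Assumes: no commas inside file paths, and one "forward" plus one "reverse"
# row per sample. File paths are copied unchanged.
v1_to_v2() {
  awk -F',' '
    { sub(/\r$/, "") }                              # drop Windows line endings (fields are re-split)
    NF == 0 || /^#/ || $1 == "sample-id" { next }   # skip blank lines, comments and the V1 header
    {
      if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }  # keep first-seen sample order
      if ($3 == "forward") fwd[$1] = $2
      else if ($3 == "reverse") rev[$1] = $2
    }
    END {
      print "sample-id\tforward-absolute-filepath\treverse-absolute-filepath"
      for (i = 1; i <= n; i++) {
        id = order[i]
        print id "\t" fwd[id] "\t" rev[id]
      }
    }
  ' "$1"
}
```

Usage would be something like `v1_to_v2 YIK-manifest > YIK-manifest-v2`, followed by an import with `--input-format PairedEndFastqManifestPhred33V2`. Keeping the original V1 file and importing it with `PairedEndFastqManifestPhred33` should work just as well.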
You wrote that the command you used is for the Casava paired-end demultiplexed format. I suspect this isn't true:

Your command looks a lot like a Manifest Format command. Compare:

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path se-33-manifest \
  --output-path single-end-demux.qza \
  --input-format SingleEndFastqManifestPhred33

Mistakes do happen, but you can probably trust your sequencing center when they tell you this is not Casava-formatted. In the end, you're probably using the correct format and import syntax and just misread the docs or mixed up your terms. As for your big-picture question ("Is this correct?"), I can't make any promises, but based on what you've presented here I suspect you're heading in the right direction.

Best,
Chris

anirban.mcgill · June 24, 2020, 8:28pm

Hi @ChrisKeefe

Thank you for your response. So, I looked up a few things:

I am using QIIME 2 release 2019.10. I looked up the manual, and it supports Phred33V2 (the "Importing data" page of the QIIME 2 2019.10.0 documentation). I then downloaded the tutorial manifest file (pe-64-manifest) and compared the 2018.11 version with the 2019.10 version. They seem to be the same, except for some spacing between columns. Is that why QIIME 2 is not reading my file as V2 and is treating it as V1 instead, because my manifest is not formatted the way a 2019.10 (V2) manifest should be?
Based on the 2019.10 release, here is my import command:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path YIK-manifest \
  --output-path paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2

I see you gave a single-end ("SingleEnd") command, but my reads are paired-end, so I have written the command above. Could you please confirm whether this approach is correct?

Thanks a lot for your quick reply! I wanted to make sure I understood most of the tutorial before jumping in with real data. I guess I have to redo the analysis!

ChrisKeefe · June 24, 2020, 9:06pm

@anirban.mcgill, please reread the "Fastq Manifest Formats" section of the docs I linked above. That section explains in detail how V1 and V2 differ, and should make clear why asking QIIME 2 to interpret a V1-formatted file as if it were V2 would break things.

Using a paired-end format to describe your paired-end data is correct.
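If you are unsure which variant a given file is, a quick look at its header line usually tells you. The shell function below is a hypothetical helper (not part of QIIME 2) that assumes the header names from the manifest-format docs: V1 is `sample-id,absolute-filepath,direction` (comma-separated), while V2 is tab-separated and, for paired-end data, ends in `forward-absolute-filepath` and `reverse-absolute-filepath`. It only inspects the header and is no substitute for letting `qiime tools import` validate the file.

```shell
# Hypothetical helper (not part of QIIME 2): guess the manifest variant from its header.
manifest_version() {
  # first non-comment line, with any Windows carriage return removed
  header=$(grep -v '^#' "$1" | head -n 1 | tr -d '\r')
  tab=$(printf '\t')
  case "$header" in
    "sample-id,absolute-filepath,direction")
      echo "V1" ;;
    *"$tab"forward-absolute-filepath"$tab"reverse-absolute-filepath)
      echo "V2 (paired-end)" ;;
    *"$tab"absolute-filepath)
      echo "V2 (single-end)" ;;
    *)
      echo "unknown (V2 columns must be separated by tabs, not spaces)" ;;
  esac
}
```

Running `manifest_version YIK-manifest` before importing lets you pick the matching `--input-format` (Phred33 for V1, Phred33V2 for V2).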

anirban.mcgill · June 24, 2020, 9:42pm

Hi @ChrisKeefe

Thanks a lot! I was able to get my pipeline running and import my sequences. I also noticed the change in the manifest format:

sample-id<tab>forward-absolute-filepath<tab>reverse-absolute-filepath

...and it worked! QIIME 2 read it as Phred33V2 format. So happy!
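With many samples, writing the V2 layout shown above by hand is tedious. Here is a minimal shell sketch that builds it from a directory of reads, assuming forward and reverse files end in `_R1.fastq.gz` and `_R2.fastq.gz` (as in the file name earlier in this post) and that the part of the name before that suffix is an acceptable sample ID. The function name `make_v2_manifest` is hypothetical; if you prefer custom IDs, you would still need to edit the first column afterwards.

```shell
# Hypothetical helper (not part of QIIME 2): write a paired-end V2 manifest (TSV)
# for every *_R1.fastq.gz in a directory that has a matching *_R2.fastq.gz.
# Sample IDs are the file name minus the "_R1.fastq.gz" suffix (an assumption).
make_v2_manifest() {
  dir=$(cd "$1" && pwd) || return 1         # absolute path, as the manifest requires
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  for fwd in "$dir"/*_R1.fastq.gz; do
    [ -e "$fwd" ] || continue               # the glob matched nothing
    rev="${fwd%_R1.fastq.gz}_R2.fastq.gz"
    if [ ! -e "$rev" ]; then
      echo "warning: no reverse read for $fwd" >&2
      continue
    fi
    id=$(basename "$fwd" _R1.fastq.gz)
    printf '%s\t%s\t%s\n' "$id" "$fwd" "$rev"
  done
}
```

For example, `make_v2_manifest /path/to/reads > YIK-manifest` would produce a file that can be imported with `--input-format PairedEndFastqManifestPhred33V2`. Files without a partner are reported on stderr rather than written to the manifest.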

Thank you for clearing my confusion.

A tiny question (last one, I promise): Is it okay if the sample IDs are in a different order in my metadata file and my manifest, as long as (i) all of them are present in both files and (ii) the correct file paths are given?

Thanks!

ChrisKeefe · June 24, 2020, 11:24pm

To the best of my knowledge, differently-ordered records in those two files are just fine. QIIME 2 uses a plugin model, so it's hypothetically possible a plugin could require that kind of alignment, but it would be pretty fragile, and I think it's unlikely to occur.
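Checking condition (i) does not depend on order either: you can compare the sorted ID columns of the two files. A hypothetical shell sketch (not part of QIIME 2), assuming both files are tab-separated with the ID in the first column and one header row; lines starting with `#`, such as a `#q2:types` row in metadata, are skipped.

```shell
# Hypothetical helper (not part of QIIME 2): list the sample IDs in the first
# column of a tab-separated file, skipping "#" lines and the header row.
ids() {
  grep -v '^#' "$1" | tail -n +2 | cut -f1 | tr -d '\r' | LC_ALL=C sort
}

# Report IDs found in only one of the two files; row order does not matter.
compare_ids() {
  a=$(mktemp); b=$(mktemp)
  ids "$1" > "$a"
  ids "$2" > "$b"
  only_manifest=$(LC_ALL=C comm -23 "$a" "$b")   # IDs unique to the first file
  only_metadata=$(LC_ALL=C comm -13 "$a" "$b")   # IDs unique to the second file
  rm -f "$a" "$b"
  if [ -z "$only_manifest" ] && [ -z "$only_metadata" ]; then
    echo "same sample IDs"
    return 0
  fi
  if [ -n "$only_manifest" ]; then echo "only in manifest: $only_manifest"; fi
  if [ -n "$only_metadata" ]; then echo "only in metadata: $only_metadata"; fi
  return 1
}
```

Something like `compare_ids YIK-manifest sample-metadata.tsv` (the metadata file name is hypothetical) prints "same sample IDs" when both files list the same samples, in any order. It applies to V2 manifests only, since V1 manifests are comma-separated.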

As a side note, I can empathize with your caution, but I want to encourage you to dive in and try things! New software can be nerve-wracking, but as I keep learning, it's usually best to just give it a go and see what happens. Many of your questions in this thread are probably things you could check yourself. It would be straightforward, for example, to just try working with your differently-ordered manifest and metadata and see if QIIME 2 complains.

If you're really nervous about the perceived risks, you could run a little experiment, re-sorting the records in your manifest using spreadsheet software, importing both versions, and running some commands on the resulting data. Or use qiime demux summarize or qiime tools export to compare the resulting files and their contents... I'm sure you get the idea!
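The re-sorting step of that experiment can also be done without a spreadsheet. A hypothetical shell sketch (not part of QIIME 2), assuming a V2 manifest whose first line is the header and whose rows are tab-separated: it keeps the header on top and writes the data rows in reverse order of sample ID.

```shell
# Hypothetical helper (not part of QIIME 2): copy a V2 manifest with the header
# kept first and the data rows sorted in reverse order of the first column.
resort_manifest() {
  head -n 1 "$1"
  tail -n +2 "$1" | LC_ALL=C sort -r -t "$(printf '\t')" -k1,1
}
```

You could run `resort_manifest YIK-manifest > YIK-manifest-resorted`, import both files with `qiime tools import` and `--input-format PairedEndFastqManifestPhred33V2`, and then compare the two `qiime demux summarize` visualizations.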

Bugs do happen, but QIIME 2 is designed to be robust to many common user errors, and usually gives good feedback when we do something it doesn't like. And if you run into any more errors you can't puzzle out, you can always post them here.

Best,
Chris

system · July 26, 2020, 5:24am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.