qiime2 demultiplexing gives very few reads

Hi,

I'm running qiime2 to demultiplex data. I have a forward.fastq.gz file, a reverse.fastq.gz file and a barcode.fastq.gz file. The commands I used are below:

qiime tools import \
  --type EMPPairedEndSequences \
  --input-path emp-paired-end-sequences \
  --output-path emp-paired-end-sequences.qza

then:

qiime demux emp-paired \
  --i-seqs emp-paired-end-sequences.qza \
  --m-barcodes-file sample-metadata.tsv \
  --m-barcodes-column barcode-sequence \
  --p-rev-comp-mapping-barcodes \
  --o-per-sample-sequences demux.qza \
  --o-error-correction-details demux-log.qza

then:

qiime tools extract --input-path demux.qza --output-path .

After this step I got per-sample .fastq files, as I expected. However, each fastq file was smaller than the corresponding file that was demultiplexed with qiime1 2 years ago. Below is the file size comparison:

Old results:

New results:

Any suggestion on this issue?

Thanks!
Leran

Hi @Leran,
It looks like your original files are fastq files and the new files are compressed fastq files (.gz extension). I believe the new files are smaller because the old files were not compressed.

Thanks for your reply @Cherman2! I think you are right, so I unzipped the fastq.gz files. The sizes did increase, but they are still not the same as before. I also checked the read counts, taking one sample as an example, as shown below:

Old file:

New file:

So the number of reads of the old fastq file is bigger than the number of reads of the new fastq file.
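
For anyone repeating this comparison, here is a minimal sketch of one way to count reads without unzipping the files first. It assumes the usual FASTQ layout of exactly four lines per read; count_reads is a hypothetical helper name and the file names in the usage comment are placeholders, not files from this thread.

```shell
# Count reads in a FASTQ file, plain or gzip-compressed.
# Assumption: each record is exactly 4 lines (no wrapped sequence lines).
count_reads() {
  case "$1" in
    *.gz) gzip -cd -- "$1" ;;   # decompress to stdout, leave the file untouched
    *)    cat -- "$1" ;;
  esac | awk 'END { print int(NR / 4) }'
}

# Example usage (placeholder file names):
#   count_reads old/sample1.fastq
#   count_reads new/sample1.fastq.gz
```

Counting reads rather than comparing file sizes keeps the comparison independent of compression.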

Another thing I have noticed, though I'm not sure whether it is related: the "qiime demux emp-paired" step takes a very long time, and then I receive a "broken pipe" notification. When I check the output files, the demux.qza file is there. I'm not sure whether this file can be generated even if the process did not fully finish, so I never know whether the process has finished or not.

Could the lower read count be caused by this step not completing?
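
One rough way to check the output file itself, sketched under the assumption that the unzip utility is installed: a .qza artifact is a zip archive, so an archive integrity test can at least show whether the file was written out completely. artifact_is_intact is a hypothetical helper name. Passing this test does not prove that demultiplexing kept every read; it only rules out a truncated or corrupt file.

```shell
# Return success only if the artifact passes unzip's archive integrity test.
# Assumption: unzip is available on PATH; a missing file also counts as failure.
artifact_is_intact() {
  unzip -tq "$1" >/dev/null 2>&1
}

if artifact_is_intact demux.qza; then
  echo "demux.qza is a readable archive"
else
  echo "demux.qza is missing, truncated, or corrupt"
fi

# QIIME 2 also has its own check, which could be run in addition:
#   qiime tools validate demux.qza
```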

Thanks!
Leran

Hi @Leran,
Alright! Glad unzipping helped a little!
Now let's try to figure out why this would be different between these two runs!

  1. Did you run any additional filtering in the "new analysis" that you didn't run in the old analysis?
  2. Do you have both commands that you ran to get these outputs?

I have a feeling that it is a difference in the number of barcode errors that are allowed, or something to do with Golay error correction.
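
One way to test the Golay idea, as a sketch only: re-run the same demux command with error correction switched off and compare the per-sample read counts. The input names are the ones from the commands above; demux_no_golay and the two output file names are hypothetical. Whether these barcodes are Golay codes at all has not been established in this thread, so treat this as an experiment rather than a fix.

```shell
# Hypothetical helper: same demux call as in the original post, but with
# Golay error correction disabled via --p-no-golay-error-correction.
# Output file names are placeholders chosen to avoid overwriting demux.qza.
demux_no_golay() {
  qiime demux emp-paired \
    --i-seqs emp-paired-end-sequences.qza \
    --m-barcodes-file sample-metadata.tsv \
    --m-barcodes-column barcode-sequence \
    --p-rev-comp-mapping-barcodes \
    --p-no-golay-error-correction \
    --o-per-sample-sequences demux-no-golay.qza \
    --o-error-correction-details demux-no-golay-details.qza
}

# Run it explicitly when ready:
#   demux_no_golay
```

If the read counts change noticeably between the two runs, barcode error handling is a likely contributor to the difference.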
