.qza file small compared to input

Hello, I installed qiime2 (within miniconda2) for the first time yesterday to run dada2 on illumina paired-end reads.
Installation:
wget https://data.qiime2.org/distro/core/qiime2-2019.4-py36-linux-conda.yml

As far as I can see the installation went fine, and I have successfully (I think) imported my data as .qza, both as paired end reads, and as reads previously merged with vsearch outside qiime2. Here are the commands used:

paired end import
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /home/lm/AMBER-136912793/FASTQ_Generation_2019-06-14_12_38_21Z-188329309/paired-end \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path /home/lm/AMBER-136912793/FASTQ_Generation_2019-06-14_12_38_21Z-188329309/demux-paired-end.qza

The paired-end import was successful, I think, as it generated a 3.1 GB .qza file. However, when I ran vsearch from within qiime:

qiime vsearch join-pairs \
  --i-demultiplexed-seqs /home/lm/AMBER-136912793/FASTQ_Generation_2019-06-14_12_38_21Z-188329309/demux-paired-end.qza \
  --o-joined-sequences /home/lm/AMBER-136912793/FASTQ_Generation_2019-06-14_12_38_21Z-188329309/demux-joined.qza

The merged output file (demux-joined.qza) is only 1.4 MB, from a 3.1 GB input (demux-paired-end.qza), whereas the merged files generated by vsearch run outside qiime totalled 2.5 GB. I ran it a second time with --verbose, but a message stated "--verbose: command not found". The command ran successfully with no other error messages, and the output was still 1.4 MB.

merged import
I decided to import the merged files directly using a manifest file
qiime tools import \
  --input-path /home/lm/AMBER-136912793/FASTQ_Generation_2019-06-14_12_38_21Z-188329309/merged/all_filenames.tsv \
  --output-path /home/lm/AMBER-136912793/FASTQ_Generation_2019-06-14_12_38_21Z-188329309/merged/AMBER-merged-demux.qza \
  --type 'SampleData[JoinedSequencesWithQuality]' \
  --input-format SingleEndFastqManifestPhred33

And I got a 121 MB file (from a 2.5 GB input), which again seems smaller than expected. It also differs in size from the 1.4 MB demux-joined.qza file generated by running vsearch within qiime.

I tried to see if I could get some stats on the .qza file:
qiime demux summarize \
  -–i-data AMBER-merged-demux.qza \
  -–o-visualization AMBER-merged-demux.qzv \
  --verbose

And I got an error message:

-- no such option.

I tried to find information on .qza file size in the documentation and forum, but could not find anything specific. It would be great to get some pointers to understand whether there is a problem or not. I am using Ubuntu 18.04.2 on a desktop computer rather than a server. I wonder whether it might lack the necessary specifications and be crashing, producing an incomplete file without reporting an error. Having said that, it coped with running vsearch, so maybe that's not the case!

A few additional comments:

  1. Adding --verbose has never worked in any of the commands above.
  2. The documentation suggests that the manifest file should be tab separated, but the import tool only accepted comma separated.

Many thanks!!

Hello @Lucio!

Sounds to me like your reads might not overlap enough to be joined. Do you need to join your reads? If you are planning on using q2-dada2, read joining is performed during denoising. You can also proceed with just the forward or just the reverse reads.
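If you go the q2-dada2 route, the paired-end artifact can be passed to denoising directly, with no separate join step. Here is a minimal sketch, assuming the demux-paired-end.qza artifact from your import. The wrapper name `denoise_paired` and the output file names are hypothetical, and the truncation lengths are placeholders (0 disables truncation) that you would pick from your own quality plots:

```shell
# Hypothetical wrapper around `qiime dada2 denoise-paired`.
# $1: demultiplexed paired-end artifact
# $2: forward truncation length, $3: reverse truncation length (0 = no truncation)
denoise_paired() {
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs "$1" \
    --p-trunc-len-f "$2" \
    --p-trunc-len-r "$3" \
    --o-table table.qza \
    --o-representative-sequences rep-seqs.qza \
    --o-denoising-stats denoising-stats.qza \
    --verbose
}

# Example call (placeholder truncation values):
# denoise_paired demux-paired-end.qza 0 0
```

The denoising-stats output should show how many reads survive merging, which is probably a more useful number than the size of the .qza file.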

The "--verbose: command not found" message is your shell reminding you that you forgot to escape the end of the line before the --verbose line:

# bad
qiime vsearch join-pairs \
  --i-demultiplexed-seqs foo.qza \
  --o-joined-sequences bar.qza
  --verbose

# good
qiime vsearch join-pairs \
  --i-demultiplexed-seqs foo.qza \
  --o-joined-sequences bar.qza \
  --verbose

As for the 121 MB merged import: sounds like maybe your manifest file has some technical issues. Can you share it?

In your demux summarize command, it looks like you have two different types of dashes (long and short). Always use the dash key on your keyboard if you can, and avoid copying and pasting commands through programs like Microsoft Office, which replace normal dashes with fancy dashes automatically.
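One way to check a saved command for this problem is to search it for the bytes of the typographic dashes. Here is a minimal sketch, assuming the command was saved to a text file; `find_fancy_dashes` is a hypothetical helper name, not part of QIIME 2:

```shell
# Hypothetical helper: print (with line numbers) every line of file $1 that
# contains an en dash (U+2013) or em dash (U+2014) instead of a plain hyphen.
find_fancy_dashes() {
  en_dash=$(printf '\342\200\223')   # UTF-8 bytes of U+2013
  em_dash=$(printf '\342\200\224')   # UTF-8 bytes of U+2014
  LC_ALL=C grep -n -e "$en_dash" -e "$em_dash" "$1"
}
```

Calling `find_fancy_dashes commands.txt` (the file name is a placeholder) prints nothing and returns a non-zero status when every dash in the file is a plain hyphen.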

On documentation about .qza file sizes: it makes sense that you couldn't find anything. There isn't much to say here, since everyone's data is different.

I think it is unlikely that your desktop's specifications are the problem.

As for --verbose never working: please see my suggestion above about escaping line endings. You can also just write your commands all on one line; this isn't a QIIME 2 thing, just a shell thing.
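To see that this is purely shell behaviour, here is a small sketch with no QIIME 2 involved. `count_args` is a hypothetical stand-in for any command; it just reports how many arguments the shell handed it:

```shell
# Hypothetical stand-in for a real command: prints its argument count.
count_args() {
  echo "$# arguments"
}

# All on one line.
count_args --i-data foo.qza --o-visualization foo.qzv --verbose

# Same command split across lines; each trailing backslash joins the next line.
count_args \
  --i-data foo.qza \
  --o-visualization foo.qzv \
  --verbose
```

Both calls print `5 arguments`. Drop the backslash after `foo.qzv` and the command receives only four arguments, while the shell tries to run `--verbose` as a separate command.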

Looks like you might be mixing up versions of documentation. We added the TSV manifest in 2019.4, and that is now the recommended default. Prior to 2019.4, we only supported CSV manifests.
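For comparison, here is a sketch of the two manifest layouts as I understand them. The sample ID and file path are made-up placeholders, and the pairing of layouts with format names in the comments should be checked against the importing documentation for your release:

```shell
# Write both manifest styles into a scratch directory (placeholder sample and path).
manifest_dir=$(mktemp -d)

# CSV manifest (the only style before 2019.4), used with
# --input-format SingleEndFastqManifestPhred33
printf '%s\n' \
  'sample-id,absolute-filepath,direction' \
  'sample-1,/path/to/sample-1_merged.fastq.gz,forward' \
  > "$manifest_dir/manifest.csv"

# TSV manifest (added in 2019.4), used with
# --input-format SingleEndFastqManifestPhred33V2
printf 'sample-id\tabsolute-filepath\n' > "$manifest_dir/manifest.tsv"
printf 'sample-1\t/path/to/sample-1_merged.fastq.gz\n' >> "$manifest_dir/manifest.tsv"

echo "Manifests written to $manifest_dir"
```

If that mapping holds, it would explain why SingleEndFastqManifestPhred33 accepted only a comma-separated file: the tab-separated layout belongs to the V2 format name.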

Keep us posted!
