Yup! FASTQ compresses very well. We store the files as fastq.gz, and the .qza format is itself a zip file, so getting a smaller size is pretty much expected.
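If you want to see this for yourself, any zip tool will open an artifact. Here is a minimal Python sketch using only the standard library. The helper name `archive_sizes` is made up for illustration, and the demo at the bottom builds a throwaway zip of repetitive text as a stand-in; it does not read a real artifact.

```python
import os
import tempfile
import zipfile


def archive_sizes(path):
    """Return (compressed_bytes, uncompressed_bytes) for a zip archive such as a .qza."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a zip archive")
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        packed = sum(info.compress_size for info in infos)
        unpacked = sum(info.file_size for info in infos)
    return packed, unpacked


if __name__ == "__main__":
    # Stand-in archive (assumption): repetitive text, not a real QIIME 2 artifact.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo.zip")
        with zipfile.ZipFile(demo, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.writestr("reads.txt", "ACGTACGTAC\n" * 10_000)
        packed, unpacked = archive_sizes(demo)
        print(f"compressed: {packed} bytes, uncompressed: {unpacked} bytes")
```

Pointing `archive_sizes` at one of your own .qza files should work the same way, since it only relies on the file being a zip archive.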
You are actually running denoise-single, so overlap doesn't matter: it's only using your forward reads at the moment. (You can use denoise-paired to have the forward and reverse reads merged.) Looking at my last reply, it looks like I gave you the wrong method. Sorry!
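For reference, here is a rough sketch of what the paired-end call looks like, assuming you are using the DADA2 plugin and a recent QIIME 2 release (older releases may not have the denoising-stats output). The file names and the helper `denoise_paired_cmd` are placeholders of mine, and the truncation lengths are not a recommendation: 0 just means "no truncation", and you would pick real values from your quality plots while leaving enough length for the reads to overlap.

```python
import shlex


def denoise_paired_cmd(demux, trunc_len_f, trunc_len_r):
    """Build the argument list for `qiime dada2 denoise-paired`."""
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", demux,
        # Truncation positions for forward and reverse reads; 0 disables truncation.
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        # Output names below are placeholders (assumption), not required names.
        "--o-table", "table.qza",
        "--o-representative-sequences", "rep-seqs.qza",
        "--o-denoising-stats", "denoising-stats.qza",
    ]


if __name__ == "__main__":
    # "demux.qza" is a hypothetical name for your demultiplexed paired-end artifact.
    cmd = denoise_paired_cmd("demux.qza", 0, 0)
    # Print a copy-pasteable shell command instead of running it.
    print(shlex.join(cmd))
```

You can paste the printed command into a terminal with your QIIME 2 environment activated, or pass the list to `subprocess.run(cmd, check=True)`.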
As for why it is so small, I would suggest running feature-table summarize on your table to check on that. Generally we expect your table and rep-seqs to be much smaller than your raw sequence data. While a few KB does sound unusually small, it's not impossible (our tutorial data is ~50 KB, for instance, but it's heavily subsampled to run quickly).
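The summarize step is a single command plus a viewer call. A small sketch, assuming your table is saved as table.qza (a placeholder name, as is the helper `summarize_cmds`):

```python
import shlex


def summarize_cmds(table, visualization):
    """Build the commands to summarize a feature table and then open the result."""
    summarize = [
        "qiime", "feature-table", "summarize",
        "--i-table", table,
        "--o-visualization", visualization,
    ]
    # `qiime tools view` opens a .qzv visualization in a browser.
    view = ["qiime", "tools", "view", visualization]
    return [summarize, view]


if __name__ == "__main__":
    # File names are hypothetical; substitute your own artifact paths.
    for cmd in summarize_cmds("table.qza", "table.qzv"):
        print(shlex.join(cmd))
```

The resulting visualization reports how many samples and features are in the table and how many sequences each sample kept, which should tell you whether the small file size reflects real data loss.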
You may get better results if you trim your reads a little more before denoising. Check out the Moving Pictures tutorial, which provides some justification for how it trims its example dataset.