Doing taxonomy analysis and getting abundances with manifests

ebolyen · October 4, 2017, 10:11pm

Yup! FASTQ compresses very well. We store the files as fastq.gz, and the .qza format is itself a zip file, so getting a smaller size is pretty expected.
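Since a .qza is a zip archive, one way to see what is inside and how well it compressed is Python's standard zipfile module. This is only a minimal sketch: the helper name `list_archive_members` and the script name in the usage comment are made up for illustration and are not part of QIIME 2.

```python
import sys
import zipfile


def list_archive_members(path):
    """Return (name, compressed_size, uncompressed_size) for each zip member."""
    with zipfile.ZipFile(path) as archive:
        return [
            (info.filename, info.compress_size, info.file_size)
            for info in archive.infolist()
        ]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script and file names): python list_qza.py demux.qza
    for name, packed, unpacked in list_archive_members(sys.argv[1]):
        print(f"{packed:>12} {unpacked:>12}  {name}")
```

Comparing the two size columns shows how much each member shrank inside the archive.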

You are actually running denoise-single, so overlap doesn't matter: it is only using your forward reads at the moment. (You can use denoise-paired to have the forward and reverse reads merged.) Looking at my last reply, it looks like I gave you the wrong method. Sorry!
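For reference, a paired-end run could be assembled roughly like this. Treat it as a sketch under assumptions: the file name `demux.qza`, the output directory, and the truncation lengths are placeholders (pick lengths from your own quality plots), the helper `denoise_paired_command` is made up for illustration, and the input has to be a paired-end demultiplexed artifact. Check `qiime dada2 denoise-paired --help` for the options in your installed version.

```python
import shlex


def denoise_paired_command(demux, trunc_len_f, trunc_len_r, output_dir):
    """Build the argument list for `qiime dada2 denoise-paired`."""
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", demux,
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--output-dir", output_dir,
    ]


# Placeholder values only; these are not recommendations for your data.
cmd = denoise_paired_command("demux.qza", 240, 200, "dada2-paired")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Using `--output-dir` lets QIIME 2 name the outputs itself, which sidesteps differences between releases in which output files the method produces.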

As for why it is so small, I would suggest running feature-table summarize on your table to check. Generally we expect your table and rep-seqs to be much smaller than your raw sequence data. While a few KB does sound unusually small, it's not impossible (our tutorial data is ~50 KB, for instance, but it's heavily subsampled to run quickly).

You may get better results if you trim your reads a little more before denoising. Check out the Moving Pictures tutorial, which provides some justification for how it trims its example dataset.