Hello @thermokarst,
For the salmon and wild fish dataset, this is what I did:
#import three sequencing runs
qsub -q batch HPC-1b.sh
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path M1_251bp_salmon_wild_manifest \
  --output-path single-end-demuxM1_251bp_salmon_wild.qza \
  --input-format SingleEndFastqManifestPhred33

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path M2_151bp_salmon_wild_manifest \
  --output-path single-end-demuxM2_151bp1_salmon_wild.qza \
  --input-format SingleEndFastqManifestPhred33

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path M3_151bp_salmon_wild_manifest \
  --output-path single-end-demuxM3_151bp_salmon_wild.qza \
  --input-format SingleEndFastqManifestPhred33
#denoised separately
qsub -q batch HPC-3.sh
qiime dada2 denoise-single \
  --i-demultiplexed-seqs single-end-demuxM1_251bp_salmon_wild.qza \
  --p-trim-left 12 \
  --p-trunc-len 150 \
  --p-n-threads 8 \
  --o-table tableM1_251bp_salmon_wild.qza \
  --o-representative-sequences rep-seqsM1_251bp_salmon_wild.qza \
  --o-denoising-stats denoising-statsM1_251bp_salmon_wild.qza

qiime dada2 denoise-single \
  --i-demultiplexed-seqs single-end-demuxM2_151bp1_salmon_wild.qza \
  --p-trim-left 12 \
  --p-trunc-len 150 \
  --p-n-threads 8 \
  --o-table tableM2_151bp1_salmon_wild.qza \
  --o-representative-sequences rep-seqsM2_151bp1_salmon_wild.qza \
  --o-denoising-stats denoising-statsM2_151bp1_salmon_wild.qza

qiime dada2 denoise-single \
  --i-demultiplexed-seqs single-end-demuxM3_151bp_salmon_wild.qza \
  --p-trim-left 12 \
  --p-trunc-len 150 \
  --p-n-threads 8 \
  --o-table tableM3_151bp_salmon_wild.qza \
  --o-representative-sequences rep-seqsM3_151bp_salmon_wild.qza \
  --o-denoising-stats denoising-statsM3_151bp_salmon_wild.qza
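Since the three denoising commands differ only in the run prefix, they could also be generated with a small loop. This is just a sketch: `denoise_cmd` is a hypothetical helper name of my own, and it only prints each command rather than running it, so the output can be checked before it goes into the job script.

```shell
#!/bin/sh
# Sketch only: prints one denoise-single command per run, does not execute it.
# denoise_cmd is a hypothetical helper name, not part of QIIME 2.
denoise_cmd() {
    run="$1"      # run prefix, e.g. M3_151bp
    dataset="$2"  # dataset suffix, e.g. salmon_wild or salmononly
    printf 'qiime dada2 denoise-single'
    printf ' --i-demultiplexed-seqs single-end-demux%s_%s.qza' "$run" "$dataset"
    printf ' --p-trim-left 12 --p-trunc-len 150 --p-n-threads 8'
    printf ' --o-table table%s_%s.qza' "$run" "$dataset"
    printf ' --o-representative-sequences rep-seqs%s_%s.qza' "$run" "$dataset"
    printf ' --o-denoising-stats denoising-stats%s_%s.qza\n' "$run" "$dataset"
}

# Same run prefixes and parameters as the three commands above.
for run in M1_251bp M2_151bp1 M3_151bp; do
    denoise_cmd "$run" salmon_wild
done
```

Passing `salmononly` as the second argument would give the equivalent commands for the salmon-only files.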
#visualise
qsub -q batch HPC-4.sh
qiime metadata tabulate \
  --m-input-file denoising-statsM1_251bp_salmon_wild.qza \
  --o-visualization denoising-statsM1_251bp_salmon_wild.qzv

qiime metadata tabulate \
  --m-input-file denoising-statsM2_151bp1_salmon_wild.qza \
  --o-visualization denoising-statsM2_151bp1_salmon_wild.qzv

qiime metadata tabulate \
  --m-input-file denoising-statsM3_151bp_salmon_wild.qza \
  --o-visualization denoising-statsM3_151bp_salmon_wild.qzv

qiime feature-table tabulate-seqs \
  --i-data rep-seqsM1_251bp_salmon_wild.qza \
  --o-visualization rep-seqsM1_251bp_salmon_wild.qzv

qiime feature-table tabulate-seqs \
  --i-data rep-seqsM2_151bp1_salmon_wild.qza \
  --o-visualization rep-seqsM2_151bp1_salmon_wild.qzv

qiime feature-table tabulate-seqs \
  --i-data rep-seqsM3_151bp_salmon_wild.qza \
  --o-visualization rep-seqsM3_151bp_salmon_wild.qzv

qiime feature-table summarize \
  --i-table tableM1_251bp_salmon_wild.qza \
  --o-visualization tableM1_251bp_salmon_wild.qzv \
  --m-sample-metadata-file salmon_wild_metadata.csv

qiime feature-table summarize \
  --i-table tableM2_151bp1_salmon_wild.qza \
  --o-visualization tableM2_151bp1_salmon_wild.qzv \
  --m-sample-metadata-file salmon_wild_metadata.csv

qiime feature-table summarize \
  --i-table tableM3_151bp_salmon_wild.qza \
  --o-visualization tableM3_151bp_salmon_wild.qzv \
  --m-sample-metadata-file salmon_wild_metadata.csv

For the salmon-only data, I did this:
#import
qsub -q batch HPC-1b.sh
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path M2_151bp1_salmononly_manifest \
  --output-path single-end-demuxM2_151bp1_salmononly.qza \
  --input-format SingleEndFastqManifestPhred33

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path M3_151bp_salmononly_manifest \
  --output-path single-end-demuxM3_151bp_salmononly.qza \
  --input-format SingleEndFastqManifestPhred33
#denoise separately
qsub -q batch HPC-3.sh
qiime dada2 denoise-single \
  --i-demultiplexed-seqs single-end-demuxM2_151bp1_salmononly.qza \
  --p-trim-left 12 \
  --p-trunc-len 150 \
  --p-n-threads 8 \
  --o-table tableM2_151bp1_salmononly.qza \
  --o-representative-sequences rep-seqsM2_151bp1_salmononly.qza \
  --o-denoising-stats denoising-statsM2_151bp1_salmononly.qza

qiime dada2 denoise-single \
  --i-demultiplexed-seqs single-end-demuxM3_151bp_salmononly.qza \
  --p-trim-left 12 \
  --p-trunc-len 150 \
  --p-n-threads 8 \
  --o-table tableM3_151bp_salmononly.qza \
  --o-representative-sequences rep-seqsM3_151bp_salmononly.qza \
  --o-denoising-stats denoising-statsM3_151bp_salmononly.qza
#visualise
qsub -q batch HPC-4.sh
qiime metadata tabulate \
  --m-input-file denoising-statsM2_151bp1_salmononly.qza \
  --o-visualization denoising-statsM2_151bp1_salmononly.qzv

qiime metadata tabulate \
  --m-input-file denoising-statsM3_151bp_salmononly.qza \
  --o-visualization denoising-statsM3_151bp_salmononly.qzv

qiime feature-table tabulate-seqs \
  --i-data rep-seqsM2_151bp1_salmononly.qza \
  --o-visualization rep-seqsM2_151bp1_salmononly.qzv

qiime feature-table tabulate-seqs \
  --i-data rep-seqsM3_151bp_salmononly.qza \
  --o-visualization rep-seqsM3_151bp_salmononly.qzv

qiime feature-table summarize \
  --i-table tableM2_151bp1_salmononly.qza \
  --o-visualization tableM2_151bp1_salmononly.qzv \
  --m-sample-metadata-file salmon_metadata.csv

qiime feature-table summarize \
  --i-table tableM3_151bp_salmononly.qza \
  --o-visualization tableM3_151bp_salmononly.qzv \
  --m-sample-metadata-file salmon_metadata.csv

When I looked at tableM3_151bp_salmononly.qzv and tableM3_151bp_salmon_wild.qzv, I saw that some samples have different feature counts in the two tables. For instance, sample 111 (a salmon sample) has 5529 features in tableM3_151bp_salmononly.qzv but 5472 features in tableM3_151bp_salmon_wild.qzv. I double-checked my metadata and confirmed that there is only one sample 111, so this is not a case of two samples being merged together.
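To list every sample whose count differs between the two tables, something like the following could be used. It is only a sketch under a few assumptions: the per-sample counts from each visualization have been saved as CSV files with the sample ID in the first column and the count in the second, the file names in the usage comment are placeholders, and `compare_counts` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: compare per-sample counts from two CSV files.
# Assumed layout (not confirmed): column 1 = sample ID, column 2 = count.
# compare_counts is a hypothetical helper name.
compare_counts() {
    awk -F',' '
        # First file: remember the count for each sample ID.
        NR == FNR { count[$1] = $2; next }
        # Second file: report samples present in both files whose counts differ.
        ($1 in count) && count[$1] != $2 {
            printf "%s\t%s\t%s\t%d\n", $1, count[$1], $2, count[$1] - $2
        }
    ' "$1" "$2"
}

# Usage (placeholder file names):
#   compare_counts salmononly_counts.csv salmon_wild_counts.csv
# Output columns: sample ID, count in first file, count in second file, difference.
```

An identical header row in both files is skipped automatically, because only rows whose second column differs are printed.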
My guess is that the feature count is affected by the number of samples denoised together.
I denoised run 3 using the commands shown earlier:
qiime dada2 denoise-single \
  --i-demultiplexed-seqs single-end-demuxM3_151bp_salmon_wild.qza \
  --p-trim-left 12 \
  --p-trunc-len 150 \
  --p-n-threads 8 \
  --o-table tableM3_151bp_salmon_wild.qza \
  --o-representative-sequences rep-seqsM3_151bp_salmon_wild.qza \
  --o-denoising-stats denoising-statsM3_151bp_salmon_wild.qza

and

qiime dada2 denoise-single \
  --i-demultiplexed-seqs single-end-demuxM3_151bp_salmononly.qza \
  --p-trim-left 12 \
  --p-trunc-len 150 \
  --p-n-threads 8 \
  --o-table tableM3_151bp_salmononly.qza \
  --o-representative-sequences rep-seqsM3_151bp_salmononly.qza \
  --o-denoising-stats denoising-statsM3_151bp_salmononly.qza

I suspect that denoising more samples together (i.e. salmon + wild fish data instead of salmon data alone) affects the final feature count. Perhaps DADA2 removes more as noise when more samples are denoised together? This is just my suspicion. Is it possible?
Basically, my question is:
Why does adding the wild samples to the denoising run with the salmon samples change the feature counts of individual salmon samples, compared to denoising the salmon samples alone?
I hope I have made myself clear. Thank you for your time!