Metagenomic analysis of fish microbiome

Hello everyone,
I am trying to do a metagenomic analysis of fish samples using qiime2-2022.2.
The analysis was already done by the sequencing company, and I wanted to re-analyze the raw data myself using QIIME 2.
I have two types of sample (egg and alevin), and for each sample we have paired-end FASTQ files with the barcode included in the header.
I imported the data using a manifest file as follows:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.tsv \
  --output-path paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2

Then I summarized the data using:

qiime demux summarize \
  --i-data paired-end-demux.qza \
  --o-visualization demultiplexed-sequences-summ.qzv

When I opened it in QIIME 2 View, I got the following results:

Then I used DADA2 for denoising:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trunc-len-f 290 \
  --p-trim-left-r 1 \
  --p-trunc-len-r 291 \
  --o-representative-sequences asv-sequences-0.qza \
  --o-table feature-table-0.qza \
  --o-denoising-stats dada2-stats.qza

Then, for summary statistics:

qiime metadata tabulate \
  --m-input-file dada2-stats.qza \
  --o-visualization dada2-stats-summ.qzv

I used QIIME 2 View for visualization:
and I compared the result with the summary statistics provided by the company:

So my question is: why is there such a huge difference between my analysis and the company's?

Hello @adamos1945,

Welcome to the forums!

If the company processed the data differently, say using different settings for dada2 denoise-paired, we would expect the resulting counts to change.

Did they also use Qiime2 to process these files? If so, we can look at all the settings they used and compare these to your settings to see what changed. If they didn't use Qiime2, did they provide details about their analysis workflow?


Thanks a lot for your answer.
The company didn't provide the detailed settings for their analysis; they provided only this:
so I can't tell which settings they used.
As for trunc-len-f and trunc-len-r, do you think my values are the correct ones?

Maybe it is due to the denoising process: they could have chosen different positions at which to truncate. Why did you choose 290/291?
Try doing the same with other values, such as 280/240.

And I have a question: does your raw data look like this?

I'm trying to do something similar with insect microbiomes.

I chose 290/291 because the quality scores start dropping from position 270:

As for the raw data, I have the same format as yours:

I tried other values, 240/241, but got the same results.

Good afternoon,

Well, that's a start! Those are available as Qiime2 plugins, like you have discovered, so it should be possible to replicate their analysis.

Could you reach out to the company asking for more details? May I ask which company processed your data?

It looks like most of your reads do not pass the first quality filter, unlike in their analysis. This is because --p-max-ee-f and --p-max-ee-r default to 2, and truncating at 290 keeps the low-quality tails, so many reads exceed that Expected Error threshold and are discarded.

Try truncating shorter:
--p-trunc-len-f 250
--p-trunc-len-r 200
The best option is probably to truncate as short as possible (to remove errors near the end) while keeping the reads long enough that they can still overlap and join.
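
To illustrate why the low-quality tails matter for the Expected Error filter: DADA2 computes the expected errors of a read as the sum of the per-base error probabilities, EE = sum(10^(-Q/10)). Here is a minimal sketch of that arithmetic. The helper name expected_errors and the read compositions are hypothetical and not taken from your data; real reads have varying quality at every position.

```shell
# Expected errors of a read: EE = sum over bases of 10^(-Q/10).
# Each argument is a "count:Q" pair, e.g. 250:30 means 250 bases at Q30.
# ASSUMPTION: the reads described below are made up for illustration.
expected_errors() {
  printf '%s\n' "$@" |
    awk -F: '{ ee += $1 * 10 ^ (-$2 / 10) } END { printf "%.2f\n", ee }'
}

# 250 good bases (Q30) followed by a 40-base low-quality tail (Q10):
expected_errors 250:30 40:10   # 4.25, above the max-ee threshold of 2

# The same read truncated at 250, which removes the tail:
expected_errors 250:30         # 0.25, passes the filter
```

In this made-up case, a 40-base tail at Q10 alone contributes 4 expected errors, which is why truncating before the quality drop lets many more reads through.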

Try some new settings and see what works!


Thanks for your reply.
The company that did the sequencing and analysis was Macrogen.
I will contact them tomorrow and ask which parameters they used for DADA2 denoising.
I tried the suggested parameters:
--p-trunc-len-f 250
--p-trunc-len-r 200
but I got the following results:

So the number of reads that pass the filter is much higher (good!), but now the area of overlap is too small and the reads can't be merged (bad).

So 290 f / 291 r is too long, and 250 f / 200 r is too short.

Try some other settings and see if you can find a happy medium that lets you preserve more of your reads.
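
Before running DADA2, candidate settings can be screened with some quick arithmetic: the overlap left after truncation is roughly trunc-len-f + trunc-len-r minus the amplicon length. The sketch below assumes a hypothetical amplicon length of 460 nt (the thread does not state the amplified region or its length, so substitute your own) and uses the 12 nt minimum overlap that DADA2's mergePairs applies by default. The function names and the 270/220 pair are illustrative only.

```shell
AMPLICON_LEN=460   # ASSUMPTION: hypothetical length, not given in the thread
MIN_OVERLAP=12     # default minOverlap of DADA2's mergePairs

# Print the overlap (in nt) left between truncated forward and reverse reads.
overlap() {
  echo $(( $1 + $2 - AMPLICON_LEN ))
}

# Report whether a truncation pair leaves enough overlap to merge.
check_pair() {
  ov=$(overlap "$1" "$2")
  if [ "$ov" -ge "$MIN_OVERLAP" ]; then
    echo "f=$1 r=$2 overlap=$ov ok"
  else
    echo "f=$1 r=$2 overlap=$ov too short"
  fi
}

check_pair 290 291
check_pair 250 200
check_pair 270 220
```

Amplicon length varies between taxa, so it is safer to leave some margin above the minimum rather than aiming for exactly 12 nt.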


Thanks for your reply, and sorry for my late response. I got the parameters used by the company, and they were as follows:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trunc-len-f 250 \
  --p-trunc-len-r 200 \
  --o-representative-sequences asv-sequences-0.qza \
  --o-table feature-table-0.qza \
  --o-denoising-stats dada2-stats.qza

and then I got the following:

Good morning!

Thanks for confirming the settings used by the company. It looks like using the same settings produces similar results, which makes sense.

These settings also cause the same issue: when truncating at 250 and 200, the reads are too short to join.

My suggestion is the same: try some settings in between and see what is the highest 'percent of input merged' you can achieve!
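
One way to try several in-between settings is a small loop that builds one denoise-paired command per truncation pair, each writing to its own output files. This is only a sketch: the pairs listed and the output file names are illustrative assumptions (280/240 was suggested earlier in this thread, the others are made up). By default it just prints the commands; set RUN=1 inside an activated QIIME 2 environment to execute them.

```shell
# Candidate "forward reverse" truncation pairs.
# ASSUMPTION: illustrative values only; pick yours from the quality plot.
PAIRS="280 240
270 230
260 220"

# Build the denoise-paired command for one truncation pair.
# Output names (hypothetical) include the pair so runs do not overwrite each other.
denoise_cmd() {
  echo "qiime dada2 denoise-paired" \
    "--i-demultiplexed-seqs paired-end-demux.qza" \
    "--p-trunc-len-f $1 --p-trunc-len-r $2" \
    "--o-representative-sequences asv-sequences-f$1-r$2.qza" \
    "--o-table feature-table-f$1-r$2.qza" \
    "--o-denoising-stats dada2-stats-f$1-r$2.qza"
}

echo "$PAIRS" | while read -r f r; do
  cmd=$(denoise_cmd "$f" "$r")
  echo "$cmd"
  # Only run when explicitly requested; otherwise this is a dry run.
  if [ "${RUN:-0}" = "1" ]; then
    $cmd
  fi
done
```

Each resulting dada2-stats file can then be tabulated with qiime metadata tabulate, as earlier in the thread, to compare the 'percentage of input merged' between runs.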

