Hello
I am using qiime2-2019.10 in a conda environment installed on my university's HPC cluster.
I am analyzing fecal microbiome data from 4 plates sequenced on an Illumina MiSeq in two sequencing runs (each run included 2 plates of samples, i.e. 96 x 2 = 192). Initially I ran DADA2 denoising separately on each run with the following commands:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs run1.qza \
  --p-trim-left-f 13 \
  --p-trim-left-r 12 \
  --p-trunc-len-r 151 \
  --p-trunc-len-f 150 \
  --o-denoising-stats dada-denoise-stats_run1.qza \
  --o-table dada_table_run1.qza \
  --o-representative-sequences dada-rep-seqs_run1.qza
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs run2.qza \
  --p-trim-left-f 13 \
  --p-trim-left-r 13 \
  --p-trunc-len-r 150 \
  --p-trunc-len-f 150 \
  --o-denoising-stats dada-denoise-stats_run2.qza \
  --o-table dada_table_run2.qza \
  --o-representative-sequences dada-rep-seqs_run2.qza
Then I merged the outputs (the run information was included in the metadata file) and performed the beta diversity analysis, which showed two strong clusters corresponding to the sequencing runs, so I assumed that the two runs had a very strong batch effect. Later I reanalyzed the data using exactly the same trimming and truncation parameters in DADA2 for both sequencing runs, then merged the outputs with 'qiime feature-table merge', and the results were drastically different: when I repeated the beta diversity analysis, the clustering by sequencing run was completely lost. So I would like to know whether a difference of 2-3 bases in trimming can cause such a huge difference in the outcome?
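To make the question concrete, here is a minimal sketch of what I suspect might be going on. It assumes (my assumption, not something I have verified on my data) that each feature ID is an MD5 hash of the denoised sequence, so a sequence that ends one base earlier would count as a different feature after merging. The amplicon below is a made-up toy sequence, and cutting bases off the end of the string is only a simplified stand-in for the different --p-trim-left-r values.

```shell
#!/usr/bin/env bash
# Toy amplicon (hypothetical sequence, not taken from my data).
amplicon="ACGTACGTTAGCCGATAGGCTTAACGGATCCGATTAGC"
len=${#amplicon}

# Simplified stand-in for --p-trim-left-r 12 (run 1) vs 13 (run 2):
# drop 12 or 13 bases from the far end of the same amplicon.
run1_seq="${amplicon:0:len-12}"
run2_seq="${amplicon:0:len-13}"

# Hash each trimmed sequence the way a sequence-derived feature ID would be built.
id1=$(printf '%s' "$run1_seq" | md5sum | cut -d' ' -f1)
id2=$(printf '%s' "$run2_seq" | md5sum | cut -d' ' -f1)

echo "run1: $run1_seq -> $id1"
echo "run2: $run2_seq -> $id2"

# If the IDs differ, the two runs would share no features for this organism.
if [ "$id1" = "$id2" ]; then
  echo "same feature ID"
else
  echo "different feature IDs"
fi
```

If this is right, the same organism would appear as two unrelated features in the merged table, one per run, which might explain the run-based clusters I saw the first time.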
bray_curtis_20000_Seq.run_anosim(1).qzv (1.0 MB) bray_curtis_unifrac_Seq.run_20000_ADONIS(1).qzv (310.6 KB) bray_curtis_20000_Seq.run_permanova(1).qzv (1.0 MB)
Further, to test whether the clustering in the PCoA is significant, I ran beta diversity significance tests on the later analysis using the 'beta-group-significance' command with both PERMANOVA (pseudo-F = 1.02031, p-value = 0.382) and ANOSIM (R = 0.000258745, p-value = 0.369). If I understand correctly, these results say that there is no significant difference between the sequencing runs. However, in adonis the R2 value = 1.000000e+00 and P = 1.0 are very high, which (if I am reading it right) would indicate a strong influence of sequencing run on the data distribution. Or am I misinterpreting the results completely?
I hope my questions are clear.
Thank you.