Odd PCoA from distance matrix

jbisanz · February 23, 2021, 5:19pm

Yes, therein lies your problem. Assuming the sequencing was set up identically for all 3 runs, you need to use the same trim-left parameters for every run. Conceptually, the problem is that sequencing runs are combined by exact matching of the sequences, so if the reads start at different positions, they will be treated as different sequences, as below:

Run1 ..AGCGCGT...
Run2 .GAGCGCGT...
Run3 AGAGCGCGT...
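
To make that concrete, here is a minimal Python sketch of the idea. It is not how QIIME 2 or DADA2 merge tables internally; the read, the counts, and the helper names `trim_left` and `merge_runs` are all made up for illustration. It just shows that when features are keyed by exact sequence, different left trims give three features, while the same trim gives one:

```python
from collections import Counter

def trim_left(read: str, n: int) -> str:
    """Drop the first n bases of a read (hypothetical stand-in for trim-left)."""
    return read[n:]

def merge_runs(run_tables: dict) -> Counter:
    """Combine per-run feature counts, keyed by exact sequence string."""
    merged = Counter()
    for table in run_tables.values():
        merged.update(table)  # identical sequences add up; anything else stays separate
    return merged

# Toy read matching the illustration above; the count of 10 per run is made up.
read = "AGAGCGCGT"

# Each run trimmed by a different amount, as in Run1/Run2/Run3 above.
mismatched = {
    "Run1": Counter({trim_left(read, 2): 10}),
    "Run2": Counter({trim_left(read, 1): 10}),
    "Run3": Counter({trim_left(read, 0): 10}),
}

# Every run trimmed by the same amount.
matched = {
    run: Counter({trim_left(read, 2): 10})
    for run in ("Run1", "Run2", "Run3")
}

features_mismatched = merge_runs(mismatched)
features_matched = merge_runs(matched)

print(len(features_mismatched))  # 3 distinct features from one underlying sequence
print(len(features_matched))     # 1 feature, with counts pooled across runs
```

In the mismatched case, each run ends up with its own run-specific feature, so the runs share nothing with each other.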

Mehrbod has done a better job of explaining it here:

An unrelated issue I see from your length distribution is the large number of features that are around 250 bp. I recall there is a bit of length variation in V3-V4, but I don't think it is that much. You may want to investigate how others are trimming and overlapping their reads for V3-V4 on MiSeq (which I assume you are using, based on your truncation lengths).