Procrustes Analysis on Re-sequenced Samples

Zach_Burcham · January 30, 2019, 9:15pm

Hello,

I have a bit of a weird situation: I had to re-sequence some of my samples because some controls on that particular plate came back positive, but I later found out that the original plate was actually labelled incorrectly and my original sequences should be fine as long as I correct the labels. I want to use my original samples in my analysis instead of the re-run samples because the re-sequenced samples have a batch effect.

Since my re-sequenced samples should be highly similar to the label-corrected original sequences, I want to show that the label-corrected samples are closer to the re-sequenced samples than they were before the correction, as evidence that the problem was in fact just the labels. I thought a good way of doing this would be to create two Procrustes plots: one comparing the PCoA of the re-sequenced samples (reference) with the PCoA of the incorrectly labelled samples, and the other comparing the PCoA of the re-sequenced samples with the PCoA of the correctly labelled samples, to show that the corrected labels sit closer to the re-sequenced samples than the incorrect labels do.

I created distance matrices using the standalone qiime diversity beta-phylogenetic command, so I could avoid a sampling depth that might create mismatches between the samples in my matrices, and then used those matrices to create PCoA plots. But when I run qiime diversity procrustes-analysis I get an error message saying:

Plugin error from diversity:

The ordinations represent two different sets of samples

How does procrustes check my sample sets to determine they are different? And can I get it to make this comparison?

jwdebelius · January 31, 2019, 1:22pm

Hi @Zach_Burcham,

Your error message is saying that your sample labels are different! For procrustes to work correctly, you need to have the same labels going into the distance matrix so it can match the samples. So, double-check the way you've named things, and try again.
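One quick way to find the labels that differ is to compare the two sets of sample IDs directly. This is a minimal sketch, assuming the IDs have already been pulled out of each ordination into plain Python lists; the function name and the example IDs are hypothetical, and this is not the plugin's own check.

```python
def compare_sample_ids(reference_ids, other_ids):
    """Return the IDs that appear in only one of the two ordinations."""
    reference_ids, other_ids = set(reference_ids), set(other_ids)
    return {
        "only_in_reference": sorted(reference_ids - other_ids),
        "only_in_other": sorted(other_ids - reference_ids),
    }


# Hypothetical sample IDs, for illustration only.
resequenced = ["S1", "S2", "S3", "S4"]
original = ["S1", "S2", "S3", "S5"]

mismatch = compare_sample_ids(resequenced, original)
print(mismatch)
```

If both lists come back empty, the two ordinations cover the same samples and the mismatch lies elsewhere.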

I'm not sure if you can re-label on the fly in QIIME 2 for situations like this (cc'ing @thermokarst, @Nicholas_Bokulich @:qiime2:_genius_squad). I could give suggestions to handle it in Python directly, though, if that might be of help...

Best,
Justine

thermokarst · January 31, 2019, 1:33pm

You can't relabel the ids on the distance matrix, but you can relabel IDs on the feature table, using the feature-table group command. Add a new column to your metadata file with the new sample ids, then run this command. Once you have the relabeled table, replace the ID column in your metadata file with the "new" column you created before relabeling.
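The relabeling idea itself can be sketched in plain Python. This only illustrates the logic and is not how the feature-table plugin is implemented: the table is assumed to be a nested dict of counts, the old-to-new mapping stands in for the extra metadata column, and all names below are hypothetical. Summing the counts when two old IDs map to the same new ID is an assumption made for this sketch.

```python
from collections import Counter


def relabel_samples(table, id_map):
    """Relabel samples in a {sample_id: {feature_id: count}} table.

    id_map is {old_id: new_id}. If two old IDs share a new ID, their
    counts are summed (an assumption made for this sketch).
    """
    relabeled = {}
    for old_id, counts in table.items():
        new_id = id_map[old_id]  # raises KeyError if an ID has no mapping
        relabeled.setdefault(new_id, Counter()).update(counts)
    return {sample_id: dict(counts) for sample_id, counts in relabeled.items()}


# Hypothetical table and mapping, for illustration only.
table = {
    "plate1.A1": {"f1": 10, "f2": 0},
    "plate1.A2": {"f1": 3, "f2": 7},
}
id_map = {"plate1.A1": "sampleX", "plate1.A2": "sampleY"}

relabeled = relabel_samples(table, id_map)
print(relabeled)
```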

Zach_Burcham · January 31, 2019, 5:35pm

Hi @jwdebelius and @thermokarst ,

Thank you both for the responses. After looking deeper into the feature tables, there were a couple of samples that didn't match up between the two, which was causing the error. Should have noticed that one!

As for the renaming, we believe the sample plate was rotated 180 degrees at some point (we had empty well blanks in the bottom right and the top left is coming back as empty for sequences while the bottom right is a strong positive, yikes!). So I am going back and reassigning the barcodes to the samples as if the plate was rotated and then checking if the sample composition resembles those that were re-sequenced so we can just use the original samples with a different demultiplex. Thanks for the help!
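A 180 degree rotation sends each well to the diagonally opposite position, so the reassignment can be written as a small mapping. This is a sketch under assumptions that the thread does not state: a standard 96-well layout (rows A to H, columns 1 to 12) and hypothetical sample names and barcodes.

```python
ROWS = "ABCDEFGH"  # assumed: 8 rows on a 96-well plate
N_COLS = 12        # assumed: 12 columns


def rotate_well_180(well):
    """Return the well that ends up at this position after a 180 degree turn."""
    row = ROWS.index(well[0].upper())
    col = int(well[1:])
    if not 1 <= col <= N_COLS:
        raise ValueError(f"column out of range: {well}")
    new_row = ROWS[len(ROWS) - 1 - row]
    new_col = N_COLS + 1 - col
    return f"{new_row}{new_col}"


# Hypothetical layout: where each sample was meant to sit, and which
# barcode was used at each physical position on the plate.
sample_well = {"sampleX": "A1", "blank1": "H12"}
barcode_by_well = {"A1": "AAAA", "H12": "TTTT"}

# If the plate was rotated, each sample received the barcode of the
# diagonally opposite well.
corrected = {
    sample: barcode_by_well[rotate_well_180(well)]
    for sample, well in sample_well.items()
}
print(corrected)
```

Applying the function twice returns the original well, which is a handy sanity check on the mapping.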

jwdebelius · January 31, 2019, 6:47pm

@Zach_Burcham, I'm so sorry, I hope you figure it out. That is one of my nightmares.

Zach_Burcham · February 8, 2019, 7:34pm

Hi @jwdebelius, I think I pretty much have what I need, but would like to compare the actual distances between the new and old procrustes. Is there a way to get the distance between the samples from the procrustes plot? (i.e. the length of the bar connecting the two points in the image)

Thanks!

jwdebelius · February 8, 2019, 9:31pm

Hi @Zach_Burcham,

I'm not sure! I've never tried this. You could try exporting the coordinates from the qza artifact, and then maybe calculating a Euclidean distance.

You could maybe also try a Mantel test if you have a set of samples where you know the IDs are correct and a set where you think they're flipped. You get a correlation coefficient out, which can be nice?
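For reference, a Mantel test correlates the pairwise distances in two matrices and builds a null distribution by shuffling the sample order of one of them. The sketch below is a simplified, pure-Python version with a one-sided permutation p-value. It assumes both matrices are square, symmetric and in the same sample order, uses made-up distances, and is no substitute for an established implementation.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def upper_triangle(dm, order):
    """Flatten the upper triangle of dm after reordering its samples."""
    pairs = itertools.combinations(range(len(order)), 2)
    return [dm[order[i]][order[j]] for i, j in pairs]


def mantel(dm1, dm2, permutations=999, seed=0):
    """Return (r, p) where p is a one-sided permutation p-value."""
    identity = list(range(len(dm1)))
    x = upper_triangle(dm1, identity)
    observed = pearson(x, upper_triangle(dm2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # shuffle sample labels of the second matrix
        if pearson(x, upper_triangle(dm2, perm)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Made-up distances for four samples; dm2 is dm1 scaled by two.
dm1 = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
dm2 = [[2 * value for value in row] for row in dm1]

r, p = mantel(dm1, dm2)
print(r, p)
```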

Best,
Justine

Zach_Burcham · February 8, 2019, 10:19pm

Thanks for the reply @jwdebelius. I exported one of the transformed references and got a list of my samples like this:

11293.C.0787 -0.0442895295269168 -0.008628451758110188 0.0019121630556371686 0.022107042418814005 -0.02578787170841776

The first item is my sample name, and I assume the rest are the coordinates for 5 dimensions? If I wanted to compare this with the same sample in the other coordinate list, would I just calculate the 3-dimensional (x, y, z) Euclidean distance against the same sample in the other file? Or will I need to look into doing a 5D measurement?

Zach_Burcham · February 9, 2019, 3:26pm

I actually figured it out with the 5D distance calculation using dist in R. Thanks!
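For anyone who would rather do the same thing in Python than with dist in R, here is a rough equivalent. It is a sketch that assumes each exported row is a whitespace-delimited sample ID followed by its coordinates, as in the line shown earlier; the function names and the example rows are hypothetical.

```python
import math


def parse_coords(lines, dimensions=5):
    """Parse 'sample_id c1 c2 ...' rows into {sample_id: [c1, ..., cN]}."""
    coords = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        coords[fields[0]] = [float(v) for v in fields[1:1 + dimensions]]
    return coords


def paired_distances(reference, other):
    """Euclidean distance between matching samples in two coordinate sets."""
    shared = sorted(set(reference) & set(other))
    return {sid: math.dist(reference[sid], other[sid]) for sid in shared}


# Hypothetical rows standing in for the two exported coordinate files.
reference = parse_coords(["s1 0 0 0 0 0", "s2 1 1 1 1 1"])
other = parse_coords(["s1 3 4 0 0 0", "s2 1 1 1 1 1"])

distances = paired_distances(reference, other)
print(distances)
```

Using every exported axis keeps the distance consistent with the space the two ordinations were fitted in, whereas dropping to three axes would only measure what the plot happens to display.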