Beta diversity group significance box plot

(sreesankar) #1

I ran the qiime diversity beta-group-significance command in qiime-2018.11. I have two grouping variables: extraction method and primer. The boxplot produced by this command shows distances within the same group. What I actually want is to compare the extraction methods and primers and show the distances between two groups. I went through the qiime longitudinal pairwise-distances tutorial, but it requires specific state and studyid values, and I don’t have those values for my data. Is there any way I can create a distance boxplot that calculates the distance between two groups using the weighted UniFrac distance matrix?


(Justine) #2

Hi @sree,

I think qiime longitudinal pairwise-distances is probably still your best bet here. You may need to fiddle with your metadata, though.

Let’s say you want to compare the effect of primer on the distance between your two extraction techniques. Then, your “state” variable will be the extraction column, where state-1 is the first method, and state-2 is the second. Your “group” column will be the primer. And then, as the individual ID, you’ll pass a column with the original sample identifier shared by the two technical replicates.


(sreesankar) #3

Dear Justine,
Thank you for the suggestion. As you suggested, I used the extraction method as the “state” variable, where state-1 is method 1 and state-2 is method 2, and the primer as my group column. For the StudyID I used the sample names, so my SampleID and StudyID are the same. This is the command I used:

qiime longitudinal pairwise-distances --i-distance-matrix core-metrics-results/weighted_unifrac_distance_matrix.qza --m-metadata-file metadata.tsv --p-group-column 16S_Regions --p-state-column ExtractionMethod --p-state-1 Swab_DNA --p-state-2 Soil_DNA --p-individual-id-column StudyID --p-replicate-handling random --o-visualization core-metrics-results/pairwise-distance-primer-extractionmethod.qzv

After running this command, when I open the qzv file I do not get a boxplot figure. The resulting figure is blank, with only my variables labelled. Should I post this in the technical section as a new post? Could you please help me resolve this problem?

Is there any way that I could export the weighted UniFrac distance matrix to a CSV? If this option exists, it would be very helpful.

I have attached the weighted UniFrac distance matrix, the qzv file, and the metadata here:

weighted_unifrac_distance_matrix.qza (151.5 KB)
metadata.tsv (1.3 KB)
pairwise-distance-extraction_method-16S_Regions.qzv (352.3 KB)


(Justine) #4

Hi @sree,

Based on your metadata, you’re not pairing the samples properly. So, I think I would add three more columns using your StudyID column.

#SampleID ExtractionMethods 16S_Regions StudyID base_sample sample_region sample_kit
#q2:types categorical categorical categorical categorical categorical categorical
GSX_1_V1V3a_Swab Swab_DNA V1-V3 GSX_1_V1V3a_Swab GSX_1 GSX_1_v13 GSX_1_swab
GSX_1_V3V4_Swab Swab_DNA V3-V4 GSX_1_V3V4_Swab GSX_1 GSX_1_v34 GSX_1_swab
GSX_4_V1_V3_Soil Soil_DNA V1-V3 GSX_4_V1_V3_Soil GSX_1 GSX_1_v13 GSX_1_soil
GSX_4_V3_V4_Soil Soil_DNA V3-V4 GSX_4_V3_V4_Soil GSX_1 GSX_1_v34 GSX_1_soil
HYX_1_V1V3a_Swab Swab_DNA V1-V3 HYX_1_V1V3a_Swab HYX_1 HYX_1_v13 HYX_1_swab

Then, rather than pairing on the StudyID column, I would pair on the sample_region column.


(sreesankar) #5

Dear Justine,
Thank you for the reply. I have modified the metadata based on your suggestion, but the output still produces neither a boxplot nor the statistical results. I really couldn’t figure out the problem. If there is any way I could download the distance matrix data in CSV format, that would be very helpful. The following command was used:
qiime longitudinal pairwise-distances \
  --i-distance-matrix final_merged-soil-swab-v1-v3-v3-v4-core-metrics-results/weighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata-merged_allwe-gene-final.tsv \
  --p-group-column 16S_Regions \
  --p-state-column ExtractionMethods \
  --p-state-1 Soil_DNA \
  --p-state-2 Swab_DNA \
  --p-individual-id-column Sample_Region \
  --p-replicate-handling random \
  --o-visualization final_merged-soil-swab-v1-v3-v3-v4-core-metrics-results/extraction_vs_16S_regions-pairwise-distances.qzv

Attached are the updated metadata and qzv file:
metadata.tsv (1.9 KB)
extraction_vs_16S_regions-pairwise-distances.qzv (352.4 KB)

(Nicholas Bokulich) #6

You are not using an appropriate individual ID column. That column should specify the precise sample that you analyzed using two different methods. A paired method like this will only be appropriate if you are examining identical samples processed using your different methods — it is not clear if that is the case. If it is, you will need a separate metadata column that indicates which unique individual subject each sample came from — and each value should be found twice: once at state 1 and once at state 2.
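
One way to check this pairing requirement before running pairwise-distances is to count, for each value of the candidate individual ID column, how many samples fall in each state. The sketch below is only an illustration in Python using the standard library. The function names are made up for this example, and the rows are the five example rows from the metadata table posted earlier in the thread, with the base_sample and sample_kit columns left out for brevity.

```python
import csv
import io
from collections import defaultdict

# Example rows taken from the metadata table earlier in the thread
# (tab-separated; base_sample and sample_kit omitted for brevity).
METADATA_TSV = "\n".join("\t".join(fields) for fields in [
    ["#SampleID", "ExtractionMethods", "16S_Regions", "StudyID", "sample_region"],
    ["#q2:types", "categorical", "categorical", "categorical", "categorical"],
    ["GSX_1_V1V3a_Swab", "Swab_DNA", "V1-V3", "GSX_1_V1V3a_Swab", "GSX_1_v13"],
    ["GSX_1_V3V4_Swab", "Swab_DNA", "V3-V4", "GSX_1_V3V4_Swab", "GSX_1_v34"],
    ["GSX_4_V1_V3_Soil", "Soil_DNA", "V1-V3", "GSX_4_V1_V3_Soil", "GSX_1_v13"],
    ["GSX_4_V3_V4_Soil", "Soil_DNA", "V3-V4", "GSX_4_V3_V4_Soil", "GSX_1_v34"],
    ["HYX_1_V1V3a_Swab", "Swab_DNA", "V1-V3", "HYX_1_V1V3a_Swab", "HYX_1_v13"],
])


def read_metadata(text):
    """Read tab-separated metadata into dicts, skipping the #q2:types row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if row["#SampleID"] != "#q2:types"]


def unpaired_ids(rows, id_col, state_col, state_1, state_2):
    """Return the IDs that do not occur exactly once in each of the two states."""
    counts = defaultdict(lambda: {state_1: 0, state_2: 0})
    for row in rows:
        if row[state_col] in (state_1, state_2):
            counts[row[id_col]][row[state_col]] += 1
    return sorted(
        ident for ident, c in counts.items()
        if c[state_1] != 1 or c[state_2] != 1
    )


rows = read_metadata(METADATA_TSV)
for candidate in ("StudyID", "sample_region"):
    bad = unpaired_ids(rows, candidate, "ExtractionMethods", "Swab_DNA", "Soil_DNA")
    print(candidate, "-> IDs without a complete pair:", bad)
```

With these example rows, every StudyID value is reported as unpaired because each one occurs only once, while sample_region leaves only HYX_1_v13 without a partner. An empty list would mean every ID is found once at each state.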

Given the metadata that you have now, I would recommend instead using qiime diversity beta-group-significance as you originally planned. Simply make a new metadata column that concatenates your two columns of interest. You wish to compare across 16S regions AND extraction methods, so the new column should hold one value for each combination of extraction method (Swab_DNA or Soil_DNA) and 16S region (V1-V3 or V3-V4).

Then your boxplot will compare those 4 groups.
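
The column concatenation can be done in a spreadsheet, but a small script makes it repeatable. The following is a minimal sketch in Python using only the standard library. The new column name Method_Region, the underscore separator, and the function name are assumptions chosen for this example, and the input rows are the example rows from earlier in the thread.

```python
import csv
import io

# Example rows from the metadata table earlier in the thread (tab-separated).
METADATA_TSV = "\n".join("\t".join(fields) for fields in [
    ["#SampleID", "ExtractionMethods", "16S_Regions"],
    ["#q2:types", "categorical", "categorical"],
    ["GSX_1_V1V3a_Swab", "Swab_DNA", "V1-V3"],
    ["GSX_1_V3V4_Swab", "Swab_DNA", "V3-V4"],
    ["GSX_4_V1_V3_Soil", "Soil_DNA", "V1-V3"],
    ["GSX_4_V3_V4_Soil", "Soil_DNA", "V3-V4"],
    ["HYX_1_V1V3a_Swab", "Swab_DNA", "V1-V3"],
])


def add_combined_column(text, col_a, col_b, new_col, sep="_"):
    """Append a column holding '<col_a value><sep><col_b value>' to TSV metadata."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    a, b = header.index(col_a), header.index(col_b)

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header + [new_col])
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if row[0] == "#q2:types":
            # Keep the type directive row aligned with the new column.
            writer.writerow(row + ["categorical"])
        else:
            writer.writerow(row + [row[a] + sep + row[b]])
    return out.getvalue()


combined = add_combined_column(
    METADATA_TSV, "ExtractionMethods", "16S_Regions", "Method_Region"
)
print(combined)
```

The resulting file could then be passed to qiime diversity beta-group-significance with the new column named in --m-metadata-column. Adding --p-pairwise should also report the pairwise tests between the four groups.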

Good luck!

(sreesankar) #7

Dear Nicholas,
Thank you for the suggestion. Instead of grouping by primers and extraction method, grouping the samples into specific categories was the appropriate way to make the pairwise comparison, and it worked fine. I have a suggestion: if, along with the PCoA values, we could also download the distance matrix of any analysis we perform, it would be a great help. With that matrix I could generate my own plots in R using different R functions.

(Nicholas Bokulich) #8

You can use qiime tools export to export any data from QIIME 2, including distance matrices.
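
For a distance matrix artifact, the export would typically look something like qiime tools export --input-path weighted_unifrac_distance_matrix.qza --output-path exported, which should write a tab-separated distance-matrix.tsv into the output directory. The sketch below, in Python with only the standard library, shows one way to turn such a file into CSV and to pull out the distances between two groups for plotting elsewhere. The function names are made up for this example, and the 3x3 matrix and its values are toy data invented purely for illustration, not taken from the thread.

```python
import csv
import io

# Toy data, invented for illustration only. The layout mimics an exported
# square matrix: an empty first header cell, then sample IDs, then one row
# per sample.
TOY_MATRIX_TSV = (
    "\tS1\tS2\tS3\n"
    "S1\t0.0\t0.2\t0.5\n"
    "S2\t0.2\t0.0\t0.4\n"
    "S3\t0.5\t0.4\t0.0\n"
)
# Hypothetical sample-to-group assignment (normally read from the metadata).
TOY_GROUPS = {"S1": "Swab_DNA", "S2": "Swab_DNA", "S3": "Soil_DNA"}


def tsv_to_csv(text):
    """Convert tab-separated text to comma-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(csv.reader(io.StringIO(text), delimiter="\t"))
    return out.getvalue()


def read_distance_matrix(text):
    """Parse a square TSV matrix into a dict {(row_id, col_id): distance}."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    col_ids = rows[0][1:]
    return {
        (row[0], col_id): float(value)
        for row in rows[1:]
        for col_id, value in zip(col_ids, row[1:])
    }


def between_group_distances(distances, groups, group_a, group_b):
    """List the distances for every pair with one sample in each group."""
    a_ids = [s for s, g in groups.items() if g == group_a]
    b_ids = [s for s, g in groups.items() if g == group_b]
    return [distances[(a, b)] for a in a_ids for b in b_ids]


print(tsv_to_csv(TOY_MATRIX_TSV))
matrix = read_distance_matrix(TOY_MATRIX_TSV)
print(between_group_distances(matrix, TOY_GROUPS, "Swab_DNA", "Soil_DNA"))
```

To use this on real data, the toy string would be replaced by the contents of the exported file, and the group dictionary would be built from the metadata column of interest. The CSV output can then be read directly in R.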

(sreesankar) #9

Ok. Thank you very much.