Okay, so if I'm understanding correctly, the feature names in the microbe feature table (.qza) don't match the names in my metadata? Specifically, the taxonomy names don't match the provided feature IDs? If that's right, can I fix this in the metadata tables themselves, or will I need to redo some of the mmvec analyses I've run so far? I thought I had already identified and fixed a name mismatch, but perhaps that just made things worse.
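One way to pin down where a mismatch is, before editing any files, is to compare the two sets of IDs directly. The sketch below is a minimal, hypothetical helper (`compare_ids` is not part of QIIME 2 or mmvec). It assumes you have already pulled the feature IDs out of the table, for example by exporting the .qza with `qiime tools export` and converting the resulting BIOM file to TSV, and the ID column out of the metadata file. The example IDs are made up for illustration.

```python
def compare_ids(table_ids, metadata_ids):
    """Report IDs present in one source but not the other (hypothetical helper)."""
    table_ids = set(table_ids)
    metadata_ids = set(metadata_ids)
    return {
        # Features in the table that the metadata does not describe.
        "only_in_table": sorted(table_ids - metadata_ids),
        # Metadata rows with no matching feature in the table.
        "only_in_metadata": sorted(metadata_ids - table_ids),
        # IDs that line up on both sides.
        "shared": sorted(table_ids & metadata_ids),
    }


if __name__ == "__main__":
    # Hypothetical example: a table whose features were partly renamed by genus,
    # compared against a taxonomy file keyed by "ParseN" IDs.
    table_ids = ["Prevotella", "Ruminococcus_g2", "Parse4"]
    metadata_ids = ["Parse1", "Parse2", "Parse4"]
    for key, ids in compare_ids(table_ids, metadata_ids).items():
        print(f"{key}: {ids}")
```

If both `only_in_table` and `only_in_metadata` come back empty, the names line up; otherwise the listed IDs show which side was renamed.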
Here are the first few lines of each metadata file:
$ head taxonomy_v2.tsv
Feature ID Taxon 16S_ID Kingdom Phylum Class Order Family Genus Species
Parse1 Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella Prevotella Bacteria Bacteroidetes Bacteroidia Bacteroidales Prevotellaceae Prevotella
Parse2 Bacteria;Firmicutes;Clostridia;Clostridiales;Ruminococcaceae;Ruminococcus_g2 Ruminococcus_g2 Bacteria Firmicutes Clostridia Clostridiales Ruminococcaceae Ruminococcus_g2
Parse3 Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella Prevotella Bacteria Bacteroidetes Bacteroidia Bacteroidales Prevotellaceae Prevotella
Parse4 Bacteria;Firmicutes;Clostridia;Clostridiales;Ruminococcaceae;Faecalibacterium Faecalibacterium Bacteria Firmicutes Clostridia Clostridiales Ruminococcaceae Faecalibacterium
Parse5 Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae Enterobacteriaceae Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales Enterobacteriaceae
Parse6 Bacteria;Spirochaetes;Spirochaetes_c;Spirochaetales;Spirochaetaceae;Treponema Treponema Bacteria Spirochaetes Spirochaetes_c Spirochaetales Spirochaetaceae Treponema
Parse7 Bacteria;Firmicutes;Erysipelotrichi;Erysipelotrichales;Erysipelotrichaceae;Coprobacillus Coprobacillus Bacteria Firmicutes Erysipelotrichi Erysipelotrichales Erysipelotrichaceae Coprobacillus
Parse8 Bacteria;Proteobacteria;Gammaproteobacteria;Aeromonadales;Succinivibrionaceae;Succinivibrio Succinivibrio Bacteria Proteobacteria Gammaproteobacteria Aeromonadales Succinivibrionaceae Succinivibrio
Parse9 Bacteria;Spirochaetes;Spirochaetes_c;Spirochaetales;Spirochaetaceae;Treponema Treponema Bacteria Spirochaetes Spirochaetes_c Spirochaetales Spirochaetaceae Treponema
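In the rows above, the last rank of each `Taxon` string matches the `16S_ID` column, but several features share the same `16S_ID` (Parse1 and Parse3 are both Prevotella; Parse6 and Parse9 are both Treponema). Genus-level names therefore cannot serve as unique feature IDs: renaming features by genus would produce duplicate IDs. The hypothetical `check_taxonomy` function below sketches both checks, assuming the file is tab-separated with the `Feature ID`, `Taxon`, and `16S_ID` headers shown; the sample input is a trimmed copy of three rows from above.

```python
import csv
import io
from collections import Counter


def check_taxonomy(handle):
    """Sanity-check a tab-separated taxonomy file (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    feature_ids = Counter()
    short_ids = Counter()
    mismatched = []
    for row in reader:
        feature_ids[row["Feature ID"]] += 1
        short_ids[row["16S_ID"]] += 1
        # Compare the deepest rank in the Taxon string with the 16S_ID column.
        last_rank = row["Taxon"].split(";")[-1].strip()
        if last_rank != row["16S_ID"].strip():
            mismatched.append(row["Feature ID"])
    return {
        "duplicate_feature_ids": sorted(k for k, n in feature_ids.items() if n > 1),
        # 16S_ID values used by more than one feature (unsafe as unique IDs).
        "shared_16S_IDs": sorted(k for k, n in short_ids.items() if n > 1),
        "taxon_vs_16S_ID_mismatch": mismatched,
    }


TAXONOMY_SAMPLE = (
    "Feature ID\tTaxon\t16S_ID\n"
    "Parse1\tBacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella\tPrevotella\n"
    "Parse3\tBacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella\tPrevotella\n"
    "Parse5\tBacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae\tEnterobacteriaceae\n"
)

if __name__ == "__main__":
    print(check_taxonomy(io.StringIO(TAXONOMY_SAMPLE)))
    # To run on the real file (assumed tab-separated):
    # with open("taxonomy_v2.tsv", newline="") as fh:
    #     print(check_taxonomy(fh))
```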
$ head metabolite-metadata-v2.txt
sampleid m/z RT Adduct Compound_Name MassDiff ATTRIBUTE_16s_ID ATTRIBUTE_16S_shortID ATTRIBUTE_Country ATTRIBUTE_IndusScore ATTRIBUTE_NumIndusScore ATTRIBUTE_OriginalFileName ATTRIBUTE_Population ATTRIBUTE_SampleID charge cluster index componentindex Compound_Source Data_Collector GNPSGROUP:BurkinaFaso GNPSGROUP:Guayabo GNPSGROUP:Isolated Traditional (4) GNPSGROUP:Matses GNPSGROUP:Norman GNPSGROUP:Peru GNPSGROUP:Rural Industrial (2) GNPSGROUP:Rural Traditional (3) GNPSGROUP:TamboDeMora GNPSGROUP:Tunapuco GNPSGROUP:Urban Industrial (1) GNPSGROUP:USA GNPSLibraryURL GNPSLinkout_Cluster GNPSLinkout_Network Instrument Ion_Source IonMode Library_Class MQScore MZErrorPPM name number of spectra parent mass PI RTMean shared name SharedPeaks Smiles SpectrumID sum(precursor intensity)
137.046_0.3805 137.046 0.3805 M+H HYPOXANTHINE 0.0039978 NA NA "USA,BurkinaFaso,Peru" "Isolated Traditional (4),Rural Industrial (2),Urban Industrial (1),Rural Traditional (3)" "4,3,2,1" NA "Norman,Guayabo,TamboDeMora,Matses,BurkinaFaso,Tunapuco" NA 0 11178 239 Commercial standard Prasad 3287.253717 3086.928938 NA 3232.343733 1780.195425 NA NA NA 2341.534378 3308.863934 1780.195425 NA http://gnps.ucsd.edu/ProteoSAFe/gnpslibraryspectrum.jsp?SpectrumID=CCMSLIB00000577903 "https://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=45672509f37244b89c33d0afe8935c88&view=view_all_clusters_withID&show=true#{""main.cluster index_lowerinput"":""11178.0"",""main.cluster index_upperinput"":""11178.0""}" https://gnps.ucsd.edu/ProteoSAFe/result.jsp?view=network_displayer&componentindex=239&task=45672509f37244b89c33d0afe8935c88&show=true Q-Exactive Plus LC-ESI Positive 1 0.975775 29.1704 11178 108 137.046 Alexandrov Theodore 0.3805 11178 5 C1=NC2=C(N1)C(=O)N=CN2 CCMSLIB00000577903 308143.7386
139.0503_0.3108 139.0503 0.3108 M+H Spectral Match to Nicotinamide N-oxide from NIST14 0.000289917 NA NA "USA,BurkinaFaso,Peru" "Isolated Traditional (4),Rural Industrial (2),Urban Industrial (1),Rural Traditional (3)" "4,3,2,1" NA "Norman,Guayabo,TamboDeMora,Matses,BurkinaFaso,Tunapuco" NA 0 1791 -1 Isolated Data deposited by mjmeehan 556.2729524 958.0821205 NA 625.2442196 684.3900649 NA NA NA 663.1911842 785.1071473 684.3900649 NA http://gnps.ucsd.edu/ProteoSAFe/gnpslibraryspectrum.jsp?SpectrumID=CCMSLIB00003136964 "https://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=45672509f37244b89c33d0afe8935c88&view=view_all_clusters_withID&show=true#{""main.cluster index_lowerinput"":""1791.0"",""main.cluster index_upperinput"":""1791.0""}" This Node is a Singleton IT/ion trap ESI Positive 3 0.999779 2.08498 1791 108 139.0503 Data from Gabriel Haddad 0.3108 1791 8 N/A CCMSLIB00003136964 75862.31072
148.0759_3.2191 148.0759 3.2191 M+H 3-METHYL-2-OXINDOLE 0.000106812 NA NA "USA,BurkinaFaso,Peru" "Isolated Traditional (4),Rural Industrial (2),Urban Industrial (1),Rural Traditional (3)" "4,3,2,1" NA "Norman,Guayabo,TamboDeMora,Matses,BurkinaFaso,Tunapuco" NA 0 4056 596 Commercial Fernando Vargas 212.78854 106.0768954 NA 107.3272897 57.71567023 NA NA NA 96.86660084 128.5895288 57.71567023 NA http://gnps.ucsd.edu/ProteoSAFe/gnpslibraryspectrum.jsp?SpectrumID=CCMSLIB00005463646 "https://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=45672509f37244b89c33d0afe8935c88&view=view_all_clusters_withID&show=true#{""main.cluster index_lowerinput"":""4056.0"",""main.cluster index_upperinput"":""4056.0""}" https://gnps.ucsd.edu/ProteoSAFe/result.jsp?view=network_displayer&componentindex=596&task=45672509f37244b89c33d0afe8935c88&show=true Orbitrap ESI Positive 1 0.926001 0.721329 4056 107 148.0759 Dorrestein 3.2191 4056 6 CC1C2=CC=CC=C2NC1=O CCMSLIB00005463646 12013.46654
195.0651_3.0126 195.0651 3.0126 M+H Spectral Match to trans-Ferulic acid from NIST14 9.16E-05 NA NA "USA,BurkinaFaso,Peru" "Isolated Traditional (4),Rural Industrial (2),Urban Industrial (1),Rural Traditional (3)" "4,3,2,1" NA "Norman,Guayabo,TamboDeMora,Matses,BurkinaFaso,Tunapuco" NA 0 8500 165 Isolated Data deposited by amelnik 9.142905189 10.95795092 NA 79.3812195 23.49560097 NA NA NA 11.48764718 57.9869729 23.49560097 NA http://gnps.ucsd.edu/ProteoSAFe/gnpslibraryspectrum.jsp?SpectrumID=CCMSLIB00003137490 "https://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=45672509f37244b89c33d0afe8935c88&view=view_all_clusters_withID&show=true#{""main.cluster index_lowerinput"":""8500.0"",""main.cluster index_upperinput"":""8500.0""}" https://gnps.ucsd.edu/ProteoSAFe/result.jsp?view=network_displayer&componentindex=165&task=45672509f37244b89c33d0afe8935c88&show=true Q-TOF ESI Positive 3 0.911395 0.469345 8500 107 195.0651 Data from Rob Knight 3.0126 8500 5 N/A CCMSLIB00003137490 3871.357412
245.0983_2.3864 245.0983 2.3864 [M+H]+ BIOTIN 0.0032959 NA NA "USA,BurkinaFaso,Peru" "Isolated Traditional (4),Rural Industrial (2),Urban Industrial (1),Rural Traditional (3)" "4,3,2,1" NA "Norman,Guayabo,TamboDeMora,Matses,BurkinaFaso,Tunapuco" NA 0 24636 2344 isolated MoNA:VF-NPL-QEHF028111 4.515143699 1.645258231 NA 3.247397651 4.589561014 NA NA NA 3.481700531 6.144073684 4.589561014 NA http://gnps.ucsd.edu/ProteoSAFe/gnpslibraryspectrum.jsp?SpectrumID=CCMSLIB00004721692 "https://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=45672509f37244b89c33d0afe8935c88&view=view_all_clusters_withID&show=true#{""main.cluster index_lowerinput"":""24636.0"",""main.cluster index_upperinput"":""24636.0""}" https://gnps.ucsd.edu/ProteoSAFe/result.jsp?view=network_displayer&componentindex=2344&task=45672509f37244b89c33d0afe8935c88&show=true ESI-QFT N/A positive 3 0.867222 13.4474 24636 106 245.0983 MoNA 2.3864 24636 6 N/A CCMSLIB00004721692 452.0527228
263.2369_6.6799 263.2369 6.6799 M+H-H2O "Spectral Match to Conjugated linoleic acid (9E,11E) from NIST14" 9.16E-05 NA NA "USA,BurkinaFaso,Peru" "Isolated Traditional (4),Rural Industrial (2),Urban Industrial (1),Rural Traditional (3)" "4,3,2,1" NA "Norman,Guayabo,TamboDeMora,Matses,BurkinaFaso,Tunapuco" NA 0 5499 335 Isolated Data deposited by amelnik 50.57413483 14.22096432 NA 24.39415133 28.23868293 NA NA NA 31.26233591 34.6367702 28.23868293 NA http://gnps.ucsd.edu/ProteoSAFe/gnpslibraryspectrum.jsp?SpectrumID=CCMSLIB00003136691 "https://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=45672509f37244b89c33d0afe8935c88&view=view_all_clusters_withID&show=true#{""main.cluster index_lowerinput"":""5499.0"",""main.cluster index_upperinput"":""5499.0""}" https://gnps.ucsd.edu/ProteoSAFe/result.jsp?view=network_displayer&componentindex=335&task=45672509f37244b89c33d0afe8935c88&show=true HCD ESI Positive 3 0.919096 0.347796 5499 108 263.2369 Data from P.Dorrestein 6.6799 5499 10 N/A CCMSLIB00003136691 3217.37545
272.1711_0.3211 272.1711 0.3211 M+H Spectral Match to Pro-Arg from NIST14 0.00012207 NA NA "USA,BurkinaFaso,Peru" "Isolated Traditional (4),Rural Industrial (2),Urban Industrial (1),Rural Traditional (3)" "4,3,2,1" NA "Norman,Guayabo,TamboDeMora,Matses,BurkinaFaso,Tunapuco" NA 0 11602 -1 Isolated Data deposited by fevargas 12.58430183 11.79805052 NA 31.12912398 15.54470568 NA NA NA 15.80852711 24.21593788 15.54470568 NA http://gnps.ucsd.edu/ProteoSAFe/gnpslibraryspectrum.jsp?SpectrumID=CCMSLIB00003137278 "https://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=45672509f37244b89c33d0afe8935c88&view=view_all_clusters_withID&show=true#{""main.cluster index_lowerinput"":""11602.0"",""main.cluster index_upperinput"":""11602.0""}" This Node is a Singleton Q-TOF ESI Positive 3 0.720953 0.448506 11602 98 272.1711 Data from Christopher A. Lowry 0.3211 11602 8 N/A CCMSLIB00003137278 2064.89576
391.188_5.247 391.188 5.247 M-2H2O+H """(6R)-2-(hydroxymethyl)-6-((3R,5R,7R,8R,9S,10S,12S,13R,14S,17R)-3,7,12-trihydroxy-10,13-dimethylhexadecahydro-1H-cyclopenta[a]phenanthren-17-yl)heptanoic acid""" 0.00158691 NA NA "USA,BurkinaFaso,Peru" "Isolated Traditional (4),Rural Industrial (2),Urban Industrial (1),Rural Traditional (3)" "4,3,2,1" NA "Norman,Guayabo,TamboDeMora,Matses,BurkinaFaso,Tunapuco" NA 0 14059 485 crude Emily Gentry 64.56232761 12.02330053 NA 6.837557082 13.45614515 NA NA NA 18.17696299 7.954304517 13.45614515 NA http://gnps.ucsd.edu/ProteoSAFe/gnpslibraryspectrum.jsp?SpectrumID=CCMSLIB00005465846 "https://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=45672509f37244b89c33d0afe8935c88&view=view_all_clusters_withID&show=true#{""main.cluster index_lowerinput"":""14059.0"",""main.cluster index_upperinput"":""14059.0""}" https://gnps.ucsd.edu/ProteoSAFe/result.jsp?view=network_displayer&componentindex=485&task=45672509f37244b89c33d0afe8935c88&show=true qTof ESI Positive 1 0.864316 3.67924 14059 105 391.188 Dorrestein 431.3176 14059 18 C[C@@H]([C@H]1CC[C@]2([H])[C@]1(C)[C@@H](O)C[C@@]3([H])[C@@]2([H])[C@H](O)C[C@]4([H])[C@]3(C)CC[C@@H](O)C4)CCCC(CO)C(O)=O CCMSLIB00005465846 1765.737543
314.2699_7.3373 314.2699 7.3373 M+H Spectral Match to N-Palmitoylglycine from NIST14 0.00189209 NA NA "USA,BurkinaFaso,Peru" "Isolated Traditional (4),Rural Industrial (2),Urban Industrial (1),Rural Traditional (3)" "4,3,2,1" NA "Norman,Guayabo,TamboDeMora,Matses,BurkinaFaso,Tunapuco" NA 0 19209 383 Isolated Data deposited by mjmeehan 2.57055801 0.74362775 NA 1.604878946 0.490755632 NA NA NA 1.106978946 1.317783866 0.490755632 NA http://gnps.ucsd.edu/ProteoSAFe/gnpslibraryspectrum.jsp?SpectrumID=CCMSLIB00003140124 "https://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=da2a1bf6bb58455690219c0abc637a44&view=view_all_clusters_withID&show=true#{""main.cluster index_lowerinput"":""19209.0"",""main.cluster index_upperinput"":""19209.0""}" https://gnps.ucsd.edu/ProteoSAFe/result.jsp?view=network_displayer&componentindex=383&task=da2a1bf6bb58455690219c0abc637a44&show=true HCD ESI Positive 3 0.859573 6.02063 19209 102 314.2699 Data from Jessica Metcalf 7.3373 19209 6 N/A CCMSLIB00003140124 130.8071214
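Here each `sampleid` appears to be the `m/z` and `RT` values joined by an underscore (e.g. `137.046_0.3805`). Assuming the metabolite feature table uses these same strings as its feature IDs, and that the file is tab-separated, a quick sanity check is that every `sampleid` is unique and matches its own `m/z` and `RT` columns. `check_metabolite_ids` is a hypothetical helper; the sample input reuses three rows from above (trimmed to four columns) plus one made-up row to show a mismatch.

```python
import csv
import io
from collections import Counter


def check_metabolite_ids(handle):
    """Return (duplicate sampleids, sampleids not equal to 'm/z_RT') (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    mismatched = []
    for row in reader:
        sid = row["sampleid"]
        counts[sid] += 1
        # Compare as strings: feature IDs are matched as text, so "1.0" vs "1.00" counts as different.
        if sid != f"{row['m/z']}_{row['RT']}":
            mismatched.append(sid)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    return duplicates, mismatched


METABOLITE_SAMPLE = (
    "sampleid\tm/z\tRT\tCompound_Name\n"
    "137.046_0.3805\t137.046\t0.3805\tHYPOXANTHINE\n"
    "245.0983_2.3864\t245.0983\t2.3864\tBIOTIN\n"
    "391.188_5.247\t391.188\t5.247\tbile acid\n"
    "999.9_1.0\t999.9\t1.00\tHYPOTHETICAL ROW\n"  # made-up row with inconsistent formatting
)

if __name__ == "__main__":
    print(check_metabolite_ids(io.StringIO(METABOLITE_SAMPLE)))
    # To run on the real file (assumed tab-separated):
    # with open("metabolite-metadata-v2.txt", newline="") as fh:
    #     print(check_metabolite_ids(fh))
```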