Headers removed when importing phyloseq object for decontam

Matilda_H-D · January 30, 2019, 7:49am

Hello,

This is probably an incredibly simple question but I'm incredibly inexperienced with R and phyloseq! I really want to try out decontam on my data though so decided to have a go.

I used qza_to_phyloseq in qiime2R to import my deblurred feature table, sample metadata (a tab-separated .txt), insertion tree and Greengenes taxonomy file as a phyloseq object in RStudio.

phy <- qza_to_phyloseq(
  "~/Documents/Bam/LongitudinalAnalysis/Decontam/merged-table.qza",
  "~/Documents/Bam/LongitudinalAnalysis/insertion-tree.qza",
  "~/Documents/Bam/LongitudinalAnalysis/Decontam/gg_taxonomy.qza",
  "~/Documents/Bam/LongitudinalAnalysis/Decontam/MergedBam_Mapping_MHD.txt"
)

Description of 'phy':
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1987 taxa and 779 samples ]
sample_data() Sample Data:       [ 779 samples by 45 sample variables ]
tax_table()   Taxonomy Table:    [ 1987 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 1987 tips and 1985 internal nodes ]

When I ran head(sample_data(phy)), none of the metadata category headers appeared to be there (i.e. the first line was sample info, not the headers). This was also borne out when I went through subsequent steps in the decontam vignette and tried e.g. plotting the data by library size (the colouring didn't work as I expected because the category names couldn't be found).

Did I do something wrong when importing and how can I make sure the category headers are kept?

Thank you!!

Matilda_H-D · January 31, 2019, 7:53am

Just to follow up, I tried importing each of the components separately (i.e. the feature table, tree, taxonomy and metadata) using qiime2R's read_qza and combining them into a phyloseq object as follows (based on the qiime2R tutorial here):

phy2 <- phyloseq(
  otu_table(Bam_asvs$data, taxa_are_rows = TRUE),
  phy_tree(Bam_tree$data),
  # moving the taxonomy to the way phyloseq wants it
  tax_table(as.data.frame(gg_taxtable) %>% select(-Confidence) %>% column_to_rownames("Feature.ID") %>% as.matrix()),
  sample_data(metadata %>% as.data.frame() %>% column_to_rownames("#SampleID"))
)

and the metadata headers seem to have been retained. Still not sure what went wrong with the initial command, though.

jbisanz · January 31, 2019, 5:46pm

Can you share the output of head(metadata, 4) from R, and the top of the file as printed by head -n 4 MergedBam_Mapping_MHD.txt on your terminal command line?

Matilda_H-D · February 1, 2019, 2:04am

Hi Jordan,

Here is the R output:

> head(metadata, 4)
# A tibble: 4 x 46
  `#SampleID` BarcodeSequence LinkerPrimerSeq… ReversePrimerSe…
  <chr>       <chr>           <chr>            <chr>           
1 Bam15.002   ATGCTGCAACAC    AATGATACGGCGACC… NotApplicable   
2 Bam15.003   TCGGCGATCATC    AATGATACGGCGACC… NotApplicable   
3 Bam15.004   ATTGAGTGAGTC    AATGATACGGCGACC… NotApplicable   
4 Bam15.005   GTTCACGCCCAA    AATGATACGGCGACC… NotApplicable   
# … with 42 more variables: ACAD_Number <chr>,
#   ExtractionMethod <chr>, PCRCycles <dbl>,
#   SequencingDate <chr>, LabWork <chr>, ExtractionGroup <chr>,
#   ExtractionDate <chr>, SampleTypeYear <chr>, Control <chr>,
#   Period <chr>, Culture <chr>, SpecificLocation <chr>,
#   Country <chr>, Continent <chr>, StudyID <chr>,
#   SamplingYear <chr>, StudyVisit <dbl>, StudyYear <chr>,
#   YearStudyYear <chr>, ReturnVisit <chr>,
#   ExperimentalGroup <chr>, ExperimentalStudyYear <chr>,
#   Compliance <chr>, YearCompliance <chr>,
#   ExperimentalCompliance <chr>,
#   ExperimentalYearCompliance <chr>, YBP <dbl>, Project <chr>,
#   YearofTreatment <dbl>, Age <chr>, pH <chr>,
#   ExaminationDate <chr>, Examiner <chr>, NumberofTeeth <chr>,
#   Saliva_Consistency <chr>, StimulatedSalivaFlow <chr>,
#   ToothStaining <chr>, Gender <chr>, CariesStatus <chr>,
#   NoSurfacesCariesTotal <chr>, CariesSeverity <chr>,
#   Description <chr>

The terminal command looks to be printing the entire mapping file so I didn't want to paste it!

Todd_Testerman · February 10, 2019, 6:26pm

Hi Matilda,

I don't have a solution to your problem but just wanted to mention I've been experiencing a similar problem. My thought is that something has changed in such a way that the import process is treating the "#" sign in the header line like a comment in a script (ignoring it). I've tried adding a "#q2:types" line below the initial column header line and find that this line is also lost following import.
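The comment-character behaviour described above can be reproduced with base R alone. The sketch below is only an illustration under assumptions: the thread does not confirm that this is what qiime2R did internally, and the file contents are made up. It relies on `read.table()` treating `#` as a comment character by default.

```r
# Write a tiny QIIME 1-style mapping file to a temporary location.
# The sample IDs and columns are made up for illustration.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "#SampleID\tBarcodeSequence\tControl",
  "S1\tATGCTGCAACAC\tNo",
  "S2\tTCGGCGATCATC\tYes"
), path)

# Default comment.char = "#": the header line is skipped as a comment,
# so the first sample row is promoted to column names.
lost <- read.table(path, header = TRUE, sep = "\t")

# comment.char = "" switches comment handling off and keeps the header;
# check.names = FALSE preserves the literal "#SampleID" column name.
kept <- read.table(path, header = TRUE, sep = "\t",
                   comment.char = "", check.names = FALSE)

print(names(lost))
print(names(kept))
```

In `lost`, the first sample has become the header and only one data row remains, which matches the symptom reported at the top of the thread.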

You might try changing your metadata file to have a different initial header that does not have a "#" sign in it ("sampleid", for instance). Here's a link to other acceptable headers that don't require a pound sign - Metadata Options

Maybe that will help fix your issue!

Todd

thermokarst · February 11, 2019, 4:16pm

Great suggestion, @Todd_Testerman! FWIW, the #SampleID header is for backwards compatibility with QIIME 1 --- maybe some day we will drop that syntax altogether, it is a bit confusing!

bmillerlab · March 12, 2019, 9:56pm

Hello, would it be possible to note this header problem somewhere it is easier to find? I had the same problem and fixed it by changing the header to sampleid instead of #SampleID. It took me forever to find this solution though, because I really thought I'd screwed up. I had already run my data using deblur and put that through decontam just fine. So when I went back to qiime2 and used dada2 for the first time and couldn't get my data into decontam, I thought I had screwed up the dada2 run!

Thanks for qiime2 and decontam. Both of them are great!!
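For anyone who would rather not edit the mapping file by hand, the header change described above might be scripted roughly like this. It is a minimal base R sketch: `rename_sample_header()` is a hypothetical helper (not part of qiime2R or phyloseq), and the example file is made up. It writes a new file rather than overwriting the original.

```r
# Hypothetical helper: copy a metadata file, renaming a leading "#SampleID"
# header to "sampleid" and leaving every other line untouched.
rename_sample_header <- function(infile, outfile) {
  lines <- readLines(infile)
  lines[1] <- sub("^#SampleID", "sampleid", lines[1])
  writeLines(lines, outfile)
  invisible(outfile)
}

# Made-up example file standing in for a real mapping file.
infile <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c("#SampleID\tControl", "S1\tNo", "S2\tYes"), infile)

rename_sample_header(infile, outfile)

# With no "#" in the header, the default read.table() settings keep it.
fixed <- read.table(outfile, header = TRUE, sep = "\t")
print(names(fixed))
```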

jbisanz · March 13, 2019, 5:13pm

Hi Matilda,

Sorry I missed your response. I think I have pushed an update that should fix this problem. It handles the #q2:types line better in the qza_to_phyloseq() function and separately makes the read_q2metadata() function available.
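To make concrete what "handling the q2:types line" involves, here is a rough base R sketch. It is not qiime2R's implementation: `read_metadata_sketch()` is a hypothetical helper and the example file is made up. With qiime2R installed, read_q2metadata() is the function to reach for.

```r
# Hypothetical helper, not qiime2R's code: read a QIIME 2 metadata TSV,
# keeping the "#SampleID" header and dropping an optional "#q2:types" line.
read_metadata_sketch <- function(path) {
  lines <- readLines(path)
  header <- strsplit(lines[1], "\t", fixed = TRUE)[[1]]
  body <- lines[-1]
  # Drop the types directive if it is present.
  body <- body[!startsWith(body, "#q2:types")]
  # comment.char = "" stops "#" from being treated as a comment.
  read.table(text = body, sep = "\t", header = FALSE,
             col.names = header, check.names = FALSE,
             comment.char = "", quote = "", stringsAsFactors = FALSE)
}

# Made-up example file with a types line under the header.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "#SampleID\tControl\tPCRCycles",
  "#q2:types\tcategorical\tnumeric",
  "S1\tNo\t30",
  "S2\tYes\t35"
), path)

meta <- read_metadata_sketch(path)
print(names(meta))
print(nrow(meta))
```

The sketch ignores the declared types and lets `read.table()` guess them. A fuller version would use the `#q2:types` values to coerce each column.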

Jordan

jmalazar · September 21, 2021, 7:18pm

Hi!

This might be a super late reply, but I just want to add another trick. After the header, I added another row that gives the type of each variable. Something like this:

ROW1: #SampleID BarcodeSequence LinkerPrimerSequence Adapter Sampling_site
ROW2: #q2:types categorical categorical categorical categorical

Cheers!