Merging data (and mapping files) from two plates with different samples

Hi, thank you so much for providing such a wonderful resource. I am so grateful to all of you at Qiime2!

I have 3 plates with samples from two different tissues because all the samples from each tissue couldn’t fit on one plate:
Plate 1: samples from tissue a
Plate 2: samples from tissue a, samples from tissue b
Plate 3: samples from tissue b

Each plate also has its own mapping file, with all of the same metadata, including information about plate and batch etc.

I want to analyze all of the tissue a data together and all of the tissue b data together. I processed each plate separately up to the point of creating a DADA2 feature table, then filtered the plate 2 table to keep only tissue a (for example), and then used merge (sum) to combine the plate 2 tissue a data with the plate 1 tissue a data.

This worked well. The only issue is that when I continue and try to visualize that table, I can’t, because I’m unable to merge the associated mapping files and thus don’t have a mapping file to provide.

When I tried to merge the mapping files, for example by running feature-table filter-samples and providing the mapping files for both plate 1 and plate 2, it failed because there are no overlapping IDs: “ValueError: Cannot merge because there are no IDs shared across metadata objects.” It seems that this feature is intended to add additional metadata for the same samples rather than to combine the same metadata for two distinct sets of samples.

What I’m wondering is:

  1. Is this a good approach in general? If not, what is a better approach? I read in previous questions that the two choices are either to process everything together and then use the group feature, or to process samples separately and then use merge. However, I’m confused, because it seems to me that even if I were to start again, process all of the samples together, and then group them by plate, I would still need to combine the mapping files in some way.
  2. Is there a way to combine mapping files like this (for distinct ids) in qiime2? Maybe I am missing something obvious? What do you recommend for this?

Thank you and have a beautiful day!

Hi @ariel! It sounds like every sample (regardless of which plate it is on) has a distinct ID. In that case, QIIME 2 will treat each sample independently, which is why the metadata files can’t be merged. Merging metadata files only retains sample IDs that are shared across all files.
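To see whether two mapping files share any sample IDs at all, and so whether a metadata merge could retain anything, a quick check along these lines may help. This is only a sketch: the file names and example rows are made up, and it assumes tab-separated files with a single header line and the sample ID in the first column.

```shell
#!/usr/bin/env bash
# Sketch: list sample IDs shared by two metadata files (hypothetical file names).
set -euo pipefail

workdir=$(mktemp -d)

# Two tiny example mapping files with distinct sample IDs (made-up data).
printf 'sample-id\ttissue\tplate\nA1\ta\t1\nA2\ta\t1\n' > "$workdir/plate1.tsv"
printf 'sample-id\ttissue\tplate\nB1\ta\t2\nB2\tb\t2\n' > "$workdir/plate2.tsv"

# Print the sorted, de-duplicated ID column of a file, skipping the header line.
ids() { tail -n +2 "$1" | cut -f 1 | sort -u; }

# comm -12 keeps only lines present in both sorted inputs, i.e. the shared IDs.
shared=$(comm -12 <(ids "$workdir/plate1.tsv") <(ids "$workdir/plate2.tsv"))

if [ -z "$shared" ]; then
  echo "no shared sample IDs"
else
  printf 'shared sample IDs:\n%s\n' "$shared"
fi
```

If the list comes back empty, as it does for these two example files, the files describe distinct sets of samples and need to be concatenated row-wise rather than merged.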

You have a couple of options. Option 1 should be easier, given where you’re at in your analyses.

  1. Treat each sample independently (this is your current setup), where each sample has its own unique ID. If you wish to merge samples, use qiime feature-table merge to merge all feature tables into a single table. Next, use qiime feature-table group with a single metadata file containing all sample IDs contained in the merged feature table. You’ll need to use a metadata column in that file to define which samples should be grouped together. After running qiime feature-table group using that “grouping” metadata column, you’ll need to manually create a new metadata file that has new sample IDs corresponding to the group names in the metadata column you grouped by. The new “grouped” metadata file and feature table can then be used for downstream analyses.

  2. Restart your analyses using sample IDs that share the same name across sequencing runs. Process each run independently with DADA2, and then use qiime feature-table merge to merge samples sharing the same name. You can use a single metadata file containing your sample IDs, along with the merged feature table, for downstream analyses.

Hi, thank you so much for getting back to me, and so quickly! Yes, this is exactly what I need to do. However, the reason I'm asking is that I'm not sure what the best way to create such a metadata file is. For example, I could copy and paste the files together in Excel, but that seems like somewhat of a bad idea. It would be amazing if QIIME 2 could provide such a tool. In the meantime, do you have any suggestions for the best way to go about this?

  1. “Restart your analyses using sample IDs that share the same name across sequencing runs.”

There are no samples that were run twice, if that provides further clarity. I just happened to run extra samples from the same project on a second plate.

Unfortunately this is a manual process – you can accomplish this in a spreadsheet program such as Excel or Google Sheets, or perhaps write a script to do it. We don’t have an automatic method for grouping metadata because it’s hard to determine what the new column values should be (e.g. if samples being grouped have conflicting values for a particular metadata column, it’s unclear how the software should handle that situation). We decided to not implement a metadata grouping tool to avoid making too many assumptions about the investigator’s data. Sorry that this is a bit of a pain point!
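If you do go the scripting route for a grouped table, one possible starting point is to generate only the ID column of the new metadata file from the column you grouped by, and then fill in the remaining columns by hand. The sketch below assumes a tab-separated file with one header line; the file names, the example rows, and the `tissue` grouping column are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: build the ID column of a "grouped" metadata file from a grouping column.
set -euo pipefail

workdir=$(mktemp -d)

# Made-up per-sample metadata.
printf 'sample-id\ttissue\tplate\nA1\ta\t1\nB1\ta\t2\nB2\tb\t2\n' > "$workdir/metadata.tsv"

group_col='tissue'                        # hypothetical grouping column
grouped="$workdir/grouped-metadata.tsv"

awk -F'\t' -v col="$group_col" '
  NR == 1 {
    # locate the grouping column by name in the header line
    for (i = 1; i <= NF; i++) if ($i == col) c = i
    if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
    print "sample-id"
    next
  }
  # emit each non-empty group name once; these become the new sample IDs
  $c != "" && !seen[$c]++ { print $c }
' "$workdir/metadata.tsv" > "$grouped"

cat "$grouped"
```

Other columns are deliberately left out, since deciding what value a group should get when its samples disagree is exactly the judgment call described above.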

Jai,

Thank you, I understand. In any event, I was able to combine the mapping files in R and validate them using Keemei. Now I’m wondering if there is a way to combine the representative sequences, because I’d like to make a phylogenetic tree and look at alpha and beta diversity for this entire group of samples, and I realized that this is one step back in the workflow.

Thank you,
Ariel

Yes! You can use qiime feature-table merge-seqs.

I hope that helps!

Oh!!! Yes! Of course! Thank you!

Hello,

Just briefly (and, as always, you must edit the examples according to your needs): you can merge your data with an approach equivalent to this:

qiime feature-table merge \
  --i-tables "${tab[1]}" \
  --i-tables "${tab[2]}" \
  --i-tables "${tab[3]}" \
  --o-merged-table "$trpth"/"${otpth_tab[1]}"
qiime feature-table merge-seqs \
  --i-data "${seq[1]}" \
  --i-data "${seq[2]}" \
  --i-data "${seq[3]}" \
  --o-merged-data "$trpth"/"${otpth_seq[1]}"

The metadata mapping files can be merged without trouble using something like this:

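# input metadata files (placeholder names, edit to your needs)
tab[1]='foo1'
tab[2]='foo2'
tab[3]='foo3'
tab[4]='foo4'

# output directory and file name; trpth is assumed to be set elsewhere, '.' is only a fallback
trpth="${trpth:-.}"
otpth_tab='merged_foo.tsv'

# run script
# ----------
# write the header line once, taken from the first file
head -n 1 "${tab[1]}" > "$trpth"/"$otpth_tab"
# append the data rows (everything after the header) of every file
for i in "${!tab[@]}"; do
   tail -n +2 "${tab[$i]}" >> "$trpth"/"$otpth_tab"
done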

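Your metadata tables need to have the same columns in the same order, though, and each file should end with a line break after its last line. Otherwise the last row of one file and the first row of the next end up on the same line.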
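If you would rather have those two requirements checked than assumed, a variation like the following might be useful. It is a sketch under stated assumptions, not a tested tool: the file names and rows are invented, the files are assumed to be tab-separated with one header line and the sample ID in the first column, and it does not handle a `#q2:types` directive row.

```shell
#!/usr/bin/env bash
# Sketch: concatenate metadata TSVs after checking headers and sample IDs.
set -euo pipefail

workdir=$(mktemp -d)

# Made-up example files; plate2.tsv deliberately lacks a trailing line break.
printf 'sample-id\ttissue\tplate\nA1\ta\t1\nA2\ta\t1\n' > "$workdir/plate1.tsv"
printf 'sample-id\ttissue\tplate\nB1\ta\t2\nB2\tb\t2'   > "$workdir/plate2.tsv"

merged="$workdir/merged.tsv"

# awk works record by record, so a missing final line break does not glue rows together.
awk -F'\t' '
  FNR == 1 {                       # header line of each input file
    if (NR == 1) { header = $0; print }
    else if ($0 != header) { print "header mismatch in " FILENAME > "/dev/stderr"; exit 1 }
    next
  }
  $0 == "" { next }                # skip blank lines
  seen[$1]++ { print "duplicate sample ID: " $1 > "/dev/stderr"; exit 1 }
  { print }
' "$workdir/plate1.tsv" "$workdir/plate2.tsv" > "$merged"

cat "$merged"
```

The script stops with an error instead of writing a silently wrong file when a header differs or a sample ID appears twice, which seems the safer default for a mapping file.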
