Merging feature tables

Sarah_McGrath · December 26, 2017, 5:36pm

Hello,

I completed two separate runs on an Illumina MiSeq for the same set of samples, and I am now trying to merge the feature tables for those runs. The first run had great quality but low sequence counts, while the second had lower quality but high sequence counts, so I am trying to get as much data as I can from both for further analysis.

I ran the DADA2 denoising step on each set of sequencing files separately, using the same parameters, and then tried to merge the resulting feature tables. I got this error:

(qiime2-2017.11) qiime2@qiime2core2017-11:~$ qiime feature-table merge --i-table1 frog1forward/table-frog1forward.qza --i-table2 frog2forward/table-frog2forward.qza --o-merged-table merged-table-frogforward
Plugin error from feature-table:

Some samples are present in both tables: H2O1, N3PH2O2, AFVN1, AMC, N3FH2O3, AMCN4, N1in3, N4WTad3, N3WTad3, N1H2O3, N3FH2O1, N1H2OFTads, N4leaf3, N1WTad1, N3in3, N3leaf2, AMDN3, N3out1, AMDN4, N3FTad2, AMVN1, H2ON3, AMDN1, N1H2O1, H2O3, N4in2, N4out2, N3in1, AFCN4, N4WTad2, N1FTad2, AFDN1, N1H2O2, N4PH2O2, AFCN2, N1leaf2, H2OcontrolN4, N4in1, AMVN4, H2Ocontrol3, N3out2, C2H2O, N1WTad2, N3out3, N4FH2O2, AMD, N1PH2O1, N3in2, N1FTad3, N3WTad1, N1WTad3, N3WTad2, N4FTad2, AFDN3, N4out3, N4PH2O3, H2Ocontrol1, N4WTad1, AFVN3, N4leaf1, N22, N1leaf1, N21, N1PH2O3, N1PH2O2, N1in1, H2O2, N1out1, N4PH2O1, N1out2, AMCN3, AFDN4, N4leaf2, H2Ocontrol2, N4FH2O1, N3PH2O3, N3FTad1, N23, N1in2, N3FTad3, N4out1, N3PH2O1, AMVN3, AFCN3, AMCN1, N4in3, N1out3, AFDN2, N1FTad1, AFCN1, AFVN4, N4FTad1, N4FTad3, N3leaf3, N1leaf3, N3FH2O2, SterileSwab, N3leaf1, N4FH2O3, AFVN2, AMV

Debug info has been saved to /tmp/qiime2-q2cli-err-eijawch_.log

Do I need to change the sample names to show which run each one came from (e.g., N4WTad1-1, N4WTad1-2)? Why would sample IDs be an issue when merging different runs of the same samples?

Any assistance is greatly appreciated!

Thanks,

Sarah McGrath

colinbrislawn · December 27, 2017, 4:47pm

Good morning Sarah,

Correct. The merge should work once the sample names are unique.

I think the plugin requires unique names so that people don't merge their samples by accident. This means you first merge the feature tables and then, as a separate and deliberate step, merge the replicate samples.

Also, you may decide not to merge your samples from the two runs. When I combine samples from two MiSeq runs, I keep them separate so that I can include a metadata variable like MiSeqRunNumber and use it to detect batch effects.
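
A minimal sketch of that idea, assuming a plain-text list of the original sample IDs (the file names sample-ids.txt and metadata-by-run.tsv are hypothetical, as are the -1/-2 suffixes borrowed from Sarah's question) and the #SampleID header used by QIIME 2 metadata files of this era. It writes one metadata row per sample per run, with a MiSeqRunNumber column:

```shell
# Hypothetical input: one original sample ID per line.
printf '%s\n' N4WTad1 N4WTad2 AMC > sample-ids.txt

# Write a tab-separated metadata file with one run-specific ID per sample per run.
# Values like "run1" (rather than a bare 1) keep the column clearly categorical.
{
  printf '#SampleID\tMiSeqRunNumber\n'
  for run in 1 2; do
    awk -v run="$run" '{ printf "%s-%s\trun%s\n", $0, run, run }' sample-ids.txt
  done
} > metadata-by-run.tsv

cat metadata-by-run.tsv
```

The IDs in this file only help if they match the IDs actually present in each run's feature table, so the same suffix scheme has to be applied to the data as well.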

Happy New Year, Sarah,
Colin

thermokarst · December 29, 2017, 1:26pm

Thanks @colinbrislawn - this is great advice!

@Sarah_McGrath --- you have two options that are jumping out at me right now:

Option 1: you can treat these sample replicates as unique samples (which is what you're suggesting with the modified sample ids N4WTad1-1, I think).

Option 2: keep the sample ids as-is (some duplicates in each table), and then when you merge your feature tables, you can specify an alternative overlap-method - in this case, sum. As @colinbrislawn suggested above, the default overlap-method is error_on_overlapping_sample - which requires all sample identifiers to be unique for all tables being merged. If you change that method to sum, it will allow duplicate sample IDs, and will sum the observations when overlaps are found between tables.
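
For Option 2, the merge command from the original post would change roughly as follows. This is only a sketch: it assumes the installed release exposes the overlap method on the command line as --p-overlap-method (worth confirming with qiime feature-table merge --help), and the output file name is made up for illustration.

```shell
# Input paths are the ones used earlier in this thread.
table1=frog1forward/table-frog1forward.qza
table2=frog2forward/table-frog2forward.qza
merged=merged-table-frogforward-summed.qza   # hypothetical output name

# The guard just avoids a confusing failure when the QIIME 2 environment is not active.
if command -v qiime >/dev/null 2>&1; then
  qiime feature-table merge \
    --i-table1 "$table1" \
    --i-table2 "$table2" \
    --p-overlap-method sum \
    --o-merged-table "$merged"
else
  echo "qiime not found on PATH; activate your QIIME 2 environment first" >&2
fi
```

With sum, counts for a sample ID that appears in both tables are added together, so the run of origin can no longer be recovered from the merged table.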

I am not sure which approach makes the most sense for your study, but @Nicholas_Bokulich might have something else to add to @colinbrislawn's metadata suggestion. Thanks!

Nicholas_Bokulich · December 29, 2017, 2:14pm

Hi @Sarah_McGrath,

To elaborate on @colinbrislawn's suggestion, a simple approach to detect batch effects right now would be:

1. Filter your merged feature table to contain only replicated samples.
2. Generate PCoA plots to visually compare sample ordination by sample ID and by batch.
3. Use beta-group-significance to test for significant differences between batches.
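
For the first step, one way to work out which samples are replicated is to intersect the sample ID lists of the two runs. A rough sketch, assuming one original (un-suffixed) ID per line in two hypothetical files, run1-ids.txt and run2-ids.txt:

```shell
# Hypothetical ID lists for each run, one sample ID per line.
printf '%s\n' N4WTad1 AMC SterileSwab > run1-ids.txt
printf '%s\n' AMC N4WTad1 H2O1 > run2-ids.txt

# comm needs sorted input; -12 keeps only the lines common to both files.
sort run1-ids.txt > run1-sorted.txt
sort run2-ids.txt > run2-sorted.txt
comm -12 run1-sorted.txt run2-sorted.txt > replicated-ids.txt

cat replicated-ids.txt
```

The resulting list can then be used to decide which samples to keep when filtering the merged table.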

If any of your samples are artificial/mock community samples that have a known composition, you can also use the actions in the quality control plugin to assess how well these known samples are profiled in each batch.

I hope that helps!

Sarah_McGrath · January 2, 2018, 1:49pm

Hello,

Thank you all for your great responses.

I have decided to change all of the sample IDs to be unique for each run, and I have gone through all of the steps up to qiime feature-table merge-seqs (qiime feature-table merge worked perfectly for me this time!).

Now I am getting an error when attempting to run merge-seqs:

(qiime2-2017.11) qiime2@qiime2core2017-11:~$ qiime feature-table merge-seqs --i-data1 frog1forward/rep-seqs-frog1forward.qza --i-data2 frog2forward/rep-seqs-frog2forward.qza --o-merged-data merged-rep-seqs-frogforward.qza
Error: QIIME 2 plugin 'feature-table' has no action 'merge-seqs'.

I couldn't find anything on the forum about how to handle this. Any suggestions?

My plan is to follow the steps of @Nicholas_Bokulich to check for batch effects and if there aren't any, then sum my features and proceed with analysis accordingly. Does that sound appropriate?

Thanks!

thermokarst · January 2, 2018, 11:00pm

Hi @Sarah_McGrath - it looks like you are using QIIME 2 2017.11, but referencing a method from QIIME 2 2017.12. We recently renamed this method from merge-seq-data to merge-seqs, which would explain why you are seeing that error. As my momma taught me when I was but a wee child, "always remember to use the version of the docs that matches the version of the software!" (well, maybe I am embellishing that memory a bit). So, you should be able to just tweak your command to use merge-seq-data instead, and then you are good to go! Good luck and keep us posted!
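
In other words, on 2017.11 the command from the previous post would look roughly like this. It is a sketch that keeps the same inputs and output and changes only the action name, on the assumption that the options are otherwise unchanged between the two releases; on 2017.12 the action name would be merge-seqs.

```shell
# Same paths as in the failing command above.
seqs1=frog1forward/rep-seqs-frog1forward.qza
seqs2=frog2forward/rep-seqs-frog2forward.qza
merged_seqs=merged-rep-seqs-frogforward.qza

# The guard just avoids a confusing failure when the QIIME 2 environment is not active.
if command -v qiime >/dev/null 2>&1; then
  qiime feature-table merge-seq-data \
    --i-data1 "$seqs1" \
    --i-data2 "$seqs2" \
    --o-merged-data "$merged_seqs"
else
  echo "qiime not found on PATH; activate your QIIME 2 environment first" >&2
fi
```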

Sarah_McGrath · January 3, 2018, 1:07pm

Hi @thermokarst!

Haha, yes, your momma was a wise woman! Sorry about that, I had just done everything else in the 2017.11 version and wanted to keep up the momentum. I'll update to 2017.12. Thanks for the code info, merge-seq-data worked perfectly! I do, however, seem to have batch effects and cannot sum the data from both of my runs. But at least now I know how to check for those effects and can plan the rest of my analysis accordingly!

Thanks for the assistance everyone!

system · February 3, 2018, 7:07pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.