Combining Sequencing Runs for Open Reference OTU Clustering

Sara_Jeanne08 · February 21, 2018, 1:11pm

Hi All,

I have two MiSeq runs that have to be pre-processed and demultiplexed in QIIME 1 prior to importing into QIIME 2. These runs have overlapping barcodes. I want to run these two datasets together through clustering with the Open Reference VSearch tool.

What is the best way to combine these runs and still retain the sample IDs? I see two options:

1) Should I cat the seqs.fna output files from QIIME 1 and then import the merged file into QIIME 2?

2) I run dereplication and chimera filtering in QIIME 2 prior to VSearch, so is there a way to merge the filtered .qza sequence and table files prior to clustering in QIIME 2? This second option lets me filter out PCR errors and sequencing artifacts from each run individually, which seems to me the more accurate way to process the two runs before merging. However, I am unsure how to merge the two sets of .qza files after that point but before the OTU picking step.

Thank you very much,

Sara

colinbrislawn · February 21, 2018, 5:14pm

Hello Sara,

There are different ways you can merge your data sets after demultiplexing, and they are all listed over here: feature-table — QIIME 2 2018.2.0 documentation

Option 2) you mentioned sounds like feature-table merge-seqs, and I think that's a good fit for vsearch OTU clustering.

Let me know if that looks like a good fit for your data,
Colin

gregcaporaso · February 21, 2018, 8:37pm

Hi @Sara_Jeanne08,
Either of those options should work fine for you. You could cat the sequence files together and then import the resulting file, or you could import the two sequence files separately, run qiime vsearch dereplicate-sequences on each, and then merge the resulting feature tables with qiime feature-table merge and the resulting sequence files with qiime feature-table merge-seqs. I don't think there will be any practical difference between these two processes, so I would just recommend going with whichever is easier for you. (The exception is if some of the same samples show up in both sequence files, in which case you should import them separately into QIIME 2, dereplicate each, and then merge.) After this, you'll be ready to proceed to open-reference clustering.
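
The second route described above could be sketched roughly as follows. This is a dry-run sketch under stated assumptions, not a tested pipeline: the `run` helper and every file name (`run1-seqs.qza`, `merged-table.qza`, and so on) are placeholders invented for illustration, and the flags are assumed to match the 2018.2 plugins, so check each command's `--help` output for your release. By default the script only prints the commands; setting `RUN=1` would execute them in an activated QIIME 2 environment.

```shell
# Dry-run helper: print each command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Dereplicate one run; "$1" is a placeholder prefix for that run's artifacts.
dereplicate_run() {
  run qiime vsearch dereplicate-sequences \
    --i-sequences "$1-seqs.qza" \
    --o-dereplicated-table "$1-table.qza" \
    --o-dereplicated-sequences "$1-rep-seqs.qza"
}

# Merge the per-run tables and sequences into single artifacts.
merge_runs() {
  run qiime feature-table merge \
    --i-tables "$1-table.qza" \
    --i-tables "$2-table.qza" \
    --o-merged-table merged-table.qza
  run qiime feature-table merge-seqs \
    --i-data "$1-rep-seqs.qza" \
    --i-data "$2-rep-seqs.qza" \
    --o-merged-data merged-rep-seqs.qza
}

dereplicate_run run1
dereplicate_run run2
merge_runs run1 run2
```

The merged table and merged sequences are then the inputs to the open-reference clustering step.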

Hope this helps!

Sara_Jeanne08 · February 21, 2018, 9:28pm

@gregcaporaso and @colinbrislawn,

Thank you for your help. I appreciate you helping me figure out which option is best for accuracy. It is good to know that there is no practical difference between the two. I have already dereplicated and removed chimeras from both of these datasets individually, so it seems like merging the feature tables and seqs would be my best option for moving forward with the combined analysis.

I do have a question about overlapping sample IDs: how does the --p-overlap-method parameter work?

Sara

gregcaporaso · February 21, 2018, 10:10pm

My guess, based on your description, is that you want to use the default setting, which is to error on overlapping sample IDs. This setting assumes that you are combining tables that may contain some, all, or none of the same features, but that do not contain any of the same samples. This makes the merge straightforward, as the individual counts are never modified. However, if some samples show up in more than one table, you'll get an error. If you do have samples and features that show up in more than one table, you can use the sum option, which will sum the counts for sample/feature pairs that show up in more than one table.

You might be wondering why you wouldn't use sum all the time. Using error_on_overlapping_samples is now faster than sum (as of QIIME 2 2018.2), but it's also a good option if you're not expecting samples to show up in more than one table, as it will error if the tables aren't meeting that expectation.
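
To make the difference between the two settings concrete, here is a toy model in shell and awk. It is not how QIIME 2 implements the merge: the `merge_counts` function, the short mode names `error` and `sum` as used here, and the long-format input (one `sample feature count` triple per line) are all assumptions made for illustration.

```shell
# Toy model of the overlap methods; NOT QIIME 2's implementation.
# Usage: merge_counts <error|sum> <table-file>...
# Each table file holds lines of "sample feature count" (assumed format).
merge_counts() {
  method=$1
  shift
  merged=$(awk -v method="$method" '
    FNR == 1 { file_no++ }              # a new input file starts
    {
      # In "error" mode, refuse a sample already seen in an earlier file.
      if (method == "error" && ($1 in first_file) && first_file[$1] != file_no) {
        print "overlapping sample id: " $1 > "/dev/stderr"
        failed = 1
        exit 1
      }
      if (!($1 in first_file)) first_file[$1] = file_no
      counts[$1 " " $2] += $3           # sum counts per sample/feature pair
    }
    END {
      if (failed) exit 1
      for (key in counts) print key, counts[key]
    }
  ' "$@") || return 1
  printf '%s\n' "$merged" | sort
}
```

With one file containing `S1 F1 3` and another containing `S1 F1 4`, `merge_counts sum` prints `S1 F1 7`, while `merge_counts error` stops with a non-zero exit status because sample `S1` appears in both inputs.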

Sara_Jeanne08 · February 22, 2018, 5:41am

Hi Greg,

I do have a few samples with the same IDs, so I tried to use the sum option for the --p-overlap-method parameter. Unfortunately, I am getting errors when trying to merge my tables and sequences. Below are the command I ran and its output. I have tried separating the two file paths with a space and with a comma; neither worked:

(qiime2-2017.12) bash-3.2$ qiime feature-table merge --i-tables /Users/Sara_Jeanne/Desktop/QIIME_122017/20180202_table-nonchimeric.qza /Users/Sara_Jeanne/Desktop/QIIME2/20180217_Spider_PHII_Analysis/20180217_uchime-vsearch_phII_solo_chimera_detect/20180217_table-nonchimeric.qza --p-overlap-method sum --o-merged-table ./20180221_Merged_Spider_Table_dereplicated_nochimeras --verbose
Usage: qiime feature-table merge [OPTIONS]

Error: Got unexpected extra argument (/Users/Sara_Jeanne/Desktop/QIIME2/20180217_Spider_PHII_Analysis/20180217_uchime-vsearch_phII_solo_chimera_detect/20180217_table-nonchimeric.qza)

Thank you for your time and help with this,

Sara

thermokarst · February 22, 2018, 10:55am

Hi @Sara_Jeanne08, you will want to include the --i-tables flag before every table you wish to merge:

(qiime2-2017.12) bash-3.2$ qiime feature-table merge \
  --i-tables /Users/Sara_Jeanne/Desktop/QIIME_122017/20180202_table-nonchimeric.qza \
  --i-tables /Users/Sara_Jeanne/Desktop/QIIME2/20180217_Spider_PHII_Analysis/20180217_uchime-vsearch_phII_solo_chimera_detect/20180217_table-nonchimeric.qza \
  --p-overlap-method sum \
  --o-merged-table ./20180221_Merged_Spider_Table_dereplicated_nochimeras \
  --verbose
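
When there are more than two tables, repeating the flag by hand gets tedious. The following is a small sketch that builds the command from a list of paths. The function name `merge_tables_cmd` and the output name `merged-table.qza` are hypothetical, and the function only prints the command rather than running it (paths containing spaces would need quoting before the printed line could be executed).

```shell
# Print a merge command with one --i-tables flag per input table.
# Hypothetical helper for illustration; it does not call qiime itself.
merge_tables_cmd() {
  cmd="qiime feature-table merge"
  for table in "$@"; do
    cmd="$cmd --i-tables $table"
  done
  echo "$cmd --p-overlap-method sum --o-merged-table merged-table.qza"
}

merge_tables_cmd run1-table.qza run2-table.qza run3-table.qza
```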

Good luck!
