Separating data imported together

Hi fellow QIIME 2 users,

I am currently working with my data in QIIME 2 and have completed the denoising step. I imported data from two different projects together (the samples were sent for sequencing together). The problem I am about to run into, after assigning taxonomy, is that the two projects have different experimental designs and treatments, so I want to analyze diversity metrics separately. How can I separate my projects’ data into two files that will allow me to do this? Thank you in advance!

Hi @Rachel05,

Welcome to the QIIME 2 forum!

If you have a metadata sheet with the project ID, you can filter your table into two project-specific tables.

If this happens again and you already have demultiplexed reads, you can always just import the project-specific samples using a manifest. (I do this a lot since my center tends to run multiple projects.) That way, I can do things like keep my blanks with each project. There may be a way to do this with EMP demultiplexing by modifying the sample metadata to include only the samples you're interested in.

It's double the commands to run, but if you've got access to a server or cluster, you can run the projects side by side and potentially decrease run time.
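For the manifest route, here is a minimal sketch of how a pooled manifest could be cut down to one project. It assumes a tab-separated manifest whose first column is sample-id, and a tab-separated mapping file whose first column is the sample ID and whose header contains a column named project. The function name and all file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a per-project manifest by keeping only the manifest rows
# whose sample ID belongs to the requested project in the mapping file.
# Assumptions: both files are tab-separated, sample IDs are in column 1,
# and the mapping file header has a column literally named "project".

filter_manifest() {
  local map="$1" manifest="$2" project="$3"
  awk -F'\t' -v proj="$project" '
    # First file (map): find the "project" column, then remember matching sample IDs.
    NR == FNR {
      if (FNR == 1) { for (i = 1; i <= NF; i++) if ($i == "project") col = i; next }
      if (col && $col == proj) keep[$1] = 1
      next
    }
    # Second file (manifest): always keep the header, then keep matching rows.
    FNR == 1 || ($1 in keep)
  ' "$map" "$manifest"
}

# Usage (hypothetical file names):
#   filter_manifest map.tsv manifest.tsv a > manifest-a.tsv
#   filter_manifest map.tsv manifest.tsv b > manifest-b.tsv
```

Each resulting manifest could then be imported on its own, so blanks and controls listed under a project in the mapping file travel with that project.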

Best,
Justine

Thanks Justine! I used the feature table generated from denoising to filter my table into two project-specific tables based on ID. However, it seems like I need to split the sequences by sample ID to assign taxonomy. The Deblur sequences file generated from denoising is needed for assigning taxonomy, and this file contains all of my sample sequences from both projects. Can I separate sequences based on sample ID so I can assign taxonomy separately, or should I assign taxonomy and then split the project data? What am I missing?

Hi @Rachel05,

The classifier should behave the same way on the same sequence whether it shows up in Project A or Project B. So, my recommendation would be to do taxonomic classification on the pooled rep-set and then split the tables. (I would probably also just build a tree off all the sequences and then use it twice.)

In QIIME 2, it wouldn’t matter if you have a taxonomic assignment or tree tip for a feature that isn’t in your table (I do this a lot). However, you may need to filter before you export if you plan to use qiime2R or something similar.

Best,
Justine

I think I’m confused as I don’t understand how to use the two separate tables to continue my analyses. Could you perhaps provide example code? Thank you for your patience.

Rachel

Hi @Rachel05,

Sure. So, I’m going to assume that we’re starting with a feature table from both projects (table.qza), a rep-set from both projects (rep-seqs.qza), and a mapping file (map.tsv). I’m also going to assume that in the map there’s a column I’ll call project with two values, “a” and “b”.

Okay, so the first step I’d do is taxonomic classification on all the sequences. I happen to like naive Bayesian classification, so that’s what I’ll show here with my classifier, classifier.qza (but sub in your favorite taxonomic classification approach).

qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

Now, I’ll also build my tree using all my sequences because I think this is a fundamental step in analysis too! My favorite is fragment insertion because I like reference-based techniques, so I’ll show that. Again, pick your favorite tree-building method.

qiime fragment-insertion sepp \
  --i-representative-sequences rep-seqs.qza \
  --i-reference-database sepp-refs-gg-13-8.qza \
  --o-tree ./tree.qza \
  --o-placements ./tree_placements.qza \
  --p-threads 1

So, now I have taxonomy in taxonomy.qza and a tree in tree.qza. I’ve only had to do the classification and building once. Now, I separate my samples:

qiime feature-table filter-samples \
  --i-table table.qza \
  --m-metadata map.tsv \
  --p-where "project='a'" \
  --o-filtered-table project-a.qza

qiime feature-table filter-samples \
  --i-table table.qza \
  --m-metadata map.tsv \
  --p-where "project='b'" \
  --o-filtered-table project-b.qza
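Once the tables are split, each one can be summarized to see the per-sample read counts that a sampling depth is usually picked from. This is only a sketch: the function name and the output naming are made up for illustration, and the QIIME variable is a hypothetical hook that lets you preview the command with QIIME=echo instead of running it.

```shell
# Sketch: summarize one project table so its per-sample depths can be inspected.
# Assumptions: map.tsv is the shared mapping file; the output name is derived
# from the input name. Set QIIME=echo to print the command instead of running it.
summarize_table() {
  "${QIIME:-qiime}" feature-table summarize \
    --i-table "$1" \
    --m-sample-metadata-file map.tsv \
    --o-visualization "${1%.qza}-summary.qzv"
}

# Usage:
#   summarize_table project-a.qza
#   summarize_table project-b.qza
```

The two projects may well end up with different depths, so it is worth looking at each summary separately.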

At this point, I would throw my tables and tree into core diversity (or just distance calculation, YMMV). I like a rarefaction depth of 5000 for this data, so I’m going to use that. (You should, of course, check your own rarefaction depth by summarizing your feature table.) Here it is for project a; run it again with project-b.qza and a different output directory for project b.

qiime diversity core-metrics-phylogenetic \
  --i-phylogeny tree.qza \
  --i-table project-a.qza \
  --p-sampling-depth 5000 \
  --m-metadata-file map.tsv \
  --output-dir core-metrics-a
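To avoid retyping the command for each project, it could be wrapped in a small function. This is a sketch under the same file-name assumptions as above (project-a.qza, project-b.qza, tree.qza, map.tsv). The function name is hypothetical, and so is the QIIME variable, which lets you preview the command with QIIME=echo.

```shell
# Sketch: run core-metrics-phylogenetic for one project against the shared tree.
# $1 is the project label ("a" or "b"), $2 is the sampling depth for that project.
# Set QIIME=echo to print the command instead of running it.
core_metrics_for() {
  "${QIIME:-qiime}" diversity core-metrics-phylogenetic \
    --i-phylogeny tree.qza \
    --i-table "project-$1.qza" \
    --p-sampling-depth "$2" \
    --m-metadata-file map.tsv \
    --output-dir "core-metrics-$1"
}

# Usage (5000 is just the depth from the example above):
#   core_metrics_for a 5000
#   core_metrics_for b 5000
```

Taking the depth as an argument keeps the option of using a different value for each project.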

These full trees and taxonomy will work as long as you stay within the QIIME 2 environment. If you choose to go to R or Python, you may need to filter there. (But check the forum threads.)
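If you would rather do that filtering inside QIIME 2 before exporting, something like the following might work. It is a sketch, not a tested recipe: the output names are made up, the QIIME variable is a hypothetical hook for previewing with QIIME=echo, and filter-tree is only in more recent q2-phylogeny releases, so check qiime phylogeny --help on your install.

```shell
# Sketch: trim the pooled rep-seqs and tree down to the features present in
# one project's table. $1 is the project label ("a" or "b").
# Output names (rep-seqs-a.qza, tree-a.qza, ...) are assumptions.
# Set QIIME=echo to print the commands instead of running them.
filter_for_export() {
  "${QIIME:-qiime}" feature-table filter-seqs \
    --i-data rep-seqs.qza \
    --i-table "project-$1.qza" \
    --o-filtered-data "rep-seqs-$1.qza"
  "${QIIME:-qiime}" phylogeny filter-tree \
    --i-tree tree.qza \
    --i-table "project-$1.qza" \
    --o-filtered-tree "tree-$1.qza"
}

# Usage:
#   filter_for_export a
#   filter_for_export b
```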

Best,
Justine

Thank you so much for your help and patience. I’m now able to move forward with my data.

Rachel
