Method to filter demultiplexed file

Seth · August 9, 2018, 1:16am

Hello,

I am working on a data set that was prepared using the EMP protocols for 515-806 v4 data. Is there any way that I can filter out a subset of sample data directly from an artifact created by the demux plugin?

I would like to share the data from this early stage so that my collaborator can process it however they choose (e.g., to compare OTU clustering with sequence variant results).

colinbrislawn · August 9, 2018, 6:40am

Hello Seth,

Well, this plugin will let you filter out samples from a table... but you want to filter samples before that.

This is an excellent question! Let's see what the QIIME devs recommend.

Colin

Seth · August 9, 2018, 3:00pm

Hi Colin,

Thanks for the input! That tool does work excellently on the data after I have processed the demultiplexed reads (by dada2 in this case). But of course this means that the data that I have shared in this manner are transformed in a way that can't be reversed. I can only find tools for filtering table, sequence with taxonomy, and distance matrix artifacts.

It could be helpful to have a definitive list of filters in one place. The closest thing I can find is this tutorial, but it isn't clear to me if this covers every type of artifact that can be filtered.

Seth

colinbrislawn · August 9, 2018, 4:27pm

What command are you using to import / demultiplex your reads? I ask because if you are using something like the fastq manifest format, you could simply leave out the files you don't want to import.
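As a rough illustration of leaving files out of a manifest, here is a minimal Python sketch. It assumes the comma-separated manifest layout (`sample-id,absolute-filepath,direction`) used by QIIME 2 releases of that period; the helper name `filter_manifest`, the sample IDs, and the file paths are hypothetical and only serve the example.

```python
def filter_manifest(manifest_text, keep_ids):
    """Keep the header row and the rows whose sample-id is in keep_ids.

    Assumes a comma-separated manifest whose first column is the sample ID.
    Blank lines and '#' comment lines are dropped.
    """
    kept = []
    header_seen = False
    for line in manifest_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        sample_id = line.split(",", 1)[0]
        if not header_seen:
            kept.append(line)  # first data line is the header
            header_seen = True
        elif sample_id in keep_ids:
            kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical manifest listing two samples; only S1 should be imported.
manifest = (
    "sample-id,absolute-filepath,direction\n"
    "S1,/data/S1_R1.fastq.gz,forward\n"
    "S1,/data/S1_R2.fastq.gz,reverse\n"
    "S2,/data/S2_R1.fastq.gz,forward\n"
    "S2,/data/S2_R2.fastq.gz,reverse\n"
)
filtered_manifest = filter_manifest(manifest, {"S1"})
print(filtered_manifest)
```

Note that this only applies when the reads are already split into one file per sample; it does not help with reads that are still multiplexed.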

Importing is one of the trickiest parts of this process because there are so many ways you can do it, and it's hard to know which method is best...

Colin

Seth · August 9, 2018, 4:35pm

I'm using the 'qiime tools import' command with the '--type EMPPairedEndSequences' option, as implemented in the Moving Pictures tutorial. Since I know the sample barcodes, I suppose there must be a way for me to pull those out of the fastq files directly, but I was hoping to find a simple method within QIIME 2.

Seth

thermokarst · August 9, 2018, 11:19pm

Hey @Seth!

It sounds like your reads are still multiplexed, which means you need to demux using demux emp-paired. When you do this, you can prepare a separate metadata file for each "subset" of data you are interested in working with. Then, using the same imported muxed seqs artifact, run demux emp-paired once for each metadata file subset in your analyses. Make sense?
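The subset approach described above could be sketched roughly as follows. This is a minimal illustration, not an official QIIME 2 tool: the helper names `subset_metadata` and `demux_command`, the file names, the barcodes, and the `BarcodeSequence` column name are assumptions for the example. The `qiime demux emp-paired` flags are the ones used in the QIIME 2 tutorials of that period; newer releases may require additional outputs, so check `qiime demux emp-paired --help` for your version.

```python
def subset_metadata(metadata_text, keep_ids):
    """Keep the header row, any '#q2:' directive rows, and the rows
    whose first column (the sample ID) is in keep_ids."""
    kept = []
    header_seen = False
    for line in metadata_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        first_field = line.split("\t", 1)[0]
        if not header_seen:
            kept.append(line)  # first non-blank line is the header
            header_seen = True
        elif first_field.startswith("#q2:") or first_field in keep_ids:
            kept.append(line)
    return "\n".join(kept) + "\n"


def demux_command(seqs_qza, metadata_path, barcode_column, output_qza):
    """Build (but do not run) a demux emp-paired command line."""
    return [
        "qiime", "demux", "emp-paired",
        "--i-seqs", seqs_qza,
        "--m-barcodes-file", metadata_path,
        "--m-barcodes-column", barcode_column,
        "--o-per-sample-sequences", output_qza,
    ]


# Hypothetical metadata with three samples; only S1 and S3 are shared.
metadata = (
    "#SampleID\tBarcodeSequence\n"
    "S1\tAGCTGACTAGTC\n"
    "S2\tACGATGCGACCA\n"
    "S3\tTGCAGTCCTCGA\n"
)
subset = subset_metadata(metadata, {"S1", "S3"})
cmd = demux_command(
    "emp-paired-end-sequences.qza",  # the same imported multiplexed artifact
    "subset-metadata.tsv",           # hypothetical file holding `subset`
    "BarcodeSequence",
    "demux-subset.qza",
)
print(subset)
print(" ".join(cmd))
```

Writing `subset` to its own file and running the printed command once per subset yields one demultiplexed artifact per subset, each containing only the samples listed in that metadata file.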

Seth · August 10, 2018, 6:33pm

Hi @thermokarst!

I do see what you're saying, and it is a very clever and straightforward solution!

One question though, can't I just do this directly with the subset metadata file? What makes you say that I should run demux emp-paired on the whole data set first (which has been done and worked just fine)?

Thanks very much

thermokarst · August 10, 2018, 6:49pm

Isn't that what I proposed above?

I don't think I suggested that above. I was trying to recommend the exact opposite: demux using your subset metadata files.

Seth · August 10, 2018, 6:56pm

I misinterpreted your second sentence to mean running demux emp-paired, followed by making subset metadata files.

The fix works perfectly. Thanks again for your help.
