subsetting sequencing runs using metadata prior to denoising

hsapers · April 18, 2020, 8:55pm

Hello - I'm not sure if this should be in user support or technical support, apologies if this is the wrong topic.

I have a directory containing CASAVA 1.8 FASTQ files (demultiplexed, paired-end reads, 2x251 kit, all non-biological sequences removed). These files are from three different sequencing runs. The sequencing run associated with each file is stored in a column of the Keemei-validated metadata (a .tsv file). I know that I should run DADA2 on the data from each run separately, then combine the runs after denoising. Is there a way I can subset the fastq files using the metadata during importing (I'm using the CasavaOneEightSingleLanePerSampleDirFmt)? Or, alternatively, can I subset the imported data using the 'sequencing run col' of the metadata prior to denoising?

I am running QIIME 2 2020.2.0, installed using conda on a Linux server.

Thank you

colinbrislawn · April 18, 2020, 9:18pm

Hello Haley,

Welcome to the forums! :qiime2:

There are a couple of different ways to do this. One way would be to make three folders for your three runs, move your fastq files into their matching folders, then import three times. This method gives you separate artifacts without having to filter them after importing.
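The folder approach can be scripted rather than done by hand. The following is a minimal sketch, not an official QIIME 2 tool: it assumes the first metadata column holds the sample IDs, that the run column (here a hypothetical `sequencing-run`) contains labels that are safe to use as folder names, and that the files follow the usual CASAVA 1.8 naming. It copies rather than moves, so the original directory is left intact.

```python
import csv
import re
import shutil
from pathlib import Path

# CASAVA 1.8 single-lane-per-sample names look like
# <sample-id>_<barcode-id>_L001_R1_001.fastq.gz
CASAVA_NAME = re.compile(r"(.+)_.+_L[0-9]{3}_R[12]_001\.fastq\.gz")


def read_run_column(metadata_tsv, run_column):
    """Return {sample ID: run label} from a tab-separated metadata file."""
    with open(metadata_tsv, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        id_column = reader.fieldnames[0]  # assumption: IDs are in column 1
        return {
            row[id_column]: row[run_column]
            for row in reader
            if not row[id_column].startswith("#")  # skip e.g. a #q2:types row
        }


def copy_into_run_folders(fastq_dir, metadata_tsv, run_column, out_dir):
    """Copy each FASTQ file into out_dir/<run label>/; return the new paths."""
    run_of = read_run_column(metadata_tsv, run_column)
    copied = []
    for path in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        match = CASAVA_NAME.fullmatch(path.name)
        if match is None or match.group(1) not in run_of:
            continue  # skip files whose sample ID is not in the metadata
        run_dir = Path(out_dir) / run_of[match.group(1)]
        run_dir.mkdir(parents=True, exist_ok=True)
        copied.append(Path(shutil.copy2(path, run_dir / path.name)))
    return copied

# Hypothetical usage:
# copy_into_run_folders("reads", "metadata.tsv", "sequencing-run", "by_run")
```

Each resulting folder could then be imported on its own with the CasavaOneEightSingleLanePerSampleDirFmt format.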

Another option would be to import using the Fastq manifest format. Just like above, you would import three times for your three runs, each time using a PairedEndFastqManifestPhred33V2 file that lists only the files from that run.
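Writing the three manifests by hand is error-prone, so here is a rough sketch of how they might be generated. It is only an illustration: `run_of` is a hypothetical dict mapping sample ID to run label (for example, read from the run column of the metadata), the output file names are made up, and the code assumes every sample has both an R1 and an R2 file with CASAVA 1.8 names.

```python
import re
from collections import defaultdict
from pathlib import Path

# Capture the sample ID and the read direction (1 = forward, 2 = reverse).
CASAVA_NAME = re.compile(r"(.+)_.+_L[0-9]{3}_R([12])_001\.fastq\.gz")
# Header used by the paired-end V2 manifest formats.
HEADER = "sample-id\tforward-absolute-filepath\treverse-absolute-filepath"


def write_run_manifests(fastq_dir, run_of, out_dir):
    """Write out_dir/manifest-<run>.tsv for every run label found.

    run_of is a hypothetical {sample ID: run label} dict.
    Returns {run label: manifest path}.
    """
    # run label -> sample ID -> {"1": forward path, "2": reverse path}
    reads = defaultdict(lambda: defaultdict(dict))
    for path in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        match = CASAVA_NAME.fullmatch(path.name)
        if match and match.group(1) in run_of:
            sample, direction = match.groups()
            reads[run_of[sample]][sample][direction] = path.resolve()

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    manifests = {}
    for run, samples in reads.items():
        lines = [HEADER]
        for sample, pair in sorted(samples.items()):
            # Raises KeyError if a sample is missing one read direction.
            lines.append(f"{sample}\t{pair['1']}\t{pair['2']}")
        manifests[run] = out_dir / f"manifest-{run}.tsv"
        manifests[run].write_text("\n".join(lines) + "\n")
    return manifests

# Hypothetical usage:
# write_run_manifests("reads", {"s1": "run1", "s2": "run2"}, "manifests")
```

Each manifest would then be passed to `qiime tools import` with `--type 'SampleData[PairedEndSequencesWithQuality]'` and `--input-format PairedEndFastqManifestPhred33V2`, once per run.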

As for subsetting the imported data by metadata before denoising: I think so... but I'm not 100% sure about the command.

Let's see what the Qiime devs say!

Colin

hsapers · April 18, 2020, 10:23pm

Thanks Colin! That's really helpful. I'll play around for a bit to see whether I can easily split the fastq files into separate runs using unix commands, or whether there's a QIIME 2 subsetting command I can use after importing and removing primers with cutadapt, but before denoising.

Nicholas_Bokulich · April 18, 2020, 10:26pm

The data must already be demultiplexed, but the CLI command is qiime demux filter-samples.

(CasavaOneEightSingleLanePerSampleDirFmt is already demultiplexed)
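To make this concrete, here is a small sketch that builds one `qiime demux filter-samples` call per run. The file names, the `sequencing-run` column name, and the run labels are all placeholders to adapt to your own data, and the sketch assumes the labels contain no single quotes. By default it only returns the commands so they can be inspected before anything is run.

```python
import subprocess


def filter_samples_command(demux_qza, metadata_tsv, run_column, run_label):
    """Build the argument list for one `qiime demux filter-samples` call."""
    # SQLite-style WHERE clause; brackets allow hyphens or spaces in the
    # column name. Assumes run_label contains no single quotes.
    where = f"[{run_column}]='{run_label}'"
    return [
        "qiime", "demux", "filter-samples",
        "--i-demux", demux_qza,
        "--m-metadata-file", metadata_tsv,
        "--p-where", where,
        "--o-filtered-demux", f"demux-{run_label}.qza",  # placeholder name
    ]


def split_by_run(demux_qza, metadata_tsv, run_column, run_labels, dry_run=True):
    """Return one command per run; execute them only if dry_run is False."""
    commands = []
    for label in run_labels:
        cmd = filter_samples_command(demux_qza, metadata_tsv, run_column, label)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # requires an active QIIME 2 env
    return commands

# Hypothetical usage:
# split_by_run("demux-paired-end.qza", "metadata.tsv",
#              "sequencing-run", ["run1", "run2", "run3"], dry_run=False)
```

Each filtered artifact can then be denoised separately and the resulting feature tables merged afterwards.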

If the data are not yet demultiplexed, only include the samples you wish to demultiplex in your barcodes sample metadata file.

Good luck!

hsapers · April 20, 2020, 11:54pm

I was wondering if it was possible to filter demux-paired-end.qza based on metadata. I would like to denoise subsets of the data before merging feature tables. I used cutadapt to trim primers so I was hoping there would be an easy way to filter prior to denoising rather than pull my trimmed reads into different folders for separate imports. I looked at the filtering tutorial but can't seem to find information about filtering this early in the pipeline. Thanks!

Nicholas_Bokulich · April 21, 2020, 2:51pm

Hi @hsapers,

Please see my comment above: the CLI command is qiime demux filter-samples (the data must already be demultiplexed).

hsapers · April 22, 2020, 12:45am

Thank you @Nicholas_Bokulich sorry I missed that earlier - worked perfectly!

system · May 23, 2020, 6:45am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.