Best way to filter and merge the data in my case

moonlight · November 7, 2019, 10:12pm

Hello, I have a project with ~100 samples that was sequenced twice (16S rRNA, paired-end MiSeq) because some samples didn't work in the first sequencing run. Let's say 30 samples didn't work the first time and I needed to sequence these 30 samples again.

What I need to do is filter those 30 samples out of the 1st run's data and merge the 2nd run's data (30 samples) with the filtered data.

I just switched from QIIME 1 to QIIME 2. This is what I did in QIIME 1: since demultiplexing and QC are one step, I demultiplexed the fastq data and got good-quality fasta data. It was then very easy to filter the unwanted sequences out of the fasta file, merge the rest with the good sequences from the 2nd run in a new fasta file, and start building the OTU table from that new fasta file.

However, I checked the "Filtering data" page of the QIIME 2 2018.4.0 documentation. It seems to me that I can't filter from the very beginning. Most of the filtering starts after building a feature table, which is equivalent to an OTU table. If I follow the example, I have to build two feature tables from the two sequencing batches, and then filter and merge the feature tables.

Here are my questions:

Would it be possible to do something like what I did in QIIME 1? I would filter the unwanted samples' reads at the very beginning, merge the rest with the reads from the 2nd run, and use the combined data for the downstream analyses. That way I don't have to build 2 individual feature tables. I think this would be a really simple workflow. If I can do this, which scripts should I use?

Any suggestions about the workflow, and at which step should I start filtering and merging?

llenzi · November 8, 2019, 11:44am

Hi @moonlight,

I'm not sure I understand which point you are at; also, I usually start from already demultiplexed fastq files in my pipeline.
That said, I can see two ways for you. The first: keep all samples in and called the sample lot (from the 2nd run) with different ids for demultiplexing and so to be recognisable in the analysis. So you can exclude the failed samples after the denoising step.

A second way is to export the fastq files for the denoised good samples and reimport them in a final object (e.g. via a manifest file) and carry on with the analysis.

Still, I need to add a warning. If you are going to denoise all the samples (1st and 2nd runs) together, be careful to use deblur. That is because dada2 assumes the samples are from the same run (it learns its error model per sequencing run).

If you want to use dada2, you can denoise the 1st and 2nd runs separately (be careful to use the same denoising settings), then discard the unwanted samples from the first run, and finally merge the abundance tables and representative sequences from the two processes.
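To illustrate the order of operations in that last suggestion, here is a minimal Python sketch on toy data. It is not QIIME 2 code: the tables are plain dicts mapping a sample id to ASV counts, and the sample ids, ASV names and the `failed` set are all made up for illustration. It only shows why the failed run-1 samples are dropped before the two runs are combined.

```python
from collections import Counter


def drop_samples(table, unwanted):
    """Return a copy of the table without the unwanted sample ids."""
    return {sid: counts for sid, counts in table.items() if sid not in unwanted}


def merge_tables(*tables):
    """Combine per-run tables; refuse sample ids that occur in more than one run."""
    merged = {}
    for table in tables:
        for sid, counts in table.items():
            if sid in merged:
                raise ValueError(f"sample {sid!r} appears in more than one table")
            merged[sid] = Counter(counts)
    return merged


# Toy per-run tables (hypothetical sample ids and ASV names).
run1 = {"S1": {"asvA": 3}, "S2": {"asvA": 120, "asvB": 40}}
run2 = {"S1": {"asvA": 95, "asvC": 12}}
failed = {"S1"}  # S1 failed in run 1 and was re-sequenced in run 2

merged = merge_tables(drop_samples(run1, failed), run2)
print(sorted(merged))
```

In this sketch, merging without the filtering step raises an error because S1 would be present in both tables. As far as I understand, the real feature-table merge behaves similarly by default when sample ids overlap, which is one more reason to discard the failed samples first.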

Hope this makes sense,
Luca

moonlight · November 12, 2019, 2:53am

Hi Luca,

Thanks for the reply. I understand the 2nd way. Could you please explain the 1st way?
If you can tell me which scripts I might use, that would be even better.

"keep all samples in and called the sample lot (from the 2nd run) with different ids for demultiplexing and so to be recognisable in the analysis. So you can exclude the failed samples after the denoising step."

I don't quite understand. "The sample lot"? The barcodes that I used in the 1st and 2nd runs are the same. For example, if sample 1 doesn't work in the 1st sequencing run, I will use the same barcode for the PCR and submit it to the sequencing center again.

llenzi · November 12, 2019, 11:19am

Hi John,

I see I wrote that in a confusing way; I'll try to be clearer. Sorry for that.
What I meant is to use different ids in the demultiplexing step.
Given that you will have to demultiplex the runs separately, you could name the samples obtained in the first run as:

sample-id barcode
ID1-1 bc1
ID2-1 bc2

and for the second run:
sample-id barcode
ID1 bc1
ID2 bc2

At this point you can merge at any step you want, and you will also be able to recognise the samples by id and filter them. So, for example, you could:

a) import sequences from run1 and run2 in the same qiime2 object
b) denoise them with deblur
c) exclude samples ID1-1 and ID2-1 using the qiime filter plug-in
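As a rough illustration of this naming scheme, the Python sketch below tags the run-1 ids with a run suffix and builds an id-only, tab-separated list of the ids to exclude in step c). The sample ids and helper names are hypothetical, and the `sample-id` header is an assumption based on the tables above.

```python
import csv
import io


def tag_run(sample_ids, run):
    """Append a run suffix so the same sample gets a distinct id per run."""
    return {sid: f"{sid}-{run}" for sid in sample_ids}


def exclusion_tsv(ids):
    """Build an id-only, tab-separated table listing the ids to drop."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample-id"])
    for sid in ids:
        writer.writerow([sid])
    return buf.getvalue()


# Hypothetical: ID1 and ID2 failed in run 1 and were re-sequenced in run 2.
failed = ["ID1", "ID2"]
run1_ids = tag_run(failed, 1)
tsv = exclusion_tsv(run1_ids.values())
print(tsv, end="")
```

A file with this content could then be handed to a filtering command as its metadata file, together with the option that excludes (rather than keeps) the listed ids.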

Unfortunately, I always get already demultiplexed files, so I am mainly guessing what the steps would be, but I'm happy to help you more if you need.

Luca

Nicholas_Bokulich · November 14, 2019, 2:27am

You could. See qiime demux filter-samples

You will still want to merge the two separate sequencing runs after denoising, at least if you are using dada2, as @llenzi recommends.

Good luck!

moonlight · November 14, 2019, 2:46am

Hi Nick,

Thanks. I will try it; I just want to double check, since I just switched from QIIME 1 to QIIME 2.

Unlike QIIME 1, in QIIME 2 demultiplexing and QC are two separate steps (see the "Moving Pictures" tutorial in the QIIME 2 2019.10.0 documentation). By denoising you mean the 2nd step, which is called quality control in the tutorial, right?

For the dada2 workflow, if I have multiple sequencing runs, I should always denoise each run separately and combine them later.

I can't combine them after demultiplexing and use the combined file to run the dada2 QC workflow.

Thanks,

Nicholas_Bokulich · November 14, 2019, 4:28am

Yes

In theory there are ways but there is no need to discuss them, since this would cause problems for dada2.

So in your case you can:

1. demultiplex each run separately
2. use qiime demux filter-samples to remove samples with low read counts after demultiplexing
3. denoise each run separately with dada2
4. merge the feature tables and sequences
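The steps above can be sketched as a small Python helper that only assembles the command lines, so the sequence can be checked before anything is run. This is a sketch under assumptions, not a tested pipeline: the file names, run labels and truncation lengths are placeholders, demultiplexing is assumed to be done already (one `<run>-demux.qza` per run), and the flags are written as I understand them from the QIIME 2 documentation of that period, so please check each command's `--help` for your version.

```python
import shlex


def build_commands(runs, failed_files, trunc_f, trunc_r):
    """Assemble the QIIME 2 CLI calls for the workflow as argument lists.

    failed_files maps a run label to an id-only metadata file listing the
    samples to drop from that run; runs without an entry are not filtered.
    """
    cmds = []
    for run in runs:
        demux = f"{run}-demux.qza"
        if run in failed_files:
            filtered = f"{run}-demux-filtered.qza"
            cmds.append([
                "qiime", "demux", "filter-samples",
                "--i-demux", demux,
                "--m-metadata-file", failed_files[run],
                "--p-exclude-ids",
                "--o-filtered-demux", filtered,
            ])
            demux = filtered
        # Denoise each run on its own, with identical settings.
        cmds.append([
            "qiime", "dada2", "denoise-paired",
            "--i-demultiplexed-seqs", demux,
            "--p-trunc-len-f", str(trunc_f),
            "--p-trunc-len-r", str(trunc_r),
            "--o-table", f"{run}-table.qza",
            "--o-representative-sequences", f"{run}-rep-seqs.qza",
            "--o-denoising-stats", f"{run}-stats.qza",
        ])
    merge_tables = ["qiime", "feature-table", "merge"]
    merge_seqs = ["qiime", "feature-table", "merge-seqs"]
    for run in runs:
        merge_tables += ["--i-tables", f"{run}-table.qza"]
        merge_seqs += ["--i-data", f"{run}-rep-seqs.qza"]
    merge_tables += ["--o-merged-table", "table.qza"]
    merge_seqs += ["--o-merged-data", "rep-seqs.qza"]
    return cmds + [merge_tables, merge_seqs]


# Placeholder values: pick truncation lengths from your own quality plots.
commands = build_commands(
    runs=["run1", "run2"],
    failed_files={"run1": "run1-failed.tsv"},
    trunc_f=240,
    trunc_r=200,
)
for cmd in commands:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each printed line can be pasted into a shell, or each list passed to `subprocess.run(cmd, check=True)` once the names match your files. The same truncation settings are used for both runs on purpose, so that the resulting features are comparable when the tables are merged.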
