Processing ITS1 & ITS2 samples

Hi everyone,

I prepared separate libraries for the ITS1 and ITS2 regions, but they were sequenced in the same run. I want to compare the total fungal community across my samples and am unsure at which stage (if any) the ITS1 and ITS2 data should be merged.

Is it acceptable to merge the datasets after the rarefaction step, or is it necessary to process ITS1 and ITS2 separately from the beginning and generate independent feature tables for each region? I would appreciate any guidance on the best practice for this situation.

Thank you!

Hello!

I would split the datasets by region before the cutadapt step so that primers are removed separately for each region, discarding sequences in which no primer is detected. Then I would process each region separately throughout the whole pipeline. If, for some reason, you want to run DADA2 on both regions together, you can still do so, but for the rest of the analyses you should split them.
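
To illustrate the idea of splitting by primer and discarding reads without one, here is a minimal Python sketch. It is not what cutadapt does internally (cutadapt tolerates mismatches and handles paired reads); it only uses exact prefix matching to show the logic. The primer sequences (ITS1F and ITS3) and the toy reads are assumptions for the example, so substitute the primers from your own library prep.

```python
# Illustrative forward primers only (assumed for this example);
# replace them with the primers actually used in your PCR.
PRIMERS = {
    "ITS1": "CTTGGTCATTTAGAGGAAGTAA",  # ITS1F
    "ITS2": "GCATCGATGAAGAACGCAGC",    # ITS3
}


def split_by_primer(reads, primers=PRIMERS):
    """Assign each read to a region by its 5' primer and trim the primer.

    Reads that start with none of the primers are returned separately,
    which mimics discarding untrimmed reads.
    """
    by_region = {region: [] for region in primers}
    discarded = []
    for read in reads:
        for region, primer in primers.items():
            if read.startswith(primer):
                # Keep only the sequence after the primer.
                by_region[region].append(read[len(primer):])
                break
        else:
            # No primer matched: this read would be discarded.
            discarded.append(read)
    return by_region, discarded


# Toy reads (made up): one per region plus one without any primer.
reads = [
    "CTTGGTCATTTAGAGGAAGTAA" + "AAGTCGTAACAAGG",
    "GCATCGATGAAGAACGCAGC" + "GAAATGCGATACG",
    "ACGTACGTACGTACGT",
]
by_region, discarded = split_by_primer(reads)
print({region: len(seqs) for region, seqs in by_region.items()}, len(discarded))
```

In a real workflow the same effect comes from running cutadapt once per region with that region's primers and keeping only the reads in which a primer was found.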

Hello!

I see your answer, but I can't approve it; it is still in the queue. I will paste it here:

Hi,

Thank you for your response. The Genomics Core removed the adapters from the samples, so adapter trimming is not part of our data processing.

Even if the adapters were removed, you may still want to remove the biological primers used during library preparation (PCR).

Could you please explain how merging the datasets earlier in the workflow might affect the analysis if our objective is to examine changes in the microbial community over time?

You will have a very strong "batch effect" because your dataset consists of two different ITS regions. The large differences between them may mask other effects within the study.

Also, when you mentioned splitting the samples for the rest of the analysis, did you mean that I should calculate the diversity metrics separately for each dataset?

Yes, I would run all analyses separately for both regions and compare results.

Hi,

Thank you for your response. The Genomics Core removed the adapters from the samples, so adapter trimming is not part of our data processing. Could you please explain how merging the datasets earlier in the workflow might affect the analysis if our objective is to examine changes in the microbial community over time? Also, when you mentioned splitting the samples for the rest of the analysis, did you mean that I should calculate the diversity metrics separately for each dataset?

Hi, what do you mean by "cutting sequences with no primer sequence detected," and how can I perform this step?

Hi, check the cutadapt plugin description for the exact command. In the QIIME 2 plugin the option is "--p-discard-untrimmed" (in standalone cutadapt it is "--discard-untrimmed"); with it, only sequences in which a primer was detected and removed are passed to the output.
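
As a rough sketch, the call for one region could be assembled like this in Python. It assumes paired-end reads already imported and split by region; the file names and the primer pair (ITS1F / ITS2, for the ITS1 region) are placeholders rather than values from this thread, so please confirm the options against `qiime cutadapt trim-paired --help` for your QIIME 2 version.

```python
import shlex


def trim_command(demux_qza, fwd_primer, rev_primer, out_qza):
    """Build a `qiime cutadapt trim-paired` call that keeps only reads
    in which the primers were found and removed."""
    return [
        "qiime", "cutadapt", "trim-paired",
        "--i-demultiplexed-sequences", demux_qza,
        "--p-front-f", fwd_primer,   # primer expected at the 5' end of forward reads
        "--p-front-r", rev_primer,   # primer expected at the 5' end of reverse reads
        "--p-discard-untrimmed",     # drop read pairs with no primer detected
        "--o-trimmed-sequences", out_qza,
    ]


# Hypothetical file names and illustrative primers for the ITS1 library.
cmd = trim_command(
    "its1-demux.qza",
    "CTTGGTCATTTAGAGGAAGTAA",  # ITS1F (assumed)
    "GCTGCGTTCTTCATCGATGC",    # ITS2 (assumed)
    "its1-trimmed.qza",
)
print(shlex.join(cmd))

# To run it from Python inside an activated QIIME 2 environment:
# import subprocess
# subprocess.run(cmd, check=True)
```

You would repeat the same call for the ITS2 library with its own primer pair and output name.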
