Question about experimental design

Edoardo_Scali · September 18, 2023, 11:21pm

Dear community,
I have a question about the experimental design of an experiment that I am currently involved in.

After DNA extraction we applied a PCR protocol using primers for 3 regions (two ITS regions and 16S). For each region we had 5 forward (F) and 5 reverse (R) primers, each consisting of an Illumina adaptor stub, a variable-length region of 1-5 N (used as a code), and then the primer.

We performed an AMPure bead cleanup to remove primer dimers.

For each sample we have three different amplicon sets: two ITS and one 16S.

What we see on the gel is two prominent bands: one in the 380-400 bp range (expected), and another at 80 bp that we can't identify.

Our concern is that in an Illumina run the 80 bp fragments will occupy all the signal.

My PI is concerned about the 80 bp band, and we want to know why it happened and if we can filter this out somehow. Does anyone have an idea about what happened and if it is possible to filter it out?

Thank you so much for your help!

colinvwood · September 19, 2023, 11:43pm

Hello @Edoardo_Scali,

I'm not sure why you see the 80bp fragment. This seems like a complicated library prep, so it could be some type of dimer. As for removing it, there are multiple ways to select only the fragment lengths of interest, such as size-selection gels or size selection using magnetic beads.

Edoardo_Scali · October 4, 2023, 8:19am

Hello @colinvwood,

Thank you very much for your response.

Actually, I would like to know if there is a way to handle these sequences of different lengths and separate them using one of the tools available in qiime2.

I'm curious about what these 80bp sequences correspond to and was wondering if trying to filter them and then perform classification could be the right option.

I also have another question. The samples I am analyzing contain batches of three barcodes (ITS1, ITS2, 16S). Is there a way to perform taxonomic classification in a single command, or do I need to run my classification on the file containing the rep-seqs first with an ITS database and then with a 16S database, and eventually combine the taxonomy outputs (taxonomy.qza)?

Thank you very much for your help.

colinvwood · October 4, 2023, 5:17pm

Hello @Edoardo_Scali,

> Actually, I would like to know if there is a way to handle these sequences of different lengths and separate them using one of the tools available in qiime2.

Yes, if you choose to sequence them there will be ways to filter out the shorter sequences. As you said previously, however, this isn't in your best interest if these sequences are unexpected, because they'll take up sequencing resources.
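A minimal sketch of such a length filter in plain Python, not a QIIME 2 command. It assumes well-formed 4-line FASTQ records and reads that have already been adapter-trimmed or merged, since raw Illumina reads usually all have the same cycle length regardless of insert size. The function names and the 100-base cutoff are illustrative assumptions, not values from this thread.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def filter_by_length(records: Iterable[Record], min_len: int) -> Iterator[Record]:
    """Keep only reads whose sequence is at least min_len bases long."""
    return (rec for rec in records if len(rec[1]) >= min_len)


# Made-up example data: one 20-base read and one 200-base read.
example = [
    "@short_read", "ACGT" * 5, "+", "I" * 20,
    "@long_read", "ACGT" * 50, "+", "I" * 200,
]

# The cutoff of 100 is an assumption; choose it from your own length distribution.
kept = list(filter_by_length(parse_fastq(example), min_len=100))
kept_ids = [header for header, _seq, _qual in kept]
print(kept_ids)
```

With real data you would pass an open file handle to `parse_fastq` instead of the example list.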

You could sequence the 80bp band from only one or two of the samples to see what it is, if you think it's of interest.

> I also have another question. The samples I am analyzing contain batches of three barcodes (ITS1, ITS2, 16S). Is there a way to perform taxonomic classification in a single command, or do I need to run my classification on the file containing the rep-seqs first with an ITS database and then with a 16S database, and eventually combine the taxonomy outputs (taxonomy.qza)?

I don't believe you'll be able to perform classification in a single command; you'll need separate databases. Combining taxonomies is possible, yes. However, certain assumptions are always broken when you combine results that differ in key variables such as (in your case) the targeted region and the classification database.
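As a rough illustration of combining taxonomy outputs, here is a plain-Python sketch. It assumes each taxonomy has been exported to a tab-separated table with `Feature ID` and `Taxon` columns, and that the features were classified separately per marker. The `Marker` column and the function names are hypothetical additions, so that downstream analyses can still tell which region and database each assignment came from.

```python
import csv
import io
from typing import Dict, TextIO

Table = Dict[str, Dict[str, str]]  # feature ID -> row of column values


def read_taxonomy(handle: TextIO) -> Table:
    """Read a tab-separated taxonomy table keyed by its 'Feature ID' column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Feature ID"]: dict(row) for row in reader}


def merge_taxonomies(tables: Dict[str, Table]) -> Table:
    """Merge per-marker tables, tagging each row with the marker it came from."""
    merged: Table = {}
    for marker, table in tables.items():
        for feature_id, row in table.items():
            if feature_id in merged:
                # The same feature classified twice signals a problem upstream.
                raise ValueError(f"feature {feature_id!r} appears in more than one table")
            merged[feature_id] = {**row, "Marker": marker}
    return merged


# Made-up example tables standing in for exported taxonomy files.
tsv_16s = io.StringIO("Feature ID\tTaxon\nf1\td__Bacteria; p__Firmicutes\n")
tsv_its = io.StringIO("Feature ID\tTaxon\nf2\tk__Fungi; p__Ascomycota\n")

merged = merge_taxonomies({
    "16S": read_taxonomy(tsv_16s),
    "ITS": read_taxonomy(tsv_its),
})
for feature_id, row in merged.items():
    print(feature_id, row["Marker"], row["Taxon"], sep="\t")
```

Keeping the marker label does not remove the caveats above about combining results; it only makes the provenance of each assignment explicit.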

Edoardo_Scali · October 4, 2023, 9:39pm

Hello @colinvwood,

The issue is that the research group that asked for my assistance in understanding what happened has already sequenced a small portion of the samples. The question they have asked me is whether these samples that have already been sequenced have contamination in the 80bp band. They would also like to know if the rest of the sequences are usable or not.

I realize it's not an ideal situation, but I'd like to be of help in this work. Do you have any suggestions on how I can proceed?

Regarding the question of classification with two different databases, are there any protocols that have been established for this scenario that I can refer to?

Thank you very much for your help!

colinvwood · October 4, 2023, 10:30pm

Hello @Edoardo_Scali,

> The issue is that the research group that asked for my assistance in understanding what happened has already sequenced a small portion of the samples. The question they have asked me is whether these samples that have already been sequenced have contamination in the 80bp band. They would also like to know if the rest of the sequences are usable or not.

This is in some ways good news. Now you can look at the samples that were sequenced and BLAST some of the 80bp sequences to see what they are. That'll inform your decision for the remaining samples (whether or not to remove the 80bp fragments before sequencing). You can also see how much of the sequencing bandwidth the 80bp sequences are taking up.
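Both checks can be sketched in a few lines of Python. This assumes you already have the read sequences as strings and that they have been adapter-trimmed or merged, so that length reflects the original fragment. The function names, the length cutoff, and the FASTA header format are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_short_reads(
    sequences: Iterable[str], max_len: int, top_n: int = 5
) -> Tuple[float, List[Tuple[str, int]]]:
    """Return the fraction of reads no longer than max_len and the most common of them."""
    seqs = list(sequences)
    short = [s for s in seqs if len(s) <= max_len]
    fraction = len(short) / len(seqs) if seqs else 0.0
    return fraction, Counter(short).most_common(top_n)


def to_fasta(ranked: List[Tuple[str, int]]) -> str:
    """Format (sequence, count) pairs as FASTA text that can be pasted into BLAST."""
    lines = []
    for rank, (seq, count) in enumerate(ranked, start=1):
        lines.append(f">short_{rank};count={count}")
        lines.append(seq)
    return "\n".join(lines)


# Made-up reads: four short ones (two distinct sequences) and four long ones.
reads = ["ACGTACGT"] * 3 + ["TTTTGGGG"] + ["ACGT" * 30] * 4

# The cutoff of 20 suits this toy data only; pick yours from the real length distribution.
fraction, top_short = summarize_short_reads(reads, max_len=20)
fasta = to_fasta(top_short)
print(f"{fraction:.0%} of reads are short")
print(fasta)
```

The most abundant short sequences are usually the most informative ones to search first, since a dimer or other artifact tends to show up as a few highly repeated sequences.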

> Regarding the question of classification with two different databases, are there any protocols that have been established for this scenario that I can refer to?

I'm not aware of any protocol, but if you search on this forum for e.g. "combining ITS and 16S data" there are lots of discussions.