I recently worked with a new sequencing core for my 16S V3-V4 and ITS3-4 sequencing. The core tells me that the protocols for 16S and ITS were the same, so primers for both were used in one reaction to generate a library containing both 16S and ITS sequences. In other words, they are telling me that the provided FASTQ files contain both 16S and ITS results.
Given this is the case, I am wondering how one would compute metrics for alpha and beta diversity, where the table.qza file is the input. Would it even be possible to calculate alpha and beta diversity metrics separately for bacteria and fungi, given that the file contains all of the sequences from both kingdoms?
If others have experience with this, please let me know. Based on my experience, this seems strange for the core to have done.
Thanks and best!
Hi @hpyle50 ,
This seems to be increasingly common, so it is not all that unusual.
No, you should not measure alpha or beta diversity on the merged table. Technically this might be okay for some metrics (for some it definitely is not), but even then it is probably more valuable to measure on the fungal and bacterial communities separately. Separating ITS from 16S will also be essential if you want to, e.g., estimate phylogeny...
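To illustrate why the merged table is a problem, here is a toy sketch in plain Python. The counts are made up for illustration, and the base-2 Shannon index is just one example metric. The merged value mixes the two communities and depends on how many reads each marker happened to yield, so it describes neither community on its own.

```python
import math

def shannon(counts):
    """Shannon index (base-2 logarithm) for a list of feature counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Toy counts for ONE sample (made-up numbers, not real data).
counts_16s = [50, 30, 20]   # bacterial features
counts_its = [5, 5]         # fungal features

bacterial = shannon(counts_16s)
fungal = shannon(counts_its)
merged = shannon(counts_16s + counts_its)

print(f"16S only: {bacterial:.3f}")
print(f"ITS only: {fungal:.3f}")
print(f"merged:   {merged:.3f}")
```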
The solution: filter out the sequences by aligning to a reference database, to create a set of sequences (and table) that are 16S-only and vice versa. You can use qiime quality-control exclude-seqs to accomplish this via alignment against a reference (for which a representative subset of sequences will be fine, as this is just a rough filter). There is a tutorial for this in the online documentation for QIIME 2 if you would like a "recipe" to follow.
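As a rough sketch of what that recipe looks like, the snippet below only assembles and prints the two commands for the 16S side (it does not run them). All file names are hypothetical placeholders, and the identity and coverage thresholds are assumptions to tune for your data, not recommended values. The same two steps with an ITS reference would give the fungal side.

```python
import shlex

def exclude_seqs_cmd(query, reference, hits, misses,
                     perc_identity=0.6, perc_query_aligned=0.6,
                     method="vsearch"):
    """Build a `qiime quality-control exclude-seqs` command as an argument list.

    Thresholds are placeholder assumptions for a rough kingdom-level filter.
    """
    return [
        "qiime", "quality-control", "exclude-seqs",
        "--i-query-sequences", query,
        "--i-reference-sequences", reference,
        "--p-method", method,
        "--p-perc-identity", str(perc_identity),
        "--p-perc-query-aligned", str(perc_query_aligned),
        "--o-sequence-hits", hits,
        "--o-sequence-misses", misses,
    ]

def filter_table_cmd(table, hits, filtered_table):
    """Build a `qiime feature-table filter-features` command that keeps
    only the features whose sequences hit the reference."""
    return [
        "qiime", "feature-table", "filter-features",
        "--i-table", table,
        "--m-metadata-file", hits,
        "--o-filtered-table", filtered_table,
    ]

# Hypothetical file names; substitute your own artifacts.
commands = [
    exclude_seqs_cmd("rep-seqs.qza", "silva-ref-seqs.qza",
                     "rep-seqs-16S.qza", "rep-seqs-non16S.qza"),
    filter_table_cmd("table.qza", "rep-seqs-16S.qza", "table-16S.qza"),
]

for cmd in commands:
    print(shlex.join(cmd))  # dry run: print instead of executing
```

The resulting 16S-only table and sequences can then go into the diversity steps on their own.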
I have done a few trial runs, trying to pool three different markers (16/18/28S) together and then sequencing them.
Basically I was amplifying each marker individually, cleaning up the PCRs individually, and then pooling the three markers per sample before indexing and sequencing.
For the analysis I have tried a couple of different things... the one that I think was most straightforward was to separate each marker by primer sequence using cutadapt and then analyze each marker individually.
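To make the primer-based idea concrete, here is a minimal sketch in plain Python. It is not how cutadapt works internally: it only does exact matching at the start of the read (with IUPAC degenerate bases), whereas cutadapt tolerates mismatches. The primer sequences are assumptions (commonly published 341F and ITS3 forward primers), so check which primers your core actually used.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def primer_regex(primer):
    """Compile a primer (possibly degenerate) into a regex anchored at the read start."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))

# Assumed forward primers; replace with the ones used for your library.
PRIMERS = {
    "16S": "CCTACGGGNGGCWGCAG",     # 341F (assumption)
    "ITS": "GCATCGATGAAGAACGCAGC",  # ITS3 (assumption)
}
PATTERNS = {marker: primer_regex(seq) for marker, seq in PRIMERS.items()}

def assign_marker(read, patterns=PATTERNS):
    """Return the marker whose primer matches the start of the read, else 'unassigned'."""
    read = read.upper()
    for marker, pattern in patterns.items():
        if pattern.match(read):
            return marker
    return "unassigned"

# Made-up example reads, for illustration only.
reads = [
    "CCTACGGGAGGCAGCAGTTGACC",
    "GCATCGATGAAGAACGCAGCAAAT",
    "ACGTACGTACGT",
]
for read in reads:
    print(assign_marker(read), read)
```

This only works on raw reads that still contain the primers.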
When I tried aligning to a reference, there were always some reads from different markers that would align together. I couldn't figure out why.
Thanks so much @Nicholas_Bokulich! For the reference sequences input in qiime quality-control exclude-seqs, would I use: 1) unchanged reference sequences (e.g., Silva 138 SSURef NR99 full-length sequences, MD5: de8886bb2c059b1e8752255d271f3010), 2) reference reads extracted from these sequences using qiime feature-classifier extract-reads, or 3) a classifier? I am assuming that I would go with option 1.
Also, do you have any advice for setting the --p-perc-identity and --p-perc-query-aligned parameters, given the kingdom-level filtering that I am trying to accomplish here?
Again, thank you so very much for your advice.
Hi @hpyle50 ,
Either option 1 or 2 will work as a reference.
It is tough to say off-hand. For kingdom-level filtering something like 60% similarity is probably "good enough", and as precedent I believe that this is around what deblur uses for rough filtering prior to denoising. It will not filter out noisy or chimeric seqs (which hopefully denoising should get rid of), but it should be enough to distinguish 16S from ITS.
The %id setting could explain the poor separation that you described, @HugoEira. On the other hand, you were trying to separate 16S and 18S, which would most likely require a somewhat higher threshold to differentiate...
@HugoEira's suggestion to use q2-cutadapt is indeed a better way, and what I would prefer to use myself, but it only works if the primers are still present in the sequences, so it does not cover all use cases. (And you have already denoised, @hpyle50, so this would mean going back to the beginning. Alignment lets you do this on denoised seqs.)
Good luck both!