analyzing cDNA single samples

raghid_bsat · June 11, 2019, 7:15am

I am having trouble understanding the steps of analyzing my cDNA samples.

So, the sequencing data that I have is single-end, and it covers multiple regions: v3, v4, v67, v8, v9.

So multiple primers were used in the sequencing process.

When I do the analysis, here is what I do:
importing the files,
cutting the primers using cutadapt,
denoising using dada2,
generating a taxonomic table using Greengenes,
generating a bar plot.

The thing that's confusing me is the following:
When I cut the primers, do I cut ALL 5 primers from all the samples and carry on with the analysis?

OR

do I cut primer v3, carry on with the analysis, store the data, then cut primer v4, carry on with the analysis, store the data, etc., until I have cut every primer on its own and done the analysis for each trim?

What is the difference between the two approaches? I ask because I am getting different bar plots of the species present depending on which one I follow.

Also, let's say I do the second technique I mentioned, would there be a way in qiime2 to merge these results?

Thank you!

Mehrbod_Estaki · June 11, 2019, 9:12am

Hi @raghid_bsat,
This is an interesting set of data. Could you give us a bit more information about your overall setup and end goals?
What is the sequencing platform here, and what are your amplicon lengths?
Do you have the same samples sequenced at multiple regions, or is this a collection of various samples across various studies that happen to target different sites?
What are the exact commands you are using to cut your primers?
What is the end goal here? To compare the same samples across multiple hypervariable regions to see if there are differences? To compare different samples sequenced at different regions?

Generally speaking, comparing samples sequenced at various hypervariable regions is a difficult task because of the inherent biases of those regions: your samples will cluster strongly by region, confounding any true patterns. Tools like fragment insertion can certainly help with a situation like this; however, this doesn't help you if the goal is to assemble the multiple regions of the same sample and use the ensemble sequence. The only tool I know of that does this is SMURF, which does not have a qiime2 plugin; I have never used this tool personally either.
If you do not have the same samples sequenced at multiple regions, and instead have the results of multiple different runs, you will want to process each set separately, meaning any samples that came from the same sequencing run/PCR should be denoised together, and the resulting sets merged afterwards. Once you have these you can create a tree using fragment insertion and assign taxonomy using this tree. From there on you would analyse your data as you normally would.
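
To make the "process each set separately, then merge" idea a bit more concrete, here is a minimal Python sketch that only builds the qiime2 command lines for each set and prints them; it does not run anything. The file names (`demux.qza`, `trimmed-<region>.qza`, and so on), the region labels, and the primer strings are placeholders made up for illustration, and the `denoise-single` settings are assumptions to adjust for your own data. `--p-discard-untrimmed` is included so that each pass keeps only the reads that actually carry that set's primer.

```python
import shlex


def region_commands(region, primer, demux="demux.qza", trunc_len=300):
    """Build the trim and denoise commands for one primer/region set.

    All artifact names are hypothetical; nothing is executed here.
    """
    trimmed = f"trimmed-{region}.qza"
    trim = [
        "qiime", "cutadapt", "trim-single",
        "--i-demultiplexed-sequences", demux,
        "--p-front", primer,
        "--p-discard-untrimmed",  # drop reads that lack this primer
        "--o-trimmed-sequences", trimmed,
    ]
    denoise = [
        "qiime", "dada2", "denoise-single",
        "--i-demultiplexed-seqs", trimmed,
        "--p-trim-left", "0",
        "--p-trunc-len", str(trunc_len),
        "--o-table", f"table-{region}.qza",
        "--o-representative-sequences", f"rep-seqs-{region}.qza",
        "--o-denoising-stats", f"stats-{region}.qza",
    ]
    return [trim, denoise]


def merge_commands(regions):
    """Build the commands that merge the per-set tables and sequences."""
    merge_tables = ["qiime", "feature-table", "merge"]
    merge_seqs = ["qiime", "feature-table", "merge-seqs"]
    for region in regions:
        merge_tables += ["--i-tables", f"table-{region}.qza"]
        merge_seqs += ["--i-data", f"rep-seqs-{region}.qza"]
    merge_tables += ["--o-merged-table", "table-merged.qza"]
    merge_seqs += ["--o-merged-data", "rep-seqs-merged.qza"]
    return [merge_tables, merge_seqs]


# Placeholder primers: NOT real primer sequences, replace with your own.
primers = {"v3": "ACGTACGTACGT", "v4": "TGCATGCATGCA"}

for region, primer in primers.items():
    for cmd in region_commands(region, primer):
        print(shlex.join(cmd))
for cmd in merge_commands(primers):
    print(shlex.join(cmd))
```

One thing to watch: if the same sample IDs end up in more than one table, `feature-table merge` will need an explicit `--p-overlap-method` choice, and whether combining counts across regions is meaningful is a separate question.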

raghid_bsat · June 11, 2019, 9:44am

The data was sequenced with Ion Torrent.
The regions we're targeting are no more than 300 base pairs long.
There are 9 samples, and all of them were sequenced at multiple regions: 9 samples were collected, and each was sequenced with 4-5 primers.
I am using cutadapt trim-single, with trunc left 0 and trunc len 300.
The end goal is to find the bacterial species that are within these samples, and to compare alpha/beta diversity between them (let's say I want to compare the alpha/beta diversity of samples 1-4 vs. samples 5-9).

Hope this clears things up.
Appreciate your help.

Mehrbod_Estaki · June 11, 2019, 7:12pm

Hi @raghid_bsat,
Thanks for the update.
In case you haven't already seen this, there are some special considerations for Ion Torrent data, namely dealing with possible mixed orientation of the reads and special dada2 parameter recommendations. The former has been brought up on the forum before, so you could search there if that's an issue in your case. I'm not familiar with Ion Torrent personally, so excuse me if this is not relevant to your case.
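
As a rough illustration of what "mixed orientation" means in practice, the Python sketch below puts reads into a common orientation by checking which primer a read starts with and reverse-complementing the ones that start with the reverse primer. The primer and read sequences are made-up placeholders, matching is exact (real primers with degenerate bases, or reads with sequencing errors, would need a proper tool), and quality scores are ignored, so treat it as a concept demo rather than something to run on real data.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_read(read, forward_primer, reverse_primer):
    """Return the read in forward orientation.

    Reads starting with the forward primer are returned unchanged, reads
    starting with the reverse primer are reverse-complemented, and reads
    starting with neither primer yield None.
    """
    if read.startswith(forward_primer):
        return read
    if read.startswith(reverse_primer):
        return reverse_complement(read)
    return None


# Placeholder primers and insert: NOT real sequences.
fwd_primer = "AACCGG"
rev_primer = "TTGACA"
amplicon = fwd_primer + "GATTACAGATTACA" + reverse_complement(rev_primer)

reads = [amplicon, reverse_complement(amplicon), "GGGGGGGGGG"]
for read in reads:
    print(read, "->", orient_read(read, fwd_primer, rev_primer))
```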

As for your goal, it sounds like you are looking for the basics of microbiome analysis, which can be done with any one of the hypervariable regions; you don't actually need all of them. The easiest option would be to pick just one region across all the samples and use only that for all your analysis, the most common choice being the V4 region. Later, if you are interested in comparing your results between the different regions, there are some tools in qiime2 that can help with that. For example, a procrustes-plot followed by a Mantel test can tell you if there are overall differences between your regions.
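
For a sense of what a Mantel test does with two regions' distance matrices, here is a small pure-Python sketch: it correlates the pairwise sample distances from one region with those from another, then permutes the sample labels of the second matrix to get a p-value. This is a simplified illustration and not the code behind `qiime diversity mantel`; the two 4x4 matrices are made-up numbers, and the function names are my own.

```python
import random
from itertools import combinations


def upper_triangle(dm, order):
    """Flatten the strict upper triangle of a symmetric distance matrix,
    reading samples in the given order."""
    return [dm[i][j] for i, j in combinations(order, 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mantel(dm1, dm2, permutations=999, seed=0):
    """Two-sided Mantel test; returns (correlation, permutation p-value)."""
    identity = list(range(len(dm1)))
    flat1 = upper_triangle(dm1, identity)
    observed = pearson(flat1, upper_triangle(dm2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel the samples of the second matrix only
        permuted = pearson(flat1, upper_triangle(dm2, order))
        if abs(permuted) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Made-up distances between the same 4 samples, measured on two regions.
dm_region_a = [
    [0.0, 0.2, 0.7, 0.9],
    [0.2, 0.0, 0.6, 0.8],
    [0.7, 0.6, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]
dm_region_b = [
    [0.0, 0.3, 0.6, 0.8],
    [0.3, 0.0, 0.7, 0.9],
    [0.6, 0.7, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]

r, p = mantel(dm_region_a, dm_region_b)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

A high correlation suggests the two regions order your samples similarly; with only a handful of samples the permutation p-value has very coarse resolution, so interpret it cautiously.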
If you're up for some additional work though, a much cooler option in my opinion would be to use SMURF, as I mentioned above, to assemble all the different regions first. In theory, this should be a much more accurate representation of the true composition than each region by itself, so you can use this as your best data in all your analysis. Next, compare each individual region to this ensemble and see which of those regions gives you the closest composition to it. Not only would this be useful for future experiment designs, I think this has the potential to even spin off into a separate small benchmark publication of its own.
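
One simple way to picture the "which region is closest to the ensemble" comparison is to compute a dissimilarity between each region's taxonomic profile and the ensemble profile, then rank the regions. The sketch below uses Bray-Curtis on relative abundances as one possible choice; the taxon names and abundances are invented for illustration, and none of this reflects how SMURF itself reports results.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles.

    Taxa missing from a profile are treated as having abundance 0.
    """
    taxa = set(profile_a) | set(profile_b)
    diff = sum(abs(profile_a.get(t, 0.0) - profile_b.get(t, 0.0)) for t in taxa)
    total = sum(profile_a.get(t, 0.0) + profile_b.get(t, 0.0) for t in taxa)
    return diff / total


def rank_regions(ensemble, region_profiles):
    """Return (dissimilarity, region) pairs, closest to the ensemble first."""
    return sorted(
        (bray_curtis(ensemble, profile), region)
        for region, profile in region_profiles.items()
    )


# Hypothetical relative abundances for a single sample.
ensemble_profile = {"taxon_A": 0.5, "taxon_B": 0.3, "taxon_C": 0.2}
region_profiles = {
    "v3": {"taxon_A": 0.7, "taxon_B": 0.3},
    "v4": {"taxon_A": 0.5, "taxon_B": 0.25, "taxon_C": 0.25},
}

for dissimilarity, region in rank_regions(ensemble_profile, region_profiles):
    print(f"{region}: {dissimilarity:.3f}")
```

In a real comparison you would do this per sample (or on full distance matrices) rather than on a single profile.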
