How to combine results from two different analyses (different read lengths and trimming methods) in one taxonomic bar plot

fgara · May 26, 2020, 2:24pm

Hi everyone,

I have two samples from two different NGS machines: one has a read length of about 220 bp, while the other has a read length of about 300 bp.

How can I analyse them separately and combine the results in one taxonomic bar plot?

Many thanks

jwdebelius · May 26, 2020, 3:08pm

Hi @fgara,

My best suggestion is that you denoise and then trim the sequences to the same length. That's been the best meta-analysis solution in my experience.
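As a rough illustration of what "trim to the same length" means for already-denoised data, here is a minimal Python sketch. It is not QIIME 2 code: the `trim_and_collapse` helper, the toy sequences, and the counts are all hypothetical, and a table is modelled simply as a dict from sequence to count. The idea is that every sequence is truncated to a shared length and the counts of sequences that become identical are summed.

```python
from collections import Counter


def trim_and_collapse(table, length):
    """Truncate every sequence to `length` bases and sum the counts of
    sequences that become identical. Sequences shorter than `length`
    are dropped (hypothetical helper, not part of QIIME 2)."""
    collapsed = Counter()
    for seq, count in table.items():
        if len(seq) < length:
            continue
        collapsed[seq[:length]] += count
    return dict(collapsed)


# Toy stand-ins for two runs with different read lengths (made-up data).
run_a = {"ACGTACGTAC": 10, "ACGTACGTTT": 5}  # 10-base sequences
run_b = {"ACGTACGT": 7, "TTGTACGT": 3}       # 8-base sequences

# Use the shortest sequence across both runs as the common length.
common = min(len(seq) for table in (run_a, run_b) for seq in table)

trimmed_a = trim_and_collapse(run_a, common)
trimmed_b = trim_and_collapse(run_b, common)
```

After trimming, the two sequences in `run_a` that only differed in their last bases collapse into one feature, and that feature is now directly comparable with the matching sequence in `run_b`.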

best,
Justine

fgara · May 27, 2020, 11:32am

Hi @jwdebelius

Thank you for your kind reply!
I've tried it (I trimmed the sequences to 220 bp), but it didn't work for me, because I lost so many reads this way:

Sample-32 and sample-35 are the two samples I want to compare together in one taxonomic bar plot. Sample-35 has read lengths of about 220 bp, while the rest are of 250-300 bp.

Is there a way for me to trim and analyse them separately and only combine the results in one taxonomic bar plot?

Many thanks once again for your kind and generous help

jwdebelius · May 27, 2020, 3:00pm

Hi @fgara,

It looks like your mixed run is failing to merge. (DADA2 doesn't particularly like mixed runs either, just FYI.) You might need to use the forward sequences only to be able to combine them. You may also find Deblur easier, since it trims everything to a fixed sequence length.

Best,
Justine

kmz · May 27, 2020, 3:16pm

You could create the feature tables separately, e.g. two tables in which each sequence has been assigned an OTU label. Then you can merge the two feature tables, because the merge only cares about the OTU labels and not the sequences themselves. Maybe this will work? But if you want to use the entire sequence, I don't think this method will work.

jwdebelius · May 27, 2020, 3:47pm

You can do this, but you tend to still have the length signal, at least in the benchmarking I've seen.

Best,
Justine

fgara · May 28, 2020, 12:14am

Hi @jwdebelius and @kmz,

Thank you both for your kind replies!

Thank you @kmz for your advice on merging feature tables - following your advice, I researched more and found this on the forum:

Also:
https://docs.qiime2.org/2018.8/tutorials/fmt/

First we’ll merge the two FeatureTable[Frequency] artifacts, and then we’ll merge the two FeatureData[Sequence] artifacts. This is possible because the feature ids generated in each run of denoise-single are directly comparable (in this case, the feature id is the md5 hash of the sequence defining the feature).

Wow, creating an md5 hash from the actual sequences is such a brilliant idea!
I will try merging the feature tables.
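To make the md5 idea from the tutorial quote concrete, here is a small Python sketch. It only mimics the concept with the standard library: `feature_id`, `to_id_table`, `merge_tables`, and the toy counts are hypothetical names and data, not the QIIME 2 API. A table is modelled as a dict from sample id to `{feature id: count}`, and merging assumes the two runs contain different samples.

```python
import hashlib


def feature_id(sequence):
    """md5 hex digest of a sequence, used here as the feature id."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


def to_id_table(seq_table):
    """Re-key a {sample: {sequence: count}} table by md5 feature id."""
    return {
        sample: {feature_id(seq): n for seq, n in counts.items()}
        for sample, counts in seq_table.items()
    }


def merge_tables(table_a, table_b):
    """Combine two tables with non-overlapping sample ids (assumption)."""
    overlap = set(table_a) & set(table_b)
    if overlap:
        raise ValueError(f"overlapping sample ids: {sorted(overlap)}")
    return {**table_a, **table_b}


# Made-up counts; the sample ids are borrowed from this thread for illustration.
run_1 = {"sample-32": {"ACGTACGT": 12, "TTTTACGT": 4}}
run_2 = {"sample-35": {"ACGTACGT": 9, "GGGGACGT": 2}}

merged = merge_tables(to_id_table(run_1), to_id_table(run_2))

# Features seen in both samples: only sequences that are identical
# character for character end up with the same id.
shared = set(merged["sample-32"]) & set(merged["sample-35"])
```

Because the id depends on the exact sequence, two runs only share a feature when the sequences match in full, including their length.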

@jwdebelius thank you for sharing your insights - could you kindly explain more about the "length signal" that you mentioned please? Is it a bad thing / will that affect the downstream statistical analysis?

jwdebelius · May 28, 2020, 2:42pm

Hi @fgara,

Technical effects (sequence length, primers, etc.) can sometimes outweigh the biological signal you're looking for. At best, they increase the noise in your data. At worst, they can actually confound the biological signal. One easy way to address this is to trim all your sequences to the same length.

So, I strongly encourage you to think about trimming to the same length.
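One way a length difference can turn into a technical signal can be sketched with toy data. This is only an illustration under stated assumptions, not the benchmarking mentioned earlier in the thread: the sequences are invented, the 20- and 14-base cuts stand in for the ~300 bp and ~220 bp runs, and `jaccard_distance` is a hypothetical helper.

```python
def jaccard_distance(features_a, features_b):
    """1 - |A ∩ B| / |A ∪ B| on presence/absence of features."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Two invented organisms, present in both samples.
organism_1 = "ACGTTGCAAGGCTTAACCGGTT"
organism_2 = "TTGACCGTAAGGCTAGCTAGGA"

# The same organisms read at two different lengths.
long_reads = {organism_1[:20], organism_2[:20]}   # stand-in for the longer run
short_reads = {organism_1[:14], organism_2[:14]}  # stand-in for the shorter run

# Untrimmed: no sequence matches exactly, so the samples look unrelated.
untrimmed = jaccard_distance(long_reads, short_reads)

# Trimmed to the common length: the samples are identical again.
trimmed = jaccard_distance({seq[:14] for seq in long_reads}, short_reads)
```

In this toy case the two samples contain exactly the same organisms, yet without trimming they share no features at all, so any difference between them is purely a read-length artifact.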

Best,
Justine
