DADA2 trunc ITS2

Hello,

I have a quick question regarding truncation parameters for ITS2 (Its4Fun, 5.8s primers). I have looked through the tutorial and the forum, and even searched Google for help, but I am still at a loss. I did try to install ITSx to test it out, but I keep getting an error (I can go into this in more detail if you suggest I should choose this method instead of DADA2).

The problem
I am working with fungal data, using the primers stated above, and after playing around with the DADA2 parameters (mostly the truncation lengths), I cannot seem to get more than 60% of the reads through DADA2.

What I have tried:
I have run about 20 different parameter combinations so far.

  1. If I choose smaller, more stringent truncation values (<185), a high proportion of sequences pass the filtering step (70-80%), but many are then lost at the merging step (I end up with about 35-45%).
    Fungal-Stats-dada2-171-165.qzv (1.2 MB)

  2. Truncation values that are too high give results similar to #3.

  3. The combination that seems to work best is 209-190, but I end up with ~60% passing the filtering step and ~58% overall after merging.
    I have tested various truncating parameters and I cannot manage to get over 60% after merging. Fungal-Stats-dada2-205-190.qzv (1.2 MB)

So I am a little stuck. Is this good enough to proceed? If not, what am I doing wrong, or how can I retain more sequences when processing through DADA2?

Here is my demux file. Fungal-demux-trimmed.qzv (298.6 KB)

In my command, I have also added the “--p-trunc-q” parameter as previously suggested, but whether I set it to 0 or 2, it doesn’t seem to make a difference.

If you could give me some advice, I would really appreciate it.

Hi @Fabs,
ITS sequences are usually very tricky to merge, since the length is hypervariable. I would recommend just using the forward reads as single-end data instead of attempting to merge the paired-end reads, since the small fraction you lose at merging will selectively exclude taxa with longer ITS domains. I believe my thoughts on this are expounded in more detail on the forum, and in the tutorial you linked to.
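To make the length bias concrete, here is a minimal sketch of the overlap arithmetic. It assumes DADA2's default minimum overlap of 12 nt for merging and ignores primer trimming and quality effects; the function names and the 350 nt example amplicon are hypothetical, chosen only for illustration.

```python
def max_mergeable_length(trunc_len_f, trunc_len_r, min_overlap=12):
    """Longest amplicon whose truncated reads still overlap by min_overlap nt."""
    return trunc_len_f + trunc_len_r - min_overlap


def can_merge(amplicon_len, trunc_len_f, trunc_len_r, min_overlap=12):
    """True if an amplicon of this length could merge at these truncation lengths."""
    return amplicon_len <= max_mergeable_length(trunc_len_f, trunc_len_r, min_overlap)


# Truncation pairs mentioned in this thread (forward, reverse).
for f, r in [(171, 165), (209, 190)]:
    limit = max_mergeable_length(f, r)
    # 350 nt is a hypothetical ITS2 amplicon length, not a measured value.
    print(f"{f}/{r}: amplicons longer than {limit} nt cannot merge; "
          f"350 nt merges: {can_merge(350, f, r)}")
```

Any taxon whose ITS2 amplicon exceeds that limit is dropped at merging regardless of read quality, which is the selective loss described above.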

That margin of loss (~2% more lost between filtering and merging) is probably the best I have ever seen! (Though to be fair, I have not done ITS sequencing for a while, since back when read lengths were shorter.)

Analyze the outputs from that parameter combination, and compare to processing the forward reads alone. This will give you a sense of (a) what taxa are being selectively excluded in the merged reads, (b) whether the paired-end reads really give you better taxonomic resolution and (c) whether either of those differences actually matter. Then you can decide which looks best.

Yeah 0 vs. 2 won’t make a big difference. Stick with what you’ve done above.

Good luck!

Thank you Nicholas :)

I will go ahead and try running the analysis with only the forward reads and keep you updated. Could it be that the problem is just this library (these specific samples)? They are from a wildfire (2 weeks after). The only reason I ask is that I have data from a prescribed fire in the same general area, and for that library I was able to select truncation parameters of 247 (forward) and 207 (reverse) and ended up with ~75% merged.

Sure, different taxa could be present, so will have different lengths and different issues with merging. Also different run quality will impact trimming/merging success.

That actually sounds worse; in the current run you are losing ~2-10% sequences at merging (most are lost at the filtering step). Losing 75% at merging is much worse.

Needless to say, all runs that you wish to merge should be processed in the same way.

Good luck!

Sorry, no, I meant that after processing with DADA2, I was able to retain ~75% of the total data. So I lost 15% less than when I ran the library from the wildfire samples.

Using only the forward reads, I get the best results from DADA2 with a truncation length of 150, and I am able to retain ~75% of the sequences. Do you think this is good enough to continue with the analysis, and should I use only the forward reads after all?

I have attached the file below
dada2-single-end-stats-150.qzv (1.2 MB)

Yes I realize that, but the important thing here is not the total yield (75% is high! so is 50%!), but the % lost during merging, since that is what will define how much potential bias exists.
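As a minimal sketch of that distinction, the snippet below separates the loss at each step from the total yield. The step names follow the DADA2 stats output, but the read counts are hypothetical and the helper name `step_losses` is made up for this example.

```python
def step_losses(counts):
    """Given reads remaining after each step (in order), return the
    percentage of the input reads lost at each step."""
    steps = list(counts.items())
    total = steps[0][1]  # first entry is the raw input count
    losses = {}
    for (_, before), (name, after) in zip(steps, steps[1:]):
        losses[name] = 100.0 * (before - after) / total
    return losses


# Hypothetical per-sample counts, not taken from the attached .qzv files.
example = {
    "input": 10000,
    "filtered": 6000,
    "denoised": 5950,
    "merged": 5800,
    "non-chimeric": 5750,
}

for step, pct in step_losses(example).items():
    print(f"{step}: {pct:.1f}% of input lost")
```

In this made-up case most reads are lost at filtering, and only a small share at merging, which is the number to watch when judging merging bias.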

Excellent! Yes, this sounds good.

As I recommended above, you could compare the results from single-end vs. paired-end data to determine if merging biases the data, and if it actually yields better taxonomic classifications. But that is not necessary.

I see, perfect! I will go ahead and do this :)

Thank you very much

After running taxonomy classification on both the single-end and paired-end data, the classifications are actually different. Any suggestions?

Fungal-TJ-Single-Barplot.qzv (1.2 MB)

Fungal-TJ-Barplot.qzv (952.7 KB)

Note the metadata I used to create the samples is very simplified since this is what was given to me for the analysis.

Looks like paired-end gets the same genus-level classifications as single-end, but with paired-end you get more species-level classifications, as you’d expect.

Otherwise the compositions look quite similar. These samples are quite diverse, though, and I am just “eyeballing” it. I think you are probably okay using paired-end, but if you want to assess the differences quantitatively you could (1) adjust the sample IDs in one table, merge the tables together, and then build PCoA plots to see whether samples cluster by sample ID (or at least by the expected groupings) rather than by table, or (2) check out the methods in q2-quality-control for comparing these at genus level.
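For a rough idea of what option (1) involves, here is a minimal pure-Python sketch, not the QIIME 2 workflow itself. It tags sample IDs so the two tables can be combined without collisions, then computes a Bray-Curtis dissimilarity between the single-end and paired-end profiles of the same sample. The sample ID, genus names, and counts are hypothetical, and the tables are assumed to be already collapsed to genus level.

```python
def relative_abundance(counts):
    """Convert raw counts per taxon to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator


def tag_sample_ids(table, suffix):
    """Append a suffix to every sample ID so two tables can be merged safely."""
    return {f"{sample_id}{suffix}": profile for sample_id, profile in table.items()}


# Hypothetical genus-level counts for one sample under each processing route.
single = {"S1": {"Aspergillus": 50, "Penicillium": 50}}
paired = {"S1": {"Aspergillus": 60, "Penicillium": 40}}

merged = {**tag_sample_ids(single, "_single"), **tag_sample_ids(paired, "_paired")}

distance = bray_curtis(
    relative_abundance(merged["S1_single"]),
    relative_abundance(merged["S1_paired"]),
)
print(f"S1 single vs. paired Bray-Curtis: {distance:.3f}")
```

If the same sample processed two ways is consistently closer to itself than to other samples, merging is probably not distorting the community profile much.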