Differences in representative sequences between QIIME1 and QIIME2

I'll try to answer as much as I can as succinctly as possible.

So it seems that the problem is in the filtering/merging/dada2 step.

I know that ITS is variable in length, although the amplicons we got (at least the ones visible on the gel) were more or less of comparable size: a nice band with little smear, and any smear ran longer rather than shorter. In any case, if length variation were the problem, I would expect QIIME 1 to fail at merging as well.

You do not mention if/how reads were joined in QIIME 1.

I used:

multiple_join_paired_ends.py

parameters:

join_paired_ends:min_overlap 10
join_paired_ends:perc_max_diff 15
split_libraries_fastq:phred_offset 33
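
In full, the joining call looked roughly like this (a sketch; the directory names are placeholders, and params.txt holds the three settings listed above):

multiple_join_paired_ends.py \
  -i raw_reads/ \
  -o joined_reads/ \
  -p params.txt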

You do not mention if/how you trimmed amplicons in QIIME 1 or 2, specifically to address the read-through issue.

No trimming on the right side of the reads was done in either case, because merging worked well without it in QIIME 1 and no read-through was expected (also, a selection of the unpaired reads was checked, and they looked like they simply failed due to low quality).

Stats files

datasummary.qzv (285.7 KB)
denoisetable.qzv (310.9 KB)

You have read-through in QIIME 2 but not in QIIME 1 (possibly because you did not join in QIIME 1, so the single-end seqs are not long enough to have this problem), so the non-biological DNA is causing classification and alignment-to-expected issues.

I did the joining step in both QIIME1 and QIIME2. For QIIME1 the summary of the samples was as follows (many more joined reads than in QIIME2):
Summary.txt (590 Bytes)

I am curious, how are you deciding what are expected sequences? Are you sequencing positive controls or is this just based on previous examination of these samples?

Based on previous examination of these samples. Also, if we see specific taxa in the QIIME 1 representative sequences, but nothing even remotely similar in QIIME 2, something is clearly wrong.

Thanks!


I agree, merging should be pretty straightforward, but dada2 and QIIME 1's multiple_join_paired_ends.py probably have different parameters (e.g., dada2 requires a minimum of 20 bp overlap; I am not sure about the mismatch thresholds). This seems like a minor difference, given the lengths you are working with, but something to keep in mind.

Got it! Thanks for the summaries. The sample names look different but based on how I think they match up, indeed it looks like dada2 is dropping lots of sequences, especially from a few samples.

dada2 will output a stats file — could you please share the QZV summary of that file? That will really clarify when dada2 is dropping sequences (e.g., if merging is the issue).
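
(If it helps, a minimal sketch, assuming your stats artifact is named denoising-stats.qza: the stats output of denoise-paired can be tabulated into a QZV with qiime metadata tabulate.)

qiime metadata tabulate \
  --m-input-file denoising-stats.qza \
  --o-visualization denoising-stats.qzv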

All I can say is never trust previous observations or make assumptions about which one is correct. We immediately run into the two-watches problem. Unless you are using a sample with absolutely known composition, you can't cling too strongly to expectations. Alternative hypothesis: many of the sequences that QIIME 1 detects are noisy, error-riddled, chimeric, and otherwise trash. That is exactly the stuff that dada2 and other denoisers are designed to eliminate. But I don't want to make it sound like I am arguing that dada2 is not the problem here, because:

so far it sounds like you have a merging issue with dada2, causing a vast disparity. Let's see that stats file, fix merging, and then we can discuss which watch is telling the right time! :clock1: :confused: :clock10:

dada2 will output a stats file — could you please share the QZV summary of that file? That will really clarify when dada2 is dropping sequences (e.g., if merging is the issue).

Yes, merging seems to be the issue. I should have uploaded the stats file in my previous reply, but there are so many different files in QIIME2 and I'm still figuring out the many options ... Here it is:
denoising-stats.qzv (1.2 MB)

I have no idea how to identify the parameter causing the problem without simply trying to change the parameters and see what happens. Is that the way to go?

All I can say is never trust previous observations or make assumptions about which one is correct. We immediately run into the two-watches problem. Unless you are using a sample with absolutely known composition, you can't cling too strongly to expectations. Alternative hypothesis: many of the sequences that QIIME 1 detects are noisy, error-riddled, chimeric, and otherwise trash. That is exactly the stuff that dada2 and other denoisers are designed to eliminate. But I don't want to make it sound like I am arguing that dada2 is not the problem here, because:

I agree, of course; I'm not saying QIIME 2 should give us the same output as QIIME 1, because then what would be the point of using QIIME 2 in the first place :slight_smile: However, if QIIME 1 produces a representative sequence that is an almost perfect match to a real ITS sequence in GenBank, while in QIIME 2 this sequence is gone, and this happens for several unrelated taxa, it is difficult to imagine that QIIME 1 fabricated these taxa as an artefact (in the sense that an artefact is any error in the perception or representation of information) without the taxa actually being present in the sample. Any fabricated trash sequences would hopefully not match any real taxa in the database very well (?).


Yes, trial and error is one viable solution, particularly if you don't know precisely how much overlap there should be (which can be the case with ITS).
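
If you do go the trial-and-error route, a small sweep saves some typing. A sketch, assuming your demultiplexed reads are in demux.qza (trunc-len 0 disables position-based truncation, so only the quality-based truncation varies):

for q in 2 8 15 20; do
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs demux.qza \
    --p-trunc-len-f 0 \
    --p-trunc-len-r 0 \
    --p-trunc-q $q \
    --o-table table-q$q.qza \
    --o-representative-sequences rep-seqs-q$q.qza \
    --o-denoising-stats denoising-stats-q$q.qza
done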

qiime demux summarize will produce a length-distribution summary that you can view in the "interactive quality plot" tab. However, your summary seems to have been generated with an old version of QIIME 2 (2018.4), which does not have this feature. You could update to the latest release and re-run to get this information.
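
(Something like this, once you have updated; demux.qza stands in for whatever your demultiplexed artifact is called:)

qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux-summary.qzv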

Not a problem! That's why I am here. :smile:

I agree — that is a reasonable test case. Sounds like you have narrowed down the issue — it is definitely a dada2 merging problem.

Let us know if you are able to solve that merging issue!

Yes, trial and error is one viable solution, particularly if you don't know precisely how much overlap there should be (which can be the case with ITS).

We have now tried to optimize the --p-trunc-q parameter. The highest sequence counts were observed at a value of 8 (but in some samples they are still three times lower than in QIIME 1, and we are still losing lots of sequences at the merging step): denoising-stats-8.qzv (1.2 MB)

Below --p-trunc-q 8 the counts drop. This is a bit confusing: usually we use a quality of 20 as the threshold (or is this something completely different from the typical FASTQ quality score?), but I now see that the default value is actually 2 (??) (denoise-paired: Denoise and dereplicate paired-end sequences — QIIME 2 2018.8.0 documentation)
At the default value of 2 our counts fall to 100 to 10,000, which is unusable, of course. However, in this case the sequences do not fail at the merging step, but at the filtering step.
Can you comment on that? Why would a less stringent quality truncation parameter lead to almost all reads failing the filter? The counts of filtered sequences stay more or less the same or increase slightly when dropping --p-trunc-q from 20 to 15, 10, and 8, but between 8 and 6 something happens and suddenly almost all sequences get filtered out.
This is at --p-trunc-q 2:
denoising-stats-2.qzv (1.2 MB)

We also increased the --p-max-ee from 2 (default) to 10, but this again decreased the sequence counts.

Yes, the level of merging loss is still unacceptably high.

--p-trunc-q uses FASTQ quality scores, so you are correct about that. But the --p-max-ee parameter you see is different: it decides whether to drop reads that are too noisy, based on the expected number of errors. So if you set --p-trunc-q too low (or do not trim at all), you wind up losing more sequences, because more noisy bases are retained in the reads prior to filtering, causing those reads to be tossed during the filtering step.

Let's imagine we have a sequence with high quality (-) and low-quality (*) bases:

---------------------------------------*-------*-----**************

If we read that whole read in, it will be thrown out by dada2 because it has too many low-quality bases. Now let's try trimming the read (and imagine this is manual trimming based on average quality profiles of all your sequences):

---------------------------------------*----

Okay, that will pass because now it only contains one probable error. But it might not merge, since it is much shorter!

Let's try a third way, with --p-trunc-q. If we set that parameter to 8, you might wind up with the scenario above (clean enough to pass filter but too short to merge) or you may wind up with something like this:

---------------------------------------*-------*-----

That contains > 2 erroneous bases, so will not pass.

QIIME 1 used a quality-filtering algorithm not unlike --p-trunc-q, but with some additional features for controlling trimming a bit more (QIIME 2 retains access to this method in the q2-quality-filter plugin). But QIIME 1 also did not have such a persnickety denoising method (well, it had none; OTU clustering is a sort of rough denoising method and is not squeamish about clustering noisy sequences together with good ones... dada2's defaults scrub out the very noisy reads before attempting to denoise the somewhat noisy ones).

So for dada2 there is a tough balance to strike here:

  1. you need your sequences to contain few erroneous bases so that they pass the filter (use --p-trunc-q with a sufficiently high value, or manual trimming, to achieve this)
    BUT
  2. you need the sequences to be long enough so that they merge successfully.

Another option is to loosen up the dada2 parameters a bit, e.g., set --p-max-ee higher so that fewer reads will be tossed out.
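
Concretely, striking that balance might look like this (a sketch only; the truncation positions are placeholders you would read off your quality plots, and max-ee is loosened from its default of 2):

# trunc-len-f + trunc-len-r must exceed the ~460 bp amplicon length plus the
# required overlap, while still cutting before the 3' quality crash
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 280 \
  --p-trunc-len-r 220 \
  --p-trunc-q 8 \
  --p-max-ee 5 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza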

Aha, so you already tested that — could you share the dada2 stats?

You could also try deblur instead of dada2, and see this tutorial for steps to join, quality filter, and then deblur. This will use more qiime1-esque read joining and pre-filtering, but then use deblur for denoising instead of qiime1-style OTU clustering methods (which you can still use in QIIME 2, by the way — see here).
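
Roughly, that pipeline looks like this (a sketch; for ITS you would use denoise-other with your UNITE reference, and the trim length is a placeholder):

# 1. join reads (qiime1-style)
qiime vsearch join-pairs \
  --i-demultiplexed-seqs demux.qza \
  --o-joined-sequences demux-joined.qza

# 2. qiime1-style quality filtering of the joined reads
qiime quality-filter q-score-joined \
  --i-demux demux-joined.qza \
  --o-filtered-sequences demux-joined-filtered.qza \
  --o-filter-stats demux-joined-filter-stats.qza

# 3. denoise with deblur against a user-supplied reference
qiime deblur denoise-other \
  --i-demultiplexed-seqs demux-joined-filtered.qza \
  --i-reference-seqs reference-seqs.qza \
  --p-trim-length 375 \
  --p-sample-stats \
  --o-representative-sequences deblur-rep-seqs.qza \
  --o-table deblur-table.qza \
  --o-stats deblur-stats.qza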

Last resort is just to use the forward reads as if you had single-end data.
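
In that case the command is simply (the truncation length is a placeholder you would pick from the forward-read quality profile):

qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 250 \
  --o-table table-se.qza \
  --o-representative-sequences rep-seqs-se.qza \
  --o-denoising-stats denoising-stats-se.qza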

In summary, this appears to be trouble with dada2 specifically, and the multiple filters that it performs. Your reads are moderately noisy (which is fine in itself), but this is causing trouble with dada2. You may want to try other denoising or OTU clustering methods in QIIME 2, which may simply be more compatible with your data.


Thank you very much, this actually makes the whole thing much clearer.

I'll be able to test all the helpful suggestions for the alternative merging steps tomorrow and I'll report on the results then.

For now, just the test of --p-max-ee that you asked about: increasing the value to 10 (not 10.0, but I presume that wouldn't make a difference?) does not change much in the better-performing samples, while in the worst-performing samples it actually decreases the number of successfully merged sequences:
denoising-stats-8.qzv (1.2 MB)
denoising-stats-ee10-8.qzv (1.2 MB)


Very interesting. It does do what it should — effectively zero sequences are filtered prior to denoising — but this seems to lead to fewer merges, which is unexpected. I can't really think of a reason for that, as it seems that this just causes an additional small number to pass the pre-filter, and presumably the other reads are unaffected. Perhaps the additional noisy sequences are disrupting the error model?

A report on progress (and the whole thing gets weirder, at least to my eyes). I tested three things:

  1. Trimming the 3' primer sequences with cutadapt, in case there was a significant amount of read-through in the dataset (sketched after this list).
    Result: No change in the dada2 paired-end results (compared to what was discussed earlier in the thread). No change in deblur either (point 2 below).

  2. The deblur workflow.

You could also try deblur instead of dada2, and see this tutorial for steps to join, quality filter, and then deblur. This will use more qiime1-esque read joining and pre-filtering, but then use deblur for denoising instead of qiime1-style OTU clustering methods (which you can still use in QIIME 2, by the way — see here).

Result:
Lots of joined sequences demux-joined.qzv (292.6 KB)
Nothing removed in the filtering step demux-joined-filter-stats.qzv (1.2 MB)
Only 35 :man_facepalming: sequences left after deblurring deblur-table.qzv (315.4 KB)
The statistics file deblur-stats.qzv (205.6 KB)

I used --p-trim-length 375 in qiime deblur denoise-other, and for --i-reference-seqs I used the dynamic UNITE database, imported with:
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path $DBPATH/sh_refs_qiime_ver7_dynamic_s_01.12.2017.fasta \
  --output-path $DBPATH/dynamic_otus.qza

  3. Single-end workflow on the forward reads with dada2.
    Result: Better, but up to two thirds of the sequences are still removed, in this case at the step between the denoised and non-chimeric data: denoising-stats.qzv (1.2 MB)
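
For reference, the read-through trimming in point 1 was along these lines (a sketch; the primer sequences shown are placeholders, since in practice the reverse complement of the reverse primer is trimmed from R1 and the reverse complement of the forward primer from R2):

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-adapter-f GCATATCAATAAGCGGAGGA \
  --p-adapter-r TCCTCCGCTTATTGATATGC \
  --o-trimmed-sequences demux-trimmed.qza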

Does this mean there is a deeper problem with the input data I'm somehow missing?

The issue with deblur appears to be that many of the sequences are unique and/or do not resemble the reference sequences you input (e.g., are non-fungal). See the fraction-artifact-with-minsize column in the stats visualization. You can adjust the min-size parameter to correct for this, if unique seqs are to blame (as I suspect) — however the deblur developers have warned against this, e.g., see here (that user has a similar issue with deblur).
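
If you want to test that hypothesis, it is a one-parameter change (with the caveat above that the deblur developers discourage it; min-size 1 retains singleton sequences):

qiime deblur denoise-other \
  --i-demultiplexed-seqs demux-joined-filtered.qza \
  --i-reference-seqs dynamic_otus.qza \
  --p-trim-length 375 \
  --p-min-size 1 \
  --p-sample-stats \
  --o-representative-sequences deblur-rep-seqs.qza \
  --o-table deblur-table.qza \
  --o-stats deblur-stats.qza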

Aha, yes that could be the issue — if these sequences are being filtered out as chimera on single-end, they are probably noisy reads that have issues passing filter/merging with the paired-end. QIIME 1 does not use chimera filtering by default so that could explain part of the discrepancy.

Personally, I think I would proceed with the single-end data. It is probably better to have slightly shorter reads rather than proceed with longer joined reads but potentially introduce an amplicon length bias (which is essentially what is occurring, and clearly impacting some samples more than others).

But I recognize that is not satisfying when QIIME 1 seems to yield better results (though the possibility that chimeric seqs are passing through and masquerading as real data may taint that assumption, depending on whether you used a chimera filter with the QIIME 1 results). You could also use q2-vsearch to perform OTU clustering and see if that performs better for your data. I linked to the OTU clustering tutorial above. Your workflow would look like this:

  1. use q2-quality-filter to trim/filter sequences
  2. use vsearch dereplicate to dereplicate seqs
  3. use q2-vsearch to cluster
  4. use q2-vsearch to filter chimera
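
In commands, that might look like the following (a sketch, assuming joined demultiplexed reads in demux-joined.qza; 97% identity is just the conventional threshold):

# 1. qiime1-style quality filtering
qiime quality-filter q-score-joined \
  --i-demux demux-joined.qza \
  --o-filtered-sequences demux-filtered.qza \
  --o-filter-stats filter-stats.qza

# 2. dereplicate
qiime vsearch dereplicate-sequences \
  --i-sequences demux-filtered.qza \
  --o-dereplicated-table derep-table.qza \
  --o-dereplicated-sequences derep-seqs.qza

# 3. de novo OTU clustering at 97% identity
qiime vsearch cluster-features-de-novo \
  --i-table derep-table.qza \
  --i-sequences derep-seqs.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table table-97.qza \
  --o-clustered-sequences seqs-97.qza

# 4. flag chimeras de novo, then drop them from the table
qiime vsearch uchime-denovo \
  --i-table table-97.qza \
  --i-sequences seqs-97.qza \
  --o-chimeras chimeras.qza \
  --o-nonchimeras nonchimeras.qza \
  --o-stats uchime-stats.qza

qiime feature-table filter-features \
  --i-table table-97.qza \
  --m-metadata-file chimeras.qza \
  --p-exclude-ids \
  --o-filtered-table table-97-nonchimeric.qza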

The chimera filter seems to suggest that it is a problem with the data — the sort of problem that can be fixed. Use single-end or try again with q2-vsearch and compare against (chimera-filtered) QIIME 1 results to see how they square up. Let us know what you find!

Thanks!

I went back through my logs and I did actually attempt the chimera-filtering step in QIIME 1, but it removed no sequences.

So, reading about a similar problem (Many chimeric reads after dada2, but only in some samples), I understand that all of the above means that a large portion of the chimeric sequences is "real" (i.e. arising in the amplicon preparation step, possibly due to the low amount of template DNA we were working with) and not a problem with the sequencing or analysis. In other words, it is safe to assume that the sequences being removed should be removed, and that this improves the result rather than introducing bias.

In any case most samples look much better now, so single-end seems to be the way to go. Am I right?


I agree, I think that is probably the best course of action and the correct interpretation of these results.

The read yields are looking better for those samples that had read joining issues with dada2 denoise-paired, so yes I think this is the safer route (since you are not systematically biasing samples that have longer amplicons). But you have not mentioned anything about how the results look, e.g., regarding the false positives/negatives that you mentioned seeing with dada2 denoise-paired at the start of this topic thread. I would be curious to hear what you find!

True, I forgot to mention that. In the single-end data the sequences are there, as expected. As a bonus, the number of slightly different representative sequences almost identical to the same taxon is much lower than in QIIME 1, so the sequence list looks less inflated (as expected from QIIME 2).

In short, the single-end dada2 workflow produces the result that is, as far as I can tell, the most credible given the biological background of the samples, throws away the fewest sequences, and (as you say) avoids introducing an amplicon length bias.


Finally, many thanks for all the help in this long discussion!
And as a reference for anyone who might have a similar problem, a short summary of conclusions:

  1. Fungal ITS 460 bp amplicon, paired-end 2×300 bp Illumina sequencing.
  2. In some samples over 90% reads failed to merge with paired-end dada2.
  3. The QIIME 1 results looked noisy/inflated, but they also contained sequences almost perfectly matching known taxa, and these sequences disappeared with paired-end dada2.
  4. Deblur workflow performed even worse than paired-end dada2.
  5. Single-end dada2 (on just the forward reads, after removing the primers with qiime cutadapt trim-single, as sketched below) performed best.
  6. In some samples, even with single-end dada2, many sequences (up to two thirds) were removed as chimeras. This is most likely not a problem with the sequencing or analysis, but an indication of real chimeric sequences in the data (i.e. errors introduced during amplicon preparation, possibly due to the very low amount of template in some samples).
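
For completeness, the primer removal in point 5 was along these lines (the primer sequence is a placeholder for the actual forward primer):

qiime cutadapt trim-single \
  --i-demultiplexed-sequences demux-single.qza \
  --p-front GCATCGATGAAGAACGCAGC \
  --o-trimmed-sequences demux-single-trimmed.qza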
