Differences in representative sequences between QIIME1 and QIIME2

Hi,

I was wondering if you could offer any helpful explanation for the differences we are seeing between the representative sequence outputs of QIIME1 and QIIME2.

We are using the standard workflow described on the QIIME website for the analysis of fungal ITS data (Illumina). Here are the results we get with the same dataset:
In QIIME1 using a 97% cutoff we get 1570 representative sequences.
In QIIME2 we get 132 representative sequences.

Now, I know that the clustering algorithms are completely different between the two versions of QIIME, and that QIIME1 inflated the number of OTUs. That's fine and explains the different numbers of rep seqs.

The problem we have with this output is that we seem to be losing whole taxa in QIIME2. For example, if I BLAST the ITS of a specific fungal species against the representative sequences, the matches are as follows:

For QIIME1:
Hit e-value percent_identity
New.CleanUp.ReferenceOTU729 0.0 98.574
New.ReferenceOTU91 0.0 95.382
New.CleanUp.ReferenceOTU2422 0.0 92.323
New.CleanUp.ReferenceOTU994 0.0 91.242
New.CleanUp.ReferenceOTU2364 1.95e-166 95.616
All these sequences, when blasted against the NR GenBank database, are matched to the species that was searched for, or relatives from the same genus.

For QIIME2:
Hit e-value percent_identity
760c2f565f24f093c8d8b433d5946594 1.78e-62 94.040
760c2f565f24f093c8d8b433d5946594 1.90e-17 95.918
3ec608156d46f6e10b772b0f5142f6a4 6.46e-57 90.123
3ec608156d46f6e10b772b0f5142f6a4 8.84e-16 93.878
a0b877aa80232376ae6e0cd4b86e69bb 6.46e-57 90.123
The e-values here are evidently much poorer (higher). The hits, when searched against GenBank, match species that are not only different from the one used in the initial query (against the representative sequences), but belong to a different fungal subphylum.

The taxonomy output of QIIME2 corresponds to this: taxa that were clearly present in QIIME1 are completely gone in QIIME2.

Are we making some obvious mistake here?

Hi @cdeai,

Could you provide a little more information? Are you using dada2, deblur, or OTU picking (with q2-vsearch) in QIIME 2? What taxonomy classification methods are you using in each? What quality control/trimming methods?

What OTU picking method are you using in QIIME 1 (e.g., open or closed ref or de novo)?

The sequence counts seem pretty sensible and consistent with other comparisons we have made. Denoising methods in QIIME 2 do a good job of removing noisy low-abundance sequence variants that are treated as true signal by OTU picking methods.

The loss of expected taxa, however, seems more like something is actually going wrong.

  1. Are these paired-end sequences? Are you merging these both in QIIME 1 and QIIME 2? If so, I suspect something is going wrong with the merge and possibly with the quality trimming/filtering parameters.
  2. Are your sequences in mixed orientations? The classify-sklearn method in QIIME 2 does not handle these well — they can confuse the classifier, leading to some low-quality classifications. The classify-consensus-vsearch method will be much more robust (see the sketch after this list).
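
For reference, a minimal classify-consensus-vsearch sketch; the query and reference artifact names (rep-seqs.qza, ref-seqs.qza, ref-taxonomy.qza) are placeholders for your own artifacts:

# Consensus taxonomy classification with vsearch (artifact names are placeholders)
qiime feature-classifier classify-consensus-vsearch \
  --i-query rep-seqs.qza \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classification taxonomy.qza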

Please investigate both of those possible issues and let us know what you find!

Thanks for the answer, I'll try to reply as best as I can.

OTU picking

QIIME1

pick_open_reference_otus.py

QIIME2

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trim-left-f 12 \
  --p-trim-left-r 8 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-trunc-q 20 \
  --o-table denoisetable.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza \
  --p-n-threads 8
The amplicon length is 460 bp, and the sequencing was done on Illumina at 2x300 bp.

Taxonomy classification is done by:

qiime feature-classifier classify-sklearn \
  --i-classifier $DBPATH/dynamic_classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

However, the problem seems to arise before the classification step, because the rep-seqs file already does not contain the expected sequences.

Other questions

Are these paired-end sequences? Are you merging these both in QIIME 1 and QIIME 2?

Yes and yes.
Indeed, while for some samples the number of joined and filtered sequences is comparable between the two QIIME versions (±5%), for other samples QIIME2 outputs an order of magnitude fewer joined sequences. So this might well be the source of the problem; I have no idea how I did not notice it sooner, sorry.
Are we using the wrong trimming/filtering parameters?
Why would we see differences between QIIME1 and QIIME2 only for some samples, since all were amplified with the same primers and sequenced on the same chip?

Are your sequences in mixed orientations?
No, they are not mixed, I re-checked just in case.

Hi @cdeai,
Thank you for the details.

Open-ref OTU picking vs. dada2 is comparing apples to oranges, and this certainly explains the differences you observe in sequence counts and alpha diversity.

I will note that ITS is hypervariable, so 460 bp may be the average length you are expecting, but many amplicons will be longer or shorter than this. This also makes read-through a big issue with ITS (i.e., if read lengths are longer than amplicons, the sequencer will read past the amplicon and into the adapter sequence, adding non-biological bases to your reads).

A few things on that note:

  1. You do not mention if/how reads were joined in QIIME 1.
  2. You must inspect your dada2 stats file (please also share it here as a QZV, along with the output of demux summarize, if you have additional questions — convert it to a QZV with qiime metadata tabulate).
  3. You do not mention if/how you trimmed amplicons in QIIME 1 or 2, specifically to address the read-through issue. You should use either q2-itsxpress or q2-cutadapt for this purpose (note the two are not equivalent, but each addresses read-through in its own way). Example commands for points 2 and 3 follow this list.
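
A minimal sketch of both steps; the cutadapt adapter values are placeholders that you would replace with the reverse complements of your own primers:

# Convert the dada2 stats artifact into a viewable QZV
qiime metadata tabulate \
  --m-input-file denoising-stats.qza \
  --o-visualization denoising-stats.qzv

# Trim 3' read-through with q2-cutadapt (adapter sequences are placeholders)
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences paired-end-demux.qza \
  --p-adapter-f <REVCOMP_OF_REVERSE_PRIMER> \
  --p-adapter-r <REVCOMP_OF_FORWARD_PRIMER> \
  --o-trimmed-sequences trimmed-demux.qza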

I see a few possibilities:

  1. dada2 read joining is failing for many of your sequences, causing expected sequences to drop out. You need to inspect the stats summary to make sure you are not losing too many sequences, especially at merging.
  2. you have read-through in QIIME 2 but not QIIME 1 (possibly because you did not join in QIIME 1? so the single-end seqs are not long enough to have this problem), so the non-biological DNA is causing classification and alignment-to-expected issues.

aha! My worries exactly.

Not a problem! This kind of issue is usually cryptic. :smile:

We will need to see the QZVs mentioned above to decide.

ITS is hypervariable, so different species (and even strains of the same species) have different lengths. So unless all samples contain exactly the same species, it is expected that systematic length biases will skew some samples selectively.

I agree, this sounds like a denoising/OTU picking problem, not taxonomy classification.

I am curious: how are you deciding which sequences are expected? Are you sequencing positive controls, or is this just based on previous examination of these samples?

I'll try to answer as much as I can as succinctly as possible.

So it seems that the problem is in the filtering/merging/dada2 step.

I know that ITS is variable in length (although the amplicons we got, at least those visible on the gel, were more or less of comparable size: a nice band with little smear, and what smear there was ran longer, not shorter). In any case, if this were the problem, I would expect QIIME1 to fail at merging as well.

You do not mention if/how reads were joined in QIIME 1.

I used:

multiple_join_paired_ends.py

parameters:

join_paired_ends:min_overlap 10
join_paired_ends:perc_max_diff 15
split_libraries_fastq:phred_offset 33
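
For context, a sketch of how these parameters are passed (assuming the three lines above are saved in a parameters file named params.txt and the raw per-sample fastq files live in a directory named raw_reads/; both names are placeholders):

# QIIME 1 joining across multiple samples, driven by a parameters file
multiple_join_paired_ends.py \
  -i raw_reads/ \
  -o joined_reads/ \
  -p params.txt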

you do not mention if/how you trimmed amplicons in QIIME 1 or 2, specifically to address the read-through issue

No trimming on the right side of the reads was done in either case, because the merging worked well without it in QIIME1 and no read-through was expected (a selection of the un-paired reads was also checked, and they looked like they simply failed due to low quality).

Stats files

datasummary.qzv (285.7 KB)
denoisetable.qzv (310.9 KB)

you have read-through in QIIME 2 but not QIIME 1 (possibly because you did not join in QIIME 1? so the single-end seqs are not long enough to have this problem), so the non-biological DNA is causing classification and alignment-to-expected issues.

I did the joining step in both QIIME1 and QIIME2. For QIIME1 the summary of the samples was as follows (many more joined reads than in QIIME2):
Summary.txt (590 Bytes)

I am curious, how are you deciding what are expected sequences? Are you sequencing positive controls or is this just based on previous examination of these samples?

Based on previous examination of these samples. Also, if we see specific taxa among the QIIME1 representative sequences, but nothing even remotely similar in QIIME2, something is clearly wrong.

Thanks!


I agree, merging should be pretty straightforward, but dada2 and QIIME 1's multiple_join_paired_ends.py probably have different parameters (e.g., dada2 requires a minimum of 20 bp overlap; I am not sure about the mismatch thresholds). This seems like a minor difference, given the lengths you are working with, but something to keep in mind.

Got it! Thanks for the summaries. The sample names look different but based on how I think they match up, indeed it looks like dada2 is dropping lots of sequences, especially from a few samples.

dada2 will output a stats file — could you please share the QZV summary of that file? That will really clarify when dada2 is dropping sequences (e.g., if merging is the issue).

All I can say is: never trust previous observations or make assumptions about which one is correct. We immediately run into the two-watches problem. Unless you are using a sample with absolutely known composition, you can't cling too strongly to expectations. Alternative hypothesis: many of the sequences that qiime1 detects are noisy, error-riddled, chimeric, and otherwise trash. That is exactly the kind of junk that dada2 and other denoisers are designed to eliminate. But I don't want to make it sound like I am arguing that dada2 is not the problem here, because:

so far it sounds like you have a merging issue with dada2, causing a vast disparity. Let's see that stats file, fix merging, and then we can discuss which watch is telling the right time! :clock1: :confused: :clock10:

dada2 will output a stats file — could you please share the QZV summary of that file? That will really clarify when dada2 is dropping sequences (e.g., if merging is the issue).

Yes, merging seems to be the issue. I should have uploaded the stats file in my previous reply, but there are so many different files in QIIME2 and I'm still figuring out the many options ... Here it is:
denoising-stats.qzv (1.2 MB)

I have no idea how to identify the parameter causing the problem without simply trying to change the parameters and see what happens. Is that the way to go?

All I can say is: never trust previous observations or make assumptions about which one is correct. We immediately run into the two-watches problem. Unless you are using a sample with absolutely known composition, you can't cling too strongly to expectations. Alternative hypothesis: many of the sequences that qiime1 detects are noisy, error-riddled, chimeric, and otherwise trash. That is exactly the kind of junk that dada2 and other denoisers are designed to eliminate. But I don't want to make it sound like I am arguing that dada2 is not the problem here, because:

I agree, of course; I'm not saying QIIME2 should give us the same output as QIIME1, because what would then be the point of using QIIME2 in the first place? :slight_smile: However, if QIIME1 produces a representative sequence that is an almost perfect match to a real ITS sequence in GenBank, while in QIIME2 this sequence is gone, and this happens for several unrelated taxa, it is difficult to imagine that these taxa were fabricated in QIIME1 as an artefact (in the "an artefact is any error in the perception or representation of information" meaning of the word), without the taxon actually being present in the sample. Any such fabricated trash sequences would hopefully not match any real taxa in the database very well (?).


Yes, trial-and-error is one viable solution, particularly if you don't know precisely how much overlap there should be (which can be the case with ITS).

qiime demux summarize will produce a length distribution summary that you can view in the "interactive quality plot" tab — however, your summary seems to have been generated using an old version of :qiime2: (2018.4), and does not have this feature. You could update to the latest release and re-run to get this information.
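
For example (a minimal sketch, assuming your demultiplexed artifact is named paired-end-demux.qza, as above):

# Summarize demultiplexed reads; the QZV includes per-position quality plots
qiime demux summarize \
  --i-data paired-end-demux.qza \
  --o-visualization paired-end-demux.qzv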

Not a problem! That's why I am here. :smile:

I agree — that is a reasonable test case. Sounds like you have narrowed down the issue — it is definitely a dada2 merging problem.

Let us know if you are able to solve that merging issue!

Yes, trial-and-error is one viable solution, particularly if you don't know precisely how much overlap there should be (which can be the case with ITS).

We tried to optimize the --p-trunc-q parameter. The highest sequence counts were observed at a value of 8 (but still three times lower in some samples than in QIIME1, and we are still losing lots of sequences at the merging step): denoising-stats-8.qzv (1.2 MB)

Below --p-trunc-q 8 the counts drop. This is a bit confusing: we usually use a quality of 20 as the threshold (or is this something completely different from the typical FASTQ quality score?), but I now see that the default value is actually 2 (??): denoise-paired: Denoise and dereplicate paired-end sequences — QIIME 2 2018.8.0 documentation
At the default value of 2 our counts fall to 100-10,000, which is of course unusable. However, in this case the sequences do not fail at the merging step, but at the filtering step.
Can you comment on that? Why would a less stringent quality truncation parameter lead to almost all reads failing the filtering? The counts of filtered sequences stay more or less the same or increase slightly when dropping --p-trunc-q from 20 to 15, 10, and 8, but between 8 and 6 something happens and suddenly almost all sequences get filtered out.
This is at --p-trunc-q 2:
denoising-stats-2.qzv (1.2 MB)

We also increased --p-max-ee from the default of 2 to 10, but this again decreased the sequence counts.

Yes, the level of merging loss is still unacceptably high.

--p-trunc-q uses FASTQ quality scores, so you are correct about that. But the --p-max-ee parameter is different: it decides whether to drop reads that are too noisy, based on the number of likely errors. So if you set --p-trunc-q too low (or do not trim at all), you wind up losing more sequences, because more noisy bases are retained in the reads prior to filtering, causing them to be tossed during the filtering step.

Let's imagine we have a sequence with high-quality (-) and low-quality (*) bases:

---------------------------------------*-------*-----**************

If we read that whole read in, it will be thrown out by dada2 because it has too many low-quality bases. Now let's try trimming the read (and imagine this is manual trimming based on average quality profiles of all your sequences):

---------------------------------------*----

Okay, that will pass because now it only contains one probable error. But it might not merge, since it is much shorter!

Let's try a third way, with --p-trunc-q. If we set that parameter to 8, you might wind up with the scenario above (clean enough to pass filter but too short to merge) or you may wind up with something like this:

---------------------------------------*-------*-----

That contains > 2 erroneous bases, so it will not pass.

QIIME 1 used a quality-filtering algorithm not unlike --p-trunc-q, but with some additional features for controlling trimming a bit more (QIIME 2 retains access to this method in the q2-quality-filter plugin). But QIIME 1 also did not have such a persnickety denoising method (well, it had none — OTU clustering is a sort of rough denoising method and is not squeamish about clustering noisy sequences together with good ones... dada2's defaults scrub out the very noisy sequences before attempting to denoise the moderately noisy ones).

So for dada2 there is a tough balance to strike here:

  1. you need your sequences to contain few erroneous bases so that they pass the filter (use --p-trunc-q with a sufficiently high value, or manual trimming, to achieve this)
    BUT
  2. you need the sequences to be long enough so that they merge successfully.

Another option is to loosen the dada2 parameters a bit, e.g., set --p-max-ee higher so that fewer reads are tossed out.
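
For example, a sketch with a relaxed expected-error threshold; the parameter values here are illustrative only, not recommendations:

# dada2 with a looser expected-error filter (--p-max-ee 5 is an illustrative value)
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trim-left-f 12 \
  --p-trim-left-r 8 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-trunc-q 8 \
  --p-max-ee 5 \
  --o-table denoisetable.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza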

Aha, so you already tested that — could you share the dada2 stats?

You could also try deblur instead of dada2, and see this tutorial for steps to join, quality filter, and then deblur. This will use more qiime1-esque read joining and pre-filtering, but then use deblur for denoising instead of qiime1-style OTU clustering methods (which you can still use in QIIME 2, by the way — see here).
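
In outline, that tutorial's steps look something like this (a sketch; the output file names are illustrative, the reference sequences are a placeholder for whatever database you use, and the trim length should be chosen from your joined-read length distribution):

# 1. Join reads with vsearch
qiime vsearch join-pairs \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --o-joined-sequences demux-joined.qza

# 2. QIIME 1-style quality filtering of the joined reads
qiime quality-filter q-score-joined \
  --i-demux demux-joined.qza \
  --o-filtered-sequences demux-joined-filtered.qza \
  --o-filter-stats demux-joined-filter-stats.qza

# 3. Denoise with deblur (trim length and reference are placeholders)
qiime deblur denoise-other \
  --i-demultiplexed-seqs demux-joined-filtered.qza \
  --i-reference-seqs reference-seqs.qza \
  --p-trim-length 375 \
  --p-sample-stats \
  --o-representative-sequences deblur-rep-seqs.qza \
  --o-table deblur-table.qza \
  --o-stats deblur-stats.qza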

A last resort is to just use the forward reads as if you had single-end data.
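
For instance (a sketch; denoise-single uses only the forward reads from a paired-end artifact, and the truncation length is a placeholder you would pick from your quality plots):

# Single-end dada2 on forward reads only (output names and trunc-len are placeholders)
qiime dada2 denoise-single \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trim-left 12 \
  --p-trunc-len 250 \
  --o-table table-se.qza \
  --o-representative-sequences rep-seqs-se.qza \
  --o-denoising-stats denoising-stats-se.qza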

In summary, this appears to be trouble with dada2 specifically, and with the multiple filters that it performs. Your reads are moderately noisy — which is fine — but this is causing trouble for dada2. You may want to try other denoising or OTU clustering methods in QIIME 2, which may simply be more compatible with your data.


Thank you very much, this actually makes the whole thing much clearer.

I'll be able to test all the helpful suggestions for alternative merging steps tomorrow, and I'll report on the results then.

For now, just the test of --p-max-ee you asked about: increasing the value to 10 (not 10.0, but I presume that wouldn't make a difference?) does not change much in the better-performing samples, while in the worst-performing samples it actually decreases the number of successfully merged sequences:
denoising-stats-8.qzv (1.2 MB)
denoising-stats-ee10-8.qzv (1.2 MB)


Very interesting. It does do what it should — effectively zero sequences are filtered prior to denoising — but this seems to lead to fewer merges, which is unexpected. I can't really think of a reason for that: it seems this should just allow a small additional number of reads to pass the pre-filter, leaving the other reads unaffected. Perhaps the additional noisy sequences are disrupting the error model?

A report on progress (and the whole thing gets weirder, at least to my eyes). I tested three things:

  1. Trimming the 3' primer sequences with cutadapt, in case there was a significant amount of read-through in the dataset.
    Result: No change in the dada2 paired-end results (compared to what was discussed above). No change in deblur either (point 2 below).

  2. The deblur workflow.

You could also try deblur instead of dada2, and see this tutorial for steps to join, quality filter, and then deblur. This will use more qiime1-esque read joining and pre-filtering, but then use deblur for denoising instead of qiime1-style OTU clustering methods (which you can still use in QIIME 2, by the way — see here).

Result:
Lots of joined sequences: demux-joined.qzv (292.6 KB)
Nothing removed in the filtering step: demux-joined-filter-stats.qzv (1.2 MB)
Only 35 :man_facepalming: sequences left after deblurring: deblur-table.qzv (315.4 KB)
The statistics file: deblur-stats.qzv (205.6 KB)

I used --p-trim-length 375 in qiime deblur denoise-other, and for --i-reference-seqs I used the dynamic UNITE database, imported with:

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path $DBPATH/sh_refs_qiime_ver7_dynamic_s_01.12.2017.fasta \
  --output-path $DBPATH/dynamic_otus.qza

  3. Single-end workflow on the forward reads with dada2.
    Result: Better, but up to two thirds of the sequences are still removed, in this case at the step between the denoised and non-chimeric data: denoising-stats.qzv (1.2 MB)

Does this mean there is a deeper problem with the input data that I'm somehow missing?