Analyzing single-end and paired-end reads together in QIIME 2

Hello everyone,
I am trying to compare the results of HMP data with my current microbiome dataset. The HMP data is single-end, whereas my data is paired-end. So I first demultiplexed the files, constructed the feature tables with DADA2 for HMP and TMB separately, and then merged the feature tables and representative sequences. The issue is that, after merging, the taxa bar plot looks something like this:

I only see taxonomy for the HMP samples; my project's samples go unannotated. Can anyone help me understand the issue?


Hi @Shivani2211!

Unfortunate taxonomic results aside, I think the first problem is actually with the merging step. Your paired-end data probably has more taxonomic resolution than the single-end HMP data, and the ASVs that DADA2 picks for each study are going to be different, if only by virtue of their length. That means your merged feature table can never actually share features between HMP and your study, making downstream analysis a little bit futile.
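A toy illustration of why nothing lines up: the Python sketch below uses a made-up 55-bp sequence (not real HMP or study data) and a hypothetical `feature_id` helper that hashes the exact sequence with MD5. I believe that is how QIIME 2's DADA2 plugin names features by default, but the point holds for any ID derived from the exact sequence: a shorter ASV from the same amplicon is simply a different feature.

```python
import hashlib


def feature_id(seq: str) -> str:
    """Hypothetical helper: feature ID as the MD5 hex digest of the exact sequence."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()


# Toy amplicon (made up); the "single-end" ASV is a shorter prefix of the same region.
amplicon = "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGG"
single_end_asv = amplicon[:40]  # shorter ASV, as from single-end reads
paired_end_asv = amplicon       # longer ASV, as from merged paired-end reads

hmp_features = {feature_id(single_end_asv)}
study_features = {feature_id(paired_end_asv)}

# Features shared between the two studies' tables.
shared = hmp_features & study_features
print(f"shared features: {len(shared)}")  # shared features: 0
```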

Assuming your forward primer sits at the same position as the HMP one, you could instead look only at your forward reads and use the same trim/trunc parameters as for the HMP data. Then your ASVs would start at the same position and be exactly the same length, allowing the merge to find overlapping features between the two studies.
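A minimal sketch of the trimming idea, assuming both studies' forward reads start at the same primer position. The `trim_trunc` helper is hypothetical and only mimics DADA2's behaviour as documented for the R package's `filterAndTrim` (truncate at `trunc_len`, drop the first `trim_left` bases, so reads end up `trunc_len - trim_left` long, and reads shorter than `trunc_len` are discarded). In practice you would pass the same `--p-trim-left` and `--p-trunc-len` values to `qiime dada2 denoise-single` for both datasets; the sequences here are made up.

```python
import hashlib


def feature_id(seq: str) -> str:
    """Hypothetical helper: feature ID as the MD5 hex digest of the exact sequence."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()


def trim_trunc(read: str, trim_left: int, trunc_len: int):
    """Hypothetical mimic of DADA2 trimming: keep positions trim_left..trunc_len-1.

    Reads shorter than trunc_len are discarded (None is returned).
    """
    if len(read) < trunc_len:
        return None
    return read[trim_left:trunc_len]


amplicon = "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGG"
hmp_read = amplicon[:48]  # toy single-end HMP read
study_forward = amplicon  # toy forward read from the paired-end run

# Identical parameters for both studies.
TRIM_LEFT, TRUNC_LEN = 0, 40
hmp_asv = trim_trunc(hmp_read, TRIM_LEFT, TRUNC_LEN)
study_asv = trim_trunc(study_forward, TRIM_LEFT, TRUNC_LEN)

shared = {feature_id(hmp_asv)} & {feature_id(study_asv)}
print(f"shared features: {len(shared)}")  # shared features: 1
```

Because both ASVs now start at the same base and have the same length, they hash to the same ID, which is exactly what the table merge needs to recognise them as one feature.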

Alternatively, this use case can be a good reason to use closed-reference OTU picking (which can be done after denoising so your data looks better). Or you could collapse your data by taxonomy, using the same database for both studies (which brings us to your current issue). In either case, we are replacing the ASV with some other feature that acts as a shared language between the studies.
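A sketch of the collapse-by-taxonomy route, with toy tables (the feature IDs, sample IDs and taxonomy strings are all made up). Features from the two studies never share IDs, but once counts are summed per taxonomy label, assigned with the same reference database for both studies, the samples become comparable. In QIIME 2 itself, `qiime taxa collapse` does this at a chosen `--p-level`; the function below only illustrates the idea.

```python
from collections import defaultdict


def collapse_by_taxonomy(table, taxonomy, level):
    """Sum counts per taxonomy label, truncated to the first `level` ranks.

    table: {sample_id: {feature_id: count}}
    taxonomy: {feature_id: "k__...; p__...; ..."}; features with no
    assignment are grouped under "Unassigned".
    """
    collapsed = {}
    for sample, counts in table.items():
        by_taxon = defaultdict(int)
        for feature, count in counts.items():
            label = taxonomy.get(feature, "Unassigned")
            ranks = [rank.strip() for rank in label.split(";")][:level]
            by_taxon[";".join(ranks)] += count
        collapsed[sample] = dict(by_taxon)
    return collapsed


# Toy taxonomy, assumed to come from the same reference database for both studies.
taxonomy = {
    "hmp_asv_1": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "hmp_asv_2": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
    "study_asv_1": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "study_asv_2": "k__Bacteria; p__Firmicutes; c__Bacilli",
}
hmp = {"HMP_S1": {"hmp_asv_1": 10, "hmp_asv_2": 5}}
study = {"TMB_S1": {"study_asv_1": 3, "study_asv_2": 4}}

# Collapse each study at phylum level, then combine the (now shared) labels.
merged = {**collapse_by_taxonomy(hmp, taxonomy, 2), **collapse_by_taxonomy(study, taxonomy, 2)}
print(merged)
```

Features missing from the taxonomy end up under "Unassigned", which is also what a whole study looks like if its features were never matched during taxonomic assignment.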

As for why your taxonomy looks so stark for your project, I can’t really say. What kind of samples are you collecting, and which taxonomic database are you using? Additionally, what is your primer pair?


Actually, after DADA2 I clustered the sequences from each project independently at 99% identity. Since the OTU table only carries taxonomy references from the Greengenes database, I think the sequence length shouldn't be an issue.

I also checked the rep-seqs for both projects and saw that, after clustering, almost 600 OTUs were merged across the two datasets. However, the taxonomy still goes unannotated for my data.

Also, I am working on your suggestion of trimming the forward reads of my data and the HMP data to the same length. I will let you know once I get the results.

Hi @ebolyen!

I tried your suggestion and used only the forward reads of my dataset, trimmed the HMP data and my data at the same positions, and merging the feature tables worked fine!

However, I also tried another approach. I put the FASTQ files of the HMP dataset and the forward reads of my data in one folder and ran the demultiplexing and denoising steps on them together. But during the taxonomy analysis, I got the following error:


It's surprising, because when I process the same datasets separately and then merge them (as you suggested), the result is fine!
Can you please let me know where I am going wrong?

Hi @Shivani2211,

If you are using DADA2, I would avoid this, as the error model is fit per sequencing run. That is why we recommend merging at the table step (as long as your ASVs are the same length).
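A sketch of the last step of that workflow: denoise each run separately, check that the representative sequences all have the same length, then merge the tables. The helper names and toy data are hypothetical; in QIIME 2 the merging itself is done with `qiime feature-table merge` and `qiime feature-table merge-seqs`. This sketch simply treats a sample ID that appears in more than one table as an error.

```python
def check_same_length(*rep_seq_sets):
    """Return the common ASV length across runs, or raise if lengths disagree.

    Each argument is a {feature_id: sequence} mapping from one denoised run.
    """
    lengths = {len(seq) for seqs in rep_seq_sets for seq in seqs.values()}
    if len(lengths) != 1:
        raise ValueError(f"ASV lengths differ across runs: {sorted(lengths)}")
    return lengths.pop()


def merge_tables(*tables):
    """Merge {sample_id: {feature_id: count}} tables from separately denoised runs.

    Features are unioned; a sample ID found in more than one table raises an error.
    """
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"sample {sample!r} appears in more than one table")
            merged[sample] = dict(counts)
    return merged


# Toy output of two separately denoised runs with identical trim/trunc settings.
hmp_seqs = {"f1": "ACGT" * 10}
tmb_seqs = {"f1": "ACGT" * 10, "f2": "TTGA" * 10}
hmp_table = {"HMP_S1": {"f1": 5}}
tmb_table = {"TMB_S1": {"f1": 2, "f2": 3}}

length = check_same_length(hmp_seqs, tmb_seqs)
merged = merge_tables(hmp_table, tmb_table)
print(length, merged)
```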

Without more specific details, I can’t say what is wrong, but it’s possible one of your files got mixed up during taxonomic assignment. Provenance would tell you more about it. But I would avoid this approach altogether for the reasons above.
