Possible Analysis Pipeline for Ion Torrent 16S Metagenomics Kit Data in QIIME2?

This might be common knowledge already, but I wanted to articulate why I think closed-ref OTU picking could be a good fit for multi-region studies.

Three high-level strategies for defining OTUs... are canonically described as de novo, closed-reference, and open-reference OTU picking... Each of these methods has benefits and drawbacks.
...
In closed-reference OTU picking, input sequences are aligned to pre-defined cluster centroids in a reference database. If the input sequence does not match any reference sequence at a user-defined percent identity threshold, that sequence is excluded. (peerj, 2014).

This is essentially 'counting database hits' so

  1. resulting OTUs are 100% biased by the database :open_mouth:
  2. resulting OTUs are 100% consistent with the database :slightly_smiling_face:
  3. resulting OTUs are literally just the ones from the database :upside_down_face:

Modern ASV methods aim to be just as consistent without introducing database bias, but for this project we are knowingly using this strong bias to normalize across regions.

Let us know what you find!

Colin

P.S. You could get some popcorn and read this flaming :exploding_head: review of closed-ref clustering, or don't because we use ASVs now! :stuck_out_tongue_winking_eye:

3 Likes

Thanks for clarifying @colinbrislawn! I agree, and I also generally avoid closed-ref OTU clustering in the modern era if I can, but this Ion Torrent kit seems like one of the special cases where the strengths (collapsing disparate amplicons) could outweigh the weaknesses (database bias, reduced resolution vs. ASVs), so there are still times when I use and advocate closed-ref OTU clustering.

One thing to note: we have known for a long time that OTU clustering on its own leads to inflated diversity estimates and needs to be paired with other filtering (or denoising!) methods to reduce those errors, and closed-ref OTU clustering on its own suffers from the same issues.

However, in this case you are using closed-ref OTU clustering after denoising. So the contents of that flaming review do not really apply here... erroneous sequences are being filtered/corrected by your denoising method of choice, then closed-ref OTU clustering is being used strictly to "collapse" the ASVs into full-length 16S sequences, not as a pseudo-error-filtering method. This should all still be benchmarked to see how this performs for this ion torrent kit (@cjone228's mock communities will enable that endeavor!) but that review should not discourage this analysis.

3 Likes

Hello to all,
I’m trying this pipeline on my data. These are the steps (as suggested above); a sketch of my step-4 clustering call is shown just below the list:
1- Import data
2- demux
3- dada2 denoise-pyro
4- qiime vsearch cluster-features-closed-reference
5- qiime fragment-insertion sepp
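
For context, my step-4 call was along these lines (a rough sketch; the input/output file names other than rep_seqs_cr_99.qza are approximate, and I used the Greengenes 99% reference):

# assumed inputs: DADA2 feature table + rep seqs, plus the imported Greengenes 99% reference sequences
qiime vsearch cluster-features-closed-reference \
  --i-sequences rep-seqs-dada2.qza \
  --i-table table-dada2.qza \
  --i-reference-sequences 99_otus.qza \
  --p-perc-identity 0.99 \
  --o-clustered-table table_cr_99.qza \
  --o-clustered-sequences rep_seqs_cr_99.qza \
  --o-unmatched-sequences unmatched_cr_99.qza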

In the last step (5) I had an error message…

This is the script that I used:
qiime fragment-insertion sepp \
  --i-representative-sequences rep_seqs_cr_99.qza \
  --i-reference-database sepp-refs-gg-13-8.qza \
  --output-dir Sepp_feces \
  --p-threads 1

Plugin error from fragment-insertion:

Command '['run-sepp.sh', '/var/folders/nk/qyjfz4t11vn5vnx07bkphqrw0000gn/T/qiime2-archive-70i_7a8b/2d6587b0-3345-4d60-b53a-de3e4054cf84/data/dna-sequences.fasta', 'q2-fragment-insertion', '-x', '1', '-A', '1000', '-P', '5000', '-a', '/var/folders/nk/qyjfz4t11vn5vnx07bkphqrw0000gn/T/qiime2-archive-2wfekjne/a14c6180-506b-4ecb-bacb-9cb30bc3044b/data/aligned-dna-sequences.fasta', '-t', '/var/folders/nk/qyjfz4t11vn5vnx07bkphqrw0000gn/T/qiime2-archive-2wfekjne/a14c6180-506b-4ecb-bacb-9cb30bc3044b/data/tree.nwk', '-r', '/var/folders/nk/qyjfz4t11vn5vnx07bkphqrw0000gn/T/qiime2-archive-2wfekjne/a14c6180-506b-4ecb-bacb-9cb30bc3044b/data/raxml-info.txt']' returned non-zero exit status 1.

Debug info has been saved to /var/folders/nk/qyjfz4t11vn5vnx07bkphqrw0000gn/T/qiime2-q2cli-err-mf2m8927.log

First of all, is the pipeline correct?
What happened?
I think the problem is step 4, because when I run SEPP directly on the DADA2 output it works.
Can someone help me?

Regards,

Rubina

2 Likes

Hi Everyone!

@cjone228 and I have another couple of points we would like clarified:

  1. Our fastq files contain single-end mixed-orientation reads (both forward and reverse). We imported our data using SingleEndFastqManifestPhred33V2 according to the QIIME 2 Importing Data document. However, we recently noticed that the Importing Data document states “In this variant of the fastq manifest format, the read directions must all either be forward or reverse.” Is there another way we should be importing our data? Or is the only solution to re-orient our reads or split them by direction prior to importing?

  2. In the event that we are able to import our fastq files as-is (i.e. in mixed orientation), we wanted to clarify whether or not DADA2 can handle mixed-orientation reads. (Based on what we read, we don’t think that it can…)

So, overall we are just trying to clarify whether it is inevitable that we will need to split our reads by direction at some point or another.

Thanks!
Lauren

P.S. @rparadiso - you are actually a step ahead of us, so we don’t have an answer to your question! Hopefully someone else has some insight for you

2 Likes

Oops! I meant to thank @Nicholas_Bokulich, @colinbrislawn, and @jwdebelius for your clarifications on OTU clustering, SMURF, and phylogenetic analysis above - all of your posts were extremely helpful!

Lauren :qiime2:

3 Likes

I am not 100% certain of the semantics there, but I think the point is that reads should not be pre-joined if they are imported in that format.

Your reads are all forward or reverse because in this case F/R mean the read direction on the sequencing instrument, not the orientation relative to the genome (which is mixed in this case).

So you are doing the right thing, and this is the correct format.

DADA2 can handle mixed-orientation reads (relative to the genome); that is not a problem, technically speaking. But mixed F + R reads and pre-joined reads will cause issues.

So again you are doing things correctly.

The only issue I can think of for mixed-orientation reads and DADA2 is that you will get separate ASVs for reads from the same genome that are in opposite orientations. But in theory that is not a DADA2 problem, it is an alpha diversity problem! (As I think we've discussed above, but this topic is so long I can't remember anymore.)

The issue is that you do not need SEPP, and should not use SEPP here. When you use closed-reference OTU clustering, the features are no longer ASVs that need to be aligned/spliced into a new phylogeny. The features are now the matching reference OTUs, and you adopt the reference phylogeny.

See this issue for more details on why you should not use SEPP after closed-reference OTU picking:

So what should you (and everyone else who wants to use this pipeline) do instead? You should use the reference trees that ship with your reference database of choice (e.g., in your case use the greengenes 99% OTU reference tree since you used that same database for clustering with vsearch).

Thanks everyone! I feel like we're making a lot of progress!

2 Likes

Thanks Nicholas for your correction.
Now I’m a little confused about how to continue…
Do I have to download the tree from the Greengenes database, and then how do I continue?
Can I perform in the next step the core metrics?

qiime diversity core-metrics-phylogenetic
  --i-phylogeny rooted-tree.qza (the Greengenes tree?)
  --i-table table.qza
  --p-sampling-depth XXX
  --m-metadata-file metadata.tsv
  --output-dir core-metrics-results

Best,
Rubina

Hi @rparadiso,

Yes, just import the tree that corresponds to the reference sequences that you used (e.g., if you used the 99% ref, use the 99% OTUs tree) as a Phylogeny[Rooted], following the import instructions.
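
A minimal sketch of that import, assuming the Greengenes 13_8 99% OTU tree (the input and output file names here are just examples):

# import the pre-built reference tree as a rooted phylogeny artifact
qiime tools import \
  --type 'Phylogeny[Rooted]' \
  --input-path 99_otus.tree \
  --output-path 99_otus_tree.qza

You would then pass 99_otus_tree.qza to --i-phylogeny in core-metrics-phylogenetic.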

If you run into any subsequent errors with core-metrics, see this topic:

Good luck!

Thanks for the feedback @Nicholas_Bokulich!

:+1:

:+1:

You are correct - we did cover this earlier in the post :wink:

Thinking a little further ahead, if we import and run DADA2 on our reads that are in mixed orientation relative to the genome, do you foresee the reverse reads having problems aligning to the reference database during closed-reference OTU clustering? (i.e., will we lose half of our data?)

Thanks as always!
Lauren :qiime2:

1 Like

We may have just answered our own question about closed-reference OTU clustering - in the QIIME 2 documentation for closed-reference clustering of features, the parameters section has the following option:

--p-strand TEXT Choices('plus', 'both')
Search plus (i.e., forward) or both (i.e., forward
and reverse complement) strands. [default: 'plus']

We assume that if we choose 'both', this would allow for mixed-orientation reads?
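
If so, our step-4 command would presumably just pick up that flag. A rough sketch (the file names here are placeholders for whatever we produce from DADA2 and import from the reference database):

# same closed-reference clustering call, searching both strands
qiime vsearch cluster-features-closed-reference \
  --i-sequences rep-seqs-dada2.qza \
  --i-table table-dada2.qza \
  --i-reference-sequences 99_otus.qza \
  --p-perc-identity 0.99 \
  --p-strand both \
  --o-clustered-table table-cr-99.qza \
  --o-clustered-sequences rep-seqs-cr-99.qza \
  --o-unmatched-sequences unmatched.qza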

Next, we were looking for the greengenes reference database for the OTU clustering step. We could not find the 13_8 release on the gg website. We found this post and downloaded the file. Is this the correct file to use?

The file contains the following folders:


[screenshot of the downloaded archive's folder listing: Screen Shot 2020-04-27 at 1.48.24 PM]
For our closed reference OTU clustering, we need to use one of the xx_otus.fasta files as our reference sequences. Which version should we use - the rep_set or the rep_set_aligned?

Thanks so much!
Lauren :qiime2:
Based on your recommendations above, we would want to use a rooted tree. Which version should we use: XX_otus.tree or XX_otus_unannotated.tree?

3 Likes

You did!

That's correct.

Looks like you found the correct file. For future reference, you can find these and other files linked from here (scroll down): Data resources — QIIME 2 2020.2.0 documentation

rep_set (these are the unaligned sequences)

Good question... I am not sure, and not totally sure if it matters. I'd say try XX_otus.tree and you will probably run into an error pretty quickly if that's the wrong one.

4 Likes

I used XX_otus.tree and imported it according to the importing instructions, but as anticipated by Nicholas... in the core metrics step I got this error:

Plugin error from diversity:

All non-root nodes in tree must have a branch length.

Debug info has been saved to /var/folders/nk/qyjfz4t11vn5vnx07bkphqrw0000gn/T/qiime2-q2cli-err-u3wm8n8f.log.

Now I'm trying to solve it as suggested by Nicholas...
I hope to manage it; if you solve it before me, please let me know :), a hand is always welcome!

Thanks,
Rubina

1 Like

Good morning to all!
In the past few days I have tried to solve the problem that I described previously, without results. Has anyone managed it, and can you give me a hand?
I cannot go on…

thank you very much

Rubina

Hi @rparadiso,
Since this is not strictly relevant to this topic (it is an issue with that specific reference tree and, e.g., you could use a different reference tree), do you want to open up a separate topic to solve this reference tree issue? If that solution is relevant to the current discussion, we can link back to it here.
Thanks!

1 Like

Hi Rubina,

@Lauren and I have been trying to troubleshoot this too, but with no luck so far. We got the same error as you when we repeated what you did, and also when we tried the same code but after importing the XX_otus_unannotated.tree.

We aren't sure how to use Python to fix the branch length issue as shown in the post that @Nicholas_Bokulich linked to - our computing cluster does have Python built in but we haven't been able to figure out how to actually use it yet :sweat_smile:.

We look forward to your thread on this and will plan to chime in there!

Best,
Carli :qiime2:

1 Like

ok Nicholas

thank you :slight_smile:

Hi everyone,
@rparadiso was able to solve the greengenes branch length issue in a separate topic — see here for a few different solutions:

@cjone228 that topic lists a couple of approaches: opening and manually modifying the file, or running a python script in the bash shell (command line). There should be an option to suit whatever you feel most comfortable working with.
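
For anyone landing here later, the python-script route is roughly the following (a sketch only; it assumes scikit-bio, which ships with QIIME 2, and may differ from the exact script in the linked topic). It assigns a length of 0.0 to any node that is missing one and writes out a fixed tree:

python - <<'EOF'
# sketch: fill in missing branch lengths so core-metrics-phylogenetic accepts the tree
# assumes the Greengenes 99_otus.tree file is in the working directory
import skbio

tree = skbio.TreeNode.read('99_otus.tree')
for node in tree.traverse():
    if node.length is None:  # nodes without a branch length trigger the error
        node.length = 0.0
tree.write('99_otus_fixed.tree')
EOF

You would then re-import 99_otus_fixed.tree as Phylogeny[Rooted] as described above.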

Please post to that topic if you have any follow-up questions or run into any issues with fixing your tree(s).

1 Like

Hi all,

@Lauren and I have successfully completed the steps we discussed above on our mock community data thanks to the python script to fix the tree! :partying_face: @rparadiso @Nicholas_Bokulich

Just for fun, we are going to try doing the same steps again using our mock community but this time using SILVA as our reference database! :evergreen_tree: :dna:

Best,
Carli :qiime2:

5 Likes

Whoa, this is great news! :champagne:

1 Like

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.