I am trying to do a meta-analysis of 9 different studies. However, when I run closed-reference OTU picking against SILVA and Greengenes, I lose 4 studies by the time I get to core-metrics, using a sampling depth of 1000 sequences per sample.
I think none of the reads from the samples in those 4 studies are recognized by the closed-reference sequences, so every sample from the 4 missing studies ends up with no reads.
Is this normal? Is there another way around this problem?
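A quick way to tell whether the missing studies lose all of their reads or just fall below the rarefaction depth is to tally per-sample totals by study. Rarefying to a sampling depth can only keep samples with at least that many reads, so a study disappears once every one of its samples is below the depth. This is a minimal sketch, assuming you have exported per-sample feature counts into a dict keyed by (study, sample); the study names and counts are hypothetical.

```python
# Sampling depth used for rarefaction in the question.
SAMPLING_DEPTH = 1000


def retained_samples(counts, depth):
    """Return samples whose total count is at least `depth`.

    Samples with fewer reads than the sampling depth are dropped
    before diversity metrics are computed.
    """
    return {key: n for key, n in counts.items() if n >= depth}


def studies_lost(counts, depth):
    """Return (sorted) the studies that have no samples left after rarefaction."""
    kept = retained_samples(counts, depth)
    all_studies = {study for study, _ in counts}
    kept_studies = {study for study, _ in kept}
    return sorted(all_studies - kept_studies)


# Hypothetical per-sample totals after closed-reference OTU picking.
counts = {
    ("study_A", "s1"): 5400,
    ("study_A", "s2"): 1200,
    ("study_B", "s1"): 0,    # nothing matched the reference
    ("study_B", "s2"): 3,
    ("study_C", "s1"): 800,  # some hits, but below the sampling depth
}

print(studies_lost(counts, SAMPLING_DEPTH))  # ['study_B', 'study_C']
```

If a lost study looks like study_B (totals near zero), the problem is upstream in OTU picking; if it looks like study_C, lowering the sampling depth is worth considering.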
When you do closed-reference OTU picking, each sample is handled independently, so it can run in parallel: you can pick OTUs for one sample or a million samples at the same time and it doesn't change the result. If you're having an issue with some of the studies, troubleshoot those studies individually, because doing so won't affect the others.
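The independence argument can be made concrete with a toy closed-reference assignment: every query is compared only against the fixed reference set, never against other queries, so picking samples together or separately gives the same answer. This is a simplified sketch, not how any real tool scores matches: the ungapped position-by-position identity, the 0.97 threshold, and all IDs and sequences are assumptions chosen for illustration (tools such as VSEARCH compute identity from a proper alignment).

```python
def identity(a, b):
    """Fraction of matching positions over the shorter sequence (ungapped).

    A deliberate simplification of alignment-based identity.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def closed_reference(queries, references, threshold=0.97):
    """Map each query ID to its best reference ID, or None if nothing passes.

    Queries that match no reference are simply unassigned, which is how a
    whole sample (or study) can end up with zero counts.
    """
    hits = {}
    for qid, seq in queries.items():
        best_id, best_score = None, 0.0
        for rid, ref in references.items():
            score = identity(seq, ref)
            if score >= threshold and score > best_score:
                best_id, best_score = rid, score
        hits[qid] = best_id
    return hits


# Hypothetical reference set and two hypothetical samples.
references = {"R1": "ACGTACGTAC", "R2": "TTTTGGGGCC"}
sample_1 = {"s1_q1": "ACGTACGTAC"}
sample_2 = {"s2_q1": "GGGGAAAACC"}  # matches nothing in the reference

together = closed_reference({**sample_1, **sample_2}, references)
separate = {**closed_reference(sample_1, references),
            **closed_reference(sample_2, references)}

assert together == separate  # batching does not change assignments
print(together)  # {'s1_q1': 'R1', 's2_q1': None}
```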
I'm a little confused by this step. If you're doing closed-reference OTU picking, you should just use the tree associated with the closed-reference OTUs. You can import that phylogeny and work from there, so this seems like a strange step in your pipeline to me. Could you explain it fully?
If your studies cover different hypervariable regions, current best practice says that you should do closed-reference OTU picking. If you're not mixing hypervariable regions, ASVs are better, but they must be the same length. Otherwise, you really aren't working with the same data set.
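To see why ASVs need a common length, note that an exact sequence variant is its own identifier: the same organism sequenced to 10 bases in one study and 8 bases in another yields two different "features" until both are cut to the same length. This sketch assumes all studies used the same primers, so position 0 is the same place on the gene in every study; the study names and toy sequences are hypothetical.

```python
def trim_to_common_length(seqs_by_study):
    """Trim every sequence to the shortest length found across all studies.

    Assumes identical primers, so trimming the 3' end makes the variants
    directly comparable. Returns the trimmed sequences and the length used.
    """
    shortest = min(len(s) for seqs in seqs_by_study.values() for s in seqs)
    trimmed = {
        study: [s[:shortest] for s in seqs]
        for study, seqs in seqs_by_study.items()
    }
    return trimmed, shortest


# Hypothetical ASVs: study_A was sequenced longer than study_B.
studies = {
    "study_A": ["ACGTACGTAA", "TTGCATGCAA"],
    "study_B": ["ACGTACGT"],
}

before = set(studies["study_A"]) & set(studies["study_B"])
trimmed, length = trim_to_common_length(studies)
after = set(trimmed["study_A"]) & set(trimmed["study_B"])

print(length, before, after)  # 8 set() {'ACGTACGT'}
```

Before trimming the two studies share no features; after trimming they share one, which is the comparison a meta-analysis actually needs.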
You definitely shouldn't do a MAFFT alignment, because MAFFT builds a de novo alignment and the tree is then built from it. Sequences from the same organism will separate in a MAFFT alignment based on the hypervariable region they came from. Fragment insertion is still questionable to me (although better), because at least then you're working against a reference; the fragment insertion paper describes this. But the MAFFT tree is not a good approach unless you're using the same primers and same-length sequences.
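The reason a de novo alignment struggles is that reads from different hypervariable regions may share no positions on the gene at all, so there is nothing for the aligner to line up; a reference-based method instead places each read against a full-length reference. The sketch below measures positional overlap between amplicons. The coordinates are approximate E. coli 16S positions taken loosely from common primer names (27F/338R for V1-V2, 341F/806R for V3-V4, 515F/806R for V4) and are illustrative assumptions, not exact primer binding sites.

```python
def overlap(region_a, region_b):
    """Number of shared positions between two half-open [start, end) spans."""
    start = max(region_a[0], region_b[0])
    end = min(region_a[1], region_b[1])
    return max(0, end - start)


# Approximate 16S coordinates inferred from primer names (illustrative only).
V1_V2 = (27, 338)
V3_V4 = (341, 806)
V4 = (515, 806)

print(overlap(V1_V2, V4))  # 0: no shared positions to align
print(overlap(V3_V4, V4))  # 291: partial overlap
```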
Your other option is to skip denoising and go straight to OTU picking, which might save some sequences (I would still do quality filtering first, just not denoising), and see whether it helps with your count problem.
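Quality filtering without denoising usually means truncating reads where quality drops and discarding reads that end up too short or contain ambiguous bases. This sketch is loosely modeled on that idea; it is not the exact algorithm of any QIIME 2 plugin. The Phred+33 encoding, the threshold of 4, and the 75% minimum length fraction are assumptions chosen for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integers."""
    return [ord(c) - offset for c in quality_string]


def quality_trim(seq, qual, min_quality=4, min_length_fraction=0.75):
    """Truncate a read at the first base scoring below `min_quality`.

    Returns the truncated sequence, or None if it is shorter than
    `min_length_fraction` of the original read or contains an 'N'.
    """
    cut = len(seq)
    for i, q in enumerate(phred_scores(qual)):
        if q < min_quality:
            cut = i
            break
    trimmed = seq[:cut]
    if len(trimmed) < min_length_fraction * len(seq) or "N" in trimmed:
        return None
    return trimmed


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
print(quality_trim("ACGTACGTAC", "IIIIIIII##"))  # ACGTACGT (kept, 8 of 10 bases)
print(quality_trim("ACGTACGTAC", "III#######"))  # None (too short after trimming)
print(quality_trim("ACGNACGTAC", "IIIIIIIIII"))  # None (ambiguous base)
```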
I just find it incredible that none of the sequences from those samples match the reference sequences.
I will verify again with the individual studies using closed-reference OTU picking. If that doesn't work, I might try fragment insertion; another person suggested it too. I was just having a hard time figuring out how to download it and combine it with QIIME 2.
No matches were identified to reference_sequences. This can happen if sequences are not homologous to reference_sequences, or if sequences are not in the same orientation as reference_sequences (i.e., if sequences are reverse complemented with respect to reference sequences). Sequence orientation can be adjusted with the strand parameter.
Debug info has been saved to /tmp/qiime2-q2cli-err-zhkcryhv.log
On a hunch, I investigated the other studies and found that they are all single-strand studies. Is it possible that the sequence orientation is reversed, and that if I flip the sequences (from forward to reverse), more of them will align to the closed-reference sequences?
I would definitely try that! It's at least worth seeing what they look like.
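Before re-running OTU picking, a quick orientation check on a handful of reads can confirm the hunch: count how many reads hit the reference as-is versus after reverse-complementing them. If the reverse-complemented reads match far more often, the strand parameter mentioned in the error message is the thing to adjust. This is a simplified sketch: exact substring matching stands in for real alignment, ambiguous bases other than N are not handled, and the reference and reads are hypothetical.

```python
# Complement table for unambiguous bases plus N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_hits(reads, references):
    """Count reads that occur as an exact substring of any reference."""
    return sum(any(read in ref for ref in references) for read in reads)


# Hypothetical reference and reads sequenced from the opposite strand.
references = ["ATGGCAAGTCTTAGC"]
reads = ["ACTTGCC", "GCTAAGA"]

forward_hits = count_hits(reads, references)
flipped_hits = count_hits([reverse_complement(r) for r in reads], references)

print(forward_hits, flipped_hits)  # 0 2
```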
If you're running 2019.4 or 2019.7, it's included as part of the base install (q2-fragment-insertion). If you're on an earlier release, it should be in the plugin library. In my experience it's been pretty easy to use with Greengenes. SILVA is a bit more complex (read: manual), but I've found that the Greengenes (gg) tree has worked pretty well for me.