Is it possible to analyze whole-metagenomic shotgun sequencing data with amplicon sequence data of the V4 region?
I am working on my thesis and have two cohorts with V4 amplicon sequence data, so it was quite easy to merge them and analyze them together. But now that I have another cohort with whole-metagenomic shotgun sequencing data, I'm not sure whether I can use this data along with the previous two.
And is this possible in the QIIME 2 pipeline? How should I process the whole-metagenomic shotgun sequencing data?
Could you please tell me how to use Woltka? I was only told to pre-process the data first. I downloaded whole-genome sequence data from the SRA database using the fondue plugin, but in the Woltka tutorial they use a .qza file against a taxonomy.qza file.
I don't have a cookbook. Partially because, while this is published, that doesn't mean it's straightforward. Partially because, as I said, I haven't done this yet. And partially because there is never a "just" in microbiome bioinformatics, and the answer always depends on your data and your goals.
When I look at the q2-Woltka tutorial, they link to a taxonomy and tree file. So, that might be somewhere to go. But, I think first you need to figure out the align_sam bit.
Let me describe the whole scenario. I downloaded sequence data from the SRA database under accession number PRJNA672260 to test the q2-greengenes2 plugin. These sequences cover the 16S rRNA V3-V4 region. First, I ran the following code using Deblur to produce 150 nt sequences.
I then took the table-deblur_PRJNA672260_150.qza file into the q2-greengenes2 pipeline. I tried to follow the instructions and wrote my code as follows, which ended in this traceback:
Traceback (most recent call last):
  File "/home/turtle/miniforge3/envs/qiime2-2023.7/lib/python3.8/site-packages/q2cli/commands.py", line 478, in __call__
    results = self._execute_action(
  File "/home/turtle/miniforge3/envs/qiime2-2023.7/lib/python3.8/site-packages/q2cli/commands.py", line 539, in _execute_action
    results = action(**arguments)
  File "", line 2, in taxonomy_from_table
  File "/home/turtle/miniforge3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
    outputs = self.callable_executor(
  File "/home/turtle/miniforge3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/sdk/action.py", line 566, in callable_executor
    output_views = self._callable(**view_args)
  File "/home/turtle/miniforge3/envs/qiime2-2023.7/lib/python3.8/site-packages/q2_gg2/_methods.py", line 778, in taxonomy_from_table
    tree = _load_tree_and_cache(open(str(reference_taxonomy)), features)
  File "/home/turtle/miniforge3/envs/qiime2-2023.7/lib/python3.8/site-packages/q2_gg2/_methods.py", line 543, in _load_tree_and_cache
    tree = tree.shear(names & features)
  File "bp/_bp.pyx", line 758, in bp._bp.BP.shear
  File "bp/_bp.pyx", line 800, in bp._bp.BP.shear
ValueError: No requested tips found
What I've managed to learn is that, after referencing against the 2022.10.taxonomy.asv.nwk.qza file, the resulting table-deblur_PRJNA672260_150_sequence_gg2.qza file lists all the samples in it (see the attached table-deblur_PRJNA672260_150_gg2.qzv).
So, can you suggest a solution, or explain why I'm having this much trouble? If you can't help me, is there anyone else who understands this problem?
Processing your 16S data and your metagenomic data requires two separate pipelines.
For your 16S data, you need to preprocess:
Remove primers
Quality filter
Denoise
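As a rough sketch, the three preprocessing steps above could map onto QIIME 2 commands like the ones this Python snippet prints. It only builds and prints the command lines (a dry run), so nothing is executed. The file names and the primer placeholder are hypothetical, single-end reads are assumed, and the flag names are as I recall them from the q2-cutadapt, q2-quality-filter, and q2-deblur help text, so please confirm each one with `--help` before running anything.

```python
import shlex

# Hypothetical inputs: substitute your own artifact and primer.
DEMUX = "demux.qza"
FORWARD_PRIMER = "<forward-primer-sequence>"  # placeholder, deliberately not filled in
TRIM_LENGTH = 150  # the Deblur trim length mentioned earlier in this thread


def preprocessing_commands(demux, primer, trim_length):
    """Return the three preprocessing commands as argument lists, in order."""
    # Step 1: remove primers (single-end reads assumed).
    trim = ["qiime", "cutadapt", "trim-single",
            "--i-demultiplexed-sequences", demux,
            "--p-front", primer,
            "--o-trimmed-sequences", "trimmed.qza"]
    # Step 2: quality filter on per-base quality scores.
    qfilter = ["qiime", "quality-filter", "q-score",
               "--i-demux", "trimmed.qza",
               "--o-filtered-sequences", "filtered.qza",
               "--o-filter-stats", "filter-stats.qza"]
    # Step 3: denoise with Deblur, trimming every read to one length.
    denoise = ["qiime", "deblur", "denoise-16S",
               "--i-demultiplexed-seqs", "filtered.qza",
               "--p-trim-length", str(trim_length),
               "--p-sample-stats",
               "--o-representative-sequences", "rep-seqs.qza",
               "--o-table", "table.qza",
               "--o-stats", "deblur-stats.qza"]
    return [trim, qfilter, denoise]


commands = preprocessing_commands(DEMUX, FORWARD_PRIMER, TRIM_LENGTH)
for cmd in commands:
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(cmd))
```

For paired-end data the primer-removal step would differ (a paired trimming action, and the reads would need to be joined or reduced to forward reads before Deblur), so treat this as an outline of the order of operations rather than a recipe.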
You have the potential issue that Greengenes2 assumes your 16S data was amplified specifically with 515F primers and performs taxonomic classification based on that (V4). In your initial post, you said the data was V4 and not V3-V4, which is why the standard Greengenes2 pipeline seemed to make sense.
Because your reads start at a different primer position, those assumptions don't hold, and simple deblurring and trimming may or may not work if you want to perform Greengenes2 classification. Your simplest bet will be to use the Greengenes2 workflow for non-V4 amplicons.
The metagenomic data has to be processed separately because, unfortunately, it's a separate data type.
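To make the failure concrete: the traceback shows the plugin intersecting the reference tree's tip names with the table's feature IDs (`names & features`) and then shearing the tree to that set. Here is a minimal Python sketch of that check using plain sets. The function name and the short sequences are made up for illustration, and the real plugin works on a tree object rather than on sets.

```python
def shared_features(tree_tips, table_features):
    """Mimic the `names & features` step from the traceback with plain sets."""
    shared = set(tree_tips) & set(table_features)
    if not shared:
        # Same message as the ValueError in the traceback.
        raise ValueError("No requested tips found")
    return shared


# Toy data (made up): pretend the reference tips are V4 fragments.
reference_tips = {"TACGGAGGGT", "TACGTAGGTG"}
v4_table = ["TACGGAGGGT", "TACGAAGGGG"]   # one feature matches a tip exactly
v34_table = ["CCTACGGGAG", "CCTACGGGTG"]  # reads start elsewhere, nothing matches

print(shared_features(reference_tips, v4_table))

try:
    shared_features(reference_tips, v34_table)
except ValueError as err:
    error_message = str(err)
    print("V3-V4 table:", error_message)
```

If none of your feature IDs is an exact match for a tip in the reference, the intersection is empty and you get the error above. A quick way to check your own data is to compare a few feature IDs from your table with tip names from the reference file.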
Thanks, @jwdebelius. Specifically, use the non-v4-16s action, which performs closed-reference mapping against the backbone. Taxonomy classification can also be performed using the full-length Naive Bayes model.
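A hedged sketch of what that could look like, again as a Python dry run that only prints the commands. The `non-v4-16s` and `taxonomy-from-table` actions and the 2022.10.taxonomy.asv.nwk.qza file come from this thread. The backbone file name, the representative-sequences file name, the output names, and the exact flag spellings are my assumptions from memory of the q2-greengenes2 README, so verify them with `qiime greengenes2 non-v4-16s --help`.

```python
import shlex


def non_v4_commands(table, sequences, backbone, taxonomy_tree):
    """Build the closed-reference mapping step and the taxonomy step."""
    # Map the non-V4 ASVs onto the Greengenes2 backbone (closed reference).
    map_to_backbone = ["qiime", "greengenes2", "non-v4-16s",
                       "--i-table", table,
                       "--i-sequences", sequences,
                       "--i-backbone", backbone,
                       "--o-mapped-table", "table.gg2.qza",
                       "--o-representatives", "rep-seqs.gg2.qza"]
    # Read taxonomy off the reference tree using the mapped table.
    classify = ["qiime", "greengenes2", "taxonomy-from-table",
                "--i-reference-taxonomy", taxonomy_tree,
                "--i-table", "table.gg2.qza",
                "--o-classification", "taxonomy.gg2.qza"]
    return [map_to_backbone, classify]


gg2_commands = non_v4_commands(
    table="table-deblur_PRJNA672260_150.qza",            # from this thread
    sequences="rep-seqs-deblur_PRJNA672260_150.qza",     # hypothetical name
    backbone="2022.10.backbone.full-length.fna.qza",     # assumed file name
    taxonomy_tree="2022.10.taxonomy.asv.nwk.qza",        # from this thread
)
for cmd in gg2_commands:
    print(shlex.join(cmd))
```

The point of the design is that the taxonomy step receives the mapped table rather than the original Deblur table, so its feature IDs should correspond to entries in the reference.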