Is it possible to generate raw sequence data from any of the early-stage outputs of the QIIME2 pipeline?

bpscherer · September 18, 2024, 2:19pm

Apologies if this has been discussed elsewhere, but I did some digging and was not able to find it anywhere. It's possible I just couldn't word it accurately to find the answer, so please forgive me.

I recently inherited a project involving a plethora of 16S and ITS samples. The data I was provided only includes QIIME2 outputs and not the raw fasta/fastq files. They may still exist somewhere, but we have not found them yet. I would like to obtain them so that when it comes time to publish I can actually upload the data to NCBI.

If I have the demux.qza, is it possible to use that file to reverse engineer the original data files? Maybe this is a ridiculous question, but thank you for considering!

Nicholas_Bokulich · September 18, 2024, 2:51pm

Hi @bpscherer ,

This is not a ridiculous question at all. I was bracing myself to be the bearer of bad news until I read this part:

Your demux.qza might be the original data. You can check its provenance to see what was done to it. If the only command you see in the provenance is the import step, then these may well be the original, raw data.
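A .qza file is a zip archive, so its provenance can also be inspected with nothing but Python's standard library. The sketch below is a minimal illustration, not a QIIME 2 API. It assumes the archive layout used by recent QIIME 2 releases: a top-level UUID directory, with `provenance/action/action.yaml` for the artifact itself and `provenance/artifacts/<uuid>/action/action.yaml` for each ancestor. It also uses a deliberately simple line-based reader instead of a real YAML parser. The helper names (`provenance_actions`, `looks_like_direct_import`) are hypothetical. QIIME 2 View remains the supported way to browse provenance.

```python
import zipfile


def _parse_action_block(text):
    """Pull a few keys out of the top-level `action:` section of action.yaml.

    Line-based on purpose (no PyYAML needed): it keeps only the first
    `type`, `plugin` and `action` keys found inside the indented block.
    """
    info = {}
    in_action = False
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # A new top-level section starts; remember whether it is `action:`.
            in_action = line.rstrip() == "action:"
            continue
        if in_action:
            key, sep, value = line.strip().partition(":")
            if sep and key in ("type", "plugin", "action") and key not in info:
                info[key] = value.strip()
    return info


def provenance_actions(qza_path):
    """Return (archive path, action info) for every action.yaml in the archive."""
    actions = []
    with zipfile.ZipFile(qza_path) as zf:
        for name in sorted(zf.namelist()):
            # Matches the artifact's own action and every ancestor's action.
            if name.endswith("/action/action.yaml"):
                text = zf.read(name).decode("utf-8")
                actions.append((name, _parse_action_block(text)))
    return actions


def looks_like_direct_import(qza_path):
    """True if every recorded action is an import, i.e. nothing ran after import."""
    actions = provenance_actions(qza_path)
    return bool(actions) and all(info.get("type") == "import" for _, info in actions)
```

Calling `provenance_actions("demux.qza")` lists each step with its type (`import`, `method`, ...), plugin and action name. A single `import` entry is the case described above.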

Depending on how the sequencing was done, the data are usually delivered already demultiplexed. So unless something was done to these data outside of QIIME 2 (e.g., trimming or filtering with another tool), these are probably the raw data that your predecessor received from the sequencing core or service, or generated themselves.

The other issue is that you can name a file anything you want, so demux.qza does not necessarily contain demultiplexed raw sequences, but I would assume that the name matches the contents.
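Unlike the file name, the semantic type recorded inside the artifact says what it actually holds. `qiime tools peek demux.qza` prints it. The sketch below reads the same fields straight from the archive. It assumes a top-level `<uuid>/metadata.yaml` with `uuid:`, `type:` and `format:` lines. The set of read types is a hedged, non-exhaustive list of the common per-sample read types, and `peek_qza` / `holds_demultiplexed_reads` are hypothetical helper names.

```python
import zipfile

# Common QIIME 2 semantic types for demultiplexed per-sample reads
# (not an exhaustive list).
DEMUX_TYPES = {
    "SampleData[SequencesWithQuality]",
    "SampleData[PairedEndSequencesWithQuality]",
    "SampleData[JoinedSequencesWithQuality]",
}


def peek_qza(qza_path):
    """Read uuid/type/format from the artifact's top-level metadata.yaml."""
    with zipfile.ZipFile(qza_path) as zf:
        # Assumed layout: exactly one <uuid>/metadata.yaml at the top level.
        names = [n for n in zf.namelist()
                 if n.count("/") == 1 and n.endswith("/metadata.yaml")]
        if not names:
            raise ValueError("no top-level metadata.yaml; is this really a .qza?")
        fields = {}
        for line in zf.read(names[0]).decode("utf-8").splitlines():
            key, sep, value = line.partition(":")
            if sep and key in ("uuid", "type", "format"):
                fields[key] = value.strip()
    return fields


def holds_demultiplexed_reads(qza_path):
    """True if the recorded semantic type is one of the per-sample read types."""
    return peek_qza(qza_path).get("type") in DEMUX_TYPES
```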

So if the provenance checks out and the name aptly describes the contents, you can export the data from this QZA to get the demultiplexed sequence data. See the "exporting" tutorial for your options.
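The supported route is `qiime tools export --input-path demux.qza --output-path exported-demux`. If QIIME 2 is not installed on the machine at hand, the payload can also be copied out of the zip directly. For demultiplexed FASTQ artifacts I would expect per-sample `.fastq.gz` files plus a `MANIFEST` and `metadata.yml`, but check what you actually get. This is a minimal sketch assuming the `<uuid>/data/` layout; `extract_data_dir` is a hypothetical helper, not part of QIIME 2.

```python
import zipfile
from pathlib import Path


def extract_data_dir(qza_path, out_dir):
    """Copy the files under <uuid>/data/ out of a .qza; return their relative paths."""
    out = Path(out_dir)
    written = []
    with zipfile.ZipFile(qza_path) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # Keep only regular files inside the payload directory.
            if len(parts) < 3 or parts[1] != "data" or name.endswith("/"):
                continue
            if ".." in parts:  # refuse paths that would escape out_dir
                continue
            rel = Path(*parts[2:])
            target = out / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(name))
            written.append(rel.as_posix())
    return sorted(written)
```

For example, `extract_data_dir("demux.qza", "exported-demux")` writes the files and returns their names.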

Good luck!

bpscherer · September 18, 2024, 3:11pm

Thank you so much! I'll have to see what I can do.

I think my collaborators sent their samples to a company that did most of the bioinformatics, and demux.qza is the first step in what they did. As far as I know, they didn't use any other tools to filter or trim. I also have all of their DADA2 output .qza files, which might be helpful.