Any tips on starting processing on Qiime1 and transitioning to Qiime2 after splitting libraries?

ErikaGanda · November 8, 2016, 2:38am

Hi,

I participated in the seminar in Phoenix and am eager to start using Qiime2 to analyze my data!
In brief, I have 6 MiSeq runs already demultiplexed and concatenated into a single fna file. I would like to import this into Qiime2 and run most of the analyses we learned in the moving pictures tutorial.

My questions are:
Is this possible?
Does Qiime2 have something similar to split_sequence_file_on_sample_ids.py?
How should I proceed to make my fna file into a qza?
Should I transition even further down the pipeline?

I know that, for the time being, running DADA2 on this amount of data starting from raw sequences is not feasible, so I would love to keep the first steps I performed in Qiime1 and analyze everything else in Qiime2.

Please bear with me, I am a veterinarian who just discovered the amazing universe of coding!
Thank you, I appreciate all the help!

Erika

gregcaporaso · November 8, 2016, 4:01pm

Hi @ErikaGanda, Glad to hear that you're excited about QIIME 2!

You have a few options right now:

You can generate a biom table and phylogenetic tree with QIIME 1 (e.g., using pick_open_reference_otus.py) and then import those files into QIIME 2 artifacts for the "downstream" analyses, including alpha and beta diversity, taxonomic profiling, and differential abundance testing. Importing is described here.
You can wait for the next release of QIIME 2, which should have multi-threaded support for DADA2 (though I can't promise a specific date for this at the moment, I expect that it will happen in 2016).
Or, you can apply DADA2 on a per-MiSeq-run basis right now, and then merge the resulting files. This process is illustrated in our FMT tutorial. It could take a long time to run, possibly a week or more, but we haven't tried it on a dataset of this size yet, so I'm not certain about that. You can parallelize it, though: if you have access to a machine with at least six cores/processors, you could start a qiime dada2 denoise job for each of the MiSeq runs and let them all run at the same time. I'm not certain how much memory you would need per job, but I estimate at least 4GB. This would be possible to do on Amazon Web Services (we'll be releasing an Amazon image for this by the end of next week), or on an institutional supercomputer resource if you have one.
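As a rough illustration of the "one job per MiSeq run" idea, here is a minimal Python sketch that starts several commands at once and waits for all of them to finish. The function name `run_jobs_in_parallel` and the run names are hypothetical, and the command shown is only a placeholder: you would replace it with the actual qiime dada2 denoise invocation and whatever options your installed version requires, which are deliberately not spelled out here.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_jobs_in_parallel(commands, max_workers=6):
    """Run each command as a separate process, at most max_workers at a time.

    `commands` maps a job name (e.g. a MiSeq run label) to an argument list.
    Returns a dict mapping each job name to its process exit code.
    """
    def run(item):
        name, cmd = item
        # Each call blocks one worker thread until its process exits.
        completed = subprocess.run(cmd)
        return name, completed.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run, commands.items()))


if __name__ == "__main__":
    # Hypothetical labels for the six MiSeq runs.
    runs = [f"run{i}" for i in range(1, 7)]
    # Placeholder command: swap in the real per-run denoising command here.
    commands = {
        name: [sys.executable, "-c", f"print('denoising {name}')"]
        for name in runs
    }
    exit_codes = run_jobs_in_parallel(commands)
    failed = [name for name, code in exit_codes.items() if code != 0]
    print("failed runs:", failed)
```

With `max_workers=6`, all six runs start together, so the memory estimate above applies to each job simultaneously (roughly six times the per-job figure in total). Lowering `max_workers` trades run time for a smaller memory footprint.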

Thanks for attending the workshop, and for your interest in QIIME 2!

thermokarst · November 10, 2016, 5:00pm

3 posts were split to a new topic: QIIME 2 demux emp using all available file descriptors