Can I do the diversity analysis with taxonomic data from other software?

Fernanda_Costa · September 28, 2018, 2:12pm

Hello all,

I'm trying to run a diversity analysis in QIIME 2 with data that came from other software. The problem is that I only have the data (identification of OTUs) in fasta format, like this:

>Sinningia.schiffneri.SS.799F::M01522:107:000000000-AW0BC:1:2104:16569:2530 1:N:0:AGTCAA
GTAGTCCATGCCGTAAACGATGAGTGTTCGCCCTTGGTCTACGCGGATCAGGGGCCCAGCTAACGCGTGAAACACTCCGCCTGGGGAGTACGGTCGCAAGACCGAAACTCAAAGGAATTGACGGGGGCCTGCACAAGCGGTG
GAGCATGTGGTTTAATTCGATACAACGCGCAAAACCTTACCAGCCCTTGACATATGAACAACAAAACCTGTCCTTAACGGGATGGTACTGACTTTCATACATGTGTTGCATGGATGTGGTCAGCTCGGGTCG
>Sinningia.magnifica.SM.799F::M01522:107:000000000-AW0BC:1:2103:8991:10744 1:N:0:AGTCAA
GTAGTCCATGCCGTAAACGATGAGTGTTCGCCCTTGGTCTACGCGGATCAGGGGCCCAGCTATCGCGTGAAACACTCCGCCTGGGGAGTACGGTCGCAAGACCGAAACTCAAAGGAATTGACGGGGGCCTGCACAAGCGGTG
GAGCACGTGGTTTAATTCGATACAACGCGCAAAACCTTACCAGCCCTTGACATATGAACAACAAAACCTGTCCTTAACGGGATGGTACTGACTTTCATACAGGTGTTGCATGGCTGTCGTCAGCTCGTGTCG

Is there a way to import this data set into a table-format file (qza) so I can do the diversity analysis?

Any help will be very welcome and appreciated!

Thank you all in advance,

Nicholas_Bokulich · September 28, 2018, 5:33pm

Hi @Fernanda_Costa,
Great question!

In general:

Yes, you can use taxonomic data from any other software in QIIME 2, provided it is formatted correctly.
Diversity analyses in QIIME 2 do not require taxonomy assignments. You can, of course, collapse your features by taxonomy and analyze taxonomic diversity (as opposed to, e.g., sequence diversity), but that step is optional.

Your data, however, have a few formatting issues:

Biggest issue: with only fasta data like this, you will not be able to use QIIME 2 as is. It looks like your data are not yet demultiplexed, and the demultiplexing methods in QIIME 2 require fastq data. You should either get the raw data and demultiplex it in QIIME 2, or demultiplex your data with external software and then import it into QIIME 2, probably following the instructions here... Check out the information about the QIIME 1 demultiplexed format; you will need to get your data into that format.
Your data do not yet fit the formatting requirements for taxonomy data to be imported into QIIME 2. Taxonomy files should look something like this:

id    taxonomy
f1    Sinningia schiffneri
f2    Sinningia schiffneri
f3    Sinningia magnifica

Since your sequences already have taxonomy assigned, you can use that information in QIIME 2 but you will need to reformat to get there. It should be fairly trivial to parse that information from the fasta, but the challenging part is figuring out what your feature IDs would be... this all depends on how you address the importing issues raised above.
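To give a rough idea of the parsing step, here is a minimal Python sketch. It assumes every header looks like the two examples above, i.e. the text before `::` starts with `Genus.species` and the text after it is the read ID. The function names (`parse_header`, `taxonomy_rows`, `write_taxonomy_tsv`) are made up for illustration, and the `f1`, `f2`, ... IDs are placeholders copied from the example table. Real feature IDs have to match whatever IDs end up in your feature table, which depends on how the import is handled.

```python
import csv
import io


def parse_header(header):
    """Split a header like '>Genus.species.CODE.PRIMER::read_id extra'
    into (label, read_id). The layout is assumed from the two example
    headers in this thread, not from any documented format."""
    name, _, rest = header.strip().lstrip(">").partition("::")
    parts = name.split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected header layout: {header!r}")
    fields = rest.split()
    read_id = fields[0] if fields else ""
    return f"{parts[0]} {parts[1]}", read_id


def taxonomy_rows(fasta_lines):
    """Return one (placeholder_id, label) pair per fasta record.
    Sequence lines are skipped; only lines starting with '>' are used."""
    rows = []
    for line in fasta_lines:
        if line.startswith(">"):
            label, _read_id = parse_header(line)
            rows.append((f"f{len(rows) + 1}", label))  # placeholder IDs
    return rows


def write_taxonomy_tsv(rows, handle):
    """Write the rows as a tab-separated table with an id/taxonomy header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "taxonomy"])
    writer.writerows(rows)


# Example using the two headers from the original post (sequences shortened).
example = [
    ">Sinningia.schiffneri.SS.799F::M01522:107:000000000-AW0BC:1:2104:16569:2530 1:N:0:AGTCAA",
    "GTAGTCCATGCCGTAAACGATGAG",
    ">Sinningia.magnifica.SM.799F::M01522:107:000000000-AW0BC:1:2103:8991:10744 1:N:0:AGTCAA",
    "GTAGTCCATGCCGTAAACGATGAG",
]
rows = taxonomy_rows(example)
buffer = io.StringIO()
write_taxonomy_tsv(rows, buffer)
tsv_text = buffer.getvalue()
```

`parse_header` also returns the read ID, in case it turns out to be the more useful key once the feature IDs are settled.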

What external software are you using? Is the taxonomy classification what matters most to you or are you using other methods, e.g., for data quality control? If it is only taxonomy, then honestly it would be far, far easier to just import fastq data into QIIME 2 and follow one of the typical workflows available, then export your data (as fasta) for taxonomy classification with this external method... then import those classifications to QIIME 2. As is, you have multiple format incompatibility issues (multiplexed fasta data, taxonomy IDs in the fasta header lines) that are not insurmountable but would probably be more difficult to sort out than the workaround I propose.

Let us know how we can help!

Fernanda_Costa · October 1, 2018, 1:40pm

Thank you very much for your reply!

And yes, I just got the fastq files for that data and will do as you suggest: run the whole QIIME 2 workflow until I get to the diversity analysis.

Thank you again and have a great week.

Cordially,
