import mothur files to QIIME2

Hello,
I am a new QIIME2 user and have a question for those who have expertise in both mothur and QIIME2. I have finished sequence clean-up (denoising) and basic analysis in mothur, and obtained shared and taxonomy files for visualization. Now I would like to visualize them in QIIME2. Could anyone let me know how to import the shared and taxonomy files from mothur into QIIME2? What kind of file conversion is needed? Thank you!

Welcome @arlandan!

The taxonomy file is easy. It looks similar to the taxonomy TSV format that QIIME 2 can import directly:

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path taxonomy.txt \
  --output-path taxonomy.qza

The shared file is a bit more complicated: it does not appear analogous to a normal observation matrix (e.g., an OTU table in TSV format) that can be converted to biom-format (the format QIIME 2 expects for importing to a FeatureTable[Frequency] artifact). You can see what the “classic OTU table” format that biom-format expects looks like on the biom-format website. (NOTE: biom-format is distinct from QIIME 2, so we will have limited ability to help if you have trouble working with biom format.)

The shared file format looks like a “classic” OTU table that has been transposed (one row per sample, with the sample name in the Group column) and given some additional columns (label and numOtus). So we can remove those extra columns, transpose the file (solution taken from here), convert it to biom-format, and then import it into QIIME 2:

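# Keep mothur's Group column (2) and the OTU columns (4 onward), dropping label (1) and numOtus (3)
cut -f 2,4- shared-file.txt | 
awk '
{ 
    # store every field of every row
    for (i=1; i<=NF; i++)  {
        a[NR,i] = $i
    }
}
# remember the widest row
NF>p { p = NF }
END {    
    # print each input column as one output row; fields are joined with a single space
    for(j=1; j<=p; j++) {
        str=a[1,j]
        for(i=2; i<=NR; i++){
            str=str" "a[i,j];
        }
        print str
    }
}' > table.txt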
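The awk script above joins fields with a single space, while biom convert reads a classic OTU table as tab-delimited; a space-separated file is consistent with the IndexError reported later in this thread. Below is a minimal sketch of the same cut-and-transpose idea that keeps tabs throughout. `shared_to_classic_table` is a hypothetical helper name, and the sketch assumes the usual mothur column order (label, Group, numOtus, then one column per OTU).

```shell
# Sketch: convert a mothur .shared file (stdin) into a tab-delimited
# OTU-by-sample table for biom convert (stdout).
# Assumes the usual mothur layout: label, Group, numOtus, Otu001, Otu002, ...
shared_to_classic_table() {
  # Keep Group (column 2) and the OTU columns (4 onward)
  cut -f 2,4- |
    awk 'BEGIN { FS = OFS = "\t" }
      {
        # remember every cell and the widest row
        for (i = 1; i <= NF; i++) a[NR, i] = $i
        if (NF > p) p = NF
      }
      END {
        # each input column becomes one output row, joined by tabs
        for (j = 1; j <= p; j++) {
          row = a[1, j]
          for (i = 2; i <= NR; i++) row = row OFS a[i, j]
          print row
        }
      }'
}

# Demo on a made-up shared file with two samples (S1, S2) and three OTUs
printf 'label\tGroup\tnumOtus\tOtu001\tOtu002\tOtu003\n0.03\tS1\t3\t5\t0\t2\n0.03\tS2\t3\t1\t7\t0\n' |
  shared_to_classic_table
```

Run as `shared_to_classic_table < shared-file.txt > table.txt`, it writes the table.txt that the biom convert step reads, with the first line starting with "Group" followed by the sample names.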

biom convert -i table.txt -o table.biom --table-type="OTU table" --to-json

qiime tools import \
  --input-path table.biom \
  --type 'FeatureTable[Frequency]' \
  --input-format BIOMV210Format \
  --output-path table.qza
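One likely snag in the commands above: in QIIME 2, BIOMV210Format is the HDF5-based BIOM 2.1 format, while a JSON table written by `--to-json` corresponds to BIOMV100Format. That mismatch may explain why the JSON route failed later in this thread and `--to-hdf5` worked. The sketch below (`biom_input_format` is a hypothetical helper) picks the matching `--input-format` from the file's first bytes; it assumes the HDF5 signature sits at byte 0, which is where h5py normally writes it, and that a JSON table starts with "{".

```shell
# Sketch: print the QIIME 2 input format that matches a BIOM file.
# HDF5 files (BIOM 2.1) begin with the bytes \x89 H D F ...  -> BIOMV210Format
# JSON files (BIOM 1.0) begin with "{"                        -> BIOMV100Format
# Assumes no HDF5 user block and no leading whitespace in JSON files.
biom_input_format() {
  if [ "$(head -c 4 "$1" | tail -c 3)" = "HDF" ]; then
    echo BIOMV210Format
  elif [ "$(head -c 1 "$1")" = "{" ]; then
    echo BIOMV100Format
  else
    echo "unrecognized BIOM file: $1" >&2
    return 1
  fi
}

# Example (requires biom-format and QIIME 2, so it is left commented out):
#   biom convert -i table.txt -o table.biom --table-type="OTU table" --to-json
#   qiime tools import \
#     --input-path table.biom \
#     --type 'FeatureTable[Frequency]' \
#     --input-format "$(biom_input_format table.biom)" \
#     --output-path table.qza
```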

I have not tested that on a real mothur shared file, so it may need some tweaking, but give it a spin! Please let us know what you find — it would be great to work out a tutorial for importing mothur formats to QIIME 2 :wink:

Hi Nicholas,
My response is running a bit late. Thank you very much for your reply. It took me a while today to learn what a bash script is, and then I got half of the job done. As you said, the taxonomy file was easy to convert; your command worked very well. As for the shared file, I was only able to transpose the OTU table. When I ran the biom convert step of the script you posted, I got the following errors:
Traceback (most recent call last):
  File "/Users/ran/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/biom/parse.py", line 660, in load_table
    table = parse_biom_table(fp)
  File "/Users/ran/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/biom/parse.py", line 412, in parse_biom_table
    t = Table.from_tsv(fp, None, None, lambda x: x)
  File "/Users/ran/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/biom/table.py", line 4631, in from_tsv
    t_md_name) = Table._extract_data_from_tsv(lines, **kwargs)
  File "/Users/ran/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/biom/table.py", line 4747, in _extract_data_from_tsv
    md_name = header[-1]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ran/miniconda3/envs/qiime2-2019.1/bin/biom", line 11, in <module>
    sys.exit(cli())
  File "/Users/ran/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/ran/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/ran/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/ran/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/ran/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/ran/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/biom/cli/table_converter.py", line 114, in convert
    table = load_table(input_fp)
  File "/Users/ran/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/biom/parse.py", line 662, in load_table
    raise TypeError("%s does not appear to be a BIOM file!" % f)
TypeError: mothur_shared.txt does not appear to be a BIOM file!

I searched for a way to solve this issue with no luck, and I am stuck here. Would you be able to tell me anything more about this? Thank you! Much appreciated!

Hey @arlandan,

You mentioned you were able to transpose successfully. Could you give us the first few lines of that file? Either upload it or paste the lines between backticks (```), with each backtick “fence” on its own line.

That will help us work out whether everything went well. I suspect it's just a missing comment header that biom requires.
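Another quick check is whether the transposed file is actually tab-separated, since that is how biom convert reads classic tables. The two helpers below are hypothetical names for illustration: `count_tab_fields` counts the tab-separated fields on the first line (1 means there are no tabs), and `spaces_to_tabs` turns runs of spaces into single tabs as an alternative to re-saving the file through a spreadsheet. The conversion assumes no OTU or sample name contains a space.

```shell
# Count tab-separated fields on the first line of stdin (1 = no tabs found)
count_tab_fields() {
  head -n 1 | awk 'BEGIN { FS = "\t" } { print NF }'
}

# Replace each run of spaces with a single tab (stdin -> stdout).
# Assumes names contain no spaces; consecutive tabs are squeezed too.
spaces_to_tabs() {
  tr -s ' ' '\t'
}

# Demo on a made-up space-delimited table
printf 'Group 39 40\nOtu001 270 278\n' | count_tab_fields                   # prints 1
printf 'Group 39 40\nOtu001 270 278\n' | spaces_to_tabs | count_tab_fields  # prints 3
```

For example, `spaces_to_tabs < table.txt > table.tsv` would produce a tab-delimited copy to pass to biom convert.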

Hi @ebolyen,

Thank you for your response. Here are the first few lines of the transposed file.

Group 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62
Otu001 270 278 220 121 309 327 336 184 309 275 316 128 151 152 1 0 0 0 0 0 263 210 328 3
Otu002 0 16 0 2 8 103 0 0 0 37 0 246 182 272 2 4 0 500 707 697 0 0 0 505
Otu003 64 112 109 51 136 83 114 107 85 61 78 78 79 71 0 0 0 0 0 0 79 64 104 2
Otu004 87 72 46 85 66 51 119 36 28 99 54 40 31 32 0 0 0 0 0 0 47 68 126 1
Otu005 0 0 0 342 17 13 4 40 110 47 21 1 7 0 9 8 20 176 0 2 0 0 0 163
Otu006 33 14 17 2 20 6 28 23 49 13 70 29 14 24 0 0 0 0 0 0 7 29 35 0
Otu007 0 1 17 0 0 4 0 5 0 0 9 6 5 3 0 0 0 0 0 0 137 228 0 0
Otu008 7 13 15 2 12 3 13 5 23 30 23 11 21 13 0 0 0 0 0 0 59 38 0 0
Otu009 54 0 43 16 9 10 0 18 16 11 5 4 15 4 0 0 0 0 0 0 25 11 19 0
Otu010 15 12 17 5 0 7 12 33 9 0 9 0 18 20 0 0 0 0 0 0 8 8 12 0

When I open it in Excel, all values are in the first column; the columns are not separated. So I converted it to a tab-delimited file and used that file in the following command:

biom convert -i mothur_shared2.txt -o mothur_shared.biom --table-type="OTU table" --to-hdf5

Hi @Nicholas_Bokulich, I think this is why I got the error message in my previous reply. Note that I used HDF5 instead of JSON, since JSON did not work for me; I don't know why.

Once I had the HDF5 BIOM file, I was able to import the taxonomy and OTU files into QIIME2 and make a taxa bar plot to view the microbial composition of my samples. However, the bar plot legend does not look right: the labels are all numbers, which I think are OTU numbers rather than taxa. Maybe one of these files needs a taxonomy column, or something else is wrong; I have no idea at this point. Could anyone help me with this? Thank you!

Best.

Got it in one. :+1:
I’m not familiar enough with the mothur format to know where to place one, but I’m pretty sure biom is expecting to find a taxonomy column.

Thank you @colinbrislawn!
The taxonomy file from mothur has that column, but the shared file, which is actually the OTU table, does not. Thanks for letting me know biom needs a column for taxonomy. A good place to start. Thanks again!

Best,

That is probably not the problem here, because the taxonomy is NOT coming from the biom table; the taxa bar plot takes it from the separate taxonomy artifact. Instead, it sounds like the taxonomy file may have its columns in an unexpected order. Could you share the first few lines of that file here?

Hi @Nicholas_Bokulich

This is what the taxonomy file looks like after conversion.
I have also uploaded the files, in case they are needed.
Thanks a lot !

mothur_taxonomy.qza (10.4 KB) mothur_taxonomy.qzv (1.2 MB)

thanks Arlan!
Notice that the second column is labeled “Taxon” but your taxonomy is in the third column. You need to do two things:

  1. delete the “size” column (your OTUs are being labeled with “size” in your barplots!)
  2. import as TSVTaxonomyFormat instead of HeaderlessTSVTaxonomyFormat
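Deleting the Size column can also be done on the command line rather than in a spreadsheet. A minimal sketch, assuming mothur's tab-separated consensus taxonomy layout (OTU, Size, Taxonomy); `drop_size_column` is a hypothetical helper name:

```shell
# Keep the OTU ID (column 1) and taxonomy string (column 3); drop Size (column 2).
# Reads a tab-separated mothur consensus taxonomy file on stdin.
drop_size_column() {
  cut -f 1,3
}

# Demo on a made-up two-OTU example
printf 'OTU\tSize\tTaxonomy\nOtu001\t270\tBacteria(100);Firmicutes(100);\nOtu002\t16\tBacteria(100);Bacteroidetes(100);\n' |
  drop_size_column
```

The header line is kept as is, so it still has to be removed or replaced before import.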

Thank you @Nicholas_Bokulich

I think I am almost there with your help. I deleted the “size” column and imported the file as TSVTaxonomyFormat, but I got an error message saying my taxonomy file is not a TSVTaxonomy file. Here is what I did to the taxonomy file:

The taxonomy file generated by mothur can be opened only with a text editor (not Excel). So I opened it in a text editor and copied and pasted the content into a new Excel file. I deleted the “size” column in Excel and saved it as a tab-delimited file. I imported this tab-delimited file into QIIME2 using the following command:

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format TSVTaxonomyFormat \
  --input-path mothur_taxonomy2.txt \
  --output-path mothur_taxonomy.qza

And then the error message showed up. Am I doing something wrong?

Could you explain why I should use TSVTaxonomyFormat instead of “Headerless…”, and what the difference is in terms of the output file?

Thanks so much!

It looks like your taxonomy file has two header lines. Remove the second header line and replace it with the correct header line (Feature ID[tab]Taxon). Better yet, just delete both header lines and import as HeaderlessTSVTaxonomyFormat.

No difference in terms of output. The difference is in the input: one has a header line, the other does not.
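Both routes can be scripted. This sketch uses hypothetical helper and file names, and assumes the input is the raw mothur consensus taxonomy file with a single header line (OTU, Size, Taxonomy). It drops the Size column and either removes the header, for HeaderlessTSVTaxonomyFormat, or swaps in the Feature ID[tab]Taxon header, for TSVTaxonomyFormat.

```shell
# Headerless variant: skip the first (header) line, keep columns 1 and 3
to_headerless_taxonomy() {
  tail -n +2 | cut -f 1,3
}

# Headed variant: write the "Feature ID<tab>Taxon" header, then the same rows
to_headed_taxonomy() {
  printf 'Feature ID\tTaxon\n'
  tail -n +2 | cut -f 1,3
}

# Usage (file names are placeholders; requires QIIME 2, so left commented out):
#   to_headerless_taxonomy < final.cons.taxonomy > taxonomy-headerless.tsv
#   qiime tools import --type 'FeatureData[Taxonomy]' \
#     --input-format HeaderlessTSVTaxonomyFormat \
#     --input-path taxonomy-headerless.tsv --output-path taxonomy.qza
#
#   to_headed_taxonomy < final.cons.taxonomy > taxonomy.tsv
#   qiime tools import --type 'FeatureData[Taxonomy]' \
#     --input-format TSVTaxonomyFormat \
#     --input-path taxonomy.tsv --output-path taxonomy.qza
```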

Hi @Nicholas_Bokulich

Thank you so much. They both worked, and the output files are exactly the same. So far, I have been able to convert both the shared and taxonomy files from mothur to QIIME2 format and proceed with further analysis! I greatly appreciate your help.

Best!