Is it possible to import Nanopore ITS data (after BLAST)?

(Heng-An ) #1

Hello,

I have ITS sequencing data from Nanopore GridION.
I have already run BLAST on the data against a custom database (UNITE).
Is it possible to import the BLAST results into QIIME 2 for further analysis?
If so, what format should I use?

I am really new to this area. Thanks in advance for any advice!

Best,
Heng-An

(Nicholas Bokulich) #2

Yes. If the BLAST results are in a tab-delimited format like the example below, you can import with this command:

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path taxonomy.txt \
  --output-path taxonomy.qza

The expected format:

seq1<tab>semicolon;delimited;taxonomy
seq2<tab>semicolon;delimited;taxonomy
...

If the taxonomy file has a header line, use the same command but set the input format to TSVTaxonomyFormat rather than HeaderlessTSVTaxonomyFormat.
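
For BLAST output in the tabular layout (-outfmt 6), a file in this format could be assembled roughly as in the sketch below. It is only a sketch under assumptions: the file names, the toy rows, and the subject-ID-to-taxonomy lookup file (unite_taxonomy.tsv) are all hypothetical, and it keeps only the first hit listed for each read, assuming BLAST lists the best hit first.

```shell
# Toy inputs (hypothetical data, for illustration only).
blast_dir=$(mktemp -d)

# BLAST tabular output (-outfmt 6), truncated here to three columns:
# column 1 = query (read) ID, column 2 = subject ID.
printf 'read1\tSH001\t98.5\nread1\tSH002\t91.0\nread2\tSH002\t99.1\n' > "$blast_dir/blast_hits.tsv"

# Assumed lookup table: subject ID <tab> semicolon-delimited taxonomy.
printf 'SH001\tk__Fungi;p__Ascomycota\nSH002\tk__Fungi;p__Basidiomycota\n' > "$blast_dir/unite_taxonomy.tsv"

# Keep the first hit per read and replace the subject ID with its taxonomy.
awk -F'\t' -v OFS='\t' '
  NR == FNR { tax[$1] = $2; next }    # first file: load subject ID -> taxonomy
  !($1 in seen) {                     # second file: only the first hit of each read
    seen[$1] = 1
    if ($2 in tax) print $1, tax[$2]  # reads whose hit has no taxonomy are skipped
  }
' "$blast_dir/unite_taxonomy.tsv" "$blast_dir/blast_hits.tsv" > "$blast_dir/taxonomy.txt"

cat "$blast_dir/taxonomy.txt"
```

Each read ID appears once in the first column of the result, which is what the import expects.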

Good luck!

(Heng-An ) #3

Thanks so much for the reply!
I have a question about the format, since I got an error message when importing the data.

What do “seq1” and “seq2” mean? Are they the IDs of each sequence, or the counts for each taxonomy group?

I tried two formats for the first column.
In the first format, I set “seq1” to the count for each taxonomy, and got this error:

Taxonomy format feature IDs must be unique. The following IDs are duplicated: 3, 1, 2, 5, 4, 7, 9, 6, 8, 12

In the second format, I set “seq1” to the sequence ID, and the import works.

Am I doing this correctly?

Thanks!!!

(Nicholas Bokulich) #4

The second format is correct: the first column must hold a unique ID for each sequence.
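
A quick way to check for the duplicate-ID error before importing is to list any ID that appears more than once in the first column. This is a minimal sketch using standard Unix tools; the function name duplicate_ids is made up for illustration, and taxonomy.txt stands for whatever file is being imported.

```shell
# List feature IDs that occur more than once in column 1 of a
# tab-separated taxonomy file (hypothetical helper, not part of QIIME 2).
duplicate_ids() {
  cut -f1 "$1" | sort | uniq -d
}

# Usage: no output means every ID is unique.
# duplicate_ids taxonomy.txt
```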

(Heng-An ) #5

Hi @Nicholas_Bokulich,
Thanks for your reply!! That really helps a lot.

I have another question about the data format.

  1. I am trying to run qiime taxa barplot and found that I also need to import a file of type FeatureTable[Frequency]. I don’t have a .biom file to import with the command below:
    qiime tools import \
      --input-path feature-table-v210.biom \
      --type 'FeatureTable[Frequency]' \
      --input-format BIOMV210Format \
      --output-path feature-table-2.qza
    Is there another way to get this format? Or, if I need to create the file myself, what should the format look like?

  2. In total, I have 96 samples, which were demultiplexed with other tools. For now, I have picked only one of the files to test the commands. If I want to analyze all the samples (e.g. for alpha- and beta-diversity analysis), it looks like all 96 samples need to be in one .qza file. Is this correct?

(Nicholas Bokulich) #6

To create a barplot (or any other representation of species abundance) you need to know the abundance of each species in each sample. This is usually represented as an observation matrix, which can be converted to biom format and then imported to QIIME 2. See biom-format conversion instructions (not a part of QIIME 2) for details on how to perform this conversion, and expected formats.

If you are importing as a feature table, yes, all samples should be in the same feature table. Note: it is easy to merge feature tables inside QIIME 2 if you would rather import each sample separately.
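
As a rough illustration of the observation matrix described above, the sketch below counts reads per taxon in each sample and writes a tab-separated table with one row per taxon and one column per sample. Everything specific here is an assumption: the per-sample files (sampleA.tsv, sampleB.tsv) are toy data in the two-column read ID / taxonomy layout, the sample name is taken from the file name, and the taxonomy string itself is used as the feature ID.

```shell
# Toy per-sample assignments (hypothetical data): read ID <tab> taxonomy.
table_dir=$(mktemp -d)
printf 'read1\tk__Fungi;p__Ascomycota\nread2\tk__Fungi;p__Ascomycota\nread3\tk__Fungi;p__Basidiomycota\n' > "$table_dir/sampleA.tsv"
printf 'read4\tk__Fungi;p__Basidiomycota\n' > "$table_dir/sampleB.tsv"

# Count reads per taxon per sample and print a taxon-by-sample matrix.
awk -F'\t' -v OFS='\t' '
  FNR == 1 {                        # new input file: derive the sample name
    sample = FILENAME
    sub(/.*\//, "", sample)         # drop the directory part
    sub(/\.tsv$/, "", sample)       # drop the extension
    samples[++n] = sample
  }
  { count[$2, sample]++; taxa[$2] = 1 }
  END {
    header = "#OTU ID"
    for (i = 1; i <= n; i++) header = header OFS samples[i]
    print header
    for (t in taxa) {               # row order is not guaranteed
      row = t
      for (i = 1; i <= n; i++) row = row OFS (count[t, samples[i]] + 0)
      print row
    }
  }
' "$table_dir/sampleA.tsv" "$table_dir/sampleB.tsv" > "$table_dir/feature-table.tsv"

cat "$table_dir/feature-table.tsv"
```

A table like this would still need to go through the biom-format conversion before it can be imported. Keep in mind that the feature IDs in the table have to be present in the taxonomy file, so with this layout the taxonomy file would need to be keyed by the same taxonomy strings rather than by read ID. If each sample is imported separately instead, the merge step is qiime feature-table merge, with --i-tables given once per table.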