Data import ASV table with taxonomy and representative sequences

I have full-length 16S+18S microbiome sequencing data (the sequences are not all the same length). The data I received are already demultiplexed, with ASVs created and taxonomy assigned to them. I don't know which software was used to generate the ASVs. I understand that I can import an ASV table and representative sequences into QIIME 2, but I don't know what format the data should be in. I would like to import the ASV table, representative sequences, and taxonomy into QIIME 2 for further analysis (creating alpha and beta diversity plots based on metadata). The format of the data that I have is shown below.

  1. ASV table with taxonomy
    The screenshot shows 4 sample columns; each ASV table has 96 such columns.

  2. Representative sequences in a FASTA file

Also, I would like to know whether it is possible to combine 10 such ASV table / representative FASTA file pairs. I have 960 samples, which were processed in ten 96-well plates, each plate with its own barcode. I received the data as 10 ASV table / representative sequence file pairs, but they are essentially a single data set. Is it possible to import and analyze all of them together? How do I import the metadata file in such a case?

Many thanks for any help or tips,

Hi @Gaurav_C ,
It looks like your feature table is a tab-separated text file, but the format is very strange because each taxonomic rank is a separate column and you have uneven taxonomic ranks.

The table cannot be imported as-is because the format is so unusual.

First you should convert to a biom-format table (see the biom-format documentation online for more details; this software is not part of QIIME 2 but is installed as part of the QIIME 2 installation).
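As a rough illustration of the reshaping step, here is a minimal Python sketch that splits a combined table into a counts-only TSV and a separate two-column taxonomy TSV. The screenshot is not reproduced here, so the column names (`ASV_ID`, the rank names, `Sample_1`, `Sample_2`) are assumptions and need to be adjusted to the real header.

```python
import csv
import io

# Assumed rank column names; change these to match the real header.
RANK_COLUMNS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def split_combined_table(lines, id_column="ASV_ID", rank_columns=RANK_COLUMNS):
    """Split a combined ASV/taxonomy TSV into count rows and taxonomy rows."""
    reader = csv.DictReader(lines, delimiter="\t")
    # Every column that is neither the ID nor a rank is treated as a sample.
    sample_columns = [c for c in reader.fieldnames
                      if c != id_column and c not in rank_columns]
    table_rows = [["#OTU ID"] + sample_columns]
    taxonomy_rows = [["Feature ID", "Taxon"]]
    for row in reader:
        feature_id = row[id_column]
        table_rows.append([feature_id] + [row[s] for s in sample_columns])
        # Join only the ranks that are present, so uneven depths are tolerated.
        ranks = [(row.get(r) or "").strip() for r in rank_columns]
        ranks = [r for r in ranks if r]
        taxonomy_rows.append([feature_id, "; ".join(ranks) or "Unassigned"])
    return table_rows, taxonomy_rows


def to_tsv(rows):
    """Serialize a list of rows as tab-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Tiny made-up example; real tables would be read from a file instead.
example = (
    "ASV_ID\tKingdom\tPhylum\tSample_1\tSample_2\n"
    "asv_a\tBacteria\tFirmicutes\t10\t0\n"
    "asv_b\tEukaryota\t\t3\t7\n"
)
table_rows, taxonomy_rows = split_combined_table(io.StringIO(example))
print(to_tsv(table_rows))
print(to_tsv(taxonomy_rows))
```

The counts output uses the `#OTU ID` header of the classic tab-separated biom layout, so it should be a suitable input for `biom convert`; the taxonomy output uses the `Feature ID` / `Taxon` header that QIIME 2 accepts for TSV taxonomy files. Treat both as starting points to check against the current documentation.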

Once you have this in biom-format you can import to QIIME 2 as described in this tutorial:

The FASTA file can be imported as-is; see also the tutorial above.

See the q2-feature-table plugin help documentation... that plugin has actions for merging feature tables (ASV tables) as well as sequence data. You can see the help docs for individual plugins here:

Sample metadata are not imported; they are passed directly as a TSV file (in an appropriate format). The tutorials in the documentation that I linked to include some examples.
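For the ten-plate case, one possible way to prepare a single metadata TSV is sketched below. It is only an illustration under assumptions: the per-plate metadata files, the `treatment` column, the sample IDs, and the plate label `3284` are all hypothetical, and the sample IDs must match those used in the merged feature table.

```python
import csv
import io


def combine_plate_metadata(plates):
    """Combine per-plate metadata into one TSV with a 'plate' column.

    plates: {plate_label: TSV text whose first column holds the sample IDs}
    """
    combined, columns, seen = [], [], set()
    for plate_label, text in plates.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        id_column = reader.fieldnames[0]
        # Collect the union of metadata columns across plates, in first-seen order.
        for name in reader.fieldnames[1:]:
            if name not in columns and name != "plate":
                columns.append(name)
        for row in reader:
            sample_id = row[id_column]
            # Sample IDs must be unique across the whole combined file.
            if sample_id in seen:
                raise ValueError(f"duplicate sample ID: {sample_id}")
            seen.add(sample_id)
            record = {"sample-id": sample_id, "plate": plate_label}
            record.update({k: v for k, v in row.items() if k != id_column})
            combined.append(record)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["sample-id", "plate"] + columns,
                            delimiter="\t", lineterminator="\n", restval="")
    writer.writeheader()
    writer.writerows(combined)
    return out.getvalue()


# Hypothetical per-plate metadata; real files would be read from disk.
plates = {
    "3283": "sample-id\ttreatment\nS1\tcontrol\nS2\tdrug\n",
    "3284": "sample-id\ttreatment\nS3\tcontrol\n",
}
metadata_tsv = combine_plate_metadata(plates)
print(metadata_tsv)
```

Keeping the plate as its own column also makes it easy to check for plate (batch) effects later in the diversity analyses.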

If you have any additional questions please open a separate topic to keep separate questions organized. Thanks!

Good luck!


Thanks Nicholas for the help. I was able to import the sequences and the ASV table.

I have a follow-up question:

The fasta file and the ASV table have unique feature IDs but the IDs are not representative of the sequence. The exact same sequence will have a different feature ID for different samples.

One example from the snapshot of the FASTA file above: the feature ID is "3283_SAMPLE_44_MOLECULE_1_CONTIG_1", where:
3283: the plate number (so I have 10 different numbers for 10 plates)
SAMPLE_44: I think this is the well number (1-96) of the plate where the contig was first observed
MOLECULE_1: the number of the unique sequence observed in that sample
CONTIG_1: every ID ends with this text

So the feature ID is not representative of the sequence itself; it is an arbitrary ID assigned to the sequence. The same feature IDs were used when I imported the FASTA file and ASV table. With this type of feature ID, is there a way to merge the imported rep-seqs and feature tables from the 10 different plates?

Hi @Gaurav_C ,

I see, so contigs were generated per sample rather than on pooled samples.

Yes, I suppose you could try clustering the contigs into OTUs at an appropriate threshold. But the contigs might only partially overlap, as you already described, so assembling contigs on pooled reads might be a better approach overall.
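For the narrower case where contigs from different samples or plates are exactly identical, a simple alternative to clustering is to replace the per-plate IDs with a hash of the sequence itself and sum the counts of features that collapse onto the same hash. The sketch below is only an illustration of that idea, not a replacement for clustering: it does not handle partially overlapping or reverse-complemented contigs, and the second and third IDs, the sequences, and the sample names are made up.

```python
import hashlib
from collections import defaultdict


def parse_fasta(text):
    """Return {record_id: sequence} from FASTA text (multi-line records allowed)."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line.upper()
    return records


def rekey_by_sequence(seqs, counts):
    """Replace arbitrary IDs with MD5 hashes of the sequence, summing duplicates.

    seqs:   {old_id: sequence}
    counts: {old_id: {sample_id: count}}
    """
    new_seqs = {}
    new_counts = defaultdict(lambda: defaultdict(int))
    for old_id, sequence in seqs.items():
        new_id = hashlib.md5(sequence.encode("ascii")).hexdigest()
        new_seqs[new_id] = sequence
        for sample_id, count in counts.get(old_id, {}).items():
            new_counts[new_id][sample_id] += count
    return new_seqs, {k: dict(v) for k, v in new_counts.items()}


# Made-up records from two plates; the first two share the same sequence.
fasta = (
    ">3283_SAMPLE_44_MOLECULE_1_CONTIG_1\nACGTACGT\n"
    ">3284_SAMPLE_2_MOLECULE_5_CONTIG_1\nACGTACGT\n"
    ">3284_SAMPLE_7_MOLECULE_1_CONTIG_1\nTTGGCCAA\n"
)
counts = {
    "3283_SAMPLE_44_MOLECULE_1_CONTIG_1": {"P3283_S44": 12},
    "3284_SAMPLE_2_MOLECULE_5_CONTIG_1": {"P3284_S2": 5},
    "3284_SAMPLE_7_MOLECULE_1_CONTIG_1": {"P3284_S7": 9},
}
merged_seqs, merged_counts = rekey_by_sequence(parse_fasta(fasta), counts)
print(len(merged_seqs), "distinct sequences")
```

Once IDs are derived from the sequences, tables and representative sequences from different plates share a common ID space, so the merge actions in q2-feature-table can combine them without treating identical sequences as separate features.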
