How to analyse microbiome data that has already been partially processed?

I was given some data produced by microbiome 16S rRNA sequencing and I’m trying to understand how to go about analysing it.

  • One file has relative abundance data of ASVs in different samples
  • Another file has information about the samples, such as age, sex, disease state, etc.
  • The last file contains taxonomic information for these ASVs, along with their sequences

From what I’ve read so far, the files I’ve been given are the result of the early steps of the general QIIME pipeline (or any other microbiome analysis pipeline, really).
I was wondering whether these files are sufficient for analyses in QIIME, such as comparing diversity between groups, or whether there are simpler tools I can use and I’m overcomplicating it.
I’ve read the tutorials online and still don’t know where to start.
Also, most tutorials on the QIIME website include mapping files and metadata files, which I do not have.
NB: The files are all in Excel format.

Welcome to the forum @Newbie!

QIIME 2 is designed to analyze data at any stage: raw, partially processed, or even fully analyzed. This is accomplished by "importing" different data types into QIIME 2 artifacts (i.e., converting different standard interoperable formats into QIIME 2 files).

It's a matter of opinion, but diversity analyses are not simple to carry out, so importing your data and running them in QIIME 2 is probably the easiest route, unless you are, e.g., proficient with R and would prefer to use vegan or similar packages.

Since all of your files are in Excel format, this is where I would start:

  1. convert each file to TSV
  2. remove special characters (Excel will insert special line breaks and do other things that totally destroy files... Excel should really never be used for touching data because it causes woes such as these)
  3. import the files following the tutorials here: https://docs.qiime2.org/2020.11/tutorials/importing/
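
Step 2 could be sketched in the shell roughly as below, assuming each sheet has already been saved from Excel as tab-delimited text. The function name `clean_tsv` and the file names are made up for illustration, and the set of "special characters" it handles (a UTF-8 byte-order mark plus Windows or old Mac line endings) is only my guess at what Excel tends to leave behind; your files may need more than this.

```shell
# clean_tsv IN OUT: hypothetical helper, not part of QIIME 2.
# Writes a copy of IN to OUT with Excel-style debris removed.
clean_tsv() {
  bom=$(printf '\357\273\277')   # UTF-8 byte-order mark (octal escapes)
  cr=$(printf '\r')              # carriage return
  # 1) drop a byte-order mark at the very start of the file
  # 2) strip the CR from Windows (CRLF) line endings
  # 3) turn any remaining bare CR (old Mac line ending) into a newline
  LC_ALL=C sed -e "1s/^${bom}//" -e "s/${cr}\$//" "$1" | tr '\r' '\n' > "$2"
}
```

You would call it as `clean_tsv table-from-excel.txt table.tsv`, then open the result in a plain text editor to check that the header row and the tabs look sane before importing anything.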

Now let me tell you about what you have and how to import it:

The file with the relative abundance data is your "feature table". Use biom to convert it to BIOM format, then import it using these instructions.
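
As a rough sketch of that convert-then-import step, the two commands could be wrapped like this. The `biom convert` and `qiime tools import` options are the ones shown in the importing tutorial; the wrapper name `import_feature_table` and all file names are placeholders of mine. Two assumptions to check against your own data: biom generally expects the header row of the TSV to begin with `#OTU ID`, and the tutorial's `FeatureTable[Frequency]` type is meant for counts, so a relative abundance table may call for `FeatureTable[RelativeFrequency]` instead.

```shell
# import_feature_table TSV [TYPE]: hypothetical wrapper around two real commands.
# TYPE defaults to the semantic type used in the importing tutorial.
import_feature_table() {
  tsv=$1
  table_type=${2:-FeatureTable[Frequency]}

  # Convert the cleaned TSV to BIOM v2.1.0 (HDF5)
  biom convert -i "$tsv" -o feature-table.biom \
    --table-type="OTU table" --to-hdf5

  # Import the BIOM file as a QIIME 2 artifact
  qiime tools import \
    --input-path feature-table.biom \
    --type "$table_type" \
    --input-format BIOMV210Format \
    --output-path feature-table.qza
}
```

For example, `import_feature_table table.tsv 'FeatureTable[RelativeFrequency]'` would write `feature-table.qza` to the current directory.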

The file with the sample information (age, sex, disease state, etc.) sounds like your sample metadata file, so I am not sure what you mean when you say you do not have mapping or metadata files. You may just need to reformat this file to fit the specifications described here.

Once it is correctly formatted, the metadata file is ready to use as a TSV; it is not "imported" in any way.

The taxonomy file can be imported like this:

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-path taxonomy.txt \
  --output-path taxonomy.qza

Once the feature table and taxonomy are imported, you can start analyzing your data, e.g., see this tutorial for diversity analyses that you can run with those data.

Note, though, that it sounds like you do not have sequences or a phylogenetic tree, so do not attempt to run the phylogenetic analyses described there. Instead of the core-metrics-phylogenetic pipeline, use the core-metrics pipeline in its place.
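
A minimal sketch of the non-phylogenetic run, assuming the feature table has been imported and the metadata TSV validates. The wrapper name `run_core_metrics`, the file names, and the output directory are placeholders; the sampling depth has to be chosen from your own data. Keep in mind that this pipeline rarefies to an integer sampling depth, so it expects a table of counts rather than proportions.

```shell
# run_core_metrics TABLE DEPTH METADATA: hypothetical wrapper around
# the real `qiime diversity core-metrics` pipeline (no tree required).
run_core_metrics() {
  qiime diversity core-metrics \
    --i-table "$1" \
    --p-sampling-depth "$2" \
    --m-metadata-file "$3" \
    --output-dir core-metrics-results
}
```

A call would look like `run_core_metrics feature-table.qza 1000 metadata.tsv`, where 1000 is only a stand-in value.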

Good luck!


Thank you for the help.
I followed the tutorials given and was able to produce diversity data for my samples.

(The taxonomy file actually has the ASV sequences in it, so I was also able to align them and build a phylogeny, which I used for the core-metrics-phylogenetic analysis.)
Hopefully, I can make some good correlations!

Many thanks
