Importing metadata from Integrative Human Microbiome Project (iHMP) publicly available data

Hello Qiime2 Forum,

As a way to practice more with Qiime2 and the general command lines, I’ve been wanting to use publicly available data from the Integrative Human Microbiome Project (iHMP) website to perform some personal experiments.

This is the website where one can obtain the data: https://portal.hmpdacc.org/

For anyone that has had experience with importing metadata from iHMP to Qiime2, how would you do it? I’m aware that you can download a manifest file too on iHMP. Every time I download either file option, the resulting Excel spreadsheet doesn’t have any of the genetic sequences for Qiime to recognize and perform statistical analyses.

Feel free to let me know what you recommend I do. Especially those that have worked with publicly available data, what are some resources and ways that I can apply those sets of data to Qiime? Thank you!

Hello Hasti,

Oh wow, I contributed to the HMP when I was an intern, 5 years ago. I’m glad to see you are interested in the data. :hugs:

And welcome to the forums! :qiime2:

Take a look at the “File Counts by File Format” and “File Counts by File Type” panels on the portal. For formats like .biom and .fastq, you can import the files by following the instructions over here.
https://docs.qiime2.org/2020.2/tutorials/importing/

I’m guessing the Fastq Manifest Format will be best, but you can see what works for you.

Another option is Qiita. It offers similar formats to the HMP, and its data can also be imported into Qiime2. :gift:

Let me know what you try and how it works for you.

Colin

Hello Colin,

Such a pleasure to hear from you. That’s cool to know that you interned with them! I’m happy to use the forums so far.
So this is an example of one of the Manifest files that I chose to download from iHMP:

Body site: Feces
File Type: 16s raw seq set
File Format: Fastq
Project: Integrative Human Microbiome Project
Study: MOMS-PI
Gender: Female

For the .tsv manifest data that I downloaded, the columns are listed as follows: file_id, md5, size, urls, and sample_id. From looking at the Qiime2 docs for Fastq Manifest Format, it seems like the file needs to be formatted with the first column being sample-id and the second one being absolute-filepath. For some reason though, the urls on the file don’t have an ending of .gz. Would this make a difference?

Thank you again for your help!

Hello again,

That’s right. That tsv file is not a Qiime 2 Fastq Manifest File, and you will have to make a copy and edit it so it matches.

As for the missing .gz ending: not really. A .fastq file is uncompressed and a .fastq.gz file is compressed. The fastq manifest format supports both.
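
To illustrate why the ending matters so little, here is a minimal Python sketch that reads a FASTQ file whether or not it is gzip-compressed. The function names `open_fastq` and `count_reads` are made up for this example, and it assumes simple four-line FASTQ records. It is only an illustration, not how Qiime 2 handles files internally.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, whether or not it is gzip-compressed."""
    # Look at the first two bytes instead of trusting the file extension.
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_reads(path):
    """Count reads, assuming each record is exactly four lines long."""
    with open_fastq(path) as handle:
        return sum(1 for _ in handle) // 4
```

Calling `count_reads` on a .fastq file and on the gzipped copy of the same file should give the same number.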

As this is your first time using Qiime2, you might find it useful to complete one of the excellent tutorials that are filled with examples of how to do things with Qiime 2. I really like the Parkinson’s Mouse Tutorial, and that’s perfect for your data set because they also use the fastq manifest format to import their data.
:older_adult: :mouse2: :qiime2:

I’m here if you have any questions about Qiime or about the HMP.

Colin

Thank you for your response.

Oh, I see what you mean! I’ll test that out some more and see what I come up with. I will keep you updated and let you know if I have any other questions.

I’ve gone through all of the tutorials previously but I think it would be smart to look back at them again. Thank you for the recommendation.

Best,
-Hasti

Hello Hasti,

Oh awesome! I’m never really sure what experience folks have, so I’m glad to hear that you have tried the tutorials. I’m also glad you are stretching your wings using real data from public projects.

Let us know if you have more questions,
Colin

Hello Colin,

I still seem to have trouble importing the data from iHMP. Starting from the manifest file that I downloaded from the publicly available data (same project I mentioned earlier), this is how I reformatted it in LibreOffice for Qiime2:
First column and what’s underneath:
|sample-id |
|76612bd9a41885add4f6b0b7689cad82|

Second column and what’s underneath:
| absolute-filepath |
| fasp://aspera.ihmpdcc.org/ptb/genome/microbiome/16s/raw/EP311478_K90_BS1D.rawseqset.tar |

This format is similar for the rest of the rows. I only formatted the file to have the first column be sample-id and the second column be absolute-filepath. I noticed that the url has an ending of .tar. Could this be the reason why I have an error? At the moment, I’m still stuck with how I can import the data simply because of the formatting.

This was the command I attempted but produced an error:
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path hmp_manifest_262a1bf3b0.tsv \
  --output-path HMPManifest.qza \
  --input-format SingleEndFastqManifestPhred33V2

This was the error:

There was a problem importing hmp_manifest_262a1bf3b0.tsv:

hmp_manifest_262a1bf3b0.tsv is not a(n) SingleEndFastqManifestPhred33V2 file:

Filepath on line 1 and column “absolute-filepath” could not be found (fasp://aspera.ihmpdcc.org/ptb/genome/microbiome/16s/raw/EP311478_K90_BS1D.rawseqset.tar) for sample “76612bd9a41885add4f6b0b7689cad82”.

What do you recommend I do next? Thank you again for all of your help!

Good to hear from you again!

Looks like your fastq manifest is off to a good start. :scroll:

I think you are pretty close! I think you should follow the examples more closely :wink:

In the fastq manifest format shown in the importing tutorial and in the pd-mice tutorial, are the files you import .tar files or .fastq files?
Have they already been downloaded to the local computer, or are they URLs?
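
If you want to check a manifest yourself before running the import, here is a minimal Python sketch. It flags rows whose absolute-filepath is a URL, is not an absolute path, does not exist on the local computer, or does not end in .fastq or .fastq.gz. The function name `check_manifest` and the wording of the messages are made up for this example. This is not Qiime 2's own validation code, just a rough approximation of the kind of check behind the error above.

```python
import csv
import os


def check_manifest(manifest_path):
    """Return (sample_id, filepath, problem) for rows that look unimportable."""
    problems = []
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            # Strip stray spaces that spreadsheet editing can leave behind.
            sample_id = (row.get("sample-id") or "").strip()
            filepath = (row.get("absolute-filepath") or "").strip()
            if "://" in filepath:
                problems.append((sample_id, filepath, "URL, not a local file"))
            elif not os.path.isabs(filepath):
                problems.append((sample_id, filepath, "not an absolute path"))
            elif not os.path.isfile(filepath):
                problems.append((sample_id, filepath, "file not found"))
            elif not filepath.endswith((".fastq", ".fastq.gz")):
                problems.append((sample_id, filepath, "not a .fastq or .fastq.gz file"))
    return problems
```

An empty list means every row points at a local FASTQ file. It does not guarantee that the import will succeed.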

Colin

Good to hear back from you too! I truly appreciate your help and bearing with me!

From what I see in the importing and the pd-mice tutorial, the files that you import are .fastq files. From what I’m noticing, the files in the tutorials are ones that, when opened, contain the samples and information about them, like the barcode sequences. However, the manifest file that I downloaded from iHMP only has the sample IDs and the URLs with .tar endings. I tried downloading the individual URLs to the local computer, but they did not seem to work individually either.

I will look closer at the examples! Let me know what you think.

Thank you,
-Hasti

I think you are on the right track.

Keep in mind that the HMP manifest files and Qiime2 fastq manifest files are not the same format! You will have to make sure you convert everything to the right format, using the tutorial as an example.
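
As a rough sketch of that conversion in Python: the code below reads the HMP manifest (columns file_id, md5, size, urls, sample_id), unpacks each already-downloaded .tar archive, and writes a Qiime2 fastq manifest with sample-id and absolute-filepath columns. Several things here are assumptions and not facts about the HMP data: that each archive was saved under the last part of its URL, that the urls column may hold a comma-separated list, and that each archive contains exactly one single-end FASTQ file. The function name `build_manifest` and its parameters are made up for this example, and downloading the archives is not covered.

```python
import csv
import os
import shutil
import tarfile


def build_manifest(hmp_manifest, download_dir, extract_dir, out_manifest):
    """Unpack downloaded archives and write a Qiime2-style fastq manifest."""
    rows = []
    with open(hmp_manifest, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Assumption: the archive was saved under the URL's base name.
            url = row["urls"].split(",")[0].strip()
            archive = os.path.join(download_dir, os.path.basename(url))
            sample_dir = os.path.join(extract_dir, row["sample_id"])
            os.makedirs(sample_dir, exist_ok=True)
            with tarfile.open(archive) as tar:
                members = [m for m in tar.getmembers()
                           if m.isfile() and m.name.endswith((".fastq", ".fastq.gz"))]
                # Assumption: one single-end FASTQ per archive; stop otherwise.
                if len(members) != 1:
                    raise ValueError(f"{archive}: expected 1 FASTQ, found {len(members)}")
                dest = os.path.join(sample_dir, os.path.basename(members[0].name))
                # Copy the member out by hand so paths stored in the archive are ignored.
                with tar.extractfile(members[0]) as src, open(dest, "wb") as dst:
                    shutil.copyfileobj(src, dst)
            rows.append([row["sample_id"], os.path.abspath(dest)])
    with open(out_manifest, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath"])
        writer.writerows(rows)
```

If an archive turns out to hold paired-end reads or several files, the sketch stops with an error, and the manifest layout would need to follow the paired-end examples in the importing tutorial.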

The Qiime2 tutorial I’m referencing (pd-mice tutorial) focuses on importing 16S amplicon data from .fastq files, so that’s the kind of data I am recommending you find from HMP. Other Qiime2 plugins let you import everything from metabolites to shotgun sequences, but I think 16S is a good place to start.

We are always here to help!

Colin