Importing metadata from Integrative Human Microbiome Project (iHMP) publicly available data

Hello Qiime2 Forum,

As a way to practice more with Qiime2 and the command line in general, I’ve been wanting to use publicly available data from the Integrative Human Microbiome Project (iHMP) website to perform some personal experiments.

This is the website where one can obtain the data: https://portal.hmpdacc.org/

For anyone that has had experience with importing metadata from iHMP to Qiime2, how would you do it? I’m aware that you can download a manifest file too on iHMP. Every time I download either file option, the resulting Excel spreadsheet doesn’t have any of the genetic sequences for Qiime to recognize and perform statistical analyses.

Feel free to let me know what you recommend I do. Especially for those who have worked with publicly available data: what are some resources and ways that I can apply those sets of data to Qiime? Thank you!

Hello Hasti,

Oh wow, I contributed to the HMP when I was an intern, 5 years ago. I'm glad to see you are interested in the data. :hugs:

And welcome to the forums! :qiime2:

Take a look at the "File Counts by File Format" and "File Counts by File Type" charts. For formats like .biom and .fastq, you can import the files by following the instructions over here.
https://docs.qiime2.org/2020.2/tutorials/importing/

I'm guessing the Fastq Manifest Format will be best, but you can see what works for you.

Another option is Qiita. It offers formats similar to the HMP's, and these can also be imported into Qiime2. :gift:

Let me know what you try and how it works for you.

Colin

Hello Colin,

Such a pleasure to hear from you. That’s cool to know that you interned with them! I’m happy to use the forums so far.
So this is an example of one of the Manifest files that I chose to download from iHMP:

Body site: Feces
File Type: 16s raw seq set
File Format: Fastq
Project: Integrative Human Microbiome Project
Study: MOMS-PI
Gender: Female

For the .tsv manifest data that I downloaded, the columns are listed as follows: file_id, md5, size, urls, and sample_id. From looking at the Qiime2 docs for Fastq Manifest Format, it seems like the file needs to be formatted with the first column being sample-id and the second one being absolute-filepath. For some reason though, the urls on the file don’t have an ending of .gz. Would this make a difference?

Thank you again for your help!

Hello again,

That's right. That tsv file is not a Qiime 2 Fastq Manifest File, and you will have to make a copy and edit it so it matches.

As for the missing .gz ending: not really. A .fastq file is uncompressed and a .fastq.gz file is compressed. The fastq manifest format supports both.
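
If you ever want to peek inside one of these files without caring whether it is compressed, here is a minimal Python sketch. The function names open_fastq and count_reads are just ones I made up for illustration, and the read count assumes the usual four lines per FASTQ record.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a FASTQ file as text, whether or not it is gzip-compressed."""
    path = Path(path)
    # Gzip files start with the two magic bytes 0x1f 0x8b, so look at the
    # content instead of trusting the file extension.
    with open(path, "rb") as handle:
        is_gzipped = handle.read(2) == b"\x1f\x8b"
    if is_gzipped:
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_reads(path):
    """Count records, assuming four lines per FASTQ record."""
    with open_fastq(path) as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4
```

Calling count_reads on a .fastq file and on its .fastq.gz copy should give the same number, which is a quick sanity check that a download is a real FASTQ file.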

As this is your first time using Qiime2, you might find it useful to complete one of the excellent tutorials that are filled with examples of how to do things with Qiime 2. I really like the Parkinson’s Mouse Tutorial, and that's perfect for your data set because they also use the fastq manifest format to import their data.
:older_adult: :mouse2: :qiime2:

I'm here if you have any questions about Qiime or about the HMP.

Colin

Thank you for your response.

Oh, I see what you mean! I’ll test that out some more and see what I come up with. I will keep you updated and let you know if I have any other questions.

I’ve gone through all of the tutorials previously but I think it would be smart to look back at them again. Thank you for the recommendation.

Best,
-Hasti

Hello Hasti,

Oh awesome! I'm never really sure what experience folks have, so I'm glad to hear that you have tried the tutorials. I'm also glad you are stretching your wings using real data from public projects.

Let us know if you have more questions,
Colin

Hello Colin,

I still seem to have trouble importing the data from iHMP. Starting from the manifest file that I downloaded from the publicly available data (the same project I mentioned earlier), this is how I reformatted the file in LibreOffice on Qiime2:
First column and what’s underneath:
|sample-id |
|76612bd9a41885add4f6b0b7689cad82|

Second column and what’s underneath:
| absolute-filepath |
| fasp://aspera.ihmpdcc.org/ptb/genome/microbiome/16s/raw/EP311478_K90_BS1D.rawseqset.tar |

This format is similar for the rest of the rows. I only formatted the file to have the first column be sample-id and the second column be absolute-filepath. I noticed that the url has an ending of .tar. Could this be the reason why I have an error? At the moment, I’m still stuck with how I can import the data simply because of the formatting.

This was the command I attempted, which produced an error:
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path hmp_manifest_262a1bf3b0.tsv \
  --output-path HMPManifest.qza \
  --input-format SingleEndFastqManifestPhred33V2

This was the error:

There was a problem importing hmp_manifest_262a1bf3b0.tsv:

hmp_manifest_262a1bf3b0.tsv is not a(n) SingleEndFastqManifestPhred33V2 file:

Filepath on line 1 and column “absolute-filepath” could not be found (fasp://aspera.ihmpdcc.org/ptb/genome/microbiome/16s/raw/EP311478_K90_BS1D.rawseqset.tar) for sample “76612bd9a41885add4f6b0b7689cad82”.

What do you recommend I do next? Thank you again for all of your help!

Good to hear from you again!

Looks like your fastq manifest is off to a good start. :scroll:

I think you are pretty close! I think you should follow the examples more closely :wink:

In the fastq manifest format shown in the importing tutorial and in the pd-mice tutorial, are the files you import .tar files or .fastq files?
Have they already been downloaded to the local computer, or are they URLs?
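
If it helps, those two questions can be turned into a quick pre-check you run before qiime tools import. This is only a sketch: check_manifest is a name I made up, it assumes the two column headers from the importing tutorial (sample-id and absolute-filepath), and the extension test is my own heuristic rather than something Qiime 2 does.

```python
import csv
import os


def check_manifest(manifest_path):
    """Return (row_number, sample_id, problem) for each suspicious row.

    Rows are counted from 1, not including the header line.
    """
    problems = []
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row_number, row in enumerate(reader, start=1):
            sample_id = row.get("sample-id") or ""
            filepath = row.get("absolute-filepath") or ""
            if "://" in filepath:
                problem = "is a URL, not a downloaded file"
            elif not os.path.isabs(filepath):
                problem = "is not an absolute path"
            elif not os.path.isfile(filepath):
                problem = "does not exist on this computer"
            elif not filepath.endswith((".fastq", ".fastq.gz")):
                # Heuristic only: flags things like .tar archives.
                problem = "does not look like a .fastq or .fastq.gz file"
            else:
                continue
            problems.append((row_number, sample_id, problem))
    return problems
```

An empty list means every row points at a local file with a FASTQ-looking name. It does not prove the import will succeed, but it catches the two issues above.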

Colin

Good to hear back from you too! I truly appreciate your help and bearing with me!

From what I see in the importing and the pd-mice tutorials, the files that you import are .fastq files. From what I’m noticing, the files in the tutorials are ones that, when loaded up, have the samples and information for the samples like the barcode sequences, etc. However, with the manifest file that I download from iHMP, the data only has the sample IDs and the URLs with .tar endings. I tried downloading the individual URLs to the local computer, but they didn’t seem to work individually either.

I will look closer at the examples! Let me know what you think.

Thank you,
-Hasti

I think you are on the right track.

Keep in mind that the HMP manifest files and Qiime2 fastq manifest files are not the same format! You will have to make sure you convert everything to the right format, using the tutorial as an example.
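
To make the difference concrete, here is a rough Python sketch of that conversion. It reads the HMP columns you listed (file_id, md5, size, urls, sample_id) and writes the two Qiime 2 columns. Big assumptions on my part: the urls column may hold several comma-separated URLs, and the reads for each row are already on disk in one folder, named after the URL's file name with .tar swapped for .fastq or .fastq.gz. That naming is hypothetical, so adjust it to whatever you actually end up with. The function name is made up too.

```python
import csv
import os


def write_qiime2_manifest(hmp_manifest, fastq_dir, out_path):
    """Write a single-end Qiime 2 manifest for FASTQ files in fastq_dir.

    Returns the sample IDs for which no local FASTQ file was found.
    """
    missing = []
    fastq_dir = os.path.abspath(fastq_dir)
    with open(hmp_manifest, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath"])
        for row in reader:
            # Hypothetical naming rule: URL file name minus ".tar".
            first_url = row["urls"].split(",")[0]
            stem = first_url.rsplit("/", 1)[-1]
            if stem.endswith(".tar"):
                stem = stem[:-len(".tar")]
            for suffix in (".fastq", ".fastq.gz"):
                candidate = os.path.join(fastq_dir, stem + suffix)
                if os.path.isfile(candidate):
                    writer.writerow([row["sample_id"], candidate])
                    break
            else:
                # No matching file on disk for this row.
                missing.append(row["sample_id"])
    return missing
```

The returned list tells you which samples still need downloading or renaming before the manifest is complete.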

The Qiime2 tutorial I’m referencing (pd-mice tutorial) focuses on importing 16S amplicon data from .fastq files, so that’s the kind of data I am recommending you find from HMP. Other Qiime2 plugins let you import everything from metabolites to shotgun sequences, but I think 16S is a good place to start.

We are always here to help!

Colin

Hi Colin!

I’ve been trying to dig deeper on the actual iHMP website and I found this link:
https://www.hmpdacc.org/hmp/resources/download.php

From what I’m realizing now, I have the file formatted correctly in terms of the column names (I used the Qiime tutorial as a reference). The problem that I still have at the moment is with the URLs in the absolute-filepath column. While I do have the manifest and the metadata downloaded (I was able to successfully import the metadata as a .qzv file), the URLs in the manifest file point to the Aspera server. From the link above, I also requested access to iHMP data, which may be one of the steps needed to get closer. At this point, I’ve downloaded Aspera into QIIME2 and I’m not sure where to go from there.

Let me know what you think! I don’t want to go deeper into a rabbit hole with Aspera if it’s not necessary.

-Hasti

Hello Hasti,

That's correct!

The DACC is giving you an aspera manifest file, which you can use to download these files.
Later on, you make a Qiime 2 Fastq manifest file to import them.
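
Since the URLs in your manifest end in .tar, there is probably an unpacking step in between. I haven't checked what is inside those archives, so treat this as a sketch that assumes each .tar simply holds ordinary FASTQ files. The function name and the list of suffixes are mine, for illustration only.

```python
import os
import shutil
import tarfile

# Assumed file endings for read files inside the archive.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def extract_fastq_members(tar_path, out_dir):
    """Copy FASTQ files out of a tar archive; return their absolute paths."""
    os.makedirs(out_dir, exist_ok=True)
    extracted = []
    with tarfile.open(tar_path) as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(FASTQ_SUFFIXES)):
                continue
            # Write under the bare file name so nothing lands outside out_dir.
            target = os.path.abspath(
                os.path.join(out_dir, os.path.basename(member.name)))
            source = archive.extractfile(member)
            with open(target, "wb") as handle:
                shutil.copyfileobj(source, handle)
            extracted.append(target)
    return extracted
```

The absolute paths it returns are the kind of value that belongs in the absolute-filepath column of the Qiime 2 manifest.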

That link you posted is for HMP 1, whereas we are discussing HMP 2 (also called iHMP).

In the Cart, there is a link called "How do I download files in my Cart?"


When we click here we find the portal_client that works like this:

portal_client --manifest /path/to/my/manifest.tsv

Let me know if you can get the portal_client installed and some files downloaded!
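
Once some files are downloaded, the md5 column in the HMP manifest gives you a way to check that they arrived intact. Here is a minimal sketch, assuming that column is the MD5 checksum of the file behind each URL and that the downloads sit in one folder under the URL's file name. I have not confirmed where portal_client saves things, so that part is a guess, and the function names are made up.

```python
import csv
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(hmp_manifest, download_dir):
    """Map each file name to 'ok', 'missing' or 'md5 mismatch'."""
    results = {}
    with open(hmp_manifest, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Assumed: local file is named like the last part of the URL.
            file_name = row["urls"].split(",")[0].rsplit("/", 1)[-1]
            local_path = os.path.join(download_dir, file_name)
            if not os.path.isfile(local_path):
                results[file_name] = "missing"
            elif md5_of_file(local_path) == row["md5"].strip().lower():
                results[file_name] = "ok"
            else:
                results[file_name] = "md5 mismatch"
    return results
```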

Colin

Ohhhh! Wow, there is so much to learn.

So on the GitHub link that you added for the portal_client, I’m noticing that the code it’s running is for Python specifically. Do I need to install Python outside of the Qiime2 Virtual Box (which I use) in order to download the files, or should I download Python within Virtual Box? I downloaded the portal_client to Qiime2 on Virtual Box and when I run the code that you inserted, it doesn’t work. I figured this would happen. So how should I go about installing Python if it’s necessary?

Thank you!
-Hasti

That's what I would do! In fact, the qiime2 conda environment already includes python, so you could use that python and install this software using pip.

Here's the guide for installing it using pip: INSTALL.md in the IGS/portal_client repository on GitHub

Once installed, hopefully you can run
portal_client --help to see that it's installed and then
portal_client --manifest /path/to/the/HMP2-manifest-you-downloaded.tsv
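
If you would rather check this from Python (the one inside the qiime2 conda environment), here is a small sketch. It only wraps the two commands above; the function names are mine, and it assumes portal_client ends up on your PATH after the pip install.

```python
import shutil
import subprocess


def find_command(name="portal_client"):
    """Return the full path of a command on PATH, or None if it is absent."""
    return shutil.which(name)


def download_with_portal_client(manifest_path):
    """Run `portal_client --manifest <path>` and return its exit code."""
    if find_command("portal_client") is None:
        raise RuntimeError("portal_client is not on PATH; is it installed "
                           "in the active environment?")
    completed = subprocess.run(["portal_client", "--manifest", manifest_path])
    return completed.returncode
```

If find_command() returns None, the install did not land in the environment you are currently using, which is worth ruling out before debugging anything else.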

I appreciate your willingness to work through this. A lot of this process assumes prior knowledge, and can be confusing the first time through (and a hassle all the times after that!)

I think you are making good progress.

Colin

Perfect. I seem to be running into another problem. I created a new directory named portal_client as directed, opened it using cd portal_client, and this is what happens when I put in the sudo command:

(qiime2-2019.10) qiime2@qiime2core2019-10:~$ mkdir portal_client
(qiime2-2019.10) qiime2@qiime2core2019-10:~$ cd portal_client
(qiime2-2019.10) qiime2@qiime2core2019-10:~/portal_client$ sudo pip3 install
[sudo] password for qiime2:

I’m slightly unsure about what the password would be. I tried to put in the actual password that I use to get into Qiime 2, which is just “qiime2” but a message pops up saying “Sorry, try again”

Any input? :slight_smile:
Thank you for bearing with me as I try to figure this out. I know that I’m so close to success and I don’t want to give up now! It is confusing since I don’t have the prior knowledge but I’m so grateful to be learning so much from you. I truly appreciate it.

Hasti

Try running
pip3 install
so without sudo, and see if that works.

Hey, devs, what’s the sudo password for the Q2 VirtualBox image?
(Or what’s a better way to install this?)

Alright, I tried the command and here’s what popped up:

(qiime2-2019.10) qiime2@qiime2core2019-10:~/portal_client$ pip3 install

Command ‘pip3’ not found, but can be installed with:

sudo apt install python3-pip

(qiime2-2019.10) qiime2@qiime2core2019-10:~/portal_client$ sudo apt install python3-pip
[sudo] password for qiime2:

I’m guessing I will need a password if not another method for installing it :open_mouth:

What about
cd portal_client
pip install .
(So pip instead of pip3, and just a period at the end, and running this from inside the portal_client folder)
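
If that still fails, it can help to see which Python and which pip the terminal is actually using. This is a small diagnostic sketch, nothing Qiime-specific; the function name and the dictionary keys are made up for illustration.

```python
import importlib.util
import shutil
import sys


def describe_python_environment():
    """Collect basic facts about the active Python and pip."""
    return {
        "python": sys.executable,
        "environment": sys.prefix,
        # True when this interpreter can import pip as a module.
        "pip_module_available": importlib.util.find_spec("pip") is not None,
        # Full path of each command on PATH, or None if it is absent.
        "pip_on_path": shutil.which("pip"),
        "pip3_on_path": shutil.which("pip3"),
    }


if __name__ == "__main__":
    for key, value in describe_python_environment().items():
        print("{}: {}".format(key, value))
```

If pip_module_available is True but no pip command shows up on the PATH, running python -m pip install . from inside the portal_client folder uses the pip that belongs to the current interpreter.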
