Barcode Files from HMP data and Merging two datasets with different read length

steffi · January 14, 2020, 4:00pm

Dear All,
I am not sure whether I can ask this question here. I have a set of paired-end FASTQ files and I want to compare my data with the HMP project data. My data is demultiplexed, but I could not find any information on whether the data on the HMP website is demultiplexed. How can I find this out, and where can I find the barcode information for those data?

My second question: I have two sets of FASTQ files with different read lengths. For example, one set has 251 bp reads and the second set has 276 bp reads. How do I analyze them together? I encountered an error when setting the truncation length in the DADA2 step. How can I achieve this?

jwdebelius · January 14, 2020, 7:17pm

Hi @steffi,

The HMP1 data (Huttenhower et al.) can be downloaded in a demultiplexed format from the HMP DACC site. You can also get processed tables through Qiita accessions 1927 and 1928. I'm not sure how to get the HMP2 data, but if you're working through EBI or SRA, it will be demultiplexed.

The issue you may (will) run into when combining the data will likely have to do with primer region and sequencing technology. HMP was sequenced with 454 pyrosequencing and has longish reads (Illumina is approaching that length again, though). So, you can denoise them with DADA2. However, primers and hypervariable region make a difference! So, you need to make sure that your data uses the same hypervariable region as the HMP. If not, current best practice is to cluster closed-reference against a database, because then you're comparing against the same scaffold. No, it's not ideal, but it's kind of your best bet.
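For the second part of the question (251 bp versus 276 bp reads), one common approach is to denoise each run separately with identical trimming and truncation settings and then merge the results, so that every sequence covers the same span. The sketch below illustrates only the truncation idea in plain Python. It is not DADA2 itself, and the function names `common_trunc_len` and `truncate_reads` and the toy reads are made up for illustration.

```python
def common_trunc_len(*read_sets):
    """Return the shortest read length found across all read sets."""
    return min(len(seq) for reads in read_sets for seq, _ in reads)


def truncate_reads(reads, trunc_len):
    """Cut every (sequence, quality) pair to trunc_len; drop shorter reads."""
    return [
        (seq[:trunc_len], qual[:trunc_len])
        for seq, qual in reads
        if len(seq) >= trunc_len
    ]


# Toy stand-ins for the two runs (the real reads would be 251 bp and 276 bp).
run_a = [("ACGTACGTAC", "IIIIIIIIII"), ("ACGTACGTAA", "IIIIIIIIII")]
run_b = [("ACGTACGTACGGT", "IIIIIIIIIIIII"), ("ACGTACGTAAGGT", "IIIIIIIIIIIII")]

# Use one truncation length for both runs so the results are comparable.
trunc_len = common_trunc_len(run_a, run_b)
run_a_trimmed = truncate_reads(run_a, trunc_len)
run_b_trimmed = truncate_reads(run_b, trunc_len)
```

In practice the truncation length is usually chosen from the quality profiles rather than from the shortest read alone. The point here is only that both runs get the same value.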

Finally, when you do the analysis, try to consider study effect in your statistics! Try to pick multivariate models that let you account for variation due to study and/or look for replication. If you use all the body sites, the study effect may be hidden, so check within body site as well.

Best,
Justine

steffi · January 18, 2020, 1:49pm

Hi
Thank you for clarifying that the HMP database has demultiplexed data. I hope that they have raw data (without any preprocessing steps). I have filtered and downloaded 103 samples from the IBDMDB study. How can I get the metadata for those? I checked the IBDMDB database (HMP database) and got the metadata (https://ibdmdb.org/tunnel/products/HMP2/Metadata/hmp2_metadata.csv). Can I use the same metadata for data downloaded from the HMP database?

jwdebelius · January 19, 2020, 12:17pm

Hi @steffi,

I'm less familiar with the IBDMDB study. I would assume the metadata on the webpage is the metadata you need for your IBD patients. I think the best place for metadata from the original HMP (HMP1) may be Qiita, but you may also find it on the HMP DACC; I'm just not sure where.
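If the metadata file covers more samples than the ones you downloaded, it may help to subset it to your sample IDs and check that none are missing. The following is a minimal sketch using Python's standard `csv` module. The column name `sample_id`, the toy rows, and the helper `select_metadata` are assumptions for illustration, so check the real header of `hmp2_metadata.csv` for the column that holds the sample identifiers.

```python
import csv
import io


def select_metadata(csv_text, sample_ids, id_column):
    """Return (matching rows, sorted list of IDs not found in the table)."""
    wanted = set(sample_ids)
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for row in reader if row[id_column] in wanted]
    found = {row[id_column] for row in rows}
    return rows, sorted(wanted - found)


# Toy table standing in for the real metadata file (hypothetical columns).
toy_csv = """sample_id,diagnosis,data_type
S1,CD,biopsy_16S
S2,UC,biopsy_16S
S3,nonIBD,metagenomics
"""

rows, missing = select_metadata(toy_csv, ["S1", "S3", "S9"], "sample_id")
```

Any IDs reported in `missing` are samples you downloaded that have no row in the metadata table, which is worth resolving before running the analysis.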

Best,
Justine

steffi · January 20, 2020, 9:26am

Thank you for the reply. I got the metadata file from IBDMDB.
