Using QIIME2 with American Gut Project (AGP) data

Hi,
I am extracting some samples from the American Gut Project (AGP) for a pilot project that I will later compare with my own dataset. I need some help with data filtering and analysis in QIIME2.

(1) I have finished filtering the OTU table by child age and stool sample type. When I review the Feature Detail tab in otu-table.qzv, each feature ID is only a 4- to 6-digit number rather than a full ID made of letters and numbers. How can I find the full feature IDs?

(2) I downloaded the sequence.fna file and converted it to sequence.qza. How can I filter it to samples from participants aged 7-18 years? Does the AGP dataset include a rep-seqs.qza that I can use, or do I need to create one myself? If I have to import the sequences and create rep-sequences.qza, what command should I use? It looks like the AGP still uses 97% OTU picking, so I am not sure how to create rep-sequences.qza.

(3) For alpha diversity, they provide PD_whole_tree.txt, chao1.txt, observed_otus.txt, and shannon.txt, and each file lists multiple rarefactions. Do I need to import these files into QIIME2 to test for differences in alpha diversity by variables such as gender and race? If so, how? If not, how can I calculate alpha diversity for different metadata variables?

(4) Beta diversity is provided in unweighted_unifrac_ag.txt and weighted_unifrac_ag.txt. How can I import these into QIIME2 for analysis? How can I create other outputs, such as Emperor plots?

(5) The taxa results include otu_table_L2.biom, otu_table_L2.txt, otu_table_L3.biom, otu_table_L3.txt, otu_table_L6.biom, and otu_table_L6.txt. I have converted the .biom files to .qza files. How can I visualize this information and get the percentages at each taxonomic level?

Sorry for my very lengthy questions!

Thanks,
Bing

Hi Bing,
It sounds like the AGP data you have are QIIME1 results, using 97% OTU picking. Unless you process your data with exactly the same pipeline that was used for AGP, you cannot compare these datasets. Hence, you have two choices:

  1. Process your new data with QIIME1 using exactly the same pipeline that was used for AGP. You will need to use 97% OTU picking with the AGP 97% rep seqs as the reference dataset; refer to the qiime1 tutorials/forum for details on how that is done.
  2. Process your new data and the raw AGP data with QIIME2, in which case you must disregard all results from the AGP qiime1 data. This effectively nullifies questions 1-5, since they rely on qiime1 OTU tables and diversity data, which cannot be compared with data processed in QIIME2.

It looks like the AGP raw data are available on QIITA in QIIME2-importable formats. You can use the denoised results, but if you do, pay close attention to the exact trimming and denoising parameters that you select, make sure that every AGP dataset you download used those same parameters, and use the same parameters on your own dataset.

Alternatively, download the raw data from QIITA and import it into QIIME2 following this tutorial, then process it the same way as your own data in qiime2.

@Nicholas_Bokulich Thank you for your immediate response!

I have decided to use QIIME2 to rerun the data. I have downloaded the sequence.fna file, which is already demultiplexed, so I can continue with the quality control steps in the QIIME2 tutorials. My question is: how can I filter this sequence.fna file down to the samples I need (stool samples from participants aged 7-18 years)?

Thanks,
Bing

Hi Bing,
Unfortunately, QIIME2 does not yet support FASTA file filtering. I have created an issue request to add this functionality in the future.

For now, I recommend using qiime1’s filter_fasta.py prior to importing into QIIME2. It looks like what you will need to do is filter the AGP mapping file to only contain samples you want (aged 7-18 years old), then use that mapping file as a --sample_id_fp like so:

filter_fasta.py -f sl_inseqs.fasta -o sample_id_list_filtered_seqs.fasta --sample_id_fp map.txt

If you have any issues with this command, please consult the qiime1 forum.
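
For the mapping-file step, a small script can do the subsetting. The sketch below is only an illustration and is not part of QIIME. It assumes the mapping file is tab-separated with a header row, and the column names `AGE_YEARS` and `BODY_SITE` and the value `UBERON:feces` are assumptions that should be checked against the actual AGP mapping file header. The function name `filter_mapping` is hypothetical.

```python
def filter_mapping(mapping_path, out_path, age_col="AGE_YEARS",
                   site_col="BODY_SITE", site_value="UBERON:feces",
                   min_age=7.0, max_age=18.0):
    """Write a mapping file with only the wanted samples; return how many were kept.

    Column names and the body-site value are assumptions; adjust them to
    match the header of the real mapping file.
    """
    kept = 0
    with open(mapping_path) as src, open(out_path, "w") as dst:
        header = src.readline().rstrip("\r\n").split("\t")
        age_idx = header.index(age_col)    # raises ValueError if the column is absent
        site_idx = header.index(site_col)
        dst.write("\t".join(header) + "\n")
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) <= max(age_idx, site_idx):
                continue  # skip short or blank rows
            try:
                age = float(fields[age_idx])
            except ValueError:
                continue  # skip missing or non-numeric ages
            if min_age <= age <= max_age and fields[site_idx] == site_value:
                dst.write("\t".join(fields) + "\n")
                kept += 1
    return kept
```

Calling `filter_mapping("map_full.txt", "map.txt")` (file names are placeholders) would write the subset, and the resulting file could then be passed to `--sample_id_fp`.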

@Nicholas_Bokulich
I have finished filtering sequence.fna. Now I have an issue with summarizing the data, because it looks like the .fna file does not contain quality information. Could you please advise how I should handle this? Do I need to rerun from the raw sequences?

Thanks,
Bing

Hi Bing,
QIITA has demultiplexed fastq files (seqs.fastq) in addition to demultiplexed fastas. You will need to work from fastq to run through dada2 or deblur in QIIME2.

filter_fasta.py does support fastq filtering, so you will still be able to filter prior to using q2-demux and dada2.
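
To make clear what filtering by sample ID means for a demultiplexed FASTQ, here is a rough stand-in in plain Python. It is not how filter_fasta.py is implemented. It assumes four-line FASTQ records and QIIME1-style read labels of the form `SampleID_N` (sample ID, underscore, read counter), which should be verified on the QIITA file before relying on it. The function names are hypothetical.

```python
def read_sample_ids(mapping_path):
    """Collect sample IDs from the first column of a tab-separated mapping file."""
    ids = set()
    with open(mapping_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip the header, comments, and blank lines
            ids.add(line.split("\t", 1)[0].strip())
    return ids


def filter_fastq(in_path, out_path, keep_ids):
    """Copy only the reads whose sample ID is in keep_ids; return the read count."""
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = [src.readline() for _ in range(4)]  # header, sequence, '+', quality
            if not record[0].strip():
                break  # end of file
            label = record[0][1:].split()[0]     # assumed form: SampleID_N
            sample_id = label.rsplit("_", 1)[0]  # drop the trailing read counter
            if sample_id in keep_ids:
                dst.writelines(record)
                kept += 1
    return kept
```

The set returned by `read_sample_ids` on the filtered mapping file would be passed as `keep_ids`.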

You should probably contact AGP about this — they may have QIIME2-compatible files (e.g., merged deblur sequences/tables) that you can work from.

@Nicholas_Bokulich
Could you give me a little more detail on how to download seqs.fastq from QIITA? I went to the American Gut Project data in QIITA but do not know where to download the dataset I need.

Thanks for your help!
Bing

Hi Bing,
On the left-hand side of the screen, click “16S” and a drop-down menu will appear with many different sequencing runs. You will need to click through these individually to find where the fastq files are located.

You should really get in touch with the AGP or QIITA admins, though — there is probably a much faster, easier way to get the data that you need.

@Nicholas_Bokulich
I sent emails to both the AGP and QIITA groups. The AGP manager has been slow to answer email recently. Hopefully I will get some news soon!

If they do not have a merged sequence.fastq, I will probably need to process the runs one by one! Could I have a phone call with you about the downloading and processing issues? I have spent so much time on this and keep finding that I downloaded the wrong file.

I tried downloading the .fastq and mapping file for the first run. There are still some issues with running it in QIIME2.

  1. I tried to run seqs.demux and got the following error: ValueError: seqs.demux is not a QIIME archive. Do I need to change the extension of this file, or do something else?
  2. I tried to run seqs.fastq in QIIME2, but I cannot find the barcode.fastq file anywhere.

Could you let me know whether we can find a time to talk through these steps, in case I end up having to run the files individually myself?

Thanks,
Bing

Hi Bing,
You should really consult with QIITA/AGP about issues with obtaining/downloading the files that you need. Those are outside my expertise and off-topic in this forum. As for your other questions, I have tried to answer them below.

I tried to run seqs.demux and got the following error: ValueError: seqs.demux is not a QIIME archive. Do I need to change the extension of this file, or do something else?

I also cannot import that file in QIIME2, and the extension does not appear to be the issue. Googling the issue pulled up some information on the format of this file in the qiita docs. It looks like these are in a QIITA-specific format and will need conversion before importing to qiime2 — you should consult those docs and/or QIITA admins to figure out how to convert.

I tried to run seqs.fastq in QIIME2, but I cannot find the barcode.fastq file anywhere.

That is a good point — you will need to work from the seqs.demux or get access to the raw data (with barcodes) from QIITA or AGP.

Sorry I can't provide more support here — at this point your questions are really issues with getting the appropriate raw data from QIITA/AGP. These are not really related to QIIME2 any more, and I cannot really provide support for these questions.

Thank you so much! Bing

@Nicholas_Bokulich

I have talked with the QIITA manager. I may have to pull the demultiplexed fastq files from each run. How can I import this kind of demultiplexed fastq file into QIIME2?

  1. I have tried one fastq file and got the following error. What is the right way to import this type of demux fastq data?

     `(qiime2-2017.6) Jinbings-iMac:desktop jinbingbai$ qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path casava-18-single-end-demultiplexed --source-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-single-end.qza 
    

    Traceback (most recent call last):
    File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.6/bin/qiime", line 6, in
    sys.exit(q2cli.main.qiime())
    File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/click/core.py", line 722, in call
    return self.main(*args, **kwargs)
    File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
    File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
    File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
    File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/q2cli/tools.py", line 111, in import_data
    view_type=source_format)
    File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/qiime2/sdk/result.py", line 192, in import_data
    return cls.from_view(type, view, view_type, provenance_capture)
    File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/qiime2/sdk/result.py", line 217, in _from_view
    result = transformation(view)
    File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/qiime2/core/transform.py", line 57, in transformation
    self.validate(view)
    File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/qiime2/core/transform.py", line 131, in validate
    view.validate()
    File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/qiime2/plugin/model/directory_format.py", line 168, in validate
    getattr(self, field)._validate_members(collected_paths)
    File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/qiime2/plugin/model/directory_format.py", line 104, in validate_members
    self.pathspec))
    ValueError: Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: '.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz'`

  2. I have pulled so many fastq files. Is there a way to combine them into one file in QIIME1 or QIIME2 before filtering and importing into QIIME?

  3. If there is a way to combine them, I can then filter using the sample IDs in the mapping file, right?

Thanks,
Bing

Importing these files is described in this tutorial. The directory casava-18-single-end-demultiplexed needs to contain files with names following the format described in that tutorial.
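
As a quick pre-import check, the sketch below lists files whose names would not satisfy the filename pattern reported in the error message (read here as `.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz`) and builds a name in that shape. The helper names `casava_name` and `misnamed_files` are hypothetical, and the default barcode field `0` and lane `1` are placeholders chosen for illustration.

```python
import re
from pathlib import Path

# Filename pattern as read from the importer's error message in this thread.
CASAVA_PATTERN = re.compile(r".+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz")


def casava_name(sample_id, index=0, lane=1, read=1):
    """Build a file name of the form sampleid_index_L001_R1_001.fastq.gz."""
    return f"{sample_id}_{index}_L{lane:03d}_R{read}_001.fastq.gz"


def misnamed_files(directory):
    """Return the names of files in `directory` that do not match the pattern."""
    return sorted(
        path.name
        for path in Path(directory).iterdir()
        if path.is_file() and not CASAVA_PATTERN.fullmatch(path.name)
    )
```

An empty list from `misnamed_files("casava-18-single-end-demultiplexed")` would suggest the names are at least in the expected shape; it does not check the file contents.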

No, do not combine those files. See the tutorial linked above. You import the directory of demultiplexed fastqs and QIIME takes care of the rest. The result will be a single .qza file for each directory you import, containing all demultiplexed sequences. NOTE that you should import each QIITA directory separately and analyze with dada2 separately. After running dada2 on each separate sequence artifact, you can merge the feature tables using feature-table merge and merge the representative sequences using feature-table merge-seq-data.
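
The per-run workflow described above can be sketched as a small command builder. It only assembles the command lines, without running them, so that the same trimming parameters are reused for every run. The import options are copied from the command earlier in this thread. The `denoise-single` options follow the q2-dada2 plugin, but option names change between releases (newer releases use `--input-format` instead of `--source-format` and also expect `--o-denoising-stats`), so check `--help` for your version. The function name, output file names, and parameter values are placeholders.

```python
from pathlib import Path


def per_run_commands(run_dir, trim_left, trunc_len):
    """Return (import_cmd, denoise_cmd) argument lists for one QIITA run directory."""
    run = Path(run_dir).name
    demux = f"{run}-demux.qza"
    import_cmd = [
        "qiime", "tools", "import",
        "--type", "SampleData[SequencesWithQuality]",
        "--input-path", str(run_dir),
        "--source-format", "CasavaOneEightSingleLanePerSampleDirFmt",
        "--output-path", demux,
    ]
    denoise_cmd = [
        "qiime", "dada2", "denoise-single",
        "--i-demultiplexed-seqs", demux,
        "--p-trim-left", str(trim_left),    # must be identical for every run
        "--p-trunc-len", str(trunc_len),    # must be identical for every run
        "--o-table", f"{run}-table.qza",
        "--o-representative-sequences", f"{run}-rep-seqs.qza",
    ]
    return import_cmd, denoise_cmd
```

Each list could be passed to `subprocess.run(cmd, check=True)` inside an activated QIIME2 environment. The per-run tables and representative sequences it names are the inputs to the merge steps mentioned above.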

There is no way to combine them, as mentioned above. QIIME2 cannot filter these samples out until they are in a feature table (post-dada2). Since the QIITA sequences are already demultiplexed, and each file name in a directory should contain a sample ID, you could remove the unwanted samples' files from the directories before importing into QIIME. Unless you can write a script to do this automatically, it would probably be easier to process all the data and filter samples out of the FeatureTable post-dada2, or to see whether you can get access to the raw data from QIITA or AGP and demux it yourself.
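
If you do want to script the removal step, a minimal sketch could look like this. It copies the wanted files into a new directory instead of deleting anything, and it assumes the sample ID is the part of each file name before the first underscore, which should be checked against the actual QIITA file names. `copy_wanted_samples` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def copy_wanted_samples(src_dir, dst_dir, keep_ids):
    """Copy *.fastq.gz files whose sample ID is in keep_ids; return the copied names."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(src_dir).glob("*.fastq.gz")):
        sample_id = path.name.split("_", 1)[0]  # assumed: ID precedes the first underscore
        if sample_id in keep_ids:
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied
```

The new directory would then be the one passed to the import step, leaving the original download untouched.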

@Nicholas_Bokulich
Thank you for patiently answering my questions.

Let me talk with the QIITA team again to see how I can convert the demux fastq into fastq.gz files like the ones described in this tutorial!

Thanks,
Bing
