Generating a mapping file containing metagenomic data

Brigitta1 · July 12, 2022, 7:00am

Hello,

I am a student trying to get my metagenomic sequences analyzed. I want to know how I can automatically generate a mapping file, because I have about 80,000 sequences. My FASTA file looks something like the following, but with sequences that go on for 111 pages:

>ZBOGC:00037:00047
GGACTACAGGGGTATCTAATGTATTACCGCGGCTGCTGGCAC
>ZBOGC:00042:00030
GGACTACGCGGGTATCTATGTATTACCGCGGCTGCTGGCAC

Please help me analyse these sequences.
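A useful first step with a large FASTA file is simply to count and inspect the records. The headers above look like Ion Torrent read names (a run ID plus chip coordinates), so on their own they do not say which sample a read came from. Below is a minimal sketch of a FASTA reader, assuming standard FASTA where every header line starts with `>`; the file name in the usage note is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; a sequence may span several lines.

    Blank lines are skipped. Sequence lines that appear before the first
    '>' header are silently discarded.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


example = """>ZBOGC:00037:00047
GGACTACAGGGGTATCTAATGTATTACCGCGGCTGCTGGCAC
>ZBOGC:00042:00030
GGACTACGCGGGTATCTATGTATTACCGCGGCTGCTGGCAC
"""

records = list(read_fasta(example.splitlines()))
print(len(records))  # 2
```

For a real file (hypothetically named `seqs.fasta`), `with open("seqs.fasta") as fh: n = sum(1 for _ in read_fasta(fh))` counts the reads without keeping them all in memory.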

llenzi · July 12, 2022, 8:23am

Hi @Brigitta1,

welcome to the forum!

Usually you work with a set of samples; once each sample is sequenced, it yields a variable number of sequences (depending on the sequencing platform, you could obtain FASTQ files, FASTA files, or both).

In general, the metadata are information associated with each individual sample (e.g. treatment group, reagent kit, age, diet, and so on). These will be useful for group comparisons later on!
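To make the "one row per sample" idea concrete, here is a small sketch where metadata is keyed by sample ID and then used to form comparison groups. The sample IDs and column values are hypothetical placeholders, not taken from this thread.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical per-sample metadata: one entry per sample, not per read.
metadata: Dict[str, Dict[str, str]] = {
    "S1": {"treatment": "control", "diet": "standard"},
    "S2": {"treatment": "control", "diet": "high_fat"},
    "S3": {"treatment": "treated", "diet": "standard"},
}


def group_samples(meta: Dict[str, Dict[str, str]], column: str) -> Dict[str, List[str]]:
    """Return {column value: [sample IDs]} for one metadata column."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample_id, fields in meta.items():
        groups[fields[column]].append(sample_id)
    return dict(groups)


print(group_samples(metadata, "treatment"))
# {'control': ['S1', 'S2'], 'treated': ['S3']}
```

However many reads each sample produced, the metadata table stays as long as the number of samples.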

So, first of all, were the samples obtained by amplifying a specific 16S region or another marker gene (if so, which gene and region)? What sequencing platform did you use?
I strongly suggest looking at the QIIME 2 tutorial as a starting point (the “Moving Pictures” tutorial in the QIIME 2 2022.2.0 documentation). It may not match your specific case, but it will familiarize you with generic concepts that you will apply later in your analysis!

Best wishes,
Luca

Brigitta1 · July 12, 2022, 12:46pm

Thank you for the quick response Luca.

My samples were obtained by amplifying the V4 region of the bacterial 16S rDNA. The platform used was Ion Torrent. I am using the QIIME 1 software and I am still new to all of this. I received the sequences in BAM format and converted them to FASTA and FASTQ formats. I need to output an OTU table, so I am following the scripts given on qiime.org.
I need to generate a qual file, so I decided to run the following command from qiime.org:
convert_fastaqual_fastq.py -c fastq_to_fastaqual -f seqs.fastq -o fastaqual
The command doesn't work, even after installing Miniconda.
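If `convert_fastaqual_fastq.py` cannot be run (for example because the QIIME 1 environment is not active), the FASTQ-to-FASTA/QUAL conversion itself is simple enough to sketch in plain Python. This is a stand-in, not the QIIME 1 script. It assumes four-line FASTQ records and Phred+33 quality encoding (an assumption worth checking for your BAM-to-FASTQ export).

```python
from typing import Iterable, Tuple


def fastq_to_fasta_qual(fastq_lines: Iterable[str], offset: int = 33) -> Tuple[str, str]:
    """Convert 4-line FASTQ records into FASTA text and QUAL text.

    Assumes Phred+33 encoding by default and exactly four lines per
    record (header, sequence, '+', qualities). Blank lines are ignored.
    """
    lines = [line.rstrip("\r\n") for line in fastq_lines if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ does not contain complete 4-line records")
    fasta, qual = [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, quals = lines[i:i + 4]
        record_no = i // 4 + 1
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record {record_no}")
        if len(seq) != len(quals):
            raise ValueError(f"sequence/quality length mismatch in record {record_no}")
        read_id = header[1:]
        scores = [ord(c) - offset for c in quals]  # ASCII character -> Phred score
        fasta.append(f">{read_id}\n{seq}")
        qual.append(f">{read_id}\n" + " ".join(map(str, scores)))
    return "\n".join(fasta) + "\n", "\n".join(qual) + "\n"


if __name__ == "__main__":
    fasta_text, qual_text = fastq_to_fasta_qual(["@read1", "ACGT", "+", "II#5"])
    print(fasta_text, end="")  # >read1 / ACGT
    print(qual_text, end="")   # >read1 / 40 40 2 20
```

For the real data, read `seqs.fastq` with `open(...)`, pass the handle in, and write the two strings to files such as `seqs.fna` and `seqs.qual` (hypothetical names). The whole file is held in memory, which should be fine for roughly 80,000 short reads.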

Also, I need to obtain split-library output that includes the number of sequences that pass quality control, so I need to run the following command:
split_libraries.py -m Mapping_File.txt -f 1.TCA.454Reads.fna -q 1.TCA.454Reads.qual -o Split_Library_Output/

For that, I need a mapping file. According to the QIIME documentation, I have to create the mapping file by hand, which in my case is impossible because I have more than 80,000 sequences.
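A QIIME 1 mapping file has one row per sample, not one per sequence, so 80,000 reads from a handful of samples need only a handful of rows. As we understand the QIIME 1 file-format page, it is tab-separated, starts with `#SampleID`, `BarcodeSequence`, `LinkerPrimerSequence`, ends with `Description`, and sample IDs may contain only letters, digits and periods. The sketch below writes such a file from a list; all sample names, barcodes, the `Treatment` column and the primer are hypothetical placeholders.

```python
import csv
import re
import sys
from typing import Dict, List, TextIO

# Hypothetical samples: replace with your real sample IDs, barcodes and metadata.
SAMPLES: List[Dict[str, str]] = [
    {"SampleID": "Soil.1", "BarcodeSequence": "ACGTACGTAC", "Treatment": "control"},
    {"SampleID": "Soil.2", "BarcodeSequence": "TGCATGCATG", "Treatment": "treated"},
]
PRIMER = "N" * 20  # placeholder; use the primer actually used for your V4 amplicons

VALID_ID = re.compile(r"^[A-Za-z0-9.]+$")  # letters, digits and periods only


def write_mapping_file(samples: List[Dict[str, str]], primer: str,
                       extra_columns: List[str], handle: TextIO) -> None:
    """Write a tab-separated QIIME 1-style mapping file, one row per sample."""
    header = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence",
              *extra_columns, "Description"]
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for s in samples:
        if not VALID_ID.match(s["SampleID"]):
            raise ValueError(f"invalid sample ID: {s['SampleID']!r}")
        # The sample ID is reused as the Description (a hypothetical choice).
        writer.writerow([s["SampleID"], s["BarcodeSequence"], primer,
                         *(s[c] for c in extra_columns), s["SampleID"]])


if __name__ == "__main__":
    write_mapping_file(SAMPLES, PRIMER, ["Treatment"], sys.stdout)
```

To produce the file used in the `split_libraries.py` command, open `Mapping_File.txt` with `open(..., "w", newline="")` and pass that handle instead of `sys.stdout`.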

Any help finding my way around QIIME is greatly appreciated.

Thank you,
Brigitta

llenzi · July 12, 2022, 2:53pm

Hi @Brigitta1,

Thanks for the information! The main question, then, is whether you have installed QIIME 1 and whether its conda environment is active while you run the Python script.

The metadata (mapping) file contains information about each sample you processed, not about each sequence; all the sequences produced for a sample will be in that sample's FASTQ file.
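Because each sample's reads live in its own FASTQ file, reads per sample can be counted directly from those files. A minimal sketch, assuming uncompressed four-line FASTQ records and a hypothetical layout with one `<sample>.fastq` file per sample in a single directory (gzipped files would need `gzip.open` instead):

```python
from pathlib import Path
from typing import Dict, Iterable


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count records in 4-line-per-record FASTQ text (blank lines ignored)."""
    n_lines = sum(1 for line in lines if line.strip())
    if n_lines % 4:
        raise ValueError(f"{n_lines} non-blank lines is not a multiple of 4")
    return n_lines // 4


def reads_per_sample(directory: str) -> Dict[str, int]:
    """Map each *.fastq file's stem (assumed to be the sample ID) to its read count."""
    counts: Dict[str, int] = {}
    for path in sorted(Path(directory).glob("*.fastq")):
        with open(path) as handle:
            counts[path.stem] = count_fastq_records(handle)
    return counts
```

Calling `reads_per_sample("per_sample_fastq")` (a hypothetical directory name) would return something like `{"SampleA": 12345, ...}` for your data.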

Again, I would suggest working with QIIME 2 rather than QIIME 1, which can be trickier to use.
In particular, I found these threads related to Ion Torrent data:

After you import the sequences into QIIME 2, you will be able to export the number of sequences per sample easily!

Hope that helps
Luca

Brigitta1 · July 15, 2022, 4:57am

Thank you for your support Luca.

I have QIIME 1 installed, and I'm accessing it through a virtual machine. I chose to continue the analysis with QIIME 1 because I have already found step-by-step scripts that produce an OTU table from FASTA files like mine. I was unable to find equivalent scripts for analyzing my data with QIIME 2.

I am working on activating miniconda3. The problem I'm facing is that I don't know how to generate a mapping file.

Best wishes,
Brigitta

llenzi · July 15, 2022, 8:36am

Hi @Brigitta1,

The description of the QIIME 1 metadata (mapping) file is on the following page:
http://qiime.org/documentation/file_formats.html

If your QIIME 1 environment is correctly installed and active, you can check your file with this script:
validate_mapping_file.py

Please be aware that QIIME 1 is much stricter about the column names you can use in the metadata file.
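Before running `validate_mapping_file.py`, a quick pre-check can catch the most common header and sample-ID mistakes. This sketch is not a replacement for the QIIME 1 validator; it only encodes a few rules as we understand them from the file-format page (first three columns `#SampleID`, `BarcodeSequence`, `LinkerPrimerSequence`; last column `Description`; unique IDs made of letters, digits and periods; other `#` lines treated as comments).

```python
import re
from typing import List

VALID_SAMPLE_ID = re.compile(r"^[A-Za-z0-9.]+$")
REQUIRED_FIRST = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]


def precheck_mapping(text: str) -> List[str]:
    """Return a list of problems found in QIIME 1-style mapping file text."""
    rows = [line for line in text.splitlines() if line.strip()]
    if not rows:
        return ["file is empty"]
    header = rows[0].split("\t")
    problems: List[str] = []
    if header[:3] != REQUIRED_FIRST:
        problems.append(f"first columns should be {REQUIRED_FIRST}, got {header[:3]}")
    if header[-1] != "Description":
        problems.append("last column should be Description")
    seen = set()
    for n, line in enumerate(rows[1:], start=2):
        if line.startswith("#"):
            continue  # treat further '#' lines as comments
        fields = line.split("\t")
        if len(fields) != len(header):
            problems.append(f"row {n}: expected {len(header)} fields, got {len(fields)}")
            continue
        sample_id = fields[0]
        if not VALID_SAMPLE_ID.match(sample_id):
            problems.append(f"row {n}: sample ID {sample_id!r} has disallowed characters")
        if sample_id in seen:
            problems.append(f"row {n}: duplicate sample ID {sample_id!r}")
        seen.add(sample_id)
    return problems
```

An empty list means only that these few checks passed; the QIIME 1 validator checks considerably more.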

As a general note, QIIME 1 is not supported on this forum. Feel free to ask if you run into problems with its scripts; I know there are people happy to help! However, QIIME 1 is considered (and treated as) an 'other bioinformatics tool', and help is not guaranteed for that category.

As an unsolicited suggestion: if you are starting to learn a tool to analyze metagenomic data, I would not advise investing effort in QIIME 1, which is no longer developed or supported (its methods are still valid, though!). I have used both, and QIIME 2 is much easier to use.

Hope it helps!
Luca

Brigitta1 · July 15, 2022, 10:25am

Thank you so much Luca!

Brigitta1 · July 16, 2022, 3:19am

Hey Luca,
Is it possible to construct an OTU table using QIIME 2? I went through the QIIME 2 tutorials and was unable to find a way to do it.
Best Regards,
Brigitta
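For orientation, an OTU table is a count matrix: how many reads from each sample were assigned to each OTU (QIIME 2 calls the general form a feature table). The sketch below only shows that data structure, assuming you already have one hypothetical (sample ID, OTU ID) assignment per read. It does not perform the clustering itself, which in practice is done by a dedicated tool.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def build_otu_table(assignments: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Turn per-read (sample_id, otu_id) pairs into {otu_id: {sample_id: count}}."""
    counts = Counter(assignments)  # counts each (sample, OTU) pair
    table: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (sample, otu), n in counts.items():
        table[otu][sample] = n
    return dict(table)


# Hypothetical assignments for four reads from two samples.
reads = [("S1", "OTU1"), ("S1", "OTU1"), ("S1", "OTU2"), ("S2", "OTU1")]
print(build_otu_table(reads))
```

Samples with zero reads for an OTU are simply absent from that OTU's inner dictionary in this sparse representation.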

llenzi · July 18, 2022, 8:21am

Hi @Brigitta1,
Thanks for opening a second thread; it makes much more sense and keeps the forum tidy!
Thanks to @colinbrislawn, it has already been answered!
Luca

system · August 18, 2022, 2:22pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.