Validating a metadata file with the Keemei plugin


I’m trying to validate a metadata file using the Keemei plugin. I created a table with two empty columns, BarcodeSequence and LinkerPrimerSequence.
I left those two columns empty because I used the same forward primer for all the samples and I don’t have the exact combination of barcode sequences.

Keemei doesn’t validate this table. Is there a way to validate it without taking the BarcodeSequence and LinkerPrimerSequence columns into account?

Thank you

Hi @M_F,
Welcome to the forum!

Can you give us a bit more detail as to what exactly you are planning to do? It is unclear what type of files you have and what you need.

Are your files already demultiplexed? If so, then why do you need these 2 columns at all? Unless for some reason demultiplexing didn’t remove those sequences and your plan is to remove them through another step?

Note that the BarcodeSequence and LinkerPrimerSequence usually do not refer to your forward primer sequence. That would be its own column.

Does that mean your files are not demultiplexed? If not, then how do you plan on demultiplexing them without the barcode sequences? You’ll need to find those.

By the way, those 2 columns are not required, so you can just leave them out and run Keemei on your metadata file without them.
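If the sheet has already been exported as a tab-separated file, dropping the two optional columns could be sketched roughly like this. This is only an illustration with Python's standard `csv` module, not part of Keemei or QIIME 2; the function name `drop_columns`, the `#SampleID` header, the `Treatment` column, and the sample rows are all made up for the example.

```python
import csv
import io

# Column names taken from the question; the thread says they are optional.
OPTIONAL_COLUMNS = ("BarcodeSequence", "LinkerPrimerSequence")


def drop_columns(tsv_text, columns=OPTIONAL_COLUMNS):
    """Return the TSV text with the named columns removed from every row."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return tsv_text
    header = rows[0]
    # Indices of the columns we want to keep, in their original order.
    keep = [i for i, name in enumerate(header) if name not in columns]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row[i] for i in keep if i < len(row)])
    return out.getvalue()


# Hypothetical metadata with the two empty columns from the question.
example = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tTreatment\n"
    "sampleX\t\t\tcontrol\n"
    "sampleY\t\t\ttreated\n"
)
cleaned = drop_columns(example)
print(cleaned)
```

In Google Sheets itself, simply deleting the two columns before running Keemei achieves the same thing.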


Thank you for the response.

Concerning the LinkerPrimerSequence: I have the forward and reverse primers used to amplify the V3-V4 regions of all samples. As you mentioned, this column does not refer to the forward primer sequence, which is why I kept it empty.
What exactly is meant by LinkerPrimerSequence?

I found the barcodes. As an example, for a sample X I found:
I7_Index_ID N701 index1 TAAGGCGA
I5_Index_ID S517 index2 GCGTAAGA
Note that for the 16S rDNA sequencing of bacterial DNA (V3-V4) with Illumina technology, dual indices and Illumina sequencing adapters were attached using the Nextera XT Index Kit.

For sample X, should I enter the barcode sequence TAAGGCGA, corresponding to index 1, or can I use index 2?

Finally, if it’s not mandatory to keep those columns, I can remove them.
Thank you for your help.

Hi @M_F,
I'm still not sure if your reads are demultiplexed or not. If they are demultiplexed you don't need to worry about the barcode sequences at all.
I think you might be getting some of this terminology mixed up, which is totally normal; it can be very confusing! Check out the image below.

The I7 and I5 that you have sequences for are the adapters that bind to the flowcell. The index1 and index2 are the barcodes which will be unique to each sample. The linker is what allows PCR1 and PCR2 to merge their products.
Basically the goal is to get rid of everything except the Gene target.
If you have demultiplexed reads (each sample has its own forward + reverse fastq) then you don't need to know the barcodes because that has already been taken care of for you. If you have only 1 forward and 1 reverse fastq for all of your samples, then you will need to find out your barcode sequences so you can demultiplex the reads. If the primer and linker are still in your reads, you will want to use q2-cutadapt to remove those as well.
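As a rough way to check which of the two situations applies, something like the sketch below could be run on the list of file names. It is only a heuristic and assumes Illumina-style names of the form `<sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz`; the helper names `group_reads` and `looks_demultiplexed` and the example file names are made up for illustration and are not part of QIIME 2.

```python
import re
from collections import defaultdict

# Assumed Illumina-style naming: <sample>_S<n>_L<lane>_R<1|2>_001.fastq(.gz)
PATTERN = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_R(?P<read>[12])_001\.fastq(\.gz)?$"
)


def group_reads(filenames):
    """Map each sample name to the set of read directions found for it."""
    samples = defaultdict(set)
    for name in filenames:
        match = PATTERN.match(name)
        if match:
            samples[match.group("sample")].add(match.group("read"))
    return dict(samples)


def looks_demultiplexed(filenames):
    """Heuristic: more than one sample has both a forward and a reverse file."""
    samples = group_reads(filenames)
    complete = [s for s, reads in samples.items() if reads == {"1", "2"}]
    return len(complete) > 1


# Hypothetical file listings for the two cases described above.
per_sample = [
    "sampleX_S1_L001_R1_001.fastq.gz",
    "sampleX_S1_L001_R2_001.fastq.gz",
    "sampleY_S2_L001_R1_001.fastq.gz",
    "sampleY_S2_L001_R2_001.fastq.gz",
]
single_pair = [
    "Undetermined_S0_L001_R1_001.fastq.gz",
    "Undetermined_S0_L001_R2_001.fastq.gz",
]
print(looks_demultiplexed(per_sample))
print(looks_demultiplexed(single_pair))
```

If your files are named differently, the pattern would need adjusting, and the sequencing facility remains the authoritative source.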

If you are unsure about any of this, you should discuss it with your sequencing facility; they will have these answers for sure.

Hope this helped!

Hi Mehrbod,

Thank you so much for all those clarifications.

The samples were demultiplexed automatically after sequencing by the person in charge of the platform.
Now it is clear. I thought at first that the BarcodeSequence and LinkerPrimerSequence columns were mandatory for validating the metadata file, but that is not the case.
The image that you sent clarifies the terminology (linker, index, …).

Thank you so much for your help; it allowed me to resolve the problem.

