I need help importing and demultiplexing my data

Skyhalols · June 2, 2021, 5:56am

Hey there, I'm new to QIIME2 and I am having some trouble demultiplexing my data.

I have some Illumina MiSeq M3-M4 sequencing results from a sequencing company. They provided me with multiplexed data and a 'barcode lookup table'.

Here is the data I got:

This is an example of the R1.fastq.gz:

Here is the 'barcode lookup table' they gave me:

Based on these, I made a mapping file, mappingfile.tsv:

So here is my problem. I tried to import my data following this:

Then I demultiplexed my data using this:

Finally, I got a TSV file that looks like this in QIIME2 View:

As you might notice, I only have a couple of thousand reads for my samples 72, 73, and 74, but in fact I should have tens of thousands of reads.

How do I solve this problem? Can anyone help in this regard?
Thanks for your valuable time.

llenzi · June 2, 2021, 8:27am

Hi @Skyhalols,

Unfortunately, I don't think the current version of the QIIME2 demux command is able to demultiplex "combinatorial barcodes" as in your case (where the same barcode sequence is used in more than one sample); please see:
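To illustrate what "combinatorial" means here, a minimal Python sketch, assuming a hypothetical lookup table in which each sample is defined by a (forward, reverse) barcode pair. A single barcode maps to several samples, so only the pair identifies a sample. The sample IDs and the reverse barcodes are made-up placeholders; only TCTCGCGC and AGCGATAG come from the lookup table discussed later in this thread.

```python
from collections import defaultdict

# Hypothetical lookup table: sample ID -> (forward barcode, reverse barcode).
# Sample IDs and reverse barcodes are placeholders for illustration only.
LOOKUP = {
    "sample72": ("TCTCGCGC", "AAGGTTCC"),
    "sample73": ("TCTCGCGC", "CCTTGGAA"),
    "sample74": ("AGCGATAG", "AAGGTTCC"),
}


def samples_by_single_barcode(lookup):
    """Map each forward barcode to every sample that uses it."""
    index = defaultdict(list)
    for sample, (fwd, _rev) in lookup.items():
        index[fwd].append(sample)
    return dict(index)


def samples_by_pair(lookup):
    """Map each (forward, reverse) pair to its sample; pairs must be unique."""
    index = {}
    for sample, pair in lookup.items():
        if pair in index:
            raise ValueError(f"barcode pair {pair} used by {index[pair]} and {sample}")
        index[pair] = sample
    return index
```

With this table, `samples_by_single_barcode(LOOKUP)["TCTCGCGC"]` returns two samples, while each (forward, reverse) pair resolves to exactly one, which is why a demultiplexer has to look at both barcodes at once.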

I believe you will have to demultiplex the data with a tool outside QIIME2 and then import them as demultiplexed samples.
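Once an external tool has split the reads into one R1/R2 file pair per sample, they can be imported with a manifest file. A minimal sketch, assuming the per-sample files follow a hypothetical `<sample>.1.fastq.gz` / `<sample>.2.fastq.gz` naming scheme; the tab-separated columns `sample-id`, `forward-absolute-filepath` and `reverse-absolute-filepath` are those of QIIME 2's `PairedEndFastqManifestPhred33V2` format (worth double-checking against the importing tutorial for your QIIME 2 version).

```python
import csv
from pathlib import Path


def write_manifest(fastq_dir, manifest_path):
    """Write a paired-end manifest in the PairedEndFastqManifestPhred33V2 layout.

    Assumes hypothetical file names '<sample>.1.fastq.gz' (forward) and
    '<sample>.2.fastq.gz' (reverse) inside fastq_dir.
    """
    fastq_dir = Path(fastq_dir).resolve()  # manifest paths must be absolute
    rows = []
    for fwd in sorted(fastq_dir.glob("*.1.fastq.gz")):
        sample = fwd.name[: -len(".1.fastq.gz")]
        rev = fastq_dir / f"{sample}.2.fastq.gz"
        if not rev.exists():
            raise FileNotFoundError(f"missing reverse file for {sample}: {rev}")
        rows.append((sample, str(fwd), str(rev)))

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "forward-absolute-filepath",
                         "reverse-absolute-filepath"])
        writer.writerows(rows)
    return rows
```

For example, `write_manifest("demuxed", "manifest.tsv")` (the directory name is a placeholder), followed by `qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path manifest.tsv --output-path paired-end-demux.qza --input-format PairedEndFastqManifestPhred33V2`.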
Keep us posted on how it goes.
Luca

Skyhalols · June 3, 2021, 4:57am

Hi Luca,

Thanks for the help.
I tried to find the barcodes in the reads myself, but I couldn’t find any of them in the sequences.
Is it possible that the results are multiplexed, but there are no barcodes in the reads? If so, is it possible for me to demultiplex the results?

Thanks,
Yuan

Skyhalols · June 3, 2021, 4:57am

Also, could you recommend a piece of software or a tool I could use to demultiplex the 'combinatorial barcodes'? I am quite new to this subject and do not have a background in computer science, so I'm feeling quite lost when searching for tools other than QIIME.

I'm looking at this one, but I don't know if it's useful to me or not.

Best,
Yuan

llenzi · June 3, 2021, 9:52am

Hi @Skyhalols,

It would be unusual for me to receive multiplexed files without barcodes, but I suppose that depends on your sequencing provider. Could you contact them to clarify this point?

The cutadapt installed in your qiime2 environment should be able to do the job for you (as long as its version is >= 2.4), but in this case you should look at the cutadapt documentation, not at the q2-cutadapt plugin.
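As a rough sketch of how combinatorial demultiplexing with standalone cutadapt (not q2-cutadapt) is usually set up: the forward and reverse barcodes go into two FASTA files, and the `{name1}-{name2}` placeholders in the output names give one file pair per barcode combination. The Python below only writes the FASTA files and assembles the command; the barcode names, the reverse barcodes, the file names and the error rate `-e 0.15` are assumptions to adapt, and the anchored `^file:` form assumes the barcodes sit at the very 5' end of R1 and R2, which has not been confirmed for these data.

```python
# Hypothetical barcode sets; replace with the values from your lookup table.
FORWARD_BARCODES = {"F1": "TCTCGCGC", "F2": "AGCGATAG"}
REVERSE_BARCODES = {"R1": "AAGGTTCC", "R2": "CCTTGGAA"}


def write_fasta(barcodes, path):
    """Write {name: sequence} as a FASTA file for cutadapt's file: syntax."""
    with open(path, "w") as handle:
        for name, seq in barcodes.items():
            handle.write(f">{name}\n{seq}\n")


def cutadapt_command(fwd_fasta, rev_fasta, r1, r2, error_rate=0.15):
    """Assemble (but do not run) a combinatorial-demultiplexing cutadapt command."""
    return [
        "cutadapt",
        "-e", str(error_rate), "--no-indels",
        "-g", f"^file:{fwd_fasta}",   # forward barcodes anchored at 5' of R1
        "-G", f"^file:{rev_fasta}",   # reverse barcodes anchored at 5' of R2
        "-o", "{name1}-{name2}.1.fastq.gz",
        "-p", "{name1}-{name2}.2.fastq.gz",
        str(r1), str(r2),
    ]
```

The returned list can be printed with `shlex.join(...)` and pasted into a terminal, or passed to `subprocess.run`; reads whose barcodes match no combination are written to `unknown-unknown` files by cutadapt.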

An example is described here:

Hope it helps
Luca

Skyhalols · June 6, 2021, 3:07pm

Hi Luca,

This is a really interesting example. I am studying cutadapt in depth and trying to demux my samples with it.

I just want to ask a very basic question:

So, the sequencing company provided me with this 'barcode lookup table' (P.S. this sequencing company does not use QIIME/QIIME2 for data analysis).

We can see that the Read2RC column contains only two barcodes: TCTCGCGC and AGCGATAG.
Then I opened the fastq file and summarized the first barcode on the @ line of each read:

An amazing thing happened! These two barcodes, G(N)CGCGAGA and C(N)TATCGCT, were all I could find.
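Summarizing the barcodes on the `@` lines can be scripted instead of done by hand. A minimal sketch, assuming the common Illumina (CASAVA 1.8+) header layout `@<instrument>:...:<y> <read>:<filtered>:<control>:<index>`, where a dual index appears as `INDEX1+INDEX2`; the file name `R1.fastq.gz` is a placeholder, and headers from other pipelines may differ, so it is worth checking a few lines first.

```python
import gzip
from collections import Counter
from itertools import islice


def index_from_header(header):
    """Return (index1, index2) from an Illumina-style FASTQ header line.

    Assumes the index is the last ':'-separated field of the second
    whitespace-separated token; index2 is '' for single-indexed runs.
    """
    fields = header.rstrip("\n").split()
    if len(fields) < 2:
        return ("", "")
    index = fields[1].split(":")[-1]
    first, _, second = index.partition("+")
    return (first, second)


def count_indexes(fastq_gz, max_reads=None):
    """Count first-index sequences in a gzipped FASTQ (one header every 4 lines)."""
    counts = Counter()
    with gzip.open(fastq_gz, "rt") as handle:
        headers = islice(handle, 0, None, 4)
        for header in islice(headers, max_reads):
            counts[index_from_header(header)[0]] += 1
    return counts
```

For example, `count_indexes("R1.fastq.gz").most_common(10)` lists the ten most frequent first indexes and their read counts.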

Does this mean the barcodes the sequencing company provided me were wrong? (The 'Read3' barcodes all matched, though; only the 'Read2RC' ones did not.)

Finally, in Illumina sequencing, does 'Read2RC' normally mean the 'forward barcode' and 'Read3' the 'reverse barcode'?

Thanks so much for your patient help!

llenzi · June 7, 2021, 8:51am

Hi @Skyhalols,

The column names in your barcode lookup table are a bit of a mystery to me, really. The use of ‘OTUID’ here would imply to me that the sequences have already been processed into clusters, which does not seem to be the case given that you have R1 and R2 file pairs, each of which should relate to a sample. As a first guess I would assume ‘Read2’ → R1 and ‘Read3’ → R2, but it is only a guess at the moment. In which read did you find the barcode ‘G(N)CGCGAGA’?
This sequence seems to indicate that you should look for the reverse complement of the Read2RC barcodes, as its reverse complement is ‘TCTCGCGC’, as expected from the Read2RC column. That makes me wonder what the overall orientation of your reads is; that is important to know for your analysis to be able to work with them (merging or aligning).
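The reverse-complement check above can be reproduced with a few lines of Python. A small sketch; the two inputs are the header barcodes reported in this thread with the `(N)` annotation dropped, i.e. GCGCGAGA and CTATCGCT.

```python
# Translation table complementing A, C, G, T and the ambiguity code N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    for header_barcode in ("GCGCGAGA", "CTATCGCT"):
        print(header_barcode, "->", reverse_complement(header_barcode))
    # GCGCGAGA -> TCTCGCGC
    # CTATCGCT -> AGCGATAG
```

Both results are exactly the two values in the Read2RC column, which supports the idea that the header shows these barcodes in the opposite orientation rather than that the table is wrong.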
I would go back to your provider and ask for more information on the library design they used and how to demultiplex the data, rather than trying to guess; it may be quicker.

Hope it helps
Luca

Skyhalols · June 8, 2021, 5:46am

Yes, Luca, it is very confusing.
The reason behind this is that the sequencing company had already provided us with the .biom files, from which the OTUtable.txt was produced. Going forward, we would like to learn to do the data analysis ourselves, so we asked them to provide the raw/untrimmed reads and the related barcodes.
But as I said, this company does not use QIIME for data analysis, so perhaps the 'barcode lookup table' they provided has some problems?

What I am sure of is that my R1 is forward.fastq and my R2 is reverse.fastq, as the respective primer sequences were labeled in the files. I searched on the internet and found that read2 means the Illumina instrument processes all reads from 5' to 3' (forward), and read3 means it processes the same reads from 3' to 5' (reverse). So I guess Read2 is for R1 (forward) and Read3 is for R2 (reverse), too.

Well, this is the real question that confuses me. I tried to find the Read2/Read3 barcodes in the reads, but I couldn't find them in any read (at least in the ~10 reads that I investigated). That's why I asked you this:

Also, I copied 3 reads from the same sample (they should have the same barcodes) and pasted them into an Excel sheet. I tried to find out what they have in common, but I failed! These three reads do not share any 8 bp sequence!
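Comparing reads by eye in Excel is error-prone; a short script can list every 8 bp sequence shared by all the reads, together with where it occurs in each. A minimal sketch; note it only finds exact shared 8-mers, so a single sequencing error inside a barcode would hide it.

```python
def kmers(seq, k=8):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(reads, k=8):
    """Return k-mers present in every read, mapped to their first position in each read."""
    if not reads:
        return {}
    common = set.intersection(*(kmers(r, k) for r in reads))
    return {kmer: [r.find(kmer) for r in reads] for kmer in sorted(common)}
```

Usage: `shared_kmers([read1, read2, read3])`, where the reads are pasted in as strings. A shared 8-mer that sits at or near position 0 in every read would be a good candidate for an inline barcode; shared 8-mers further in are more likely primer or conserved 16S sequence.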

Well, I am the one on our team who processes the data, while my partner is the one who contacts the sequencing company, and she assured me that she has already gotten everything she can from them. So I really have no idea whether I can solve this or not.

Anyway, I'll try to use cutadapt to process my files with combinatorial barcodes. But the fact that I couldn't find the barcodes in the reads is what really concerns me.

llenzi · June 8, 2021, 9:03am

Well, this is the real question that confuses me. I tried to find the Read2/Read3 barcodes in the reads, but I couldn’t find them in any read (at least in the ~10 reads that I investigated). That’s why I asked you this:

I think you have to: 1) in read 1, look for the reverse complement of the sequences in the 'Read2RC' column of your lookup table; 2) in read 2, look for the sequences exactly as they appear in the 'Read3' column of your lookup table.
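The two checks above can be combined into one small function. A sketch, assuming one (Read2RC, Read3) barcode pair per sample taken from the lookup table and exact matching only; it reports the 0-based position where each barcode was found, or -1 if it was not.

```python
# Translation table complementing A, C, G, T and N (uppercase sequences).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def locate_barcodes(read1, read2, read2rc_barcode, read3_barcode):
    """Look for rc(Read2RC) in read 1 and Read3 as-is in read 2 (exact match only).

    Returns (position in read 1, position in read 2), using -1 when absent.
    """
    pos_r1 = read1.find(reverse_complement(read2rc_barcode))
    pos_r2 = read2.find(read3_barcode)
    return pos_r1, pos_r2
```

Running this over the first few hundred read pairs for each row of the lookup table shows quickly whether the barcodes are inside the reads at all, and if so at which offset.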

Also, I copied 3 reads from the same sample (they should have the same barcodes) and pasted them into an Excel sheet. I tried to find out what they have in common, but I failed! These three reads do not share any 8 bp sequence!

Did you look for exact matches, or did you allow some mismatches? Using only exact matches may be the reason why you did not find any sequences.
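To check whether allowing mismatches changes the picture, a sliding-window Hamming-distance search is enough for 8 bp barcodes. A sketch; it ignores insertions and deletions, and the default of one allowed mismatch is an assumption to tune.

```python
def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def find_approx(read, barcode, max_mismatches=1):
    """Return (position, mismatches) of the best gap-free match, or None.

    Slides the barcode along the read; an 'N' in the read counts as a
    mismatch. Ties are resolved in favor of the leftmost position.
    """
    k = len(barcode)
    best = None
    for i in range(len(read) - k + 1):
        d = hamming(read[i:i + k], barcode)
        if d <= max_mismatches and (best is None or d < best[1]):
            best = (i, d)
    return best
```

For example, a read starting `TTGNGCGAGA...` still matches GCGCGAGA at position 2 with one mismatch, whereas a plain exact search (`str.find`) would miss it.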

Let us know about any progress.
Luca

system · July 9, 2021, 3:04pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.