How do I demultiplex and merge paired-end sequencing reads when only the forward reads contain the barcode sequence?

KonradV · October 25, 2023, 8:44am

I used the following command to import the data. There are many .gz files containing forward and reverse reads, but only the forward reads contain the barcode sequence.
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path total/manifest.txt \
  --output-path total.qza \
  --input-format PairedEndFastqManifestPhred33

Then I'm stuck. Which command should I use to demultiplex the reads and then pair them up, so that I can run DADA2 afterwards?
I tried 'qiime cutadapt demux-paired', 'qiime demux', and 'vsearch', but they all failed.
Also, some of my sequencing results use the same barcode but have different 'sample-id' values. Will this interfere with the downstream steps?

KonradV · October 25, 2023, 10:21am

Specifically, if I run demux-single, all the reverse reads end up unclassified, and if I run demux-paired, my reverse reads have no barcodes. How do I pair the reads up again after demultiplexing them?

KonradV · October 25, 2023, 2:42pm

I tried the following command to merge the paired-end reads first, so that I could then treat them as single-end data:

qiime vsearch merge-pairs \
  --i-demultiplexed-seqs 1.qza \
  --o-merged-sequences 1joined.qza

Then I used the usual 'cutadapt' command to split the reads by barcode and remove non-biological sequences, and ran into a problem:

qiime cutadapt demux-single \
  --i-seqs 1joined.qza \
  --m-barcodes-file total/metadata.tsv \
  --m-barcodes-column barcodes \
  --p-error-rate 0.3 \
  --o-per-sample-sequences demultiplexed-seqs.qza \
  --o-untrimmed-sequences untrimmed.qza

(1/1) Invalid value for '--i-seqs': Expected an artifact of at least type
MultiplexedSingleEndBarcodeInSequence. An artifact of type
SampleData[JoinedSequencesWithQuality] was provided.

Well, it doesn't recognise merged data, so how do I convert this data type?

qiime tools import \
  --type MultiplexedSingleEndBarcodeInSequence \
  --input-path 1joined.qza \
  --output-path multiplexed-seqs.qza

There was a problem importing 1joined.qza:
1joined.qza is not a(n) FastqGzFormat file:
File is uncompressed

Ok, the world wasn't as simple as I thought it would be, and I broke down. Please help me.

colinvwood · October 25, 2023, 5:29pm

Hello @KonradV,

You don't need to merge pairs before demuxing. The cutadapt demux-paired command does not require barcodes in both read directions, so this will work fine for you.
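A minimal sketch of such a call, wrapped in a shell function so the inputs are explicit. It assumes the reads have already been imported as a MultiplexedPairedEndBarcodeInSequence artifact and that the metadata file has a barcode column; the function name and the file names in the usage note are placeholders, not something taken from this thread. Only the forward barcode options are given, and the reverse barcode options are simply left out.

```shell
# Sketch: demultiplex paired-end reads using barcodes found in the forward reads only.
# demux_forward_only is a hypothetical helper name.
#   $1: artifact of type MultiplexedPairedEndBarcodeInSequence
#   $2: metadata TSV that lists one barcode per sample
#   $3: name of the barcode column in that file
demux_forward_only() {
  qiime cutadapt demux-paired \
    --i-seqs "$1" \
    --m-forward-barcodes-file "$2" \
    --m-forward-barcodes-column "$3" \
    --o-per-sample-sequences demultiplexed-seqs.qza \
    --o-untrimmed-sequences untrimmed.qza
}
```

It could then be called as `demux_forward_only multiplexed-seqs.qza total/metadata.tsv barcodes`, where all three arguments are assumed names.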

KonradV · October 26, 2023, 1:25am

@colinvwood
Wow! I am very happy to learn that the cutadapt demux-paired command does not require barcodes in both read directions. But this command only accepts a "MultiplexedPairedEndBarcodeInSequence" artifact, not "SampleData[PairedEndSequencesWithQuality]". My data is spread across multiple .gz files, which I can only import via

qiime tools import
--type 'SampleData[PairedEndSequencesWithQuality]'

How do I get my data into the "MultiplexedPairedEndBarcodeInSequence" type? Or how can I make the cutadapt command operate on an artifact of type 'SampleData[PairedEndSequencesWithQuality]'?

Thank you very much!

colinvwood · October 26, 2023, 5:01pm

Hello @KonradV,

If your reads are already separated into multiple files then they are likely already demultiplexed. Where did you get these sequencing files from?

KonradV · October 27, 2023, 1:10pm

Hi, @colinvwood
I did several rounds of sequencing in separate runs, which produced multiple sets of NGS data, each saved in separate .gz files. They use a mixed barcode strategy and do not share barcodes, so the samples can be distinguished by barcode. My library construction strategy is i5 + i7 indexing, with the barcode sequence at the 5' end, where the i5 index is located.

The difficulty I'm having is that I want to process all of these files at once. Since I don't know how to split off the barcode sequences into a barcode.fastq file, I can't use the EMP import method. I also tried importing my data as MultiplexedPairedEndBarcodeInSequence, but it seems that this import doesn't support a folder with many .gz files? Eventually I managed to import them as an artifact of type SampleData[PairedEndSequencesWithQuality], but I can't go any further with the cutadapt and demux commands because neither of them accepts artifacts of type SampleData.

That's my problem: they are not demultiplexed. The multiplexed data is simply split across multiple .gz files.

Thank you for your reply.

colinvwood · October 30, 2023, 8:03pm

Hello @KonradV,

Do you have one forward and one reverse file for each of the sequencing runs? If so then you can merge each of the forward files and separately merge each of the reverse files, and then you can follow the instructions here to import them. You can use the zcat command to merge the files.
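A rough sketch of those two steps, under some assumptions: the per-run files are gzip-compressed and named like `run1_R1.fastq.gz` / `run1_R2.fastq.gz` (a hypothetical naming scheme), and the import target is the MultiplexedPairedEndBarcodeInSequence type, which expects a directory containing `forward.fastq.gz` and `reverse.fastq.gz`. The function names are placeholders. `gzip -dc` is used here because it does the same thing as zcat and behaves the same on Linux and macOS.

```shell
# Sketch: merge per-run files into one forward and one reverse file.
# merge_runs and import_multiplexed are hypothetical helper names.
#   $1: directory holding the per-run *_R1.fastq.gz / *_R2.fastq.gz files
#   $2: output directory for forward.fastq.gz and reverse.fastq.gz
merge_runs() {
  raw_dir=$1
  out_dir=$2
  mkdir -p "$out_dir"
  # The shell expands both globs in the same sorted order, so the runs are
  # concatenated in the same order for forward and reverse reads and the
  # read pairs stay in sync.
  gzip -dc "$raw_dir"/*_R1.fastq.gz | gzip > "$out_dir/forward.fastq.gz"
  gzip -dc "$raw_dir"/*_R2.fastq.gz | gzip > "$out_dir/reverse.fastq.gz"
}

# Import the merged directory as a multiplexed paired-end artifact.
#   $1: directory produced by merge_runs
#   $2: output .qza path
import_multiplexed() {
  qiime tools import \
    --type MultiplexedPairedEndBarcodeInSequence \
    --input-path "$1" \
    --output-path "$2"
}
```

For example, `merge_runs raw muxed` followed by `import_multiplexed muxed multiplexed-seqs.qza`, where `raw`, `muxed`, and the artifact name are assumed.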

KonradV · November 2, 2023, 10:53am

@colinvwood
Hi, thank you very much for your advice, but I'm a Linux beginner and I'm not really sure what the zcat command is for. It seems to be a command for displaying the contents of compressed files?
I used another command, cat, to put my files together, and then finished the species composition analysis using an artifact of type MultiplexedPairedEndBarcodeInSequence. Anyway, thank you very much for your help. It would be great if you could tell me how to use zcat and how it differs from the cat command.

colinvwood · November 2, 2023, 4:04pm

Hello @KonradV,

zcat does exactly what cat does but for compressed files, which I assumed you had.
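A small self-contained illustration of the difference, using two tiny made-up FASTQ records in a temporary directory (all file names are placeholders). cat joins the compressed bytes unchanged, which still gives a valid file because the gzip format allows concatenated members; zcat (equivalent to `gzip -dc`, used below for portability) writes out the decompressed text, which then has to be compressed again.

```shell
# Two small gzip files stand in for per-run FASTQ files (hypothetical names).
workdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip > "$workdir/run1.fastq.gz"
printf '@r2\nTTGA\n+\nIIII\n' | gzip > "$workdir/run2.fastq.gz"

# cat: concatenate the compressed files byte for byte.
cat "$workdir/run1.fastq.gz" "$workdir/run2.fastq.gz" > "$workdir/cat_merged.fastq.gz"

# zcat / gzip -dc: decompress both files to text, then recompress the stream.
gzip -dc "$workdir/run1.fastq.gz" "$workdir/run2.fastq.gz" | gzip > "$workdir/zcat_merged.fastq.gz"

# Decompress both results and compare the reads they contain.
gzip -dc "$workdir/cat_merged.fastq.gz" > "$workdir/from_cat.fastq"
gzip -dc "$workdir/zcat_merged.fastq.gz" > "$workdir/from_zcat.fastq"
cmp "$workdir/from_cat.fastq" "$workdir/from_zcat.fastq" && echo "same reads in both merged files"
```

So for files that are already gzip-compressed, either route should give the same reads after decompression.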
