Two different sequencing chips with the same barcodes: how to use DADA2?

PatoUru · April 25, 2019, 10:10pm

Hi All,
I think I have a big problem! Maybe you have a solution for my case.
We sequenced 40 samples on the Ion Torrent platform. These samples were sequenced on 2 different chips (20 samples on each chip). The problem is that the same 20 barcodes were used on both chips.
To analyze these 40 samples together with DADA2 and obtain ASVs, is there any pre-processing step that would let me join the two FASTQ files without mixing up the sequences of 2 different samples that share the same barcode?
I thought about demultiplexing the 2 chips separately with split_libraries to avoid that problem, but according to the QIIME 2 tutorial, DADA2 only accepts files in FASTQ format.
I hope I have explained the problem clearly.
Many thanks!!
Patricia

Mehrbod_Estaki · April 25, 2019, 10:32pm

Hi @PatoUru,
The first comment I want to make is that the current version of DADA2 in QIIME 2 is not ideal for Ion Torrent data, though a future version that can handle such data is in the works. For the time being, you should consider running the native version of DADA2 in R on your Ion Torrent data with the recommended parameter modifications.
That being said, given that your samples were run on 2 different chips, you should denoise them separately with the same trimming/truncation parameters and merge them afterwards. Demultiplexing each chip separately also resolves the issue of the same barcodes being used on both chips.
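To make the logic of that advice concrete, here is a minimal Python sketch (not QIIME 2 code) of demultiplexing each chip against its own sample sheet and only then pooling the per-sample reads. The sample IDs, the second barcode, and the helper names `demultiplex_chip` and `demultiplex_runs` are hypothetical, and the exact 5'-prefix match is only a simplified stand-in for an error-free barcode match. In QIIME 2 itself this would correspond to one `qiime cutadapt demux-single` run per chip, followed by denoising each chip with identical parameters and merging the resulting feature tables.

```python
from collections import defaultdict

# Hypothetical per-chip sample sheets: the same barcode appears on both
# chips but belongs to a different sample on each one.
CHIP_SHEETS = {
    "chip1": {"CTAAGGTAAC": "S01", "TAAGGAGAAC": "S02"},
    "chip2": {"CTAAGGTAAC": "S21", "TAAGGAGAAC": "S22"},
}


def demultiplex_chip(reads, barcode_to_sample):
    """Assign reads to samples by an exact barcode match at the 5' end.

    Returns (per_sample, unassigned); the barcode is trimmed off assigned reads.
    """
    per_sample = defaultdict(list)
    unassigned = []
    for seq in reads:
        for barcode, sample in barcode_to_sample.items():
            if seq.startswith(barcode):
                per_sample[sample].append(seq[len(barcode):])
                break
        else:  # no barcode matched this read
            unassigned.append(seq)
    return dict(per_sample), unassigned


def demultiplex_runs(runs, sheets):
    """Demultiplex each chip on its own, then pool reads by sample ID.

    Each chip is looked up only in its own sheet, so a barcode shared between
    chips can never send a chip2 read to a chip1 sample.
    """
    pooled = {}
    for chip, reads in runs.items():
        per_sample, _unassigned = demultiplex_chip(reads, sheets[chip])
        for sample, seqs in per_sample.items():
            if sample in pooled:
                raise ValueError(f"sample ID {sample} appears on more than one chip")
            pooled[sample] = seqs
    return pooled
```

The key design choice is that a barcode is only ever interpreted in the context of the chip the read came from, so `CTAAGGTAAC` can safely mean `S01` on chip1 and `S21` on chip2; what must be unique across chips is the sample ID, not the barcode.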

PatoUru · April 25, 2019, 10:39pm

Hi @Mehrbod_Estaki,
Thank you very much for your quick help! I will follow your advice for this data, and for future data sets I will wait for the new updates.
Thank you very much
Patricia

PatoUru · April 28, 2019, 2:52am

Hi @Mehrbod_Estaki,
Hope you’re well! I solved the first problem, the repeated barcodes, by replacing them with new ones on chip1. So now I can join the two runs without problems. It was laborious for an inexperienced user like me, but it worked!
I would like to know which other methods I can use in QIIME 2 that do not cluster sequences into OTUs at a fixed similarity threshold, as is usually done. I think the method you suggested to me in R is going to be complicated for me. I would like to apply the ASV method (which, for now, you are developing for QIIME 2 with Ion Torrent data), or it could also be zOTUs or swarm.
Thank you very much in advance!
Patricia

Mehrbod_Estaki · April 28, 2019, 5:45pm

Hi @PatoUru,
Just to clarify, I am not the developer of DADA2; that is Benjamin Callahan, who does pop by on the forum when called on. So for more accurate information on release dates, functionality, etc. of those plugins, he is the person to look out for, especially on the DADA2 GitHub page.
I'm glad you solved the problem, but for others who may be following this, can you explain what you did? I ask because perhaps I didn't understand the problem properly in the first place. You had 2 chips that shared the same barcodes. My recommendation was to demultiplex the chips separately so that the barcodes wouldn't be mixed up, then denoise these runs, and finally join them together. It sounds like you had to do something different? Care to expand?

Deblur and DADA2 are the 2 denoising methods available in QIIME 2 at the moment that produce ASVs. Deblur uses a pre-built error model based on Illumina machines, so I don't think it is a good option for you. On further reading, it sounds like you may still be able to use DADA2 with some modification; see this discussion here. It sounds like the denoising works well enough with Ion Torrent data, albeit not as well as with Illumina data under the default settings. Might be worth a try! Other than that, I think your next option would be a clustering method with a 100% similarity threshold.
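For readers unfamiliar with the distinction: exact dereplication, collapsing identical reads into unique sequences with counts, is the simplest relative of clustering at 100% identity. A minimal Python sketch follows; `dereplicate` is a hypothetical helper, not a QIIME 2 function. Note that it does nothing about sequencing errors: every read carrying an error becomes its own variant, which is the problem denoisers such as DADA2 and Deblur are designed to address. Dedicated clustering tools may also differ in details, such as how they treat sequences of different lengths.

```python
from collections import Counter


def dereplicate(seqs):
    """Collapse identical sequences into unique variants with abundances.

    Returns (sequence, count) pairs sorted by decreasing count, with ties
    broken alphabetically so the output order is deterministic.
    """
    counts = Counter(seqs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For example, `dereplicate(["ACGT", "ACGT", "ACGA", "ACGT", "ACGA", "TTTT"])` returns `[("ACGT", 3), ("ACGA", 2), ("TTTT", 1)]`; a single-base difference (`ACGA` vs `ACGT`) is enough to keep two sequences apart.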

PatoUru · April 28, 2019, 6:21pm

@Mehrbod_Estaki A_demultiplex-seqs.qzv (297.1 KB)

First I wanted to thank you for your help!
Finally, I did it as you recommended, because I understood that I would then be able to join the demultiplexed files and continue with the analysis. But just as a test, I also replaced each barcode on chip2 with one from a list of new barcodes of the same length as the originals. To do this I used the following command:
$ sed -i 's/CTAAGGTAAC/TCGCAATTAC/g' chip2.fastq

It is not time-efficient at all, but maybe it can help someone.
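A single-pass alternative to running one `sed` command per barcode is sketched below in Python. It is an illustration under assumptions: each FASTQ record is exactly four lines, the barcode sits at the very start of the read (as implied by importing the data as `MultiplexedSingleEndBarcodeInSequence`), and every new barcode has the same length as the one it replaces. Unlike `s/.../.../g`, it only rewrites the start of the sequence line, so a barcode-like stretch further into a read is left untouched. `remap_fastq` and the mapping dict are hypothetical names.

```python
from itertools import islice


def remap_fastq(lines, old_to_new):
    """Yield FASTQ lines with the leading barcode of each read replaced.

    `lines` may be an open text-mode file or any iterable of newline-terminated
    lines. Only the start of the sequence line (line 2 of each record) is
    rewritten; headers, '+' lines and quality strings pass through unchanged.
    Input checks run when iteration starts, since this is a generator.
    """
    # Equal lengths keep the sequence and quality strings the same length.
    for old, new in old_to_new.items():
        if len(old) != len(new):
            raise ValueError(f"{old} -> {new}: barcodes must have equal length")

    records = iter(lines)
    while True:
        record = list(islice(records, 4))
        if not record:
            return  # clean end of input
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        header, seq, plus, qual = record
        for old, new in old_to_new.items():
            if seq.startswith(old):  # anchored at the 5' end, unlike sed's s///g
                seq = new + seq[len(old):]
                break
        yield header
        yield seq
        yield plus
        yield qual
```

With `src` opened on `chip2.fastq` and `dst` opened on a new output file, `dst.writelines(remap_fastq(src, {"CTAAGGTAAC": "TCGCAATTAC", ...}))` handles all barcodes in one read of the file instead of one full pass per barcode.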

I have one more question for you. After demultiplexing I obtained longer sequences than expected. We used primers for the V4 region and obtained amplicons of approximately 280 bp, but when I visualize the demultiplexing results, sequences of about 500 bp appear.
The analysis I did was the following:

$ qiime tools import --type MultiplexedSingleEndBarcodeInSequence --input-path Chip1.fastq.gz --output-path Chip_A.qza

$ qiime cutadapt demux-single --i-seqs Chip_A.qza --m-barcodes-file mapping_file_ChipA_Bacteria --m-barcodes-column BarcodeSequence --p-error-rate 0 --o-per-sample-sequences A_demultiplex-seqs.qza --o-untrimmed-sequences A_untrimmed.qza
#Saved SampleData[SequencesWithQuality] to: A_demultiplex-seqs.qza
#Saved MultiplexedSingleEndBarcodeInSequence to: A_untrimmed.qza

What could be the problem? I attached the file A_demultiplex-seqs.qzv so you can take a look.
Thank you!!

Mehrbod_Estaki · April 28, 2019, 6:36pm

Hi @PatoUru,
Thanks for the update and always glad to be of any help!
These long lengths are odd indeed. From what you’ve shown, there’s no reason why they should be this long unless your reads were modified somehow before demultiplexing. When you visualized the reads after importing (prior to demultiplexing), were they the expected length?

PatoUru · April 28, 2019, 7:07pm

The FASTQ file that I imported did not have any extra processing; it was obtained directly from the Ion Torrent. Could you tell me how to visualize the sequences after importing them? I couldn't find how to do it in QIIME 2.

PatoUru · April 29, 2019, 2:26am

I'm back! I had analyzed these data before in QIIME 1. I attached the files so you can see the results of the split_libraries script.

histograms.txt (993 Bytes)
split_library_log.txt (1.8 KB)

The sequences have the correct length (approximately 260 bp) in QIIME 1, while after demultiplexing in QIIME 2 they are approximately 500 bp long. The problem is probably in the import or the demultiplexing of the sequences in QIIME 2.
What do you think?

PatoUru · April 28, 2019, 5:01pm

Hi All,
I have an Ion Torrent data set generated with primers for the V4 region. I performed the demultiplexing as follows:

qiime cutadapt demux-single --i-seqs Chip_A.qza --m-barcodes-file mapping_file_ChipA_Bacteria --m-barcodes-column BarcodeSequence --p-error-rate 0 --o-per-sample-sequences A_demultiplex-seqs.qza --o-untrimmed-sequences A_untrimmed.qza
#Saved SampleData[SequencesWithQuality] to: A_demultiplex-seqs.qza
#Saved MultiplexedSingleEndBarcodeInSequence to: A_untrimmed.qza

And when I visualize the results with
qiime demux summarize --i-data A_demultiplex-seqs.qza --o-visualization A_demultiplex-seqs.qzv
Saved Visualization to: A_demultiplex-seqs.qzv

The sequences seem to be about twice as long as they should be. The amplicons should be approximately 280 bp, but the results show them at 500 bp. I attached the result:
A_demultiplex-seqs.qzv (297.1 KB)

Do you know what the error could be?

The step I ran before that was:
qiime tools import --type MultiplexedSingleEndBarcodeInSequence --input-path R_2018_07_19_20_09_31.fastq.gz --output-path Chip_A.qza

Thank you very much in advance!
Patricia

Mehrbod_Estaki · April 29, 2019, 3:54pm

Hi @PatoUru,
Thanks for the update again. Unfortunately, my experience with Ion Torrent data is limited, so I’m not sure where to go in troubleshooting this. I’ve seen a few recent posts about Ion Torrent data that seem to show the same pattern, a relatively small portion of longer-than-expected reads, which leads me to think this is not unique to your case. It might be best to ask your sequencing facility about this and what they recommend. Or perhaps someone here with more experience with this type of system could qiime in. Sorry!

thermokarst · April 29, 2019, 4:00pm

Hi there @PatoUru!

This is not quite right. Let's look at the sequence length summary you attached:

You'll see here that almost all of your reads are ~243 nt long. Only a small handful are 435 nt long (the 98th percentile of the read length distribution), and a few are pretty short, too (~32 nt). Where are you seeing 500 nt?
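To cross-check a percentile summary like this against the raw data, the read lengths can be tabulated directly from the FASTQ file before importing. The Python sketch below assumes plain four-line FASTQ records; the default percentile set is assumed to resemble the one in the demux summary table, and the nearest-rank method used here may give slightly different values from QIIME 2's own calculation. `read_lengths`, `length_summary`, and `fraction_longer` are hypothetical helpers.

```python
import math


def read_lengths(lines):
    """Length of every read in a FASTQ given as an iterable of lines.

    Assumes plain 4-line records, so the sequence is every 4th line
    starting with the second one.
    """
    return [len(line.rstrip("\r\n")) for i, line in enumerate(lines) if i % 4 == 1]


def length_summary(lengths, percentiles=(2, 9, 25, 50, 75, 91, 98)):
    """Map each percentile to a read length using the nearest-rank method."""
    ordered = sorted(lengths)
    if not ordered:
        raise ValueError("no reads to summarize")
    n = len(ordered)
    # Nearest rank: the smallest 1-based rank r with r >= p/100 * n.
    return {p: ordered[max(1, math.ceil(p * n / 100)) - 1] for p in percentiles}


def fraction_longer(lengths, limit):
    """Share of reads strictly longer than `limit` (0.0 for empty input)."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    return sum(1 for length in lengths if length > limit) / len(lengths)
```

For instance, opening `Chip1.fastq.gz` with `gzip.open(path, "rt")` and passing the handle to `read_lengths` gives the raw distribution; `fraction_longer(lengths, 300)` then shows what share of reads exceeds the expected ~280 bp amplicon (the 300 cutoff is only an example value).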

My suggestion is to chat with your sequencing center about this, since these are effectively the "raw" reads; they might have some insight for you.

Thanks!

system · May 30, 2019, 10:00pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.