Problem with demux

demux
pending-development

#1

Dear Qiime2 team,

Happy new year!

I am trying to learn how to use Qiime2. I have already tried the tutorials and now I am working on my own data.

I am dealing with two runs of pair end data. Unfortunately, I did not get a barcode.fastq file from the sequencing company. I could not find a way to transform my data to what is needed for Qiime2 according to the “Atacama tutorial”, so I have done the following.

  1. I started off with files provided by the company that had already been joined.
  2. I then concatenated the fasta files from both runs, as well as the qual files from both runs.
  3. Converted them to fastq
  4. Extracted the barcodes with Qiime1 (extract_barcodes.py) without reverse complementing them (when I did reverse them, I only had 3 samples left).
  5. Compressed fastq files to fastq.gz
  6. Imported them into qiime2 as single end sequences
  7. Demultiplex them as single-end reads
  8. Summarize the data

When viewing the results, I obtain this very weird graph:

Screenshot from 2018-01-02 21-16-21

I would very much appreciate if anyone could help me understand this graph, and where may I have made a mistake.

Thanks a lot!

Aileen


Importing format with multiplexed fastq format, single read iwth barcode data in the reads, both forward and reverse directions
Same classifier, same otu, different sequencing run = different classification
Mixed_oriented_reads
(Nicholas Bokulich) #3

Hi @Linnie,
Sounds like you are having similar trouble to this user. I think the funny quality plot that you are seeing is technically okay — it looks like maybe what is happening is the read joining method used by the company is assigning an arbitrarily high PHRED score to the overlapping bases. The non-overlapping bases are unadjusted, leaving you with a goofy looking plot. But I think that is okay! (though the lack of real quality scores may impact the ability of methods like dada2/deblur to detect noisy sequences within this overlapping region :disappointed:)

Check out this tutorial on working with joined paired-end sequences — note that the quality plots on that page look similar to yours.

I hope that helps! Good luck!


#5

Hi @Nicholas_Bokulich,

Thank you so much for your answer!!

I have a follow-up question. I have just realized that the company attempts to join reads if the amplicon is longer than 300bp, which is not my case (I used 515F-806R). Given the length of my amplicon, would joining the fwd and reverse reads be of bad quality?

If this is the case, would it be better if I just analyze the data as single reads without joining them? Since the company provides a mixture of forward and reverse reads in each of the R1 and R2 files, how could I proceed? Should I reorient the reverse reads in both files, concatenate the files and remove primers and adaptors, then import them to qiime2 as single end sequences?

I guess my problem is the same as this one: Import multiplexed R1.fastq and R2.fastq with mixed forward and reverse reads + truncate reverse primer (although our amplicon sizes are different) and the same as the link you sent earlier. It seems we cannot import the files into qiime2 with PairEndSequences because we have fwd and rev reads in each of our R1 and R2.

Thank you so much for your help, I really appreciate your time!

L.


(Nicholas Bokulich) #6

Hi @Linnie,

If it’s possible to separately upload these reads into QIIME2, I would personally go that route and run through dada2 (which will join reads after QC).

It sounds like that might not be possible, though.

It is worth using your data as-is and see what happens — if the results seem rather noisy upon further inspection, it may be that the paired-end joining prior to dada2 is preventing detection of sequence errors. @benjjneb — do you have any thoughts on how read joining prior to dada2 would impact error detection?

I am not sure if this will prevent importation, but having reads in mixed orientations will cause issues for alignment and ASV/OTU detection downstream. The post that you linked to has the additional issue that reverse primers are contained in the read (which also implies that non-biological sequence is downstream of that!), which should not be an issue with your reads. So perhaps we can make something work.

Perhaps you could re-orient your reads, e.g., if the reverse primer is detected in a read then reverse complement it? (and vice versa for the reverse read). If primers are not still in reads, perhaps there is still a way (outside of QIIME2)… e.g., it sounds like you do have barcodes in-line in the reads:

  1. Concatenate R1 + R2 --> “forward” reads
  2. Concatenate R2 + R1 --> “reverse” reads
  3. extract barcodes from “forward” reads.
  4. Import into QIIME2 as paired reads.
  5. demux and only read pairs in the correct forward orientation will be retained.

Would that work?

Alternatively, with a custom script you could determine whether the first X bases match an expected barcode; if yes, write out the forward/reverse reads as they are, and write out the barcode to a barcodes file; if no, determine whether the first X bases on the reverse read match an expected barcode; if yes, write out the reverse complements, and write out the barcode to the barcodes file.

I hope one of those solutions will solve your issue! Please let us know — this could help us decide how best to support sequence data following your setup in the future.


Stuck on DADA2 denoising
#7

Hi @Linnie,

Indeed, it sounds to me, that you encounter the same problems I had (with the same company?).

If I understood right, the company generates first an amplicon using the barcode+forward primer and reverse primer (without technical sequencing adapters). Then, they clone the amplicon into a vector, which introduces the Illumina sequencing adapater (this reduces PCR bias). I am not sure about the details, but you end up with amplicons oriented in both directions. Therefore, the company joins the reads from the forward run and the reverse run and re-orientates everything in the same direction. For joining they allow a lot of mistakes.

Therefore, I ended up with asking the company for the raw sequencing data (they were downloadable from Illumina basespace) to join and re-orientate them by my own allowing as less joining mistakes as possible. I found this discussion very helpful. Finally I ended up with the following qiime1 script:

extract_barcodes.py
-f SAM1-19_S3_L001_R1_001.fastq/data
-r SAM1-19_S3_L001_R2_001.fastq/data
–attempt_read_reorientation
–input_type barcode_paired_end
–bc1_len 8
–bc2_len 0
-m mapping_SAM1-19.tsv
-o SAM1-19_S3_L001_R1_001.fastq/

join_paired_ends.py
-f SAM1-19_S3_L001_R1_001.fastq/reads1.fastq
-r SAM1-19_S3_L001_R1_001.fastq/reads2.fastq
-b SAM1-19_S3_L001_R1_001.fastq/barcodes.fastq
-j 128
-p 0
-o SAM1-19_S3_L001_R1_001.fastq/

For you, it would maybe make sense not the join the reads, but simply either use the forward or the reverse run and re-orientate the sequences in the same direction.

Good luck :+1:


#8

Hi @Nicholas_Bokulich,
thank you so much for such a thorough answer.
I will give a try to some of those options and let you know how it goes :slight_smile:


#9

Hi @Martin,

thanks for your message and the scripts! I will definitely try them as they seem exactly what I need.

Thank you also for bringing to my attention the qiime1 forum discussion, I also find it very useful.

I also happen to have the reverse primers in the sequence, so I will follow the steps that you and @thermokarst described in your thread. Thank you again! They are very informative.

So, would the pipeline look like this? Extract barcodes, join reads, import your sequences to Qiime2 with the multiplexed single-end protocol, demultiplex, then trim fwd and reverse primers, deblur?

And one last question (sorry, I am not very familiar with Ilumina data yet): why wouldn’t it make sense for my reads to be joined? (As a bit of background information, just in case it helps, my amplicon is 291bp long and I have used 2*251bp sequencing).

Thanks a lot again!


#10

Hi @Linnie,

happy to hear that I could help a bit.

Yes, I did it the same way :slight_smile: (with raw basespace data) No guarantee, if there are more straight forward ways to handle these data.

Sorry, I thought your amplicon was smaller than 251 bp. Since it is longer, you should probably also join the forward and reverse reads. :+1:

Best,
Martin


#11

Thank you so much @Martin!!
I really appreciate your help!
L.


(system) #12

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.