Importing bidirectional (forward and reverse) 454 sequencing data for denoise-pyro

Payton · November 5, 2020, 4:48am

Hi all,

I retrieved 454 sequencing data in two directions (forward and reverse), already in FASTQ format, and would like to use the dada2 denoise-pyro function. However, it only accepts a single file. Is there any way to combine the results from both directions? As far as I understand, this is not "paired-end" data like Illumina.

Payton · November 6, 2020, 3:46pm

After searching the forum, I processed my forward and reverse data separately with dada2 denoise-pyro, then applied feature-table merge with --p-overlap-method sum to the feature tables and feature-table merge-seqs to the representative-sequence files.
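To make the merge step concrete, here is a minimal plain-Python sketch of what an overlap method of "sum" does conceptually: counts for the same feature ID in the same sample are added together, and everything else is carried over. It is not QIIME 2's implementation; tables are modelled as nested dicts and `merge_tables_sum` is a hypothetical helper. Note that by default QIIME 2 identifies DADA2 features by an MD5 hash of the sequence, so a forward ASV and a reverse-read ASV are only summed if their sequences are identical.

```python
from collections import defaultdict


def merge_tables_sum(*tables):
    """Merge feature tables shaped as {sample: {feature_id: count}}.

    Counts for the same feature ID in the same sample are added together;
    everything else is carried over unchanged.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for table in tables:
        for sample, features in table.items():
            for feature_id, count in features.items():
                merged[sample][feature_id] += count
    # Convert the nested defaultdicts back to plain dicts.
    return {sample: dict(features) for sample, features in merged.items()}


forward = {"S1": {"asvA": 10, "asvB": 3}}
reverse = {"S1": {"asvA": 4, "asvC": 7}, "S2": {"asvC": 1}}
print(merge_tables_sum(forward, reverse))
# {'S1': {'asvA': 14, 'asvB': 3, 'asvC': 7}, 'S2': {'asvC': 1}}
```

If the same organism produced `asvA` in the forward run but a differently oriented or differently trimmed sequence in the reverse run, the two would stay as separate rows instead of being summed.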

I think this was the best way to handle these data.

llenzi · November 6, 2020, 4:20pm

Hi @Payton,

Welcome to the forum!

It would probably be better to reorient the reverse sequences with the following command before the DADA2 step, so that identical sequences are merged correctly regardless of their initial orientation:

```
qiime rescript orient-seqs \
  --i-sequences query-sequences.qza \
  --i-reference-sequences reference-sequences.qza \
  --o-oriented-seqs oriented-query-sequences.qza \
  --o-unmatched-seqs unmatched-sequences.qza
```

You need to install the RESCRIPt plugin to do that (https://library.qiime2.org/plugins/rescript/27/).
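For a FASTQ record already known to come from the reverse strand, the flip itself can be sketched in a few lines of Python. This is an illustration, not RESCRIPt's code (the plugin decides orientation by comparing reads against the reference sequences); `reverse_complement` and `reorient_record` are hypothetical helpers. The sequence is reverse-complemented, while the quality string is only reversed, so each score stays attached to its base.

```python
# Complement table covering the IUPAC nucleotide codes, in both cases.
COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def reverse_complement(seq):
    """Complement every base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def reorient_record(header, seq, qual):
    """Flip a reverse-strand FASTQ record onto the forward strand.

    Quality characters are reversed but not complemented, so the
    low-quality tail of the original read ends up at the start.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return header, reverse_complement(seq), qual[::-1]


print(reorient_record("@r1", "AACGT", "IIII#"))
# ('@r1', 'ACGTT', '#IIII')
```

The example also shows the side effect discussed later in the thread: the poor-quality `#` that closed the raw read now opens the reoriented one.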

Finally, an unrequested thought, in the hope it will be helpful: what puzzles me is how you ended up with two separate files. From a 454 run I would expect a single FASTA file and its companion quality file, which I would merge to get a FASTQ file.
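Merging a FASTA file with its companion quality file can be sketched as follows, assuming the records in both files appear in the same order, the quality file holds whitespace-separated integer Phred scores, and the output should use the Phred+33 encoding expected by QIIME 2 imports. `fasta_qual_to_fastq` is a hypothetical helper, not a QIIME tool.

```python
def parse_fasta_like(lines):
    """Yield (header, body_lines) for FASTA or QUAL text."""
    header, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, body
            header, body = line[1:], []
        else:
            body.append(line)
    if header is not None:
        yield header, body


def fasta_qual_to_fastq(fasta_lines, qual_lines, offset=33):
    """Combine matching FASTA and QUAL records into FASTQ text."""
    fasta_records = list(parse_fasta_like(fasta_lines))
    qual_records = list(parse_fasta_like(qual_lines))
    if len(fasta_records) != len(qual_records):
        raise ValueError("FASTA and QUAL contain different numbers of records")
    out = []
    for (fa_head, fa_body), (q_head, q_body) in zip(fasta_records, qual_records):
        # Compare only the read ID (first token); headers may carry extra fields.
        if fa_head.split()[0] != q_head.split()[0]:
            raise ValueError(f"record mismatch: {fa_head} vs {q_head}")
        seq = "".join(fa_body)  # FASTA sequences may be wrapped over several lines
        scores = [int(x) for chunk in q_body for x in chunk.split()]
        if len(scores) != len(seq):
            raise ValueError(f"length mismatch for {fa_head}")
        qual = "".join(chr(score + offset) for score in scores)
        out.append(f"@{fa_head}\n{seq}\n+\n{qual}")
    return "\n".join(out) + "\n"


print(fasta_qual_to_fastq([">read1 len=4", "ACGT"], [">read1 len=4", "40 40 30 2"]), end="")
# @read1 len=4
# ACGT
# +
# II?#
```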

Before proceeding with the analysis, I think it would be very helpful for you to understand which preprocessing steps were applied to the original sequences. Were your sequences separated by their orientation relative to a reference? (I honestly don't remember whether 454 can produce mixed-orientation sequences.) Or were they split by removing an adapter in the middle of the sequences, which would suggest a 454 paired-end library?
What are the length distributions for the two sequence files?
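One quick way to answer the length question outside QIIME 2 is sketched below, assuming unwrapped four-line FASTQ records. `read_lengths` and `summarize` are hypothetical helpers, and the 50 bp histogram bin is an arbitrary choice.

```python
from collections import Counter
from statistics import median


def read_lengths(fastq_lines):
    """Return the length of every read in 4-line FASTQ text."""
    lines = [line.rstrip("\r\n") for line in fastq_lines]
    # Sequence lines are every 4th line, starting at index 1.
    return [len(lines[i]) for i in range(1, len(lines), 4)]


def summarize(lengths, bin_width=50):
    """Basic length statistics plus a binned histogram (bin start -> count)."""
    bins = Counter((n // bin_width) * bin_width for n in lengths)
    return {
        "reads": len(lengths),
        "min": min(lengths),
        "median": median(lengths),
        "max": max(lengths),
        "histogram": dict(sorted(bins.items())),
    }
```

Running `summarize(read_lengths(open_file_handle))` on the forward and reverse files separately and comparing the histograms would show whether the two sets look like full-length reads from opposite ends or like the two halves of a split read.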

Hope it helps

Payton · November 6, 2020, 6:34pm

Hi @llenzi,

First, thanks for your suggestion; I will reorient the reverse sequences. That is really helpful.

I retrieved these bidirectional 454 sequencing data, in FASTQ format, from the SRA database. The authors used QIIME 1 to convert the FASTA and quality files to FASTQ before submitting them. Reads in both directions are around 620 bp long.

Roche had a protocol for sequencing in both the forward and reverse directions, but each read comes from a unique fragment. That is why the data have both directions but are not paired-end reads. This was discussed on Biostars some years ago.

Payton

llenzi · November 10, 2020, 10:46am

Hi @Payton,
Sorry for keeping you waiting.
Just to clarify: to use RESCRIPt to reorient the sequences, you have to import the forward and reverse reads as separate sequence files. But since, technically speaking, these are not paired-end files, I assumed you were doing this anyway!

Now, for the denoising part, I have a concern that I hope someone more familiar with DADA2 can resolve (@benjjneb, I hope?). If you reverse the sequences before the denoising step, the low-quality stretch ends up at the beginning of the reads instead of at the tail. Does that break any assumption DADA2's pyro mode makes about the data?

As an alternative, you could denoise the forward and reverse reads separately, assign taxonomy to the ASVs, and merge the two results after taxonomic assignment (the downside is that you may not be able to run phylogenetic analyses on these data).
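The "merge after taxonomic assignment" route can be sketched as collapsing each table to taxon labels and then summing, which in QIIME 2 terms roughly corresponds to running `qiime taxa collapse` on each table before `feature-table merge`. The plain-Python version below is an illustration only: tables are nested dicts, taxonomies are `{feature_id: label}` dicts, both helpers are hypothetical, and the `"Unassigned"` fallback label is an assumption.

```python
from collections import defaultdict


def collapse_by_taxon(table, taxonomy):
    """Sum {sample: {feature_id: count}} counts per taxonomy label."""
    collapsed = defaultdict(lambda: defaultdict(int))
    for sample, features in table.items():
        for feature_id, count in features.items():
            # Hypothetical fallback for features missing from the taxonomy.
            taxon = taxonomy.get(feature_id, "Unassigned")
            collapsed[sample][taxon] += count
    return collapsed


def merge_at_taxon_level(fwd_table, fwd_tax, rev_table, rev_tax):
    """Collapse each direction separately, then add the per-taxon counts."""
    merged = defaultdict(lambda: defaultdict(int))
    for table, tax in ((fwd_table, fwd_tax), (rev_table, rev_tax)):
        for sample, taxa in collapse_by_taxon(table, tax).items():
            for taxon, count in taxa.items():
                merged[sample][taxon] += count
    return {sample: dict(taxa) for sample, taxa in merged.items()}


print(merge_at_taxon_level(
    {"S1": {"f1": 5, "f2": 2}}, {"f1": "g__Bacillus", "f2": "g__Pseudomonas"},
    {"S1": {"r1": 3}}, {"r1": "g__Bacillus"},
))
# {'S1': {'g__Bacillus': 8, 'g__Pseudomonas': 2}}
```

After collapsing, the rows are taxon labels rather than sequences, which is why tree-based (phylogenetic) analyses are no longer possible on the merged table.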
Hope it helps

benjjneb · November 10, 2020, 5:28pm

In principle, no, that should be fine: the quality scores, which indicate that the lower-quality bases are at the beginning of the reads, will guide the DADA2 denoising.

However, the critical thing for dada2 is that the oriented/re-oriented reads all start at the same position. This may (probably will) require trimming the re-oriented reads so that they start at the forward primer position.
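Trimming reoriented reads to a common start at the forward primer can be sketched as an IUPAC-aware exact search. This is a simplified stand-in for a dedicated primer-trimming tool: it allows no mismatches, `trim_to_primer` is a hypothetical helper, `ACGYT` is a made-up primer, and removing the primer by default (`keep_primer=False`) is a design choice, since either option gives every read the same starting position.

```python
import re

# IUPAC ambiguity codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a primer with IUPAC codes into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_to_primer(seq, qual, primer, keep_primer=False, max_start=None):
    """Trim a reoriented read so that it starts at the forward primer.

    Returns the trimmed (seq, qual), or None when the primer is not found
    or is found further in than max_start (reads to discard).
    """
    match = primer_regex(primer).search(seq.upper())
    if match is None or (max_start is not None and match.start() > max_start):
        return None
    start = match.start() if keep_primer else match.end()
    return seq[start:], qual[start:]


print(trim_to_primer("TTACGCTGGA", "##IIIIIIII", "ACGYT"))
# ('GGA', 'III')
```

Reads that return `None` never reached the primer site (for example, reverse reads that stopped short of the far end of the amplicon) and would not share the common start position DADA2 needs.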