I retrieved 454 sequencing data in two directions (forward and reverse), already in fastq format, and would like to use the dada2 denoise-pyro function. However, it only supports a single file. Is there any way to combine the outcomes from both directions? As far as I understand, this is not “paired-end” data like Illumina.
(Matthew Ryan Dillon)
After some searching in the forum here, I processed my forward and reverse data separately using dada2 denoise-pyro, and then applied feature-table merge with --p-overlap-method sum for the tables and feature-table merge-seqs for the representative sequence (rep-seqs) files.
I think it was the best option to handle the data here.
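To illustrate what summing overlapping entries means when two tables are merged, here is a minimal Python sketch. It is not the QIIME 2 implementation: the nested-dict table layout, the function name `merge_tables_sum`, and the sample and feature IDs are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def merge_tables_sum(*tables):
    """Merge {sample_id: {feature_id: count}} tables, summing any overlap.

    Hypothetical stand-in for a feature table; real tables are stored
    differently, this only shows the arithmetic of a "sum" overlap policy.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for table in tables:
        for sample_id, features in table.items():
            for feature_id, count in features.items():
                # Same sample and same feature in both tables: counts add up.
                merged[sample_id][feature_id] += count
    return {sample: dict(features) for sample, features in merged.items()}


# Made-up counts from separately denoised forward and reverse reads.
forward_table = {"sampleA": {"ASV_f1": 10, "ASV_f2": 3}, "sampleB": {"ASV_f1": 4}}
reverse_table = {"sampleA": {"ASV_r1": 7}, "sampleB": {"ASV_r1": 2, "ASV_f1": 1}}

merged_table = merge_tables_sum(forward_table, reverse_table)
```

Note that forward and reverse reads that have not been reoriented will usually yield different ASV sequences, so in that case the sample IDs overlap but most features do not.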
But, to finish, I’d like to add an unrequested thought in the hope it will be helpful! What is puzzling me is how you ended up with two separate files. With a 454 run I would expect to receive a single fasta file and its companion quality file, which I would merge to get a fastq file.
Before proceeding with the analysis, I think it would be super helpful for you to understand which preprocessing steps were applied to the original sequences. Were your sequences separated by their orientation relative to a reference (I honestly don’t remember whether it is possible to get mixed-orientation sequences with 454), or were they split by removing an adapter in the middle of the sequences (which would suggest a 454 paired-end library)?
What are the length distributions for the two sequence files?
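One quick way to answer the length question is to summarize read lengths for each fastq file. The sketch below is only an illustration and assumes plain four-line fastq records (no wrapped sequence lines); the helper names `read_lengths` and `length_summary` and the tiny in-memory demo file are hypothetical.

```python
import io
import statistics
from collections import Counter


def read_lengths(handle):
    """Yield the length of every read in a four-line-per-record fastq stream."""
    for index, line in enumerate(handle):
        if index % 4 == 1:  # the second line of each record is the sequence
            yield len(line.strip())


def length_summary(handle):
    """Return basic length statistics, or None if the stream has no reads."""
    lengths = list(read_lengths(handle))
    if not lengths:
        return None
    return {
        "n": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
        "histogram": Counter(lengths),  # read length -> number of reads
    }


# Tiny made-up fastq used in place of a real file.
demo_fastq = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGTAC\n+\nIIIIII\n"
    "@r3\nACGTACGT\n+\nIIIIIIII\n"
)
summary = length_summary(demo_fastq)
```

For a real file, the same function can be called on `open("forward.fastq")` (a placeholder file name), once per direction, to compare the two distributions.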
First, thanks for your suggestion; I will reorient the reverse sequences. That is really helpful.
I retrieved this two-direction 454 sequencing data, in fastq format, from the SRA database. The authors used QIIME 1 to convert the fasta and quality files to fastq before submitting them. Reads in both directions are around 620 bp.
Roche had a protocol for sequencing in forward and reverse directions, but each sequence came from a unique fragment. That is why the data has both directions but is not paired-end. Some people discussed this on Biostars years ago.
Sorry for keeping you waiting.
Just to clarify: to use RESCRIPt to reorient the sequences, you have to import forward and reverse as separate sequence files. But given that these are not, technically speaking, paired-end files, I was supposing you were doing this anyway!
Now, for the denoising bit, I have a concern that I hope someone more familiar with dada2 can help resolve (@benjjneb, I hope?). If you reorient the sequences before the denoising step, is having the low-quality stretch at the beginning instead of at the tail of the sequences going to break any dada2 pyro assumptions about the data?
As an alternative, you could denoise forward and reverse separately, then assign taxonomy to the ASVs, and finally merge the two results after taxonomic assignment (the downside is that you may not be able to run phylogenetic tests on these data).
Hope it helps
In principle, no, that should be fine: the quality scores indicating that the lower-quality bases are at the beginning of the reads will guide the dada2 denoising.
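To make the point about quality scores concrete, here is a small sketch of what reorienting a single read involves. It assumes the quality string is reversed together with the reverse-complemented bases so that each score stays attached to its base. The function name `reverse_complement_read` and the toy read are hypothetical, and this is not how any particular tool implements it.

```python
# Translation table for unambiguous bases plus N (upper and lower case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_read(sequence, quality):
    """Reverse-complement the bases and reverse the per-base quality string."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    # Complement every base, then reverse; the qualities are only reversed.
    return sequence.translate(COMPLEMENT)[::-1], quality[::-1]


# Toy read whose last two bases have low quality ('#' and '!').
seq, qual = reverse_complement_read("AACGT", "IIH#!")
```

After reorientation the low-quality characters sit at the start of the quality string, which is the situation described above.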
However, the critical thing for dada2 is that the oriented/re-oriented reads all start at the same position. This may (probably will) require trimming the re-oriented reads so that they start at the forward primer position.
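A rough sketch of that trimming step follows, so that every read starts at the same position relative to the forward primer. It uses exact matching with IUPAC degenerate bases and does not tolerate sequencing errors in the primer, so a dedicated primer-trimming tool such as cutadapt is likely the better choice in practice. The function names, the example primer, and the toy read are assumptions made for illustration only.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a (possibly degenerate) primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_to_primer(sequence, quality, primer, keep_primer=False):
    """Trim a read so that it starts at, or just after, the first primer match.

    Returns None when the primer is not found, so such reads can be discarded.
    """
    match = primer_regex(primer).search(sequence.upper())
    if match is None:
        return None
    start = match.start() if keep_primer else match.end()
    return sequence[start:], quality[start:]


# Toy read: 4 leading bases, an 8-base primer match, then 6 bases of amplicon.
example_primer = "GTGYCAGC"  # made-up example primer, not taken from the thread
trimmed = trim_to_primer(
    "TTAGGTGCCAGCAAGGTT", "!!!!IIIIIIIIABCDEF", example_primer
)
```

Whether the primer itself is kept (`keep_primer=True`) or removed matters less than applying the same choice to the forward and the reoriented reads.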