Demultiplexing/Transforming FeatureData[AlignedSequence]

mgu444 · June 21, 2019, 2:52pm

I received some data from a collaborator that originally came as a .fna file and was supposed to contain Illumina MiSeq sequences. I imported it into QIIME 2 and it's now one giant FASTA file with every sequence in the same file. I've tried demultiplexing it (I'm trying to run DADA2 later, which requires the reads to be in individual FASTQ files, and then phyloseq after that), but I've hit a wall trying to get any kind of file conversion/transformation to work. Every time I try something, it tells me things are incompatible because the artifact is of type FeatureData[AlignedSequence]. I think I need to change the artifact or file into either an emp-single or emp-paired-end sequence type, but I'm not really sure.
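Before converting anything, it can help to confirm which semantic type QIIME 2 actually recorded for the artifact (`qiime tools peek` reports this). A minimal plain-Python sketch of the same check, assuming the usual .qza layout: a zip archive whose top-level, UUID-named directory holds a `metadata.yaml` with a `type: ...` line. The helper name `artifact_type` is hypothetical and not part of any QIIME 2 API.

```python
import zipfile


def artifact_type(qza_path):
    """Return the semantic type recorded in a QIIME 2 artifact (.qza).

    Assumes the .qza is a zip archive laid out as '<uuid>/metadata.yaml',
    where metadata.yaml contains a line such as
    'type: FeatureData[AlignedSequence]'.
    """
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Only the top-level metadata.yaml; provenance/ holds nested copies.
            if len(parts) == 2 and parts[1] == "metadata.yaml":
                text = archive.read(name).decode("utf-8")
                for line in text.splitlines():
                    if line.startswith("type:"):
                        return line.split(":", 1)[1].strip()
    raise ValueError(f"no top-level metadata.yaml with a 'type:' line in {qza_path!r}")
```

Usage would be something like `print(artifact_type("kerney_2013.qza"))`; a plugin that expects demultiplexed reads will reject anything whose type is not one of the `SampleData[...]` types it accepts.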

I wish I could provide more useful information, but I have no computer science background and honestly have no idea what my collaborator did to the data before I received it (the student who was working on the data graduated in 2014, so no one really knows anymore). Any help would be very much appreciated.

Thanks,
Matt

Mehrbod_Estaki · June 21, 2019, 8:56pm

Hi @mgu444,
Before diving deeper into this, are you able to get the original raw .fastq files for this project? This is the most common way the data would have been passed on to your group, and it would save you a lot of hassle. You can check whether a copy of those files is on your group's BaseSpace account, or contact Illumina if unsure. Let us know if you absolutely can't get those and we'll see about carrying on with these aligned sequences.

thermokarst · June 21, 2019, 9:20pm

Worth pointing out:

They don't just need to be demultiplexed. More importantly, the sequences need to have quality scores, because DADA2 requires them to perform error correction. Without them, you will need to use a different form of quality control.
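A quick way to tell whether a sequence file carries quality scores at all: FASTQ records start with `@` and include a quality string, while FASTA records (.fna, .fasta) start with `>` and have none. A minimal sketch, assuming an uncompressed file (a `.fastq.gz` would need `gzip.open` instead); `has_quality_scores` is a hypothetical helper name.

```python
def has_quality_scores(path):
    """Guess whether a sequence file includes per-base quality scores.

    Looks only at the first non-blank line: '@' means FASTQ (qualities
    present), '>' means FASTA (no qualities). Assumes an uncompressed file.
    """
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip any leading blank lines
            if line.startswith("@"):
                return True
            if line.startswith(">"):
                return False
            raise ValueError(f"unrecognized first record in {path!r}: {line[:20]!r}")
    raise ValueError(f"{path!r} is empty")
```

A `False` here for the collaborator's .fna means DADA2's error model has nothing to work with, whatever else is done to the file.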

mgu444 · June 24, 2019, 6:28pm

Hi,
The only file available on the collaborator's cluster was that one .fna file containing every sequence in a single FASTA file.

mgu444 · June 24, 2019, 8:01pm

kerney_2013.qza (1.4 MB)
Attached is the .qza I created by importing the .fna file. According to our collaborator, the data is paired-end and was quality filtered to 1% expected error. It has been demultiplexed but is all in one FASTA file. It is unclear whether rarefaction has been done. Since the quality filtering has already been done, is there a way we can still use DADA2, or do we need to use a different pipeline/program/etc.?
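If the file really is demultiplexed, the sample each read belongs to is usually encoded in its FASTA header. A minimal sketch, assuming QIIME 1 `split_libraries`-style headers of the form `>SampleID_ReadNumber <optional comments>` (for example `>PC.634_1 FLP3FBN01ELBSX ...`); that header format is an assumption about this file, and `reads_per_sample` is a hypothetical helper. Headers that don't fit are returned separately so you can see whether the assumption holds.

```python
from collections import Counter


def reads_per_sample(path):
    """Count reads per sample in one FASTA with QIIME 1-style headers.

    Assumes headers look like '>SampleID_ReadNumber ...'; the sample ID is
    everything before the last underscore of the first whitespace-separated
    token. Returns (Counter of sample -> read count, list of unparsed IDs).
    """
    counts = Counter()
    unparsed = []
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # sequence lines carry no sample information
            tokens = line[1:].split()
            read_id = tokens[0] if tokens else ""
            sample, sep, number = read_id.rpartition("_")
            if sep and sample and number.isdigit():
                counts[sample] += 1
            else:
                unparsed.append(read_id)
    return counts, unparsed
```

If nearly every header parses and the sample names look familiar, the file is probably in the QIIME 1 demultiplexed layout rather than needing demultiplexing from scratch.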

Mehrbod_Estaki · June 25, 2019, 1:03am

Hi @mgu444,
In order to use DADA2 or Deblur, you need quality scores, either as part of the original .fastq files or in a separate file such as the .qual files from QIIME 1. Your .fna file does not have quality scores, so unfortunately you're limited to OTU clustering methods. Have a look through this tutorial, which shows exactly how you would begin with a .fna file like yours. Also note that you imported this file as FeatureData[AlignedSequence], but I don't think these reads are actually aligned; they certainly didn't look aligned when I checked the first few. Re-import them as shown in the tutorial.
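One rough way to check whether the reads are really aligned: aligned FASTA normally pads every record to the same length, typically with gap characters such as `-` or `.`, whereas raw or quality-filtered reads usually vary in length and contain no gaps. A minimal heuristic sketch (the helper names `read_fasta` and `alignment_signals` are hypothetical); it is a sanity check, not a format validator.

```python
def read_fasta(path):
    """Return the sequences of a FASTA file as a list of strings."""
    records = []
    parts = None  # None until the first header is seen
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if parts is not None:
                    records.append("".join(parts))
                parts = []
            elif parts is not None and line:
                parts.append(line)  # sequences may wrap over several lines
    if parts is not None:
        records.append("".join(parts))
    return records


def alignment_signals(path, gap_chars="-."):
    """Return (all_same_length, any_gaps) for the records in a FASTA file."""
    seqs = read_fasta(path)
    if not seqs:
        raise ValueError(f"no FASTA records found in {path!r}")
    same_length = len({len(s) for s in seqs}) == 1
    any_gaps = any(c in s for s in seqs for c in gap_chars)
    return same_length, any_gaps
```

`(False, False)` points to unaligned reads; equal lengths alone are not conclusive, since reads trimmed to a fixed length also share one length without being aligned.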

Also,

I'm not really sure what this means. You'll want to ask your collaborators for more thorough details of exactly what has been done to these reads, so you can describe it in the methods section of a paper. Good reviewers would certainly ask for this.

The best scenario is still finding the original .fastq files. I would give it one last try and ask the collaborators and the sequencing facility, but if you really can't locate them anymore, you can still move ahead as described above. Good luck!

system · July 26, 2019, 7:03am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.