Summarizing & denoising DNASequencesDirectoryFormat .qza files

AttilaTheBun · July 26, 2021, 8:49pm

Hello! I have a few issues that I believe are all relating back to the same thing:

I demultiplexed a .fna file independently outside of qiime2 and output multiple .txt formatted files as a result, like so:

The files all look like this when opened in vim:

The sequences themselves are single read and unaligned, so I imported them into .qza files using the following code as directed from the qiime2 import documentation.

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path path \
  --output-path path

Then following along with the moving pictures tutorial, I checked my new files, all of which look like this:

I merged all of the separate .qza files using this code from this forum topic:

qiime feature-table merge-seqs --i-data data files --o-merged-data dplx.qza

The resulting single .qza file mimics the type and format of the previous files

Now, I notice that this file, unlike that of the example in the moving pictures tutorial, is not of EMPSingleEndDirFmt format and is not responding to the "qiime demux summarize" command. I get the following error when I try to summarize the data:

My problem is this:
I do not know how to coerce my data into one of the suggested types in the final error message posted, nor do I know how to tell whether my data is EMP-compliant as the moving pictures tutorial suggests. I also seem to be missing the point of the DNASequencesDirectoryFormat, as the file and directory formats page makes it seem like a catch-all class.

Help getting my files into a format in qiime2 that will enable me to get the summary data I need and progress to denoising with dada2 would be invaluable.

jwdebelius · July 26, 2021, 9:56pm

Hi @AttilaTheBun,

First, your name is fantastic! Is it a reference to or ?

If you don't have the quality information (usually in a .fastq or .fq file), you will not be able to denoise your data with dada2 or deblur. If you want to denoise in QIIME 2, you will need to get the quality information. How you get it may depend on the type of sequencing you've done; you may need to check with your sequencing provider about the best way to do this. If you can't get the quality information, your best option in QIIME 2 is to cluster into OTUs. (You might look into unoise, another denoising algorithm that, AFAIK, doesn't require quality information. But it's proprietary and closed source, a license costs money, and it may require the quality info upstream.)

It doesn't look like your data is EMP compliant, since it lacks quality information; essentially all of the common formats require quality info. But maybe we can work through something. Can you explain how you demultiplexed your files and what the .txt files represent?
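A quick way to check whether a file carries quality information is to look at its first record marker: FASTA records (.fna, .fasta) start with `>`, while FASTQ records start with `@` and include a quality line. The helper below is only a sketch; the function name `sniff_sequence_format` is made up for illustration and is not part of QIIME 2.

```python
import gzip


def sniff_sequence_format(path):
    """Guess 'fasta', 'fastq', or 'unknown' from the first non-empty line.

    Hypothetical helper, not a QIIME 2 function. It only inspects the
    first record marker, so it is a rough check, not a validator.
    """
    # Transparently handle gzip-compressed files such as reads.fastq.gz.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                return "fasta"  # sequence only, no quality scores
            if line.startswith("@"):
                return "fastq"  # sequence plus per-base quality
            return "unknown"
    return "unknown"
```

A result of `"fasta"` means the file has no quality scores, so denoising with dada2 or deblur would need the matching quality data from elsewhere.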

Best,
Justine

AttilaTheBun · July 27, 2021, 1:28pm

Hello @jwdebelius! Thank you for noticing, it's a reference to ! We are graced with them as happy hoppy pets and they're awesome.

I do have a .fastq file, but am unsure about how to apply it with my demultiplexed sequences. Should I be importing it in a file with my merged FeatureData[Sequence] type file, or do I need to splice it into a file for each separated, demultiplexed .txt file? Since I do have a .fastq file, maybe it is possible to coerce the data into a more suitable type?

We demultiplexed the data in Java by searching for primer matches with barcodes provided in an Excel sheet. This sheet doesn't comply with the required format described in the moving pictures tutorial, but we were able to create a script that searched through the .fna file with the barcodes as references fairly easily. The .txt files you can see in the original post each represent the sequences that match the .txt primer-name barcode. Thus, each file is a collection of raw sequence data that corresponds to one barcode. I hope that makes sense.
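For readers who want a picture of this kind of barcode-based binning, here is a minimal Python sketch. It is not the Java script described above. It assumes the barcode sits at the very start of each read and must match exactly; the names `parse_fasta` and `demultiplex` and the sample names in the usage note are hypothetical.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def demultiplex(records, barcodes):
    """Bin reads by the first barcode that prefixes the read.

    Assumption (not from the thread): barcodes are at the 5' end and
    must match exactly. Reads with no match go to 'unassigned'.
    """
    bins = {name: [] for name in barcodes}
    bins["unassigned"] = []
    for header, seq in records:
        for name, barcode in barcodes.items():
            if seq.upper().startswith(barcode.upper()):
                bins[name].append((header, seq))
                break
        else:
            bins["unassigned"].append((header, seq))
    return bins
```

For example, `demultiplex(parse_fasta(open("reads.fna")), {"sampleA": "ACGT", "sampleB": "TTGA"})` returns a dict with one list of reads per sample plus an `"unassigned"` list. Note that binning a .fna file this way drops the quality scores, which is why the per-barcode files cannot go straight into dada2.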

jwdebelius · July 27, 2021, 3:09pm

Hi @AttilaTheBun,

Yes, yes they are!

With the multiplexed file, it depends on how it's set up. EMP is not the only way to go. I would look at importing your multiplexed file and then see what you can do with q2-cutadapt. If q2-cutadapt doesn't suit your needs, you could go with cutadapt (or even your in-house script) and work with the manifest format.
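If you go the manifest route with already-demultiplexed, per-sample FASTQ files, the manifest is a tab-separated file that maps each sample ID to an absolute file path. The sketch below builds one for single-end reads using the `sample-id` and `absolute-filepath` columns of the V2 single-end manifest format. The function name `build_manifest` and the idea of deriving sample IDs from file names are assumptions for illustration.

```python
from pathlib import Path


def build_manifest(fastq_dir, suffix=".fastq.gz"):
    """Return manifest text for the single-end FASTQ files in fastq_dir.

    Hypothetical helper: sample IDs are taken from the file names
    (everything before `suffix`), which may not suit every naming scheme.
    """
    lines = ["sample-id\tabsolute-filepath"]
    for path in sorted(Path(fastq_dir).glob(f"*{suffix}")):
        sample_id = path.name[: -len(suffix)]
        lines.append(f"{sample_id}\t{path.resolve()}")
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file such as manifest.tsv should then allow an import with `qiime tools import` using `--type 'SampleData[SequencesWithQuality]'` and `--input-format SingleEndFastqManifestPhred33V2`, assuming the quality scores are Phred+33 encoded.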

Best,
Justine

AttilaTheBun · August 3, 2021, 9:21pm

@jwdebelius thank you so much; this worked. I found code here that helped me merge my .fna and .qual files into a single .fastq file. After that, I was able to compress the file into a .fastq.gz file and follow the forum post you sent using q2-cutadapt. I'm demultiplexed and ready to roll!
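For anyone landing here later, the .fna plus .qual merge can be sketched in plain Python. This is not the code linked above, just an illustration under stated assumptions: both files list records in the same order, the .qual file holds whitespace-separated integer Phred scores, and the output uses Phred+33 encoding. The names `read_records` and `fasta_qual_to_fastq` are made up.

```python
from itertools import zip_longest


def read_records(lines, joiner):
    """Yield (header, body) pairs; body lines are joined with `joiner`."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, joiner.join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, joiner.join(chunks)


def fasta_qual_to_fastq(fasta_lines, qual_lines):
    """Yield FASTQ records built from paired FASTA and QUAL records.

    Assumes identical record order in both inputs; raises ValueError
    if IDs, record counts, or sequence/score lengths disagree.
    """
    seqs = read_records(fasta_lines, "")
    quals = read_records(qual_lines, " ")
    for seq_rec, qual_rec in zip_longest(seqs, quals):
        if seq_rec is None or qual_rec is None:
            raise ValueError("FASTA and QUAL files have different record counts")
        (seq_id, seq), (qual_id, qual) = seq_rec, qual_rec
        if seq_id.split()[0] != qual_id.split()[0]:
            raise ValueError(f"ID mismatch: {seq_id!r} vs {qual_id!r}")
        scores = [int(q) for q in qual.split()]
        if len(scores) != len(seq):
            raise ValueError(f"Length mismatch for {seq_id!r}")
        # Phred+33: score 0 maps to '!', capped at 93 ('~').
        encoded = "".join(chr(min(s, 93) + 33) for s in scores)
        yield f"@{seq_id}\n{seq}\n+\n{encoded}\n"
```

Each yielded string is one four-line FASTQ record, so writing them all to a file and gzipping it gives a .fastq.gz file of the kind described above.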
