Using DADA2 to analyze 16S miTags extracted from metagenomic data

I want to use DADA2 for quality control and community analysis. With my own script, I can extract the 16S sequences together with their quality scores. However, when I use these data as input, the import fails. Here is the error message:
There was a problem importing sample_hyh.csv:

/tmp/q2-SingleLanePerSampleSingleEndFastqDirFmt-kuws7b/FH60-5-MG-2_5_L001_R1_001.fastq.gz is not a(n) FastqGzFormat file:

Quality score length doesn't match sequence length for record beginning on line 5

I am using version 2023.5.0, installed with conda.

The command I ran is as follows:
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path sample_hyh.csv \
  --output-path single-demux.qza \
  --input-format SingleEndFastqManifestPhred33
Here is sample_hyh.csv:
sample_hyh.csv (3.7 KB)
And this is one of my input files (I used gzip -c to compress the sequences extracted from the metagenomic data):
FH61-3-MG-2_1.16SQ.fastq.gz (4.2 KB)
FH61-3-MG-2_1.16SQ.fastq (101.7 KB)

Thank you very much for your help!

Hi @heyinghui

Your FASTQ file is written incorrectly. The quality score is written before the sequence.

@A00312:291:HKHKLDSX3:3:1216:13114:6198/1.1 /start=4 ...
FFFFFFFFFFFFFFFFFFFFFFFFFF:F:::,FFFFFF:FF::,F,F,F::F ...
+
AAAAAAAAAAAAAAAAAAAAAAAAAAGGCAATGTGCTGAGGCAGTATGGATT ...

It should be formatted as shown below, and described here. That is, sequence first, then quality score. I cropped your sequence for clarity.

@A00312:291:HKHKLDSX3:3:1216:13114:6198/1.1 /start=4 ...
AAAAAAAAAAAAAAAAAAAAAAAAAAGGCAATGTGCTGAGGCAGTATGGATT ...
+
FFFFFFFFFFFFFFFFFFFFFFFFFF:F:::,FFFFFF:FF::,F,F,F::F ...

I am wondering if the parser is getting confused by this and providing an incorrect error message? 🤷‍♂️
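The four-line layout described above (header, sequence, `+`, quality) can be checked before importing. The following is only a minimal sketch: the function names `open_text` and `check_fastq` are hypothetical, and treating a line made solely of IUPAC nucleotide codes as "a sequence" is an assumption used to guess when the sequence and quality lines have been swapped. It is not how QIIME 2 itself validates files.

```python
import gzip
import re

# IUPAC nucleotide codes; an assumption about what counts as a sequence line.
NUCLEOTIDES = re.compile(r"^[ACGTUNRYKMSWBDHV]+$", re.IGNORECASE)


def open_text(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def check_fastq(path):
    """Return a list of (header_line_number, problem) tuples found in path."""
    problems = []
    with open_text(path) as handle:
        lines = [line.rstrip("\n") for line in handle]
    if len(lines) % 4 != 0:
        problems.append((len(lines), "line count is not a multiple of 4"))
    # Walk the file one complete four-line record at a time.
    for start in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[start:start + 4]
        line_no = start + 1  # 1-based line number of the record header
        if not header.startswith("@"):
            problems.append((line_no, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((line_no, "third line does not start with '+'"))
        if not NUCLEOTIDES.match(seq):
            if NUCLEOTIDES.match(qual):
                problems.append((line_no, "sequence and quality lines look swapped"))
            else:
                problems.append((line_no, "second line is not a nucleotide sequence"))
        if len(seq) != len(qual):
            problems.append((line_no, "sequence and quality lengths differ"))
    return problems
```

Calling `check_fastq("FH61-3-MG-2_1.16SQ.fastq.gz")` would return an empty list for a well-formed file, or one entry per problem otherwise. The swap detection is a heuristic: a quality string that happens to contain only nucleotide letters would not be distinguished from a sequence.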

Thank you very much for your help!
I modified my script so that the sequence is on the second line and the quality is on the fourth line.
However, I still get an error message:

There was a problem importing sample_hyh.csv:

/tmp/q2-SingleLanePerSampleSingleEndFastqDirFmt-hx0g0efi/FH60-5-MG-2_5_L001_R1_001.fastq.gz is not a(n) FastqGzFormat file:

Quality score length doesn't match sequence length for record beginning on line 5

The command I ran was the same as before.
And here is one of my new input files:
FH60-3-MG-1_1.16SQ.fastq (3.4 MB)
FH60-3-MG-1_1.16SQ.fastq.gz (127.5 KB)

I have checked the sequence length, and it is the same as the length of the quality string.
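The error above names FH60-5-MG-2_5, while the attached file is FH60-3-MG-1_1, so it may be worth running the length check on every file listed in the manifest rather than on a single sample. The sketch below assumes the manifest is a CSV with the header `sample-id,absolute-filepath,direction` and that lines starting with `#` are comments; the function names `first_length_mismatch` and `check_manifest` are hypothetical. It reads the compressed `.gz` files directly, since those are what the import actually sees.

```python
import csv
import gzip
import os


def first_length_mismatch(path):
    """Return the 1-based header line number of the first record whose
    sequence and quality lengths differ, or None if all records match."""
    opener = gzip.open if path.endswith(".gz") else open
    seq_len = None
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            text = line.rstrip("\r\n")
            position = index % 4  # 0 header, 1 sequence, 2 '+', 3 quality
            if position == 1:
                seq_len = len(text)
            elif position == 3 and len(text) != seq_len:
                # index is the 0-based quality line; the header is 3 lines
                # earlier, so its 1-based line number is index - 2.
                return index - 2
    return None


def check_manifest(manifest_path):
    """Map each file path in the manifest to its first mismatching record."""
    results = {}
    with open(manifest_path, newline="") as handle:
        # Skip comment lines (an assumption about the manifest format).
        rows = (row for row in handle if not row.startswith("#"))
        for record in csv.DictReader(rows):
            # Expand variables such as $PWD if the manifest uses them.
            path = os.path.expandvars(record["absolute-filepath"])
            results[path] = first_length_mismatch(path)
    return results
```

For example, `check_manifest("sample_hyh.csv")` would return a dictionary in which any value other than `None` is the line on which a mismatching record begins, in the same style as the "record beginning on line 5" message.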

Hi @heyinghui,

I am not sure what the issue is, as I have been able to import both of these files into QIIME 2023.5 & 2023.7 without issue. Perhaps there is something wrong with your install or system?

Thank you very much for your help!
I can import my data into QIIME 2023.5 now.
