Importing Filtered Data from AFTERQC [Missing one or more files for Casava...]

Hello again,

I am trying to import the data that was previously quality-filtered using AfterQC.

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /home/mpolat/ANALYSIS/Qiime2/salt_experiment/AfterQC/good \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path /home/mpolat/ANALYSIS/Qiime2/salt_experiment/qiime_analysis/demux-afterqc.qza

However, I got the following error:

There was a problem importing /home/mpolat/ANALYSIS/Qiime2/salt_experiment/AfterQC/good:

Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: '.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz'

I don't get any error when using the unfiltered data. In case it helps, here are sample lines from both the filtered and unfiltered data.

Filtered Q30
@M03181:65:000000000-L75M9:1:1101:19832:1677 1:N:0:252
TGAATGATTCGGTGAAACTTTCGGACCGTGGTTCTTGCCACTTCGGTGGTGAGAATCGTGGAAAGTTATTTAAACCTCATCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCCGTAGG
+
FGFEGCCECFGEFFGFGGEGGGBDFFGEBEFFFGFGGGGGGD,CFCF@@FDFGAEFFFEGDCFGFBAFGGGGGGGFBE8,E<@,CFDFFGGGGGGGGGGGGG@FDGGDGGFFGFDEFFEE
@M03181:65:000000000-L75M9:1:1101:22030:1678 1:N:0:252
TGGGTGTGCTGGTGAAGTGTTCGGATTGGCAATTATCGGTGGCAACACCTTCTATTTCCGAGAAGTTCATTAAACCCTCCCACCTAGAGGAAGGAGAAGTCTTAACAAGGTTTCCGTAGG
+
GGGGGGGFGGFFGGGGGGFGGGGGGGGGGG,EFFFGF,+B,+C89C8:C,CE,,CE,<<,,++,4FFFAFF9,,54?A<BC,AFG8,,,,?,,+4,,BFEF,C,,B,,,,4CE??+8=++

Unfiltered
@M03181:65:000000000-L75M9:1:1101:22042:1664 1:N:0:252
TTGTACACACCGCCCGTCGCTCCTACCGATTGGGTGTGCTGGTGAAGTGTTCGGATTGGCAATTATCTGTTTCAACACCTTCTTTTTCCTATCAGTTCTTTACACCCTCCCTCCTAGTGGAATGAGAAGTCTTAACCATGTTTCCTTATTTGAACCTTCAGATTGCTTTCTCTTTTCCACATCTCCTATCCCACGAGCCTATCTCTCCTCTCTTTTGCCTTCTTCTGCTTTTTTTATTCCTATTCTTTTCTTTTTCTTTTTTCTCTTTCTTTCATTTCTTATTTTTACTTAGTATTCTTTT
+
BBCCCFGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGDFGGCFFGGGGGGGGGGGGG,CCF<F<9,C,,<,,6,88,:C,,<?,9,,,,,,,<CC,<C,,,544C4,8,4BE,,,,,,,,,,,,5,C9,A,,B,,,,9:@,?,9,,,8,,,8A?,4,,,,,,8A,A5?,,;,:,,,6,73@B,,,236,,6++++0@,,6,53,,3,796E,7,4,2,25+3,,5+:,,**,,,,,,,,,++++,/,0(=+3+.6)))+))/))1+)))++3+,+++(/))))+.1))))))
@M03181:65:000000000-L75M9:1:1101:19832:1677 1:N:0:252
TTGTACACACCGCCCGCCGCACCTACCGATTGAATGATTCGGTGAAACTTTCGGACCGTGGTTCTTGCCACTTCGGTGGTGAGAATCGTGGAAAGTTATTTAAACCTCATCATTTAGAGGAAGGTGAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGG
+
CCCCCGGGGGGGGGGG+CCFEGGGGGGFF:FGFEGCCECFGEFFGFGGEGGGBDFFGEBEFFFGFGGGGGGD,CFCF@@FDFGAEFFFEGDCFGFBAFGGGGGGGFBE8,E<@,CFDFFGGGGGGGGGGGGG@FDGGDGGFFGFDEFFEEFGFGGG8BDGDEC@F

Hi @airbender97,

The error message is telling you that the post-QC files don't follow the Casava file name standard, for whatever reason.
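If you want to check this yourself, the pattern quoted in the error can be tested against your file names. This is only a rough sketch with Python's `re` module: the helper name `is_casava_name` is made up for illustration, and I'm assuming the whole file name has to match the pattern. The first example name uses the `.fq.gz` ending that shows up in the manifest further down the thread; the second is a hypothetical `.fastq.gz` version of the same name.

```python
import re

# Pattern copied from the import error message.
CASAVA_PATTERN = re.compile(r'.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz')


def is_casava_name(filename):
    """Return True if the whole file name matches the Casava pattern.

    Assumption: the importer requires a full match of the file name.
    """
    return CASAVA_PATTERN.fullmatch(filename) is not None


for name in ["Aic1-1_S252_L001_R1_001.fq.gz",      # post-QC style name
             "Aic1-1_S252_L001_R1_001.fastq.gz"]:  # hypothetical renamed file
    print(name, is_casava_name(name))
```

The `.fq.gz` name fails because the pattern requires a literal `.fastq.gz` ending.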

Luckily for you, there's the manifest format for all your weird file name needs!

Try importing using a manifest and see if that solves your problem.
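If you have many samples, a manifest can be generated rather than typed by hand. The sketch below is one possible way to do it, not an official tool: `write_pe_manifest` is a hypothetical helper, and it assumes that the sample ID is everything before the first underscore and that each forward file has a mate whose name differs only by `_R1_` versus `_R2_`. The column names are the ones used by the tab-separated paired-end manifest format.

```python
import csv
from pathlib import Path


def write_pe_manifest(fastq_dir, manifest_path, r1_tag="_R1_", r2_tag="_R2_"):
    """Write a tab-separated paired-end manifest for the files in fastq_dir.

    Assumptions (hypothetical naming rules, adjust to your data):
    - the sample ID is the part of the file name before the first underscore
    - the reverse file has the same name with r1_tag replaced by r2_tag
    """
    fastq_dir = Path(fastq_dir).resolve()
    rows = []
    for fwd in sorted(fastq_dir.glob(f"*{r1_tag}*")):
        rev = fwd.with_name(fwd.name.replace(r1_tag, r2_tag, 1))
        if not rev.exists():
            raise FileNotFoundError(f"No reverse read file for {fwd.name}")
        sample_id = fwd.name.split("_", 1)[0]
        rows.append((sample_id, str(fwd), str(rev)))

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id",
                         "forward-absolute-filepath",
                         "reverse-absolute-filepath"])
        writer.writerows(rows)
    return rows


# Example call (paths are placeholders for your own directories):
# write_pe_manifest("/home/mpolat/ANALYSIS/Qiime2/salt_experiment/AfterQC/good",
#                   "pe-manifest.txt")
```

The function raises an error when a mate file is missing, so an incomplete pair is caught before the import step rather than during it.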

That said, I have 0 self control when it comes to this stuff, so please make sure that you're aware of how your upstream processing is going to affect your downstream work. Certain algorithms (e.g., DADA2) assume that you've done minimal pre-processing and that they're getting reads which are neither quality filtered nor joined. You may need to consider alternatives (Deblur, OTU clustering) if you can't meet those assumptions.

Best,
Justine

Hi @jwdebelius,

Thank you for your reply. I am trying to interfere with the data as little as possible, but at the same time I want to try what I have in mind :smiley:.

I am trying to keep only the reads with quality above Q30. If there is an option to filter reads in QIIME 2, please let me know. I checked DADA2 carefully but couldn't find a function for filtering.

I tried the manifest format and unfortunately got another error.

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /home/mpolat/ANALYSIS/Qiime2/salt_experiment/pe-manifest.txt \
  --output-path /home/mpolat/ANALYSIS/Qiime2/salt_experiment/qiime_analysis/demux-afterqc.qza \
  --input-format PairedEndFastqManifestPhred64V2

Here are the first lines of pe-manifest.txt. I followed the steps in the importing data tutorial.

sample-id forward-absolute-filepath reverse-absolute-filepath
Aic1-1 /home/mpolat/ANALYSIS/Qiime2/salt_experiment/AfterQC/good/Aic1-1_S252_L001_R1_001.fq.gz /home/mpolat/ANALYSIS/Qiime2/salt_experiment/AfterQC/good/Aic1-1_S252_L001_R2_001.fq.gz
Aic1-3 /home/mpolat/ANALYSIS/Qiime2/salt_experiment/AfterQC/good/Aic1-3_S253_L001_R1_001.fq.gz /home/mpolat/ANALYSIS/Qiime2/salt_experiment/AfterQC/good/Aic1-3_S253_L001_R2_001.fq.gz
Aic1-5 /home/mpolat/ANALYSIS/Qiime2/salt_experiment/AfterQC/good/Aic1-5_S254_L001_R1_001.fq.gz /home/mpolat/ANALYSIS/Qiime2/salt_experiment/AfterQC/good/Aic1-5_S254_L001_R2_001.fq.gz

The error is:

An unexpected error has occurred:

Decoded Phred score is out of range [0, 62].

See above for debug info.

/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/q2_types/per_sample_sequences/_transformer.py:250: UserWarning: Importing of PHRED 64 data is slow as it is converted internally to PHRED 33. Working with the imported data will not be slower than working with PHRED 33 data.
warnings.warn(_phred64_warning)
/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/skbio/io/registry.py:554: ArgumentOverrideWarning: Best guess was: variant='illumina1.8', continuing with user supplied: 'illumina1.3'
warn('Best guess was: %s=%r, continuing with user'
Traceback (most recent call last):
File "/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/q2cli/builtin/tools.py", line 852, in _import
artifact = qiime2.sdk.Artifact.import_data(
File "/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/qiime2/sdk/result.py", line 332, in import_data
return cls.from_view(type, view, view_type, provenance_capture,
File "/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/qiime2/sdk/result.py", line 360, in _from_view
result = transformation(view, validate_level)
File "/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/qiime2/core/transform.py", line 70, in transformation
new_view = transformer(view)
File "/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/q2_types/per_sample_sequences/_transformer.py", line 252, in _26
return _fastq_manifest_helper_partial(old_fmt, _write_phred64_to_phred33,
File "/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/q2_types/per_sample_sequences/_util.py", line 270, in _fastq_manifest_helper
fastq_copy_fn(input_fastq_fp, str(output_fastq_fp))
File "/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/q2_types/per_sample_sequences/_util.py", line 294, in _write_phred64_to_phred33
for seq in skbio.io.read(phred64_fh, format='fastq',
File "/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/skbio/io/registry.py", line 1160, in read
return io_registry.read(file, format=format, into=into, verify=verify,
File "/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/skbio/io/registry.py", line 506, in read
return (x for x in itertools.chain([next(gen)], gen))
File "/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/skbio/io/registry.py", line 531, in _read_gen
yield from reader(file, **kwargs)
File "/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/skbio/io/registry.py", line 1008, in wrapped_reader
yield from reader_function(fhs[-1], **kwargs)
File "/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/skbio/io/format/fastq.py", line 351, in _fastq_to_generator
phred_scores, seq_header = _parse_quality_scores(fh, len(seq),
File "/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/skbio/io/format/fastq.py", line 522, in _parse_quality_scores
_decode_qual_to_phred(chunk, variant=variant,
File "/home/mpolat/miniconda3/envs/qiime2/lib/python3.8/site-packages/skbio/io/format/_base.py", line 34, in _decode_qual_to_phred
raise ValueError("Decoded Phred score is out of range [%d, %d]."
ValueError: Decoded Phred score is out of range [0, 62].

Hi @airbender97,

It looks like you're using PairedEndFastqManifestPhred64V2, and the error says the decoded Phred score is out of range.
If you go back to the documentation, is there any other manifest format you could try that handles the Phred score differently?
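One way to see which encoding a file uses is to look at the lowest quality character. The sketch below is a heuristic, not how QIIME 2 or scikit-bio decides: `guess_phred_offset` is a hypothetical helper, and it relies only on the fact that Phred+64 scores start at `@` (ASCII 64), so any character below that cannot be Phred+64. The quality string is the start of the first filtered read quoted earlier in this thread.

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from FASTQ quality strings.

    Heuristic only: a character below '@' (ASCII 64) would decode to a
    negative score under Phred+64, so the data must be Phred+33. If no
    such character is seen, the answer is ambiguous and 64 is a guess.
    """
    lowest = min(ord(ch) for line in quality_lines for ch in line)
    return 33 if lowest < 64 else 64


# Start of the first filtered quality line posted above.
quality = "FGFEGCCECFGEFFGFGGEGGGBDFFGEBEFFFGFGGGGGGD,CFCF@@"

lowest_char = min(quality)          # ',' is the lowest character here
print(guess_phred_offset([quality]))  # 33
print(ord(lowest_char) - 64)          # -20, outside the valid range
print(ord(lowest_char) - 33)          # 11, a valid score
```

A comma decodes to -20 with an offset of 64, which is consistent with the "out of range" error, and with the warning in the traceback that the best guess for the variant was 'illumina1.8' rather than the user-supplied 'illumina1.3'.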

Quality filtering is DADA2's second step. There's also an entire quality filtering plugin (q2-quality-filter). However, the DADA2 plugin and the quality filtering plugin shouldn't be combined because, again, DADA2 assumes that it's not getting quality-filtered data.

Best,
Justine

Hello again,

I tried PHRED33V2 and it worked... I had tried it before, but I guess I made a typo. Thank you for your help :slight_smile: @jwdebelius
