Cutadapt not trimming 454 reads

Natali_Hernandez · July 20, 2021, 12:39pm

Hi,

I'm trying to trim the primers (515F 5′-GTG CCA GCM GCC GCG GTA A-3′ and 806R 5′-GGA CTA CVS GGG TAT CTA AT-3′) from single-end 454 reads (downloaded from NCBI) with Cutadapt (qiime2-2020.11). I used the following command:

qiime cutadapt trim-single --i-demultiplexed-sequences single-end-marcellus2014.qza --p-front GTGCCAGCMGCCGCGGTAA...ATTAGAWACCCBDGTAGTCC --p-cores 1 --p-discard-untrimmed --o-trimmed-sequences single-end-trimmed-marcellus2014.qza --verbose --p-match-read-wildcards
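
A quick sanity check when building a linked adapter string is to compute the reverse complement of the reverse primer yourself. The sketch below is only an illustration in plain Python (the function name is made up for this example). It uses the 806R sequence exactly as quoted in the post, with spaces removed.

```python
# IUPAC complement table, including degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an IUPAC DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# 806R as written in the post (5'-GGA CTA CVS GGG TAT CTA AT-3').
primer_806r = "GGACTACVSGGGTATCTAAT"
print(reverse_complement(primer_806r))
```

For the quoted primer this gives ATTAGATACCCSBGTAGTCC, which differs at three degenerate positions from the ATTAGAWACCCBDGTAGTCC used in the command. The latter is the reverse complement of GGACTACHVGGGTWTCTAAT, so it may be worth confirming which 806R variant was actually used for this dataset.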

The output shows no errors; however, when I compare the interactive quality plots of the trimmed and untrimmed sequences, I see no difference.

(trimmed)
(not trimmed)

What am I missing? Why are the trimmed sequences still over 600 base pairs?

I will be using DADA2 later (I can truncate to around 300 bp, where the quality decreases, at that point); I just want to make sure the sequences are primer-free.

Many thanks for the help in advance

llenzi · July 21, 2021, 8:56am

Hi @Natali_Hernandez,
I am really rusty on the 454 side, so apologies in advance if I go for red herrings ...

I am wondering whether, because you used a linked-style adapter, cutadapt is looking for the reverse primer sequence at the end of the read. Did you try using "--p-front GTGCCAGCMGCCGCGGTAA" and maybe, in a second step, "--p-anywhere ATTAGAWACCCBDGTAGTCC"?
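
To make the two-step idea concrete, here is a toy model in plain Python. It is not cutadapt's algorithm: it only does exact IUPAC matching (no mismatches or indels), the helper names are invented for this sketch, and the read is hypothetical. It shows that a read which never reaches the reverse primer can still have its forward primer removed when the two primers are handled separately.

```python
import re

# IUPAC codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_front(read, primer):
    """Drop the primer and everything before it; None if the primer is absent."""
    match = primer_regex(primer).search(read)
    return read[match.end():] if match else None


def trim_tail(read, primer):
    """Drop the primer and everything after it; unchanged if the primer is absent."""
    match = primer_regex(primer).search(read)
    return read[:match.start()] if match else read


FWD = "GTGCCAGCMGCCGCGGTAA"      # 515F, from the command in the thread
REV_RC = "ATTAGAWACCCBDGTAGTCC"  # reverse-complemented 806R, from the command

# Hypothetical read: forward primer plus a short insert, reverse primer not reached.
read = "GTGCCAGCAGCCGCGGTAA" + "ACGTACGTAC"
step1 = trim_front(read, FWD)     # forward primer removed
step2 = trim_tail(step1, REV_RC)  # nothing to remove at the 3' end
print(step1, step2)
```

Reads for which trim_front returns None would correspond, loosely, to the reads dropped by --p-discard-untrimmed in the first step.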

Cheers,
Luca

Natali_Hernandez · July 21, 2021, 11:58am

Hi Luca,

Many thanks for the suggestion. I've tried it and the reads are shorter, but not quite there yet!

(quality plot after trimming)

This is my first go with 454 sequences, so I am not sure if this is normal. I think I am looking at something similar to this one; as suggested there, I'll try RESCRIPt. I'm just unsure about --i-reference-sequences reference-sequences.qza: can these be any other sequences, e.g. Illumina rather than 454?

llenzi · July 21, 2021, 12:27pm

Hi @Natali_Hernandez,

Is the last picture the result of trimming by using '--p-front GTGCCAGCMGCCGCGGTAA' and trimming again the result with "--p-anywhere ATTAGAWACCCBDGTAGTCC"?

In a 454 dataset, it is normal to see a wide range of read lengths, with quality dropping towards the tail. Some sequences may also be concatemers or noise, so I would not be surprised if you end up filtering the sequences by applying a minimum and maximum length. In the post you linked, the original question arose because they started the analysis with 2 fastq files associated with a 454 run, and they had to reorient one of them to proceed. How many fastq files do you have? In any case, it is a very good idea to double-check that the orientation of your reads is what you expect and, if needed, to reorient them with RESCRIPt (for that you can use the sequence file you have as the database for taxonomy identification, e.g. SILVA or Greengenes).
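
As an illustration of the minimum/maximum length filter mentioned above, here is a minimal Python sketch. The function name, the read names and the thresholds are all hypothetical; sensible cut-offs depend on the expected amplicon length for your primers.

```python
def filter_by_length(reads, min_len, max_len):
    """Keep only reads whose length lies in [min_len, max_len].

    `reads` maps read names to sequences.
    """
    return {
        name: seq
        for name, seq in reads.items()
        if min_len <= len(seq) <= max_len
    }


# Hypothetical reads: one too short, one plausible, one suspiciously long.
reads = {
    "short": "ACGT" * 10,    # 40 bp
    "ok": "ACGT" * 70,       # 280 bp
    "long": "ACGT" * 160,    # 640 bp, e.g. a possible concatemer
}
kept = filter_by_length(reads, min_len=200, max_len=350)
print(sorted(kept))
```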

Let us know!
Luca

Natali_Hernandez · July 27, 2021, 1:23pm

Hey Luca,

Yes, that is the result of trimming in two steps.

I only have one fastq per sample. I'm trying to do the re-orientation step using SILVA sequences (many thanks for the suggestion!). I've used this:

qiime rescript orient-seqs --i-sequences single-end-marcellus2014.qza --i-reference-sequences /Volumes/KINGSTON/silva-138-99-seqs.qza --o-oriented-seqs oriented-query-sequences.qza --o-unmatched-seqs unmatched-sequences.qza

And, I get this error:

(1/1) Invalid value for '--i-sequences': Expected an artifact of at least type FeatureData[Sequence]. An artifact of type SampleData[SequencesWithQuality] was provided.

Clearly my sequences are not in the correct format for RESCRIPt, so I figured I could export them using:

qiime tools export --input-path single-end-marcellus2014.qza --output-path single-end-marcellusFD.qza --output-format 'FeatureData[Sequence]'

But I get this error:
(qiime2-2021.4) MacBook-Air-de-Natali:MarcellusCluff2014 natali$ qiime tools export --input-path single-end-marcellus2014.qza --output-path single-end-marcellusFD.qza --output-format 'FeatureData[Sequence]'
Traceback (most recent call last):
File "/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/util.py", line 90, in parse_format
format_record = pm.formats[format_str]
KeyError: 'FeatureData[Sequence]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/miniconda3/envs/qiime2-2021.4/bin/qiime", line 11, in <module>
sys.exit(qiime())
File "/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2cli/builtin/tools.py", line 63, in export_data
source = result.view(qiime2.sdk.parse_format(output_format))
File "/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/util.py", line 92, in parse_format
raise TypeError("No format: %s" % format_str)
TypeError: No format: FeatureData[Sequence]

What would be the way to convert a SampleData[SequencesWithQuality] artifact to a FeatureData[Sequence] artifact?
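
For context, the traceback suggests that --output-format expects the name of a file format, not a semantic type such as FeatureData[Sequence], and export writes plain files rather than a new .qza. Conceptually, going from reads with quality to plain sequences just means dropping the quality lines. The sketch below shows that step on an exported FASTQ in plain Python. It assumes simple 4-line FASTQ records, the function name is made up, and the resulting FASTA would still have to be imported (e.g. with qiime tools import) before RESCRIPt could use it.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Copy records from FASTQ to FASTA, dropping the quality lines.

    Assumes 4-line FASTQ records (sequences are not wrapped).
    Returns the number of records written.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        if not header.startswith("@"):
            raise ValueError("Expected a FASTQ header, got: %r" % header)
        seq = fastq_handle.readline().rstrip("\r\n")
        fastq_handle.readline()  # '+' separator line
        fastq_handle.readline()  # quality string, discarded
        fasta_handle.write(">" + header[1:].rstrip("\r\n") + "\n" + seq + "\n")
        count += 1
    return count


# Tiny in-memory demonstration with two made-up reads.
fastq = io.StringIO("@read1\nACGTAC\n+\nIIIIII\n@read2\nGGCC\n+\nIIII\n")
fasta = io.StringIO()
n_records = fastq_to_fasta(fastq, fasta)
print(n_records)
print(fasta.getvalue(), end="")
```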

Many thanks for the help!
Natali

llenzi · July 27, 2021, 2:41pm

Hi @Natali_Hernandez,

Maybe you should do nothing ... at this stage!
What about denoising what you have and reorienting the resulting ASVs before the taxonomy identification? I think the ASVs artifact should be OK for that plugin.
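
To illustrate what reorienting ASVs against a reference means, here is a toy Python sketch. It is not how RESCRIPt does it: it simply counts k-mers shared with a reference in each orientation. The function names, the k value and the reference string are all invented for the example, and it assumes sequences contain only A, C, G and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(seq, reference, k=8):
    """Return seq in the orientation sharing more k-mers with reference.

    Returns None when neither orientation is better supported.
    """
    ref_kmers = kmers(reference, k)
    forward_hits = len(kmers(seq, k) & ref_kmers)
    reverse_hits = len(kmers(revcomp(seq), k) & ref_kmers)
    if forward_hits == reverse_hits:
        return None
    return seq if forward_hits > reverse_hits else revcomp(seq)


# Made-up reference and an "ASV" taken from it, supplied in reverse orientation.
reference = "ACGGTTCAGCATTGACCGTAGGCTAACGTTAGCCATGCAAGT"
asv = reference[5:30]
flipped = revcomp(asv)
print(orient(flipped, reference) == asv)
```

Sequences for which this returns None play a role similar to the unmatched sequences output of orient-seqs in the command above.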

Luca

lizgehret · August 2, 2021, 5:44pm

Hi @Natali_Hernandez,

Just wanted to check in on this! Did @llenzi's latest suggestion work out for you, or are you still stuck?

Natali_Hernandez · August 4, 2021, 10:55am

Hi @llenzi and @lizgehret!

As suggested, I truncated the reads with denoise-pyro. The DADA2 stats looked fine, and I then reoriented the ASVs. So I think it is all resolved.

Many thanks for the help!
