50% unassigned sequences after taxonomy

Lu_Wang · August 6, 2020, 4:46am

Hi everyone,

I have tried using my own classifiers to assign taxonomy to the rep-seqs in QIIME 2, but about 50% of the sequences are still “Unassigned”, i.e. they cannot be assigned even at the kingdom level.

I found that the problem is that the sequences in my rep-seqs.qza (attachment: IE_rep-seqs.qza, 3.3 MB) are not all in the forward orientation. The sequences in the reverse orientation cannot be assigned (attachment: IE_taxonomy5.qza, 1.2 MB). I also searched the unassigned sequences in NCBI and could classify them to at least the genus level.

So I suspect the problem is that the representative sequences are not all in the same orientation. How can I convert the reverse-oriented sequences into their reverse complements so that all representative sequences are in the forward orientation?
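For reference, turning a read into the other orientation means taking its reverse complement (reverse the sequence and swap A↔T, C↔G), not just reversing it. Below is a minimal stdlib Python sketch of that operation; it is an illustration only, not a QIIME 2 command. The IUPAC table is included as an assumption of convenience, because primers like the ones used later in this thread contain degenerate bases (M, R).

```python
# IUPAC-aware complement table (upper case): each code maps to its complement,
# e.g. M (A/C) -> K (G/T), R (A/G) -> Y (C/T), N -> N.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence.

    Reversing alone is not enough: the opposite strand is also complemented.
    """
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    forward_primer = "GTGCCAGCMGCCGCGG"
    print(reverse_complement(forward_primer))  # prints CCGCGGCKGCTGGCAC
```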

I imported my data with:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /media/sf_Qiime/Rawdata/Potexperiment/test/IEManifest.csv \
  --output-path /media/sf_Qiime/Rawdata/Potexperiment/test/IE.qza \
  --input-format PairedEndFastqManifestPhred33
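For reference, the CSV manifest used with PairedEndFastqManifestPhred33 has the header sample-id,absolute-filepath,direction, with one row per FASTQ file and direction set to forward or reverse. A small Python sketch for generating such a file is below; the <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming and the function name write_manifest are hypothetical, so adapt the suffix logic to your own file names.

```python
import csv
from pathlib import Path


def write_manifest(fastq_dir, manifest_path):
    """Write a CSV manifest for paired-end files named <sample>_R1/_R2.fastq.gz
    (a hypothetical naming convention) and return the rows written."""
    rows = []
    for path in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        stem = path.name[: -len(".fastq.gz")]
        if stem.endswith("_R1"):
            rows.append((stem[:-3], str(path.resolve()), "forward"))
        elif stem.endswith("_R2"):
            rows.append((stem[:-3], str(path.resolve()), "reverse"))
        # Files that match neither suffix are skipped.
    with open(manifest_path, "w", newline="") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        writer.writerows(rows)
    return rows
```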

Thanks in advance!
Lucas

Nicholas_Bokulich · August 6, 2020, 3:52pm

Welcome @Lu_Wang!

From your description, it sounds like you are aware that the mixed orientation of your reads (a mix of forward and reverse sequences) is causing the 50% unassigned rate. This is because the classify-sklearn method can only handle reads in a single orientation. They can be either forward or reverse relative to the reference, but all query sequences must be in the same orientation.
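To see why mixed orientation matters for a k-mer-based classifier: a read and its reverse complement share almost no k-mers, so a classifier trained on forward-oriented references effectively sees a reverse-oriented read as unrelated sequence. The toy comparison below only illustrates that point; k = 7 and the function names are assumptions for illustration, not classify-sklearn's actual feature extraction.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement for plain A/C/G/T/N sequences."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def kmers(seq: str, k: int = 7) -> set:
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(a: str, b: str, k: int = 7) -> float:
    """Fraction of the k-mers of `a` that also occur in `b`."""
    ka = kmers(a, k)
    return len(ka & kmers(b, k)) / len(ka) if ka else 0.0


if __name__ == "__main__":
    read = "GTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAG"
    print("same orientation:   ", shared_fraction(read, read))
    print("reverse complement: ", shared_fraction(read, reverse_complement(read)))
```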

If your sequences are in mixed orientations, you can do one of two things:

1. Use the classify-consensus-vsearch method to classify your sequences, as it can classify reads in either orientation.
2. RESCRIPt has an orient-seqs method (see tutorial below) that will orient sequences relative to a reference database. This will allow you to orient your query sequences by comparing them to a reference database (which should presumably all be in the forward orientation), and then use classify-sklearn to classify those sequences.
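The idea behind orienting against a reference can be sketched in a few lines: compare each query and its reverse complement to the reference and keep whichever orientation matches better. This is a simplified k-mer stand-in written for illustration only (the function names, k = 8 and the min_hits threshold are assumptions); RESCRIPt's orient-seqs uses its own search against the reference and is what you would actually run on real data.

```python
from __future__ import annotations


def reverse_complement(seq: str) -> str:
    """Reverse complement for plain A/C/G/T/N sequences."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def kmers(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(queries: dict[str, str], references: list[str], k: int = 8, min_hits: int = 1):
    """Return (oriented, unmatched).

    oriented maps each query name to its sequence in the reference orientation;
    unmatched lists queries sharing fewer than min_hits k-mers in either orientation.
    """
    ref_kmers: set[str] = set()
    for ref in references:
        ref_kmers |= kmers(ref.upper(), k)
    oriented: dict[str, str] = {}
    unmatched: list[str] = []
    for name, seq in queries.items():
        seq = seq.upper()
        rc = reverse_complement(seq)
        fwd_hits = len(kmers(seq, k) & ref_kmers)
        rev_hits = len(kmers(rc, k) & ref_kmers)
        if max(fwd_hits, rev_hits) < min_hits:
            unmatched.append(name)
        elif fwd_hits >= rev_hits:
            oriented[name] = seq  # already in the reference orientation
        else:
            oriented[name] = rc  # flip to match the reference
    return oriented, unmatched
```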

Let us know which method you use and what you find. Good luck!

Lu_Wang · August 7, 2020, 8:55am

Hi @Nicholas_Bokulich,

Thanks for your kind reply.

I also found another way to solve it. The R1 and R2 files of each sample contain a mix of forward and reverse reads, so I selected and trimmed the sequences in two passes. In the first pass, I treated R1 as the forward read and discarded the reverse-oriented pairs. In the second pass, I treated R2 as the forward read and again discarded the pairs that did not match. Finally, I combined the two outputs to get R1 and R2 files in which all reads are in the forward orientation.
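The logic of the two cutadapt passes can be sketched per read pair: if R1 starts with the forward primer and R2 with the reverse primer, the pair is already forward; if it is the other way round, the mates are swapped; otherwise the pair is dropped. The sketch assumes exact (IUPAC-aware) primer matches at the 5' end only and uses hypothetical function names; cutadapt additionally tolerates mismatches and trims the 3' adapters, so this illustrates the idea rather than replacing it. The primer strings are the ones from the cutadapt commands in this post.

```python
import re
from typing import Optional, Tuple

# Regex character class for each IUPAC nucleotide code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer: str):
    """Compile a degenerate primer into a regex that matches it exactly."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


FWD = primer_regex("GTGCCAGCMGCCGCGG")
REV = primer_regex("CCGTCAATTCMTTTRAGTT")


def orient_pair(r1: str, r2: str) -> Optional[Tuple[str, str]]:
    """Return (forward_read, reverse_read) with the 5' primers removed,
    or None if neither orientation matches (similar to --p-discard-untrimmed)."""
    m1, m2 = FWD.match(r1), REV.match(r2)
    if m1 and m2:  # already forward: keep the mates as they are
        return r1[m1.end():], r2[m2.end():]
    m1, m2 = REV.match(r1), FWD.match(r2)
    if m1 and m2:  # reverse-oriented pair: swap the mates
        return r2[m2.end():], r1[m1.end():]
    return None
```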

Here is the code I used:
# Pass 1: treat R1 as the forward read (forward primer on R1, reverse primer on R2)
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences IE.qza \
  --p-adapter-f CTGTCTCTTATACACATCTGACGCTGCCGACGA \
  --p-front-f GTGCCAGCMGCCGCGG \
  --p-adapter-r CTGTCTCTTATACACATCTCCGAGCCCACGAGAC \
  --p-front-r CCGTCAATTCMTTTRAGTT \
  --p-cores 6 \
  --p-discard-untrimmed \
  --o-trimmed-sequences IE-first.qza

qiime tools export \
  --input-path IE-first.qza \
  --output-path IE1

# Pass 2: primers and adapters swapped, i.e. treat R2 as the forward read
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences IE.qza \
  --p-adapter-f CTGTCTCTTATACACATCTCCGAGCCCACGAGAC \
  --p-front-f CCGTCAATTCMTTTRAGTT \
  --p-adapter-r CTGTCTCTTATACACATCTGACGCTGCCGACGA \
  --p-front-r GTGCCAGCMGCCGCGG \
  --p-cores 6 \
  --p-discard-untrimmed \
  --o-trimmed-sequences IE-second.qza

qiime tools export \
  --input-path IE-second.qza \
  --output-path IE2

Then I used the 'cat' command to combine the .fastq files in IE1 and IE2.
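One detail worth checking when combining the two passes: because the primers are swapped in the second pass, the forward-oriented reads from that pass end up in its R2 files. To get all-forward R1 files, each first-pass R1 file would need to be concatenated with the matching second-pass R2 file (and vice versa), which also keeps the two files of every pair in the same record order. A minimal Python sketch of that step is below; the function name combine_passes is hypothetical, and it assumes the exported file names contain _R1_/_R2_ and that the MANIFEST and metadata.yml from the first-pass folder can be reused (check both against your own export). Gzip files can be concatenated byte-wise, so nothing needs to be decompressed.

```python
import shutil
from pathlib import Path


def combine_passes(first_dir, second_dir, out_dir):
    """Concatenate first-pass R1 with second-pass R2 (and R2 with R1) so that
    every R1 read is forward-oriented. Assumes _R1_/_R2_ in the file names."""
    first, second, out = Path(first_dir), Path(second_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(first.iterdir()):
        if src.is_dir():
            continue
        if not src.name.endswith(".fastq.gz"):
            # MANIFEST, metadata.yml, ...: copied unchanged from the first pass.
            shutil.copy(src, out / src.name)
            continue
        # The mate file from the second pass carries the opposite read label.
        if "_R1_" in src.name:
            mate = second / src.name.replace("_R1_", "_R2_")
        elif "_R2_" in src.name:
            mate = second / src.name.replace("_R2_", "_R1_")
        else:
            raise ValueError(f"cannot tell the read direction of {src.name}")
        with open(out / src.name, "wb") as dst:
            for part in (src, mate):
                with open(part, "rb") as fh:
                    shutil.copyfileobj(fh, dst)  # gzip members concatenate cleanly
```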

Here is the taxonomy result now (attachment: experiment_taxa-bar-plots6.qzv, 461.9 KB).

system · September 7, 2020, 2:55pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.