I'm quite perplexed by the taxonomy classification in high-throughput sequencing data. Could anyone provide assistance?

LXE_Soil_ecol · April 30, 2023, 10:04am

Hi, all QIIME 2 users,
I have recently been having trouble with the taxonomy classification of my high-throughput sequencing data. I ran the following steps in QIIME 2:

qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path pe-64-manifest_Se-RE.txt \
--output-path paired-end-demux.qza \
--input-format PairedEndFastqManifestPhred33V2

qiime demux summarize \
--i-data paired-end-demux.qza \
--o-visualization demux.qzv

qiime dada2 denoise-paired \
--i-demultiplexed-seqs paired-end-demux.qza \
--p-trunc-len-f 240 \
--p-trunc-len-r 240 \
--o-table table_20230429.qza \
--o-representative-sequences rep-seqs_20230429.qza \
--o-denoising-stats denoising-stats_20230429.qza

qiime metadata tabulate \
--m-input-file denoising-stats_20230429.qza \
--o-visualization denoising-stats_20230429.qzv

qiime feature-table summarize \
--i-table table_20230429.qza \
--o-visualization table_20230429.qzv \
--m-sample-metadata-file sample-metadata-Se-RP.txt

qiime feature-table tabulate-seqs \
--i-data rep-seqs_20230429.qza \
--o-visualization rep-seqs_20230429.qzv

qiime feature-classifier classify-sklearn \
--i-classifier silva-138-99-nb-classifier.qza \
--i-reads rep-seqs_20230429.qza \
--o-classification taxonomy_20230429.qza

qiime metadata tabulate \
--m-input-file taxonomy_20230429.qza \
--o-visualization taxonomy_20230429.qzv

qiime taxa barplot \
--i-table table_20230429.qza \
--i-taxonomy taxonomy_20230429.qza \
--m-metadata-file sample-metadata-Se-RP.txt \
--o-visualization Se-taxa-bar-plots.qzv

I have used the same workflow on other data sets and got good results.

Unfortunately, when I opened Se-taxa-bar-plots.qzv in QIIME 2 View, most of the sequences were shown as unassigned. Here are the results:

However, when I used the RDP Classifier (http://rdp.cme.msu.edu/classifier/classifier.jsp) to determine the taxonomy, I got a result like this:

Moreover, the sequencing company used their platform and got results like this:

Here are the forward and reverse read files for one sample:
ESRECK_B_1.R1.fastq.gz (6.4 MB)
ESRECK_B_1.R2.fastq.gz (7.0 MB)

Could anyone help me with this?

Best wishes

Long

crusher083 · April 30, 2023, 1:24pm

Hello,

Most likely a different region of the 16S gene was sequenced. You need to find out which primers were used and build a custom classifier using: Processing, filtering, and evaluating the SILVA database (and other reference sequence data) with RESCRIPt

Cheers
V
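
For anyone following along, a region-specific classifier could be trained roughly as sketched below. This is a sketch under stated assumptions, not the exact workflow from the RESCRIPt tutorial: the primers shown are the common 515F/806R (V4) pair, used purely as placeholders because the primers for this data set are not given in the thread; the reference file names match those used later in this thread; the output file names are hypothetical. The small `run` helper only prints each command unless `DRY_RUN=0` is set.

```shell
# Print each command; execute it only when DRY_RUN=0 (default is a dry run).
run() {
  LAST_CMD="$*"
  echo "+ $LAST_CMD"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# ASSUMPTION: 515F/806R (V4) primers as placeholders; replace them with the
# primers actually used for this sequencing run.
F_PRIMER="GTGYCAGCMGCCGCGGTAA"
R_PRIMER="GGACTACNVGGGTWTCTAAT"

# Trim the SILVA reference sequences to the amplified region.
run qiime feature-classifier extract-reads \
  --i-sequences silva-138-99-seqs.qza \
  --p-f-primer "$F_PRIMER" \
  --p-r-primer "$R_PRIMER" \
  --o-reads silva-138-99-seqs-region.qza

# Train a naive Bayes classifier on the trimmed reference (hypothetical output name).
run qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138-99-seqs-region.qza \
  --i-reference-taxonomy silva-138-99-tax.qza \
  --o-classifier silva-138-99-region-classifier.qza
```

The resulting classifier would then be passed to classify-sklearn via --i-classifier in place of the pre-trained one.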

SoilRotifer · April 30, 2023, 3:05pm

Hi @LXE_Soil_ecol, it looks like you are only viewing the domain-level taxonomy. Did you select the drop-down menu in the visualization to see the deeper taxonomic assignments?

For example:

-Mike

LXE_Soil_ecol · May 1, 2023, 12:29pm

Hi, Mike, thanks for your response. However, the result looks similar at every level, like this:

LXE_Soil_ecol · May 1, 2023, 12:30pm

Hi, Valentyn,
Thanks for your suggestion. I will check my data against another database.

Best wishes

Long

SoilRotifer · May 1, 2023, 2:28pm

Hi @LXE_Soil_ecol,

Do you know if your reads are in a mixed orientation? That is, it could be that your reads are not oriented in the same direction as the reference database.

You can try feature-classifier classify-consensus-vsearch. If this gives better results, it would suggest that orientation is the issue, as orientation does not matter for vsearch, but it does for classify-sklearn.
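
A possibly cheaper way to test the orientation idea is to rerun classify-sklearn with its --p-read-orientation option forced to reverse-complement. The sketch below assumes the same input files as the original post; the output file name is hypothetical, and the `run` helper only prints the command unless `DRY_RUN=0` is set.

```shell
# Print each command; execute it only when DRY_RUN=0 (default is a dry run).
run() {
  LAST_CMD="$*"
  echo "+ $LAST_CMD"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Classify the reverse complement of every read instead of relying on
# automatic orientation detection (output name is hypothetical).
run qiime feature-classifier classify-sklearn \
  --i-classifier silva-138-99-nb-classifier.qza \
  --i-reads rep-seqs_20230429.qza \
  --p-read-orientation reverse-complement \
  --o-classification taxonomy_revcomp.qza
```

If the forced reverse-complement run assigns most features below the domain level while the original run does not, read orientation is the likely culprit.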

LXE_Soil_ecol · May 3, 2023, 3:02am

Thank you very much. I have tried feature-classifier classify-consensus-vsearch, but it runs very slowly on my computer. I will check the results once it finishes.

Best wishes

Long

SoilRotifer · May 3, 2023, 1:14pm

This is very strange as vsearch is quite fast. How large is your data set, and what vsearch settings are you using?

SoilRotifer · May 3, 2023, 4:02pm

Hi @LXE_Soil_ecol,

I was wrong about vsearch being fast. I forgot that vsearch is set to perform an exhaustive search by default. You can alter the "max accepts" and "max rejects" options to speed up the search, but you risk reducing the accuracy of the taxonomic assignments. Still, you can lower them enough to perform a quick sanity check.

LXE_Soil_ecol · May 5, 2023, 12:11am

Hi Mike, thanks for your comments. vsearch is very slow on my computer: after three days it had finished only 13%. The command was:
qiime feature-classifier classify-consensus-vsearch \
--i-reference-reads silva-138-99-seqs.qza \
--i-reference-taxonomy silva-138-99-tax.qza \
--p-perc-identity 0.97 \
--p-min-consensus 0.51 \
--i-query rep-seqs_20230429.qza \
--o-classification taxonomy_20230502.qza \
--o-search-results testhits.qza --verbose
Do you have any suggestions?

SoilRotifer · May 5, 2023, 12:30am

How large is this data set?

If you simply want to sanity-check that you are obtaining reasonable results with feature-classifier classify-consensus-vsearch, you can speed up the process by adding the following flags:

--p-threads 8 \
--p-maxaccepts 8 \
--p-maxrejects 32 \

The more you increase --p-maxaccepts and --p-maxrejects, the longer the process will take to run. By default the plugin prioritizes accuracy via an exhaustive search, which means longer run times. You can trade accuracy for speed by lowering the values of these two flags, as in the example above. You can also increase the number of CPU threads to speed up the process.

-Mike
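
Put together with the command posted earlier in the thread, a quick sanity-check run might look like the sketch below. The flag values are the ones suggested above; the two output file names are hypothetical, and the `run` helper only prints the command unless `DRY_RUN=0` is set.

```shell
# Print each command; execute it only when DRY_RUN=0 (default is a dry run).
run() {
  LAST_CMD="$*"
  echo "+ $LAST_CMD"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Capped (non-exhaustive) vsearch classification for a fast sanity check.
# Lower maxaccepts/maxrejects trade accuracy for speed.
run qiime feature-classifier classify-consensus-vsearch \
  --i-query rep-seqs_20230429.qza \
  --i-reference-reads silva-138-99-seqs.qza \
  --i-reference-taxonomy silva-138-99-tax.qza \
  --p-perc-identity 0.97 \
  --p-min-consensus 0.51 \
  --p-threads 8 \
  --p-maxaccepts 8 \
  --p-maxrejects 32 \
  --o-classification taxonomy-vsearch-quick.qza \
  --o-search-results search-results-quick.qza
```

Set --p-threads to no more than the number of cores available on the machine.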

LXE_Soil_ecol · May 12, 2023, 1:23am

Hi, Mike,
Thank you very much! The command I used was very slow: after 9 days it had finished only 37%. My problem has been resolved by using a different classifier. The command was:
qiime feature-classifier classify-sklearn \
--i-classifier silva-138-99-nb-weighted-classifier.qza \
--i-reads rep-seqs.qza \
--o-classification taxonomy.qza
The classifier "silva-138-99-nb-weighted-classifier.qza" was downloaded from the QIIME 2 website.

Best wishes

Long

SoilRotifer · May 12, 2023, 1:39am

Hi @LXE_Soil_ecol,

I am surprised that the weighted classifier worked and the default classifier did not, as the weighted classifier is only ~1 MB larger than the standard one, so basically the same size. I wonder whether the original classifier was corrupted during download, or whether something else went wrong.

But hey, at least you got it to work!
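
One way to look into the corrupted-download idea would be to validate both classifier artifacts and inspect their metadata, as sketched below. This assumes both .qza files sit in the current directory; the `run` helper only prints each command unless `DRY_RUN=0` is set.

```shell
# Print each command; execute it only when DRY_RUN=0 (default is a dry run).
run() {
  LAST_CMD="$*"
  echo "+ $LAST_CMD"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

for classifier in silva-138-99-nb-classifier.qza silva-138-99-nb-weighted-classifier.qza; do
  # Check the integrity of the artifact's contents.
  run qiime tools validate "$classifier"
  # Show the artifact's UUID, semantic type, and data format.
  run qiime tools peek "$classifier"
done
```

A validation failure on the default classifier would point to a damaged file; if both validate cleanly, the cause lies elsewhere.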

LXE_Soil_ecol · May 12, 2023, 1:53am

Hi, Mike
Thanks a lot. The default classifier works well on other sequencing data but did not work this time. I don't know why.

Long
