At the end, I show the results of running the classify-consensus-vsearch classifier against the full-length reference sequences. First, though, I want to walk through my workflow to check whether I'm missing any steps or doing something wrong. For example, I've seen that people do something with the primers, and I'm not sure whether I need to handle them as well.
Workflow:
I imported my data using a manifest.
!qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path data/manifest.tsv \
--output-path paired-end-demux.qza \
--input-format PairedEndFastqManifestPhred33V2
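Before importing, it can help to sanity-check the manifest itself. The following is only a minimal sketch, assuming the three column names that the PairedEndFastqManifestPhred33V2 format expects; the function name `check_manifest` and the example paths are hypothetical, and the check does not verify that the files exist.

```python
import csv
import io

# Column names expected by the PairedEndFastqManifestPhred33V2 format.
EXPECTED_HEADER = [
    "sample-id",
    "forward-absolute-filepath",
    "reverse-absolute-filepath",
]


def check_manifest(text):
    """Return the sample IDs in a manifest, or raise ValueError on a problem."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows or rows[0] != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {rows[0] if rows else None}")
    sample_ids = []
    for line_no, row in enumerate(rows[1:], start=2):
        # Every data line needs exactly three non-empty, tab-separated cells.
        if len(row) != 3 or not all(cell.strip() for cell in row):
            raise ValueError(f"line {line_no}: expected 3 non-empty columns")
        if row[0] in sample_ids:
            raise ValueError(f"line {line_no}: duplicate sample-id {row[0]}")
        sample_ids.append(row[0])
    return sample_ids


# Hypothetical two-sample manifest; the paths are placeholders.
example = (
    "sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n"
    "S1\t/data/S1_R1.fastq.gz\t/data/S1_R2.fastq.gz\n"
    "S2\t/data/S2_R1.fastq.gz\t/data/S2_R2.fastq.gz\n"
)
print(check_manifest(example))
```

In a notebook, the text of data/manifest.tsv could be read with `open(...).read()` and passed to the same function.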
Check the quality.
!qiime demux summarize \
--i-data paired-end-demux.qza \
--o-visualization qualities.qzv
qualities.qzv (309.9 KB)
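On the primer question raised at the top, one rough way to see whether primers are still present is to count how many reads begin with the primer sequence. This is a sketch under assumptions: the primer string and the reads below are made up for illustration (the real primers depend on the sequencing protocol), and the helper names are hypothetical. Degenerate bases are handled through the standard IUPAC codes.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(read, primer, max_mismatches=0):
    """True if the read begins with the (possibly degenerate) primer."""
    if len(read) < len(primer):
        return False
    # zip stops at the end of the primer, so only the read prefix is compared.
    mismatches = sum(
        base not in IUPAC[code]
        for base, code in zip(read.upper(), primer.upper())
    )
    return mismatches <= max_mismatches


def primer_fraction(reads, primer, max_mismatches=0):
    """Fraction of reads that start with the primer."""
    if not reads:
        return 0.0
    hits = sum(starts_with_primer(r, primer, max_mismatches) for r in reads)
    return hits / len(reads)


# Hypothetical degenerate primer and reads, for illustration only.
primer = "GTGYCAGCMGCCGCGGTAA"
reads = [
    "GTGCCAGCAGCCGCGGTAATACG",  # matches: Y -> C, M -> A
    "GTGTCAGCCGCCGCGGTAATACG",  # matches: Y -> T, M -> C
    "TACGTAGGGTGCAAGCGTTAATC",  # does not start with the primer
]
print(round(primer_fraction(reads, primer), 2))
```

If most reads start with the primer, it has probably not been removed yet; a fraction near zero suggests the primers were already trimmed or never sequenced.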
Filtering with DADA2.
!qiime dada2 denoise-paired \
--i-demultiplexed-seqs paired-end-demux.qza \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-trunc-len-f 150 \
--p-trunc-len-r 150 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza \
--verbose
Verbose output:
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.
Command: run_dada.R --input_directory /tmp/tmpnp9nty94/forward --input_directory_reverse /tmp/tmpnp9nty94/reverse --output_path /tmp/tmpnp9nty94/output.tsv.biom --output_track /tmp/tmpnp9nty94/track.tsv --filtered_directory /tmp/tmpnp9nty94/filt_f --filtered_directory_reverse /tmp/tmpnp9nty94/filt_r --truncation_length 150 --truncation_length_reverse 150 --trim_left 0 --trim_left_reverse 0 --max_expected_errors 2.0 --max_expected_errors_reverse 2.0 --truncation_quality_score 2 --min_overlap 12 --pooling_method independent --chimera_method consensus --min_parental_fold 1.0 --allow_one_off False --num_threads 1 --learn_min_reads 1000000
Warning message:
package ‘optparse’ was built under R version 4.2.3
R version 4.2.2 (2022-10-31)
Loading required package: Rcpp
DADA2: 1.26.0 / Rcpp: 1.0.11 / RcppParallel: 5.1.6
1) Filtering ....................
2) Learning Error Rates
154827000 total bases in 1032180 reads from 17 samples will be used for learning the error rates.
154827000 total bases in 1032180 reads from 17 samples will be used for learning the error rates.
3) Denoise samples ....................
....................
4) Remove chimeras (method = consensus)
5) Report read numbers through the pipeline
6) Write output
Saved FeatureTable[Frequency] to: table.qza
Saved FeatureData[Sequence] to: rep-seqs.qza
Saved SampleData[DADA2Stats] to: denoising-stats.qza
DADA2 visualizations:
denoising-stats.qzv (1.2 MB)
rep-seqs.qzv (210.9 KB)
table.qzv (402.1 KB)
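The truncation settings above interact with the minimum overlap shown in the log (--min_overlap 12): the forward and reverse reads can only be merged if they still overlap after truncation. The arithmetic can be sketched as follows; the function names are hypothetical, the amplicon lengths in the loop are illustrative rather than taken from this data set, and natural length variation in the amplicon is ignored.

```python
def max_mergeable_length(trunc_len_f, trunc_len_r, min_overlap=12):
    """Longest amplicon whose truncated reads still overlap by min_overlap."""
    return trunc_len_f + trunc_len_r - min_overlap


def can_merge(amplicon_len, trunc_len_f, trunc_len_r, min_overlap=12):
    """True if the truncated forward and reverse reads overlap enough."""
    overlap = trunc_len_f + trunc_len_r - amplicon_len
    return overlap >= min_overlap


# Values from the run above: both reads truncated at 150, minimum overlap 12.
print(max_mergeable_length(150, 150))

# Hypothetical amplicon lengths, for illustration only.
for length in (250, 288, 300):
    print(length, can_merge(length, 150, 150))
```

Comparing this limit with the expected amplicon length, and with the merged column of the denoising stats, is one way to judge whether truncating both reads at 150 is too aggressive.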
Most frequent feature (BLAST link):
10803188578946e196b365832c5f8f0
Classification with classify-consensus-vsearch (full-length reference).
!qiime feature-classifier classify-consensus-vsearch \
--i-query rep-seqs.qza \
--i-reference-reads silva-138-99-seqs.qza \
--i-reference-taxonomy silva-138-99-tax.qza \
--p-perc-identity 0.97 \
--p-threads 4 \
--o-classification taxonomyvsearchfull.qza \
--o-search-results search-resultsfull.qza
Results for perc-identity values from 0.97 down to 0.10:
Perc-identity 0.97: all feature IDs are Unassigned.
search-resultsfull97.qzv (1.2 MB)
taxonomyvsearchfull97.qzv (1.2 MB)
Perc-identity 0.95: all feature IDs are Unassigned.
search-resultsfull95.qzv (1.2 MB)
taxonomyvsearchfull95.qzv (1.2 MB)
Perc-identity 0.85: only 1 of 160 feature IDs is assigned.
search-resultsfull85.qzv (1.2 MB)
taxonomyvsearchfull85.qzv (1.2 MB)
Perc-identity 0.50: only 4 of 160 feature IDs are assigned.
search-resultsfull50.qzv (1.2 MB)
taxonomyvsearchfull50.qzv (1.2 MB)
Perc-identity 0.10: all feature IDs are assigned, but 99% only reach the domain level (d__Bacteria).
search-resultsfull10.qzv (1.4 MB)
taxonomyvsearchfull10.qzv (1.2 MB)
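The per-threshold counts above can be tallied in a consistent way from the taxon labels of each run. This is a minimal sketch, assuming the labels have been pulled out of an exported taxonomy as plain strings with ranks separated by semicolons; the function names are hypothetical and the example labels are made up, not copied from the attached results.

```python
from collections import Counter


def classification_depth(taxon):
    """Number of ranks in a label such as 'd__Bacteria; p__Proteobacteria'."""
    if taxon.strip() == "Unassigned":
        return 0
    return len([rank for rank in taxon.split(";") if rank.strip()])


def summarize(taxa):
    """Count labels that are unassigned, domain-only, or resolved deeper."""
    counts = Counter()
    for taxon in taxa:
        depth = classification_depth(taxon)
        if depth == 0:
            counts["unassigned"] += 1
        elif depth == 1:
            counts["domain_only"] += 1
        else:
            counts["deeper"] += 1
    return dict(counts)


# Hypothetical labels, for illustration only.
labels = [
    "Unassigned",
    "d__Bacteria",
    "d__Bacteria",
    "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
]
print(summarize(labels))
```

Running the same summary on each perc-identity result makes it easier to compare how many features stay unassigned versus how many stop at the domain level.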