Classify the phylum of d__Eukaryota;__

Hi

I already conducted my 18s amplicon sequencing

Here is the primer I use for PCR
FR-1 = ANCCATTCAATCGGTANT
Fun18s = CCATGCATGTCTAAGTWTAA

I use MinION MK1B for the sequencing, the library was prepared using the Oxford Nanopore Rapid Barcoding Kit (SQKRBK114.24).
MinKNOW were used for basecalling, adapter trimming and barcode trimming.
The sequencing runs were stopped after all the samples reached at least 1.5k reads.

I use this command to combine my fastq file in linux
cat *.fastq.gz > merged.fastq.gz

Then the fastq file is feed into fastQC and MultiQC in Galaxy (https://usegalaxy.org/)

The insight from the fastQC and MultiQC is use for later quality control steps in fastp and porechop also in Galaxy (https://usegalaxy.org/)

Porechop

  • Output format for the reads: fastq
  • Barcode binning setting
  • Percent identity for binning = 75
  • Minimum difference in identity = 5.0
  • Require two matches for binning = No
  • Discard unassigned reads = No

Adapter search setting

  • Minimum identity = 90
  • Number of alligned reads = 10000
  • Scoring scheme = 3,-6,-5,-2

End adapter setting

  • Number of bases =150
  • Minimum trim length = 4
  • Extra bases trimmed = 2
  • Minimum percent identity = 75

Middle adapter setting

  • Skip splitting = No
  • Discard reads with middle adapter = No
  • Minimum percent identity = 85
  • Extra trimming good side = 10
  • Extra trimming bad side = 100
  • Minimum length reads post-split = 1000

fastp
Single-end

Adapter trimming options
Filter options

  • Qualified quality phred = 9
    Length filtering option
  • Length required = 200
  • Maximum length = 1000
    Read Modification Options
  • PolyG tail trimming = Disable polyG tail trimming

Here is the result from the fastQC and MultiQC after the quality control

Next the fastq file in feed into Qiime2 Amplicon Distribution Environment

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path manifest_18s_dr.tsv \
  --output-path single-end-demux.qza \
  --input-format SingleEndFastqManifestPhred33V2

qiime demux summarize \
  --i-data single-end-demux.qza \
  --o-visualization demux.qzv

qiime vsearch dereplicate-sequences \
  --i-sequences single-end-demux.qza \
  --o-dereplicated-table table.qza \
  --o-dereplicated-sequences rep-seqs.qza

qiime feature-table summarize \
  --i-table table.qza \
  --o-visualization table.qzv \
  --m-sample-metadata-file metadata_18s.tsv
qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv

Then i use the q2-feature-classifier

I downloaded and import the pre-formatted SILVA reference sequence and taxonomy files available at the Qiime2 Data resources
Silva 138 SSURef NR99 full-length sequences (MD5: de8886bb2c059b1e8752255d271f3010)
Silva 138 SSURef NR99 full-length taxonomy (MD5: f12d5b78bf4b1519721fe52803581c3d)

qiime feature-classifier extract-reads \
  	--i-sequences silva-138-99-seqs.qza \
  	--p-f-primer ANCCATTCAATCGGTANT \
  	--p-r-primer CCATGCATGTCTAAGTWTAA \
  	--o-reads silva-138-99-ref-seqs-region.qza

qiime feature-classifier fit-classifier-naive-bayes \
  	--i-reference-reads silva-138-99-ref-seqs-region.qza \
  	--i-reference-taxonomy silva-138-99-tax.qza \
  	--o-classifier silva-138-99-classifier.qza

qiime feature-classifier classify-sklearn \
  	--i-classifier silva-138-99-classifier.qza \
  	--i-reads rep-seqs.qza \
  	--o-classification taxonomy.qza

qiime metadata tabulate \
  	--m-input-file taxonomy.qza \
  	--o-visualization taxonomy.qzv

Next I proceed with closed reference clustering at 90%

qiime vsearch cluster-features-closed-reference \
  --i-table table.qza \
  --i-sequences rep-seqs.qza \
  --i-reference-sequences silva-138-99-seqs.qza  \
  --p-perc-identity 0.90 \
  --o-clustered-table table-cr-90.qza \
  --o-clustered-sequences rep-seqs-cr-90.qza \
  --o-unmatched-sequences unmatched-cr-90.qza

qiime feature-table summarize \
  --i-table table-cr-90.qza \
  --o-visualization table-cr-90.qzv \
  --m-sample-metadata-file metadata_18s.tsv
qiime feature-table tabulate-seqs \
  --i-data rep-seqs-cr-90.qza \
  --o-visualization rep-seqs-cr-90.qzv

Lastly I construct a taxa barplot

qiime taxa barplot \
  --i-table table-cr-90.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file metadata_18s.tsv \
  --o-visualization taxa-bar-plots-cr-90.qzv

Here is the taxa bar plot

Here are my questions

  1. How can I know the phylum of d__Eukaryota;__?
  2. What is the most correct way to eliminate the phyla that are not Fungi from the taxa bar plot (also the d__Bacteria;__)?
  3. Why is Fungi also classified as phylum? Can I further assign the d__Eukaryota;p__Fungi?
1 Like