Importing sequence data with lower-case nucleotide characters.

With the recent qiime2-2022.11 release, we can now import DNA and RNA sequence files that contain lower-case sequence characters. Upon import, these nucleotide bases are converted to the standard upper-case IUPAC format by the new MixedCase* import formats. A few examples of these formats are listed below:

  • MixedCaseAlignedDNAFASTAFormat
  • MixedCaseAlignedRNAFASTAFormat
  • MixedCaseDNAFASTAFormat
  • MixedCaseRNAFASTAFormat
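
To make the effect of the conversion concrete, here is a minimal sketch in plain shell that mimics the result outside of QIIME 2: header lines are left alone and sequence lines are upper-cased. The helper name `upper_fasta` is hypothetical, and this only illustrates the outcome; it is not how the MixedCase* formats are implemented.

```shell
# Illustration only: upper-case the sequence lines of a FASTA file,
# leaving header lines (those starting with '>') untouched.
# `upper_fasta` is a hypothetical helper name, not part of QIIME 2.
upper_fasta() {
    awk '/^>/ { print; next } { print toupper($0) }' "$1"
}
```

Calling `upper_fasta some_file.fa` prints the converted records to standard output and does not modify the input file.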

Example use case: The Ribosomal Database Project (RDP)
We'll import a recent version of the RDP SSU reference files, which have largely been pre-formatted for QIIME and are available here. Specifically, we'll use the RDPClassifier_16S_trainsetNo18_QiimeFormat.zip file.

In the past, it was not possible to natively import these files into :qiime2:, as they contain lower-case nucleotide characters. We can now do so, following the procedure below:

Download and unzip file :inbox_tray:
Note that different platforms may use slightly different commands for unzipping.

wget https://sourceforge.net/projects/rdp-classifier/files/RDP_Classifier_TrainingData/RDPClassifier_16S_trainsetNo18_QiimeFormat.zip

unzip RDPClassifier_16S_trainsetNo18_QiimeFormat.zip
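
If you'd like to confirm that a FASTA file really does contain lower-case characters before choosing an import format, a quick check along these lines may help. This is a rough sketch using standard tools; the helper name `count_lowercase_lines` is hypothetical.

```shell
# Count non-header lines of a FASTA file that contain any lower-case letter.
# `count_lowercase_lines` is a hypothetical helper name, not part of QIIME 2.
count_lowercase_lines() {
    grep -v '^>' "$1" | grep -c '[[:lower:]]'
}
```

For example, `count_lowercase_lines RefOTUs.fa` prints the number of sequence lines with lower-case characters. A non-zero count suggests the file needs one of the MixedCase* formats.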

Import representative sequence file :arrow_backward:
Use the file paths appropriate to your download location.

qiime tools import \
    --input-path RefOTUs.fa \
    --output-path rdp_ref_seqs.qza \
    --type 'FeatureData[Sequence]' \
    --input-format 'MixedCaseDNAFASTAFormat'

Import taxonomy file :arrow_backward:
Use the file paths appropriate to your download location.

qiime tools import \
    --input-path Ref_taxonomy.txt \
    --output-path rdp_ref_taxonomy.qza \
    --type 'FeatureData[Taxonomy]' \
    --input-format 'HeaderlessTSVTaxonomyFormat'
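
Before training, it can be worth checking that every reference sequence has a taxonomy entry. The sketch below compares IDs in the two raw files using standard tools. It assumes the sequence ID is the first whitespace-delimited token of each FASTA header and the first tab-separated column of the taxonomy file; the helper name `ids_missing_taxonomy` is hypothetical.

```shell
# Print sequence IDs that appear in the FASTA file but not in the taxonomy file.
# Assumes the ID is the first whitespace-delimited token of each FASTA header
# and the first tab-separated column of the taxonomy file.
# `ids_missing_taxonomy` is a hypothetical helper name, not part of QIIME 2.
ids_missing_taxonomy() {
    fasta_ids=$(mktemp)
    tax_ids=$(mktemp)
    # Strip the leading '>' and keep only the ID token from each header.
    grep '^>' "$1" | sed 's/^>//' | awk '{ print $1 }' | sort -u > "$fasta_ids"
    # The taxonomy file has no header row, so every line contributes an ID.
    cut -f 1 "$2" | sort -u > "$tax_ids"
    # comm -23 keeps lines that are unique to the first (FASTA) list.
    comm -23 "$fasta_ids" "$tax_ids"
    rm -f "$fasta_ids" "$tax_ids"
}
```

For example, `ids_missing_taxonomy RefOTUs.fa Ref_taxonomy.txt` should print nothing if every sequence ID has a matching taxonomy line.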

:building_construction: From here you can use RESCRIPt for any further reference sequence and taxonomy curation. For now, we'll skip straight to making our RDP classifier. Be sure to cite RDP and to be aware of its associated license.

Let's train our RDP classifier :train:

qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads rdp_ref_seqs.qza \
    --i-reference-taxonomy rdp_ref_taxonomy.qza \
    --o-classifier rdp_classifier.qza 

There you go!

Happy :qiime2: -ing!
