Hi QIIME2 friends -
I’m conducting a meta-analysis of several datasets. All used 16S amplicon sequencing, but the hypervariable region sequenced differs between datasets, so I need to do closed-reference OTU picking. I would like to use HOMD as the reference, as it is appropriate for the samples I’m analyzing.
My question is: can anyone point me in the right direction for making a closed-reference database from the HOMD_16S_rRNA_RefSeq_V15.22.aligned.fasta and HOMD_16S_rRNA_RefSeq_V15.22.qiime.taxonomy files that can be used across all of the different datasets? I will then merge the feature tables into one big table at the end and do downstream analyses from there.
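For the two HOMD files named above, one way to prepare them is sketched below in shell, under assumptions. As far as I know, QIIME 2 will not accept alignment gap characters in a `FeatureData[Sequence]` import, and `cluster-features-closed-reference` expects unaligned `FeatureData[Sequence]` references, so the aligned FASTA is degapped first. The sketch assumes gaps are written as `-` and/or `.`, and that the `.qiime.taxonomy` file is a headerless two-column TSV (ID, taxonomy) whose IDs match the FASTA headers. The function names `degap_fasta` and `import_homd_reference` and the output names `homd-ref-seqs.qza` / `homd-ref-taxonomy.qza` are hypothetical. If HOMD also distributes an unaligned FASTA for the same release, importing that directly would skip the degapping step.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (not verified against the HOMD files):
#  - gaps in the aligned FASTA are written as '-' and/or '.'
#  - the .qiime.taxonomy file is a headerless "ID<TAB>taxonomy" table
#    whose IDs match the FASTA headers

# Strip alignment gaps so the sequences can be imported as FeatureData[Sequence].
degap_fasta() {
  local in_fasta="$1" out_fasta="$2"
  awk '
    { sub(/\r$/, "") }                      # tolerate Windows line endings
    /^>/ { print; next }                    # header lines pass through unchanged
    {
      gsub(/[-.]/, "")                      # remove gap characters
      if (length($0) > 0) print toupper($0) # skip lines that were all gaps
    }
  ' "$in_fasta" > "$out_fasta"
}

# Import the degapped sequences and the taxonomy as QIIME 2 artifacts.
# Output file names are hypothetical.
import_homd_reference() {
  local degapped_fasta="$1" taxonomy_tsv="$2"
  qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path "$degapped_fasta" \
    --output-path homd-ref-seqs.qza || return 1
  qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path "$taxonomy_tsv" \
    --output-path homd-ref-taxonomy.qza
}
```

For example, run `degap_fasta HOMD_16S_rRNA_RefSeq_V15.22.aligned.fasta homd_degapped.fasta` and then `import_homd_reference homd_degapped.fasta HOMD_16S_rRNA_RefSeq_V15.22.qiime.taxonomy`. The taxonomy is not needed for the clustering itself; it becomes useful afterwards, because closed-reference features are named after the reference IDs.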
Here is my pipeline so far.
#import data to qiime
##this will be dataset specific
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.txt \
  --output-path paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2
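Because the import step is dataset-specific, a small helper can write one `PairedEndFastqManifestPhred33V2` manifest per dataset. That format is a tab-separated file with the columns `sample-id`, `forward-absolute-filepath` and `reverse-absolute-filepath`. This sketch assumes a hypothetical file-naming convention `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`, so the glob would need adjusting for each dataset. The optional prefix (hypothetical, e.g. `studyA_`) keeps sample IDs unique across datasets, which matters when the per-dataset tables are merged at the end.

```shell
#!/usr/bin/env bash
# Sketch: write a PairedEndFastqManifestPhred33V2 manifest for one dataset.
# Assumes a hypothetical naming convention <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
# Optional third argument: prefix added to every sample ID (e.g. studyA_)
# so sample IDs stay unique when tables from several datasets are merged.
make_manifest() {
  local fastq_dir out prefix r1 r2 sample
  fastq_dir="$(cd "$1" && pwd)" || return 1   # the manifest needs absolute paths
  out="$2"
  prefix="${3:-}"
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$out"
  for r1 in "$fastq_dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                  # glob matched nothing
    sample="$(basename "$r1" _R1.fastq.gz)"
    r2="$fastq_dir/${sample}_R2.fastq.gz"
    if [ -e "$r2" ]; then
      printf '%s\t%s\t%s\n' "${prefix}${sample}" "$r1" "$r2" >> "$out"
    else
      echo "warning: no reverse reads for $sample, skipping" >&2
    fi
  done
}
```

For example, `make_manifest /path/to/studyA_fastq manifest.txt studyA_` before running the import command for that dataset.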
#check out the quality graphs of the samples
#need to drag and drop the output file to https://view.qiime2.org
qiime demux summarize \
  --i-data paired-end-demux.qza \
  --o-visualization demux.qzv
##trim amplicon primers
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences paired-end-demux.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GGACTACHVGGGTATCTAATCC \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --o-trimmed-sequences trimmed.paired-end-demux.qza
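Since each dataset sequenced a different region, the primer pair passed to `cutadapt trim-paired` will differ between datasets. A minimal sketch that runs the same trimming command once per dataset, reading a whitespace-separated table of dataset name, forward primer and reverse primer from standard input; the `<dataset>.paired-end-demux.qza` naming and the function name `trim_all` are hypothetical. `--p-discard-untrimmed` (drop read pairs in which the primer was not found) is included as an option worth considering; confirm your release supports it with `qiime cutadapt trim-paired --help`.

```shell
#!/usr/bin/env bash
# Sketch: trim dataset-specific primers for several datasets in one loop.
# Reads lines of "<dataset> <forward primer> <reverse primer>" from stdin.
# The <dataset>.paired-end-demux.qza naming is a hypothetical convention.
trim_all() {
  local dataset fwd rev
  while read -r dataset fwd rev; do
    [ -n "$dataset" ] || continue   # skip blank lines
    # --p-discard-untrimmed drops read pairs in which the primer was not found;
    # check that your q2-cutadapt release supports it.
    qiime cutadapt trim-paired \
      --i-demultiplexed-sequences "${dataset}.paired-end-demux.qza" \
      --p-front-f "$fwd" \
      --p-front-r "$rev" \
      --p-match-read-wildcards \
      --p-match-adapter-wildcards \
      --p-discard-untrimmed \
      --o-trimmed-sequences "${dataset}.trimmed.paired-end-demux.qza" \
      < /dev/null                   # keep qiime from consuming the primer table
  done
}
```

For example, `printf '%s\n' 'studyA CCTACGGGNGGCWGCAG GGACTACHVGGGTATCTAATCC' | trim_all`, with one line added per dataset and its own primers.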
##summarise the reads
qiime demux summarize \
  --i-data trimmed.paired-end-demux.qza \
  --o-visualization summary.trimmed.paired-end-demux.qzv
##merge paired end sequences
#min merge length is 250 and max merge length is 430
qiime vsearch join-pairs \
  --i-demultiplexed-seqs trimmed.paired-end-demux.qza \
  --p-minmergelen 250 \
  --p-maxmergelen 430 \
  --o-joined-sequences merged.qza
##check out the quality graphs of the merged samples
#need to drag and drop the output file to https://view.qiime2.org
qiime demux summarize \
  --i-data merged.qza \
  --o-visualization summary.merged.qzv
##quality filter sequences
qiime quality-filter q-score \
  --i-demux merged.qza \
  --o-filtered-sequences filt.merged.qza \
  --o-filter-stats stats.filt.merged.qza
##dereplicate
qiime vsearch dereplicate-sequences \
  --i-sequences filt.merged.qza \
  --o-dereplicated-table filt.merged.derep.table.qza \
  --o-dereplicated-sequences filt.merged.derep.seq.qza
I’m pretty sure that after the dereplication step I should run qiime vsearch cluster-features-closed-reference, but do I need to do any preprocessing of the reference dataset first?
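For the closed-reference step, the dereplicated table and sequences from each dataset can go into `qiime vsearch cluster-features-closed-reference` together with the HOMD sequences imported (unaligned) as `FeatureData[Sequence]`. Closed-reference clustering names each feature after the reference sequence it matched, so tables from different hypervariable regions end up sharing HOMD IDs, which is what makes merging them meaningful; the HOMD taxonomy can then be used for those IDs, provided the taxonomy IDs match the FASTA headers. The sketch below wraps the clustering and the final merge; the function names, output prefixes and the 0.97 identity default are assumptions (0.97 is a common choice, not a requirement). By default, `qiime feature-table merge` refuses to merge tables that share sample IDs.

```shell
#!/usr/bin/env bash
# Sketch: closed-reference clustering of one dataset against HOMD, then a
# merge of the per-dataset tables. Function names, output names and the
# 0.97 default identity are assumptions, not recommendations from this post.
cluster_dataset() {
  local table="$1" seqs="$2" ref_seqs="$3" prefix="$4" perc_identity="${5:-0.97}"
  qiime vsearch cluster-features-closed-reference \
    --i-table "$table" \
    --i-sequences "$seqs" \
    --i-reference-sequences "$ref_seqs" \
    --p-perc-identity "$perc_identity" \
    --o-clustered-table "${prefix}.cr-table.qza" \
    --o-clustered-sequences "${prefix}.cr-seqs.qza" \
    --o-unmatched-sequences "${prefix}.unmatched-seqs.qza"
}

# Merge any number of clustered tables: merge_tables <output> <table>...
merge_tables() {
  local out="$1" n t
  shift
  n=$#
  # Append "--i-tables <table>" for each input, then drop the original list.
  for t in "$@"; do
    set -- "$@" --i-tables "$t"
  done
  shift "$n"
  qiime feature-table merge "$@" --o-merged-table "$out"
}
```

For example, `cluster_dataset filt.merged.derep.table.qza filt.merged.derep.seq.qza homd-ref-seqs.qza studyA` once per dataset, followed by `merge_tables all-studies-table.qza studyA.cr-table.qza studyB.cr-table.qza`. Reads that match no HOMD sequence at the chosen identity end up in the unmatched-sequences artifact rather than in the table.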
Thanks very much team!