Hi Colin,
Thanks for pointing to the relevant resources to do that.
I wanted to denoise my pair-ended sequences using DADA2 with pooling=TURE
, so I did it in R. One of the checkpoints in the DADA2 tutorial was to inspect the length of the merged sequences, excluding ones that deviate too much from the expected size range. This is why I wanted to know the expected amplicon sizes for the universal primer set I used.
While I was reading through the Github repo you linked, I also found a solution to my question. I'm posting it here in case someone else may find it useful.
-
Run digital PCR in TestPrimer. The SILVA database provides services on testing your primers. For the database, I chose
SSU r132
, non-redundant reference sequence collection (RefNR
). The primer mismatches were set at two levels of stringency, allowing zero or one mismatch (length of 0-mismatch zone at 3' end: 5). When the searching is done, download the sequences by clickingAdd sequences to cart
. -
Extract amplicon sequences in QIIME2. The downloaded sequences are full-length rRNA sequences. We can extract the amplicon sequences using the
extract-reads
command in QIIME2. For simplicity, I'll only show commands for the search with no mismatch.2.1 Convert rRNA sequences to DNA by replacing all of the uraciles with thymines
sed -i 's/U/T/g' no_mismatch_silva132_fullLen.fasta
2.2 Import sequences
qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path no_mismatch_silva132_fullLen.fasta \
--output-path no_mismatch_silva132_fullLen.qza2.3 Extract amplicon sequences
qiime feature-classifier extract-reads \
--i-sequences no_mismatch_silva132_fullLen.qza \
--p-f-primer AGAGTTTGATCMTGGCTCAG \
--p-r-primer GCWGCCWCCCGTAGGWGT \
--o-reads no_mismatch_silva132_amplicon.qza2.4 Export extracted sequences
qiime tools export \
--input-path no_mismatch_silva132_amplicon.qza \
--output-path exported_no_mismatch -
Plot expected amplicon sizes in R. We can use R package
seqinr
to import and do the length calculatiion for our fasta files.library("seqinr")
hits_0mm <- read.fasta(file = "exported_no_mismatch/dna-sequences.fasta")
length(hits_0mm) # number of sequences in the fasta file
length_0mm <- table(getLength(hits_0mm)) # get the length of all the sequences in the fasta file
barplot(length_0mm, main = "allow no mismatch", xlab="Amplicon lengths", ylab="Frequency")
Here's what I got: