V4 sequence after extract-reads from 16S is not a substring

I have a 16S FASTA sequence:

TACCTTGTTACGACTTCACCCCAATCATCTGTCCCACCTTCGGCGGCTGGCTCCTAAAAGGTTACCTCACCGACTTCGGGTGTTACAAACTCTCGTGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGCGGCATGCTGATCCGCGATTACTAGCGATTCCAGCTTCACGCAGTCGAGTTGCAGACTGCGATCCGAACTGAGAACAGATTTGTGGGATTGGCTAAACCTCGCGGTTTCGCTGCCCTTTGTTCTGTCCATTGTAGCACGTGTGTAGCCCAGGTCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGCAGTCACCTTAGAGTGCCCAACTGAATGCTGGCAACTAAGATCAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAACCATGCACCACCTGTCACTCTGCCCCCGAAGGGGACGTCCTATCTCTAGGATTGTCAGAGGATGTCAAGACCTGGTAAGGTTCTTCGCGTTGCTTCGAATTAAACCACATGCTCCACCGCTTGTGCGGGCCCCCGTCAATTCCTTTGAGTTTCAGTCTTGCGACCGTACTCCCCAGGCGGAGTGCTTAATGCGTTAGCTGCAGCACTAAGGGGCGGAAACCCCCTAACACTTAGCACTCATCGTTTACGGCGTGGACTACCAGGGTATCTAATCCTGTTCGCTCCCCACGCTTTCGCTCCTCAGCGTCAGTTACAGACCAGAGAGTCGCCTTCGCCACTGGTGTTCCTCCACATCTCTACGCATTTCACCGCTACACGTGGAATTCCACTCTCCTCTTCTGCACTCAAGTTCCCCAGTTTCCAATGACCCTCCCCGGTTGAGCCGGGGGCTTTCACATCAGACTTAAGGAACCGCCTGCGAGCCCTTTACGCCCAATAATTCCGGACAACGCTTGCCACCTACGTATTACCGCGGCTGCTGGCACGTAGTTAGCCGTGGCTTTCTGGTTAGGTACCGTCAAGGTACCGCCCTATTCGAACGGTACTTGTTCTTCCCTAACAACAGAGCTTTACGATCCGAAAACCTTCATCACTCACGCGGCGTTGCTCCGTCAGACTTTCGTCCATTGCGGAAGATTCCCTACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGTGTGGCCGATCACCCTCTCAGGTCGGCTACGCATCGTCGCCTTGGTGAGCCATTACCTCACCAACTAGCTAATGCGCCGCGGGTCCATCTGTAAGTGGTAGCCGAAGCCACCTTTTATGTTTGAACCATGCGGTTCAAACAAGCATCCGGTATTAGCCCCGGTTTCCCGGAGTTATCCCAGTCTTACAGGCAGGTTACCCACGTGTTACTCACCCGTCCGCCGCTAACATCAGGGAGCAAGCTCCCATCTGTCCGCTCGACTTGCATGTATTAGGCACGCCGCCAGCGTTCGTCCTGAGCCAGGATCAAACTCTCGAGTATTAGTGTGAGTCGAGCGAACGGACGAGAAGCTTGCTTCTCTGATGTTAGCGGCGGACGGGTGAGTAACACGTGGATAACCTACCTATAAGACTGGGATAACTTCGGGAAACCGGAGCTAATACCGGATAATATTTTGAACCGCATGGTTCAAAAGTGAAAGACGGTCTTGCTGTCACTTATAGATGGATCCGCGCTGCATTAGCTAGTTGGTAAGGTAACGGCTTACCAAGGCAACGATGCATAGCCGACCTGAGAGGGTGATCGGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAAGAGGCCCGAAGGTCCCCCTCTTTGGTCTTGCGACGTTATGCGGTATTAGCCACCGTTTCCAGTAGTTATCCCCCTCCATCAGGCAGTTTCCCAGACATTACTCACCCGTCCGCCACTCGTCAGCAAAGCAGCAAGC
TGCTTCCTGTTACCGTTCGACTTGCATGTGTTAGGCCTGCCGCCAGCGTTCAATCTGAGCCATGATCAAACTCTCGAGTACACGTGTGTAGCCCAAATCATAAGGGGCATGATGATTTGACGTCATCCCC

After extracting the V4 region with

qiime feature-classifier extract-reads  --i-sequences seq.qza  --p-f-primer GTGCCAGCMGCCGCGGTAA --p-r-primer GGACTACHVGGGTWTCTAAT  --p-trunc-len 250 --p-min-length 50 --p-max-length 450 --o-reads seq_V4.qza

I got

TACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGGGCTCGCAGGCGGTTCCTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGGAAACTGGGGAACTTGAGTGCAGAAGAGGAGAGTGGAATTCCACGTGTAGCGGTGAAATGCGTAGAGATGTGGAGGAACACCAGTGGCGAAGGCGACTCTCTGGTCTGTAACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAAC

which is not a substring of the original FASTA sequence.

Is this the correct behavior of the plugin? Why does it work this way?

Thank you very much for your attention.

Actually, it is. The output is simply the reverse complement of the input read. Below is the reverse complement of your extracted read:

GTTCGCTCCCCACGCTTTCGCTCCTCAGCGTCAGTTACAGACCAGAGAGTCGCCTTCGCCACTGGTGTTCCTCCACATCTCTACGCATTTCACCGCTACACGTGGAATTCCACTCTCCTCTTCTGCACTCAAGTTCCCCAGTTTCCAATGACCCTCCCCGGTTGAGCCGGGGGCTTTCACATCAGACTTAAGGAACCGCCTGCGAGCCCTTTACGCCCAATAATTCCGGACAACGCTTGCCACCTACGTA
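If you want to check this yourself rather than by eye, here is a minimal Python sketch. The `reverse_complement` helper is my own illustration, not part of QIIME 2, and the two strings are short excerpts copied from the sequences in this thread (the first 20 bases of the extracted read and the matching stretch of the original), not the full records.

```python
# Complement table for A/C/G/T plus the IUPAC ambiguity codes
# (e.g. M <-> K, H <-> D), so it also works on degenerate primers.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Excerpts from this thread (assumption: shortened for readability).
original_excerpt = "GGACAACGCTTGCCACCTACGTATTACCGCGGCTGCTGGCAC"
extracted_start = "TACGTAGGTGGCAAGCGTTG"

# The extracted read as written is not found in the original ...
print(extracted_start in original_excerpt)
# ... but its reverse complement is.
print(reverse_complement(extracted_start) in original_excerpt)
```

With the full sequences loaded from your files, the same `in` test should behave the same way, apart from the few bases removed by `--p-trunc-len 250`.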

This is expected: by default, qiime feature-classifier extract-reads runs with --p-read-orientation both. This means it attempts extraction in both orientations and writes each read out in the orientation in which the forward primer appears at the 5' end. Sequences within a single file are often in mixed orientations.

That is, because your original sequence is the reverse complement with respect to your primers, the primers will be found in the opposite orientation. Your reverse primer appears in the 5'-3' direction as written, GGACTACHVGGGTWTCTAAT, whereas your forward primer appears as its reverse complement, TTACCGCGGCKGCTGGCAC. Note that you'll have to eyeball it, or search for only a portion of the string in a text editor, because the primer sequences contain IUPAC ambiguity codes, which won't match in a plain text search.
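As an alternative to eyeballing, the ambiguity codes can be expanded into a regular expression. This is only a sketch of that idea, not how the plugin itself matches primers; `primer_regex` is a hypothetical helper, and the two target strings are short excerpts of your original sequence around each primer site.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer: str) -> re.Pattern:
    """Compile a degenerate primer into a regex (hypothetical helper)."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


reverse_primer = "GGACTACHVGGGTWTCTAAT"    # as given on the command line
forward_primer_rc = "TTACCGCGGCKGCTGGCAC"  # reverse complement of the forward primer

# Excerpts of the original 16S sequence (assumption: shortened for readability).
rev_site_excerpt = "GGCGTGGACTACCAGGGTATCTAATCCTGTTCGC"
fwd_site_excerpt = "CCTACGTATTACCGCGGCTGCTGGCACGTAG"

rev_hit = primer_regex(reverse_primer).search(rev_site_excerpt)
fwd_hit = primer_regex(forward_primer_rc).search(fwd_site_excerpt)

# Print the concrete bases that matched each degenerate primer.
print(rev_hit.group())
print(fwd_hit.group())
```

Running the same two searches against the full original sequence should locate both primer sites, with the V4 region lying between them.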

Does this make sense?


Yes, thank you! Perfect answer. Got it :slightly_smiling_face:
