TL;DR: I'm heading back to square one and asking my collaborators for the FASTQ files and any other associated information instead of the .fna file they gave me.
also consider your underlying biological question.
I've been quiet as I have been thinking this part through. You're absolutely right, 50k characters for this alignment doesn't make any sense. These are 16S sequences, and I'd totally been ignoring that the whole point of amplicon sequencing in this context is that this region should be conserved.
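For anyone wanting to run the same sanity check on their own alignment, here is a minimal sketch in plain Python. It assumes the alignment has been exported to an aligned FASTA file; the helper names (`read_fasta`, `alignment_summary`) and the toy data are mine for illustration, not part of any tool mentioned in this thread. It compares the alignment width with the longest ungapped sequence, so a width far beyond the amplicon length stands out.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def alignment_summary(handle):
    """Summarise an aligned FASTA: record count, widths, longest ungapped length."""
    records = list(read_fasta(handle))
    # A valid alignment has exactly one width; more than one means ragged rows.
    widths = sorted({len(seq) for _, seq in records})
    # Treat '-' and '.' as gap characters (an assumption about the aligner output).
    ungapped = [len(seq.replace("-", "").replace(".", "")) for _, seq in records]
    return {
        "n_sequences": len(records),
        "alignment_widths": widths,
        "max_ungapped_length": max(ungapped) if ungapped else 0,
    }


# Toy aligned FASTA standing in for a real exported alignment file.
toy_alignment = StringIO(">a\nAC--GT\n>b\nACTTGT\n")
summary = alignment_summary(toy_alignment)
print(summary)
```

On a real file, replace the `StringIO` object with `open("your-alignment.fasta")` (a placeholder path). If the single alignment width is many times the longest ungapped length, the alignment is mostly gaps, which is what a 50k-column alignment of 16S amplicons would suggest.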
I'd barreled through without thinking about my MSA, as I'd been having trouble with alignment (mafft, SEPP, & SINA) and so took getting a valid file as success. I then went through a couple of other troubleshooting bouts, including subsampling the rep-seqs and taking that subset through to a rooted phylogenetic tree (which was not happy to play with filtered tables from the full rep-seqs while I was trying to do alpha rarefaction).
After taking a look at my files and code with a microbiome person on my campus, she noticed that my rep-seqs file is really not particularly de-replicated, despite my code looking as expected. However, we couldn't go back much farther because the data I was given access to was already demultiplexed, filtered of chimeras, and the paired-end reads were merged. I don't know what other processes happened before I got the file. I have some data sleuthing to do.
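A quick way to see how de-replicated a sequence file really is would be to count exact duplicates. The sketch below is only an illustration under my own assumptions: the rep-seqs have been exported to a plain FASTA file, and the function names and toy data are hypothetical. It only catches exact matches (ignoring case), so sequences that differ by length or a single base are counted as distinct.

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def duplicate_report(handle):
    """Count total records, unique sequences, and sequences seen more than once."""
    counts = Counter(seq.upper() for _, seq in read_fasta(handle))
    return {
        "total_records": sum(counts.values()),
        "unique_sequences": len(counts),
        "sequences_seen_more_than_once": sum(1 for c in counts.values() if c > 1),
    }


# Toy FASTA standing in for an exported rep-seqs file.
toy_rep_seqs = StringIO(">a\nACGT\n>b\nacgt\n>c\nACGA\n")
report = duplicate_report(toy_rep_seqs)
print(report)
```

In a properly de-replicated file, the total record count and the unique sequence count should be equal. A large gap between the two would be consistent with what my colleague spotted, though it would not say which upstream step caused it.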
Thank you for all of your help! I've certainly gotten a lot out of understanding what I'm doing from this exercise!