I'm not sure why that would be. Even on a laptop, there should be 2-4 threads available, and when I test nthreads=1,2,4 with this data on my own Mac laptop I get the expected near-linear speedup. Dunno.
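If it helps, this is roughly the kind of check I ran, as a sketch rather than a recipe -- the file paths and the `errF` error model are placeholders, and `dada`'s `multithread` argument accepts an integer to pin the thread count:

```r
library(dada2)

# Placeholder inputs: filtered forward reads and an error model learned from them.
filtFs <- list.files("filtered", pattern = "_F_filt.fastq.gz", full.names = TRUE)
errF   <- learnErrors(filtFs, multithread = TRUE)

# Time the denoising step at different thread counts.
for (nthreads in c(1, 2, 4)) {
  elapsed <- system.time(dada(filtFs, err = errF, multithread = nthreads))[["elapsed"]]
  cat(nthreads, "thread(s):", elapsed, "seconds\n")
}
```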
The F/R reads must overlap enough to merge them (at least 20 nts of overlap, plus enough to cover any biological length variation), but provided that requirement is met, it is better to trim off the very low-quality tails that often occur at the ends of reverse Illumina reads.
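For concreteness, here is a sketch of how those truncation lengths enter `filterAndTrim`. The file paths and the 280/240 values are placeholders -- what matters is that `truncLen[1] + truncLen[2]` exceeds your amplicon length by at least 20 nts plus the biological length variation:

```r
library(dada2)

# Placeholder paths; substitute your own files.
fnFs   <- sort(list.files("raw", pattern = "_R1_001.fastq.gz", full.names = TRUE))
fnRs   <- sort(list.files("raw", pattern = "_R2_001.fastq.gz", full.names = TRUE))
filtFs <- file.path("filtered", basename(fnFs))
filtRs <- file.path("filtered", basename(fnRs))

# Illustrative truncation lengths: keep enough total length to preserve
# >= 20 nts of overlap (plus length variation), while cutting off the
# low-quality reverse tails.
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs,
                     truncLen = c(280, 240),
                     maxEE = c(2, 2), truncQ = 2,
                     multithread = TRUE)
```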
Exact sequence variant methods like the DADA2 algorithm rely on repeated observations of identical sequences to call biological sequences. When reads have a very low probability of being error-free over their whole length (as is the case when low-quality, error-rich tails are kept), the algorithm will fail to identify low-frequency variants, because those variants end up represented by only 1 or 0 error-free reads. This is why you identified more sequence variants when trimming the reverse read to 240 -- the algorithm was better able to detect the low-frequency variants in the reverse data.
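To put a number on the "probability of being error-free" point: each base is correct with probability 1 - 10^(-Q/10), so the chance that an entire read has zero errors shrinks quickly once a low-quality tail is included. A toy calculation (the quality profiles are made up for illustration, not taken from your data):

```r
# Probability that a read is error-free, given its Phred quality scores.
p_error_free <- function(quals) prod(1 - 10^(-quals / 10))

# A 240 nt read at Q30 vs. the same read with a 60 nt Q10 tail left on.
q_trimmed <- rep(30, 240)
q_tail    <- c(rep(30, 240), rep(10, 60))

p_error_free(q_trimmed)  # ~0.79
p_error_free(q_tail)     # ~0.0014 -- almost no reads of a rare variant survive intact
```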