From time to time I stumble on technical-translation topics for which I can’t find solid material online. The themes below could make good MA or PhD theses. I freely cede these ideas—please drop me an email if you use any of them.
I have the impression, at least in Spain, that words of Arabic origin are gradually being avoided. For instance, acequia is increasingly replaced by canal, alhaja by joya, alberca by piscina, and alacena by despensa. It seems reasonable that such vocabulary would slowly recede now that the historical context has changed and Arabic cultural influence is limited. But are there other drivers? Perhaps these words sound old-fashioned and, to appear more “modern” (or “educated”), writers prefer Latin- or English-rooted terms.
The research question I’d like to see answered is whether this trend exists and how large it is, regardless of motives. Thanks to large web corpora and NLP tooling, this is now perfectly feasible.
Example to track: alcancía (Arabism) vs hucha (borrowed from French huche).
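As a rough illustration of how the trend could be measured, here is a minimal sketch that computes, per year, the share of the Arabism within each word pair. Everything in it is an assumption made for illustration: the function names, the toy corpus, and the choice of pairs (taken from the examples above). A real study would need a dated corpus, lemmatisation to catch plurals, and some way to deal with polysemy (canal, for instance, also means a TV channel).

```python
import re
import unicodedata
from collections import Counter, defaultdict

# Hypothetical (Arabism, competitor) pairs, taken from the examples in the text.
PAIRS = [("acequia", "canal"), ("alhaja", "joya"), ("alcancía", "hucha")]


def tokenize(text):
    """Normalise to NFC, lowercase, and extract word-character runs (no lemmatisation)."""
    text = unicodedata.normalize("NFC", text).lower()
    return re.findall(r"\w+", text)


def arabism_share_by_year(documents, pairs=PAIRS):
    """Return {year: {arabism: share}}.

    share = count(arabism) / (count(arabism) + count(competitor)).
    `documents` is an iterable of (year, text) tuples.
    Pairs with no occurrence of either word in a given year are omitted.
    """
    counts = defaultdict(Counter)
    for year, text in documents:
        counts[year].update(tokenize(text))

    shares = {}
    for year in sorted(counts):
        year_counts = counts[year]
        shares[year] = {}
        for arabism, rival in pairs:
            total = year_counts[arabism] + year_counts[rival]
            if total:
                shares[year][arabism] = year_counts[arabism] / total
    return shares


# Toy data invented for this sketch; it is not real corpus evidence.
toy_corpus = [
    (1950, "Guardó la alhaja en la alcancía junto a la acequia."),
    (1950, "La acequia regaba la huerta y la alhaja brillaba."),
    (2020, "Guardó la joya en la hucha junto al canal."),
    (2020, "La acequia y el canal; otra joya en la hucha."),
]
result = arabism_share_by_year(toy_corpus)
for year, per_word in result.items():
    print(year, per_word)
```

A falling share across years would be consistent with the impression described above, but only a properly sampled corpus could show whether the effect is real and how large it is.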
“Word” looks like an obvious unit, yet it’s surprisingly hard to define. In linguistics it’s often “the minimal meaningful unit that can stand alone and fulfils a syntactic function”. This is useful but shaky when faced with cases such as Spanish díjoselo (one orthographic word composed of several morphemes) versus se lo dijo (three orthographic words with the same meaning).
Different subfields focus on different facets: morphology on internal structure (roots, affixes, morpheme grouping), syntax on function and combinatorics, while orthography and software may apply other criteria. These perspectives don’t always align: what counts as a word for morphology may not for orthography—or for a word-processing program.
This theoretical ambiguity has practical consequences in translation, where pricing is often per word. But what exactly is counted? Is horse-race one or two words? What about can’t? How do we treat German separable verbs like zurückkommen, which is one orthographic word in the infinitive but two in er kommt zurück? Chinese text, which is written without spaces between words? Spanish clitics? How do Microsoft Word and different CAT tools count?
A thesis could map these discrepancies and propose a standardised word-counting protocol for professional use.
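To show how quickly the numbers diverge, here is a minimal sketch with three counting conventions applied to the examples above. The three functions are hypothetical rules I made up for illustration; I am not claiming that any of them matches what Microsoft Word or a particular CAT tool actually does.

```python
import re

# Basic CJK Unified Ideographs block only; a simplification for this sketch.
CJK = r"[\u4e00-\u9fff]"


def count_whitespace(text):
    """Every space-delimited chunk counts as one word."""
    return len(text.split())


def count_split_punct(text):
    """Every run of word characters counts; hyphens and apostrophes act as separators."""
    return len(re.findall(r"\w+", text))


def count_cjk_aware(text):
    """Whitespace counting for non-CJK text, plus one word per CJK ideograph."""
    ideographs = re.findall(CJK, text)
    rest = re.sub(CJK, " ", text)
    return len(rest.split()) + len(ideographs)


samples = ["horse-race", "can't", "díjoselo", "se lo dijo", "zurückkommen", "我爱你"]
for s in samples:
    print(f"{s!r}: whitespace={count_whitespace(s)}, "
          f"split_punct={count_split_punct(s)}, cjk_aware={count_cjk_aware(s)}")
```

Even on six short strings the rules disagree: horse-race and can't count as one or two words depending on whether punctuation splits, and the three-character Chinese sample counts as one or three. A protocol would have to settle each of these choices explicitly.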
© 2025 Alejandro Moreno Ramos, www.ingenierotraductor.com