Knowing a language’s phonemes can make it much easier for automated systems to learn to interpret speech. In the 2015 volume of Transactions of the Association for Computational Linguistics, MIT researchers describe a new machine-learning system that, like several systems before it, can learn to distinguish spoken words. But unlike its predecessors, it can also learn to distinguish lower-level phonetic units, such as syllables and phonemes.
Unlike the machine-learning systems that led to, say, the speech recognition algorithms on today’s smartphones, the MIT researchers’ system is unsupervised, which means it acts directly on raw speech files: It doesn’t depend on the laborious hand-annotation of its training data by human experts. So it could prove much easier to extend to new sets of training data and new languages.
The system could offer some insights into human speech acquisition. “When children learn a language, they don’t learn how to write first,” says Chia-ying Lee, who completed her PhD in computer science and engineering at MIT last year and is first author on the paper. “They just learn the language directly from speech. By looking at patterns, they can figure out the structures of language. That’s pretty much what our paper tries to do.”
Lee is joined on the paper by her former thesis advisor, Jim Glass, a senior research scientist at the Computer Science and Artificial Intelligence Laboratory and head of the Spoken Language Systems Group, and Timothy O’Donnell, a postdoc in the MIT Department of Brain and Cognitive Sciences.