The fantasy of people talking to machines is a science fiction staple. In tales of the future, computers listened, replied and even translated alien dialects - obviously.
As we enter this century with my printer still convinced it holds US letter rather than A4 paper, it is easy to underrate technology's progress. But computer power is rocketing and some of the future is coming into view. Imagine it is time for end-of-term reports. You pick up the phone, dial an extension number and dictate a report. Your words are transcribed by computer, checked against the recording and sent to parents.
This is an idea based on a system in US hospitals. Developed by Dictaphone, the system lets medics dictate reports into a wireless mouthpiece. With a minimum of human effort, their recordings are transcribed and the text standardised so a machine can analyse it and generate patient bills.
Not only are there savings in time, but the millions of reports also form a huge bank of symptoms, history and treatment information that bodes well for making medicine work better in future.
The reporting system is the result of uniting technology by Lernout and Hauspie (L&H), the speech and language company that owns Dictaphone and Dragon Systems and makes call centre, translation and speech recognition systems. Alongside rival IBM, it produces some of the best and most popular speech recognition software available.
L&H and IBM, with its Via Voice product, are well poised for the start of a century when the SUI (speech user interface) may succeed the mouse-orientated GUI (graphical user interface).
In Asia, L&H's RealSpeak is aiding language teaching. The software breaks recorded words into phonemes, allowing computers to read on-screen text with a more human-sounding voice. Available as part of Voice Xpress dictation software, RealSpeak lets PCs read text and assemble phonemes according to a language's rules. Though it can take months to build a language, the result is far more natural than previous computer-speak.
In China, demand for learning English is high. Almost 98 per cent of the population are Cantonese and the region has few native English teachers. Louis Woo, president of L&H Asia Pacific, explains: "If I'm a teacher and know my pronunciation is questionable, I can ask students to listen to a passage and they can hear it with a professional voice. They will read the passage and repeat it after."
This technology is taken a stage further on the Press Association site, where headlines are read by virtual newscaster Ananova. The character, called an avatar, was built from scanned images of blinks, lip movements and expressions, creating a library of facial effects that could be triggered by the computer. Given news to read, the Ananova machine matches RealSpeak sounds to a moving face.
An exhibit at the Dome offers a clue as to where this technology is heading: here visitors can make an avatar of themselves. One day such a character may answer the phone, appear to read an email or teach your lessons.
The bull's-eye for IBM and L&H is making computers understand context. For example, Ananova still needs to be told how to say things (stories are tagged to ensure she doesn't smile while reading, say, a disaster story) and her intonation remains a bit flat. And a Net search that recognises context may sift only for the facts you seek.
But the facility is part-way there already. Some software makes translations between language pairs: French to Japanese, for example. Some results may need a little work, but an instant rough translation is often good enough. Another step is software that searches the Net for nuggets of information even if they are in a foreign language - good news for the majority of the world, which speaks no English. What is certain is that the contest between SUI and GUI has begun.