Archive for August, 2007

European speech recognition saga: Episode 3

Munich, Germany – Looks like our tour of Europe is far from over, as news just got through that…

– The Munich public hospital network (5 sites, 3,500 beds) will be rolling out a speech recognition system to a total of 100 physicians and 32 workstations in Radiology.

– The AZ Sint-Jan Hospital in Brugge, Belgium, has completed an interface between speech recognition and its electronic health record (EHR) system. Driven by the Radiology and Pathology departments, the 900-bed hospital has deployed the new reporting solution across all specialties, including Cardiology, Gynecology and Orthopedics.

– 60 radiologists at the Aberdeen Royal Infirmary (ARI) claim they are now in full control of the creation of medical reports, thanks to the implementation of a front-end speech recognition system.

No apparent summer break for speech recognition: news of more installations keeps coming in every week. Now, this is getting really exciting…

What is a Speech Recognition ConText?

Developed to avoid situations like these, a ConText is the “fuel” that feeds a speech recognition engine. Specific to one language AND one field of expertise (e.g. Radiology in UK English), ConTexts were invented by Philips to tailor a speech recognition system to a given professional environment. Let’s see what exactly is inside a speech recognition ConText, and what methodology is used to develop one.

Ingredient #1: ConText Lexicon. This is a list of words and phrases specific to a particular usage and language, e.g. Radiology in UK English. To create a valuable ConText Lexicon, approximately 100 million words from the language area are needed. In the case of a Radiology ConText, those words are best obtained from existing radiology reports, in order to capture the generally applied terminology. Philips or their integration partner would typically gather the required data in the form of fully anonymized reports from different healthcare organizations in a given country.
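To make this concrete, here is a minimal sketch (purely illustrative, not Philips’ actual tooling; all names are hypothetical) of how a specialty lexicon could be distilled from a corpus of anonymized reports by keeping the terms that clear a frequency threshold:

    from collections import Counter
    import re

    def build_lexicon(reports, min_count=2):
        """Count word frequencies across anonymized reports and keep the
        terms frequent enough to belong in a specialty lexicon."""
        counts = Counter()
        for text in reports:
            counts.update(re.findall(r"[a-z']+", text.lower()))
        return {word for word, n in counts.items() if n >= min_count}

    reports = ["No focal consolidation. Heart size is normal.",
               "Heart size normal; no pleural effusion."]
    print(sorted(build_lexicon(reports)))   # ['heart', 'no', 'normal', 'size']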

Ingredient #2: Background Lexicon. This is a dictionary containing between 300,000 and 800,000 words (depending on the language) whose usage is not considered frequent enough for inclusion in a specific ConText Lexicon. The background lexicon is used for reference when unknown words are added to the ConText during ConText Adaptation (the process that updates an author’s language model and ConText Lexicon based on his corrections of a “recognized report”, in order to improve the recognition rate).
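As a rough sketch of where the Background Lexicon fits in (hypothetical code, not the engine’s real internals): when a corrected report contains a word the ConText Lexicon does not know, the background dictionary is consulted before the word is treated as brand new:

    def promote_unknown_words(corrected_words, context_lexicon, background_lexicon):
        """Add words found in a corrected report to the ConText Lexicon,
        preferring entries already described in the Background Lexicon."""
        for word in corrected_words:
            if word in context_lexicon:
                continue
            if word in background_lexicon:
                # Known to the language at large: reuse its stored data.
                context_lexicon[word] = background_lexicon[word]
            else:
                # Genuinely new term: flag it for review and transcription.
                context_lexicon[word] = {"needs_review": True}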

Ingredient #3: Default Language Model. This is the ConText Lexicon plus a statistical model of word usage and word sequences. It represents the way a group of people uses a language in a specific context, a professional setting for instance. Once adapted, the language model becomes specific to an author and a ConText.
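To picture what “a statistical model of word sequences” means, here is a toy bigram model in Python; real recognition engines use far richer statistics, so treat this as illustration only:

    from collections import Counter, defaultdict

    def train_bigrams(sentences):
        """Estimate which word tends to follow which from a report corpus."""
        follows = defaultdict(Counter)
        for sentence in sentences:
            words = sentence.lower().split()
            for a, b in zip(words, words[1:]):
                follows[a][b] += 1
        return follows

    model = train_bigrams(["pleural effusion noted", "no pleural effusion"])
    # After "pleural", the engine strongly expects "effusion":
    print(model["pleural"].most_common(1))   # [('effusion', 2)]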

Ingredient #4: Acoustic Reference. This is a collection of statistical data describing the vocal characteristics of an individual user. The production of a phoneme varies from one human being to another (variables include accent, age, pronunciation, etc.), and a language is not spoken in 2007 the way it was in the 1950s. The Acoustic Reference thereby “takes a picture” of how a language is spoken at a given point in time. To develop an Acoustic Reference for Swedish, for instance, several hundred hours of spoken Swedish, covering all regions of the country, are recorded and analyzed, resulting in an average model. Based on this average model, the speech recognition engine is then able to interpret an author’s speech input and optimize the recognition rate regardless of his dialect, age, etc. This author-specific data, unique to each user, is stored in an ARF (Acoustic Reference File).
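Conceptually (and only conceptually, since the ARF format itself is not public), one can picture the Acoustic Reference as per-phoneme feature statistics: an average model built from hundreds of speakers, then gradually nudged toward one author’s voice:

    from dataclasses import dataclass, field

    @dataclass
    class AcousticReference:
        """Per-phoneme acoustic feature means (a gross simplification)."""
        phoneme_means: dict = field(default_factory=dict)

        def adapt(self, phoneme, observed, rate=0.1):
            # Nudge the average model toward this author's pronunciation.
            mean = self.phoneme_means.get(phoneme, observed)
            self.phoneme_means[phoneme] = mean + rate * (observed - mean)

    arf = AcousticReference({"ae": 1200.0})   # e.g. a formant frequency in Hz
    arf.adapt("ae", 1250.0)                   # this author says it slightly differently
    print(arf.phoneme_means["ae"])            # 1205.0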

The list of speech recognition ConTexts developed by Philips to date can be found here.

Thought of the day: “Contextual Intelligence Matters…”


Ooooppss…doesn’t it?

The beauty of the network approach

Why does professional speech recognition work so well, as opposed to individual applications? Well, let’s think about it. What professional SR does is network multiple physicians from the same specialty around what they have in common: their language patterns and medical vocabulary. This collegial approach makes a huge difference in itself, since the SR engine will be loaded with vocabulary specific to, say, Pathology or Cardiology. In specialties where medical terminology prevails in the reporting process (e.g. Radiology, as opposed to Psychiatry), great results are achieved right from the start.

Achieving similar results as a consumer would require patience, not to mention advanced organizational skills. Say I’m a soccer fan using speech recognition to comment on game strategies: I’d better be part of a networked community sharing the exact same interest and using speech recognition for the exact same purpose…

Now, what about individual pronunciations? How does the engine work this out? Once a “speech recognized” report has been corrected and signed off, the speech recognition engine initiates what is probably the most important phase of all: Adaptation. During adaptation, the SR engine makes all the required adjustments by comparing the recognized (draft) report with its corrected (final) version, matching a specific pronunciation with a specific word here, collecting an unknown word to be added to the lexicon there. And because this lexicon is shared with other users in the department, every new word is automatically and immediately made available to everyone else on the network. That’s nothing more than the whole “United we stand, divided we fall” concept at work.
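For illustration (function names invented, not the engine’s real API), the comparison step of Adaptation boils down to aligning the draft and final texts and harvesting the differences:

    import difflib

    def harvest_corrections(draft, final):
        """Align the recognized draft with the corrected final report and
        return (misrecognized, corrected) pairs plus previously unseen words."""
        a, b = draft.lower().split(), final.lower().split()
        matcher = difflib.SequenceMatcher(a=a, b=b)
        pairs, new_words = [], set(b) - set(a)
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == "replace":
                pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
        return pairs, new_words

    pairs, new_words = harvest_corrections(
        "the patient has new Monia", "the patient has pneumonia")
    # pairs == [('new monia', 'pneumonia')]; 'pneumonia' then joins the
    # shared lexicon, instantly available to every colleague on the network.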

Speaking hardware: what are the DSS and DSSPro standards?

Healthcare facilities setting sail for speech recognition are typically advised to equip their physicians with digital dictation devices that support the DSS or DSS Pro format. The reasons? Optimal sound quality and sampling rates, both key ingredients of a successful speech recognition experience.

A bit of history first. The .dss format was created by a voluntary organization called the International Voice Association (IVA), formed jointly by Grundig, Olympus and Philips back in 1994. DSS is maintained as a manufacturer-independent, international standard for professional speech processing that can be used – under certain conditions – by any manufacturer, as long as it is used in professional devices. This guarantees users a secure investment in the procurement, use and future compatibility of their systems.

DSS offers high audio quality and a high compression rate without noticeable loss of quality, as well as low energy consumption. The compression was designed to permit efficient memory usage and data transfer for digitized speech. The quality had to be retained so that even quietly spoken passages could be clearly understood and speech recognition could be applied. At the same time, everything had to be accomplished at a reasonable computational expense to keep power consumption in check, because mobile dictation devices are frequently used for extended periods.

DSS is often called “MP3 for speech”. As a compression algorithm for speech, DSS is comparable to the music format MP3. Although the sound quality differs only negligibly from the uncompressed original, .dss files are very small, which allows them to be transferred quickly to the PC and easily sent by e-mail. Because the technology compresses only the parts of the signal that truly matter for speech, the standard effectively distills a dictation down to its essentials without losing quality. A 10-minute dictation that requires only about 1 MB in the .dss format requires up to 12 times as much memory with conventional compression.
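Those figures imply a bitrate worth spelling out; back-of-the-envelope arithmetic, not an official DSS specification:

    # A 10-minute dictation in about 1 MB of .dss:
    seconds = 10 * 60
    dss_bytes = 1_000_000
    dss_kbps = dss_bytes * 8 / seconds / 1000     # ~13.3 kbit/s
    # The same dictation at "up to 12 times as much memory":
    other_kbps = dss_kbps * 12                    # ~160 kbit/s
    print(round(dss_kbps, 1), round(other_kbps))  # 13.3 160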

In March 2007, the IVA launched DSSPro, presenting it as being “far more than just a speech recording standard – DSSPro actually allowing far-reaching management functions for the workflow,” thanks to the following new functions:

  • Support of real-time file encryption during recording to protect confidential dictation data.
  • A higher 16 kHz sampling rate provides more natural playback of the human voice, as well as optimized quality for speech recognition.
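To put that 16 kHz figure in perspective, a quick, non-normative calculation of what the raw signal would weigh before compression:

    # Raw 16-bit mono PCM at a 16 kHz sampling rate:
    bytes_per_second = 16_000 * 2                     # 32,000 B/s
    ten_min_mb = bytes_per_second * 600 / 1_000_000   # 19.2 MB uncompressed
    print(ten_min_mb)   # 19.2 -- hence the need for efficient speech coding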

Popular digital dictation devices supporting the DSS format:

Philips SpeechMike range (PC microphone)
Philips Digital Pocket Memo 9360 (mobile recorder)
Olympus DS-3300 (mobile recorder)
Grundig Digta CordEx (PC microphone)

Popular digital dictation devices supporting the DSSPro format:
Philips Digital Pocket Memo 9600 (mobile recorder)
Grundig DigtaSonic xMic (PC microphone)
Olympus DS-4000 (mobile recorder)

Bilingual speech recognition doesn’t let physicians get lost in translation

In border regions like Eastern Ontario, Canada, two languages are spoken. When a patient comes into a hospital saying either “it hurts” or “j’ai mal”, healthcare organizations have to add the language factor to an already complex documentation workflow. As obvious as it may sound, isn’t documenting a case in the patient’s native language the very first step toward accurate, quality healthcare? And since nothing seems to stop 21st century hospitals on their way to forging this long-awaited, modern healthcare, a hospital in the bilingual Ottawa region has decided to take on the challenge using bilingual speech recognition; a North American – if not worldwide – first.

Ottawa-based Hôpital Montfort is a 206-bed facility with 100 physicians on its active medical staff and twelve medical transcriptionists. After implementing an integrated document creation platform including digital dictation, transcription and distribution in 2007, the hospital is now set to implement an additional, bilingual speech recognition module to further accelerate the processing of reports based on the patient’s language. Based on the language set either in the physician’s profile or at the physician’s login, the speech recognition engine would launch the proper language ConText in the background. So if a French-speaking patient comes in, the physician would dictate in French using a French-Canadian speech recognition ConText. The voice file would then be automatically routed to a French-speaking correction resource, and the final report issued in French.
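A hypothetical sketch of the routing logic just described; names and structures are invented for illustration and are not Montfort’s actual configuration:

    CONTEXTS = {"fr-CA": "Radiology French-Canadian",
                "en-CA": "Radiology Canadian English"}
    CORRECTION_QUEUES = {"fr-CA": "french_pool", "en-CA": "english_pool"}

    def route_dictation(physician_profile, session_language=None):
        """Pick the ConText and correction queue from the language set in
        the physician's profile, or the one chosen at login."""
        lang = session_language or physician_profile["language"]
        return CONTEXTS[lang], CORRECTION_QUEUES[lang]

    context, queue = route_dictation({"language": "fr-CA"})
    # -> ('Radiology French-Canadian', 'french_pool'): dictated in French,
    #    corrected in French, final report delivered in French.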

Meanwhile, in order to ensure the instant availability of up-to-date patient results and demographics to all relevant medical staff, bi-directional HL7 interfaces have already been implemented between the dictation-transcription platform and the Hospital Information System (Admission, Discharge, Transfer – ADT). A similar interface has also been implemented with the hospital’s Patient Care Inquiry module, enabling instant viewing of pathology reports, once again, regardless of the language used.
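For readers unfamiliar with HL7 v2, an ADT feed carries pipe-delimited messages along these lines (a simplified, made-up example; production messages carry many more fields):

    # A minimal (illustrative) HL7 v2 ADT admit message:
    adt_a01 = "\r".join([
        "MSH|^~\\&|HIS|MONTFORT|DICTATION|MONTFORT|200708150930||ADT^A01|00001|P|2.3",
        "PID|1||123456||DOE^JOHN||19500101|M",
        "PV1|1|I|RADIO^101^1",
    ])
    # Segments: MSH = header, PID = patient demographics, PV1 = visit.
    # Parsing is just splitting on the delimiters:
    for segment in adt_a01.split("\r"):
        fields = segment.split("|")
        print(fields[0], len(fields) - 1, "fields")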

The project has raised interest in other parts of the world, and a large hospital in France is currently looking to implement a similar solution.

More on European Speech Recognition Projects

Industry expert Nick van Terheyden, MD, Chief Medical Officer, Philips Speech Recognition Systems, wanted to comment on my previous post about European speech recognition projects with a few additional figures. So it is with pleasure that I am posting van Terheyden’s remarks today, for another tour of Europe.

Speech recognition has definitely reached a “tipping point” on the Old Continent, with a rather impressive number of projects underway:

  • The Dutch are driving forward with an adoption rate estimated at 80% in several specialties across the Netherlands.
  • The Spanish are surging ahead with 50% of Spain’s radiologists using front-end speech recognition in the Valencia region.
  • The Norwegians are notably ahead with 100% of Norway’s Healthcare regions implementing speech recognition.
  • The Danes are delivering value at the Vejle County Hospital, where speech recognition is fully integrated with their electronic health record system for 1,400 users; an overall productivity rise of 5 to 7 percent that represents savings of several million Danish Kroner (1m DKK = 184,000 USD).
  • The French are forging forward, with all 39 public hospitals in Paris (15,000 physicians and transcriptionists in total) to be equipped with speech recognition by 2010.
  • And the Italians in all this? With no less than 22 hospitals in the idyllic Friuli-Venezia Giulia region having recently adopted front-end speech recognition, legions of physicians are just about to cross the RubiCon-Text. Alea jacta est.

Note: a ConText is a collection of acoustic data and vocabulary that reflects the spoken and written language used by professionals in a specific medical specialty, as developed by Philips Speech Recognition Systems.

