Latest revision as of 20:46, 31 August 2015

Qt Speech Module

This page contains notes about the development of a Qt Speech module. Currently it covers only TTS (text to speech). Speech recognition may be introduced later, but at this point it appears to be considerably harder to implement.

https://codereview.qt-project.org/#/admin/projects/qt/qtspeech
ssh://codereview.qt.io:29418/qt/qtspeech.git

Current State

There is a basic implementation on Mac/Win/Linux/Android. Linux uses speech-dispatcher. Windows uses SAPI 5. OS X uses the Cocoa NSSpeechSynthesizer API.

Todo

Decide on either plugins or only having one backend per platform.

To implement on each platform:

iOS: the backend needs some thought about the oldest iOS version that should be supported, as noted here: https://codereview.qt-project.org/98704

Collection of resources and links that should help define a cross-platform API:

API for language selection

QLocale seems like a good candidate for languages.
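
One thing to keep in mind is that the backends report languages in different shapes (a two-letter code for Speech Dispatcher, tags with an optional region part for espeak). A minimal sketch of normalizing such tags into a `language_REGION` name, which is the kind of string the QLocale name constructor is expected to accept, could look like the following. `LanguageTag`, `parseLanguageTag` and `toLocaleName` are hypothetical names used only for illustration, and the sketch uses the plain C++ standard library rather than Qt so that it stays self-contained.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper type, not part of Qt or of any backend API.
struct LanguageTag {
    std::string language; // lower-case language code, e.g. "en"
    std::string region;   // upper-case region code, e.g. "GB"; empty if absent
};

// Splits tags such as "en", "en_GB" or "en-gb" at '_' or '-'.
// Only the first subtag after the language is kept; anything after it
// (for example a sub-region) is dropped in this sketch.
LanguageTag parseLanguageTag(const std::string &tag)
{
    LanguageTag result;
    const std::string::size_type sep = tag.find_first_of("_-");
    result.language = tag.substr(0, sep);
    if (sep != std::string::npos) {
        const std::string::size_type next = tag.find_first_of("_-", sep + 1);
        const std::string::size_type len =
            next == std::string::npos ? std::string::npos : next - sep - 1;
        result.region = tag.substr(sep + 1, len);
    }
    // Normalize case: language lower-case, region upper-case.
    std::transform(result.language.begin(), result.language.end(),
                   result.language.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    std::transform(result.region.begin(), result.region.end(),
                   result.region.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return result;
}

// Builds a "language" or "language_REGION" name from the parsed tag.
std::string toLocaleName(const LanguageTag &tag)
{
    return tag.region.empty() ? tag.language : tag.language + "_" + tag.region;
}
```

With this, `toLocaleName(parseLanguageTag("en-gb"))` yields `en_GB`, and a bare `en` is passed through unchanged. Whether dropping the sub-region is acceptable is an open design question for the real API.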

Voice selection

Summarize here which native API offers what. For example, SAPI 5 uses first names as voice identifiers.

Platform/API	Voice Properties	Link
Win SAPI 5	gender, age, name, lang, vendor - names such as "Microsoft Anna" or "Mike"	http://msdn.microsoft.com/en-us/library/ms720151(v=vs.85).aspx#API_for_Text-To-Speech http://msdn.microsoft.com/en-us/library/ms723601(v=vs.85).aspx
Mac Carbon	similar to the Cocoa API	https://developer.apple.com/library/mac/documentation/Carbon/Reference/Speech_Synthesis_Manager/Reference/reference.html#//apple_ref/doc/uid/TP30000211
Mac Cocoa NSSpeechSynthesizer	id, name, age, gender, language, locale - can be enumerated	https://developer.apple.com/library/mac/documentation/Cocoa/Reference/ApplicationKit/Classes/NSSpeechSynthesizer_Class/Reference/Reference.html
Linux Speech Dispatcher	name, language (2-letter), variant, voice type enum with Male1..3, Female1..3 and childMale, childFemale	http://cvs.freebsoft.org/doc/speechd/speech-dispatcher.html
espeak	lang (two letter [_region-subregion]), age(?), gender, name(=long language name)	espeak --voices
festival	name, gender, maybe more	http://www.cstr.ed.ac.uk/projects/festival/
Android	has a concept of engines, setLanguage locale based, isLanguageAvailable() but no language listing	http://developer.android.com/reference/android/speech/tts/TextToSpeech.html
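
The table suggests that most backends can supply at least a name, a gender, an age and a language. A minimal sketch of a common voice description, and of folding a Speech Dispatcher style voice type into it, might look like this. All names here (`Voice`, `Gender`, `Age`, `DispatcherVoiceType`, `genderOf`, `ageOf`, `makeVoice`) are hypothetical; `DispatcherVoiceType` only mirrors the voice types named in the table and is not the actual Speech Dispatcher enum.

```cpp
#include <string>

// Hypothetical common voice properties, based on the table above.
enum class Gender { Male, Female, Unknown };
enum class Age { Child, Adult, Unknown };

struct Voice {
    std::string name;   // backend voice identifier or display name
    Gender gender;
    Age age;
    std::string locale; // e.g. "en" or "en_GB"
};

// Mirrors the voice types listed in the Speech Dispatcher table row
// (assumption: this is an illustration, not the real library enum).
enum class DispatcherVoiceType {
    Male1, Male2, Male3,
    Female1, Female2, Female3,
    ChildMale, ChildFemale
};

Gender genderOf(DispatcherVoiceType type)
{
    switch (type) {
    case DispatcherVoiceType::Male1:
    case DispatcherVoiceType::Male2:
    case DispatcherVoiceType::Male3:
    case DispatcherVoiceType::ChildMale:
        return Gender::Male;
    case DispatcherVoiceType::Female1:
    case DispatcherVoiceType::Female2:
    case DispatcherVoiceType::Female3:
    case DispatcherVoiceType::ChildFemale:
        return Gender::Female;
    }
    return Gender::Unknown; // not reached for the values listed above
}

Age ageOf(DispatcherVoiceType type)
{
    switch (type) {
    case DispatcherVoiceType::ChildMale:
    case DispatcherVoiceType::ChildFemale:
        return Age::Child;
    default:
        return Age::Adult;
    }
}

// Combines a backend voice type with a name and locale into the common form.
Voice makeVoice(const std::string &name, DispatcherVoiceType type,
                const std::string &locale)
{
    return Voice{name, genderOf(type), ageOf(type), locale};
}
```

Backends that expose richer data (vendor for SAPI 5, an id for NSSpeechSynthesizer) would need extra fields or a backend-specific extension; this sketch only covers the properties the platforms appear to share.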

@@ Line 1: / Line 1: @@
-=Qt Speech Module=
+= Qt Speech Module =
+This page contains notes about the development of a qt speech module.
+Currently it is about tts (text to speech).
+Speech recognition may be introduced, but is a lot less trivial at this point in time it seems.
-This page contains notes about the development of a qt speech module.<br /> Currently it is about tts (text to speech).<br /> Speech recognition may be introduced, but is a lot less trivial at this point in time it seems.
+https://codereview.qt-project.org/#/admin/projects/qt/qtspeech
+ssh://codereview.qt.io:29418/qt/qtspeech.git
-https://codereview.qt.io/#admin,project,qt/qtspeech,info<br /> ssh://codereview.qt.io:29418/qt/qtspeech.git
+== Current State ==
-==Current State==
+There is a basic implementation on Mac/Win/Linux/Android.
+Linux uses speech-dispatcher.
+Windows uses sapi5.
+OSX uses Cocoa NSSpeechSynthesizer api.
-There is a basic implementation on Mac/Win/Linux/Android.<br /> Linux uses speech-dispatcher.
+== Todo ==
-==Todo==
+Decide on either plugins or only having one backend per platform.
-Decide on either plugins or only having one backend per platform.
+To implement on each platform:
+* iOS: backend needs a bit of thought about how old of iOS should be supported as noted here: https://codereview.qt-project.org/98704
-Collection of resources and links that should help defining a cross-platform <span class="caps">API</span>:
+Collection of resources and links that should help defining a cross-platform API:
-===<span class="caps">API</span> for language selection===
+=== API for language selection ===
 QLocale seems like a good candidate for languages.
-===Voice selection===
+=== Voice selection ===
-Summarize here which native <span class="caps">API</span> offers what. For example <span class="caps">SAPI</span> 5 has first names as voice identifiers.
+Summarize here which native API offers what. For example SAPI 5 has first names as voice identifiers.
-{| class="infotable line"
+{|
-| Platform/API
+|Platform/API
-| Voice Properties
+|Voice Properties
-| Link
+|Link
 |-
-| Win <span class="caps">SAPI</span> 5
+|Win SAPI 5
-| gender, age, name, lang, vendor – names such as “Microsoft Anna” or “Mike”
+|gender, age, name, lang, vendor - names such as "Microsoft Anna" or "Mike"
-|
+|http://msdn.microsoft.com/en-us/library/ms720151(v=vs.85).aspx#API_for_Text-To-Speech http://msdn.microsoft.com/en-us/library/ms723601(v=vs.85).aspx
-http://msdn.microsoft.com/en-us/library/ms720151(v=vs.85).aspx#API_for_Text-To-Speech http://msdn.microsoft.com/en-us/library/ms723601(v=vs.85).aspx
 |-
-| Mac Carbon
+|Mac Carbon
-| similar to cocoa api
+|similar to cocoa api
-|
+|https://developer.apple.com/library/mac/documentation/Carbon/Reference/Speech_Synthesis_Manager/Reference/reference.html#//apple_ref/doc/uid/TP30000211
-https://developer.apple.com/library/mac/documentation/Carbon/Reference/Speech_Synthesis_Manager/Reference/reference.html#//apple_ref/doc/uid/TP30000211
 |-
-| Mac Cocoa NSSpeechSynthesizer
+|Mac Cocoa NSSpeechSynthesizer
-| id, name, age, gender, language, locale – can be enumerated
+|id, name, age, gender, language, locale - can be enumerated
-|
+|https://developer.apple.com/library/mac/documentation/Cocoa/Reference/ApplicationKit/Classes/NSSpeechSynthesizer_Class/Reference/Reference.html
-https://developer.apple.com/library/mac/documentation/Cocoa/Reference/ApplicationKit/Classes/NSSpeechSynthesizer_Class/Reference/Reference.html
 |-
-| Linux SpeechDisp
+|Linux SpeechDisp
-| name, language (2-letter), variant, voice type enum with Male1..3, Female1..3 and childMale, childFemale
+|name, language (2-letter), variant, voice type enum with Male1..3, Female1..3 and childMale, childFemale
-|
+|http://cvs.freebsoft.org/doc/speechd/speech-dispatcher.html
-http://cvs.freebsoft.org/doc/speechd/speech-dispatcher.html
 |-
-| espeak
+|espeak
-| lang (two letter [_region-subregion]), age(?), gender, name(=long language name)
+|lang (two letter [_region-subregion]), age(?), gender, name(=long language name)
-| espeack —voices
+|espeack —voices
 |-
-| festival
+|festival
-| name, gender, maybe more
+|name, gender, maybe more
-|
+|http://www.cstr.ed.ac.uk/projects/festival/
-http://www.cstr.ed.ac.uk/projects/festival/
 |-
-| Android
+|Android
-| has a concept of engines, setLanguage locale based, isLanguageAvailable() but no language listing
+|has a concept of engines, setLanguage locale based, isLanguageAvailable() but no language listing
-|
+|http://developer.android.com/reference/android/speech/tts/TextToSpeech.html
-http://developer.android.com/reference/android/speech/tts/TextToSpeech.html
 |}
-===<span class="caps">CSS</span> and/or <span class="caps">XML</span> in strings to be spoken===
+=== CSS and/or XML in strings to be spoken ===
-http://www.w3.org/TR/2011/WD-css3-speech-20110419/