QtSpeech

From Qt Wiki

Revision as of 09:53, 24 February 2015

= Qt Speech Module =

This page contains notes about the development of a Qt speech module.
Currently it covers only TTS (text to speech).
Speech recognition may be introduced later, but it appears to be considerably harder at this point.

https://codereview.qt.io/#admin,project,qt/qtspeech,info
ssh://codereview.qt.io:29418/qt/qtspeech.git

== Current State ==

There is a basic implementation on Mac/Win/Linux/Android.
Linux uses speech-dispatcher.

== Todo ==

Decide between a plugin-based design and a single built-in backend per platform.

To implement on each platform:
* OS X: Pitch (patch currently in review [https://codereview.qt.io/99691 here])
* OS X: availableVoices (patch currently in review [https://codereview.qt.io/106097 here])
* Windows: availableVoices/setVoice/voice
* Linux: add all outputModule voices to the availableVoices API
* Example widget: remove the voice type combobox, or add new API to implement it properly.
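
The plugin question and the per-platform voice functions listed above can be pictured with a small sketch. It uses only the C++ standard library, and every name in it (Voice, SpeechBackend, DummyBackend, BackendRegistry, makeDefaultRegistry) is hypothetical; it is not the qtspeech API, and the two voice names are placeholders. It only illustrates how availableVoices/setVoice/voice and pitch could sit behind one interface, with backends looked up by key as a plugin-based design might do.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal voice description (not the qtspeech type).
struct Voice {
    std::string name;
    std::string language;
};

// Hypothetical backend interface covering the functions named in the todo list.
class SpeechBackend {
public:
    virtual ~SpeechBackend() = default;
    virtual std::vector<Voice> availableVoices() const = 0;
    virtual bool setVoice(const std::string &name) = 0;
    virtual Voice voice() const = 0;
    virtual void setPitch(double pitch) = 0;
    virtual double pitch() const = 0;
};

// Stand-in backend with two placeholder voices; a real backend would query
// the platform speech API instead.
class DummyBackend : public SpeechBackend {
public:
    std::vector<Voice> availableVoices() const override { return m_voices; }

    // Selects a voice by name; returns false if no such voice exists.
    bool setVoice(const std::string &name) override {
        for (const Voice &v : m_voices) {
            if (v.name == name) {
                m_current = v;
                return true;
            }
        }
        return false;
    }

    Voice voice() const override { return m_current; }
    void setPitch(double pitch) override { m_pitch = pitch; }
    double pitch() const override { return m_pitch; }

private:
    std::vector<Voice> m_voices{{"Anna", "en"}, {"Mike", "en"}};
    Voice m_current = m_voices.front();
    double m_pitch = 0.0;
};

using BackendFactory = std::function<std::unique_ptr<SpeechBackend>()>;

// Plugin-style lookup: backends register a factory under a key.
class BackendRegistry {
public:
    void registerBackend(const std::string &key, BackendFactory factory) {
        m_factories[key] = std::move(factory);
    }

    // Returns a new backend instance, or nullptr for an unknown key.
    std::unique_ptr<SpeechBackend> create(const std::string &key) const {
        auto it = m_factories.find(key);
        if (it == m_factories.end())
            return nullptr;
        return it->second();
    }

private:
    std::map<std::string, BackendFactory> m_factories;
};

// Builds a registry that knows only the stand-in backend.
BackendRegistry makeDefaultRegistry() {
    BackendRegistry registry;
    registry.registerBackend("dummy", [] {
        return std::unique_ptr<SpeechBackend>(new DummyBackend);
    });
    return registry;
}
```

With a single backend per platform, the registry would disappear and the choice of backend class would be made at build time; the interface itself could stay the same in both designs.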

Collection of resources and links that should help define a cross-platform API:

=== API for language selection ===

QLocale seems like a good candidate for languages.
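
As a rough illustration of why a locale type helps here: the native APIs report languages in different shapes (a bare two-letter code, or a code followed by region subtags), and these have to be normalized before they can be compared. The sketch below does this with the standard library only. LanguageTag and parseLanguageTag are hypothetical names, and the sketch assumes the second subtag is always a territory, which is not true of every tag format. In Qt, QLocale would take over this role.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper type; in a Qt API this would likely be a QLocale.
struct LanguageTag {
    std::string language;   // lower-cased language code, e.g. "en"
    std::string territory;  // upper-cased territory code, e.g. "GB"; empty if absent
};

// Splits tags such as "en", "en_GB" or "de-at" into language and territory.
// Assumption: the first subtag is the language and the second, if present,
// is the territory. Any further subtags are ignored.
LanguageTag parseLanguageTag(const std::string &tag) {
    auto isSeparator = [](char c) { return c == '_' || c == '-'; };

    LanguageTag result;
    std::size_t i = 0;
    for (; i < tag.size() && !isSeparator(tag[i]); ++i)
        result.language += static_cast<char>(
            std::tolower(static_cast<unsigned char>(tag[i])));
    if (i < tag.size())
        ++i;  // skip the separator after the language code
    for (; i < tag.size() && !isSeparator(tag[i]); ++i)
        result.territory += static_cast<char>(
            std::toupper(static_cast<unsigned char>(tag[i])));
    return result;
}

// True if both tags name the same language, regardless of territory.
bool sameLanguage(const LanguageTag &a, const LanguageTag &b) {
    return a.language == b.language;
}
```

A backend that only reports "en" can then be matched against a request for "en_GB" via sameLanguage, while a backend with territory information can be matched exactly.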

=== Voice selection ===

Summarize here which native API offers what. For example, SAPI 5 uses first names as voice identifiers.

{|
! Platform/API
! Voice Properties
! Link
|-
| Win SAPI 5
| gender, age, name, lang, vendor; names such as "Microsoft Anna" or "Mike"
| http://msdn.microsoft.com/en-us/library/ms720151(v=vs.85).aspx#API_for_Text-To-Speech http://msdn.microsoft.com/en-us/library/ms723601(v=vs.85).aspx
|-
| Mac Carbon
| similar to the Cocoa API
| https://developer.apple.com/library/mac/documentation/Carbon/Reference/Speech_Synthesis_Manager/Reference/reference.html#//apple_ref/doc/uid/TP30000211
|-
| Mac Cocoa NSSpeechSynthesizer
| id, name, age, gender, language, locale; can be enumerated
| https://developer.apple.com/library/mac/documentation/Cocoa/Reference/ApplicationKit/Classes/NSSpeechSynthesizer_Class/Reference/Reference.html
|-
| Linux SpeechDisp
| name, language (2-letter), variant, voice type enum with Male1..3, Female1..3 and childMale, childFemale
| http://cvs.freebsoft.org/doc/speechd/speech-dispatcher.html
|-
| espeak
| lang (two letter [_region-subregion]), age(?), gender, name (= long language name)
| espeak --voices
|-
| festival
| name, gender, maybe more
| http://www.cstr.ed.ac.uk/projects/festival/
|-
| Android
| has a concept of engines; setLanguage is locale based; isLanguageAvailable() exists, but there is no language listing
| http://developer.android.com/reference/android/speech/tts/TextToSpeech.html
|}
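
The differences above suggest that a cross-platform voice description needs a few optional fields. As a sketch of one possible mapping, the code below folds the speech-dispatcher style voice type enum (Male1..3, Female1..3, childMale, childFemale) into separate gender and age properties, which is closer to what SAPI 5 and NSSpeechSynthesizer report. All type and function names are hypothetical, and treating Male1..3 and Female1..3 as adult voices is an assumption, not something the speech-dispatcher documentation is quoted for here.

```cpp
#include <string>

// Mirrors the voice type values listed for speech-dispatcher in the table.
enum class VoiceType {
    Male1, Male2, Male3,
    Female1, Female2, Female3,
    ChildMale, ChildFemale
};

enum class Gender { Unknown, Male, Female };
enum class Age { Unknown, Child, Adult };

// Hypothetical cross-platform voice description (not the qtspeech type).
struct VoiceInfo {
    std::string language;
    Gender gender;
    Age age;
};

// Maps a voice type to gender/age properties.
// Assumption: the numbered Male/Female types are adult voices.
VoiceInfo fromVoiceType(VoiceType type, const std::string &language) {
    VoiceInfo info;
    info.language = language;
    info.gender = Gender::Unknown;
    info.age = Age::Unknown;

    switch (type) {
    case VoiceType::Male1:
    case VoiceType::Male2:
    case VoiceType::Male3:
        info.gender = Gender::Male;
        info.age = Age::Adult;
        break;
    case VoiceType::Female1:
    case VoiceType::Female2:
    case VoiceType::Female3:
        info.gender = Gender::Female;
        info.age = Age::Adult;
        break;
    case VoiceType::ChildMale:
        info.gender = Gender::Male;
        info.age = Age::Child;
        break;
    case VoiceType::ChildFemale:
        info.gender = Gender::Female;
        info.age = Age::Child;
        break;
    }
    return info;
}
```

The mapping is lossy: Male1, Male2 and Male3 all become the same gender/age pair, so a real API would also have to keep the backend's own voice identifier.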

=== CSS and/or XML in strings to be spoken ===

http://www.w3.org/TR/2011/WD-css3-speech-20110419/