= Qt Speech Module =
This page contains notes about the development of a Qt Speech module.
It currently covers text-to-speech (TTS) only.
Speech recognition may be added later, but it appears to be considerably harder to support at this point.


* Code review: https://codereview.qt-project.org/#/admin/projects/qt/qtspeech
* Git repository (SSH): ssh://codereview.qt.io:29418/qt/qtspeech.git


== Current State ==


There is a basic implementation on Mac, Windows, Linux and Android.
* Linux uses speech-dispatcher.
* Windows uses SAPI 5.
* OS X uses the Cocoa NSSpeechSynthesizer API.


== Todo ==


Decide between a plugin architecture and a single backend per platform.

To implement on each platform:
* iOS: the backend needs some thought about the oldest iOS version that should be supported, as noted here: https://codereview.qt-project.org/98704
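
To make the plugin option in the Todo above more concrete, here is a minimal sketch in plain C++ (no Qt dependency) of what a backend interface plus registry could look like. All names (SpeechBackend, BackendRegistry, RecordingBackend) are hypothetical and are not taken from the qtspeech repository; a real Qt module would more likely build on Qt's own plugin loading.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface that every TTS backend would implement.
class SpeechBackend {
public:
    virtual ~SpeechBackend() = default;
    virtual std::string name() const = 0;
    virtual void say(const std::string &text) = 0;
};

// Hypothetical registry: maps a backend key to a factory function.
// With a single backend per platform this would hold exactly one entry.
class BackendRegistry {
public:
    using Factory = std::function<std::unique_ptr<SpeechBackend>()>;

    void add(const std::string &key, Factory factory) {
        m_factories[key] = std::move(factory);
    }

    // Lists the keys of all registered backends.
    std::vector<std::string> keys() const {
        std::vector<std::string> result;
        for (const auto &entry : m_factories)
            result.push_back(entry.first);
        return result;
    }

    // Creates the backend registered under key, or returns nullptr.
    std::unique_ptr<SpeechBackend> create(const std::string &key) const {
        const auto it = m_factories.find(key);
        if (it == m_factories.end())
            return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Factory> m_factories;
};

// Stand-in backend for illustration: records text instead of speaking it.
class RecordingBackend : public SpeechBackend {
public:
    std::string name() const override { return "recording"; }
    void say(const std::string &text) override { spoken.push_back(text); }

    std::vector<std::string> spoken;
};
```

With this shape, choosing "one backend per platform" versus "plugins" only changes how many factories are registered, not the interface that callers see.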


The following resources and links should help define a cross-platform API:


=== API for language selection ===


QLocale seems like a good candidate for languages.
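
The native APIs report languages in different shapes (two-letter codes, codes with a region suffix, and so on), so some normalization would probably be needed before handing a name to QLocale. The following is a rough sketch in plain C++ without Qt; LanguageTag, parseLanguageTag and toLocaleName are hypothetical helpers, and dropping any sub-region is an assumption made here for simplicity.

```cpp
#include <cctype>
#include <string>

// Hypothetical normalized form of a backend language identifier.
struct LanguageTag {
    std::string language; // lower-case, e.g. "en"
    std::string region;   // upper-case, e.g. "GB"; empty if not given
};

// Splits tags such as "en", "en_GB", "en-us" or "en_GB-scotland".
// Anything after the region (a sub-region) is ignored in this sketch.
LanguageTag parseLanguageTag(const std::string &tag) {
    LanguageTag result;
    const std::string::size_type sep = tag.find_first_of("_-");
    const std::string language = tag.substr(0, sep);
    for (unsigned char c : language)
        result.language += static_cast<char>(std::tolower(c));

    if (sep != std::string::npos) {
        const std::string::size_type end = tag.find_first_of("_-", sep + 1);
        const std::string region = (end == std::string::npos)
            ? tag.substr(sep + 1)
            : tag.substr(sep + 1, end - sep - 1);
        for (unsigned char c : region)
            result.region += static_cast<char>(std::toupper(c));
    }
    return result;
}

// Builds a "language_REGION" name, the general shape accepted by
// QLocale's string constructor.
std::string toLocaleName(const LanguageTag &tag) {
    return tag.region.empty() ? tag.language : tag.language + "_" + tag.region;
}
```

In Qt code the resulting name could then be passed to the QLocale constructor that takes a string; whether that is the right place to normalize is an open design question.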


=== Voice selection ===


Summarize here which native API offers what. For example, SAPI 5 uses first names as voice identifiers.


{|
|Platform/API
|Voice properties
|Links
|-
|Windows SAPI 5
|gender, age, name, language, vendor; names such as "Microsoft Anna" or "Mike"
|http://msdn.microsoft.com/en-us/library/ms720151(v=vs.85).aspx#API_for_Text-To-Speech http://msdn.microsoft.com/en-us/library/ms723601(v=vs.85).aspx
|-
|Mac Carbon
|similar to the Cocoa API
|https://developer.apple.com/library/mac/documentation/Carbon/Reference/Speech_Synthesis_Manager/Reference/reference.html#//apple_ref/doc/uid/TP30000211
|-
|Mac Cocoa NSSpeechSynthesizer
|id, name, age, gender, language, locale (can be enumerated)
|https://developer.apple.com/library/mac/documentation/Cocoa/Reference/ApplicationKit/Classes/NSSpeechSynthesizer_Class/Reference/Reference.html
|-
|Linux speech-dispatcher
|name, language (2-letter), variant, voice type enum with Male1..3, Female1..3, childMale and childFemale
|http://cvs.freebsoft.org/doc/speechd/speech-dispatcher.html
|-
|espeak
|language (two-letter [_region-subregion]), age(?), gender, name (= long language name)
|espeak --voices
|-
|festival
|name, gender, maybe more
|http://www.cstr.ed.ac.uk/projects/festival/
|-
|Android
|has a concept of engines; setLanguage() is locale-based; isLanguageAvailable() exists, but there is no language listing
|http://developer.android.com/reference/android/speech/tts/TextToSpeech.html
|}
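
Judging from the table above, name, language, gender and age are the properties that most backends expose in some form. As a starting point for discussion, the sketch below shows one way a common voice description could look in plain C++, together with a mapping from the speech-dispatcher style voice types listed in the table. Voice, Gender, Age, SpdVoiceType, fromSpeechDispatcher and findVoice are all hypothetical names; SpdVoiceType only mirrors the table and is not the real speech-dispatcher header.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Gender { Unknown, Male, Female };
enum class Age { Unknown, Child, Adult };

// Hypothetical common voice description shared by all backends.
struct Voice {
    std::string name;     // e.g. "Microsoft Anna"
    std::string language; // e.g. "en"
    Gender gender;
    Age age;
};

// Mirrors the voice types named in the table (assumption, not the real API).
enum class SpdVoiceType {
    Male1, Male2, Male3, Female1, Female2, Female3, ChildMale, ChildFemale
};

// Maps a speech-dispatcher style voice type onto the common properties.
Voice fromSpeechDispatcher(const std::string &name, const std::string &language,
                           SpdVoiceType type) {
    Voice voice{name, language, Gender::Unknown, Age::Unknown};
    switch (type) {
    case SpdVoiceType::Male1:
    case SpdVoiceType::Male2:
    case SpdVoiceType::Male3:
        voice.gender = Gender::Male;
        voice.age = Age::Adult;
        break;
    case SpdVoiceType::Female1:
    case SpdVoiceType::Female2:
    case SpdVoiceType::Female3:
        voice.gender = Gender::Female;
        voice.age = Age::Adult;
        break;
    case SpdVoiceType::ChildMale:
        voice.gender = Gender::Male;
        voice.age = Age::Child;
        break;
    case SpdVoiceType::ChildFemale:
        voice.gender = Gender::Female;
        voice.age = Age::Child;
        break;
    }
    return voice;
}

// Returns the index of the first voice with the given language and, if a
// gender is requested, that gender. Returns -1 when nothing matches.
int findVoice(const std::vector<Voice> &voices, const std::string &language,
              Gender gender = Gender::Unknown) {
    for (std::size_t i = 0; i < voices.size(); ++i) {
        const bool genderOk = gender == Gender::Unknown || voices[i].gender == gender;
        if (voices[i].language == language && genderOk)
            return static_cast<int>(i);
    }
    return -1;
}
```

Backends that cannot report a property (for example Android, which per the table has no language listing) would leave it as Unknown or empty in this scheme.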


=== CSS and/or XML in strings to be spoken ===
 
http://www.w3.org/TR/2011/WD-css3-speech-20110419/

Latest revision as of 20:46, 31 August 2015
