A cognizant and adaptive method of informing a multi-modal navigation interface of a user's intent. This provides the user with the experience of exploring an immersive representation of the processed multimedia (audio-video-data) sources available that automatically adapts to her/his fruition preference. These results are obtained by first reconciling and aligning the User's and the Device's frames of reference in tri-dimensional space and then dynamically and adaptively Smoothly Switching and/or combining the Gesture, Motion and Speech modalities. The direct consequence is a user experience that naturally adapts to the user's choice of interaction and movement.
We claim:
This application is related to and claims priority from U.S. Provisional Patent Application No. 61/815,753 filed Apr. 25, 2013. Application 61/815,753 is hereby incorporated by reference in its entirety.
1. Field of the Invention
The present invention relates generally to navigation on interactive handheld devices and more particularly to tools that implement a user's 3D navigation experience capable of displaying an interactive rendition of 2D and/or 3D audio-visual data accessed by the device locally and/or streamed via remote systems.
2. Description of the Prior Art
Gestural interfaces have become increasingly present in the market during the last few years. Consumer electronics manufacturers such as Nintendo, Apple, Nokia, LG, and Microsoft have all released products that are controlled using interactive gestures. Many of them use the motion of the human body, or that of handheld controllers, to drive users' interaction with videogames, television menu control and the like.
Most current videogame interfaces on mobile devices like smartphones and tablets already use touch gestures to allow players to execute movements in space or choose actions or commands that are then reflected on-screen. Other categories of hardware devices in the videogames market incorporate gesture-driven interfaces, such as game consoles like the Microsoft Xbox™ 360, which uses specific hardware (Kinect) capable of reading user body motion and/or posture or tracking gesture sequences through a sophisticated implementation of image-recognition techniques and augmented (3D-depth) camera acquisition. Newer devices, like the Leap Motion Controller, generalize some of the motion-tracking paradigm while bringing it out of the videogames domain and into everyday desktop applications (apps).
Panoramic Imagery Navigation applications, like Google Street View, have incorporated both paradigms of motion and gesture—used alternatively to explore geo-referenced street panoramas.
Speech commands are commonly used in standard applications such as Siri™ (for Apple devices), or Loquendo™ (for PC programs), or the Microsoft Inquisit™ speech recognition engine. When the user speaks a command, the speech recognition system is activated, detecting the phonetics and performing the required action. These speech recognition systems usually must be trained to the user's voice.
It would be extremely advantageous to have a system that could take full advantage of the gestural and motion capabilities available to a device using onboard sensors, in concert with buttons displayed on the device screen, to provide complete navigational capabilities.
The present invention creates a cognizant and adaptive method of informing a multi-modal navigation interface of a user's intent. This provides the user with the experience of exploring an immersive representation of the processed multimedia (audio-video-data) sources available that automatically adapts to her/his fruition preference. Such sources may include any combination of:
A specific example is given here demonstrating the navigation capabilities of the present invention; the example was developed for the Apple iPad™ device. The concepts presented here can easily be transferred to other environments. Such environments may include various types of mobile devices such as smartphones and tablets as well as other similar cases of gestural, motion and speech sensing enabled hardware.
Possible embodiments include the navigation of tri-dimensional virtual worlds like computer graphic simulations or videogames through the combined use of gesture, motion and speech modes on a mobile device. This provides the user with a navigation experience that is cognizant of the device Positioning, Heading and Attitude (orientation relative to a frame of reference: Cartesian, spherical etc.).
These results are obtained by first reconciling and aligning the User's and the Device's frames of reference in tri-dimensional space and then dynamically and adaptively Smoothly Switching and/or combining the Gesture, Motion and Speech modalities. The direct consequence is a user experience that naturally adapts to the user's choice of interaction and movement.
As an example, using a tablet (like the Apple iPad™ or similar products on the market today) to experience a videogame, the present invention allows the player to alternatively or collaboratively apply Gesture, Speech and Motion modalities. While sitting on a sofa, for instance, the player may prefer to delegate all navigation actions to the Gesture Interface (for example a touch screen interaction in case of the Apple iPad™) but a sudden “call to action” in the flow of the game, or an audible stimulus (3D audio localization performed by the application) can create the need of an abrupt change of point of view and/or position in the virtual world. This can more efficiently be achieved by letting the player's natural movement (as detected by the Motion Interface) intervene in the simulation, and either collaborate with the Gesture Interface or, in some cases, take control of the navigation system altogether.
Several drawings are presented to illustrate features of the present invention:
Drawings and illustrations have been presented to aid in understanding the present invention. The scope of the present invention is not limited to what is shown in the figures.
The present invention relates to a system and method for combining and smoothly switching between gestural, motion and speech control of a handheld device.
The desired level of interaction described in the present invention is obtained by means of an advanced gesture interface system that performs the following tasks:
The elements being described here can be performed on audio-video-data sources obtained via the methods known in the art. Such sources might be available offline to be pre-processed and/or can be streamed and interpreted in real-time by the server and/or the client.
“Data Sources” are sources of 2D or 3D audio-video.
“World” is a multi-dimensional representation of audio-video-data sources that can manifest as:
“User” is a single or multiple entity, human or computer, locally or remotely interacting with the Device.
“Virtual User” is a single or multiple representation of the User in the World.
“Device” is a single or multiple handheld hardware device capable of one or any combination of: displaying, receiving, recording and processing multimedia sources as well as receiving direct or remote input from User or local or remote device and/or computer system.
“Device Vector” is the vector defined by the device Heading, Position and Attitude.
“Device Window” is a single or multiple instance of the device Viewing Frustum as determined by virtual (CG World) or real (Real World) camera lens data and by the Device detection and programming of its own (or attached) hardware camera parameters.
“Gesture Interface” This is the category of interactions performed by the user/s gestures (touch-motion-expression) with the hardware available on board and/or attached to the device. Examples include: touch screens, gesture detection or user/s body motion via additional devices like Leap Motion etc. or face expression detection via on-board camera or additional devices.
“Motion Interface” This is the category of interactions, performed by the user/s while moving in free space and holding the device, detected by the hardware available on-board and/or attached/connected to the device. Examples include motion tracking via: accelerometers, gyroscopes, magnetometers, GPS, camera tracking-image processing and other similar sensors. This is to determine, for example, the device heading, position and attitude while the user freely moves it up, down, left, right, back, forward and/or rotates it around its axes.
“Speech Interface” This is the category of interactions performed by the user/s via spoken language with the hardware available on board the device and/or attached/connected to the device along with speech recognition software.
The three modalities: Gesture, Motion and Speech Interfaces may act collaboratively permitting a Smooth Switching among their possible combination and variations.
The relative and absolute localization data about the device is determined by the vector (Device Vector) defined by its Heading, Position and Attitude information provided by the device's onboard and/or attached sensors (calculated from the raw sensor inputs).
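As an illustrative sketch (not part of the claimed method), the Device Vector can be assembled from raw accelerometer and magnetometer samples; the `DeviceVector` type and `device_vector_from_sensors` helper below are hypothetical names:

```python
import math
from dataclasses import dataclass

@dataclass
class DeviceVector:
    heading: float    # degrees from magnetic north
    position: tuple   # coordinates in the chosen frame of reference
    attitude: tuple   # (pitch, roll) in degrees

def device_vector_from_sensors(accel, mag, position):
    """Derive a coarse Device Vector from raw sensor samples: the gravity
    vector gives pitch/roll, the magnetic field gives heading. A real
    device would also fuse gyroscope data and tilt-compensate heading."""
    ax, ay, az = accel
    pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    roll = math.degrees(math.atan2(ay, az))
    mx, my, _ = mag
    heading = math.degrees(math.atan2(my, mx)) % 360.0
    return DeviceVector(heading, position, (pitch, roll))
```

On an actual handheld device these raw inputs would come from the platform's motion APIs rather than being passed in directly.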
Gesture, Motion and Speech Interfaces utilize user input as detected and classified in two principal Navigation Classes: Static and Displaced.
Static
The Static Navigation Class represents all the types of motions of the Device Vector that do not significantly alter its Position parameters (depending on frame of reference). Examples may include: look around in all directions, tilt up or down (all without lateral displacement).
Displaced
The Displaced Navigation Class represents all the types of motions of the Device Vector that significantly alter its Position parameters (depending on frame of reference). Examples may include: moving forward, back, left, right, up, down.
Users can switch from a pure Gestural use of the interface by performing relevant Motions or Speech (Static or Displaced—as rendered available by the system). Relevant Motions are motions of the Device Vector possibly determined by respective user movements (captured by the Device sensors) that exceed a programmed and user changeable threshold.
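A minimal sketch of that threshold test, assuming attitude samples expressed as (heading, pitch, roll) triples and a hypothetical default threshold value:

```python
DEFAULT_THRESHOLD_DEG = 2.5  # illustrative default; the method makes this user-changeable

def is_relevant_motion(prev_attitude, curr_attitude,
                       threshold=DEFAULT_THRESHOLD_DEG):
    """Treat a change of the Device Vector as a Relevant Motion only when
    at least one axis moves more than the threshold, so that hand jitter
    cannot wrest control away from the Gesture or Speech Interfaces."""
    return any(abs(c - p) > threshold
               for p, c in zip(prev_attitude, curr_attitude))
```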
In a possible embodiment, the user can explore a given virtual world with the combined use of Gestures, Motion and Speech Interfaces of which we now give examples of as follows:
Gesture Interface
Gesture Interface (left or right fingers)—Look Around (Static Class)
Up-Down-Left-Right
Gesture Interface (left or right fingers)—Move (Displaced Class)
Walk-Run-Jump-Fly-Duck|Back-Forth-Left-Right
A typical example may be through the use of virtual buttons on a touch screen or a joystick (virtual or real). The user manipulates the buttons with the fingers to change the view toward the left or right, or up or down, in the Static class. In the Displaced class, the user's finger manipulations cause the scene being viewed to displace its viewing location. The user, while actually at a fixed location, causes the scene to change so that he or she has the impression of walking, running, flying or the like.
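The mapping from on-screen controls to the two Navigation Classes can be sketched as a lookup table; the control and action names below are purely illustrative:

```python
# Hypothetical virtual-button/joystick identifiers mapped to navigation actions.
STATIC_ACTIONS = {
    "btn_up": "look_up", "btn_down": "look_down",
    "btn_left": "look_left", "btn_right": "look_right",
}
DISPLACED_ACTIONS = {
    "stick_forward": "walk_forward", "stick_back": "walk_back",
    "stick_left": "strafe_left", "stick_right": "strafe_right",
}

def classify_touch(control):
    """Return which Navigation Class (and action) a touched control maps to,
    or None if the control is not a navigation control."""
    if control in STATIC_ACTIONS:
        return ("static", STATIC_ACTIONS[control])
    if control in DISPLACED_ACTIONS:
        return ("displaced", DISPLACED_ACTIONS[control])
    return None
```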
Motion Interface
User Moves the Device in all Directions (Static Class)
Pivot around position in 360 degrees
Here the user actually moves the device in physical space (usually by holding the device and changing its orientation). The viewed scene follows the motions of the user. Hence, if the user was holding the device pointing horizontally, and then lifts it upward over his head, the scene can go from showing what is in front horizontally to what is above. In the Static class, the user does not change his coordinates in 3D space.
User Moves her/his Body in all Directions (Displaced Class)
Walk-Run-Jump-Fly-Duck|Back, Forth, Left, Right
Rotates around her/himself while standing and/or walking/running or the like. Here, the user changes his coordinates in physical space by walking, running or the like.
Speech Interface
Spoken Command—“Look Around” (Static Class)
“Look Up”-“Down”-“Left”-“Right”-“Look At”
Spoken Command—“Move” (Displaced Class)
“Walk”-“Run”-“Jump”-“Fly”-“Duck”|“Back”-“Forth”-“Left”-“Right”
“GoTo” spoken directions like:
“Take me to”:
The speech interface can contain a large number of recognizable spoken commands that can be used in a way similar to the gestural commands, except spoken, to move the scene in either the Static or Displaced class.
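Once the recognition engine returns an utterance, it could be routed to the two classes with a simple vocabulary lookup. The vocabulary below mirrors the examples above; placing the bare "left"/"right" words in the Displaced set is a disambiguation assumption, since the text lists them under both classes:

```python
STATIC_COMMANDS = {"look around", "look up", "look down",
                   "look left", "look right", "look at"}
DISPLACED_COMMANDS = {"walk", "run", "jump", "fly", "duck",
                      "back", "forth", "left", "right",
                      "go to", "take me to"}

def classify_speech(utterance):
    """Map a recognized utterance to its Navigation Class, or None if
    the phrase is not in the command vocabulary."""
    phrase = utterance.lower().strip()
    if phrase in STATIC_COMMANDS:
        return "static"
    if phrase in DISPLACED_COMMANDS:
        return "displaced"
    return None
```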
Prevalence
To maintain order when the user is allowed to make any combination of gestural, motion or speech commands (or subsets of these), it is necessary to give the various interfaces priority values. These will be called prevalence. Prevalence is used to decide what to do when commands are received simultaneously on more than one of the interfaces. The preferred embodiment of the present invention uses the following prevalence:
This is one possible prevalence assignment. Other combinations are possible. Any choice of prevalence is within the scope of the present invention.
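One way to realize a prevalence assignment is a simple ordered arbitration. The Gesture > Motion > Speech order below is an assumption for illustration, since the text leaves the concrete ranking open:

```python
# Highest-priority interface first; this particular order is assumed.
PREVALENCE = ("gesture", "motion", "speech")

def resolve_simultaneous(commands):
    """Given commands that arrive simultaneously, keyed by interface
    name, keep only the one from the highest-prevalence interface."""
    for interface in PREVALENCE:
        if interface in commands:
            return (interface, commands[interface])
    return None
```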
This Provides the User with Modal Variations Like (Shown in
M0—A calibration mode where no gestures, motions or speech commands are accepted.
M1—(Gestural Exclusive—All actions related to Navigation Classes are performed via Gestural Interface)
The user typically uses finger manipulations to change what is displayed as shown in
M2—(Motion Exclusive—All actions are performed via user physical motions while holding the device)
Here the user totally controls the device by physically moving it as shown in
M3—(Speech Exclusive—All actions are performed via spoken language)
Here static and displaced class commands are given exclusively by voice as shown in
M4—(Static gestural, static motion—commands may be entered both by gestures and by static motion).
Here gestures can be used to control walking, running and the like with static motions used to determine the direction of view as shown in
M5—(Static gestural, displaced motion—commands may be entered by both gesture and motion).
Here running, walking and the like are controlled by actually moving the device, while the direction of view is determined by gestures as shown in
M6—(Displaced motion, static speech—commands may be given by moving the device and speech).
Here, running, walking and the like are controlled by moving the device, while the direction of view is determined by speech command as shown in
M7—(Static speech, displaced gestural—commands may be given both by gestures and speech).
Here gestures are used to control running, walking and the like, while speech commands determine the direction of view as shown in
M8—(Static gestural, displaced speech—commands may be given by speech and by gestures).
Here, speech commands are used to determine running, walking and the like while gestures determine the direction of view as shown in
M9—(Static motion, displaced speech—commands may be given by motion and speech).
Here, speech commands control running, walking and the like, while motions control the direction of view as shown in
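The modal variations above can be summarized as a lookup keyed by which interface currently drives each Navigation Class. The sketch below encodes the single-driver cases only (M4, which pairs two static drivers, would need a richer key and is omitted):

```python
def navigation_mode(static_if, displaced_if):
    """Map the interface driving the Static class and the interface
    driving the Displaced class to a modal-variation label; returns
    None for combinations not listed in the description."""
    table = {
        (None, None): "M0",
        ("gesture", "gesture"): "M1",
        ("motion", "motion"): "M2",
        ("speech", "speech"): "M3",
        ("gesture", "motion"): "M5",
        ("speech", "motion"): "M6",
        ("speech", "gesture"): "M7",
        ("gesture", "speech"): "M8",
        ("motion", "speech"): "M9",
    }
    return table.get((static_if, displaced_if))
```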
As previously stated, when multiple interfaces are used for control, order is maintained through the use of prevalence. Here are some examples of prevalence:
Mode M4—Static gestural, static motion.
The user uses the Gesture Interface to alter her/his position through the World while, at the same time, uses the Motion Interface (pointing the device towards the desired direction) to determine her/his orientation in 3D space.
Mode M5—Static gestural, displaced motion.
The user uses the Motion Interface to alter her/his position through the World while, at the same time, uses the Gesture Interface (for example using a touch screen interaction) to determine her/his orientation in 3D space.
M8—Static gestural, displaced speech.
The user uses the Speech Interface to alter her/his position through the World while, at the same time, uses the Gesture Interface (for example using a touch screen interaction) to determine her/his orientation in 3D space.
The present invention provides a Smooth and Adaptive Automatic Switching among these modalities. The method provides the user with a reasonably seamless transition between situations where the interaction changes (in one of the two navigation classes [Static-Displaced]) from Gesture or Speech to Motion. All the while, the system monitors changes in user interaction and the device's relative position and attitude and provides a real-time dynamic adaptation to the world's (relative or absolute) coordinates, returning (smoothly switching) the Static or Displaced Class control to the Gesture, Speech and Motion Interfaces respectively.
The following steps are used to achieve the smooth and adaptive automatic switching as shown in
The purpose of this process is to perform a first alignment of Device and World's Coordinates and respective Frames of Reference in the following scenarios:
According to
According to
Sample pseudo code has been supplied to illustrate a possible embodiment of the present invention on a device that allows gestures, contains motion sensors, and can receive and process speech commands.
Embodiments, however, may be embodied in many different forms and should not be construed as being limited to the embodiment set forth herein. Rather, this preferred embodiment is provided so that this disclosure will be thorough and complete, and will fully convey the scope to those skilled in the art. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. It will be understood that although the terms first, second, third, etc., may be used herein to describe various elements, components, Classes or methods, these elements, components, Classes or methods should not be limited by these terms. These terms are only used to distinguish one element, component, Class or method from another element, component, Class or method.
Several descriptions and illustrations have been provided to aid in understanding the present invention. One with skill in the art will understand that numerous changes and variations may be made without departing from the spirit of the invention. Each of these changes and variations is within the scope of the present invention.
Note:
The present example follows the preferred embodiment and applies the Prevalence method as described above in the Prevalence paragraph. This does not exclude further embodiments that use a different Prevalence assignment.
//Process World
init WorldView;
init DeviceViewFrustum;
(Virtual World View is established [locally or remotely] and displayed on device)
//Process User Interaction
NavigationState=M0
init GestureInterface;
init MotionInterface;
init SpeechInterface;
init StaticNavigationClass;
init DisplacedNavigationClass;
init DetectionThreshold; //(Sensor threshold of intervention)
init StaticNavigationState=NULL;
init DisplacedNavigationState=NULL;
init GestureNavigationState=NULL;
init MotionNavigationState=NULL;
init SpeechNavigationState=NULL;
//Sensors on the device are continuously queried. Each of the possible inputs is associated with a set of actions that correspond to the Navigation Classes as explained above
get NavigationState
detect UserInteraction
//Detect User Interaction
The STATIC and DISPLACED navigation classes are queried simultaneously to check if each of the Gesture, Motion and Speech Interfaces (in order of priority) is using any of the available actions present in the two classes (see the examples above).
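That simultaneous query could look like the following sketch, where each interface object exposes a hypothetical `poll(navigation_class)` method returning the action it is currently driving (or `None`):

```python
PRIORITY = ("gesture", "motion", "speech")  # order of priority per the text

def detect_user_interaction(interfaces):
    """Query the Static and Displaced classes across all interfaces and
    record, per class, the first (highest-priority) interface with an
    active action. `interfaces` maps interface name -> interface object."""
    state = {}
    for nav_class in ("static", "displaced"):
        for name in PRIORITY:
            action = interfaces[name].poll(nav_class)  # hypothetical API
            if action is not None:
                state[nav_class] = (name, action)
                break
    return state
```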
//Update User Interaction State
a. Static Class->Interface used (Gesture-Motion-Speech).
b. Displaced Class->Interface used (Gesture-Motion-Speech).
c. Calculation of new device vector.
Set NavigationState=M(0-9)
if NavigationState≠M0 then
NewNavigationState=TRUE
UpdateDeviceWindow
//Switch to Different Modalities
In the current example a Gesture Prevalence method is described; as a consequence, Smooth Switching between the different modalities is explained considering such prevalence.
When a change is present and executed in the user fruition of the Classes and Interfaces (for instance going from a “gesture only” to one of the possible “gesture and motion” modalities), such change is detected and recorded in the updated user interaction.
When an action, previously performed in either the Static or the Displaced Classes using either the Gesture or Speech Interfaces instructions, is subsequently dynamically performed using the Motion interface, a new real-time query of the Device Position and Attitude continuously updates the Device Vector and a transition trajectory path is calculated from the last non-MotionInterface coordinates to the currently updated Device Vector.
To allow the method to give the user the sensation that it is “smoothly following” the change in the use of the interface, the process, when required, can perform a programmed animation of the transition trajectory from the point of switch (Gesture or Speech to Motion executed action) to its immediate destination in real-time (considering the device and eventual connection performance) finally adapting the device vector and window to the new request from the user.
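The transition-trajectory animation described above can be sketched as a simple interpolation of the Device Vector; linear interpolation and the step count are assumptions (a production system might slerp orientations instead):

```python
def transition_trajectory(start, end, steps=30):
    """Yield intermediate Device Vector samples from the last
    non-Motion-Interface coordinates to the freshly queried Device
    Vector, so the switch to Motion control plays as a short animation
    rather than a jump cut."""
    for i in range(1, steps + 1):
        t = i / steps
        yield tuple(s + (e - s) * t for s, e in zip(start, end))
```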
get NavigationState
if NewNavigationState=TRUE
get STATIC and DISPLACED Interfaces States
if in STATIC or DISPLACED Classes, Speech and/or Motion Interface based actions=NULL
if in STATIC or DISPLACED Classes, Gesture and/or Motion Interface based actions=NULL
if in STATIC or DISPLACED Classes, Gesture and/or Speech Interface based actions=NULL
UpdateUserInteractionState