A method of creating 3-D facial models starting from face images
The method allows the creation of 3-D facial models, which can be used, for instance, for avatar implementation, video-communication applications, video games, video productions, and for the creation of advanced man-machine interfaces. At least one image of a human face is provided together with a 3D facial model (M) having a vertex structure and comprising a number of surfaces chosen within the set formed by a face surface (V), surfaces of the right eye (OD) and left eye (OS), respectively, and surfaces of the upper teeth (DS) and lower teeth (DI), respectively. Among the vertices of the structure of the model (M) and on such at least one face image, respective sets of homologous points are chosen. The model structure (M) is then modified in such a way that the above respective sets of homologous points are made to coincide.
This invention concerns the technique for the creation of 3-D facial models, which can be used for instance for the implementation of so-called avatars (anthropomorphous models) to be used in virtual environments, video-communication applications, video games, TV productions, and creation of advanced man-machine interfaces.
There are already some known technical solutions for the creation of a 3D model starting from the photograph of a person's face.
On this subject matter, reference can be made for instance to the product Character Creator of company Darwin 3D (see Internet site http://www.darwin3d.com) as well as to the product Avatar Maker of company Sven Technologies (see Internet site http://www.sven-tec.com). The product "Character Creator" is based on the choice of a basic model resembling the photographed person. The face of the photograph is framed by an ellipse and the program uses what lies within the ellipse as a texture of the model. In the product "Avatar Maker" about a dozen points are marked on the face, and a basic model is then chosen to which the photograph texture is associated.
The main drawback of such known embodiments is that the structure of the generated model does not allow a subsequent animation. This is due to the fact that the model (usually generated as a "wire frame" model, i.e. starting from a mesh structure, as will also be seen below) cannot exactly fit the profile in the mouth region, thus preventing reproduction of lip movements. This also applies to other significant parts of the face, such as eyes and nose.
This invention aims at providing a method which allows the creation of facial models that can appear realistic both in static conditions and in animation conditions, in particular for instance as far as the opening and closing of eyelids and the possibility of simulating eye rotation are concerned.
According to the invention, this aim is attained through a method having the characteristics specifically mentioned in the appended claims.
Substantially the method according to the invention is based on the adaptation of a basic model of a face - typically a human face - having the physiognomy characteristics of the photographed person. The basic model (or "template") is represented by a structure, preferably of the type called "wire frame", formed by a plurality of surfaces chosen out of a set of five surfaces, namely: a face surface, surfaces of the right eye and left eye, and surfaces of the upper teeth and lower teeth.
The eye surfaces are separated from those of the face so as to allow, among other things, creation of opening and closing movements of eyelids, and a slight translation simulating the actual eye rotation. Similarly, it is possible to perform the animation of the model, as far as the speech is concerned, through the animation of the surfaces representing the upper and lower teeth.
The invention will be now described by way of a non-limiting example, with reference to the drawings attached hereto, in which:
Figures 1 and 2 show a basic model M of human face, which can be used in a possible embodiment of the invention. Model M is here represented both in the wire frame mode and in the solid mode. The latter differs from the wire frame essentially by the background painting of the triangles of the wire frame. The model M here represented is formed by five surfaces, namely: a face surface V, surfaces OD and OS of the right eye and left eye, respectively, and surfaces DS and DI of the upper teeth and lower teeth, respectively.
It will be appreciated in particular that model M is a hollow structure, which may practically be assimilated to a sort of mask, the shape of which is designed to reproduce the features of the modelled face. Of course, though corresponding to an embodiment of the invention being preferred at present, the number of vertices and triangles to which reference has been previously made has a merely exemplary character and must in no case be regarded as a limitation of the scope of the invention.
These considerations also apply to the choice of using five different surfaces to implement the basic model. As a matter of fact, the number of such surfaces might be smaller (for the implementation of simpler models) or larger (for the implementation of more detailed and sophisticated models), depending on the application requirements. The important feature is the choice of using, as the basic model, a model comprising a plurality of surfaces and in particular surfaces that, depending on the type of face to be modelled (for instance a human face), correspond to shapes which are substantially known in general terms and have a relative arrangement, which as a whole, also is already known.
As a matter of fact, although the variety of human faces is practically infinite, it is known that the surface of the face has a general bowl-like look, that the eyelids have generally just an "eyelid" surface, which is at least marginally convex, that the dental arches have an arc shape, etc. It is then known that the eyelids are located in the medium-upper region of the face surface, whereas the teeth surfaces are located in the lower region.
Furthermore, the fact of using distinct surfaces for the creation of the model allows applying to the model separation conditions, as those which make it possible to avoid, for instance, the interference of the teeth surfaces, so as to accurately model the congruency effect of the dental arches.
This characteristic might be even better appreciated in the rear views of figure 2.
The method according to the invention is substantially based on the solution of:
For this adaptation, use is made of respective sets of points which have been chosen in correspondence with as many so called "feature points": such points are defined in the section "Face and body animation" of the ISO/IEC standard 14496-2 (MPEG-4) and are represented in figures 3A to 3H.
In particular, in an embodiment of the invention being preferred at present, the method according to the invention is implemented by using the feature points identified in the MPEG-4 standard (as defined at the filing date of this invention) by the following indexes: 11.4, 2.1, 10.9, 10.10, 8.4, 8.1, 8.3, 8.2, 2.2, 2.3, 9.3, 9.2, 9.1, 4.1, 3.12, 3.8, 3.10, 3.14, 3.11, 3.13, 3.7, and 3.9. Each of such indexes corresponds with a vertex of the model structure.
Figure 4 summarises the method according to the invention, as it can be performed through the system shown in figure 8.
Such a system, denoted by 1 as a whole, includes a pick-up unit 2, for instance a digital camera or a functionally equivalent unit, such as a conventional camera capable of producing photographs which, after development and print, may be subjected to a scanning process. Starting from a subject, unit 2 can therefore generate a plane image I of the face to be modelled: this image is in practice an image of the type shown in figure 7A.
The image I so obtained is in the form of a digitised image, i.e. of a sequence of data that represent pixel by pixel the information (brightness, chromatic characteristics, etc.) relating to the same image.
Such a sequence of data is provided to a processing system 3 (essentially a computer) which performs - according to principles well known to a specialist, once the criteria of the embodiment of the invention described in detail in the following have been set forth - the operations listed below:
The operation of adaptation of the starting model M, previously described, to image I is based on a virtual optical projection of model M and image I, respectively, performed in a system the focus of which lies in the origin O of a three-dimensional Cartesian space x, y, z in which model M is placed in the positive half space along the Z axis and image I is placed in the negative half-space (see the diagram of Figure 4).
It will be appreciated that the fine adaptation of model M to image I is based on the assumption that model M is on the whole oriented, with regard to the plane XY of the above-described system, in a generally mirror-like position with regard to image I. Hence, model M is placed with a front orientation, if one requires adaptation to a front image I. On the contrary model M will be for instance laterally oriented, if it is required to achieve adaptation to a side image of the head of the person represented in image I.
This also substantially applies to the distance α between origin O and the centre of model M and distance λ between origin O and the plane of image I. To simplify the calibration process and avoid the introduction of unknown values by the user, at least distance α is set to an arbitrary value (for instance 170 cm), determined in advance by calculating the average of a set of possible cases. It must be still considered that value α depends not only on the distance of the subject from camera 2 at the time when image I was taken, but also on the parameters of the same camera.
Substantially, the method according to the invention consists of a series of geometrical transformations aimed at making the projection of the set of feature points of the model M of interest coincide with the homologous set of points identified on image I.
Let then (xi.j, yi.j, zi.j) be the space co-ordinates of the vertex of model M associated to feature point i.j (for instance, the left end of the face) and (Xi.j, Yi.j) be the co-ordinates in image I of the same feature point (referred to a local system on the plane of image I, with the origin coinciding with the upper corner of the image, in a possible embodiment).
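The virtual optical projection described above can be illustrated by a minimal sketch, assuming an ideal pinhole model with the focus in origin O and the image plane at z = -λ. The function name `project` and the tuple layout are assumptions introduced for illustration only and do not appear in the description.

```python
def project(vertex, lam):
    """Project a model vertex through origin O onto the plane z = -lam.

    Assumption: ideal pinhole projection; the vertex lies in the
    half-space z > 0, as stated for model M.
    """
    x, y, z = vertex
    if z <= 0:
        raise ValueError("model vertices are expected in the half-space z > 0")
    scale = -lam / z  # negative sign: the projection is mirrored through O
    return (scale * x, scale * y)


# A vertex 170 units in front of O, projected on a plane 85 units behind O.
print(project((10.0, 20.0, 170.0), 85.0))
```

The negative scale factor reflects the generally mirror-like arrangement of model M and image I with regard to plane XY.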
After starting the process (step 100 in the flow chart of Figure 9), the first operational step (101 in Figure 9) is the computation of value λ.
Let X0 , Y0 be the co-ordinates of the centre of the face taken in image I. These co-ordinates are obtained by exploiting the four points placed at the end of the face (for instance, with reference to the present release of MPEG-4 standard, points 10.9 and 10.10: right end and left end of the face, and 11.4, 2.1: top of head and tip of chin). The following relation will then apply:
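The relation itself is not reproduced in this text. One natural reading, given here only as an assumption, is that the centre is the midpoint of the two lateral points horizontally and of the top-of-head and chin points vertically. The function name `face_centre` and the dictionary layout are hypothetical.

```python
def face_centre(points):
    """Return (X0, Y0) from the four extreme feature points of image I.

    `points` maps MPEG-4 feature-point indexes (as strings) to (X, Y)
    image co-ordinates. Assumption: the centre is taken as a midpoint.
    """
    x0 = (points["10.9"][0] + points["10.10"][0]) / 2.0
    y0 = (points["11.4"][1] + points["2.1"][1]) / 2.0
    return x0, y0


marked = {
    "10.9": (100, 200),   # right end of the face
    "10.10": (300, 210),  # left end of the face
    "11.4": (205, 50),    # top of head
    "2.1": (195, 350),    # tip of chin
}
print(face_centre(marked))
```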
Distance λ is computed in such a way as to make the width of the projection of the model coincide with the width of the face in the photograph, according to the following relation:
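The relation is missing from this text. Under a pinhole assumption, and assuming that the two lateral vertices of the model lie at depth α, matching the projected width to the face width in the photograph gives a simple proportion. The sketch below is one possible form, not necessarily the relation of the original; `focal_distance` and its parameters are hypothetical names.

```python
def focal_distance(alpha, model_right, model_left, image_right, image_left):
    """Estimate lambda so that the projected model width equals the image width.

    Assumptions: pinhole projection, both lateral vertices at depth alpha.
    model_* are (x, y, z) vertices, image_* are (X, Y) points.
    """
    model_width = abs(model_left[0] - model_right[0])
    image_width = abs(image_left[0] - image_right[0])
    if model_width == 0:
        raise ValueError("the lateral model vertices must not coincide")
    return alpha * image_width / model_width


lam = focal_distance(170.0, (-8.0, 0.0, 170.0), (8.0, 0.0, 170.0),
                     (100, 200), (300, 210))
print(lam)
```

Since α is in the units of the model and the image width is in pixels, λ obtained in this way is expressed in pixel-scaled units.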
Subsequently (step 102) the position of model M along the Y axis is modified so that its projection is vertically in register with the contents of image I. A value Δy, computed according to relation:
In this way the model is scaled vertically. After this operation, the size of its projection coincides with the area of the head reproduced in image I.
In a subsequent step 103, each co-ordinate Y of the vertices of model M is multiplied by a coefficient c computed as follows:
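The relations for Δy and c are not reproduced in this text. The sketch below is therefore only an illustration of the idea behind steps 102 and 103: it solves jointly for an offset and a coefficient so that, after adding the offset and multiplying by the coefficient, the top of the head and the tip of the chin of the model reach two target heights. The target heights are assumed to be already expressed in model units; `vertical_fit` and its arguments are hypothetical.

```python
def vertical_fit(y_top, y_chin, target_top, target_chin):
    """Return (delta_y, c) such that c * (y + delta_y) maps
    y_top -> target_top and y_chin -> target_chin.

    Assumption: this joint solution stands in for the two missing relations.
    """
    if y_top == y_chin or target_top == target_chin:
        raise ValueError("degenerate vertical extent")
    c = (target_top - target_chin) / (y_top - y_chin)
    delta_y = target_top / c - y_top
    return delta_y, c


def apply_vertical_fit(vertices, delta_y, c):
    """Shift and scale the Y co-ordinate of every vertex (x, y, z)."""
    return [(x, c * (y + delta_y), z) for x, y, z in vertices]


delta_y, c = vertical_fit(10.0, -10.0, 14.0, -10.0)
fitted = apply_vertical_fit([(0.0, 10.0, 170.0), (0.0, -10.0, 170.0)], delta_y, c)
print(delta_y, c, fitted)
```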
At this point (step 104) a global transformation is performed in the vertical direction on the model in order to make the position of some characteristic features of the face (for instance, the eyebrows) coincide with those of the person. The model is substantially altered along the Y axis, as shown in Figure 5.
Preferably, the global transformation is a non-linear transformation, preferably of second order, and most preferably it is based on a parabolic law, in particular of the type corresponding to a generic parabola (
In particular in Figure 5, the model shown in a recumbent position, so in a horizontal direction, corresponds to the model before the transformation according to the parabolic function previously described, whereas the model shown in a vertical position is the result of said transformation.
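A second-order transformation of this kind can be sketched as a parabola passing through three control pairs (old height, new height), for instance chin, eyebrows and top of head. The choice of control points and of the variables involved is an assumption, since the formula is truncated in this text; `parabola_through` and `warp_vertically` are hypothetical names.

```python
def parabola_through(p0, p1, p2):
    """Coefficients (a, b, c) of y' = a*y**2 + b*y + c through three (y, y') pairs.

    Uses Lagrange interpolation; the three abscissas must be distinct.
    """
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    d0 = (x0 - x1) * (x0 - x2)
    d1 = (x1 - x0) * (x1 - x2)
    d2 = (x2 - x0) * (x2 - x1)
    if d0 == 0 or d1 == 0 or d2 == 0:
        raise ValueError("control heights must be distinct")
    a = y0 / d0 + y1 / d1 + y2 / d2
    b = -(y0 * (x1 + x2) / d0 + y1 * (x0 + x2) / d1 + y2 * (x0 + x1) / d2)
    c = y0 * x1 * x2 / d0 + y1 * x0 * x2 / d1 + y2 * x0 * x1 / d2
    return a, b, c


def warp_vertically(vertices, coeffs):
    """Apply the parabolic law to the Y co-ordinate of every vertex (x, y, z)."""
    a, b, c = coeffs
    return [(x, a * y * y + b * y + c, z) for x, y, z in vertices]


# Chin and top of head stay in place; the eyebrow height moves from 4 to 5.
coeffs = parabola_through((-10.0, -10.0), (4.0, 5.0), (10.0, 10.0))
warped = warp_vertically([(0.0, -10.0, 170.0), (0.0, 4.0, 170.0), (0.0, 10.0, 170.0)], coeffs)
print(warped)
```

With such a law the extremes of the face remain fixed while the intermediate features are displaced smoothly.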
Thereafter (step 105, with an essentially cyclic structure, defined by a choice step 106, that finds out whether the sequence can be considered as complete) a series of transformations (translations, scalings and affine transforms) designed to correctly position the individual features characteristic of the face is performed. Preferably the operations involved are the following:
Preferably the adopted affine transforms correspond to a transform that may be set out according to a relation of the type:
The described formulas express a planar transformation driven by the displacement of three points:
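A planar affine transform has six coefficients, which are fully determined by the displacement of three non-collinear points. The sketch below assumes the usual form x' = c1·x + c2·y + c3, y' = c4·x + c5·y + c6 and solves for the coefficients with Cramer's rule; the relation of the original is not reproduced in this text, and all function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 linear system m * u = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("the three source points are collinear")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def affine_from_points(src, dst):
    """Return (c1..c6) mapping the three src points onto the three dst points."""
    m = [[float(x), float(y), 1.0] for x, y in src]
    c1, c2, c3 = solve3(m, [p[0] for p in dst])
    c4, c5, c6 = solve3(m, [p[1] for p in dst])
    return c1, c2, c3, c4, c5, c6


def apply_affine(coeffs, point):
    """Apply x' = c1*x + c2*y + c3, y' = c4*x + c5*y + c6 to one point."""
    c1, c2, c3, c4, c5, c6 = coeffs
    x, y = point
    return (c1 * x + c2 * y + c3, c4 * x + c5 * y + c6)


coeffs = affine_from_points([(0, 0), (1, 0), (0, 1)], [(1, 2), (3, 2), (1, 5)])
print(apply_affine(coeffs, (1, 1)))
```

In use, the three source points would be projected feature points of the model and the three destination points their homologous points on image I; every vertex of the region concerned is then moved with `apply_affine`.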
As the last operations concerning the geometry of the model, two wire frames representing the eyes (sclera and iris) are positioned behind the eyelids, so as to allow their closing and to leave sufficient room for a displacement simulating the movements of the eyes (step 107). Standard teeth which do not interfere with the movements of the mouth (108) are then added to the model.
The sequence shown in Figures 6A-6C represents the evolution of model M (here represented according to the wire frame mode, to better highlight the variations) with reference to the front appearance of the basic model (Figure 6A), after the affine transforms (Figure 6B) and after completion with eyes and teeth (Figure 6C).
At this point the application of the texture to the model is performed (step 109) by associating to each vertex a bi-dimensional co-ordinate that binds it to a specific point of image I, according to a process known as "texture binding". The data relating to the texture binding are computed by simply exploiting projection parameters α and λ, defined during the calibration described earlier in this description. Teeth have a standard texture, defined in advance.
In the case in which the model is created starting from several images, a further step is performed concerning the generation of the texture. Such step however is not specifically represented in the flow chart of Figure 9. As a matter of fact, the image containing the model texture is created by joining the information associated to the various viewpoints.
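Texture binding can be sketched as the projection of each vertex onto image I followed by a normalisation to the image size. The sketch assumes a pinhole projection, a face centre (X0, Y0) lying on the optical axis, an image Y axis growing downward from the upper corner, and a mirror sign absorbed into these axis conventions; none of these details is stated in the description, and `texture_coordinate` is a hypothetical name.

```python
def texture_coordinate(vertex, lam, centre, image_size):
    """Return the normalised (u, v) texture co-ordinate of a model vertex.

    Assumptions: pinhole projection with parameter lam, vertex depth z
    already including distance alpha, image origin in the upper corner.
    """
    x, y, z = vertex
    width, height = image_size
    px = lam * x / z            # horizontal offset from the face centre, in pixels
    py = lam * y / z            # vertical offset from the face centre, in pixels
    big_x = centre[0] + px
    big_y = centre[1] - py      # image Y grows downward (assumption)
    return (big_x / width, big_y / height)


print(texture_coordinate((8.0, 0.0, 170.0), 2125.0, (200.0, 200.0), (400, 400)))
```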
Preferably, in order to better exploit the resolution of the image designed to contain the texture, the shape of the texture of all the triangles of the model is transformed into a right triangle of a constant size. The triangles so obtained are then coupled two by two in order to obtain a rectangular shape. The rectangles are then placed into the image according to a matrix arrangement so as to cover its surface. The size of the rectangles is a function of the number of triangles of the model and of the size of the image that stores the texture of the model.
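The matrix arrangement can be illustrated by a minimal sketch that pairs triangles into rectangles and places the rectangles on a regular grid. The rule used here to choose the number of columns (the ceiling of the square root of the number of rectangles) is an assumption, as are the names `atlas_layout` and `triangle_slot`.

```python
import math


def atlas_layout(num_triangles, image_width, image_height):
    """Return (cols, rows, rect_w, rect_h) for a grid of triangle-pair rectangles.

    Assumption: a near-square grid; the description only states that the
    rectangle size depends on the triangle count and on the image size.
    """
    if num_triangles <= 0:
        raise ValueError("at least one triangle is required")
    num_rects = (num_triangles + 1) // 2          # two triangles per rectangle
    cols = math.ceil(math.sqrt(num_rects))
    rows = math.ceil(num_rects / cols)
    return cols, rows, image_width // cols, image_height // rows


def triangle_slot(index, layout):
    """Return ((x, y) pixel origin of the rectangle, half) for a triangle index.

    `half` is 0 or 1 and tells which of the two right triangles is used.
    """
    cols, _rows, rect_w, rect_h = layout
    rect = index // 2
    row, col = divmod(rect, cols)
    return (col * rect_w, row * rect_h), index % 2


layout = atlas_layout(10, 256, 256)
print(layout, triangle_slot(7, layout))
```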
Figure 10 shows an example of image containing the texture of the various triangles. Each rectangle (the polygons shown are not squares, and are formed by
Figure 11 illustrates a detail of the previous Figure 10, showing the actual area of the texture used by two triangles inside the rectangle 300. For each rectangle of size
It is worth noting that this process for texture generation is not specific for the models of human face, but can be applied in all the cases of creation of a 3-D model starting from several images.
The model obtained in this way may then be represented by using different common graphic formats (among which, in addition to the MPEG-4 standard previously cited, the standards VRML 2.0 and OpenInventor). All the models can be animated so as to reproduce lip movements and facial expressions. In the case in which several images of the person, taken from different viewpoints, are available, it is possible to apply the method described to the different images so as to enhance the look of the model. The resulting model is obviously oriented according to the orientation of the image.
It is evident that, while keeping unchanged the invention principles set forth herein, the details of implementation and the embodiments can be varied considerably with regard to what has been described and illustrated, without departing from the scope of this invention, as will be defined in the following claims.