AUTOMATIC BIOMETRIC IDENTIFICATION BASED ON FACE RECOGNITION AND SUPPORT VECTOR MACHINES
The present invention relates in general to automatic biometric identification based on face recognition and support vector machines.
Biometric identification may be performed via an automated system capable of capturing a biometric sample or evidence from a user, extracting biometric data from the sample, comparing the biometric data with that contained in one or more reference templates, deciding how well they match, and indicating whether or not an authentication of identity or identification has been achieved.
Biometric identification based on face recognition is particularly useful for security applications and human-machine interfaces. Support vector machines (SVMs) are a class of learning algorithms for classification/regression that are particularly useful for high-dimensional input data with either large or small training sets. Support vector machines suitable for identification problems work by mapping the input features into a high-dimensional feature space and computing linear functions on the mapped features in that space.
SVMs are generally trained through supervised learning, in which the best function that relates the output data to the input data is computed, and the goodness of this function is judged by its ability to generalize on new inputs, i.e., inputs which are not present in the training set. For a detailed description of learning methods for SVMs, reference may be made to
Currently, several methods are known that propose the use of SVMs, alone or in combination with other recognition techniques, for face recognition and/or detection.
For example,
Another SVM-based face recognition system is proposed in
A further analysis of the use of SVMs in the context of face recognition is disclosed in
Yet,
Additionally, in
The Applicant has noted that, in the field of biometric authentication based on facial recognition with m-class SVMs (which classify data into more than two classes), a problem exists: for each authorized user, a very large number of samples of the user's face is required for training the SVMs so as to achieve a good level of recognition, i.e., a low error rate. As a result, the enrollment process for each authorized user (i.e., the process of collecting biometric samples from a user and subsequently computing and storing a biometric reference template representing the user's identity) can take a large amount of time and computational resources.
Generally, two approaches can be used for training m-class SVMs: the one-versus-all approach and the pair-wise approach.
Specifically, in the one-versus-all approach, m SVMs are trained, each SVM separating a single class from all the remaining classes. As such, for each user in the authorized clients' database there exists an SVM that discriminates that user from every other user in the database.
In the pair-wise approach, m(m-1)/2 SVMs are trained, each separating a pair of classes. The SVMs are disposed in trees, where each tree node represents an SVM. In
Both solutions are supervised learning procedures that need both positive and negative training examples, i.e., samples of the face of the user to be recognized and samples of faces of people other than that user. The limitation of these solutions is that reliable recognition (i.e., a low error rate) requires an enormous number of negative examples. In the best case in terms of computational speed, in the one-versus-all approach, the number of negative examples has to be at least equal to the number of entries in the database minus one, all multiplied by a constant (for example, the number of possible head poses). Likewise, the second approach may become computationally very slow as the users' database grows. Of course, the algorithms' performance depends on the available computational power, but generally these approaches may not scale well, with an enrollment process that may take several days (reference may, for example, be made to
The object of the present invention is therefore to provide an automatic biometric identification method and system based on face recognition and support vector machines, which mitigate the afore-mentioned problems.
This object is achieved by the present invention in that it relates to an automatic biometric identification method and system based on face recognition and support vector machines, as claimed in claims 1 and 28, respectively, and to a computer program product, as claimed in claim 29.
In broad outline, the Applicant has found that the afore-mentioned problems can be solved by exploiting a one-class SVM (OC-SVM) for recognizing the face of an authorized user. One of the main advantages of the use of an OC-SVM lies in the fact that, for the training of the OC-SVM, only positive examples of the user are to be used, while the recognition of the authorized user is based only on the trained OC-SVM. In this way, a very fast and significantly less resource consuming face recognition procedure can be performed, maintaining a high level of recognition.
For a better understanding of the present invention, a preferred embodiment, which is intended purely by way of example and is not to be construed as limiting, will now be described with reference to the attached drawings, wherein:
The following discussion is presented to enable a person skilled in the art to make and use the invention. Various modifications to the embodiments will be readily apparent to those skilled in the art, and the generic principles herein may be applied to other embodiments and applications without departing from the scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein and defined in the attached claims.
In order to facilitate understanding of the present invention, introduced hereinafter is some mathematical notation relating to SVMs.
SVMs were firstly developed by
Briefly, SVMs belong to the category of maximum-margin classifiers, and they naturally perform binary classification (i.e., they have two output classes), by finding, in the feature space of the SVM, a decision hypersurface (usually a hyperplane) that splits the positive examples from the negative examples, the split being such as to have the largest distances from the hypersurface to the nearest of the positive and negative examples, generally making the classification correct for testing data that is near, but not identical to the training data.
Focusing on classification, SVMs receive as input an independent and identically distributed (i.i.d.) training sample S = {(x1, y1), (x2, y2), ..., (xn, yn)} of size n from a fixed but unknown distribution Pr(x, y) describing the learning task, wherein the xi are vectors representing the input data to be classified (the observations), while the yi, typically in the set {-1, +1}, are the class labels.
In their basic form, SVMs learn binary, linear decision rules in the form:
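A standard way of writing such a rule, using the weight vector w and threshold b described next, is sketched here. It is the textbook form of a linear SVM decision function and is given as an assumption about the notation, not as a quotation of the original formula.

```latex
h(\mathbf{x}) = \operatorname{sign}\left(\mathbf{w}\cdot\mathbf{x} + b\right) =
\begin{cases}
+1, & \text{if } \mathbf{w}\cdot\mathbf{x} + b > 0 \\
-1, & \text{otherwise}
\end{cases}
```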
The decision function, also known as hypothesis, is described by a weight vector w and a threshold b. According to which side of the hypersurface the input vector x lies on, it is classified into class +1 or -1. The idea of structural risk minimization is to find a hypothesis h for which the lowest error probability can be guaranteed. With SVMs, Vapnik showed that this goal can be translated into finding the hypersurface with largest margin for separable data. In other words, for separable training sets, SVMs find the hypersurface h, which separates the positive and negative training examples, marked with "+" and "-", respectively, in
Computing the hypersurface is equivalent to solving the following quadratic optimization problem in the Lagrangian representation (for more details reference may be made to
Support vectors are those training vectors xi corresponding to positive Lagrangian coefficients αi > 0. From the solution of this optimization problem the decision rule can be computed as:
The training example (xtsv, ytsv) for calculating b must be a support vector with αtsv < C.
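For orientation, the soft-margin optimization problem and the resulting decision rule are commonly written as below. This is the usual textbook (dual) formulation, assumed here to match the symbols used in the text (Lagrangian coefficients αi, constant C, support vector (xtsv, ytsv)); it is not reproduced from the original formulas.

```latex
% Dual quadratic program (assumed standard soft-margin form)
\min_{\alpha}\; W(\alpha) = -\sum_{i=1}^{n}\alpha_i
  + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j \,(\mathbf{x}_i\cdot\mathbf{x}_j)
\quad \text{subject to} \quad
\sum_{i=1}^{n} y_i \alpha_i = 0, \qquad 0 \le \alpha_i \le C,\; i = 1,\dots,n

% Decision rule recovered from the solution
h(\mathbf{x}) = \operatorname{sign}\left(\sum_{i:\,\alpha_i > 0} \alpha_i y_i \,(\mathbf{x}_i\cdot\mathbf{x}) + b\right),
\qquad
b = y_{tsv} - \sum_{i:\,\alpha_i > 0} \alpha_i y_i \,(\mathbf{x}_i\cdot\mathbf{x}_{tsv})
```

Only inner products between observation vectors appear in both expressions, which is the property the kernel substitution described next relies on.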
For both solving the quadratic optimization problem as well as applying the learned decision rule, it is sufficient to be able to calculate inner products between observation vectors. Exploiting this property, the use of kernel functions, denoted by K(x1, x2), was introduced for learning non-linear decision rules. Such kernel functions calculate an inner product in some high-dimensional feature space and replace the inner product in the formulas above.
Popular kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid:
Therefore, depending on the type of the kernel function, SVMs can be linear classifiers, polynomial classifiers, radial basis function (RBF) classifiers, or two-layer sigmoid neural networks.
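The four kernel families named above can be illustrated with a minimal pure-Python sketch. The parameter names (`degree`, `coef0`, `scale`, `sigma`) and the particular parameterization of the RBF kernel, exp(-||x - y||^2 / (2 sigma^2)), are assumptions chosen for illustration; the exact forms used in the original formulas are not shown in the text.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x, y) = x . y
    return dot(x, y)


def polynomial_kernel(x, y, degree=3, coef0=1.0):
    # K(x, y) = (x . y + coef0) ** degree  (assumed parameterization)
    return (dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))  (assumed parameterization)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(x, y, scale=1.0, coef0=0.0):
    # K(x, y) = tanh(scale * x . y + coef0)  (assumed parameterization)
    return math.tanh(scale * dot(x, y) + coef0)
```

Any of these functions can stand in for the inner product in the decision rule, which is what turns the linear classifier into a polynomial, RBF, or sigmoid one.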
In OC-SVMs the support vectors characterizing the user's class are obtained only from positive training examples. In particular, such support vectors define a hypersphere that encloses all possible representations of the user. All observations (input vectors) lying outside this hypersphere are considered impostors' representations.
The problem the OC-SVM has to solve is the estimation of a model function h(x, w) which gives a closed boundary around the target class data (reference may for example be made to
In particular, in
Basically, the OC-SVM minimizes the structural error defined as:
To allow the possibility of outliers in the training set, and therefore to make the method more robust, the squared distance from objects xi to the center a is not required to be strictly smaller than R², but larger distances are penalized. This means that the empirical error is not zero, so slack variables ξi ≥ 0, ∀i, are introduced and the minimization problem becomes:
Parameter C gives the tradeoff between the volume of the description and the errors. By introducing Lagrange multipliers and constructing the Lagrangian, the minimization of this error is a well-known quadratic programming problem, for which standard algorithms exist (reference may again be made to the above-referenced One-class classification).
By solving this problem the support vectors are obtained (which practically represent the user reference template) together with the following expression for the center a of the hypersphere:
As such, new objects are accepted by the description if the distance from the objects to the center a of the hypersphere is lower than or equal to the radius R.
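The hypersphere description outlined above is usually written as follows. The formulas are the standard support vector data description from the one-class classification literature, assumed here to correspond to the structural error, the slack-variable problem, and the center expression referred to in the text.

```latex
% Structural error without outliers
F(R, \mathbf{a}) = R^2
\quad \text{subject to} \quad \lVert \mathbf{x}_i - \mathbf{a} \rVert^2 \le R^2, \;\forall i

% With slack variables allowing outliers
F(R, \mathbf{a}, \boldsymbol{\xi}) = R^2 + C \sum_i \xi_i
\quad \text{subject to} \quad
\lVert \mathbf{x}_i - \mathbf{a} \rVert^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0, \;\forall i

% Center of the hypersphere from the Lagrange multipliers
\mathbf{a} = \sum_i \alpha_i \mathbf{x}_i, \qquad 0 \le \alpha_i \le C, \qquad \sum_i \alpha_i = 1

% Acceptance test for a new object z
\lVert \mathbf{z} - \mathbf{a} \rVert^2
= (\mathbf{z}\cdot\mathbf{z}) - 2\sum_i \alpha_i (\mathbf{z}\cdot\mathbf{x}_i)
+ \sum_i \sum_j \alpha_i \alpha_j (\mathbf{x}_i\cdot\mathbf{x}_j) \le R^2
```

Only the objects with αi > 0 (the support vectors) contribute to the sums, so they are all that has to be stored in the user reference template.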
Difficulties with one-class classification are related to the training set construction, the SVM input data representation, and the SVM parameters tuning.
These aspects are closely interrelated and are important for good classification. The Applicant therefore proposes a particular training set construction method, several optimized working representations of the input vectors, and a method for automatically configuring the SVM parameters.
Specifically, the present invention relates to a biometric authentication system based on face recognition and comprising two sub-systems: an enrollment or training sub-system responsible for OC-SVM training, and a verification or authentication sub-system responsible for identity verification. Each sub-system comprises several modules, some of which are in common between the two sub-systems and are used during both enrollment and verification.
The enrollment sub-system, designated as a whole by 1, comprises:
In particular, the biometric sample image acquisition module 2 supports multiple inputs, such as a live captured video, a saved video, or multiple images of the user's biometric sample (either live or saved images). The live video or images of the user's face can be captured by any video camera, such as a common webcam, a digital PDA, a cellular phone camera, etc. Any software that controls the interface with the video camera and the video acquisition can be used, e.g., the Intel Open Source Image Processing and Computer Vision library OpenCV, OpenCV Reference Manual (downloadable at http://www.sourceforge.net/projects/opencvlibrary at the filing date of the present patent application).
The face detection and extraction module 4 performs face detection on each acquired video frame. This phase is also necessary for rejecting inappropriate frames, i.e. frames that do not contain a face.
The algorithm used for face detection implements the machine learning approach for visual object detection described in
The face detection algorithm is structured in three fundamental image processing steps:
The output of the face detection and extraction module 4 is the image or image frames of variable size containing the user's face with little background.
The method is further optimized by means of two procedures described in
Selection of critical visual features performed in the second step of the face detection algorithm includes eye and mouth detection on the selected face images. In order to perform this task different techniques can be applied. For example, a template matching technique can be used, based on masks (one for each element to detect) sliding on the overall face image. The implemented technique converts the original face image from the RGB color space to the YCrCb color space. From the YCrCb image, two maps are computed: a map for the chrominance component and another for the luminance component. These components are then combined using an AND function. Eventually, the histogram is computed on the resulting image, the two peaks of this histogram representing the estimated eyes position. Subsequently, a mouth map is computed for mouth detection. The procedure is analogous to the described one, only the sliding mask and the map are different. The resulting histogram peak represents the estimated mouth position.
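The color-space conversion that starts this step can be sketched as below. The coefficients are the commonly used full-range RGB to YCrCb conversion for 8-bit images; the text does not state which variant the implemented technique uses, so they are an assumption, and the function names are hypothetical. The subsequent chrominance and luminance maps are not reproduced here because their definitions are not given in the text.

```python
def _clamp(value, low=0.0, high=255.0):
    """Keep a channel value inside the 8-bit range."""
    return max(low, min(high, value))


def rgb_to_ycrcb(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cr, Cb).

    Assumed coefficients: the widely used full-range conversion with a
    128 offset on the chrominance channels.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128.0
    cb = (b - y) * 0.564 + 128.0
    return _clamp(y), _clamp(cr), _clamp(cb)


def image_to_ycrcb(image):
    """Convert an image given as rows of (r, g, b) tuples."""
    return [[rgb_to_ycrcb(*pixel) for pixel in row] for row in image]
```

A neutral gray pixel maps to chrominance values of about 128, while a saturated red pixel saturates Cr, which is the kind of contrast a chrominance map can exploit.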
The images selection module 5 performs an image selection during both enrollment and verification procedures. In particular, based on the output of the face detection and extraction module 4 (all the images that contain a face), the appropriate number of face images is uniformly extracted from the input video sequence, so that no two similar images are chosen for training. The number of images effectively used for enrollment is the size of the OC-SVM training set and is a configurable parameter of the system. An appropriate value for this parameter may be obtained through a rigorous test and tuning phase of the proposed face recognition software. During authentication, all images or image frames that contain a face are selected for identity verification.
The image scaling and normalization module 6 performs a number of operations in order to allow the proposed face recognition method to work with features extracted from black and white images. In particular, the image scaling and normalization module 6 performs the following operations:
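The uniform extraction described above might look like the following sketch. The function name `select_uniform` and the choice of taking the frame at the middle of each equal-length segment are assumptions; evenly spaced sampling is used here only as a simple proxy for avoiding near-duplicate consecutive frames.

```python
def select_uniform(frames, k):
    """Pick k frames spread evenly over the sequence.

    The sequence is split into k equal-length segments and the frame at
    the middle of each segment is returned. If fewer than k frames are
    available, all of them are returned.
    """
    n = len(frames)
    if k <= 0:
        return []
    if k >= n:
        return list(frames)
    step = n / k  # segment length, strictly greater than 1 here
    return [frames[int(i * step + step / 2)] for i in range(k)]
```

For example, with 100 face frames and a training set size of 5, the frames at positions 10, 30, 50, 70, and 90 are selected.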
The feature extraction module 7 is responsible for the construction of the training set. The proposed face recognition method can work with different sets of features extracted from normalized images. Even if the use of four types of such feature sets will be described hereinafter, other sets can also be used, since the invention provides an automatic procedure for computing the necessary SVM parameters for each new type of features set and, for increased recognition performance, for each user. Fundamentally, the value of the parameter sigma of the RBF kernel function used by the OC-SVM is estimated during the enrollment session, taking into consideration the actual feature set used and becomes a part of the user's reference template, together with the dimensionality of the feature vectors, the radius of the hypersphere and other kernel parameters.
In the following, four possible approaches for feature extraction are illustrated.
A first approach is the Fourier-Mellin transform (FMT), which produces a translation, rotation and scale invariant Fourier-Mellin feature set. This transform is also used in image recognition for image registration, and hence compensation of possible translations, rotations and scale changes. In the present invention, for the FMT features set, the Fourier-Mellin transform is applied to the gray-level images.
A Cartesian to log-polar conversion (block 13) and another Fourier transform (block 14) are implemented, and finally the feature vector is obtained by concatenating the coefficients of the resulting Fourier-Mellin spectrum (block 15).
Another approach for feature extraction is the bidimensional Fourier transform of the gray-level facial images, which produces a Fourier feature set.
The feature vector is formed by the most significant frequencies, i.e. the low frequencies. The Applicant has experimentally determined that the lowest 27 up to 30 frequencies of the Fourier spectrum contain from 82% up to 90% of the energy of a facial image, hence the most information. The low frequencies also contain the distinguishing information (these are also the frequencies that vary the most from one user to another).
In a possible implementation of the present invention, the feature vector could contain the concatenation of the continuous component, the real part of the coefficients of the lowest 27 frequencies of the spectrum, and the imaginary part of the coefficients of the lowest 27 frequencies of the spectrum, thus forming a feature vector of 55 real values.
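A minimal sketch of such a 55-value Fourier feature vector is given below. It uses a naive discrete Fourier transform so that it stays self-contained, and it orders the "lowest" frequencies by u + v starting from the corner of the spectrum; that ordering, and the function names, are assumptions made for illustration, since the text does not specify how the 27 frequencies are enumerated.

```python
import cmath
import math


def lowest_frequencies(count):
    """Enumerate `count` non-DC frequency pairs (u, v) by increasing u + v."""
    freqs = []
    order = 1
    while len(freqs) < count:
        for u in range(order + 1):
            freqs.append((u, order - u))
            if len(freqs) == count:
                break
        order += 1
    return freqs


def dft_coefficient(image, u, v):
    """Single 2-D DFT coefficient of a gray-level image (list of rows)."""
    h, w = len(image), len(image[0])
    total = 0j
    for y in range(h):
        for x in range(w):
            angle = -2.0 * math.pi * (u * x / w + v * y / h)
            total += image[y][x] * cmath.exp(1j * angle)
    return total


def fourier_features(image, count=27):
    """Continuous (DC) component, then real parts, then imaginary parts."""
    dc = dft_coefficient(image, 0, 0).real
    coeffs = [dft_coefficient(image, u, v) for u, v in lowest_frequencies(count)]
    return [dc] + [c.real for c in coeffs] + [c.imag for c in coeffs]
```

With `count=27` the result has 1 + 27 + 27 = 55 real values, matching the dimensionality stated above. A production implementation would compute the spectrum once with a fast Fourier transform rather than coefficient by coefficient.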
Alternative settings are possible, but further tests conducted by the Applicant have shown that increasing the size of the lower quadrant beyond this does not significantly increase the amount of useful information, while the discrimination capability of the classifier decreases.
Other methods can be used for features extraction (like Hu moments, Zernike moments approaches described in
A further approach for feature extraction is represented by the use of local binary pattern (LBP) histograms extracted from gray-level facial images.
The
This particular feature extraction algorithm requires a slightly different OC-SVM training. In fact, each region contributes its own feature vector. Hence, if 64 face regions are defined, 64 feature vectors are obtained, and an OC-SVM is trained for each region. During authentication, each region produces a matching percentage, which is weighted by the weight assigned to the region, and the final score is the weighted sum of the matching percentages obtained by the feature vectors, each computed with the appropriate OC-SVM.
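As an illustration of how one region could be turned into a feature vector, the sketch below computes the basic 8-neighbour local binary pattern code of each interior pixel and accumulates a 256-bin histogram. This is the generic LBP operator; the neighbourhood, bit ordering, and histogram size actually used are not given in the text, so all of them are assumptions, as are the function names.

```python
# Clockwise 8-neighbourhood offsets (dy, dx), starting at the top-left pixel.
_NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, y, x):
    """Basic LBP code of pixel (y, x): one bit per neighbour >= center."""
    center = image[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(_NEIGHBOURS):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(region):
    """256-bin histogram of LBP codes over the interior pixels of a region."""
    hist = [0] * 256
    for y in range(1, len(region) - 1):
        for x in range(1, len(region[0]) - 1):
            hist[lbp_code(region, y, x)] += 1
    return hist
```

Calling `lbp_histogram` once per face region yields one feature vector per region, each of which would feed its own OC-SVM.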
A further approach for feature extraction is to use directly the pixel intensity values from the normalized intensity images (gray-level features). The images are scaled down to a fixed size by applying bilinear interpolation (e.g., 40x40 pixels if the original face image size is around 128x128). The resulting images are transformed into feature vectors (by concatenating the rows of the sampled image matrices), which are subsequently used for OC-SVM training (user enrollment) or testing (user authentication).
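The gray-level feature extraction can be sketched as follows. The pixel-center sampling convention in `resize_bilinear` is one common choice and is an assumption, as are the function names; the text only states that bilinear interpolation is used and that rows are concatenated.

```python
def resize_bilinear(image, new_h, new_w):
    """Resize a gray-level image (list of rows) with bilinear interpolation."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(new_h):
        # Map the output pixel center back into source coordinates.
        sy = min(max((i + 0.5) * h / new_h - 0.5, 0.0), h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for j in range(new_w):
            sx = min(max((j + 0.5) * w / new_w - 0.5, 0.0), w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def to_feature_vector(image):
    """Concatenate the rows of an image into a single flat vector."""
    return [pixel for row in image for pixel in row]
```

Scaling a face image to 40x40 and flattening it in this way gives a feature vector of 1600 intensity values.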
With reference to
The mathematics behind the OC-SVM is fundamentally similar to the previously described one (the paradigms are the Structural Risk Minimization and representation of the problem in a high dimensional features space through the use of an appropriate kernel function). Briefly, the OC-SVM computes or learns a function h which defines a hypersphere which encloses the positive examples/observations (representing the target class), while all other observations are not necessary for the complete definition of the hypersphere.
Therefore, the OC-SVM variables that need to be set are the kernel function, the value of the constant C, the training set size, and the support vectors' size. The output of the OC-SVM consists in the value of the parameter sigma, the number of support vectors, the support vectors themselves, the weights (or coefficients, also known as Lagrangian multipliers) of each support vector, and the threshold distance to be used during the authentication phase, which threshold distance is practically the radius of the hypersphere which encloses all positive examples.
The Applicant has found that the use of an RBF kernel function is particularly advantageous for face recognition based on OC-SVM because it outperforms both sigmoid and polynomial kernel functions. The OC-SVM with an RBF kernel represents a Gaussian radial basis function classifier (reference may for example be made to
Additionally, the Applicant has found that the value of the variance parameter σ (sigma) of the kernel function may advantageously be set equal to the average Euclidean distance between the training vectors (or training feature sets) representing the user's class, i.e.:
Other settings are possible, but, with the Gaussian kernel, this is the configuration that gives a higher recognition performance, since it represents a good tradeoff between the false acceptance and false rejection error rates.
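The sigma setting described above, the average Euclidean distance between the user's training vectors, can be computed as in this sketch. Averaging over all distinct pairs is an assumption about the normalization (the original formula is not shown), and the function name is hypothetical.

```python
import math
from itertools import combinations


def average_pairwise_distance(vectors):
    """Mean Euclidean distance over all distinct pairs of training vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("at least two training vectors are required")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)
```

The value is computed once per user during enrollment, from that user's feature vectors only, and would then be stored in the reference template alongside the support vectors.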
The value of the constant C is instead determined empirically for each type of feature vector and represents a tradeoff between the recognition and the error rates. Practically, C and σ (the kernel variance parameter) define the size and the shape of the hypersphere that encloses the user's class. With lower σ, the region describing the user's class is tighter around the examples (it is called a banana-shaped region), but the false rejection rate can increase, while with a higher σ
During testing, the Applicant has also noted that in real cases, the OC-SVM of the present invention is very strict/severe (the user class description is tighter than the spherically shaped region), and hence a need exists to adjust the radius R of the hypersphere on a per user basis to accommodate for minor changes in user's face images (pose, illumination), so as to avoid false rejections.
For this purpose, a new user training set containing new user's face images is provided, and a new value for the radius, hereinafter referred to as acceptance threshold thr, is computed according to the following formula, thus practically performing a client test of the computed OC-SVM:
Therefore, with OC-SVM, a client test is sufficient to find the threshold value. An impostor test is also performed, but only as a confirmation of the correctness of the threshold setting. Hence, in a practical implementation, the acceptance threshold optimization could be performed during the enrollment session. The face image for the impostor test is fixed per gender, since for all impostors the OC-SVM exhibits approximately the same score. This tuning process is automated and helps to set the working point of the authentication system (i.e., the tradeoff between the false acceptance rate, FAR, and the false rejection rate, FRR). This procedure also helps to determine the quality of the user's reference template and the classifier's discrimination capacity using this template. Feedback is offered to the user and, if the recognition scores are unsatisfactory, the enrollment procedure is repeated.
For the LBP feature sets, the training procedure is slightly different: several OC-SVMs are trained per user, each one corresponding to a single region of the image. The actual training procedure is analogous to the procedure described previously, but several templates are produced, one template for each set of equally weighted regions. These templates are stored together with the corresponding weights, and they represent the user's global reference template.
Finally,
In particular, the verification sub-system, designated as a whole by 20, includes:
In particular, after the appropriate feature vector extraction phase described in detail previously, a classification procedure based on the OC-SVM trained for the user is applied to each feature vector extracted from a test image in a test set, and, in the end, the authentication decision is taken by implementing a decision fusion scheme. In the simplest case, the fusion rule is majority voting.
In more detail, the verification consists in computing the distance of each feature vector in the test set with respect to the center a of the hypersphere obtained during enrollment, dividing this distance by the radius R, and comparing this value with the user's acceptance threshold thr. A feature vector is accepted as representing the user if the normalized distance is lower than the threshold, i.e.:
In a simple majority voting scheme, the frequency of the positive classifications is measured and the user is declared authenticated if this frequency is greater than 0.5, otherwise the user is declared an impostor. Other fusion rules can be implemented if required.
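The per-vector test and the majority vote can be sketched together as follows. The template layout (a dictionary with `support_vectors`, `alphas`, `sigma`, `radius`, and `thr`) is hypothetical, the RBF parameterization is an assumption, and the distance to the center is computed in kernel space using the standard expansion for a center expressed as a weighted sum of support vectors.

```python
import math


def rbf(x, y, sigma):
    # Assumed parameterization: exp(-||x - y||^2 / (2 * sigma^2)).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def distance_to_center(z, template):
    """Kernel-space distance from z to the hypersphere center a."""
    svs, alphas, sigma = (template["support_vectors"],
                          template["alphas"], template["sigma"])
    cross = sum(a * rbf(z, sv, sigma) for a, sv in zip(alphas, svs))
    const = sum(ai * aj * rbf(si, sj, sigma)
                for ai, si in zip(alphas, svs)
                for aj, sj in zip(alphas, svs))
    # K(z, z) equals 1 for the RBF kernel.
    d2 = rbf(z, z, sigma) - 2.0 * cross + const
    return math.sqrt(max(d2, 0.0))


def accept_vector(z, template):
    """Accept z if its distance, normalized by R, is below thr."""
    return distance_to_center(z, template) / template["radius"] < template["thr"]


def authenticate(test_vectors, template):
    """Majority voting: authenticated if more than half the vectors pass."""
    votes = [accept_vector(z, template) for z in test_vectors]
    return sum(votes) / len(votes) > 0.5
```

Because only the user's own template is evaluated, the cost of `authenticate` depends on the number of test vectors and support vectors, not on how many users are enrolled.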
For the LBP feature set, each region of the user's face obtains a distance, which is matched against the acceptance threshold of that region's SVM. Each SVM provides the percentage of correct answers. The final authentication decision is taken by applying a weighted sum to these percentages, where the regions' weights multiply the percentages obtained by the regions (the eye region and the mouth regions are assigned the largest weights).
The advantages of the present invention are clear from the foregoing. In particular, not only the enrollment but also the verification procedure is significantly simplified by using an OC-SVM, since only one template is used for user authentication. Consequently, the procedure is faster and its duration does not depend on the size of the users' database. For increased reliability of the authentication decision, at least 50 images of the user's face selected from the input video sequence (and hence 50 feature vectors) may be used, which is also the size of the training set. Using all the images of the test video yields similar performance, but the authentication procedure becomes slower.
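The weighted fusion of the per-region results can be illustrated with the short sketch below. Dividing by the total weight, so that the score stays in the same range as the percentages, is an added assumption (the text only speaks of a weighted sum), and the function name is hypothetical.

```python
def weighted_region_score(percentages, weights):
    """Weighted combination of per-region matching percentages.

    Normalizing by the sum of the weights is an assumption made so the
    result stays in the same range as the inputs.
    """
    if len(percentages) != len(weights):
        raise ValueError("one weight per region is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(percentages, weights)) / total
```

For instance, if an eye region with weight 3 matches in every frame and a cheek region with weight 1 never matches, the fused score is 0.75, reflecting the larger influence of the heavily weighted region.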
In conclusion, since only the user's class (or the target class) representations are necessary for enrollment, the present invention is significantly less resource consuming than traditional SVM-based approaches. As a consequence, it is faster and highly portable on systems with limited computational power, like embedded systems (e.g., handheld devices, cellular phones, smart phones, etc.). Moreover, the present invention is designed with a modular approach, where a number of common modules are implemented (biometric sample acquisition module, the image processing module, the enrollment or training module, the verification module). The modularity of the solution permits a high degree of distribution of the computational tasks, a fundamental feature for client/server architectures.