No. Patent Title Application No. Filing Date Publication No. Publication Date Inventors
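1 Internet access behavior analysis method, system, and medium based on latent semantic indexing CN202011571954.X 2020-12-27 CN112686050B 2023-12-05 张强; 喻波; 王志海; 魏力; 谢福进
The invention provides an internet access behavior analysis method, system, and medium based on latent semantic indexing. The method comprises: step S1, determining a latent semantic feature matrix of a user's internet access behavior from the user's historical access logs; step S2, using the latent semantic feature matrix to compute a behavior chain of the user's internet access behavior; and step S3, constructing an LSTM deep neural network model from the behavior chain to detect the user's abnormal internet access behavior. The method can effectively analyze access log data, compute correlations among URL features, profile the user's internet access behavior, construct chains of abnormal access behavior, and use machine learning to mine latent features in depth and identify abnormal behavior in the data. It can also be iterated, optimized, and improved continuously, enabling timely emergency response and handling.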
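2 Internet access behavior analysis method, system, and medium based on latent semantic indexing CN202011571954.X 2020-12-27 CN112686050A 2021-04-20 张强; 喻波; 王志海; 魏力; 谢福进
The invention provides an internet access behavior analysis method, system, and medium based on latent semantic indexing. The method comprises: step S1, determining a latent semantic feature matrix of a user's internet access behavior from the user's historical access logs; step S2, using the latent semantic feature matrix to compute a behavior chain of the user's internet access behavior; and step S3, constructing an LSTM deep neural network model from the behavior chain to detect the user's abnormal internet access behavior. The method can effectively analyze access log data, compute correlations among URL features, profile the user's internet access behavior, construct chains of abnormal access behavior, and use machine learning to mine latent features in depth and identify abnormal behavior in the data. It can also be iterated, optimized, and improved continuously, enabling timely emergency response and handling.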
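3 Method and system for identifying, classifying, and filtering spam email based on latent semantic indexing CN200810044485.9 2008-05-30 CN101594313A 2009-12-02 程红蓉; 何兴高; 曾志华; 周俊怡; 刘伟; 党建军
The invention relates to the field of text processing, and in particular to a method and system for identifying, classifying, and filtering spam email. The system consists of two subsystems: a latent semantic space generation subsystem and an email identification, classification, and filtering subsystem. The latent semantic space generation subsystem comprises a Chinese and English word segmentation module, a term-document matrix generation module, a weight calculation module, a term-document matrix singular value decomposition module, and a semantic space update module. The email identification, classification, and filtering subsystem comprises a Chinese and English word segmentation module for the email under evaluation, mapping of the text email into the latent semantic space, computation of the similarity between document vectors in the semantic space, and identification, classification, and filtering of the email according to that similarity. Embodiments of the invention can identify spam email and thereby filter it quickly and efficiently.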
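The pipeline described in this abstract (term-document matrix, singular value decomposition, mapping a new email into the latent space, similarity-based decision) can be sketched roughly as follows. This is a minimal illustration, not the patented system: the toy corpus, the labels, the function names, the use of raw counts in place of the weight calculation module, and the nearest-document decision rule are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are labeled training emails.
TERMS = ["free", "winner", "prize", "meeting", "report", "schedule"]
LABELS = ["spam", "spam", "ham", "ham"]
TERM_DOC = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)  # raw counts; a real system would apply a weighting scheme first

def build_latent_space(term_doc, k):
    """Decompose the term-document matrix and keep the k largest singular values."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    u_k = u[:, :k]
    # One row per document in the latent space; equal to (u_k.T @ term_doc).T
    doc_vecs = (vt[:k, :] * s[:k, None]).T
    return u_k, doc_vecs

def classify(tokens, u_k, doc_vecs, labels, terms):
    """Map a tokenized email into the latent space and return the label of
    the most similar training email (cosine similarity)."""
    counts = np.array([tokens.count(t) for t in terms], dtype=float)
    q = u_k.T @ counts
    q_norm = np.linalg.norm(q)
    if q_norm == 0.0:
        return "unknown"  # the email shares no terms with the vocabulary
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * q_norm)
    return labels[int(np.argmax(sims))]

U_K, DOC_VECS = build_latent_space(TERM_DOC, k=2)
print(classify(["free", "prize"], U_K, DOC_VECS, LABELS, TERMS))
print(classify(["meeting", "schedule"], U_K, DOC_VECS, LABELS, TERMS))
```

Projecting both the training emails and the incoming email with the same `u_k` keeps them in one coordinate system, so the cosine comparison is meaningful.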
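4 System and method for essay off-topic detection and scoring based on latent semantic indexing CN202310674533.7 2023-06-08 CN116719902A 2023-09-08 何经武; 曾凡
The invention relates to the field of natural language processing, and specifically to a system and method for essay off-topic detection and scoring based on latent semantic indexing. A data collection and preprocessing module collects essays and their corresponding prompts and preprocesses the collected essays; a term matrix module builds a term-document matrix and normalizes the result; a latent semantic indexing module performs singular value decomposition on the matrix according to latent semantic indexing and computes the similarity between an essay and model essays; an off-topic detection module combines the TF‑IDF* algorithm with the similarity results from the latent semantic indexing module to determine whether an essay is off topic; a scoring module compares the detection result against preset scoring criteria and assigns the essay a score; and an output module presents the off-topic detection result and the score to the user. The invention uses the TF‑IDF* algorithm to improve the accuracy of off-topic detection.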
5 Perturbing latent semantic indexing spaces US11393883 2006-03-31 US07580910B2 2009-08-25 Robert Jenson Price
A text processing method is provided that includes the following steps. First, an abstract mathematical vector space is generated based on a collection of documents. Respective documents in the collection of documents have a representation in the abstract mathematical vector space and respective terms contained in the collection of documents have a representation in the abstract mathematical vector space. Then, the abstract mathematical vector space is perturbed to produce a perturbed abstract mathematical vector space that is stored in an electronic format accessible to a user. Perturbing the abstract mathematical vector space may include modifying the representation of a document with a newly computed representation for that document, or modifying the representation of a term with a newly computed representation for that term.
6 Regularized latent semantic indexing for topic modeling US13169808 2011-06-27 US08533195B2 2013-09-10 Jun Xu; Hang Li; Nicholas Craswell
Electronic documents are retrieved from a database and/or from a network of servers. The documents are topic modeled in accordance with a Regularized Latent Semantic Indexing approach. The Regularized Latent Semantic Indexing approach may allow an equation involving an approximation of a term-document matrix to be solved in parallel by multiple calculating units. The equation may include terms that are regularized via an l1 norm and/or an l2 norm. The Regularized Latent Semantic Indexing approach may be applied to a set, or a fixed number, of documents such that the set of documents is topic modeled. Alternatively, the Regularized Latent Semantic Indexing approach may be applied to a variable number of documents such that, over time, the variable number of documents is topic modeled.
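The abstract does not state the equation, so the following is only a sketch of why an l2-regularized approximation of a term-document matrix lends itself to parallel solving. It assumes an objective of the form ||d - U v||² + λ₂||v||² for each document column d, with the term-topic matrix U held fixed. Under that assumption every document has its own closed-form ridge solution, independent of the others. The names `update_doc_vector`, `update_all_docs`, and `lam2` are hypothetical, and the l1-regularized part of the approach is not shown.

```python
import numpy as np

def update_doc_vector(u, d, lam2):
    """Ridge solution for one document: argmin_v ||d - u v||^2 + lam2 * ||v||^2."""
    k = u.shape[1]
    return np.linalg.solve(u.T @ u + lam2 * np.eye(k), u.T @ d)

def update_all_docs(u, docs, lam2):
    """Solve each document column independently. The iterations share no state,
    so they could be distributed across multiple workers."""
    columns = [update_doc_vector(u, docs[:, n], lam2) for n in range(docs.shape[1])]
    return np.column_stack(columns)

rng = np.random.default_rng(0)
U = rng.random((6, 2))  # hypothetical term-topic matrix (6 terms, 2 topics)
D = rng.random((6, 4))  # hypothetical term-document matrix (4 documents)
V = update_all_docs(U, D, lam2=0.1)
print(V.shape)
```

Because the per-document problems do not interact, the loop in `update_all_docs` is the natural place to split the work among calculating units.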
7 Regularized Latent Semantic Indexing for Topic Modeling US13169808 2011-06-27 US20120330958A1 2012-12-27 Jun Xu; Hang Li; Nicholas Craswell
Electronic documents are retrieved from a database and/or from a network of servers. The documents are topic modeled in accordance with a Regularized Latent Semantic Indexing approach. The Regularized Latent Semantic Indexing approach may allow an equation involving an approximation of a term-document matrix to be solved in parallel by multiple calculating units. The equation may include terms that are regularized via an l1 norm and/or an l2 norm. The Regularized Latent Semantic Indexing approach may be applied to a set, or a fixed number, of documents such that the set of documents is topic modeled. Alternatively, the Regularized Latent Semantic Indexing approach may be applied to a variable number of documents such that, over time, the variable number of documents is topic modeled.
8 Selective latent semantic indexing method for information retrieval applications US11505654 2006-08-17 US07630992B2 2009-12-08 Jacob Gilmore Martin; Earl Rodney Canfield
A term-by-document (or part-by-collection) matrix can be used to index documents (or collections) for information retrieval applications. Reducing the rank of the indexing matrix can further reduce the complexity of information retrieval. A method for index matrix rank reduction can involve computing a singular value decomposition and then retaining singular values based on the singular values corresponding to singular values of multiple topics. The expected singular values corresponding to a topic can be determined using the roots of a specially formed characteristic polynomial. The coefficients of the special characteristic polynomial can be based on computing the determinants of a Gram matrix of term (or part) probabilities, a method of recursion, or a method of recursion further weighted by the probability of document (or collection) lengths.
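Rank reduction by retaining only some singular values can be illustrated with a short sketch. The selection rule in the abstract (expected singular values derived from the roots of a characteristic polynomial) is not reproduced here. The `keep` argument is a hypothetical stand-in for whichever indices such a rule would select, and the matrix is random example data.

```python
import numpy as np

def reduce_rank(matrix, keep):
    """Rebuild the matrix from only the singular triplets whose indices are in `keep`."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    idx = np.array(sorted(keep))
    return (u[:, idx] * s[idx]) @ vt[idx, :]

rng = np.random.default_rng(1)
A = rng.random((5, 4))                    # hypothetical term-by-document matrix
A_REDUCED = reduce_rank(A, keep=[0, 1])   # retain the two largest singular values
print(np.linalg.matrix_rank(A_REDUCED))
```

The Frobenius-norm error of the reduced matrix equals the square root of the sum of the squared singular values that were dropped, which gives a quick check on how much information a given selection discards.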
9 Selective Latent Semantic Indexing Method for Information Retrieval Applications US11505654 2006-08-17 US20070233669A2 2007-10-04 Jacob Martin; Earl Canfield
A term-by-document (or part-by-collection) matrix can be used to index documents (or collections) for information retrieval applications. Reducing the rank of the indexing matrix can further reduce the complexity of information retrieval. A method for index matrix rank reduction can involve computing a singular value decomposition and then retaining singular values based on the singular values corresponding to singular values of multiple topics. The expected singular values corresponding to a topic can be determined using the roots of a specially formed characteristic polynomial. The coefficients of the special characteristic polynomial can be based on computing the determinants of a Gram matrix of term (or part) probabilities, a method of recursion, or a method of recursion further weighted by the probability of document (or collection) lengths.
10 Automatic recommendation of products using latent semantic indexing of content US09653917 2000-09-01 US06615208B1 2003-09-02 Clifford A. Behrens; Dennis E. Egan; Yu-Yun Ho; Carol Lochbaum; Mark Rosenstein
Techniques for using latent semantic structure of textual content ascribed to the items to provide automatic recommendations to the user. A user inputs a selected item and, in turn, a latent semantic algorithm is applied to the user selection and the textual content of the items in a database to generate a conceptual similarity between the selection and the items. A set of nearest items to the selected item is provided as a recommendation to the user of other items that may be of particular interest or relevance to the user's original selection based upon the conceptual similarity measure.
11 Automatic recommendation of products using latent semantic indexing of content US10600669 2003-06-20 US20040039657A1 2004-02-26 Clifford A. Behrens; Dennis E. Egan; Yu-Yun Ho; Carol Lochbaum; Mark Rosenstein
Techniques for using latent semantic structure of textual content ascribed to the items to provide automatic recommendations to the user. A user inputs a selected item and, in turn, a latent semantic algorithm is applied to the user selection and the textual content of the items in a database to generate a conceptual similarity between the selection and the items. A set of nearest items to the selected item is provided as a recommendation to the user of other items that may be of particular interest or relevance to the user's original selection based upon the conceptual similarity measure.
12 Selective latent semantic indexing method for information retrieval applications US11505654 2006-08-17 US20070124299A1 2007-05-31 Jacob Martin; Earl Canfield
A term-by-document (or part-by-collection) matrix can be used to index documents (or collections) for information retrieval applications. Reducing the rank of the indexing matrix can further reduce the complexity of information retrieval. A method for index matrix rank reduction can involve computing a singular value decomposition and then retaining singular values based on the singular values corresponding to singular values of multiple topics. The expected singular values corresponding to a topic can be determined using the roots of a specially formed characteristic polynomial. The coefficients of the special characteristic polynomial can be based on computing the determinants of a Gram matrix of term (or part) probabilities, a method of recursion, or a method of recursion further weighted by the probability of document (or collection) lengths.
13 Information retrieval and text mining using distributed latent semantic indexing US10427595 2003-05-01 US20040220944A1 2004-11-04 Clifford A. Behrens; Devasis Bassu
The use of latent semantic indexing (LSI) for information retrieval and text mining operations is adapted to work on large heterogeneous data sets by first partitioning the data set into a number of smaller partitions having similar concept domains. A similarity graph network is generated in order to expose links between concept domains which are then exploited in determining which domains to query as well as in expanding the query vector. LSI is performed on those partitioned data sets most likely to contain information related to the user query or text mining operation. In this manner LSI can be applied to datasets that heretofore presented scalability problems. Additionally, the computation of the singular value decomposition of the term-by-document matrix can be accomplished at various distributed computers increasing the robustness of the retrieval and text mining system while decreasing search times.
14 System and method for hierarchical segmentation with latent semantic indexing in scale space US10034523 2001-12-28 US20040205461A1 2004-10-14 James H. Kaufman; Dulce Beatriz Ponceleon; Malcolm Slaney
A system and method for automatically generating a hierarchical table of contents or outline for indexing a document and identifying clusters of related information in the document. The document may comprise text, audio, video, or a multimedia presentation. The invention employs a unique and novel combination of latent semantic indexing techniques to identify related blocks and major topic changes within the document with scale space segmentation techniques to respectively identify self-similar blocks within the document and to thus find topic changes of various sizes at block edges. The invention then produces a visual presentation of the semantic structure of the document.
15 System and method for hierarchical segmentation with latent semantic indexing in scale space US10034523 2001-12-28 US07137062B2 2006-11-14 James H. Kaufman; Dulce Beatriz Ponceleon; Malcolm Slaney
A system and method for automatically generating a hierarchical table of contents or outline for indexing a document and identifying clusters of related information in the document. The document may comprise text, audio, video, or a multimedia presentation. The invention employs a unique and novel combination of latent semantic indexing techniques to identify related blocks and major topic changes within the document with scale space segmentation techniques to respectively identify self-similar blocks within the document and to thus find topic changes of various sizes at block edges. The invention then produces a visual presentation of the semantic structure of the document.
16 Information retrieval and text mining using distributed latent semantic indexing US10427595 2003-05-01 US07152065B2 2006-12-19 Clifford A. Behrens; Devasis Bassu
The use of latent semantic indexing (LSI) for information retrieval and text mining operations is adapted to work on large heterogeneous data sets by first partitioning the data set into a number of smaller partitions having similar concept domains. A similarity graph network is generated in order to expose links between concept domains which are then exploited in determining which domains to query as well as in expanding the query vector. LSI is performed on those partitioned data sets most likely to contain information related to the user query or text mining operation. In this manner LSI can be applied to datasets that heretofore presented scalability problems. Additionally, the computation of the singular value decomposition of the term-by-document matrix can be accomplished at various distributed computers increasing the robustness of the retrieval and text mining system while decreasing search times.
17 Computerized cross-language document retrieval using latent semantic indexing US734291 1991-07-17 US5301109A 1994-04-05 Thomas K. Landauer; Michael L. Littman
A methodology for retrieving textual data objects in a multiplicity of languages is disclosed. The data objects are treated in the statistical domain by presuming that there is an underlying, latent semantic structure in the usage of words in each language under consideration. Estimates to this latent structure are utilized to represent and retrieve objects. A user query is recouched in the new statistical domain and then processed in the computer system to extract the underlying meaning to respond to the query.
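18 Video program search method and device CN201611028444.1 2016-11-18 CN106570196B 2020-06-05 李贤
The invention discloses a video program search method, characterized by comprising: receiving a descriptive term, entered by a user, that describes a video program; constructing a query vector for the descriptive term in the same way that the index matrix of a preset latent semantic indexing model is constructed; computing, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector; and sorting the computed cosine similarities in descending order and providing the user with the video programs corresponding to the column vectors whose cosine similarity ranks fall within a ranking interval. Correspondingly, the invention also discloses a video program search device. Embodiments of the invention can uncover the latent semantics of documents and improve the accuracy of video program search.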
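The ranking step in this abstract (cosine similarity of each index column against the query vector, descending sort, selection by ranking interval) can be sketched as below. The column vectors, the query vector, the program titles, and the 1-based inclusive interpretation of the ranking interval are all assumptions made for illustration. Construction of the index matrix and the query vector is taken as already done.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(index_columns, query_vec, titles, rank_range):
    """Rank index columns by cosine similarity to the query (descending) and
    return the titles whose rank falls inside rank_range (1-based, inclusive)."""
    scored = sorted(
        ((cosine(col, query_vec), title) for col, title in zip(index_columns, titles)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    first, last = rank_range
    return [title for _, title in scored[first - 1:last]]

# Hypothetical latent-space columns, one per video program.
COLUMNS = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
TITLES = ["program A", "program B", "program C"]
RESULT = search(COLUMNS, [1.0, 0.1], TITLES, rank_range=(1, 2))
print(RESULT)  # ['program A', 'program B']
```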
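19 Video program search method and device CN201611028444.1 2016-11-18 CN106570196A 2017-04-19 李贤
The invention discloses a video program search method, characterized by comprising: receiving a descriptive term, entered by a user, that describes a video program; constructing a query vector for the descriptive term in the same way that the index matrix of a preset latent semantic indexing model is constructed; computing, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector; and sorting the computed cosine similarities in descending order and providing the user with the video programs corresponding to the column vectors whose cosine similarity ranks fall within a ranking interval. Correspondingly, the invention also discloses a video program search device. Embodiments of the invention can uncover the latent semantics of documents and improve the accuracy of video program search.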
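20 Video program search method and device PCT/CN2016/113642 2016-12-30 WO2018090468A1 2018-05-24 李贤
The invention discloses a video program search method, comprising: receiving a descriptive term, entered by a user, that describes a video program, together with the video category to which the video program belongs; selecting the latent semantic indexing model corresponding to the video category and constructing a query vector for the descriptive term in the same way that the model's index matrix is constructed; computing, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector; and sorting the computed cosine similarities in descending order and providing the user with the video programs corresponding to the column vectors whose cosine similarity ranks fall within a ranking interval. Correspondingly, the invention also discloses a video program search device. Embodiments of the invention can uncover the latent semantics of documents and improve both the accuracy and the efficiency of video program search.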
