Method using a programmed digital computer system for translation between natural languages

A computerized translation method with universal application to all natural languages is provided. With this method, parameters are changed only when the source or target language is changed. The computerized method can be regarded as a self-contained system, developed to accept input texts in the source language and look up individual text words (or sequences of text words) in various dictionaries. On the basis of the dictionary information, sequences of operations are carried out that gradually generate the multiplicity of computer codes needed to express all the syntactic and semantic functions of the words in the sentence. On the basis of all these codes and the target meanings in the dictionary, plus the synthesis codes of those meanings, translation is carried out automatically. Procedures that generate and easily update main dictionaries, idiom dictionaries, high-frequency dictionaries and compound dictionaries are integral parts of the system.
This invention relates to a method utilizing a digital computer for translating between natural languages.
Attempts have been made to utilize digital computers for translating from one language to another, i.e. from a source language to a target language. Such translation systems involve a programmable digital computer system together with a program for effecting the translation. The approaches used so far have been theoretical. The theoretical linguistic approach to syntactical analysis has not been acceptable because it starts from linguistic assumptions instead of considering the capabilities of the programmable digital computer and approaching the translation from the computer's point of view.
The idea of machine translation was conceived in 1946 by Warren Weaver and A. D. Booth. Many attempts have been made to build a machine translation system and put it into operation. These projects set out to develop linguistic theories encompassing the whole natural language first, and only then to turn to the computer. This approach inevitably failed because the human mind cannot encompass the totality of a language.
The following is a brief summary of these approaches and the theories behind them:
The Fulcrum theory approach, developed from 1959 to 1967 by the Bunker-Ramo Corporation, USA, was directed toward solving, with a relatively small dictionary, the problems occurring in a limited body of Russian text. No attempt was made to resolve multiple meanings; instead, several meanings were printed in the output, separated by slashes.
A predictive syntax system was developed by the National Bureau of Standards and the Massachusetts Institute of Technology from 1960 to 1964. This approach failed because it considered only one limited path through the sentence. The system was never implemented on a larger scale and was used only within a limited experimental environment.
Transformational grammar was another approach. However, this approach turned out to be entirely incompatible with the requirements of computer translation. Only small experimental systems were developed on the basis of this theory, and they had to be discontinued before any significant translation was produced.
Compared with the prior art, it is an object of the invention to enable accurate, fast, almost instantaneous translation of large volumes of text from one natural language, the source language, into another natural language, the so-called target language.
Instead of a natural language, an artificial language such as Esperanto can act as the source or target language, as the case may be.
To achieve this purpose the subject matter of the present invention is a method using a programmable digital computer system, the steps comprising
Advantageously, the method comprises a step wherein a special subroutine of the dictionary lookup process searches a list of words (the lexical list) to determine which words require a special lexical subroutine to be incorporated into the dictionary file. In addition, a separate lexical control program analyzes the words of the source language sentence for those cases where only the results of the syntactic/semantic analysis of the sentence, or the membership of a word in a grammatical or semantic class, can determine whether a lexical subroutine must be called in when a specific word or expression is translated. Such a subroutine determines the meaning of that word or expression by examining the syntactic relationships that have been established for it and applying syntactic and/or semantic rules that apply only to that word or to the class of words to which it belongs.
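The first of these two mechanisms (a lexical list consulted at dictionary lookup, with the routine run later at translation time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `WordEntry`, `LEXICAL_LIST`, `lex_ryad`, `attach_lexical_routines` and `translate_word` are hypothetical names, and the word record holds only the two analysis results the example needs. The Russian word "ряд" is used because its rendering genuinely depends on a syntactic relation: governing a genitive noun it reads "a number of", otherwise "row".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class WordEntry:
    source: str                     # source-language word form
    default_meaning: str            # dictionary meaning used when no routine overrides it
    case: Optional[str] = None      # grammatical case, set by the analysis passes
    governs: Optional[int] = None   # index of the word this one governs, if any
    lexical_routine: Optional[Callable[["WordEntry", List["WordEntry"]], str]] = None


def lex_ryad(word: WordEntry, sentence: List[WordEntry]) -> str:
    """Hypothetical lexical routine for Russian 'ряд'.

    When it governs a genitive noun it means 'a number of';
    otherwise the dictionary meaning is kept.
    """
    if word.governs is not None and sentence[word.governs].case == "gen":
        return "a number of"
    return word.default_meaning


# Hypothetical lexical list: source word -> lexical subroutine.
LEXICAL_LIST: Dict[str, Callable[[WordEntry, List[WordEntry]], str]] = {
    "ряд": lex_ryad,
}


def attach_lexical_routines(sentence: List[WordEntry]) -> None:
    """Dictionary-lookup time: attach a routine to words found in the lexical list."""
    for word in sentence:
        word.lexical_routine = LEXICAL_LIST.get(word.source)


def translate_word(word: WordEntry, sentence: List[WordEntry]) -> str:
    """Translation time: the routine inspects the relations established by analysis."""
    if word.lexical_routine is not None:
        return word.lexical_routine(word, sentence)
    return word.default_meaning


sentence = [
    WordEntry("ряд", "row", governs=1),            # 'ряд проблем'
    WordEntry("проблем", "problems", case="gen"),
]
attach_lexical_routines(sentence)
print([translate_word(w, sentence) for w in sentence])  # ['a number of', 'problems']
```

The second mechanism, where only class membership or the analysis results decide whether a routine is called, would replace the dictionary lookup by `source` with a test on the word's analysed properties.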
There is advantageously provided a method of resolving semantic ambiguities in which each word in the stem (single-word) dictionary is assigned a unique limited semantic (LS) number. The specialized multi-word expressions of the LS compound dictionary (LS expressions), which are composed of the LS number representations of their individual words in the stem dictionary, are then automatically grouped into dictionary records according to the principal word of the expression, one record existing for each unique principal word. A record is searched whenever its principal word occurs in the sentence to be translated, in order to determine whether a group of contiguous words in the text matches any of the expressions in that record; the longest match or the highest-priority expression is used to determine the specialized meaning of that group of words, which differs from the sum of the meanings of the individual words. The conditional limited semantic (CLS) expressions form a subset of the LS expressions. A CLS expression permits the inclusion, in the LS expression itself (i.e. in the dictionary), of all definable syntactic and semantic rules, as well as simple programming instructions that can change the information stored in the bits and bytes of the sentence analysis area. The dictionary expression is therefore matched only when the syntactic and/or semantic conditions expressed in its rule(s) have been met by the word(s) in the text, which can be determined only after the linguistic analysis programs have carried out a complete syntactic and semantic analysis of the sentence.
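The record-per-principal-word lookup described above can be sketched as follows. This is a minimal sketch with assumptions flagged: the LS numbers are arbitrary hypothetical integers, `LSExpression`, `build_records` and `match_at` are illustrative names, and the selection rule (longest match first, priority only as a tie-breaker) is one possible reading of "the longest match or the highest-priority expression". The optional `condition` stands in for a CLS rule; a real CLS condition would consult the sentence analysis area rather than the raw LS numbers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical CLS-style condition: receives the text (as LS numbers) and the match start.
Condition = Callable[[Sequence[int], int], bool]


@dataclass(frozen=True)
class LSExpression:
    ls_numbers: Tuple[int, ...]            # LS numbers of the expression's words, in text order
    principal_index: int                   # position of the principal word inside the expression
    priority: int                          # tie-breaker between equally long matches
    meaning: str                           # specialized target-language meaning
    condition: Optional[Condition] = None  # set only for CLS-style expressions


def build_records(expressions: List[LSExpression]) -> Dict[int, List[LSExpression]]:
    """Group expressions into one record per principal word (keyed by its LS number)."""
    records: Dict[int, List[LSExpression]] = defaultdict(list)
    for expr in expressions:
        records[expr.ls_numbers[expr.principal_index]].append(expr)
    return dict(records)


def match_at(text_ls: Sequence[int], pos: int,
             records: Dict[int, List[LSExpression]]) -> Optional[Tuple[int, LSExpression]]:
    """Search the record of the word at `pos`; return (start, expression) or None."""
    best: Optional[Tuple[int, LSExpression]] = None
    for expr in records.get(text_ls[pos], []):
        start = pos - expr.principal_index
        end = start + len(expr.ls_numbers)
        if start < 0 or end > len(text_ls):
            continue                                  # expression would run off the text
        if tuple(text_ls[start:end]) != expr.ls_numbers:
            continue                                  # contiguous words do not match
        if expr.condition is not None and not expr.condition(text_ls, start):
            continue                                  # CLS condition not met
        if best is None or (len(expr.ls_numbers), expr.priority) > (
                len(best[1].ls_numbers), best[1].priority):
            best = (start, expr)
    return best


records = build_records([
    LSExpression((1, 2, 3), 0, 0, "EXPR-A"),
    LSExpression((1, 2), 0, 5, "EXPR-B"),
    LSExpression((2, 3), 1, 0, "EXPR-C"),
])
text = [7, 1, 2, 3, 9]
print(match_at(text, 1, records)[1].meaning)  # EXPR-A (longest match wins)
print(match_at(text, 3, records)[1].meaning)  # EXPR-C (principal word is second)
```

Keying each record by the principal word means a record is only opened when that word actually occurs, which mirrors the text's description of searching "whenever the principal word occurs".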
In the method, all source language words can advantageously be supplied with semantic category codes in a variable-length format, which are then interrogated by the source language analysis programs, the lexical routines, and the CLS dictionary lookup routines as an aid in resolving semantic ambiguities. These codes are expressed in a hierarchical taxonomy (a set of tree structures) such that coding a category at a lower level of a semantic tree causes the system itself to generate the appropriate higher-level codes automatically.
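The automatic coding of higher-level categories amounts to walking from each coded category up to the root of its tree. The sketch below assumes a hypothetical taxonomy (`PARENT`, with invented category names) and hypothetical helpers `expand_codes` and `has_category`; it shows the mechanism, not the patent's actual category set or storage format.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical taxonomy (a set of trees): code -> parent code, None for a root.
PARENT: Dict[str, Optional[str]] = {
    "CONCRETE": None,
    "ANIMATE": "CONCRETE",
    "HUMAN": "ANIMATE",
    "PROFESSION": "HUMAN",
    "INSTRUMENT": "CONCRETE",
    "ABSTRACT": None,
    "QUANTITY": "ABSTRACT",
}


def expand_codes(codes: Iterable[str],
                 parent: Dict[str, Optional[str]] = PARENT) -> Set[str]:
    """Return the given codes plus every ancestor code, walking up each tree."""
    expanded: Set[str] = set()
    for code in codes:
        current: Optional[str] = code
        # Stop at a root, or at a code whose ancestors were already added.
        while current is not None and current not in expanded:
            expanded.add(current)
            current = parent[current]
    return expanded


def has_category(codes: Set[str], category: str) -> bool:
    """The kind of test an analysis pass or CLS condition might make."""
    return category in codes


word_codes = expand_codes(["PROFESSION"])   # the coder enters only the lowest level
print(sorted(word_codes))  # ['ANIMATE', 'CONCRETE', 'HUMAN', 'PROFESSION']
```

Because the expansion is done once when the codes are stored, a later query such as "is this word animate?" is a simple membership test rather than a tree walk.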
The invention involves a method of operating a digital computer to translate from a source natural language, e.g. Russian, to a target natural language, e.g. English. The method involves three phases. The dictionary lookup phase establishes the target language meaning of each word or expression in the source text by attaching grammatical codes and target language equivalents to each word and expression in the source language. The syntactical analysis phase derives syntactical information from the grammatical information attached to the words and expressions, and also uses the inflection of each word and its position in the source text. The synthesis phase takes the meaning and syntactical information of all the words of a sentence in the source text and forms a sentence in the target language.
More specifically, the method begins by loading the source text into the memory of a computer. Each source text word is then transformed into a converted source text word, which consists of the source text word and coded information. The coded information may include a memory offset address linkage that provides access to a memory location containing syntactical information and the translation for the source text word. The converted source text words derived from a source text sentence are then synthesized into a target language translation of that sentence. The synthesis correctly establishes both the meaning and the position of each word in the target language sentence.
An important aspect of the invention is the separate treatment of high-frequency and low-frequency words. To maximize the effective capacity of the computer's core memory, each low-frequency word carries its translation information along with it, while each high-frequency word carries a memory offset address linkage that gives easy access to its translation information stored in core memory. Thus the translation information for a frequently used word is held once, in an easily accessible place in the computer, rather than being carried with every occurrence of the word, as is done for low-frequency words.
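The two representations can be contrasted in a short sketch. It is illustrative only and rests on assumptions: `TranslationInfo`, `HighFreqRef`, `convert_text` and `resolve` are hypothetical names, the grammar code strings are invented, and a Python list index stands in for the memory offset address. Every occurrence of a high-frequency word stores only an offset into one resident table, while a low-frequency word carries its information inline.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class TranslationInfo:
    grammar_codes: str   # hypothetical grammatical code string
    target: str          # target-language equivalent


@dataclass(frozen=True)
class HighFreqRef:
    offset: int          # stand-in for the memory offset into the resident table


Converted = Tuple[str, Union[TranslationInfo, HighFreqRef]]


def convert_text(words: List[str],
                 high_freq_index: Dict[str, int],
                 main_dictionary: Dict[str, TranslationInfo]) -> List[Converted]:
    """High-frequency words get an offset; other words carry their info inline."""
    converted: List[Converted] = []
    for word in words:
        if word in high_freq_index:
            converted.append((word, HighFreqRef(high_freq_index[word])))
        else:
            converted.append((word, main_dictionary[word]))
    return converted


def resolve(item: Converted, high_freq_table: List[TranslationInfo]) -> TranslationInfo:
    """Follow the offset linkage, or use the information carried with the word."""
    _, info = item
    if isinstance(info, HighFreqRef):
        return high_freq_table[info.offset]
    return info


HIGH_FREQ_TABLE = [TranslationInfo("CONJ", "and"), TranslationInfo("PREP", "in")]
HIGH_FREQ_INDEX = {"и": 0, "в": 1}
MAIN_DICTIONARY = {"договор": TranslationInfo("N-MASC", "treaty")}

text = convert_text(["договор", "и", "договор"], HIGH_FREQ_INDEX, MAIN_DICTIONARY)
print([resolve(item, HIGH_FREQ_TABLE).target for item in text])  # ['treaty', 'and', 'treaty']
```

The saving comes from the skew of word frequencies: a handful of function words account for a large share of running text, so storing their information once avoids many duplicated entries.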
While the above description portrays a human analogy of how the claimed invention functions, it must be understood that the actual operation of the process by the computer is quite different. From the time the source text is converted to machine-readable input data until the time the machine-readable output data is converted to human-readable translation text, the claimed process proceeds under the control of a computer program. While it is convenient to describe the steps of the program as if they were being performed by a human translator, in fact nothing of the kind is happening. Rather, the computer is carrying out a series of unthinking, abstract mathematical operations on the abstract values stored in its memory. The program functions independently of the meaning or significance of the data on which it acts. The fact that the program is written in a high-level programming language, which makes it appear to give significance to the machine operations, does not change the fact that the machine is actually carrying out a series of abstract steps that have nothing to do with translating between natural languages. If a different kind of information were fed into the computer, the program used in this invention could conceivably perform a function entirely different from translating.
The invention also comprises the process by which information is extracted from the computer, including printing out the translation, i.e. the step of converting the target language sequence from computer-intelligible binary coded signals back into visual indicia.
A great advantage of the present invention is that any source language can be translated into a target language by applying the proposed method with the same equipment, changing only the language-dependent parts, namely the dictionaries and the linguistic programs, which contain all the syntactic and semantic rules of both languages.
Another advantage of the method according to the invention is the consistency of the translation: a particular expression is always translated from the source language into the target language in the same way when it occurs in the same environment, and if the environment changes, the invention changes the meaning of the expression correspondingly, with the same consistency. A further advantage is that all the syntactic and semantic features recognized in the source language are reflected in the target language.
Specific embodiments of the invention will now be described by way of example with reference to the accompanying drawings, in which:
As shown in Figure 1, texts are input to the translation system either on-line via a terminal or off-line via key-to-disk or key-to-tape devices (input via punched cards is also possible). The input text and the translation dictionaries and programs reside on disk files accessible to the computer that performs the translation. The results are then printed in the desired format on the printer, taking into account any format codes in the input text.
Post-editing of the output text, and any resulting dictionary update, can take place if desired via an on-line interactive terminal, or can be performed off-line by post-editors and dictionary coders.
Diagrams and tables which must be included in the same form in the translated output can be incorporated in the text via a photocomposition device.
Figure 2 illustrates the program (LØADTXT) that loads the input text into computer memory, identifies the high-frequency words and the idioms in the text, and assigns each word of the text a unique serial number. It immediately attaches a translation to the idioms and ensures that the information for all the high-frequency words is immediately available in computer core memory during the translation process. The remaining (low-frequency) words are then sorted into alphabetical order to facilitate main dictionary lookup (MDL). Main dictionary lookup attaches to each low-frequency word the necessary grammatical information as well as a cross-reference to the record of multi-word limited semantic (LS) compounds containing that word. After main dictionary lookup, all the words in the text are re-sorted into their original sequence.
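The sort, look up, re-sort sequence can be sketched as below. This is a hypothetical illustration, not LØADTXT itself: `main_dictionary_lookup` and `load_and_look_up` are invented names, idiom handling is omitted, and the dictionary is assumed to be a list of (word, information) pairs sorted with the same ordering as the text words. Sorting the words lets the whole dictionary be scanned in one sequential, merge-style pass, which suits disk- or tape-resident dictionaries; the serial numbers then restore the original order.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Entry = Tuple[int, str, Optional[str]]   # (serial number, word, dictionary info or None)


def main_dictionary_lookup(sorted_words: Sequence[Tuple[int, str]],
                           sorted_dictionary: Sequence[Tuple[str, str]]) -> List[Entry]:
    """Merge-style lookup: one sequential pass over two sorted sequences."""
    results: List[Entry] = []
    j = 0
    for serial, word in sorted_words:
        # Advance through the dictionary until reaching or passing the word.
        while j < len(sorted_dictionary) and sorted_dictionary[j][0] < word:
            j += 1
        if j < len(sorted_dictionary) and sorted_dictionary[j][0] == word:
            results.append((serial, word, sorted_dictionary[j][1]))
        else:
            results.append((serial, word, None))     # not in the main dictionary
    return results


def load_and_look_up(text: Sequence[str], high_freq: Dict[str, str],
                     sorted_dictionary: Sequence[Tuple[str, str]]) -> List[Entry]:
    numbered = list(enumerate(text))                  # unique serial number per word
    high = [(s, w, high_freq[w]) for s, w in numbered if w in high_freq]
    low = sorted(((s, w) for s, w in numbered if w not in high_freq),
                 key=lambda pair: pair[1])            # alphabetical order for MDL
    looked_up = main_dictionary_lookup(low, sorted_dictionary)
    return sorted(high + looked_up, key=lambda entry: entry[0])   # original sequence


dictionary = sorted({"договор": "treaty", "акт": "act"}.items())
result = load_and_look_up(["договор", "и", "акт", "закон", "договор"], {"и": "and"}, dictionary)
print(result)
```

Repeated occurrences of the same word are adjacent after sorting, so each is matched against the same dictionary position without rescanning.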
According to Figure 3, the program INITCALL controls the actual translation process. It first calls the program GETSENTN, which establishes a sentence analysis area in computer memory consisting of 160 bytes of fixed-length information for each word, with cross-references to additional variable-length information areas.
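A fixed-length record per word with a cross-reference into a variable-length area can be sketched with Python's `struct` module. The field layout below is entirely an assumption (the text gives only the 160-byte total), and `build_analysis_area`, `read_record`, `set_flag` and `variable_info` are hypothetical helpers. The flags field illustrates how later passes could record findings as bits in a word's record.

```python
import struct
from typing import List, Tuple

# Hypothetical layout of one 160-byte fixed-length word record:
#   serial number (4), offset into variable area (4), length of variable data (2),
#   flag bits set by analysis passes (2), dictionary/grammar codes (148, null-padded)
RECORD_FORMAT = "<IIHH148s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)   # 160
FLAGS_OFFSET = 10                              # byte offset of the flags field


def build_analysis_area(words: List[Tuple[int, bytes, bytes]]) -> Tuple[bytearray, bytearray]:
    """words: (serial, codes, variable_info) per word, in sentence order."""
    fixed = bytearray()
    variable = bytearray()
    for serial, codes, var_info in words:
        offset = len(variable)                 # cross-reference into the variable area
        variable += var_info
        fixed += struct.pack(RECORD_FORMAT, serial, offset, len(var_info), 0, codes)
    return fixed, variable


def read_record(fixed: bytearray, index: int) -> Tuple[int, int, int, int, bytes]:
    serial, offset, length, flags, codes = struct.unpack_from(
        RECORD_FORMAT, fixed, index * RECORD_SIZE)
    return serial, offset, length, flags, codes.rstrip(b"\x00")


def set_flag(fixed: bytearray, index: int, bit: int) -> None:
    """What a later pass might do: record a finding as a bit in the word's record."""
    pos = index * RECORD_SIZE + FLAGS_OFFSET
    (flags,) = struct.unpack_from("<H", fixed, pos)
    struct.pack_into("<H", fixed, pos, flags | (1 << bit))


def variable_info(fixed: bytearray, variable: bytearray, index: int) -> bytes:
    """Follow the cross-reference from the fixed record into the variable area."""
    _, offset, length, _, _ = read_record(fixed, index)
    return bytes(variable[offset:offset + length])


fixed, variable = build_analysis_area([
    (0, b"N-MASC", b"treaty"),
    (1, b"CONJ", b"and"),
])
set_flag(fixed, 0, 3)
print(len(fixed), read_record(fixed, 0)[3], variable_info(fixed, variable, 1))  # 320 8 b'and'
```

Fixed-size records make the n-th word's record addressable by simple multiplication, while information of unpredictable size stays in the separate variable-length area.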
At first the sentence analysis area contains only the dictionary information for each word. During the translation process, additional information is added by the source language analysis programs (referred to as "passes"), which resolve syntactic and semantic ambiguities, establish clause boundaries, describe basic syntactical relationships between words, identify subject/predicate relationships, and analyze the function of any prepositions.
Limited semantic (LS) and conditional limited semantic (CLS) dictionary lookup take place at appropriate points in the analysis; CLS lookup is possible only after the entire sentence has been analyzed.
The remaining four steps, which synthesize the target language on the basis of the total information contained in the sentence analysis area, are: the translation of prepositions, the solving of word-specific problems by lexical routines, the actual synthesis of the target language words, and the rearrangement of the sentence into the word order appropriate to the target language.
The final step in the translation process is printing the translated output by means of the program TRPRINT.
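The control flow just described (analysis passes, CLS lookup only once analysis is complete, then the four synthesis steps in order) can be sketched as a simple driver. Only the names INITCALL and GETSENTN come from the text; `make_step`, `cls_lookup`, `get_sentence`, `translate_sentence` and the individual step names are hypothetical, each step merely logs itself, and the position of LS lookup among the passes is an assumption since the text says only "at appropriate points".

```python
from typing import Any, Callable, Dict, List

Area = Dict[str, Any]
Step = Callable[[Area], None]


def make_step(name: str, completes_analysis: bool = False) -> Step:
    """Hypothetical stand-in for a pass: it only logs its name in the area."""
    def step(area: Area) -> None:
        area["log"].append(name)
        if completes_analysis:
            area["analysis_complete"] = True
    return step


def cls_lookup(area: Area) -> None:
    """CLS lookup needs the results of the complete sentence analysis."""
    if not area.get("analysis_complete"):
        raise RuntimeError("CLS lookup requires a fully analysed sentence")
    area["log"].append("cls_lookup")


ANALYSIS_PASSES: List[Step] = [
    make_step("resolve_ambiguities"),
    make_step("clause_boundaries"),
    make_step("ls_lookup"),                 # placement among the passes is assumed
    make_step("basic_relationships"),
    make_step("subject_predicate"),
    make_step("preposition_function", completes_analysis=True),
]

SYNTHESIS_STEPS: List[Step] = [
    make_step("translate_prepositions"),
    make_step("lexical_routines"),
    make_step("synthesize_target_words"),
    make_step("rearrange_word_order"),
]


def get_sentence(words: List[str]) -> Area:
    """Stand-in for GETSENTN: a fresh area holding dictionary information only."""
    return {"words": list(words), "log": [], "analysis_complete": False}


def translate_sentence(words: List[str]) -> List[str]:
    """Stand-in for INITCALL's control flow for one sentence."""
    area = get_sentence(words)
    for step in ANALYSIS_PASSES:
        step(area)
    cls_lookup(area)
    for step in SYNTHESIS_STEPS:
        step(area)
    return area["log"]
```

Calling `translate_sentence` returns the order in which the steps ran; calling `cls_lookup` on an area fresh from `get_sentence` raises, which encodes the rule that conditional lookup must wait for the full analysis.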