Online learning method

Published: 2021-08-03

An online learning method to correct the amount of operation required for the object of the control based on the error in the output value with respect to the target value, wherein a partial learning, carried out by a partial learning means, is combined with an overall learning, carried out by an overall learning means.

1. An online learning method to correct the amount of operation required for the object of the control based on the error in the output value with respect to the target value, characterized in that a partial learning, carried out by a partial learning means, is combined with an overall learning, carried out by an overall learning means.

2. An online learning method according to claim 1, characterized in that said partial learning means learns the amount of operation based on the correction value of partial changes in the system and that the overall learning means learns the amount of operation based on the correction value for overall changes in the system.

3. An online learning method according to claim 1 or 2, wherein the learning rate for the overall learning means is set to be greater than the learning rate for the partial learning means.

4. An online learning method according to at least one of the preceding claims 1 to 3, characterized in that when the amount of correction for the overall changes exceeds a certain percentage change, learning by the foregoing partial learning means is halted.

5. An online learning method according to at least one of the preceding claims 1 to 4, wherein the foregoing partial learning means is equipped with a learning module that can learn and compute control parameters based on the operating state of the object of said control, and a means to perform online learning of control parameter values based on the error between the target value and computed value; and wherein, based on the error between the target value and the output value, the foregoing overall learning means multiplies a correction coefficient with the control parameter.

6. An online learning method according to claim 5, wherein the foregoing object of the control is an engine, the amount of operation is the amount of fuel injection, the target and output values are air/fuel ratios, the partial learning means is a means to learn the amount of correction over time, the overall learning means is a means to learn the amount of correction based on environmental changes, wherein the overall learning means uses the output values as-is when the engine is operating steadily, but uses a statistical processing of the output values when the engine is operating in abnormal (extreme) conditions.

7. An online learning method according to claim 6, wherein the partial learning means uses the output values as-is when the engine is operating steadily, but uses a statistical processing of the output values when the engine is operating abnormally.

8. An online learning method according to claim 6 or 7, wherein a warm-up correction coefficient is used during the engine warm-up period, and overall learning is performed after the warm-up period has been completed.

9. An online learning method according to at least one of the preceding claims 1 to 8, wherein the foregoing learning module incorporates a fuzzy neural network, a Cerebellar Model Arithmetic Computer (CMAC) or a similar means.

10. An online learning method according to at least one of the preceding claims 3 to 9, characterized in that the ratio GZ/GB of the learning rate GZ for the overall learning to the learning rate GB for the partial learning is from 10 to 100.

11. An online learning method according to claim 9 or 10, characterized in that a gain G of a learning rate fulfills the following equation: $G = (H' - H)/(T - H)$, wherein H represents the pre-corrected amount of correction, H' represents the post-corrected amount of correction, and T represents a target value.

12. An online learning method according to at least one of the preceding claims 9 to 11, characterized in that the fuzzy neural network processes by using six layers, whereas the first to fourth layers are the pre-condition layers while the fifth and sixth layers are the post-condition layers.
Full text of the specification

The present invention relates to an online learning method to correct the amount of operation required for the object of the control based on the error in the output value with respect to the target value.

In prior-art engines that inject fuel into their air intake passages, fuel injection systems have employed air/fuel ratio sensors that detect the air/fuel ratio (A/F) in the exhaust gases following combustion, and this feedback was used to control the amount of fuel injected, thereby improving such areas as engine performance and fuel economy. Using this method, it is possible to compute corrections for the amount of air intake; if the actual amount of air intake could be corrected and the amount of fuel injection controlled based on it, the current air/fuel ratio could be adjusted to the target air/fuel ratio. In actual applications, however, the amount of fuel injection and the amount of air intake vary due to a number of factors, which results in a gap between the current air/fuel ratio and the target air/fuel ratio. First, not all of the fuel that is injected into the air intake tube reaches the combustion chamber: a part of the fuel adheres to the air intake tube walls, the amount that adheres varies according to such factors as the engine operating state and the temperature of the air intake tube walls, and these factors change the amount of adhering fuel according to an evaporation time constant. Second, the amount of air intake varies according to environmental factors such as the air temperature and the barometric pressure, and with changes that occur in the engine itself over time.

To resolve the foregoing problems, Japan Patent Application Hei 9-271188 proposed the incorporation of a learning module, which, based on the operating state of the engine, could learn the estimated amount of air intake and estimated amount of fuel injection, and learn online the estimated amount of air intake and the estimated amount of fuel injection based on the difference between the target air/fuel ratio and the actual air/fuel ratio in order to perform the feed-forward control of the fuel injection, thereby comprising a control method which could deal with transition states, environmental changes, and changes over time.

However, with the foregoing control method of the prior art, if there were changes in the engine operating environment such as the air temperature or barometric pressure, learning could only be performed on the taught data (engine operating conditions) that was obtained, and large differences could develop between the part of the control characteristics that was learned and the part that was as yet unlearned. This will be explained with reference to Figure 12. Figure 12 shows the results of learning the volume efficiency (Ve) based on the engine RPM (N) and the throttle aperture (θ). Figure 12 (A) shows the results prior to ascending Mt. Fuji to station 5, and (B) shows the results after descending the mountain. The ranges of the throttle aperture that were used in ascending and descending the mountain differed; therefore the map based on learning during the descent differed from the map that was prepared before the ascent, and it was not possible to compute an accurate estimated air intake volume or to exercise appropriate engine control.

Even apart from engine control, similar issues arise with regard to partial learning and characteristic changes in robotic controls to deal with variations in their "senses".

Accordingly, it is an objective of the present invention to provide an online learning method as indicated above which reduces learning discrepancies and facilitates a more accurate control of the object of control.

According to the present invention, this objective is solved for an online learning method as indicated above in that a partial learning, carried out by a partial learning means, is combined with an overall learning, carried out by an overall learning means.

In that case, it is advantageous when said partial learning means learns the amount of operation based on the correction value of partial changes in the system and that the overall learning means learns the amount of operation based on the correction value for overall changes in the system.

When the learning rate for the overall learning means is set to be greater than the learning rate for the partial learning means, it is possible to minimize the effects of overall change upon the partial learning.

Moreover, when the amount of correction of the overall changes exceeds a certain percentage change, learning by the foregoing partial learning means is halted so that it is possible to combine partial learning with overall learning in a way that they do not interfere with each other.

Further, the foregoing object of the control may be an engine, in which case the amount of operation is the amount of fuel injection, the target and output values are air/fuel ratios, the partial learning means is a means to learn the amount of correction over time, and the overall learning means is a means to learn the amount of correction based on environmental changes. Both the overall learning means and the partial learning means use the output values as-is when the engine is operating steadily, but use a statistical processing of the output values when the engine is operating in abnormal (extreme) conditions. In this way, partial learning may be performed on changes over time and combined with overall learning that corrects for environmental changes, which reduces the learning discrepancy and allows more accurate control of the object of control.

In order to facilitate the control during the warm-up period, it is advantageous when a warm-up correction coefficient is used during the engine warm-up period, and overall learning is performed after the warm-up period has been completed.

A consecutive learning is possible when the foregoing learning module incorporates a fuzzy neural network, a Cerebellar Model Arithmetic Computer (CMAC) or a similar means.

Other preferred embodiments of the present invention are laid down in further dependent claims.

In the following, the present invention is explained in greater detail with respect to several embodiments thereof in conjunction with the accompanying drawings, wherein:

  • Figure 1 is a component diagram of an embodiment of the online learning method invention.
  • Figure 2 is an engine component diagram showing the invention applied to the control of fuel injection on an engine.
  • Figure 3 is a block diagram of the fuel injection control taking place inside the control apparatus of Figure 2.
  • Figure 4 is a block diagram showing the structure of the model base control unit of Figure 3.
  • Figure 5 is a simplified diagram of the fuzzy neural network used to determine the estimated volume efficiency in the volume efficiency computing unit of Figure 3.
  • Figure 6 is a figure showing the form of the rule map of Figure 5.
  • Figure 7 is a block diagram showing the components of the amount-of-correction computing unit for making environmental corrections as shown in Figure 3.
  • Figure 8 is a flow chart to explain the learning method used in the embodiment of Figure 3.
  • Figure 9 is a flow chart to explain another learning method for the embodiment of Figure 3.
  • Figure 10 shows graphs of control results obtained during the ascent and descent of Mt. Fuji when only partial learning was performed, and when it was combined with overall learning.
  • Figure 11 shows graphs showing the output results for air intake at the point of reaching station 5 according to Figure 10.
  • Figure 12 shows graphs showing the volume efficiency using conventional online learning: (A) is prior to the ascent to station 5 of Mt. Fuji and (B) is after descending the mountain.

The implementation of this invention will be described below with reference to the figures, which describe an embodiment of an online learning method according to this invention. This embodiment is equipped with a learning module 52 which can compute learnable control parameters based on the operating state of the object 54 of control, and which can exercise feed forward control on the object 54 being controlled, such as an engine, robot, etc.

A control unit 53 computes the amount of operation from the operating conditions, control parameters, and the overall correction coefficient and outputs to the object 54 of control. The learning module 52 computes the control parameters from the operating conditions and feeds them to the control unit 53. At that time a computer unit 55 for determining the amount of correction for partial changes computes the amount of correction to minimize the difference between the target value and the actual output value to correct the control parameters, and in addition, the learning module 52 (partial learning) learns the corrected control parameters.

On the other hand, the overall correction coefficient is memorized in the overall correction unit 57, and the overall change correction computer unit 56 computes the amount of correction to minimize the difference between the target values and the actual output values to correct the overall correction coefficient, and in addition, the overall correction unit 57 performs feed forward learning (overall learning) of the corrected overall correction coefficient. The overall correction coefficient is used by the control unit 53 to correct the amount of operation.

In this invention, the learning rate GZ for the overall learning is greater than the learning rate GB for the partial learning, for example, GZ/GB = 10 to 100, which will minimize the effects that the overall changes have on the partial changes. Here, when the target value is T, the pre-corrected amount of correction is H, the post-corrected amount of correction is H', and the learning rate (gain) is G, then:

$$H' = (T - H)\,G + H$$

Accordingly, the learning rate (gain) is:

$$G = \frac{H' - H}{T - H}$$
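The update rule can be sketched in a few lines of Python. The function names `update_correction` and `recover_gain` and all numeric values are illustrative assumptions, not taken from the specification; only the two formulas and the 10-to-100 range for GZ/GB come from the text.

```python
def update_correction(h: float, target: float, gain: float) -> float:
    """Return H' = (T - H) * G + H, moving H a fraction G of the way to T."""
    return (target - h) * gain + h


def recover_gain(h: float, h_new: float, target: float) -> float:
    """Return G = (H' - H) / (T - H); undefined when H already equals T."""
    if target == h:
        raise ValueError("gain is undefined when H equals T")
    return (h_new - h) / (target - h)


# Illustrative values only: the overall gain GZ is 20 times the partial
# gain GB, which lies inside the 10-to-100 range given in the text.
GB = 0.01
GZ = 20 * GB

h_partial = update_correction(1.0, 2.0, GB)  # small step toward the target
h_overall = update_correction(1.0, 2.0, GZ)  # larger step toward the target
```

With the same error, the overall correction moves twenty times further per cycle than the partial correction, so a change that affects every operating point is absorbed mostly by the overall coefficient.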

Figures 2 through 11 relate to an embodiment of this invention applied to the control of fuel injection in an engine. Figure 2 is a component diagram of the engine of this embodiment, wherein the four cycle engine 1 is composed of a cylinder body 2, crankshaft 3, piston 4, combustion chamber 5, air intake tube 6, air intake valve 7, exhaust valve 8, spark plug 10, and ignition coil 11. A throttle valve 12 is positioned inside the air intake tube 6, and an injector 13 is mounted upstream of the throttle valve 12. Further, a box containing an engine control unit 15 (ECU) is mounted to the wall surface of the air intake tube 6. The injector 13 is connected to the fuel tank 19 through a pressure adjustment valve 16, an electric motor driven fuel pump 17, and a filter 18.

The detection signals from a variety of sensors that detect the operating condition of the engine 1 are fed into the control unit 15. The sensors include a crank angle sensor 20, which detects the rotational angle of the crankshaft 3 (a means to detect engine RPM); an engine temperature detection means 21, which detects the temperature of the cylinder body 2 or the coolant temperature (that is, the engine temperature); a throttle aperture detection means 23, which detects the aperture of the throttle valve 12; an air intake negative pressure detection means 24, which detects the pressure inside the air intake tube; and an air intake tube wall temperature detection means 25, which detects the temperature of the air intake tube walls. The control unit 15 performs computations based on the detected values from these sensors, and transmits control signals to the injector 13, the fuel pump 17 and the ignition coil 11.

Figure 3 is a block diagram of the fuel injection control performed inside the control unit 15 of Figure 2. A model base control unit 26 computes and outputs the amount of fuel injection for the engine 1 from the engine RPM, the throttle aperture, the target A/F ratio, the volume efficiency and the overall correction coefficient. The volume efficiency computing unit 27 computes the volume efficiency from the engine RPM and the throttle aperture and transmits it to the model base control unit 26. At this time, the amount-of-correction computing unit 28, which corrects for changes over time, computes an amount of correction for changes over time that seeks to minimize the difference between the target A/F ratio and the actual A/F ratio, thereby correcting the volume efficiency; then, based on this corrected volume efficiency (learning signal), learning (described below) is performed in the volume efficiency computing unit 27 (partial learning).

On the other hand, the environmental correction unit 29 memorizes the environmental correction coefficient, and the correction computing unit 30 for the environmental correction computes the amount of the environmental correction by seeking to minimize the difference between the target A/F ratio and the actual exhaust A/F ratio; the corrected environmental correction coefficient (learning signal) is fed back to the environmental correction unit 29, where learning is performed (overall learning).

When the A/F ratio is unstable, such as directly after starting the engine, when the sensors are not yet adequately active and combustion is unstable, a switch 32 switches to the warmup correction unit 31, which computes a warmup correction coefficient based on the engine temperature, the barometric pressure when the engine is started (air intake negative pressure) and the ambient temperature (air intake wall temperature). After warmup, the switch 32 switches to the environmental correction unit 29 side. At this time, the initial value for the warmup coefficient is used as the environmental correction coefficient for the environmental correction unit. Thus, either the environmental correction coefficient or the warmup correction coefficient is used by the model base control unit 26 shown in Figure 4 to correct the amount of air intake. This environmental correction coefficient may also include the effects of temperature.

Figure 4 is a block diagram showing the structure of the model base control unit 26. This model base control unit is equipped with: a fuel system model 33, which estimates the amount of fuel intake based on the engine RPM, the throttle aperture and the air intake tube wall temperature; an air system model 34, which estimates the amount of intake air from the engine RPM, throttle aperture, and volume efficiency; a target A/F ratio setting unit 35, in which the target A/F ratio is set from the engine RPM and throttle aperture; an estimated air/fuel ratio computing unit 36, which computes the estimated air/fuel ratio from the amount of air intake and the amount of fuel intake as corrected by the overall correction coefficient; and an internal F/B (feedback) computing unit 37, which receives the output from the engine and the fuel system model 33 and then computes the amount of fuel injection based on the difference between the target A/F ratio and the estimated A/F ratio.

Figure 5 is a simplified component diagram of the fuzzy neural network inside the volume efficiency computing unit 27 of Figure 3 which determines the volume efficiency (the ratio of the volume of air entering the cylinder to the cylinder's volume). Since the volume efficiency cannot be determined with a mathematical formula, a fuzzy neural network is utilized to model the volume efficiency. The fuzzy neural network is of layered construction, composed of six processing layers. The first through fourth layers are the precondition area while layers five and six are the post-condition layers. The engine RPM and throttle aperture values fed into the precondition layers are appropriately modified according to specific rules using fuzzy logic estimating, and then the values obtained from the precondition area are used in the post-condition area to determine the estimated volume efficiency using the center of gravity method.

As shown in Figure 6, the above-mentioned rules are composed of three operating conditions A11, A21 and A31 for the engine RPM, three operating conditions A12, A22 and A32 for the throttle aperture, and conclusions R1 through R9. Figure 6 shows the rules in map form. The operating conditions A11, A21 and A31 for the engine RPM are shown on the vertical axis, while the operating conditions A12, A22 and A32 for the throttle aperture are shown on the horizontal axis. The engine RPM and throttle aperture form a two-dimensional space which is divided into 9 areas that define various operating conditions, for which there are conclusions R1 through R9.

In this case, for the foregoing operating condition A11, the engine RPM is in the "low RPM range", for the operating condition A21, the engine RPM is in the "middle RPM range" and for the operating condition A31, the engine RPM is in the "high RPM range." For the throttle aperture, at operating condition A12, the aperture is "low", at operating condition A22, the aperture is "middle", and at operating condition A32 the aperture is "high". Also, conclusions R1 through R9 represent the estimated volume efficiency levels that correspond to the magnitude of the engine RPM and throttle aperture. Nine rules are created from the operating conditions and conclusions, such as "the estimated volume efficiency is 60% when the engine RPM is in the middle RPM range and the throttle aperture is in the middle range", or "the estimated volume efficiency is 90% when the engine RPM is in the high range and the throttle aperture is in the high range", etc.

In the first four layers described above, the processing with respect to the engine RPM is separated from the processing for the throttle aperture. In the first layer, the engine RPM signal and throttle aperture signal are received respectively as input signals xi (i = 1 or 2), and in the second through fourth layers the contribution rates aij of the respective input signals xi to the operating conditions A11, A21, A31 and A12, A22, A32 are computed. Specifically, the contribution rate aij is computed using a sigmoid function f(xi) according to Formula 1.

In Formula 1, wc and wg are the central value and slope of the respective sigmoid functions. After the contribution rates aij are determined in the fourth layer using the foregoing sigmoid function, in the fifth layer Formula 2 is used to determine the degree of correspondence µi of each of the nine conclusions R1 through R9 from the contribution rates for the engine RPM and throttle aperture. Further, Formula 3 is used to normalize the µi, and in layer 6, Formula 4 is used to take the weighted average of the output values fi of the fuzzy rules (i.e. the output values corresponding to the various conclusions R1 through R9), weighted by the normalized degrees of correspondence obtained from Formula 3, to determine the estimated volume efficiency Ve. In Figure 5, wf is a coupling coefficient that corresponds to the output value fi.
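Formulas 1 through 4 are not reproduced in this text, so the following Python sketch of the forward pass rests on stated assumptions: a logistic sigmoid of the form 1 / (1 + exp(-wg (x - wc))), "low / middle / high" grades built from two sigmoids per input, inputs normalized to the range 0 to 1, and a product to combine the two contribution rates of each rule. The function names and the nine values in `WF` are hypothetical; only the 60% (middle, middle) and 90% (high, high) entries echo the examples given in the text.

```python
import math


def sigmoid(x, wc, wg):
    """Sigmoid with central value wc and slope wg (assumed form)."""
    return 1.0 / (1.0 + math.exp(-wg * (x - wc)))


def memberships(x, centers=(0.33, 0.67), slope=12.0):
    """Contribution rates for 'low', 'middle', 'high' of one normalized input.

    Assumption: the three grades are built from two sigmoids so that they
    sum to 1; the text does not spell out this construction.
    """
    s_lo = sigmoid(x, centers[0], slope)
    s_hi = sigmoid(x, centers[1], slope)
    return [1.0 - s_lo, s_lo - s_hi, s_hi]


def rule_strengths(rpm, throttle):
    """Normalized degrees of correspondence for the nine rules R1..R9."""
    a_rpm = memberships(rpm)
    a_thr = memberships(throttle)
    mu = [a * b for a in a_rpm for b in a_thr]  # product of contribution rates
    total = sum(mu)
    return [m / total for m in mu]


def estimate_ve(rpm, throttle, wf):
    """Weighted average of the nine rule outputs wf (center-of-gravity style)."""
    return sum(m * w for m, w in zip(rule_strengths(rpm, throttle), wf))


# Hypothetical rule outputs, ordered (RPM low..high) x (throttle low..high).
WF = [0.30, 0.40, 0.50,
      0.45, 0.60, 0.75,
      0.50, 0.70, 0.90]

ve_mid = estimate_ve(0.5, 0.5, WF)
```

Because the normalized strengths sum to 1, the estimate always lies between the smallest and largest rule output, and it varies smoothly as the operating point moves from one region of the rule map to the next.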

The volume efficiency computing unit 27 is equipped with learning ability. In its initial operation stages, it directly compares volume efficiencies that were determined experimentally with the volume efficiencies output from the fuzzy neural network; the coefficients wc and wg for the central value and slope of the sigmoid functions, and the coupling coefficient wf, are corrected and learned by the fuzzy neural network so as to minimize the error between the two volume efficiencies. After that, the foregoing coefficients are updated based on the volume efficiency (together with the engine RPM and throttle aperture data) corrected for the A/F discrepancy, thereby enabling the fuzzy neural network to perform online learning.
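The text does not state which learning algorithm adjusts the coefficients. As one plausible illustration, the sketch below applies a simple delta-rule (gradient) step to the coupling coefficients wf only, leaving wc and wg fixed. The function name `learn_coupling`, the gain, the rule strengths and the teacher value are all hypothetical.

```python
def learn_coupling(wf, strengths, ve_teacher, gain):
    """One online step that moves wf so as to reduce (ve_teacher - Ve).

    strengths: normalized rule strengths (summing to 1) at the current
    operating point. Rules with zero strength are left unchanged.
    """
    ve = sum(m * w for m, w in zip(strengths, wf))
    error = ve_teacher - ve
    return [w + gain * error * m for w, m in zip(wf, strengths)]


# Hypothetical operating point that fires only rules R5 and R6.
wf = [0.5] * 9
strengths = [0.0, 0.0, 0.0, 0.0, 0.7, 0.3, 0.0, 0.0, 0.0]

for _ in range(200):
    wf = learn_coupling(wf, strengths, ve_teacher=0.62, gain=0.5)

ve_after = sum(m * w for m, w in zip(strengths, wf))
```

Only the coefficients of rules that fire at the visited operating point change; the others keep their old values. This locality is what the specification calls partial learning, and it is also why regions of the map that are not visited after an environmental change can drift away from the learned regions.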

Figure 7 shows a block diagram of the amount-of-correction computing unit 28, which performs corrections over time, and the correction computing unit 30 for the environmental corrections. These units are composed of: a computing unit 40, which computes the amount of correction that minimizes the difference between the target A/F ratio and the exhaust gas A/F ratio; a computing unit 41, which computes an average A/F ratio over a specific time period from the exhaust A/F readings; a steady state determination unit 43, which determines, based on engine RPM and throttle aperture, whether or not the engine is running in a steady state; and a switch 44, which passes the exhaust A/F ratio directly to the amount-of-correction computing unit 40 when the engine is in a steady state, and by way of the average computing unit 41 when the engine is in an extreme state. In the present embodiment, the average A/F ratio is computed during extreme state operations, but it is of course possible to also perform other statistical processing. In addition, since the corrections over time have a much lower gain than do the environmental corrections, the amount-of-correction computing unit 28 may use the exhaust A/F ratio as-is even under extreme operating conditions.
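The switching between the raw and the averaged A/F ratio can be sketched as follows. The class name `AfFeedbackSelector`, the window length, and the RPM and throttle tolerances used to decide "steady state" are assumptions for illustration; the text only says that the decision is based on engine RPM and throttle aperture.

```python
from collections import deque


class AfFeedbackSelector:
    """Choose the A/F value passed on to the correction computation."""

    def __init__(self, window=10, rpm_tol=50.0, throttle_tol=1.0):
        self.history = deque(maxlen=window)  # recent exhaust A/F readings
        self.prev = None                     # previous (rpm, throttle) sample
        self.rpm_tol = rpm_tol               # hypothetical threshold
        self.throttle_tol = throttle_tol     # hypothetical threshold

    def is_steady(self, rpm, throttle):
        """Steady if RPM and throttle changed little since the last sample."""
        prev, self.prev = self.prev, (rpm, throttle)
        if prev is None:
            return False
        return (abs(rpm - prev[0]) <= self.rpm_tol
                and abs(throttle - prev[1]) <= self.throttle_tol)

    def feedback(self, rpm, throttle, exhaust_af):
        """Return the raw A/F when steady, else the windowed average."""
        self.history.append(exhaust_af)
        if self.is_steady(rpm, throttle):
            return exhaust_af
        return sum(self.history) / len(self.history)


selector = AfFeedbackSelector(window=3)
first = selector.feedback(3000, 20, 12.0)   # no previous sample: averaged
second = selector.feedback(3010, 20, 13.0)  # small change: raw value
third = selector.feedback(4000, 40, 14.0)   # large change: averaged
```

Averaging during non-steady operation keeps short A/F excursions from being learned as if they were persistent errors.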

Figure 8 is a flow chart to explain the learning method used in the embodiment shown in Figure 3.

First, in step S1, the barometric pressure and ambient temperature are memorized prior to engine starting. In step S2, the engine temperature is used to determine whether or not the warmup period has been completed; if the engine is not yet warmed up, in step S3 the barometric pressure and temperature are used to set the warmup correction coefficient, which provides the correction for warmup operations. After the completion of the warmup, in step S4, the rate of change of the throttle aperture and the engine RPM are used to determine whether the engine is operating in a normal or steady state, or in an extreme state. If the engine is operating in a steady state, in step S5 the exhaust A/F ratio is fed back; if not, in step S6 an A/F ratio averaged over time (e.g. 10 seconds) is fed back. In step S7, learning of the environmental correction coefficient (overall learning) is performed, and in step S8 online learning (partial learning) is performed regarding the volume efficiency. Thereafter, steps S4 through S8 are repeated. In the present embodiment, as explained for Figure 1, the learning rate GZ for the overall learning is large while the learning rate GB for the partial learning is smaller; for example, the GZ/GB ratio should range from 10 to 100. This keeps the effects of the environmental changes (overall changes) upon the learning of changes over time (partial learning) to an absolute minimum.
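The loop of steps S2 through S8 can be illustrated with a toy simulation. Everything in it is an assumption made for illustration: the volume efficiency map is reduced to three throttle bins, the environment is a single air-density factor `env`, the first five cycles stand in for the warmup period, step S1 is not modeled, and the names `measured_af`, `control_step` and `simulate` are hypothetical. The sketch compares combined learning with partial learning alone when only two of the three bins are visited after the environment changes.

```python
TARGET_AF = 12.0


def measured_af(env, ve_true, state, b):
    """Toy plant: A/F that results when fuel is metered from the estimate."""
    est_air = state["coeff"] * state["ve_map"][b]
    fuel = est_air / TARGET_AF
    return env * ve_true[b] / fuel


def control_step(state, b, af_now, af_avg, warmed_up, steady):
    """One pass through S2..S8 for throttle bin b; updates state in place."""
    if not warmed_up:                      # S2 -> S3: warmup correction only
        state["coeff"] = state["warmup_coeff"]
        return
    af = af_now if steady else af_avg      # S4 -> S5 (raw) or S6 (average)
    ratio = af / TARGET_AF                 # above 1: air was underestimated
    # S7: overall learning, H' = H + GZ * (T - H) with T = coeff * ratio
    state["coeff"] += state["gz"] * (state["coeff"] * ratio - state["coeff"])
    # S8: partial learning on the visited bin only, with the smaller gain GB
    ve = state["ve_map"][b]
    state["ve_map"][b] += state["gb"] * (ve * ratio - ve)


def simulate(gz, gb, env=0.8, steps=400):
    """Learn at reduced air density, then return the A/F in an unvisited bin."""
    ve_true = [0.5, 0.7, 0.9]
    state = {"coeff": 1.0, "warmup_coeff": 1.0, "gz": gz, "gb": gb,
             "ve_map": list(ve_true)}
    for k in range(steps):
        b = k % 2                          # only bins 0 and 1 are visited
        af = measured_af(env, ve_true, state, b)
        control_step(state, b, af, af, warmed_up=(k >= 5), steady=True)
    return measured_af(env, ve_true, state, 2)


af_combined = simulate(gz=0.2, gb=0.01)      # GZ/GB = 20
af_partial_only = simulate(gz=0.0, gb=0.01)  # overall learning switched off
```

In this toy model the overall coefficient absorbs most of the density change, so the unvisited bin ends up close to the target A/F, whereas with partial learning alone the unvisited bin keeps the full error. This mirrors the contrast the specification draws between Figure 12 and Figure 10, but the numbers here are not the patent's results.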

Figure 9 is a flow chart to explain another example of a learning method in the embodiment of Figure 3. In this embodiment, in step S9, when the environmental correction coefficient's rate of change exceeds a certain value K, the partial learning in step S8 is halted, and the environmental correction coefficient is learned in step S10. This method makes it possible to prevent mutual interference between overall learning and partial learning.
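A minimal sketch of this gating follows. It assumes that the "rate of change" is the relative change of the environmental correction coefficient within one learning cycle, which the text does not define precisely; the function name `learn_with_gate` and the numeric values are hypothetical.

```python
def learn_with_gate(coeff, ve, ratio, gz, gb, k_limit):
    """Return (coeff', ve', partial_learned) for one learning cycle.

    ratio is the measured A/F divided by the target A/F. Partial learning
    is skipped when the relative change of the overall coefficient in this
    cycle exceeds k_limit (the threshold K in the text).
    """
    new_coeff = coeff + gz * (coeff * ratio - coeff)
    rate_of_change = abs(new_coeff - coeff) / coeff
    if rate_of_change > k_limit:
        return new_coeff, ve, False        # S9 -> S10: overall learning only
    new_ve = ve + gb * (ve * ratio - ve)   # S8: partial learning allowed
    return new_coeff, new_ve, True


# Large environmental error: the coefficient moves by 4%, partial learning halts.
c1, v1, learned1 = learn_with_gate(1.0, 0.6, 0.80, gz=0.2, gb=0.01, k_limit=0.01)

# Small residual error: the coefficient moves by 0.2%, partial learning resumes.
c2, v2, learned2 = learn_with_gate(1.0, 0.6, 0.99, gz=0.2, gb=0.01, k_limit=0.01)
```

Compared with relying on the gain ratio alone, the gate keeps the partial map completely untouched while a large overall change is still being absorbed.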

Figure 10 is a graph showing the control results produced while ascending Mt. Fuji to station 5 and descending while performing partial learning only, and while performing both overall learning and partial learning. In the case of partial learning only, there was a larger degree of waveshape hunting and in addition, it was not possible to reach the target A/F of 12.0. When both overall learning and partial learning were performed, the waveshape was smooth, and the A/F was controlled well to about 12.0. Figure 11 shows the air intake output results when the vehicle had ascended to station 5 as in Figure 10. When only partial learning was performed, a gap developed between the partial learning output results A and the output results B that should be obtained, but when overall learning and partial learning were combined, the two matched.

An embodiment of this invention was described above, but the invention is not confined to this embodiment; any number of changes can be made. For example, a fuzzy neural network was used as the learning module in the foregoing embodiment, but it would also be possible to use a neural network or CMAC (Cerebellar Model Arithmetic Computer), etc. as a computing model with learning capabilities. The advantage of using a CMAC over a layered type of neural network is that it is capable of additional learning and that the learning is performed at a higher speed.

As has been explained above, according to the invention, the combination of the partial learning with the overall learning reduces learning discrepancies and makes possible more accurate control of the object of control.

Further, it is possible to minimize the effects of overall change upon the partial learning.

Moreover, it is possible to combine partial learning with overall learning in a way that they do not interfere with each other.

In addition, when the invention is applied to engine control, partial learning is performed on changes over time and is combined with overall learning that corrects for environmental changes to make possible the reduction of the learning discrepancy and to allow more accurate control of the object of control.

It is further possible to exercise control during the warmup period.

Moreover, according to the invention, the adoption of a fuzzy neural network, a Cerebellar Model Arithmetic Computer (CMAC) or a similar means as the learning module allows effective learning.
