Hierarchical error correction
Application No. | US13405965 | Filing Date | 2012-02-27 | Publication No. | US08914712B2 | Publication Date | 2014-12-16
Applicants | Ravindraraj Ramaraju; Ajay J. Joshi; Bobak A. Nazer | Inventors | Ravindraraj Ramaraju; Ajay J. Joshi; Bobak A. Nazer
Abstract | A data processing device can perform error detection and correction in two stages. In the first stage, error detection is performed on the load data using the in-line error detection information. If a first type of error is detected in the data segment, the error is corrected using the in-line error detection information. If a second type of error is detected, error correction is performed using the residual sum.
Claims | What is claimed is:
Description | The present disclosure relates to electronic devices, and more particularly to data error correction for electronic devices. To carry out their designated functions, electronic devices sometimes employ a processor that executes program instructions. In the course of executing the program instructions, the processor stores data to, and retrieves data from, various memory devices, such as a processor cache. However, an electronic device is sometimes subject to conditions, such as environmental variations or hardware failure, that introduce errors in the data. Accordingly, electronic devices can employ error control modules to detect, and in some cases correct, errors in data retrieved from memory.
The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art, by referencing the accompanying drawings.
To illustrate, the two-stage error correction can be performed for data retrieved from a cache, wherein the cache includes a number of cache lines and each cache line includes a set of ways. Each data segment in a cache line is associated with its own in-line error correction information (referred to as single-error-correct double-error-detect, or SEC-DED, bits) sufficient to detect up to two erroneous bits in the way and to correct a single-bit error. In addition, a residual sum is calculated for each cache line, wherein the residual sum is the sum, over a finite field, of the data segments of the cache line. In response to a request to retrieve a cache way for provision to a data processor, the SEC-DED bits are used to detect errors in the cache way. When a two-bit error is detected, the residual sum is employed to correct the errors. In one embodiment, the residual sum can be combined directly with the other data segments of the selected cache line to correct the errors.
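The disclosure does not say how the SEC-DED bits are constructed. The following is a minimal sketch, assuming a (72,64) extended Hamming code, which is one common SEC-DED construction but not necessarily the one used in the embodiments; `secded_encode` and `secded_decode` are hypothetical names. It shows the behavior the first stage relies on: a single flipped bit is corrected, while two flipped bits are only reported.

```python
from typing import Dict, List, Tuple

# Assumption: the disclosure does not specify the SEC-DED construction; a
# (72,64) extended Hamming code is used here purely for illustration.
# Hamming positions 1..71 that are not powers of two carry the 64 data bits;
# the power-of-two positions (1, 2, 4, ..., 64) correspond to the 7 check bits.
DATA_POSITIONS: List[int] = [p for p in range(1, 72) if p & (p - 1)]
POS_TO_BIT: Dict[int, int] = {p: j for j, p in enumerate(DATA_POSITIONS)}


def _parity(x: int) -> int:
    return bin(x).count("1") & 1


def _hamming_bits(data: int) -> int:
    """XOR of the Hamming positions of all set data bits (a 7-bit value)."""
    s = 0
    for j, pos in enumerate(DATA_POSITIONS):
        if (data >> j) & 1:
            s ^= pos
    return s


def secded_encode(data: int) -> int:
    """Return 8 check bits: 7 Hamming bits plus an overall parity bit (bit 7)."""
    ham = _hamming_bits(data)
    overall = _parity(data) ^ _parity(ham)
    return ham | (overall << 7)


def secded_decode(data: int, check: int) -> Tuple[str, int]:
    """Classify, and where possible correct, errors in a 64-bit segment."""
    ham, overall_bit = check & 0x7F, check >> 7
    syndrome = _hamming_bits(data) ^ ham
    odd = _parity(data) ^ _parity(ham) ^ overall_bit
    if syndrome == 0 and odd == 0:
        return "ok", data
    if odd:  # an odd number of flipped bits: treat it as a single-bit error
        if syndrome in POS_TO_BIT:
            return "corrected", data ^ (1 << POS_TO_BIT[syndrome])
        if (syndrome & (syndrome - 1)) == 0:
            return "corrected", data  # the flip hit a check bit; data is intact
        return "uncorrectable", data
    return "double", data  # even flips, nonzero syndrome: two-bit error detected
```

A `"double"` result is the case that the second stage hands off to the residual sum.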
In other embodiments, parity bits (referred to as double-error-correct triple-error-detect, or DEC-TED, bits) sufficient to detect up to three erroneous bits in the cache line and to correct a two-bit error are determined based on the residual sum. The DEC-TED bits are then employed to correct the two-bit error detected in the retrieved cache line.
The cache controller 102 is a module configured to receive load and store requests from the processor 101, wherein a load request represents a request to retrieve information (the load data) from a location designated by an address (the load address), and a store request represents a request to store information (the store data) at a location designated by an address (the store address). In response to a load or store request, the cache controller 102 provides control signaling to execute the request. The cache controller 102 can also provide additional functionality, such as address translation, request arbitration, and the like, to facilitate execution of the load and store requests. The error control module 104 is configured to perform error detection and correction operations in response to received control signaling associated with load and store requests. In particular, for a store request, the error control module 104 is configured to determine error detection and correction information based on the store data, including SEC-DED bits, residual sums of cache line data, and DEC-TED bits, as described further herein. For a load request, the error control module 104 is configured to detect errors in the load data and, if errors are detected, to correct them. To facilitate its error detection and correction operations, the error control module 104 includes SEC-DED module 110, DEC-TED module 112, and DEC-TED storage module 114. SEC-DED module 110 is configured to compute SEC-DED bits based on store data, and to perform error detection and correction on load data based on the associated SEC-DED bits.
DEC-TED module 112 is configured to perform operations associated with correcting two-bit errors in load data, including determining residual sums and, in some embodiments, determining DEC-TED bits or other error detection and correction information based on the residual sums. The DEC-TED storage module 114 is a set of storage elements configured to store information for use by the DEC-TED module 112, such as residual sums, DEC-TED bits, or other error detection and correction information. Although depicted as a separate memory for purposes of illustration, in one embodiment the DEC-TED storage module 114 is part of the cache 106.
The cache 106 is a memory module configured to store information, and to retrieve stored information, in response to store and load requests, respectively. The cache 106 is arranged as a set of cache entries, such as cache entry 107, wherein each cache entry includes a set of ways, such as way 108. The cache 106 is configured to retrieve information at the granularity of a way. That is, in response to a load request, the cache 106 retrieves the information at the way indicated by the load address. In an embodiment, the cache 106 can also store information at the granularity of a way. Each cache way includes at least two portions: a data portion that stores the data to be retrieved in response to a load request, and an error detection portion that stores the SEC-DED bits associated with the data stored at the data portion. In an embodiment, the data portion of each cache way is 64 bits (referred to as a double word), and each cache line includes 7 ways.
In operation, the cache controller 102 receives load and store requests from the processor 101. In response to a store request, the cache controller 102 provides the store data to the error control module 104, which determines SEC-DED bits based on the store data.
The error control module 104 provides the store data and the associated SEC-DED bits to the cache 106 for storage at the way indicated by the store address. As described further herein, the error control module 104 can also determine a residual sum for the cache line indicated by the store address. As used herein, a residual sum refers to a finite field sum of all data segments in a cache line. A finite field (also referred to as a Galois field) is a field containing a finite number of elements. For binary data, each bit is an element of the two-element field GF(2), in which addition is performed modulo 2, so a finite field sum of binary numbers can be calculated by combining the numbers bitwise according to an exclusive-OR (XOR) operation. In an embodiment, the residual sum for a store operation is determined by combining all the data segments of the cache line associated with the store operation according to an XOR operation, resulting in a 64-bit residual sum. As described further herein, the properties of the residual sum allow it to be used either to correct detected two-bit errors directly, or to determine error correction information that is in turn used to correct two-bit errors.
In response to a load request, the cache controller 102 provides the load address to the cache 106 via the error control module 104. In response, the cache 106 retrieves the load data from the data portion of the way indicated by the load address, along with the SEC-DED bits associated with the load data. The SEC-DED module 110 performs error detection using the SEC-DED bits. If no errors are detected, the error control module 104 provides the load data to the cache controller 102 for provision to the processor 101. If a single-bit error is detected, the SEC-DED module 110 corrects the error and provides the corrected load data to the cache controller 102. If a two-bit error is detected, the error control module 104 employs the residual sum associated with the cache line of the load data to correct the two-bit error.
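As a minimal sketch of the store path, assuming the embodiment above (7 ways per cache line, 64-bit data portions), the residual sum can be maintained as the XOR of all ways and refreshed on every store. `CacheLine`, `store`, and `residual_sum` are hypothetical names, and the per-way SEC-DED bits are omitted to keep the focus on the residual sum.

```python
from dataclasses import dataclass, field
from functools import reduce
from operator import xor
from typing import List

WAYS_PER_LINE = 7          # the embodiment in the text uses 7 ways per line
MASK64 = (1 << 64) - 1     # each data portion is a 64-bit double word


def residual_sum(segments: List[int]) -> int:
    """Finite-field (GF(2)) sum of the segments, i.e. their bitwise XOR."""
    return reduce(xor, segments, 0)


@dataclass
class CacheLine:
    # Hypothetical model of one cache line: data portions only, no SEC-DED bits.
    ways: List[int] = field(default_factory=lambda: [0] * WAYS_PER_LINE)
    residual: int = 0

    def store(self, way: int, value: int) -> None:
        """Write one double word, then recompute the line's residual sum."""
        self.ways[way] = value & MASK64
        self.residual = residual_sum(self.ways)
```

Recomputing over all ways follows the text, which recomputes the residual sum each time a way is written. Because XOR is its own inverse, an implementation could instead update it incrementally as `residual ^ old_value ^ new_value`; that alternative is not described in the disclosure.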
This can be better understood with reference to the following example.
At stage 253, the DEC-TED module 112 determines the residual sum for the double words by combining all the double words according to an XOR operation. The resulting 64-bit residual sum is stored at the DEC-TED storage module 114. In an embodiment, the error control module 104 computes the residual sum for the data stored at a cache line each time data is stored at a way of the cache line. Accordingly, the DEC-TED storage module 114 stores the most up-to-date residual sum for the data stored at each cache line of the cache 106. At stage 254, the DEC-TED module 112 determines error detection information, such as SEC-DED bits, for the residual sum. The error detection information is stored with the residual sum at the DEC-TED storage module 114.
At stage 255, a load request for way 233 is received. In response, the SEC-DED module 110 performs error detection on the data retrieved from data segment 233, using the SEC-DED bits stored with the data segment. For purposes of illustration, it is assumed that the SEC-DED module 110 detects a two-bit error in the data retrieved from data segment 233. In addition, for each data segment, the SEC-DED module 110 performs, in parallel, error detection and, for single-bit errors, error correction using the corresponding SEC-DED bits, resulting in error corrected information for each data segment. Further, the SEC-DED module 110 determines, in parallel, an error corrected residual sum by performing error detection and single-bit error correction on the residual sum associated with the cache line, using the corresponding error control information stored at the DEC-TED storage module 114. At stage 257, the DEC-TED module 112 combines, according to an XOR operation, the corrected information from each data segment (other than data segment 233, for which the two-bit error has been detected) and the corrected residual sum.
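The recombination at stage 257 is a single XOR reduction. The sketch below assumes that the other segments and the residual sum have already passed, or been corrected by, their SEC-DED checks at stage 255; `recover_segment` is a hypothetical helper.

```python
from functools import reduce
from operator import xor
from typing import List


def recover_segment(segments: List[int], residual: int, bad: int) -> int:
    """Rebuild segments[bad] from the residual sum and the other segments.

    Assumes every other segment, and the residual sum, is already error free
    (checked or single-bit corrected by SEC-DED), as at stage 255.
    """
    others = (s for i, s in enumerate(segments) if i != bad)
    return reduce(xor, others, residual)
```

For example, with segments `[0x1111, 0x2222, 0x4444, 0x8888, 0xABCD]`, flipping two bits of the third segment and calling `recover_segment(corrupted, z, 2)` with the original residual `z` returns `0x4444`.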
Because of the properties of the residual sum, the result of the XOR operation is a corrected representation of the double word stored at data segment 233. To illustrate, assume a residual sum Z is the result of combining data segments A, B, C, D, and E according to an XOR operation. Because XOR is associative and commutative, and any value combined with itself by XOR yields zero, combining B, C, D, E, and Z according to an XOR operation yields A. Thus, if data segment A is determined to have errors, an error-free representation of data segment A can be recovered by combining B, C, D, E, and Z according to an XOR operation.
At stage 353, the DEC-TED module 112 determines the residual sum for the double words by combining all the double words according to an XOR operation. At stage 354, the DEC-TED module 112 uses a parity matrix to determine DEC-TED parity bits for the residual sum. At stage 355, the parity bits are stored at the DEC-TED storage module 114. In a preferred embodiment, the parity bits are stored in the cache 106 along with the associated cache line in the cache entry 107. At stage 356, a load request for way 333 is received. In response, the SEC-DED module 110 performs error detection on the data retrieved from data segment 333, using the SEC-DED bits stored at the way. For purposes of illustration, it is assumed that the SEC-DED module 110 detects a two-bit error. In addition, for each data segment other than data segment 333, the SEC-DED module 110 performs, in parallel, error detection and, for single-bit errors, error correction using the corresponding SEC-DED bits, resulting in error corrected information for each data segment except data segment 333. At stage 358, a residual sum for the error corrected information is determined by combining the error corrected information according to an XOR operation. At stage 359, the DEC-TED module 112 uses a parity matrix to determine a set of parity bits based on this residual sum. In an embodiment, the DEC-TED module 112 uses the same parity matrix as at stage 354.
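Stages 354, 359, and 360 rely on one property: multiplying by a fixed binary parity matrix is linear over GF(2), so the parity bits of an XOR of words equal the XOR of their parity bits. The sketch below demonstrates only that property. `PARITY_ROWS` is a hypothetical 15 x 64 matrix drawn from a fixed random seed and is not a DEC-TED code, so it shows how stage 360 arrives at the parity bits of the segment that failed its SEC-DED check, but it cannot by itself correct the error at stage 361.

```python
import random
from functools import reduce
from operator import xor
from typing import List

# Hypothetical 15 x 64 binary parity matrix, one 64-bit row mask per parity bit.
# The disclosure does not give its matrix; any fixed binary matrix is linear
# over GF(2), but this random one is NOT a DEC-TED code.
_rng = random.Random(2012)
PARITY_ROWS: List[int] = [_rng.getrandbits(64) for _ in range(15)]


def parity_bits(word: int) -> int:
    """Multiply the parity matrix by a 64-bit word over GF(2)."""
    out = 0
    for r, row in enumerate(PARITY_ROWS):
        out |= (bin(row & word).count("1") & 1) << r
    return out


def parity_of_missing(stored_parity: int, others: List[int]) -> int:
    """Stage 360 sketch: P(Z) XOR P(XOR of the other segments) = P(missing)."""
    return stored_parity ^ parity_bits(reduce(xor, others, 0))
```

Here `stored_parity` plays the role of the bits computed at stage 354, and the `parity_bits` call inside `parity_of_missing` plays the role of stage 359.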
At stage 360, the DEC-TED module 112 combines, according to an XOR operation, the parity bits associated with the residual sum of the cache line with the parity bits associated with the error corrected information for each way other than way 333. Because both sets of parity bits are produced by the same parity matrix, which is linear with respect to the XOR operation, the result is the set of DEC-TED parity bits for data segment 333. Accordingly, at stage 361, the DEC-TED module 112 uses the DEC-TED parity bits produced at stage 360 to correct the double error in the data retrieved from data segment 333. In an embodiment, one or more of the stages illustrated at …
At stage 453, the DEC-TED module 112 determines the residual sum for the double words by combining all the double words according to an XOR operation. At stage 454, the DEC-TED module 112 uses a parity matrix to determine parity bits for the residual sum. At stage 455, the parity bits are stored at the DEC-TED storage module 114. At stage 456, a load request for way 433 is received. In response, the SEC-DED module 110 performs error detection on the data retrieved from data segment 433, using the SEC-DED bits stored at the way. For purposes of illustration, it is assumed that the SEC-DED module 110 detects a two-bit error. In addition, for each data segment, the SEC-DED module 110 performs, in parallel, error detection and, for single-bit errors, error correction using the corresponding SEC-DED bits, resulting in error corrected information for each data segment. At stage 458, a residual sum is determined by combining, according to an XOR operation, the data retrieved from data segment 433 and the error corrected information from the other data segments. At stage 459, the DEC-TED module 112 retrieves the parity bits for the cache line of cache entry 107 that were determined at stage 454, and uses them to perform error detection on the residual sum determined at stage 458. Because the residual sum determined at stage 458 differs from the stored residual sum only in the bits that are in error in data segment 433, the error detection indicates the bit positions of the erroneous bits in the data retrieved from data segment 433.
Accordingly, at stage 460, the DEC-TED module 112 inverts the data at the corresponding bit positions of way 433, thereby correcting the data. For example, stage 459 can indicate that errors were detected at bit positions 5 and 18 of the residual sum determined at stage 458. Accordingly, at stage 460, the DEC-TED module 112 inverts the data at bit positions 5 and 18 of the data retrieved from way 433, thus correcting the data. In an embodiment, one or more of the stages illustrated at …
If an error is detected at block 508, the method flow proceeds to block 512, and the error control module 104 determines whether the detected error is a single-bit or double-bit error. In response to detecting a single-bit error, the method flow moves to block 514, and the error control module 104 corrects the error using the corresponding error correction value determined at block 502. The method flow proceeds to block 510, and the corrected data is provided to the cache controller 102. If, at block 512, the error control module 104 determines that the error is a double-bit error, the method flow proceeds to block 516, and the error control module 104 corrects the error using the residual sum for the cache entry 107 determined at block 504. The method flow proceeds to block 510, and the corrected data is provided to the cache controller 102.
It will be appreciated from the foregoing description of hierarchical error correction of a cache line in a cache entry that a way can take the place of a data segment, so that all the operations described for data segments can be applied across multiple ways of a cache entry. Note also that the operations described are not limited to the data segments of a particular way; they can be performed over any data sets in a cache entry, where the data sets may be interleaved by way or by column.
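The decision flow of blocks 508 through 516 can be sketched end to end. This is a minimal sketch under several assumptions: a compact (72,64) extended Hamming code stands in for the unspecified SEC-DED code, the residual sum carries its own SEC-DED bits as at stage 254, and the double-error path uses the direct recombination of stage 257 rather than the DEC-TED variants. `encode`, `check`, and `load` are hypothetical names.

```python
from functools import reduce
from operator import xor
from typing import List, Tuple

# Assumption: compact (72,64) extended Hamming code standing in for the
# unspecified SEC-DED code; data bit j sits at Hamming position _POS[j].
_POS = [p for p in range(1, 72) if p & (p - 1)]
_BIT = {p: j for j, p in enumerate(_POS)}


def _par(x: int) -> int:
    return bin(x).count("1") & 1


def encode(d: int) -> int:
    """SEC-DED check bits: 7 Hamming bits plus an overall parity bit (bit 7)."""
    ham = reduce(xor, (p for j, p in enumerate(_POS) if (d >> j) & 1), 0)
    return ham | ((_par(d) ^ _par(ham)) << 7)


def check(d: int, c: int) -> Tuple[str, int]:
    """First stage: return ("ok" | "single" | "double", data after any fix)."""
    ham = c & 0x7F
    syn = (encode(d) & 0x7F) ^ ham
    odd = _par(d) ^ _par(ham) ^ (c >> 7)
    if syn == 0 and odd == 0:
        return "ok", d
    if odd:
        # Flip the data bit the syndrome points at; any other syndrome is
        # treated as a check-bit error (3+ flips are beyond SEC-DED anyway).
        fixed = d ^ (1 << _BIT[syn]) if syn in _BIT else d
        return "single", fixed
    return "double", d


def load(ways: List[int], checks: List[int], residual: int, res_check: int, idx: int) -> int:
    """Two-stage load of ways[idx], following blocks 508 to 516."""
    status, data = check(ways[idx], checks[idx])
    if status != "double":
        return data  # no error, or single-bit error corrected (block 514)
    # Block 516: XOR the SEC-DED-checked residual sum with the other ways.
    _, z = check(residual, res_check)
    others = (check(w, c)[1] for i, (w, c) in enumerate(zip(ways, checks)) if i != idx)
    return reduce(xor, others, z)
```

Note that the other ways and the residual sum each pass through their own single-bit correction before the recombination, so a two-bit error in the requested way can still be repaired when a neighboring way or the residual sum has a single-bit error.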
Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed. Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims.