Multi-processor device with groups of processors and respective separate external bus interfaces
The present invention intends to provide a high-performance multi-processor device in which independent buses and external bus interfaces are provided for each group of processors of different architectures when a single chip includes a plurality of multi-processor groups. A multi-processor device of the present invention comprises, over a single semiconductor chip, a plurality of processors including first and second groups of processors of different architectures such as CPUs, SIMD type super-parallel processors, and DSPs, a first bus which is a CPU bus to which the first processor group is coupled, a second bus which is an internal peripheral bus to which the second processor group is coupled, independent of the first bus, a first external bus interface to which the first bus is coupled, and a second external bus interface to which the second bus is coupled.
This application is a continuation of application Ser. No. 11/970,732 filed Jan. 8, 2008, now U.S. Pat. No. 8,200,878. The disclosure of Japanese Patent Application No. 2007-11367 filed on Jan. 22, 2007, including the specification, drawings and abstract, is also incorporated herein by reference in its entirety.
The present invention relates to optimal bus configurations and layouts of components of a multi-processor device in which a plurality of groups of processors are implemented in a single LSI.
In multi-processor devices in which multiple processors of the same architecture and multiple processors of different architectures, such as CPUs and DSPs, are implemented over a single semiconductor chip, the following bus configurations have been used. In one configuration, all of the processors are coupled to a single bus, as described in Non-Patent Document 1 mentioned below. In another configuration, to couple multiple processors using the same protocol to a bus, local buses are provided for each CPU and the local buses are coupled together with a bridge, as described in Non-Patent Document 2 mentioned below.
In the case where all multiple processors are coupled to a single bus, the processors are coupled to the same bus, whether the LSI multi-processor device is equipped with one external bus interface or multiple external bus interfaces.
In the case where multiple local buses are coupled together with a bridge, one processor is coupled to a local bus, the respective local buses are coupled to a single bus master, and a single bus is coupled to an external bus interface.
[Non-Patent Document 1]
[Non-Patent Document 2]
However, if multiple processors of different architectures are coupled to a single bus, the following problem arises, because such processors generally differ in processing performance and speed: low-speed processors impair the operation of high-speed processors, and the performance of the high-speed processors deteriorates. If the multi-processor device includes CPUs together with processors that are mainly for data processing, such as DSPs and SIMD type super-parallel processors, a further problem arises because the DSPs and SIMD type super-parallel processors handle a large amount of data: the CPUs have to wait a long time before accessing the bus, and the benefit of the enhanced performance of the multi-processor device is not fully realized.
With regard to coherency between caches, coherency is ensured for multiple processors of the same architecture, but in practice cache coherency between different-architecture processors is not ensured, which presented an inconsistency problem.
If a multi-processor oriented OS is run, it is often enabled only for processors of the same architecture, because different-architecture processors are supplied by different developers and an OS designed for all of these processors is rarely produced. Therefore, separate OSs must be provided for different-architecture processors. When processors running different OSs are connected to a single bus, each OS shares the bus with bus master IP that is unknown to it. This posed a problem in which performance-enhancing features of the multi-processor oriented OS, such as scheduling, are impaired.
Even when multiple local buses are coupled together with a bridge, the respective local buses are coupled to a single bus master and, therefore, a combination of a CPU and a local bus is regarded as a single CPU. This posed the same problem as described above for the situation where different-architecture processors are connected to the same bus.
Because the processors are coupled to the same bus, whether the LSI multi-processor device is equipped with one external bus interface or multiple external bus interfaces, the following problem was presented. A bus portion to which an external bus interface is coupled is blocked by a request for access to the external bus from another bus and cannot yield the desired performance. A bus portion to which an external bus interface is not coupled experiences performance deterioration when access to the external bus interface from another bus occurs.
Therefore, the present invention has been made to solve the above problems and intends to provide a high-performance multi-processor device in which independent buses and external bus interfaces are provided for each group of processors of different architectures.
In one embodiment of the present invention, a multi-processor device comprises, over a single semiconductor chip, a plurality of processors including a first group of processors and a second group of processors, a first bus to which the first group of processors is coupled, a second bus to which the second group of processors is coupled, a first external bus interface to which the first bus is coupled, and a second external bus interface to which the second bus is coupled.
According to one embodiment of the present invention, when a plurality of groups of processors are implemented on a single semiconductor chip, independent buses and external bus interfaces are provided for each group of processors of different architectures. With this configuration, each group of processors can operate independently and, therefore, coordination and bus contention between processors are reduced. It is thus possible to realize, at low cost, a high-performance multi-processor system that consumes low power.
[Embodiment 1]
The CPUs operate internally at 533 MHz at maximum. The operating frequency of each CPU is converted by a bus interface inside the CPU, so that the CPU is coupled to the CPU bus 10 at 266 MHz at maximum. The secondary cache 12 and the DDR2 I/F 13 operate at 266 MHz at maximum.
The LSI device of the present invention has an internal peripheral bus 14 (a second bus) in addition to the CPU bus 10 on the same semiconductor chip. To the internal peripheral bus 14, a peripheral circuit 15 including ICU (interrupt controller), ITIM (interval timer), UART (Universal Asynchronous Receiver Transmitter: clock asynchronous serial I/O), CSIO (clock synchronous serial I/O), CLKC (clock controller), etc., a DMAC 16 (DMA controller), a built-in SRAM 17, SMP-structure matrix type super-parallel processors (SIMD type super-parallel processors 31, 32, a second group of processors), an external bus controller 18 (a second external bus interface), and a CPU 19 of another architecture are coupled. The internal peripheral bus 14 is coupled to an external bus 2 via the external bus controller 18, thereby forming an external bus access path for connection to external devices such as SDRAM, ROM, RAM, and IO.
The internal peripheral bus 14 operates at 133 MHz at maximum and the DMAC 16, built-in SRAM 17, and peripheral circuit 15 also operate at 133 MHz at maximum. The SIMD type super-parallel processors operate internally at 266 MHz at maximum. The operating frequency of each super-parallel processor is converted by a bus interface inside it to couple the processor to the internal peripheral bus 14. Likewise, the CPU 19 operates internally at 266 MHz at maximum and this operating frequency is converted by a bus interface inside it to couple it to the internal peripheral bus 14. Because there is a difference in processing performance and speed between the processor clusters, as described above, these processor clusters are controlled using separate clocks and differ in frequency and phase.
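The maximum frequencies quoted above can be tabulated to see the core-to-bus ratio that each internal bus interface has to bridge. The following is a minimal sketch only: the dictionary names and the helper `core_to_bus_ratio` are hypothetical, and the figures are simply the maximum values stated in this embodiment.

```python
from fractions import Fraction

# Maximum internal core frequencies in MHz, as stated in Embodiment 1.
CORE_MHZ = {
    "CPU1-CPU8": 533,
    "SIMD 31/32": 266,
    "CPU 19": 266,
}

# Maximum bus frequencies in MHz, as stated in Embodiment 1.
BUS_MHZ = {
    "CPU bus 10": 266,
    "internal peripheral bus 14": 133,
}

# Which bus each processor group is coupled to.
ATTACHED_BUS = {
    "CPU1-CPU8": "CPU bus 10",
    "SIMD 31/32": "internal peripheral bus 14",
    "CPU 19": "internal peripheral bus 14",
}


def core_to_bus_ratio(group: str) -> Fraction:
    """Ratio of a group's core clock to the clock of the bus it is coupled to."""
    return Fraction(CORE_MHZ[group], BUS_MHZ[ATTACHED_BUS[group]])


for group in CORE_MHZ:
    ratio = core_to_bus_ratio(group)
    print(f"{group}: {float(ratio):.3f} x {ATTACHED_BUS[group]}")
```

With the rounded figures given in the text, every group runs at roughly twice the frequency of its bus, while the two buses themselves differ by a factor of two.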
The CPU bus 10 and the internal peripheral bus 14 are coupled through the secondary cache 12. Therefore, the CPUs CPU1 through CPU8 not only can access the external bus 1 through the secondary cache 12 and via the DDR2 I/F 13, but also can access resources on the internal peripheral bus 14 through the secondary cache 12. Thus, the CPUs CPU1 through CPU8 can access the other external bus 2 via the external bus controller 18, though this path is long and the frequency of the internal peripheral bus is lower, resulting in lower data transfer performance. The modules that are coupled to the internal peripheral bus 14 can access the external bus 2 via the external bus controller 18, but cannot access the external bus 1.
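The access paths described above can be checked with a small directed graph. This is an illustrative sketch, not part of the disclosed device: the edge list is one reading of the paragraph, each edge means "can forward an access to", and the names `EDGES` and `hops_from` are hypothetical.

```python
from collections import deque

# Directed edges: an access issued at the key can be forwarded to each listed node.
# The secondary cache 12 forwards toward the peripheral bus, but not the reverse,
# which is why peripheral-bus modules cannot reach external bus 1.
EDGES = {
    "CPU bus 10": ["secondary cache 12"],
    "secondary cache 12": ["DDR2 I/F 13", "internal peripheral bus 14"],
    "DDR2 I/F 13": ["external bus 1"],
    "internal peripheral bus 14": ["external bus controller 18"],
    "external bus controller 18": ["external bus 2"],
    "SIMD 31": ["internal peripheral bus 14"],
    "SIMD 32": ["internal peripheral bus 14"],
    "CPU 19": ["internal peripheral bus 14"],
    "DMAC 16": ["internal peripheral bus 14"],
}
for i in range(1, 9):
    EDGES[f"CPU{i}"] = ["CPU bus 10"]


def hops_from(master: str) -> dict:
    """Breadth-first search: number of hops from a bus master to every reachable node."""
    dist = {master: 0}
    queue = deque([master])
    while queue:
        node = queue.popleft()
        for nxt in EDGES.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


cpu_paths = hops_from("CPU1")
simd_paths = hops_from("SIMD 31")
print(cpu_paths["external bus 1"], cpu_paths["external bus 2"])
print("external bus 1" in simd_paths)
```

In this model a CPU reaches the external bus 2 through one more hop than the external bus 1, and the SIMD type super-parallel processors reach only the external bus 2.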
The CPUs CPU1 through CPU8 are of the same architecture. The contents of the primary and secondary caches are coherency controlled so as to be consistent, so there is no need to worry about malfunction of the CPUs. Even in a case where a multi-processor oriented OS is used, high performance can be delivered, because only the eight CPUs of the same architecture and the secondary cache 12 are connected to the CPU bus 10 and the external bus 1 is accessible only from the CPUs CPU1 through CPU8. In particular, the SIMD type super-parallel processors operate at lower speed than the CPUs and handle a large amount of data when they process data. Consequently, these processors are liable to occupy the bus for a long time. However, this does not affect the data transfer on the CPU bus 10, because the SIMD type super-parallel processors access the external bus 2 through the internal peripheral bus 14.
From the viewpoint of the SIMD type super-parallel processors, the CPUs primarily use the path from the CPU bus 10 to the external bus 1. Therefore, there is no need to release the internal peripheral bus 14 for the CPUs during data transfer, and efficient data transfer can be performed. This effect is especially significant because the multi-processor consists of a plurality of CPUs. In this embodiment of the invention, there are eight CPUs in the multi-processor device. However, in a case where 16, 32, or more processors share the same bus with the SIMD type super-parallel processors oriented to data processing, data processing latency occurs. If the present invention is applied to such a case, its effect will be more significant.
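The effect of separating the buses can be illustrated with a toy contention model. This is a rough sketch under stated assumptions, not a model of the actual device: the bus serves one transfer at a time in arrival order (FIFO arbitration is an assumption), and the request periods and burst lengths are made-up numbers chosen only to show the trend. The function name `cpu_wait_cycles` is hypothetical.

```python
def cpu_wait_cycles(requests):
    """Total cycles that CPU requests spend waiting for a bus.

    requests: iterable of (arrival_cycle, length_cycles, owner) tuples.
    The bus serves one transfer at a time, in arrival order.
    """
    bus_free_at = 0
    waited = 0
    for arrival, length, owner in sorted(requests):
        start = max(arrival, bus_free_at)  # wait until the bus is released
        if owner == "cpu":
            waited += start - arrival
        bus_free_at = start + length
    return waited


# Illustrative traffic (assumed values): short CPU accesses, long SIMD bursts.
cpu_reqs = [(t, 4, "cpu") for t in range(0, 400, 20)]
simd_reqs = [(t, 64, "simd") for t in range(0, 400, 100)]

shared = cpu_wait_cycles(cpu_reqs + simd_reqs)  # everything on a single bus
separate = cpu_wait_cycles(cpu_reqs)            # SIMD traffic on its own bus

print("CPU wait cycles, shared bus:   ", shared)
print("CPU wait cycles, separate buses:", separate)
```

With this traffic the CPU requests never wait when the SIMD bursts are moved to a separate bus, whereas on a shared bus each burst delays the CPU requests that arrive while it is in progress. Adding more CPUs to the shared case only increases the number of requests stalled behind each burst.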
The CPU 19 is a small microprocessor whose operating speed and processing performance are lower than those of the CPUs CPU1 through CPU8, but it consumes less power and occupies a smaller area. This CPU can perform operations that do not require arithmetic processing performance, such as activating the peripheral circuit 15, checking a timer, and power management using CLKC. Therefore, even if the CPU 19 shares the same bus with the SIMD type super-parallel processors, it does not deteriorate the performance of the SIMD type super-parallel processors.
[Embodiment 2]
By laying out the components of the multi-processor device as illustrated in
There is a difference in operating frequency and arithmetic processing capability between the internal peripheral bus region 21 and the CPU bus region 20 and, consequently, these regions have different power consumptions. Low-impedance wiring is required in the CPU bus region 20, which has a higher clock frequency and larger power consumption. Relatively high impedance is allowable in the internal peripheral bus region 21, which has a lower clock frequency and smaller power consumption. Low-impedance wiring in the region with larger power consumption can be implemented by wiring of wide lines or closely spaced wiring. As an adverse effect of this, the wired voltage supply/GND lines 22 occupy more area in the wiring layer, which makes wiring of other signal lines and the like difficult. As a result, the LSI device area and cost increase, and additional roundabout wiring of signal lines increases wiring capacitance, which in turn increases power consumption. If these regions are scattered and intermixed, low-impedance wiring has to be performed throughout the device area to ensure stable operation. However, this makes the device area larger and the cost higher.
In the layout where the device area is divided into the CPU bus region 20 with larger power consumption and the internal peripheral bus region 21 with smaller power consumption, as shown in
In
In the present embodiment, the external bus 1 and the external bus 2 are disposed apart from each other at the top and bottom edges of the chip. Because the external bus controller 18 and the DDR2 I/F 13 have high driving capability, they consume a large amount of power and are prone to produce power-supply noise or the like. However, in the layout of the present embodiment, the external bus controller 18, the DDR2 I/F 13, and the CPUs, which carry large current, are disposed apart from each other. Local concentration of power does not take place and therefore heat generation is uniform throughout the chip. The external bus controller 18, the DDR2 I/F 13, and the CPUs are sensitive to noise and temperature change. However, as they are placed apart from each other, the influence of noise and heat generation on each other is reduced.
By thus disposing the modules with larger power consumption, which are sensitive to noise, apart from each other, mutual noise interference is reduced. Hence, the multi-processor device can be designed with an estimate of a smaller margin for noise. Since power consumption is uniform throughout the device and there is no local power concentration, wiring of voltage supply lines can be simplified. Besides, there is no local heat generation and the device can be designed with an estimate of a smaller margin for temperature change. Therefore, it is possible to realize at low cost the LSI device occupying a small area and consuming low power, while assuring stable operation.
[Embodiment 3]
As regards the positional relationship between the CPU bus controller module, in most cases of layout using an automatic wiring tool, buses are wired between each CPU and the CPU bus controller module as shown in
In bus wiring to the built-in SRAM 17, a buffer circuit 24 is placed at a branch point from the internal peripheral bus 14. Doing so can prevent a decrease in the speed of the internal peripheral bus 14 and an increase in its power consumption due to extended wiring of the internal peripheral bus 14. Insertion of the buffer circuit 24 poses no problem, because high-speed access to the built-in SRAM 17 is not required.
[Embodiment 4]
The CPUs CPU1 through CPU8 are of the same architecture. The contents of the primary and secondary caches are coherency controlled so as to be consistent, so there is no need to worry about malfunction of the CPUs. Even in a case where a multi-processor oriented OS is used, high performance can be delivered, because only the eight CPUs of the same architecture, the secondary cache 12, and the bus bridge 25 are connected to the CPU bus 10, and the external bus 1 is mostly accessed from the CPUs CPU1 through CPU8 and only infrequently accessed from the modules coupled to the internal peripheral bus 14.
Other configuration details and effects are the same as for Embodiment 1 and, therefore, description thereof is not repeated.
[Embodiment 5]
[Embodiment 6]
In the present invention, a bus clock which is presented in
When a clock divided by n and Sync. are compared, the number of times of switching (switching frequency) is the same for both, but the phase of a clock divided by n must be exactly aligned with the phase of the CPU clock, whereas this is not required for Sync. Therefore, using Sync. eliminates a need for an unnecessarily large buffer and a buffer for generating a delay which introduces inefficiency, thus making it possible to realize at low cost the LSI device occupying a small area and consuming low power.
As regards the quality of the clock of the CPU bus 10, in the case of
While the relationship between the CPU clock and CPU bus clock was explained in the present embodiment, the same is true for the relationship between the clock of the SIMD type super-parallel processors and the clock of the internal peripheral bus 14 as well as the relationship between the clock of the CPU 19 and the clock of the internal peripheral bus 14.
[Embodiment 7]