A video decoding apparatus includes: a decoding unit which derives a flag regarding a motion vector from an encoded video stream; a comparing unit which determines whether or not motion vectors of adjacent blocks are equal to each other; a block combining unit which combines the adjacent blocks determined as being equal in motion vector, into one motion compensation block on which motion compensation is to be performed; a motion vector generating unit which generates a motion vector; a reference image obtaining unit which obtains a reference image corresponding to the motion compensation block from reference image data stored in a memory; a motion compensating unit which generates a prediction image corresponding to the motion compensation block; and an adder which reconstructs an image using the prediction image generated by the motion compensating unit.
This is a continuation application of PCT International Application No. PCT/JP2012/004154 filed on Jun. 27, 2012, designating the United States of America, which is based on and claims priority of Japanese Patent Application No. 2011-192066 filed on Sep. 2, 2011. The entire disclosures of the above-identified applications, including the specifications, drawings and claims are incorporated herein by reference in their entirety.
The present disclosure relates to a video decoding apparatus and a video decoding method for decoding an encoded video stream encoded using motion estimation.
In recent years, with the development of multimedia technology, every kind of information such as video, still images, audio, and text is distributed as a digital signal. For video encoding in particular, the video encoding technologies are used as the standards, such as: H.261 and H.263 standardized by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T); and Motion Picture Experts Group (MPEG)-1, MPEG-2, and MPEG-4 standardized by the International Organization for Standardization/the International Electrotechnical Commission (ISO/IEC).
Moreover, H.264 (MPEG-4 AVC) and VP8, for example, have attracted attention in recent years as encoding methods that achieve higher compression rates and suit a wide range of applications, including mobile data terminals typified by smartphones and network distribution. H.264 (MPEG-4 AVC) is a standard developed by the ITU-T together with the ISO. VP8 is not a standard, but a manufacturer-specific video encoding specification developed by Google Inc.
In general, in video encoding, the amount of information is compressed by reducing redundancies in the time direction and the spatial direction.
In inter-picture prediction encoding performed to reduce the redundancy in the time direction, a motion vector is firstly calculated through motion estimation performed on a block-by-block basis with reference to an image preceding or following a current image to be encoded. Next, a prediction image is generated based on a block indicated by the motion vector. Then, the motion vector and a value of difference between the obtained prediction image and the current image are encoded to generate an encoded video stream.
To decode the encoded video stream that is encoded according to the method described above, motion compensation and reconstruction are performed. By motion compensation, the motion vector indicating reference image data on an image previously decoded is decoded and the prediction image calculated based on the motion vector is generated. By reconstruction, an original image is reconstructed by adding the prediction image generated by motion compensation to a difference value obtained from the encoded video stream.
Motion compensation is usually performed on a macroblock-by-macroblock basis, the macroblock having a size of 16×16 pixels. To be more specific, since one or more motion vectors are obtained for each macroblock, a decoding apparatus can reconstruct an image by reading data on an image region indicated by the motion vectors from reference image data (from a frame memory, for example) and adding the read data to a value of difference from an original image obtained from the encoded video stream.
Moreover, the motion vector (referred to as the “MV” hereafter) is not usually encoded as it is. As shown by Equation 1 below, the MV includes a motion vector predictor (referred to as the “PMV” or the “MV predictor” hereafter) and a motion vector difference (referred to as the “MVD” or the “MV difference” hereafter). Thus, the PMV and the MVD are separately encoded.
MV = PMV + MVD (Equation 1)
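As a plain-language sketch (not taken from any standard text), Equation 1 amounts to component-wise addition of two small vectors:

```python
# Sketch of Equation 1: the decoder reconstructs the motion vector (MV)
# by adding the decoded difference (MVD) to the predictor (PMV).
# Vectors are (x, y) pairs; the units (full-pel, quarter-pel, ...) depend
# on the standard, and the names here are illustrative.

def reconstruct_mv(pmv, mvd):
    """MV = PMV + MVD, applied to each component."""
    return (pmv[0] + mvd[0], pmv[1] + mvd[1])

mv = reconstruct_mv(pmv=(4, -2), mvd=(1, 3))  # -> (5, 1)
```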
In the case of the PMV according to H.264 (see Non Patent Literature (NPL) 1), an adjacent block having the MV to be used or a method of deriving the PMV is predetermined based on the size of a block on which motion compensation is to be performed (this size is also referred to as the macroblock type). Moreover, it is predetermined for the MVD to be encoded into the encoded video stream based on the macroblock type (for example, according to H.264, the MVD is encoded in the case of an inter-macroblock that is not a skip macroblock).
In the case of the PMV according to VP8 (see NPL 2), the encoded video stream includes an encoded flag indicating: which adjacent block has the MV to be used; whether or not the value of the MV is 0; or whether or not the MVD is included in the encoded video stream. Thus, the MVD is encoded in the encoded video stream based on the value of the flag.
It should be obvious that the MV may also be encoded as it is in the encoded video stream. In any case, how the MV is included in the encoded video stream is defined by the corresponding video encoding standard.
Here, a decoding circuit that executes decoding described above is usually configured to temporarily store the decoded image data into an external memory. For this reason, to perform image decoding using motion estimation, the reference image data needs to be read from the external memory.
For example, when MPEG-2 is employed, a macroblock is divided into two regions and motion estimation can be performed for each of the divided regions (hereafter, a region on which motion estimation is to be performed is referred to as a “block” or a “motion compensation block”). Moreover, according to H.264 and VP8, a macroblock can be divided into 16 blocks each having a size of 4×4 pixels. When the macroblock is divided into blocks in this way, the reference image data indicated by the motion vector is read from the external memory for each of the regions corresponding to the divided blocks. As a result, the larger the number of divided blocks, the more times the memory is accessed. Furthermore, a problem arises in that data traffic between the memory and the decoding circuit increases, meaning that the required memory bandwidth increases.
Moreover, the motion vector does not always indicate an integer pixel position of the reference image; it can also indicate a sub-pixel position of the reference image (such as a half-pixel position or a quarter-pixel position). To calculate a prediction image at a sub-pixel position, filtering needs to be performed using the reference image indicated by the motion vector and its peripheral pixels.
For example, in motion compensation performed according to H.264 or VP8, a 6-tap filter may be used (see NPL 1 and NPL 2).
In the diagram, a cross indicates a prediction pixel calculated after motion compensation, and a circle indicates a pixel necessary for motion compensation performed using the 6-tap filter. To be more specific, for motion compensation performed on a block having the partition size of 4×4 (indicated by the solid line in
Accordingly, when the partition size is 16×16 for example, reference image data necessary for generating a prediction image for a luminance component is 21×21 pixels in size as shown in
On the other hand, when the partition size is 4×4, a region of 9×9 pixels needs to be read as shown in
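The growth in reference data to be read can be checked with simple arithmetic. Assuming the (W + 5) × (H + 5) region implied by a 6-tap filter (as the text describes for 16×16 and 4×4 partitions):

```python
# For an N-tap filter, a W x H partition needs (W + N - 1) x (H + N - 1)
# reference pixels for sub-pixel interpolation; with a 6-tap filter this
# is (W + 5) x (H + 5).

def ref_pixels(w, h, taps=6):
    return (w + taps - 1) * (h + taps - 1)

# One 16x16 partition covering the macroblock:
whole_mb = ref_pixels(16, 16)     # 21 * 21 = 441 pixels
# Sixteen 4x4 partitions covering the same macroblock:
split_mb = 16 * ref_pixels(4, 4)  # 16 * (9 * 9) = 1296 pixels
```

In the worst case, splitting into 4×4 partitions thus roughly triples the reference data to be read for one macroblock, which is the memory-bandwidth problem described above.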
To address the problem of the increased amount of reference image data to be read as described above, a video decoding circuit has been proposed (see Patent Literature (PTL) 1 and PTL 2, for example). This video decoding circuit reduces the amount of reference image data to be read, by combining the reference image data necessary for one macroblock into a single two-dimensional data region.
The order in which horizontal filtering and vertical filtering are performed, filter coefficients, and rounding are different depending on the video encoding standard (see NPL 1 and NPL 2, for example). FIG. 12A is a diagram showing: pixels (each indicated by a blank circle) necessary for motion compensation in the case where the partition size is 4×4; output pixels (each indicated by a filled square) after horizontal filtering; and output pixels (each indicated by a cross) after motion compensation.
Suppose the case where the prediction image for the luminance component of one macroblock (256 bytes) is to be generated. In this case, when the partition size is 4×4, 6-tap filtering needs to be performed 4 (pixels)×9 (pixels)=36 times in horizontal filtering (indicated by the filled squares in FIG. 12A) and 4 (pixels)×4 (pixels)=16 times in vertical filtering (indicated by the crosses in FIG. 12A). In other words, 6-tap filtering needs to be performed (36+16)×16 (the number of partitions)=832 times for one macroblock.
On the other hand, when the partition size is 4×8, horizontal filtering needs to be performed 4 (pixels)×13 (pixels)=52 times (indicated by the filled squares in FIG. 12B) and vertical filtering 4 (pixels)×8 (pixels)=32 times (indicated by the crosses in FIG. 12B). In other words, 6-tap filtering needs to be performed (52+32)×8 (the number of partitions)=672 times for one macroblock, which is fewer than in the case where the partition size is 4×4. Similarly, when the partition size is 16×16, horizontal filtering needs to be performed 16 (pixels)×21 (pixels)=336 times and vertical filtering 16 (pixels)×16 (pixels)=256 times. In other words, 6-tap filtering needs to be performed (336+256)×1 (the number of partitions)=592 times for one macroblock, reducing the count further.
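The per-macroblock filtering counts above follow from a simple formula: per W×H partition, horizontal filtering produces W × (H + 5) intermediate pixels and vertical filtering produces W × H final pixels. A sketch of this counting:

```python
# Throughput of 6-tap filtering per 16x16 macroblock, counted as in the
# text: per W x H partition, horizontal filtering outputs W * (H + 5)
# pixels and vertical filtering outputs W * H pixels; the macroblock
# holds (16 // W) * (16 // H) such partitions.

def filter_ops(w, h):
    horizontal = w * (h + 5)
    vertical = w * h
    partitions = (16 // w) * (16 // h)
    return (horizontal + vertical) * partitions

ops_4x4 = filter_ops(4, 4)      # (36 + 16) * 16 = 832
ops_4x8 = filter_ops(4, 8)      # (52 + 32) * 8  = 672
ops_16x16 = filter_ops(16, 16)  # (336 + 256) * 1 = 592
```

This makes the trend explicit: the smaller the partition, the more filtering operations one macroblock requires.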
More specifically, when the partition size is smaller, the number of times filtering needs to be performed increases. This has caused a problem of performance degradation.
Here, processing performance can be increased by a circuit having a configuration whereby a plurality of pixels can be outputted at one time in the horizontal or vertical direction through filtering. For example, when the partition size is 4×4 (as in
Thus, even in this case, when the partition size is smaller, the number of times filtering needs to be performed increases. This leads to a problem of performance degradation.
As in the case of the technology described above, the amount of reference image data to be read can be reduced by combining the reference images necessary for one macroblock into one image region. However, motion compensation, or more specifically, filtering, needs to be performed for each partition. In addition, when the partition size is smaller, the processing load for filtering increases. This results in degradation of processing performance, which may in turn interfere with acceleration in decoding. Moreover, an increase in the operating frequency for acceleration may cause a problem of increased power consumption.
The present disclosure is conceived in view of the aforementioned problem, and has an object to provide a video decoding apparatus and a video decoding method capable of implementing motion compensation at high speed with low power consumption by reducing the number of pixels to be read as reference image data from a frame memory when decoding an encoded stream encoded using motion estimation.
A video decoding apparatus according to an aspect of the present disclosure decodes an encoded video stream encoded using motion estimation performed on a block-by-block basis. To be more specific, the video decoding apparatus includes: a decoding unit which decodes the encoded video stream to derive a flag regarding a motion vector, the flag indicating one of (i) a prediction direction indicating that the motion vector is equal to a motion vector of an adjacent block, (ii) that the motion vector is 0, and (iii) that difference information on the motion vector is encoded in the encoded video stream; a motion vector comparing unit which determines whether or not a plurality of the motion vectors of adjacent blocks are equal to each other, using a plurality of the flags regarding the motion vectors of the adjacent blocks, the flags being derived by the decoding unit; a block combining unit which combines the adjacent blocks determined by the motion vector comparing unit as being equal in motion vector, into one motion compensation block on which motion compensation is to be performed; a motion vector generating unit which generates a motion vector based on the flag regarding the motion vector; a reference image obtaining unit which obtains, based on the motion vector generated by the motion vector generating unit, a reference image corresponding to the motion compensation block from reference image data previously decoded and stored into a memory; a motion compensating unit which performs motion compensation using the reference image obtained by the reference image obtaining unit, to generate a prediction image corresponding to the motion compensation block; and a reconstructing unit which reconstructs an image using the prediction image generated by the motion compensating unit.
With this, when it is determined based on the flags regarding the motion vectors that the adjacent blocks are equal in motion vector, the blocks are combined. Then, the reference image data can be obtained for each combined motion compensation block.
Accordingly, the number of pixels to be read as the reference image data from the frame memory is reduced, and the blocks for motion compensation are largely combined. Hence, motion compensation can be implemented at high speed with low power consumption.
Moreover, the block combining unit may set a motion compensation block determined by the motion vector comparing unit as being different from the adjacent block in motion vector, as an independent motion compensation block.
With this, even when the adjacent blocks are different in motion vector, the reference image data can be obtained for each independent motion compensation block.
Accordingly, the number of pixels to be read as the reference image data from the frame memory is reduced, and the blocks for motion compensation are largely combined. Hence, motion compensation can be implemented at high speed with low power consumption.
Furthermore, the prediction direction indicated by the flag regarding the motion vector may indicate that a block associated with the flag is equal in motion vector to an above adjacent block or a left adjacent block.
With this, whether or not the adjacent blocks are equal in motion vector can be easily determined simply based on the flag indicating the prediction direction of the motion vector, that is, the flag indicating that the current block is equal in motion vector to the above or left adjacent block. In the case where the adjacent blocks are equal in motion vector, the adjacent blocks are combined. Then, the reference image can be obtained for each combined block.
Accordingly, the number of pixels to be read as the reference image data from the frame memory is reduced, and the blocks for motion compensation are largely combined. Hence, motion compensation can be implemented at high speed with low power consumption.
Moreover, each of the blocks to be compared by the motion vector comparing unit and each of the blocks to be combined by the block combining unit may be 4 pixels by 4 pixels or 8 pixels by 8 pixels in size.
With this, whether or not the adjacent blocks are equal in motion vector is determined based on the flags regarding the prediction directions of the motion vectors of the adjacent blocks each having the size of 4×4 pixels or 8×8 pixels. In the case where the adjacent blocks are equal in motion vector, the adjacent blocks are combined. Then, the reference image can be obtained for each combined motion compensation block.
Accordingly, the number of pixels to be read as the reference image data from the frame memory is reduced, and the blocks for motion compensation are largely combined. Hence, motion compensation can be implemented at high speed with low power consumption.
Furthermore, the blocks to be compared by the motion vector comparing unit may be included in the same macroblock.
With this, whether or not the adjacent blocks are equal in motion vector is determined based on the flags regarding the motion vectors of the adjacent blocks in the macroblock. In the case where the adjacent blocks are equal in motion vector, the adjacent blocks are combined. Then, the reference image can be obtained for each combined motion compensation block.
Accordingly, the number of pixels to be read as the reference image data from the frame memory is reduced, and the blocks for motion compensation are largely combined. Hence, motion compensation can be implemented at high speed with low power consumption.
Moreover, the motion vector comparing unit may determine, for each of the blocks, whether or not the block is equal in motion vector to an above adjacent block, a left adjacent block, or an upper-left adjacent block.
With this, when it is determined, based on the flag regarding the motion vector of the above adjacent block, the left adjacent block, or the upper-left adjacent block, that the adjacent blocks are equal in motion vector, the adjacent blocks are combined. Then, the reference image can be obtained for each combined block.
Accordingly, the number of pixels to be read as the reference image data from the frame memory is reduced, and the blocks for motion compensation are largely combined. Hence, motion compensation can be implemented at high speed with low power consumption.
Furthermore, when the flags regarding the motion vectors of two adjacent blocks to be compared by the motion vector comparing unit indicate that the two blocks are equal in motion vector to a motion compensation block included in a macroblock adjacent to the two blocks, the motion vector comparing unit may determine that the motion vectors of the two blocks are equal to each other.
With this, the motion vector of the motion compensation block in the adjacent macroblock is used by the motion vector comparing unit. Thus, when it is determined with reference to the motion vector of the adjacent macroblock that the adjacent blocks are equal in motion vector to the adjacent macroblock, the adjacent blocks are combined. Then, the reference image can be obtained for each combined block.
Accordingly, the number of pixels to be read as the reference image data from the frame memory is reduced, and the blocks for motion compensation are largely combined. Hence, motion compensation can be implemented at high speed with low power consumption.
Moreover, the motion compensation block included in the macroblock adjacent to the two blocks may be 16 pixels by 16 pixels or 8 pixels by 8 pixels in size.
With this, when the size of the motion compensation block in the adjacent macroblock is 16×16 pixels or 8×8 pixels, the motion vector comparing unit uses that motion compensation block. Thus, when it is determined with reference to the motion vector of the motion compensation block in the adjacent macroblock that the adjacent blocks are equal in motion vector to the motion compensation block in the adjacent macroblock, the adjacent blocks are combined. Then, the reference image can be obtained for each combined block.
Accordingly, the number of pixels to be read as the reference image data from the frame memory is reduced, and the blocks for motion compensation are largely combined. Hence, motion compensation can be implemented at high speed with low power consumption.
Furthermore, the encoded video stream may be encoded according to VP8.
With this, when VP8 is employed, the motion vector comparing unit uses the flag regarding the motion vector. Thus, when it is determined that the adjacent blocks are equal in motion vector, the adjacent blocks are combined. Then, the reference image data can be obtained for each combined block.
Accordingly, the number of pixels to be read as the reference image data from the frame memory is reduced, and the blocks for motion compensation are largely combined. Hence, motion compensation can be implemented at high speed with low power consumption.
A video decoding apparatus according to another aspect of the present disclosure decodes an encoded video stream encoded using motion estimation performed on a block-by-block basis. To be more specific, the video decoding apparatus includes: a decoding unit which decodes a difference value of a motion vector from the encoded video stream; a vector predictor calculating unit which calculates a vector predictor indicating a prediction value of the motion vector; a motion vector generating unit which generates a motion vector by adding the vector predictor calculated by the vector predictor calculating unit to the difference value of the motion vector decoded by the decoding unit; a motion vector comparing unit which compares the motion vector generated by the motion vector generating unit with motion vectors of adjacent blocks to determine whether or not the motion vector is equal to the motion vectors of the adjacent blocks; a block combining unit which combines the blocks determined by the motion vector comparing unit as being equal in motion vector, into one motion compensation block on which motion compensation is to be performed; a reference image obtaining unit which obtains, based on the motion vector generated by the motion vector generating unit, a reference image corresponding to the motion compensation block from reference image data previously decoded and stored into a memory; a motion compensating unit which performs motion compensation using the reference image obtained by the reference image obtaining unit, to generate a prediction image corresponding to the motion compensation block; and a reconstructing unit which reconstructs an image using the prediction image generated by the motion compensating unit.
With this, when the adjacent motion compensation blocks are equal in motion vector, these blocks are combined. Then, the reference image data can be obtained for each combined motion compensation block.
Accordingly, the number of pixels to be read as the reference image data from the frame memory is reduced, and the blocks for motion compensation are largely combined. Hence, motion compensation can be implemented at high speed with low power consumption.
A video decoding method according to an aspect of the present disclosure is a method of decoding an encoded video stream encoded using motion estimation performed on a block-by-block basis. To be more specific, the video decoding method includes: decoding the encoded video stream to derive a flag regarding a motion vector, the flag indicating one of (i) a prediction direction indicating that the motion vector is equal to a motion vector of an adjacent block, (ii) that the motion vector is 0, and (iii) that difference information on the motion vector is encoded in the encoded video stream; determining whether or not a plurality of the motion vectors of adjacent blocks are equal to each other, using a plurality of the flags regarding the motion vectors of the adjacent blocks, the flags being derived in the decoding; combining the adjacent blocks determined in the comparing as being equal in motion vector, into one motion compensation block on which motion compensation is to be performed; generating a motion vector based on the flag regarding the motion vector; obtaining, based on the motion vector generated in the generating, a reference image corresponding to the motion compensation block from reference image data previously decoded and stored into a memory; performing motion compensation using the reference image obtained in the obtaining, to generate a prediction image corresponding to the motion compensation block; and reconstructing an image using the prediction image generated in the performing.
With this, when it is determined, based on the flags indicating the prediction directions of the motion vectors, that the adjacent blocks are equal in motion vector, the blocks are combined. Then, the reference image data can be obtained for each combined block.
Accordingly, the number of pixels to be read as the reference image data from the frame memory is reduced, and the blocks for motion compensation are largely combined. Hence, motion compensation can be implemented at high speed with low power consumption.
An integrated circuit according to an aspect of the present disclosure decodes an encoded video stream encoded using motion estimation performed on a block-by-block basis. To be more specific, the integrated circuit includes: a decoding unit which decodes the encoded video stream to derive a flag regarding a motion vector, the flag indicating one of (i) a prediction direction indicating that the motion vector is equal to a motion vector of an adjacent block, (ii) that the motion vector is 0, and (iii) that difference information on the motion vector is encoded in the encoded video stream; a motion vector comparing unit which determines whether or not a plurality of the motion vectors of adjacent blocks are equal to each other, using a plurality of the flags regarding the motion vectors of the adjacent blocks, the flags being derived by the decoding unit; a block combining unit which combines the adjacent blocks determined by the motion vector comparing unit as being equal in motion vector, into one motion compensation block on which motion compensation is to be performed; a motion vector generating unit which generates a motion vector based on the flag regarding the motion vector; a reference image obtaining unit which obtains, based on the motion vector generated by the motion vector generating unit, a reference image corresponding to the motion compensation block from reference image data previously decoded and stored into a memory; a motion compensating unit which performs motion compensation using the reference image obtained by the reference image obtaining unit, to generate a prediction image corresponding to the motion compensation block; and a reconstructing unit which reconstructs an image using the prediction image generated by the motion compensating unit.
With this, when it is determined, based on the flags indicating the prediction directions of the motion vectors, that the adjacent blocks are equal in motion vector, the blocks are combined. Then, the reference image data can be obtained for each combined block.
Accordingly, the number of pixels to be read as the reference image data from the frame memory is reduced, and the blocks for motion compensation are largely combined. Hence, motion compensation can be implemented at high speed with low power consumption.
As described, the video decoding apparatus according to the present disclosure is capable of reducing the memory bandwidth and also reducing the throughput of motion compensation by combining the blocks that are the units of motion compensation. As a result, decoding performance can be increased and decoding can be performed at higher speed. Moreover, the reduced throughput results in lower power consumption.
These and other objects, advantages and features of the disclosure will become apparent from the following description thereof taken in conjunction with the accompanying drawings that illustrate a specific embodiment of the present disclosure.
The following is a description of embodiments according to the present disclosure, with reference to the drawings.
A video decoding apparatus according to Embodiment 1 of the present disclosure is described as follows.
As shown in
The decoding unit 110 decodes an encoded video stream inputted into the decoding unit 110. Then, the decoding unit 110 outputs a flag regarding a motion vector, motion vector data, and a value of difference between a current image to be encoded and a prediction image (this image difference is referred to as the “residual image” hereafter). Moreover, the decoding unit 110 outputs the flag regarding the motion vector and the motion vector data to the motion vector comparing unit 120 and the motion vector generating unit 140, and also outputs the residual image to the adder 190.
Here, the flag regarding the motion vector refers to a flag indicating: a prediction direction indicating that the current motion vector is equal to the motion vector of an adjacent block; that the motion vector is 0; or that the motion vector data is included in the encoded video stream. The flag is stored in the encoded video stream for each block.
Moreover, the motion vector data refers to a difference value of the motion vector or the motion vector itself. Here, note that this motion vector data may be stored in the encoded video stream only when the flag regarding the motion vector indicates that the motion vector data is included in the encoded video stream.
When receiving the flag regarding the motion vector from the decoding unit 110, the motion vector generating unit 140 generates a motion vector of a current block to be decoded, using the motion vector of an adjacent block. Moreover, when receiving the difference value of the motion vector, the motion vector generating unit 140 adds, to the difference value of the motion vector, the motion vector of the adjacent block or a prediction MV calculated from the motion vector of the adjacent block. As a result, the motion vector of the current block is generated.
The motion vector comparing unit 120 compares the flag regarding the motion vector received from the decoding unit 110 or the motion vector generating unit 140 with flags regarding motion vectors of adjacent blocks. Then, the motion vector comparing unit 120 determines whether or not the motion vectors of the adjacent blocks are equal to each other and outputs a result of the determination to the block combining unit 130.
When the comparison result indicates that the adjacent blocks are equal in motion vector, the block combining unit 130 decides to combine these adjacent blocks into a motion compensation block having a size corresponding to a unit of motion compensation (referred to as the “partition size” hereafter), and outputs this result to the motion vector generating unit 140. On the other hand, the block combining unit 130 decides to set a block that differs in motion vector from all of its adjacent blocks as an independent motion compensation block, and likewise outputs this result to the motion vector generating unit 140.
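The combining decision of the motion vector comparing unit 120 and the block combining unit 130 can be sketched as follows. This is an illustrative simplification (names, scan order, and data layout are hypothetical, and a real implementation would be restricted to the partition shapes the codec allows):

```python
# Illustrative sketch of the block combining decision: adjacent blocks
# whose motion vectors compare equal are grouped into one motion
# compensation block; a block with no matching neighbor stays independent.

def combine_blocks(mvs):
    """mvs: dict mapping (row, col) block position -> (x, y) motion vector.
    Returns a list of groups (lists of positions) sharing one vector."""
    groups = []
    assigned = {}
    for pos in sorted(mvs):           # raster scan order
        row, col = pos
        # Join the left or above neighbor's group if the MVs are equal.
        for neighbor in ((row, col - 1), (row - 1, col)):
            if neighbor in mvs and mvs[neighbor] == mvs[pos]:
                group = assigned[neighbor]
                group.append(pos)
                assigned[pos] = group
                break
        else:
            group = [pos]             # no equal neighbor: independent block
            groups.append(group)
            assigned[pos] = group
    return groups
```

For each resulting group, the reference image region is then fetched once, instead of once per original block, which is what reduces the number of memory accesses.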
Next, the flag regarding the motion vector is described with reference to
When the encoded video stream includes the flag regarding the motion vector for each block having the partition size (such as in the case of VP8), the flag is classified into one of four types (indicated as “Left”, “Above”, “Zero”, and “New”) as shown in
The block described as “Left” in
The block described as “New” in
As described, since the flag regarding the motion vector is encoded and included into the encoded video stream, the amount of encoded information regarding the motion vector can be easily reduced.
Moreover, when the partition size of the adjacent block is different from that of the current block (such as when the partition size of the current block is 8×8 pixels whereas the partition size of the adjacent block is 4×4 pixels), the adjacent block can be considered to include two blocks each having the partition size of 4×4 pixels. When the adjacent block includes prediction MV candidates in this way, the corresponding video encoding standard defines, for example, whether one of the candidates is to be used or whether the average of the candidates is to be used. Furthermore, it is predetermined, for example, that an adjacent block outside the region of the picture is replaced with a certain value (such as 0) and is not to be used as a prediction MV candidate. Similarly, it is predetermined, for example, that an adjacent block in an intra macroblock where inter prediction is not performed is replaced with a certain value (such as 0) and is not to be used as a prediction MV candidate.
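As a rough illustration of this candidate handling, the following Python sketch derives a prediction MV from a list of candidates, replacing unavailable candidates (outside the picture, or from an intra macroblock) with the zero vector. The function name, the `use_average` switch, and the integer averaging are assumptions made for illustration; the actual selection rule is whatever the applicable video coding standard defines.

```python
def prediction_mv(candidates, use_average=True):
    """Derive a prediction MV from the MVs of adjacent (sub-)blocks.

    `candidates` is a list of (x, y) motion vectors; a None entry stands
    for an adjacent block outside the picture or inside an intra
    macroblock, and is replaced with the zero vector as described above.
    Whether one candidate or the average is used is standard-defined;
    `use_average` is only a stand-in for that choice.
    """
    mvs = [(0, 0) if c is None else c for c in candidates]
    if not use_average:
        return mvs[0]  # e.g. a rule that always takes the first candidate
    n = len(mvs)
    return (sum(x for x, _ in mvs) // n, sum(y for _, y in mvs) // n)
```

Note that integer (floor) division is used only to keep the sketch simple; real standards specify exact rounding behavior.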
To be more specific, the motion vector generating unit 140 generates the motion vector of the current block to be decoded, using the motion vector of the adjacent block specified by the flag regarding the motion vector that is received from the decoding unit 110. Moreover, suppose that a value of difference from the prediction MV is necessary in addition to the flag regarding the motion vector (as in the case shown in
The frame memory transfer control unit 150 transfers the following data from the buffer 160 to the local reference memory 170. That is, the data to be transferred includes a reference image region indicated by the generated motion vector and pixels necessary for motion compensation (pixels necessary for prediction image generation), for each motion compensation block having the partition size obtained as a result of combining by the block combining unit 130.
The motion compensating unit 180 obtains a prediction image for each motion compensation block from data stored in the local reference memory 170, and outputs the prediction image to the adder 190. The adder 190 adds the residual image outputted from the decoding unit 110 to the prediction image obtained from the motion compensating unit 180, and outputs the result to the buffer 160. After this, the decoded image data is outputted from the buffer 160 to a display unit (not illustrated).
It should be noted that the residual image outputted from the decoding unit 110 is calculated by performing inverse quantization on coefficient data of the frequency component decoded by the decoding unit 110 (such as DCT coefficients) and then transforming the result into pixel data (by, for example, inverse transform or inversed discrete cosine transform (IDCT)). Moreover, in the case of an I picture or an intra macroblock where a temporally-different reference image is not used, a prediction image can be calculated through intra prediction.
Furthermore, although not illustrated in
Next, the motion vector comparing unit 120 is described. The motion vector comparing unit 120 compares the motion vectors, using the flag regarding the motion vector that is outputted from the decoding unit 110 (the details are described above with reference to
The motion vector comparing unit 120 compares the motion vectors based on the flags regarding the motion vectors of the adjacent blocks. Then, when the adjacent blocks are equal in motion vector, the motion vector comparing unit 120 combines the blocks having the same motion vector into a motion compensation block having a new partition size for motion compensation.
For example, when the partition size is 4×4 and the flag regarding the motion vector indicates “Above”, the above adjacent block (having the partition size of 4×4 for example) and the current block have the same motion vector. Thus, these blocks are combined into a motion compensation block having the partition size of 4×8. Then, the frame memory transfer control unit 150 obtains a reference image corresponding to the motion compensation block having the new combined partition size. Moreover, the motion compensating unit 180 generates a prediction image, by performing motion compensation on the motion compensation block using the obtained reference image.
As described, since the blocks having the same motion vector are combined into one motion compensation block having a larger partition size for motion compensation, the memory transfer size can be reduced as compared with the case where the reference image is obtained corresponding to a smaller partition size before the combining. Moreover, since motion compensation is performed in a larger partition size, the throughput in motion compensation can be reduced.
Each of
To be more specific, each of the flags of blocks 200, 206, 208, and 215 is “New (indicating that the MV is separately present as in the case shown in FIG. 2D)”. Moreover, each of the flags of blocks 201, 203, 207, 210, and 214 is “Left (indicating that the MV is equal to the MV of the left adjacent block as in the case shown in FIG. 2A)”. Furthermore, each of the flags of blocks 202 and 209 is “Zero (indicating that the MV is 0 as in the case shown in FIG. 2C)”. Moreover, each of the flags of blocks 204, 205, 211, 212, and 213 is “Above (indicating that the MV is equal to the MV of the above adjacent block as in the case shown in FIG. 2B)”.
Here, the four blocks 200, 201, 204, and 205 included in the upper left 8×8 partition are explained. The flags of the blocks 201 and 204 indicate that the current MVs are equal to the MV of the block 200. Moreover, the flag of the block 205 indicates that the current MV is equal to the MV of the block 201. In other words, it can be understood from the flags of the motion vectors that these four blocks 200, 201, 204, and 205 have the same MV. Therefore, the four blocks 200, 201, 204, and 205 included in the upper left 8×8 partition in
Similarly, the four blocks 202, 203, 206, and 207 included in the upper right 8×8 partition in
Moreover, the four blocks 208, 209, 212, and 213 included in the lower left 8×8 partition in
Here, each of the four blocks 210, 211, 214, and 215 included in the lower right 8×8 partition in
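The flag-only combinability test for one 8×8 partition can be sketched as follows. The function and its chaining rule are an illustrative reading of the example above (four 4×4 blocks taken in the order top-left, top-right, bottom-left, bottom-right), not a procedure taken verbatim from the disclosure.

```python
def combinable_8x8(tl, tr, bl, br):
    """Decide, from MV flags alone, whether four 4x4 blocks can be
    combined into one 8x8 motion compensation block.

    Inside the 2x2 group, "Left"/"Above" flags must chain every block
    back to the top-left block: the top-right block must be "Left", the
    bottom-left block must be "Above", and the bottom-right block may be
    either. The top-left block's own flag ("New", "Zero", "Left", or
    "Above") simply supplies the shared motion vector.
    """
    return tr == "Left" and bl == "Above" and br in ("Left", "Above")
```

With the flags of the upper left partition above (blocks 200, 201, 204, 205: "New", "Left", "Above", "Above") this returns True, while the lower right partition (blocks 210, 211, 214, 215: "Left", "Above", "Left", "New") returns False.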
In the above, whether or not the adjacent blocks are combinable is determined based on the flags of the motion vectors of the blocks included in the 8×8 partition which is a unit of motion compensation. However, the partition size of the blocks to be compared is not particularly limited. For example, the flags may be compared on an 8×4 partition basis or on a 4×8 partition basis. Moreover, the flags of any adjacent blocks within a macroblock or in different macroblocks having a boundary in between may be compared. For example, the four blocks 209, 210, 213, and 214 each having the partition size of 4×4 in
Furthermore, recursive combining can be performed. With this, after combining is performed once, whether or not the motion compensation blocks can be further combined is determined. For example, the combined motion compensation blocks 305, 306, and 308 shown in
Each of
To be more specific, each of the flags of blocks 220, 221, 222, 223, 225, 226, 227, 228, 229, 230, 231, 233, 234, and 235 is “Left”. Each of the flags of blocks 224 and 232 is “Above”.
Here, the four blocks 220, 221, 224, and 225 included in the upper left 8×8 partition are explained. It can be understood from the flags of the motion vectors that these four blocks 220, 221, 224, and 225 have the same MV. Therefore, the four blocks 220, 221, 224, and 225 included in the upper left 8×8 partition in
Moreover, the four blocks 222, 223, 226, and 227 included in the upper right 8×8 partition in
The motion compensation block 321 having the partition size of 8×8 is on the left side of the motion compensation blocks 322 and 323. More specifically, the resulting motion vectors of the two motion compensation blocks 322 and 323 each having the partition size of 8×4 are equal to the motion vector of the motion compensation block 321 shown in
Moreover, the four blocks 228, 229, 232, and 233 included in the lower left 8×8 partition in
Furthermore, the four blocks 230, 231, 234, and 235 included in the lower right 8×8 partition in
More specifically, the resulting motion vectors of the two motion compensation blocks 325 and 326 each having the partition size of 8×4 are equal to the motion vector of the motion compensation block 324 shown in
Moreover, the combined motion compensation block 332 shown in
Similarly, the combined motion compensation block 334 shown in
As described, recursive combining is performed. With this, after the blocks are combined into motion compensation blocks once, whether or not the motion compensation blocks can be further combined is determined. As a result, the blocks having the same motion vector are combined into a motion compensation block having a larger partition size, and motion compensation can be performed for each of such motion compensation blocks. In the above, the flags regarding the motion vectors of the motion compensation blocks are compared on an 8×8 partition basis. However, the flags of the motion vectors of the motion compensation blocks in a 16×16 partition may be compared.
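The recursive combining described above can be sketched as a merge loop that runs until no further merge is possible. The block representation and the pairwise merge rule below are assumptions made for illustration, not the disclosed procedure.

```python
def combine_recursively(blocks):
    """Repeatedly merge adjacent motion compensation blocks with equal
    motion vectors until no further merge is possible.

    Each block is a tuple (x, y, w, h, mv). Two blocks merge when they
    share a full edge (same size along that edge) and have equal MVs.
    """
    blocks = list(blocks)
    changed = True
    while changed:
        changed = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                ax, ay, aw, ah, amv = blocks[i]
                bx, by, bw, bh, bmv = blocks[j]
                if amv != bmv:
                    continue
                if ay == by and ah == bh and ax + aw == bx:
                    blocks[i] = (ax, ay, aw + bw, ah, amv)   # widen right
                elif ay == by and ah == bh and bx + bw == ax:
                    blocks[i] = (bx, ay, aw + bw, ah, amv)   # widen left
                elif ax == bx and aw == bw and ay + ah == by:
                    blocks[i] = (ax, ay, aw, ah + bh, amv)   # grow down
                elif ax == bx and aw == bw and by + bh == ay:
                    blocks[i] = (ax, by, aw, ah + bh, amv)   # grow up
                else:
                    continue
                del blocks[j]
                changed = True
                break
            if changed:
                break
    return blocks
```

For example, four 4×4 blocks with the same MV first merge pairwise into 8×4 blocks and then, on a later pass, into a single 8×8 motion compensation block, mirroring the two-stage combining shown in the figures.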
As described, the blocks having the same motion vector are combined into one motion compensation block having a larger partition size, and motion compensation can be performed for each of such motion compensation blocks.
As the memory transfer size for obtaining a reference image from the buffer 160, reference image data corresponding to the size of 9×9 pixels as indicated by the dashed line in
Therefore, when the partition size of the motion compensation block is 16×16, for example, reference image data corresponding to the size of 21×21 pixels is necessary for one motion compensation block in order to generate the prediction image for the luminance component. In this case, the maximum amount of reference image data to be read for generating the prediction image (256 bytes) for the luminance component of one macroblock is 441 bytes (= 21 pixels × 21 pixels × 1 motion vector), for each of the prediction directions.
On the other hand, when the partition size is 4×4, the reference image data corresponding to the size of 9×9 pixels needs to be read for one motion compensation block as shown in
The following explains about the amount of reference image data to be read in the case, as described with reference to
When each partition size of the four motion compensation blocks before combining is 4×4, the reference image data corresponding to the size of 9×9 pixels needs to be read for each of the motion compensation blocks having the partition size of 4×4. In this case, the maximum amount of data to be read as the reference image data necessary for generating the prediction image for the luminance component of the 8×8 partition (including the four 4×4 motion compensation blocks) is 324 bytes (= 9 pixels × 9 pixels × 4 motion vectors, i.e., the number of motion compensation blocks).
Moreover, when the partition size of the combined motion compensation block is 8×8, the reference image data corresponding to the size of 13×13 pixels needs to be read. In this case, the maximum amount of data to be read as the reference image data necessary for generating the prediction image for the luminance component of the 8×8 partition is 169 bytes (= 13 pixels × 13 pixels × 1 motion vector). It is understood that, as compared with the case before combining, the amount of data to be read can be reduced. To be more specific, the memory transfer size for obtaining the reference image can be reduced by combining the motion compensation blocks.
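These figures follow directly from the size of the interpolation window: for a 6-tap filter, a w × h block needs (w + 5) × (h + 5) reference pixels. A minimal sketch, assuming one byte per luminance sample (`ref_bytes` is an illustrative name, not from the disclosure):

```python
def ref_bytes(w, h, taps=6):
    """Worst-case reference bytes read for one w x h motion compensation
    block with a `taps`-tap interpolation filter: the block plus the
    filter margin, (w + taps - 1) x (h + taps - 1) luminance samples."""
    return (w + taps - 1) * (h + taps - 1)

# Figures from the text: an 8x8 partition decoded as four 4x4 blocks
# versus one combined 8x8 block.
before = 4 * ref_bytes(4, 4)   # four 9x9 reads
after = 1 * ref_bytes(8, 8)    # one 13x13 read
```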
Furthermore, it can be understood that the memory transfer size for the combined motion compensation block having the partition size of 8×8 is the same as in the case of the original motion compensation block having the partition size of 8×8. In this case, when the motion vectors are equal to each other, a data transfer sequence and an access sequence (such as an address, a control command, and a control signal for an SDRAM) for reading the aforementioned reference image data of the luminance component from the buffer 160 (the external memory such as an SDR-SDRAM, a DDR-SDRAM, a DDR2-SDRAM, or a DDR3-SDRAM) are the same. It should be noted here that the amount of data to be read as the reference image data of the luminance component and the access time may increase depending on, for example, a bus width of a bus connected to the external memory, the amount of data per access, and AC characteristics of the external memory (such as a CAS latency and a wait cycle of the SDRAM). Moreover, note that the operation of reading the aforementioned reference image data of the luminance component from the buffer 160 may be interrupted by, for example, a different access operation (such as an operation of reading reference data of a chrominance component corresponding to the current motion compensation block, an operation of reading and outputting the image data to the display unit, and access from a CPU).
Furthermore, suppose that the throughput in motion compensation is equivalent to the number of output pixels including intermediate pixels, as shown in
On the other hand, when the motion compensation block has the partition size of 4×8, horizontal filtering needs to be performed 52 times (= 4 pixels × 13 pixels; indicated by the filled squares in FIG. 12B) for one motion compensation block, and vertical filtering needs to be performed 32 times (= 4 pixels × 8 pixels; indicated by the crosses in FIG. 12B) for one motion compensation block. In other words, 6-tap filtering needs to be performed 672 times (= (52 + 32) × 8 partitions) for one macroblock. Thus, the number of times filtering needs to be performed in the case where the partition size is 4×8 is reduced as compared with the case where the partition size is 4×4.
Similarly, when the motion compensation block has the partition size of 16×16, horizontal filtering needs to be performed 336 times (= 16 pixels × 21 pixels) for one motion compensation block, and vertical filtering needs to be performed 256 times (= 16 pixels × 16 pixels) for one motion compensation block. In other words, 6-tap filtering needs to be performed 592 times (= (336 + 256) × 1 partition) for one macroblock. Thus, the number of times filtering needs to be performed is further reduced.
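The filtering counts above can be reproduced with a short calculation. `filter_ops_per_macroblock` and its parameters are illustrative names, assuming the counting convention used in the text: horizontal filtering produces w × (h + 5) intermediate pixels, vertical filtering produces w × h output pixels, and a 16×16 macroblock holds 256 / (w × h) such blocks.

```python
def filter_ops_per_macroblock(w, h, taps=6, mb=16):
    """Number of 6-tap filter applications needed to interpolate one
    16x16 macroblock split into w x h motion compensation blocks."""
    horizontal = w * (h + taps - 1)   # intermediate pixels per block
    vertical = w * h                  # output pixels per block
    partitions = (mb * mb) // (w * h)
    return (horizontal + vertical) * partitions
```

This reproduces the 672 operations for 4×8 partitions and 592 for a single 16×16 partition cited above, and shows how the count shrinks as the partition size grows.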
Here, processing performance can be increased by a circuit having a configuration whereby a plurality of pixels can be outputted at one time in the horizontal or vertical direction through filtering. For example, when the partition size is 4×4 (as in
To be more specific, when the partition size of the motion compensation block increases, the number of times filtering needs to be performed (the throughput) can be reduced. As a result, the number of pixels to be read as the reference image data from the buffer 160 can be reduced, and thus motion compensation can be performed at high speed with low power consumption.
Firstly, the decoding unit 110 obtains a flag regarding a motion vector from an encoded video stream and outputs the flag to the motion vector comparing unit 120 (Step S401).
Next, based on the input flag regarding the motion vector and the previously-obtained flags regarding the motion vectors of the adjacent blocks, the motion vector comparing unit 120 determines whether or not the adjacent blocks are equal in motion vector and outputs the result to the block combining unit 130 (Step S402).
After this, when it is determined that the adjacent blocks are combinable (namely, when the blocks are equal in motion vector in Step S402) (Yes in Step S403), the block combining unit 130 changes the blocks that are equal in motion vector into a motion compensation block having one partition size (Step S404).
Then, the motion vector generating unit 140 calculates a motion vector and outputs the motion vector to the frame memory transfer control unit 150. It should be noted that the motion vector generating unit 140 may calculate a motion vector for each motion compensation block. To be more specific, it is only necessary for the motion vector generating unit 140 to calculate a motion vector of one block among the blocks included in the motion compensation block.
The frame memory transfer control unit 150 obtains, from the buffer 160, a reference image region indicated by the motion vector, that is, reference image data necessary for motion compensation to be performed on the motion compensation block, and then transfers the reference image data to the local reference memory 170 (Step S405).
The motion compensating unit 180 performs motion compensation for each motion compensation block using the reference image data obtained from the local reference memory 170, and outputs the generated prediction image to the adder 190 (Step S406).
When it is determined that the adjacent blocks are not combinable in Step S403, the partition size is not changed. Thus, reference image data is obtained and motion compensation is performed, for each motion compensation block having the original partition size (Steps S405 and S406).
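As a toy illustration of this flow (not the disclosed implementation), the following one-dimensional sketch walks a row of equally sized blocks, resolves "Left" flags, combines runs of blocks that turn out equal in motion vector (Steps S402 to S404), and leaves one entry per motion compensation block, on which the reference fetch and motion compensation (Steps S405 and S406) would then each run once. All names here are hypothetical.

```python
def decode_row(flags_and_mvs):
    """One-dimensional sketch of the combining flow.

    Each input element is ("New", (x, y)) for an explicitly coded MV or
    ("Left", None) for an MV copied from the left neighbour.  Returns a
    list of (block_count, mv) motion compensation blocks; adjacent
    blocks equal in motion vector are merged into one wider block.
    """
    out = []
    for flag, mv in flags_and_mvs:
        if flag == "Left" and out:
            mv = out[-1][1]              # S402: equal to left neighbour
        if out and out[-1][1] == mv:     # S403: combinable?
            out[-1] = (out[-1][0] + 1, mv)   # S404: widen the block
        else:
            out.append((1, mv))          # independent block
    return out
```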
In the flowchart shown in
According to the processing shown in
Each of
Each of
For example, Case 43 shown in
In this way, when the flags regarding the motion vectors of the four blocks included in the 8×8 partition are compared on an 8×8 partition basis, the comparison processing can be simplified by employing the result of comparison shown in
In addition to the examples shown in
Each of
In the macroblock including the blocks having the partition size of 4×4, a flag regarding a motion vector is described for each of the blocks. Moreover, four blocks 420, 421, 422, and 423 each having the partition size of 8×8 on the left side in the diagram are, for example, motion compensation blocks previously combined by the block combining unit 130.
To be more specific, the flag of a block 406 is “New (indicating that the MV is separately present as in the case shown in FIG. 2D)”. Moreover, each of the flags of blocks 400, 401, 403, 404, 405, 407, 408, 409, 410, 411, 412, 413, and 415 is “Left (indicating that the MV is equal to the MV of the left adjacent block as in the case shown in FIG. 2A)”. Furthermore, the flag of a block 402 is “Zero (indicating that the MV is 0 as in the case shown in FIG. 2C)”. Moreover, the flag of a block 414 is “Above (indicating that the MV is equal to the MV of the above adjacent block as in the case shown in FIG. 2B)”.
Here, the four blocks 400, 401, 404, and 405 included in the upper left 8×8 partition are explained. The blocks 400 and 401 have the same MV and the blocks 404 and 405 have the same MV, as can be seen from the flags regarding the motion vectors (“Left” in this case) shown in
Similarly, the four blocks 402, 403, 406, and 407 included in the upper right 8×8 partition in
Moreover, the four blocks 408, 409, 412, and 413 included in the lower left 8×8 partition in
Furthermore, the four blocks 410, 411, 414, and 415 included in the lower right 8×8 partition in
Moreover, the blocks included in the upper left 8×8 partition in
More specifically, the resulting motion vectors of the two motion compensation blocks 501 and 502 each having the partition size of 8×4 are equal to the motion vector of the block 421 shown in
As described, when each of the motion vectors of the two comparison target blocks (the motion compensation blocks 501 and 502 in the above example) is equal to the motion vector of the block (the block 421 in the above example) that is adjacent to these two blocks and has the partition size larger than the partition size of these two blocks, these two blocks can be combined into one motion compensation block (the motion compensation block 601 in the above example).
Similarly, each of the flags regarding the motion vectors of the two motion compensation blocks 505 and 506 included in the lower left 8×8 partition shown in
More specifically, the resulting motion vectors of the two motion compensation blocks 505 and 506 each having the partition size of 8×4 are equal to the motion vector of the block 423 shown in
As described above, a comparison can be made not only within one macroblock but with an adjacent macroblock. As a result, the adjacent motion compensation blocks can be further combined.
Moreover, a combined motion compensation block 605 shown in
The above describes the case where the partition size of the block included in the adjacent macroblock is 8×8. However, even when the motion compensation block included in the adjacent macroblock is 16×16, 8×16, or 16×8 in size or is an intra macroblock (such as when the motion vector is processed as 0), the resulting combined motion compensation block is in the same size as described above.
As described, the blocks having the same motion vector are combined and thus motion compensation can be performed on the combined motion compensation block having a larger partition size. As a result, the memory transfer size for obtaining the reference image data can be reduced, and the throughput in motion compensation can also be reduced. Hence, the number of pixels to be read as the reference image data from the buffer 160 can be reduced, and thus motion compensation can be performed at high speed with low power consumption.
Next, each of
Each of
For example, Case 1 shown in
In this way, when the flags regarding the motion vectors of the four blocks included in the 8×8 partition are compared on an 8×8 partition basis, the comparison processing can be simplified by employing the result of comparison shown in
In addition to the examples shown in
The following describes a video decoding apparatus according to Embodiment 2 of the present disclosure. The video decoding apparatus according to Embodiment 2 is different from the video decoding apparatus according to Embodiment 1 in that motion vectors of adjacent blocks are compared by actually calculating a motion vector for each block without using a flag regarding a motion vector. It should be noted that detailed descriptions of points common to Embodiment 1 and Embodiment 2 are not repeated here and that only different points are thus mainly described.
Firstly, a decoding unit 110 obtains a motion vector or a difference value of a prediction motion vector from an encoded video stream, and outputs the motion vector or the difference value to a motion vector generating unit 140. The motion vector generating unit 140 calculates a motion vector from the received motion vector or difference value and the prediction motion vector, and outputs the result to a motion vector comparing unit 120 (Step S801).
Next, the motion vector comparing unit 120 compares the motion vector received from the motion vector generating unit 140 with motion vectors of adjacent blocks to determine whether or not the adjacent blocks are equal in motion vector, and outputs the result of the determination to a block combining unit 130 (Step S802).
Then, when it is determined that the blocks are combinable (that is, when the blocks are equal in motion vector in Step S802) (Yes in Step S803), the block combining unit 130 combines the blocks equal in motion vector into one motion compensation block having a larger partition size and outputs the result to the motion vector generating unit 140 (Step S804).
After this, the motion vector generating unit 140 calculates a motion vector of the motion compensation block and outputs the calculated motion vector to a frame memory transfer control unit 150. It should be noted that the motion vector calculated in Step S801 may be used here.
Then, the frame memory transfer control unit 150 obtains, from a buffer 160 based on the result achieved by the block combining unit 130, a reference image region indicated by the motion vector, that is, reference image data necessary for motion compensation to be performed on the motion compensation block, and then transfers the reference image data to a local reference memory 170 (Step S805).
A motion compensating unit 180 performs motion compensation on the motion compensation block using the reference image data obtained from the local reference memory 170, and outputs the generated prediction image to an adder 190 (Step S806).
When it is determined that the adjacent blocks are not combinable in Step S803, the partition size is not changed. Thus, reference image data is obtained and motion compensation is performed, for each motion compensation block having the original partition size (Steps S805 and S806).
In the flowchart shown in
According to the processing shown in
In Embodiment 1 and Embodiment 2, the diagrams showing the configurations are described. However, these embodiments are not intended to be limiting. The configuration may be implemented as a single Large Scale Integrated (LSI) chip or as individual LSI chips. In the future, with advancement in semiconductor technology, a brand-new technology may replace LSI, and the functional blocks may be integrated using such a technology; the application of biotechnology is one such possibility. Moreover, the present disclosure may be implemented as a program to be executed on a computer.
Although only some exemplary embodiments of the present disclosure have been described in detail above, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of the present disclosure. Accordingly, all such modifications are intended to be included within the scope of the present disclosure.
The video decoding apparatus in the present disclosure is useful as a video decoding apparatus for decoding an encoded video stream encoded using motion estimation and as a video reproducing method. Moreover, the video decoding apparatus in the present disclosure is also applicable to, for example, a DVD recorder, a DVD player, a Blu-ray disc recorder, a Blu-ray disc player, a digital TV, and a mobile data terminal such as a smartphone.