Gradient-based preprocessing for intra prediction in High Efficiency Video Coding
 Anis BenHajyoussef^{1, 2},
 Tahar Ezzedine^{1} and
 Ammar Bouallègue^{1}
DOI: 10.1186/s13640-016-0159-9
© The Author(s). 2017
Received: 5 September 2016
Accepted: 19 December 2016
Published: 25 January 2017
Abstract
In order to reach higher coding efficiency than its predecessor, the state-of-the-art video compression standard High Efficiency Video Coding (HEVC) has been designed to rely on many improved coding tools and sophisticated techniques. The new features achieve significant coding efficiency, but at the cost of huge implementation complexity. This complexity has increased the need of HEVC encoders for fast algorithms and hardware-friendly implementations. In fact, encoders have to make the different encoding decisions under the real-time encoding constraint while preserving coding efficiency. To reduce the encoding complexity, HEVC encoders therefore rely on look-ahead mechanisms and preprocessing solutions. In this context, we propose a gradient-based preprocessing stage. We investigate in particular the Prewitt operator used to generate the gradient, and we propose approaches that enhance the performance of the gradient in detecting the HEVC intra modes. We also set different probability scenarios, based on the gradient information, in order to speed up the mode search process. Moreover, we propose a gradient-based estimation of the texture complexity that we use for the coding unit decision. Results show that the proposed algorithm achieves a reduction of 42.8% in encoding time with an increase in BD-rate of only 1.1%.
Keywords
HEVC, Intra prediction, Preprocessing, Image gradient, Sobel, Prewitt
1 Introduction
With the emergence of the H.264/AVC standard, significant progress has been made in video applications. This progress has led to an increasing need for better video quality and higher compression, especially for applications and services dealing with high and ultra-high resolutions.
In this context, the Joint Collaborative Team on Video Coding (JCT-VC), a team of experts from the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), standardized in 2013 a state-of-the-art video coding standard, HEVC [1, 2]. The architecture of the new standard keeps the same high-level design as its predecessor. However, HEVC relies on many improved coding tools and techniques that offer higher coding efficiency at the cost of more encoding complexity.
All these sophisticated prediction features offer better coding efficiency, but at the cost of significant complexity at the encoder side. Thus, HEVC encoders face a real challenge in speeding up the encoding process, and especially the mode decisions, while paying close attention to encoding efficiency.
Many approaches have been studied in order to speed up the encoding decisions. Among them, the many-core processor technique, which relies on the parallelization of encoding algorithms, presents a good alternative. Several works have been conducted to speed up encoding processes such as the coding unit partitioning [4], the motion estimation [5], the HEVC deblocking filter [6], and the intra prediction [7]. Another approach relies on multi-pass processing. For example, in [8], Wang et al. proposed a two-pass rate control algorithm. Because the feasibility of multi-pass processing remains closely tied to the application constraints, it has become important for HEVC encoders to rely also on look-ahead and preprocessing solutions.
In this work, the large number of intra modes supported by HEVC motivates us to investigate a pixel gradient-based preprocessing stage that operates on the original frame. We are interested in intra coding, dealing with the mode decision as well as the CU coding.
Many works have been proposed to deal with these aspects. In [9], a fast preprocessing is proposed to generate estimations of the RD costs. Operating on the original frame instead of the reconstructed one, the preprocessing reduces the data dependency on the reconstruction loop. The generated data are then used to reduce the number of tested prediction unit levels as well as the number of tested intra modes.
In [10, 11], a down-sampling approach is applied to the CU in order to reduce the prediction-related computation. The down-sampled prediction is coupled with a progressive search in order to reduce the number of intra candidate modes. In [12], the authors categorize the edge directions into five groups by applying different types of differences to the pixel values. A dominant edge direction is generated for each PU and used to reduce the number of intra modes to be evaluated. Shen et al. [13] used the spatial correlation between neighboring CUs in order to speed up the CU split decision as well as to terminate the motion estimation early. A Bayesian rule-based approach has been proposed in [14]: the CU split decision is formulated as a classification problem for which a probability density function is estimated, and the Bayesian risk is minimized in order to approach the optimal CU split decision. The works [15–17] relied on the correlation between the intra RD cost and its estimation based on the Hadamard transform for early termination of the intra mode and CU coding decisions.
Regarding the gradient-based approach, which is of particular interest here, many works have been proposed in video coding, and they can be categorized into two main classes. The first class deals with works that generate gradient information by computing differences on the pixel blocks. Such work has been conducted by Tsai et al. [18] for H.264 intra prediction. A similar approach has been proposed by Yongfei [19] for HEVC intra prediction. The second class concerns works in which a differential operator is used to approximate the mathematical gradient values, such as [20], where Pan et al. proposed to measure the edge directions, at a preprocessing level, with the Sobel operator. The generated gradient information is then used to predict the H.264 intra modes. A similar approach, using the same operator, has been proposed by Jiang [21] for HEVC intra prediction. A more recent similar work has been proposed in [22], coupled with a consideration of the gaps between the values of the sums of absolute transformed differences (SATD), in order to eliminate less probable modes from the prediction process.
In this work, we focus on this latter class, as it offers a mathematical generation of the gradient direction, which is an interesting solution for taking advantage of the large number of HEVC angular intra modes. As we are particularly interested in reducing the implementation complexity compared to [21, 22], we focus on the operator used for the gradient computation. The Sobel operator is widely used in gradient-based intra prediction works, and more generally in many image and video algorithms and applications, mainly because of its strong edge detection performance. In this work, we compare its performance in detecting the HEVC intra directions with that of the Prewitt operator. This comparison is motivated by the fact that the Prewitt operator has simpler coefficients, which can lead to a much lower implementation complexity for a gradient solution. In [23], we conducted a motivating study using the Prewitt operator with granular pixel coverage, toward a better understanding of the impact of gradient operators on HEVC video coding. We also presented a pixel neighbor extension of the gradient values in order to enhance the performance of the intra mode detection. In [24], we investigated the two-dimensional Roberts operator to simplify the gradient computation even further. In addition, we considered the number of appearances of each mode, as well as the gradient magnitude, in order to optimize the performance of the intra mode detection.
In this work, we extend the latter approaches to present a complete preprocessing solution for intra coding. In order to speed up the search for the optimal intra mode, we exploit the gradient information generated at the preprocessing stage to limit the modes to be tested to only the most probable ones, based on different probability scenarios. Moreover, we propose a gradient-based scheme for the CU intra split decision. For this purpose, we propose an approach to measure the texture complexity depending on the CU sizes.
We consider here the work of Jiang as the baseline gradient solution for HEVC. Jiang worked on HM4.0 [21, 25], but since that time, some features in the intra prediction design have changed. For example, unlike HM4.0, which supports three modes for 64 × 64 PUs, the HEVC standard supports 35 modes, as will be described in more detail in a later section of this paper. Therefore, in this work, we test the gradient-based approach on the more recently adopted HEVC intra prediction design.
The remainder of this paper is organized as follows. Section 2 presents the experimental methods. Section 3 presents an overview of the HEVC intra prediction algorithm as well as the proposed gradient-based intra prediction. In addition, it describes the proposed optimization approaches, which deal with a preselection of intra modes as well as an optimized mode selection at the PU level. In Section 4, we present an approach for speeding up the intra prediction based on the gradient information. Section 5 describes the proposed schemes for the CU split decision. Section 6 then presents the experimental results of the proposed algorithm. Finally, we present the conclusions in Section 7.
2 Experimental methods
The aim is to measure the impact of the proposed solution on the video coding efficiency as well as on the coding time. For that purpose, the proposed algorithm was integrated into the HEVC test model (HM) version 14.0. Simulations were performed in conformance with the common test conditions specified in [30]. As the implemented feature mainly concerns intra coding, we present the results for an all intra (AI) configuration. We used the test video sequences of classes A to E. To measure the coding efficiency, we report the Bjontegaard delta rate (BD-rate) [30]. This metric represents the average difference between the original rate-distortion curve and that obtained after the integration of the proposed features. The rate-distortion curves are obtained by coding each test sequence at four different QPs: 22, 27, 32, and 37. We measure the coding time saving according to Eq. (25), using T _{HM14}, the encoding time of HM14.0, and T _{Prop}, the encoding time obtained after the integration of the proposed solution into HM14.0.
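Eq. (25) is not reproduced in this text. As a minimal sketch, assuming it follows the conventional relative definition of time saving (the difference between the two encoding times, normalized by the HM14.0 time), the measurement could be computed as below. The function name and the timing values are hypothetical and serve only as an illustration.

```python
def time_saving_percent(t_hm14: float, t_prop: float) -> float:
    """Relative encoding-time reduction in percent.

    Assumed form: 100 * (T_HM14 - T_Prop) / T_HM14.
    """
    if t_hm14 <= 0:
        raise ValueError("reference encoding time must be positive")
    return 100.0 * (t_hm14 - t_prop) / t_hm14


# Hypothetical encoding times in seconds (HM14.0, proposed), one pair per QP.
timings = {
    22: (1000.0, 600.0),
    27: (800.0, 460.0),
    32: (640.0, 360.0),
    37: (500.0, 280.0),
}

per_qp = {qp: time_saving_percent(ref, prop) for qp, (ref, prop) in timings.items()}
average_saving = sum(per_qp.values()) / len(per_qp)
```

Averaging the per-QP savings is one possible way to aggregate a sequence; the text does not state which aggregation was used.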
3 HEVC intra prediction
3.1 Overview of HEVC intra prediction
After the RMD step, a mode candidate set ψ ^{ R } is generated by considering the best intra modes. The number of candidate modes is set to 3, 3, 3, and 8, respectively, for PU sizes of 64 × 64, 32 × 32, 16 × 16, and 8 × 8 [28]. To exploit the correlation of direction information between neighboring blocks [29], a check is performed, at a second stage, for additional most probable modes (MPMs) that are derived from the neighbors. These modes are added, if they are not already included, to form an extended candidate set ψ ^{ M } [30]. At the third stage, a rate-distortion optimized quantization (RDOQ) is performed using the modes of the candidate set at only the maximum TU size. The goal of this step is to pick the optimal intra mode m _{opt} for the PU as well as the best PU split structure in the rate-distortion sense. In the last stage, the optimal mode m _{opt} found previously is used to find the optimal residual quadtree (RQT) structure.
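The first two stages described above can be sketched as follows. This is only an illustration of how the sets ψ^R and ψ^M are formed, not HM code: the function name, the dictionary of RMD costs, and the example values are hypothetical, and the entry for 4 × 4 PUs is taken from the HM14.0 settings quoted later in the paper.

```python
# Number of RMD candidates kept per PU size (from the values quoted in the text).
NUM_RMD_MODES = {64: 3, 32: 3, 16: 3, 8: 8, 4: 8}


def build_candidate_sets(rmd_costs, mpms, pu_size):
    """Return (psi_r, psi_m) for one PU.

    rmd_costs: dict mapping intra mode -> Hadamard-based cost (lower is better).
    mpms:      most probable modes derived from the neighboring blocks.
    """
    count = NUM_RMD_MODES[pu_size]
    # Stage 1: keep the modes with the smallest RMD cost.
    psi_r = sorted(rmd_costs, key=rmd_costs.get)[:count]
    # Stage 2: append the MPMs that are not already in the set.
    psi_m = list(psi_r)
    for mode in mpms:
        if mode not in psi_m:
            psi_m.append(mode)
    return psi_r, psi_m


example_costs = {0: 50, 1: 40, 10: 30, 26: 20, 18: 60}
psi_r, psi_m = build_candidate_sets(example_costs, mpms=[26, 1, 0], pu_size=16)
```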
3.2 Gradient-based intra prediction
The idea of a gradient-based intra prediction is to estimate the pixel intensity variation in order to approach the best intra mode direction.
Lookup table with HEVC intra directions
Mode  Φ _{ m } (Gy/Gx)  Mode  Φ _{ m } (Gy/Gx) 

2  1  19  −1.23 
3  0.81  20  −1.52 
4  0.65  21  −1.88 
5  0.53  22  −2.46 
6  0.4  23  −3.55 
7  0.28  24  −6.4 
8  0.15  25  −16 
9  0.06  26  <−16 
10  0  27  16 
11  −0.06  28  6.4 
12  −0.15  29  3.55 
13  −0.28  30  2.46 
14  −0.4  31  1.88 
15  −0.53  32  1.52 
16  −0.65  33  1.23 
17  −0.81  34  1 
18  −1 
At the end of this preprocessing step, we obtain a mode map m _{ i } as well as a magnitude map M _{ i }, where i denotes a pixel position.
For each PU, a mode histogram with accumulated mode magnitudes is generated. The modes with the highest values are selected to form the candidate set. Note that the generated mode matrix contains only angular modes.
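The preprocessing described in this section can be sketched as follows. This is a minimal illustration under several assumptions that the text does not confirm: the table ratio is read as the horizontal difference divided by the vertical difference (an orientation chosen so that a texture uniform along rows maps to mode 10 and a texture uniform along columns maps to mode 26), the magnitude is taken as the sum of absolute differences, any ratio below −16 or with a zero denominator is assigned to mode 26, and ties such as modes 2 and 34 resolve to the first table entry. All function and variable names are hypothetical.

```python
# Lookup table from the paper: angular mode -> tangent value.
# Mode 26 is listed as "< -16" and is handled as a special case below.
MODE_RATIO = {
    2: 1.0, 3: 0.81, 4: 0.65, 5: 0.53, 6: 0.4, 7: 0.28, 8: 0.15, 9: 0.06,
    10: 0.0, 11: -0.06, 12: -0.15, 13: -0.28, 14: -0.4, 15: -0.53,
    16: -0.65, 17: -0.81, 18: -1.0, 19: -1.23, 20: -1.52, 21: -1.88,
    22: -2.46, 23: -3.55, 24: -6.4, 25: -16.0, 27: 16.0, 28: 6.4,
    29: 3.55, 30: 2.46, 31: 1.88, 32: 1.52, 33: 1.23, 34: 1.0,
}


def prewitt_differences(img, y, x):
    """3x3 Prewitt responses at the interior pixel (y, x).

    d_h: right column minus left column, d_v: bottom row minus top row.
    """
    d_h = sum(img[y + k][x + 1] - img[y + k][x - 1] for k in (-1, 0, 1))
    d_v = sum(img[y + 1][x + k] - img[y - 1][x + k] for k in (-1, 0, 1))
    return d_h, d_v


def nearest_angular_mode(d_h, d_v):
    """Map a pair of differences to the closest table entry (assumed d_h / d_v)."""
    if d_v == 0:
        return 26  # unbounded ratio, treated as the vertical mode
    ratio = d_h / d_v
    if ratio < -16:
        return 26
    return min(MODE_RATIO, key=lambda m: abs(MODE_RATIO[m] - ratio))


def mode_histogram(img):
    """Accumulate gradient magnitudes per detected mode over the interior pixels."""
    hist = {}
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            d_h, d_v = prewitt_differences(img, y, x)
            magnitude = abs(d_h) + abs(d_v)  # assumed magnitude measure
            if magnitude == 0:
                continue  # flat pixel: no direction to vote for
            mode = nearest_angular_mode(d_h, d_v)
            hist[mode] = hist.get(mode, 0) + magnitude
    return hist
```

For an 8 × 8 block whose rows are uniform, every interior pixel votes for mode 10; a block whose columns are uniform votes for mode 26. Border pixels are skipped here for brevity, whereas a frame-level preprocessing would have neighbors available across PU boundaries.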
3.3 Operators analysis
The works that have proposed gradient-based solutions, such as [20] and [21], have used the Sobel operator to compute the gradient. The reason behind this choice is that Sobel offers one of the best edge detection performances among the existing operators.
The Prewitt operator, with its simpler coefficients, has some key properties that lead to a lower implementation complexity. In fact, the Prewitt filter has only 1 and −1 coefficients, which can be implemented with simple addition and subtraction instructions. The Sobel operator, in contrast, includes 2 and −2 coefficients, so the gradient calculations require additional instructions. In a hardware implementation, the 2 and −2 coefficients mean that the convolution needs additional masks to isolate the pixels concerned by these coefficients, as well as extra addition/subtraction instructions. Such considerations make the Prewitt filter much simpler, hardware-wise, especially for applications that require gradient generation at the pixel level.
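The difference in coefficients can be illustrated with a short sketch of one directional response for each operator on a 3 × 3 window. The function names are hypothetical, and the shift used for the Sobel weight is only one possible way of realizing the factor 2.

```python
def prewitt_response(window):
    """Right column minus left column: additions and subtractions only."""
    right = window[0][2] + window[1][2] + window[2][2]
    left = window[0][0] + window[1][0] + window[2][0]
    return right - left


def sobel_response(window):
    """Same difference, but the centre row carries a weight of 2.

    The doubling is written as a left shift, i.e. an extra operation on the
    two centre-row pixels compared with the Prewitt response.
    """
    right = window[0][2] + (window[1][2] << 1) + window[2][2]
    left = window[0][0] + (window[1][0] << 1) + window[2][0]
    return right - left


window = [
    [10, 20, 30],
    [10, 20, 40],
    [10, 20, 30],
]
```

With this window, the Prewitt response is 70 and the Sobel response is 100; the Sobel path needs two extra weighting operations per response.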

The relation between approximating the pixel gradients and detecting the intra prediction directions is not obvious. In fact, the gradient solution is used as an approximation of the pixel intensity direction that would best represent the current PU, whereas the theoretical optimal direction is defined only in the rate-distortion sense. Hence, the impact of the gradient operator on video coding efficiency should be investigated.

There are only 33 angular directions available to represent the best direction of each PU. Hence, we have to choose the nearest HEVC-supported direction Φ _{ m } to represent the computed gradient direction Φ _{ G }. This difference between Φ _{ m } and Φ _{ G } offers an additional margin that allows a less accurate operator to make up for its detection performance.
Average hit rate (%) of the theoretical optimal mode
Sequence  QP  Hadamard  Sobel  Prewitt

Traffic  22  85.0  67.2  67.7
Traffic  37  95.0  64.6  65.4
Kimono1  22  83.0  57.1  58.4
Kimono1  37  93.0  47.0  49.2
BasketballDrill  22  91.0  72.0  70.3
BasketballDrill  37  97.0  65.4  65.1
BasketballPass  22  88.0  70.3  69.7
BasketballPass  37  97.0  61.6  61.8
FourPeople  22  91.0  63.6  64.0
FourPeople  37  97.0  61.3  62.3
Ave.  -  91.70  63.01  63.39
This difference should be considered in light of the fact that the Hadamard prediction performs a kind of multi-pass processing. In fact, the Hadamard prediction performs a heavy transform computation for each intra mode in order to estimate the corresponding distortion.
It also estimates the bit consumption of each mode. These estimations are then used in a rate-distortion cost function in order to choose the best intra mode.
The comparison with the results of the Hadamard-based prediction suggests that the gradient-based solution should be optimized to achieve a better detection performance. In the next sections, we therefore propose some approaches that improve the performance of the gradient solution in detecting the HEVC directions while keeping a close watch on the implementation complexity.
3.4 Optimization of intra mode detection
3.4.1 Optimal mode selection
To choose the best modes for the candidate set, Jiang [21] has considered, as a cost function, the accumulated gradient magnitudes M _{ m } for each mode m in the current PU.
However, the M _{ m } criterion has some limitations. In some cases, a mode appears at many points in the PU but with small magnitudes, representing a widespread but very small variation of pixel intensity. In other cases, a mode exists at only a few points but with high gradient magnitudes, reflecting a localized but strong variation of pixel intensity. Since, in both cases, the most frequently appearing modes as well as the modes with high gradient values can approach the optimal mode, we propose to consider, in addition to M _{ m }, the number of appearances N _{ m } of a mode m in the current PU.
Average hit rate (%) of the theoretical optimal mode as a function of α
Sequence  QP  α = 0  α = 0.2  α = 0.4  α = 0.6  α = 0.8  α = 1

Traffic  22  67.7  68.0  68.0  68.1  68.1  67.7
Traffic  37  65.4  65.2  65.2  65.3  65.4  65.4
Kimono1  22  58.4  58.5  58.5  58.5  58.5  58.4
Kimono1  37  49.2  47.7  47.8  47.9  48.2  49.2
BasketballDrill  22  70.3  71.4  71.4  71.4  71.4  70.3
BasketballDrill  37  65.1  65.9  65.9  65.9  66.0  65.1
BasketballPass  22  69.7  70.1  70.1  70.1  70.1  69.7
BasketballPass  37  61.8  61.6  61.6  61.6  61.7  61.8
FourPeople  22  64.0  64.2  64.2  64.3  64.3  64.0
FourPeople  37  62.3  61.9  62.0  62.1  62.3  62.3
Ave.  -  63.39  63.45  63.47  63.52  63.60  63.39
3.4.2 Mode preselection
Average hit rate (%) of the theoretical optimal mode as a function of q
Sequence  QP  OFF  q = 1  q = 1.3  q = 1.5  q = 2

Traffic  22  67.7  71.0  71.0  71.1  71.1
Traffic  37  65.4  65.7  65.8  65.9  66.2
Kimono1  22  58.4  61.1  61.9  61.9  61.9
Kimono1  37  49.2  45.9  47.9  48.0  48.4
BasketballDrill  22  70.3  78.7  78.7  78.7  78.2
BasketballDrill  37  65.1  71.0  71.4  71.4  71.0
BasketballPass  22  69.7  77.5  77.3  77.4  77.3
BasketballPass  37  61.8  65.6  66.8  66.8  66.6
FourPeople  22  64.0  68.6  69.3  69.4  69.3
FourPeople  37  62.3  64.6  66.0  66.1  66.3
Ave.  -  63.39  66.97  67.61  67.67  67.63
In fact, this extension improved the average hit rate by 3.58, 4.22, 4.28, and 4.24 percentage points for q values of 1.0, 1.3, 1.5, and 2.0, respectively. From the results, we also notice that the hit rate is highest for a q value of around 1.5, obtained with the bonus couple (3;2). Thus, in the remainder of this paper, we continue working with these bonus values.
4 Fast intra mode decision
As mentioned before, all 35 modes are tested in the RMD stage, through a Hadamard transform-based encoding, in order to choose the best modes for the current PU. Here, the idea is to select the most probable modes in order to limit the number of modes to be tested and thus speed up the intra prediction process. In fact, the histogram generated for each PU presents the cost value Cost_{ m } of each intra mode m.
The value Cost_{ m } reflects a kind of probability that the intra mode m is the theoretical optimal mode for the current PU, i.e., the higher the value Cost_{ m }, the more probable it is that the intra mode m matches the optimal mode. Therefore, instead of going through all the modes, only a limited list of modes is investigated. We refer to this list as the gradient candidate set, \( {\psi}_i^G \) where 0 ≤ i ≤ N _{ G }, N _{ G } being the number of gradient modes retained for the current PU. The gradient modes are ordered from most probable to least probable in the candidate set. The gradient-generated modes are more precise for larger PU sizes, as a larger PU has more points from which to approximate the most representative gradient. Thus, the number of modes N _{ G } has to be set accordingly. We set this number to 15, 14, 8, 6, and 5 for PU sizes of 4 × 4, 8 × 8, 16 × 16, 32 × 32, and 64 × 64, respectively, as we noticed that these settings give a good tradeoff between time saving and encoding performance.
The best modes obtained through the RMD process form the RMD candidate set, referred to as \( {\psi}_i^R \), where 0 ≤ i ≤ N _{ R }, N _{ R } being the number of modes. We keep the number of modes N _{ R } as it is set in HM14.0, i.e., 8, 8, 3, 3, and 3 for PU sizes of 4 × 4, 8 × 8, 16 × 16, 32 × 32, and 64 × 64, respectively.
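The construction of the gradient candidate set can be sketched as follows, assuming the per-PU histogram is available as a dictionary from mode to Cost_m. The function name is hypothetical, and breaking ties by the smaller mode index is an assumption made only to keep the ordering deterministic.

```python
# N_G per PU size, as set in the text.
NUM_GRADIENT_MODES = {4: 15, 8: 14, 16: 8, 32: 6, 64: 5}


def gradient_candidate_set(cost_by_mode, pu_size):
    """Return the N_G most probable modes, ordered from most to least probable.

    cost_by_mode: dict mapping intra mode -> Cost_m (higher is more probable).
    """
    n_g = NUM_GRADIENT_MODES[pu_size]
    ranked = sorted(cost_by_mode, key=lambda m: (-cost_by_mode[m], m))
    return ranked[:n_g]


histogram = {10: 500, 9: 300, 11: 300, 26: 50, 2: 10, 18: 5, 30: 1}
candidates_64 = gradient_candidate_set(histogram, pu_size=64)
```

When a PU has fewer detected modes than N_G, the returned list is simply shorter.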

Scenario 1: the best RMD mode is DC (i.e., the DC mode is better than all angular modes. Since the DC mode has a high probability of being the best mode, extensive testing of angular modes is not worthwhile).

Scenario 2: the best RMD mode is planar. (The RMD performance of detecting planar is relatively low, so the reduction of the number N _{ R } should be relatively careful).

Scenario 3: the best three RMD modes are matching the three best gradient modes.

Scenario 4: the best RMD mode is the best gradient mode.

Scenario 5: the best RMD mode and best gradient mode are neighbors.
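The five scenarios can be expressed as a simple classification over the ordered RMD and gradient candidate lists. The sketch below is only an illustration: the evaluation order (scenario 1 first), the interpretation of "matching" as set equality of the three best modes, and the interpretation of "neighbors" as angular mode indices that differ by one are assumptions, as is the return value 0 when no scenario applies. The mode reduction applied in each scenario is not described in this text and is therefore not modeled.

```python
PLANAR, DC = 0, 1  # HEVC intra mode indices for the non-angular modes


def classify_scenario(rmd_modes, gradient_modes):
    """Return the scenario number (1 to 5), or 0 if none applies.

    Both arguments are non-empty lists ordered from best to worst.
    """
    best_rmd = rmd_modes[0]
    best_gradient = gradient_modes[0]
    if best_rmd == DC:
        return 1
    if best_rmd == PLANAR:
        return 2
    if (len(rmd_modes) >= 3 and len(gradient_modes) >= 3
            and set(rmd_modes[:3]) == set(gradient_modes[:3])):
        return 3
    if best_rmd == best_gradient:
        return 4
    if abs(best_rmd - best_gradient) == 1:
        return 5
    return 0
```

For example, `classify_scenario([10, 12, 26], [10, 9, 11])` falls into scenario 4, because only the best modes coincide.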
5 CU coding
In HEVC, for each CU of depth d and size 2N × 2N, a CU split decision has to be made. This decision evaluates whether encoding the CU at that depth is preferable to encoding the four sub-CUs of depth d + 1 and size N × N.
The split decision is based on the smallest cost, and the process is applied to all the supported depth levels d, where d = 0, 1, 2, or 3, so that an optimal CU structure is generated in the RD sense.
To deal with such complexity, the scheme proposed in this section predicts the no-split decisions, which avoids unnecessary encoding of sub-CUs.
5.1 Gradient-based scheme
To approach the optimal CU split decision, the proposed scheme estimates the spatial texture complexity of each CU and of its sub-CUs. The idea relies, on the one hand, on the hypothesis that a detailed texture area suggests small CU sizes, which implies considering split decisions. On the other hand, a flat area suggests large CU sizes, which implies considering no-split decisions. The complexity estimation is generated from the pixel-based gradient values computed at the preprocessing stage.
Number of possible CUs depending on the size
Depth  CU size (2N)  Number of CUs

0  64  1
1  32  4
2  16  16
3  8  64
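The entries of this table follow directly from the quadtree structure: each additional depth level halves the CU side and multiplies the number of CUs by four. A minimal sketch, assuming a 64 × 64 largest coding unit and four depth levels as in the text (the function name is hypothetical):

```python
def cu_table(lcu_size=64, max_depth=3):
    """Return (depth, CU size, number of CUs per LCU) for each depth level."""
    return [(d, lcu_size >> d, 4 ** d) for d in range(max_depth + 1)]


rows = cu_table()
total_cus = sum(count for _, _, count in rows)
```

An exhaustive split search therefore evaluates 85 CUs per 64 × 64 block, which is the complexity the no-split prediction aims to reduce.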
We notice that, for some CUs, the texture complexity measured by T fails to approach the optimal decision. In fact, some CUs have a low texture complexity overall but relatively different texture complexities inside their four sub-CUs. In such a case, a split structure would generate a better RD cost. Therefore, in addition to T, we estimate the texture complexity of each of the four N × N sub-CUs by considering T _{ i }, the median value of the N × N gradient magnitude values in the ith sub-CU (1 ≤ i ≤ 4).
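The two measures are combined into a split criterion SpC, in which α and β are two weighting factors.
The split decision is then taken by comparing SpC with T _{ d }, a threshold that depends on the depth d of the current CU.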
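The sub-CU measures T_i described above can be sketched as follows. Only the computation of the four medians is shown, since the expressions of the criterion SpC and of the factor V are not reproduced in this text. The function name and the z-order used for the four sub-CUs (top-left, top-right, bottom-left, bottom-right) are assumptions.

```python
from statistics import median


def sub_cu_complexities(magnitudes):
    """Return [T_1, T_2, T_3, T_4] for a 2N x 2N block of gradient magnitudes.

    Each T_i is the median of the N x N magnitudes inside one sub-CU.
    """
    size = len(magnitudes)
    n = size // 2
    complexities = []
    for offset_y in (0, n):
        for offset_x in (0, n):
            values = [
                magnitudes[y][x]
                for y in range(offset_y, offset_y + n)
                for x in range(offset_x, offset_x + n)
            ]
            complexities.append(median(values))
    return complexities


block = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [2, 2, 9, 9],
    [2, 2, 9, 9],
]
t_values = sub_cu_complexities(block)
```

In this toy block, the four sub-CUs have clearly different medians, which is the situation where a split is expected to give a better RD cost even if the block as a whole looks moderately textured.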
5.2 CU coding performance
In this section, we evaluate the performance of the proposed split scheme and refine the adopted criteria based on this performance. For that purpose, we compare the new scheme performance to the case of a theoretical optimal decision. The optimal decision, obtained by encoding the current CU twice (without split and with a split), represents the encoding case with the smallest RD cost.

No split matching rate (NS): this rate represents the cases in which the proposed criterion gives a no-split decision while the optimal decision is also a no split. In such cases, the proposed scheme succeeds in speeding up the encoding by avoiding unnecessary encoding of the sub-CUs, without any loss in RD performance.

Split error rate (SE): this rate represents the cases in which the criterion gives a no-split decision while the optimal decision is a split. Such cases speed up the encoding, but with an RD loss.
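The two rates can be computed from a list of per-CU outcomes, as in the sketch below. The normalization by the total number of evaluated CUs is an assumption, since the text does not state the denominator, and the function name is hypothetical.

```python
def split_rates(decisions):
    """Return (NS, SE) in percent.

    decisions: iterable of (predicted_no_split, optimal_is_split) booleans,
    one pair per evaluated CU.
    """
    total = no_split_match = split_error = 0
    for predicted_no_split, optimal_is_split in decisions:
        total += 1
        if predicted_no_split and not optimal_is_split:
            no_split_match += 1  # encoding of the sub-CUs skipped without RD loss
        elif predicted_no_split and optimal_is_split:
            split_error += 1  # encoding of the sub-CUs skipped, but with an RD loss
    if total == 0:
        return 0.0, 0.0
    return 100.0 * no_split_match / total, 100.0 * split_error / total


outcomes = [(True, False), (True, False), (True, True), (False, True), (False, False)]
ns_rate, se_rate = split_rates(outcomes)
```

A good threshold setting is one that raises NS while keeping SE low.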
In order to investigate the impact of the two factors T and V involved in the split criterion SpC, we present below the NS-SE relation for different values of α and β (0, 1, and 2), with different values of the threshold T _{ d }.
For the threshold T _{d}, we choose the values of 65 and 2.2 for depths 3 and 2, respectively, as we notice that the algorithm achieves favorable results with these values.
6 Results and discussion
The time saving ΔT is computed according to Eq. (25), where T _{HM14} is the encoding time of HM14.0 and T _{Prop} is that of the proposed solution integrated into HM14.0.
As the implemented feature mainly concerns intra coding, we present the results for an all intra (AI) configuration. We set the number of modes in the candidate set to be tested in the RMD to 15, 14, 8, 6, and 5 for PU sizes of 4 × 4, 8 × 8, 16 × 16, 32 × 32, and 64 × 64, respectively. For the RDO, we kept the numbers of tested modes as defined in the HM (8, 8, 3, 3, and 3, respectively).
Encoding efficiency for all intra coding, with a bit depth of 8
Class  Sequence  BDBR (%)  ΔT (%)

A  Traffic  0.7  42.6
A  PeopleOnStreet  0.4  38.1
A  Nebuta  0.4  28.4
A  SteamLocomotive  0  37.2
B  Kimono  0.3  55.8
B  ParkScene  0.4  47.3
B  Cactus  1.3  46.2
B  BasketballDrive  1.1  56.4
B  BQTerrace  1  41.2
C  BasketballDrill  1.5  43
C  BQMall  1.5  37.3
C  PartyScene  1  28.3
C  RaceHorses  0.8  36.4
D  BasketballPass  1.7  47.1
D  BQSquare  2.5  36.5
D  BlowingBubbles  1.3  34.3
D  RaceHorses  1.2  34.2
E  FourPeople  0.9  48.9
E  Johnny  1.7  56.6
E  KristenAndSara  1.6  55.2
Average  -  1.1  42.6
Performance comparison (BDBR and ΔT in %)
Class  SG [21] BDBR  SG [21] ΔT  OSG BDBR  OSG ΔT  FOSG BDBR  FOSG ΔT  PropMD BDBR  PropMD ΔT  PropSplt BDBR  PropSplt ΔT  PropOverall BDBR  PropOverall ΔT

A  0.5  9.5  0.0  8.7  0.5  30.2  0.3  30.1  0.0  25.2  0.4  36.6
B  0.5  12.7  0.5  11.8  0.9  33.6  0.9  33.8  0.6  38.2  0.8  49.4
C  0.5  10.0  0.1  9.9  0.8  29.9  1.0  30.0  0.9  23.1  1.2  36.3
D  0.8  12.6  0.5  11.9  1.5  32.3  1.4  32.2  1.3  25.9  1.7  38.0
E  0.7  13.4  0.3  12.8  1.0  33.1  1.1  32.8  0.9  42.7  1.4  53.6
Ave.  0.6  11.6  0.3  11.0  0.9  31.8  0.9  31.8  0.7  31.0  1.1  42.8
We note here that Jiang used different RMD and RDO iteration numbers. In our simulations, in order to have a like-for-like comparison, we use the same iteration numbers specified earlier for all the configurations. From the results, we see that SG achieves an average time reduction of 11.6% with an increase of 0.6% in BD-rate. These results show a smaller time saving than [21], but with a lower loss in BD-rate. The small difference is mainly due to the fact that we use different iteration numbers here, and also to the fact that the intra prediction implementation in HM4.0, used by Jiang, differs somewhat from that in HM14.0. For example, the intra prediction now supports 35 modes for all PU sizes, unlike that in HM4.0, which supports 3 modes for 64 × 64 PUs.
The combination of SG with our gradient stage optimizations, referred to as the optimized Sobel-based gradient algorithm (OSG), achieves almost the same complexity reduction as the SG configuration, with 11.0%. This configuration gives an increase in BD-rate of only 0.3%. This result shows that the optimizations enhance the performance of the gradient-based detection of the HEVC intra modes and bring a gain of around 0.3% in BD-rate. Additionally, we report the performance of the combination of SG with the optimizations as well as the fast RDO feature. This combination, referred to as the fast optimized Sobel gradient algorithm (FOSG), reaches 31.8% in time saving with an increase of 0.9% in BD-rate. Such an algorithm shows how the gradient information can be exploited to avoid unnecessary processing.
In addition to these combinations, we consider further configurations for the performance evaluation, since the proposed algorithm deals with both the intra mode decision and the CU decision. The first configuration includes only the intra MD and is denoted PropMD.
The second configuration includes only the CU split decision algorithm and is denoted PropSplt. The configuration combining these two aspects is denoted PropOverall.
As we can see from the results table, the configuration PropMD gives 31.8% in time saving with an increase of 0.9% in BD-rate. Comparing this result with that of FOSG, which reaches the same averages, shows that the Prewitt operator offers an intra mode detection and an encoding efficiency on par with those of the Sobel operator. This supports the advantage of a preprocessing solution based on the Prewitt operator, which offers in addition a more hardware-friendly implementation, with better options for multiple data operations.
The configuration PropSplt, which presents a solution for CU coding, gives an average time reduction of 31.0% with a BD-rate increase of 0.7%. Finally, the configuration PropOverall, combining both the intra MD and the CU coding, presents a time reduction of 42.8% with an average BD-rate increase of 1.1%.
We note here that the profiling of the execution time computed according to Eq. (25) aims to estimate the complexity reduction at the prediction stage compared to the Hadamard transform-based prediction used in HM. Such time profiling does not aim to estimate the effect of the two operators on the execution time of the preprocessing stage, because the preprocessing stage accounts for only about 2% of the whole HM intra encoding.
7 Conclusions
This paper has presented a pixel-based gradient preprocessing stage for HEVC intra coding. The proposed algorithm uses Prewitt as a discrete differentiation operator in order to approximate the gradient values on the original picture. The algorithm generates a preferred direction for each pixel in each PU, from which we select a candidate set of modes to be tested at the rate-distortion optimization level. The mode selection is optimized through a neighbor mode extension and an adapted cost function that takes into account both the most frequently appearing modes and those with higher gradient magnitudes. Moreover, we exploit the gradient information in order to speed up the search for the best intra mode. For that purpose, we rely on different probability scenarios in order to limit the modes to be tested to only the most probable ones. In addition to the intra mode decision, we propose a gradient-based CU split scheme in which we set criteria to measure the texture complexity of each CU. The results show that the proposed algorithm achieves a time saving of 42.8% with an average increase in BD-rate of just 1.1%.
As the proposed gradient preprocessing stage shows promising performance, we intend to further optimize the solution for hardware real-time applications. In fact, we are finalizing an investigation that makes it possible to completely avoid the pixel-based search of the intra mode in the lookup table presented in Section 3.2, which is the heaviest step of the preprocessing stage.
Abbreviations
AI: All intra
BD-rate: Bjontegaard delta rate
CTB: Coding tree block
CU: Coding unit
FOSG: Fast optimized Sobel gradient algorithm
HEVC: High Efficiency Video Coding
HM: HEVC test model
JCT-VC: Joint Collaborative Team on Video Coding
LCU: Largest coding unit
MPEG: ISO/IEC Moving Picture Experts Group
MPM: Most probable mode
NS: No split matching rate
OSG: Optimized Sobel-based gradient algorithm
PU: Prediction unit
RD: Rate distortion
RDOQ: Rate-distortion optimized quantization
RMD: Rough mode decision
RQT: Residual quadtree
SATD: Sum of absolute transformed differences
SE: Split error rate
TU: Transform unit
VCEG: ITU-T Video Coding Experts Group
Declarations
Funding
No funding sources were available for these research works.
Authors’ contributions
ABH, TE, and AB conceived and designed the research. ABH performed the experiments. ABH and TE analyzed the data. ABH and TE wrote and edited the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 GJ Sullivan, JR Ohm, WJ Han, T Wiegand, Overview of the high efficiency video coding (HEVC) standard. IEEE Trans Circuits Syst Video Technol 22(12), 1649–1668 (2012). doi:10.1109/TCSVT.2012.2221191
 B Bross, WJ Han, JR Ohm, GJ Sullivan, YK Wang, T Wiegand, High Efficiency Video Coding (HEVC) Text Specification Draft 10, in Doc. JCTVC-L1003 (rev. 37), JCT-VC 13th Meeting of Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 2013
 J Lainema, F Bossen, WJ Han, J Min, K Ugur, Intra coding of the HEVC standard. IEEE Trans Circuits Syst Video Technol 22(12), 1792–1801 (2012). doi:10.1109/TCSVT.2012.2221525
 C Yan, Y Zhang, J Xu, F Dai, L Li, Q Dai, F Wu, A highly parallel framework for HEVC coding unit partitioning tree decision on many-core processors. IEEE Signal Process Lett 21(5), 573–576 (2014). doi:10.1109/LSP.2014.2310494
 C Yan, Y Zhang, J Xu, F Dai, J Zhang, Q Dai, F Wu, Efficient parallel framework for HEVC motion estimation on many-core processors. IEEE Trans Circuits Syst Video Technol 24(12), 2077–2089 (2014). doi:10.1109/TCSVT.2014.2335852
 C Yan, Y Zhang, F Dai, X Wang, L Li, Q Dai, Parallel deblocking filter for HEVC on many-core processor. Electron Lett 50(5), 367–368 (2014). doi:10.1049/el.2013.3235
 C Yan, Y Zhang, F Dai, J Zhang, L Li, Q Dai, Efficient parallel HEVC intra-prediction on many-core processor. Electron Lett 50(11), 805–806 (2014). doi:10.1049/el.2014.0611
 S Wang, A Rehman, K Zeng, Z Wang, SSIM-inspired Two-pass Rate Control for High Efficiency Video Coding (IEEE International Workshop on Multimedia Signal Processing (MMSP), Xiamen, 2015), pp. 19–21
 H Sun, D Zhou, S Goto, A Low-complexity HEVC Intra Prediction Algorithm Based on Level and Mode Filtering (IEEE International Conference on Multimedia and Expo (ICME), Melbourne, 2012), pp. 9–13
 H Lei, Z Yang, Fast Intra Prediction Mode Decision for High Efficiency Video Coding (2nd International Symposium on Computer, Communication, Control and Automation, Singapore, 2013). doi:10.2991/3ca13.2013.9
 H Zhang, Z Ma, in 13th Pacific-Rim Conference on Multimedia, Singapore, December, 2012. Lecture notes in artificial intelligence, ed. by W Lin, D Xu, A Ho, J Wu, Y He, J Cai, M Kankanhalli, MT Sun, vol. 1114 (Springer, Heidelberg, 2012), p. 157
 TD Silva, LV Agostini, LADS Cruz, Fast HEVC Intra Prediction Mode Decision Based on Edge Direction Information (European Signal Processing Conference (Eusipco), Bucharest, 2012), pp. 27–31
 SL Shen, Z Liu, X Zhang, W Zhao, Z Zhang, An effective CU size decision method for HEVC encoders. IEEE Trans on Multimedia 15(2), 465–470 (2013). doi:10.1109/TMM.2012.2231060
 X Shen, L Yu, J Chen, Fast Coding Unit Size Selection for HEVC Based on Bayesian Decision Rule (Picture Coding Symposium (PCS), Krakow, 2012), pp. 7–9
 H Zhang, Z Ma, Early Termination Schemes for Fast Intra Prediction in High-Efficiency Video Coding (IEEE International Symposium on Circuits and Systems (ISCAS), Melbourne, 2013), pp. 1–5
 H Zhang, Z Ma, Fast intra mode decision for high efficiency video coding (HEVC). IEEE Trans Circuits Syst Video Technol 24(4), 660–668 (2014). doi:10.1109/TCSVT.2013.2290578
 Y Kim, D Jun, S Jung, JS Choi, J Kim, A fast intra-prediction method in HEVC using rate-distortion estimation based on Hadamard transform. ETRI J 35(2), 270–280 (2013). doi:10.4218/etrij.12.0112.0223
 AC Tsai, A Paul, JC Wang, JF Wang, Intensity gradient technique for efficient intra-prediction in H.264/AVC. IEEE Trans Circuits Syst Video Technol 18(5), 694–698 (2008). doi:10.1109/tcsvt.2008.919113
 Y Zhang, Z Li, B Li, Gradient-based Fast Decision for Intra Prediction in HEVC (IEEE Visual Communications and Image Processing (VCIP), San Diego, 2012), pp. 27–30
 F Pan, X Lin, S Rahardja, K Lim, Z Li, D Wu, S Wu, Fast mode decision algorithm for intra prediction in H.264/AVC video coding. IEEE Trans Circuits Syst Video Technol 15(7), 813–822 (2005). doi:10.1109/TCSVT.2005.848356
 W Jiang, H Ma, Y Chen, Gradient Based Fast Mode Decision Algorithm for Intra Prediction in HEVC (International Conference on Consumer Electronics, Communications and Networks (CECNet), Yichang, 2012), pp. 21–23
 M Jamali, S Coulombe, F Caron, Fast HEVC Intra Mode Decision Based on Edge Detection and SATD Costs Classification (Data Compression Conference (DCC), Snowbird, 2015), pp. 7–9
 A BenHajyoussef, T Ezzedine, A Bouallegue, Fast gradient based intra mode decision for high efficiency video coding. Int J Emerg Trends Technol Comput Sci 3(3), 223–228 (2014)
 A BenHajyoussef, T Ezzedine, A Bouallegue, Optimized Intra Mode Decision for High Efficiency Video Coding (International Conference on Image Analysis and Processing (ICIAP), Genoa, 2015), pp. 7–11
 B Bross, WJ Han, JR Ohm, GJ Sullivan, T Wiegand, WD4: Working Draft 4 of High-Efficiency Video Coding, in Doc. JCTVC-F803_d6 (rev. 6), JCT-VC 6th Meeting of Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 2011
 HEVC reference model. https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/. Accessed 06 Aug 2016
 Y Piao, J Min, J Chen, Encoder Improvement of Unified Intra Prediction, Doc. JCTVC-C207 (JCT-VC 3rd Meeting, Guangzhou, 2010), pp. 7–15
 L Zhao, L Zhang, X Zhao, Further Encoder Improvement of Intra Mode Decision (Doc. JCTVC-D283, in JCT-VC 4th Meeting, Daegu, 2011), pp. 20–28
 L Zhao, L Zhang, S Ma, D Zhao, Fast Mode Decision Algorithm for Intra Prediction in HEVC (Visual Communications and Image Processing (VCIP), Tainan City, 2011), pp. 6–9
 F Bossen, Common HM Test Conditions and Software Reference Configurations (Doc. JCTVC-L1100, in JCT-VC 13th Meeting, Geneva, 2013), pp. 14–23
 G Bjontegaard, Calculation of Average PSNR Differences Between RD Curves (Doc. VCEG-M33, in ITU-T VCEG 13th Meeting, Austin, 2001), pp. 2–4