A study of artificial speech quality assessors of VoIP calls subject to limited bursty packet losses
© Jelassi and Rubino; licensee Springer. 2011
Received: 1 November 2010
Accepted: 23 September 2011
Published: 23 September 2011
A distinctive feature of emerging media services over the Internet is their ability to account for human perception during service delivery, which increases their popularity and revenue. It is therefore necessary to understand users' perception, which should be done using standardized subjective experiments. It is equally important to develop artificial quality assessors that automatically quantify the perceived quality, since they enable efficient network and service management at the core and edges of delivery systems. In this article, we explore the rating behavior of emerging artificial speech quality assessors of VoIP calls subject to moderately bursty packet loss processes. The examined Speech Quality Assessment (SQA) algorithms estimate the speech quality of live VoIP calls at run-time using control information extracted from the headers of received packets. They are specifically designed to be sensitive to packet loss burstiness. The performance evaluation is carried out using a dedicated software-based SQA framework, which offers a specialized packet killer and implements four SQA algorithms. A speech quality database covering a wide range of bursty packet loss conditions has been created and thoroughly analyzed. Our main findings are the following: (1) all examined bursty-loss aware speech quality assessors achieve a satisfactory correlation under high (> 20%) and low (< 10%) packet loss ratios; (2) they exhibit a clear weakness in assessing speech quality under moderate packet loss ratios; (3) the accuracy of the examined SQA algorithms on a sequence-by-sequence basis should be investigated in detail to improve precision.
Keywords: VoIP; QoE; Artificial speech quality assessors; Bursty packet losses
Early telecommunication networks were engineered to offer a steady perceived quality of delivered services throughout a media session. This goal is achieved by reserving the needed resources before launching the service delivery process. Telecom operators select and install transmission media and equipment that guarantee a standardized perceived quality for their customers, independently of their geographical location and service delivery context. In such a setting, a client request is admitted only if there are sufficient resources to accommodate it in the transport network. However, the introduction of 2G cellular telecom systems, which deliver services to moving customers, made it difficult to keep the perceived quality constant over time. The principal factors causing fluctuation of the perceived quality are handovers among access points and the vulnerability of wireless channels to unpredictable interference and obstacles. It is worth noting that keeping a steady perceived quality over a mobile telecom system is achievable, but the remedies are unreasonably expensive and impractical for telecom operators. In practice, mobile customers are more tolerant and tend to accept fluctuations in the perceived quality during a media session, given their awareness of mobile network features. The integration of delay-sensitive telecom services over best-effort IP networks further accentuates the fluctuation of the perceived quality of delivered services.
There is a wide range of vital network-related operations where an accurate assessment of time-varying perceived quality is desirable and helpful [1, 2]. A reliable measure of perceived quality can be beneficial before, during, and after service delivery. The offline usages of perceived quality measurement include network planning, optimization, and marketing. The online usages include network and service management, monitoring, and diagnosis. This indicates that the use of perceived quality helps decision makers to select choices that maximize profitability while maintaining optimal user satisfaction. In this work, we explore the accurate estimation of the perceived listening quality of PC-to-PC and PC-to-PSTN phone calls, often denoted as VoIP (Voice over IP), which are currently flourishing.
For efficiency purposes, the perceived quality is estimated using theoretical and representative models that capture the relevant features of packet loss processes, rather than the packet loss pattern itself. The characterization parameters are extracted from packet loss models that are calibrated at run-time using efficient packet-loss driven counting algorithms. Next, the effect of the prevailing packet loss pattern can be judged using parametric quality assessment models built a priori. Typically, temporally-dependent packet loss processes are modeled using a simple, yet accurate 2-state discrete-time Markov chain, referred to as the Gilbert model, which has been well studied in the literature. In a few words, the Gilbert model has NO-LOSS and LOSS states that represent, respectively, successful and failed packet delivery. The Gilbert model is wholly characterized by the Packet Loss Ratio (PLR) and the Mean Burst Loss Size (MBLS). Typically, the higher the value of MBLS, the greater the burstiness of the loss process. For a finer characterization of packet loss processes, Clark proposed a dedicated packet loss model that discriminates between isolated and bursty loss instances. The author defined rules to classify loss instances into either the isolated or the bursty state and developed an efficient packet loss driven algorithm that calibrates this enriched model at run-time. The 'Appendix' section gives a survey of models of packet loss processes over VoIP networks.
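The statement that PLR and MBLS wholly characterize the Gilbert model can be made explicit with a short derivation. The symbols p and q for the two transition probabilities are introduced here for illustration only and are not taken from the original text; the relations themselves follow from the stationary distribution of a 2-state chain and from the geometric distribution of burst lengths.

```latex
% Assumed notation: p = P(NO-LOSS -> LOSS), q = P(LOSS -> NO-LOSS)
\begin{align}
\mathrm{PLR}  &= \frac{p}{p+q}, &
\mathrm{MBLS} &= \frac{1}{q}, \\
q &= \frac{1}{\mathrm{MBLS}}, &
p &= \frac{\mathrm{PLR}}{\mathrm{MBLS}\,(1-\mathrm{PLR})}.
\end{align}
% Example: PLR = 0.10, MBLS = 2  =>  q = 0.5,  p = 0.10 / (2 * 0.90) \approx 0.056
```

Under this notation, the conditional loss probability mentioned in the endnotes (losing a packet given that the previous one is lost) would equal 1 - q.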
This article explores the effectiveness of four single-ended bursty-loss aware Speech Quality Assessment (SQA) algorithms in evaluating the perceived quality of VoIP calls subject to distinct and limited bursty packet loss processes. To do that, a dedicated SQA framework has been set up and a suitable SQA database has been built. It is crucial to note that the reference perceived quality is automatically estimated using the double-ended signal-layer speech quality assessor defined in ITU-T Rec. P.862, denoted as Perceptual Evaluation of Speech Quality (PESQ), which is recognized for its accuracy in estimating subjective scores under a wide range of circumstances. The limitations of ITU-T PESQ have been considered in the design phase of the conducted empirical experiments, reducing its known defective behavior under 'generalized' bursty packet loss processes (see below). To enhance the faithfulness of the measures, data filtering procedures have been applied to the gathered raw ITU-T PESQ scores; they involve outlier detection and removal, coupled with the computation of the average score among repeated experiments for each considered condition. Moreover, our study investigates the perceived effect of Comfort Noise (CN) and of the frequency bandwidth changeover required for speech material preparation. A statistical analysis has been conducted that enables drawing some conclusions about the rating behavior of existing bursty-loss aware SQA algorithms. As such, a set of potential clues for a better and more consistent judgment accuracy of VoIP calls at run-time is identified and summarized.
The remainder of this article is organized as follows. 'A review of SQA algorithms sensitive to packet loss burstiness' section reviews the four examined SQA algorithms that subsume packet loss burstiness. 'Set-up SQA framework and measurement strategy' section presents our speech quality framework and measurement strategy. 'Speech material preparation and configuration parameters selection' section describes and discusses the speech material preparation processes. A performance evaluation analysis is presented in 'Performance analysis of bursty-loss aware SQA algorithms' section. Concluding remarks and perspectives are given in 'Concluding remarks and perspectives' section.
A review of SQA algorithms sensitive to packet loss burstiness
The next sections introduce four SQA algorithms that will be thoroughly evaluated later. The shared feature of examined artificial speech quality assessors resides in their sensitivity to the different degrees of packet loss burstiness sustained by a VoIP packet stream.
VQmon: Voice Quality monitoring
where t_i is the switching instant from the (i-1)th to the ith segment, RI(t_i) refers to the intermediate rating factor estimated during the interval [t_i, t_{i+1}], and RP(t_i) refers to the perceptual instantaneous rating factor estimated at the instant t_i. The time variable x refers to the prevailing instant in the speech presentation. The time constants τ1 and τ2 are used to calibrate the rapidity of the exponential decay at the transition from Good to Bad state, and conversely (see endnote b). In the scope of VQmon, the value of RI is automatically estimated based on a directory of empirical subjective results that holds a mapping between the average PLR values and subjective rating factors.
At the end of a listened sequence, VQmon extracts packet loss characterization metrics, e.g., interval durations and their corresponding Good/Bad status and features, from a 4-state chain calibrated at run-time (see 'Appendix' section for further details). These control data are used to calculate the overall rating factor as follows. The perceptual instantaneous rating function RP, built over a given Good segment and the adjacent Bad segment that follows it, is integrated over time. The obtained value is then divided by the interval duration. The resulting rating factor is referred to as the average rating factor, Ri(av), where the index i refers to the ith Good/Bad segment (see Figure 2).
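The integrate-then-divide step described above can be sketched in a few lines. The exact VQmon equation is not reproduced in this text, so the functional form below (an exponential approach of the instantaneous rating toward the intermediate rating of the current segment) is an assumption made for illustration, as are all function names. The time constants are the initial VQmon values of 5 and 15 s quoted in the endnotes.

```python
import math

TAU_GOOD_TO_BAD = 5.0   # tau1 in seconds (assumed to govern Good -> Bad transitions)
TAU_BAD_TO_GOOD = 15.0  # tau2 in seconds (assumed to govern Bad -> Good transitions)


def segment_integral(r_start, r_target, duration, tau):
    """Closed-form integral over [0, duration] of
    r_target + (r_start - r_target) * exp(-x / tau)."""
    return (r_target * duration
            + (r_start - r_target) * tau * (1.0 - math.exp(-duration / tau)))


def rating_at_end(r_start, r_target, duration, tau):
    """Instantaneous rating reached at the end of a segment."""
    return r_target + (r_start - r_target) * math.exp(-duration / tau)


def average_rating(segments, r_initial):
    """Time-average of the instantaneous rating over consecutive segments.

    segments: list of (duration_s, intermediate_rating, is_bad) tuples.
    r_initial: instantaneous rating at the start of the first segment.
    """
    total_area = 0.0
    total_time = 0.0
    r_now = r_initial
    for duration, r_target, is_bad in segments:
        tau = TAU_GOOD_TO_BAD if is_bad else TAU_BAD_TO_GOOD
        total_area += segment_integral(r_now, r_target, duration, tau)
        r_now = rating_at_end(r_now, r_target, duration, tau)
        total_time += duration
    return total_area / total_time


# Hypothetical example: 10 s Good segment (R = 90) followed by 10 s Bad segment (R = 50).
pair = [(10.0, 90.0, False), (10.0, 50.0, True)]
print(average_rating(pair, r_initial=90.0))
```

Because the instantaneous rating lags behind the intermediate rating, the average over the pair stays well above the midpoint of 70 in this example, which illustrates why the time constants matter for short sequences.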
The limited subjective tests conducted by Clark showed that, most of the time, VQmon predicts the subjective rating of time-varying speech quality with acceptable accuracy. In our opinion, the key shortcoming of VQmon resides in its inability to accurately estimate the RI value under bursty packet loss behavior. In fact, VQmon quantifies the effect of a bursty packet loss process solely using the PLR value. As such, there is no fine characterization and specification of the burstiness of the packet loss process. This could lead to a wrong judgment of perceived quality, because it has been subjectively observed that two distinct bursty packet loss patterns with identical PLR may lead to an obvious difference in the perceived quality. Moreover, the rapidity of the exponential decay/growth is held static, independently of the duration of the preceding Good or Bad state and of the magnitude of variation between the previous and current packet loss ratios.
The ITU-T defines in Rec. G.107 a computational model for use in the planning of telephone networks, known as the E-Model. Briefly, the E-Model combines a set of characterization metrics of the transport system and provides as output a rating factor, R, that quantifies the users' satisfaction. The ultimate objective of the E-Model is to give a synthesized overview of the perceived quality delivered over a given telecom infrastructure. It has been subsequently extended to consider packet-based telephone networks and to operate as a single-ended speech quality assessor. The original release of the E-Model solely considers the negative perceived effect of independently removed voice packets. It has recently evolved to account for bursty packet loss processes, characterized using two newly defined parameters. The first metric, denoted as BurstR, is defined as the ratio between the observed average number of successive missing packets and the expected average number of successive missing packets under independent packet losses (see endnote c). The second metric, denoted as Bpl, is a constant that reflects the robustness of a given pair of CODEC and Packet Loss Concealment (PLC) algorithm against bursty packet loss processes. The value of Bpl is derived a priori for each CODEC and PLC algorithm using subjective tests and a comprehensive regression analysis.
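As a worked illustration of the two burstiness parameters, the sketch below computes BurstR from measured PLR and MBLS, using the endnote's observation that the mean burst size under independent losses is 1/(1 - PLR), and plugs it into the effective equipment impairment formula of ITU-T Rec. G.107. The function names are hypothetical, and the default values Ie = 11 and Bpl = 19 are assumed illustrative settings for G.729 with VAD rather than values stated in this article.

```python
def burst_ratio(plr, mbls):
    """BurstR = observed mean burst size / mean burst size under independent loss.

    plr is a fraction in [0, 1); under independent loss the mean burst size
    is 1 / (1 - plr), hence BurstR = mbls * (1 - plr).
    """
    return mbls * (1.0 - plr)


def ie_eff(ppl_percent, burst_r, ie=11.0, bpl=19.0):
    """Effective equipment impairment factor in the G.107 form:
    Ie_eff = Ie + (95 - Ie) * Ppl / (Ppl / BurstR + Bpl),
    with Ppl expressed in percent. Default Ie and Bpl are assumptions.
    """
    return ie + (95.0 - ie) * ppl_percent / (ppl_percent / burst_r + bpl)


# Same 10% loss ratio, independent versus bursty (MBLS = 3) losses.
independent = burst_ratio(0.10, 1.0 / 0.90)   # equals 1 by construction
bursty = burst_ratio(0.10, 3.0)               # 2.7
print(ie_eff(10.0, independent), ie_eff(10.0, bursty))
```

With identical PLR, the bursty case yields a larger impairment, which is the behavior the BurstR extension is meant to capture.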
The previously defined metrics for the characterization of packet loss burstiness explicitly (resp. implicitly) consider the nominal average length of sustained loss instances (resp. inter-loss durations). This could yield a biased quality rating factor because the fine details of packet loss patterns are ignored. The speech quality assessors presented next address this concern more carefully.
As outlined before, the previously described speech quality assessors capture the burstiness of packet loss processes using global characterization parameters. Hence, the concrete packet loss pattern is poorly considered in the estimation of the perceived listening quality. To overcome this shortcoming, Roychoudhuri and Al-Shaer proposed a fine-grained speech quality assessor, denoted as Genome, that considers the pattern of dropped voice packets more accurately. To do that, a set of 'base' quality estimate models, which quantify the perceived quality resulting from the application of periodic packet loss processes (see endnote e), were developed following a simple logarithmic regression analysis. The base quality estimate models are parameterized using the inter-loss gap and burst loss sizes. Specifically, for a packet loss run equal to 1, 2, 3, or 4 packets, a dedicated base quality estimate model, which takes the inter-loss gap size as input parameter, has been built.
where Bn,ed (resp. Bn,ld) refers to the exponential (resp. linear) dependency measurement strategy. The value of Bn,ed (resp. Bn,ld) decreases geometrically (resp. linearly) as the distance between two missing packets increases.
Set-up SQA framework and measurement strategy
It is worth noting that typical VoIP applications install packet loss protection mechanisms at the application and/or CODEC level, such as Forward Error Correction (FEC) or interleaving, in order to recover voice packets dropped in the network. Moreover, an adaptive de-jittering buffer is usually deployed to reduce losses caused by late arrivals. Both packet loss recovery schemes and de-jittering buffer policies are implicitly considered in our context, because the considered packet loss pattern is monitored at the input of the speech decoder, which should receive speech frames at a fixed frequency. Note that the perceived effect of many recovery schemes and de-jittering buffer dynamics has been studied in the literature [13, 14].
Table 1 Empirical conditions for packet loss behavior using the Gilbert model.
Speech CODEC: ITU-T G.729 (1 value)
Packet Loss Ratio (PLR): 3, 5, 10, 12, 15, 20, 25, 30% (8 values)
Mean Burst Loss Size (MBLS): 1, 2, 3, 4 (4 values)
Speakers: 16 male, 16 female (32 sequences)
Total number of combinations: 1 × 8 × 4 × 32 = 1024
The measurement process is conducted using speech material that includes 32 standard 8-s speech sequences, spoken by 16 male and 16 female English speakers. Such a duration yields at most 400 voice packets of 20 ms each. Typically, this number is insufficient to produce packet loss patterns whose PLR and MBLS values are close to the theoretical PLR and MBLS values set by users (see 'Appendix' section for further details). Moreover, unsent silence parts of a given speech sequence alter the initially generated packet loss pattern. This explains why we calculate and store the actual PLR and MBLS values for each pair of packet loss pattern and speech sequence (similarly to what has been done for video quality assessment). Table 1 summarizes the conducted experiments, in which a total of 1024 scores have been produced. As indicated in Table 1, we evaluate the performance of each SQA algorithm using the ITU-T G.729 coding scheme, which is the only speech CODEC covered by all examined speech quality assessors. It is worth noting that our primary concern is to examine the behavior and performance of bursty-aware speech quality assessors under common configurations. In the scope of this work, the performance evaluation and improvement of speech CODECs under bursty packet loss processes are secondary concerns. A personalized extension of the considered speech quality assessors to cover a large set of shared speech CODECs will be investigated in our future work using subjective tests.
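The need to store the actual PLR and MBLS per pattern and sequence can be illustrated with a small sketch. It assumes, as one plausible reading of the paragraph above, that loss flags falling on unsent silence packets are simply discarded; the function names and the toy pattern are hypothetical.

```python
def loss_statistics(lost):
    """Return (PLR, MBLS) measured on a sequence of loss flags (True = lost)."""
    n_lost = sum(1 for x in lost if x)
    # A burst starts at every lost packet that is not preceded by a lost packet.
    bursts = sum(1 for i, x in enumerate(lost)
                 if x and (i == 0 or not lost[i - 1]))
    plr = n_lost / len(lost) if lost else 0.0
    mbls = n_lost / bursts if bursts else 0.0
    return plr, mbls


def effective_pattern(lost, active):
    """Keep the loss flags of packets that were actually sent (active speech)."""
    return [l for l, a in zip(lost, active) if a]


# Toy example: 10 packets, packets 2 and 3 fall in a silence period and are not sent.
nominal = [False, True, True, False, False, True, False, False, False, False]
active = [True, True, False, False, True, True, True, True, True, True]

print(loss_statistics(nominal))                             # nominal pattern
print(loss_statistics(effective_pattern(nominal, active)))  # pattern seen by the decoder
```

In this toy case the nominal pattern has PLR = 0.3 and MBLS = 1.5, whereas the pattern applied to sent packets has PLR = 0.25 and MBLS = 1.0, so the two characterizations can differ noticeably on short sequences.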
Speech material preparation and configuration parameters selection
A preparatory processing stage of the speech material is necessary for a faithful assessment of speech quality. Indeed, the manipulated raw speech sequences must meet a set of prerequisites for a consistent use of the ITU-T G.729 speech CODEC and the SQA algorithm defined in ITU-T Rec. P.862. In our case, the raw speech material used to conduct our experiments was taken from the ITU-T P.Sup23 coded speech database. The original sampling rate of the considered speech sequences is 16 kHz, where each sample is encoded using 16 bits. However, the specification of the ITU-T G.729 speech CODEC indicates that input speech signals should be coded in linear PCM format with a sampling rate of 8 kHz and a sample precision of 16 bits. As such, a down-sampling algorithm should be executed before speech signals are processed by the ITU-T G.729 speech CODEC. To do that, we resort to the open source and widely used software SoX (SOund eXchange), which comprises three distinct resampling techniques (i.e., frequency bandwidth changeovers), denoted as the polyphase, resample, and rabbit strategies.
As shown in Figure 8, it is possible to evaluate multiple down- and up-sampling iterations using distinct resampling techniques. Speech sequences are not coded, in order to filter out the effect of coding/decoding schemes. In practice, additional factors can interfere with the resampling technique, such as filtering schemes, echo cancellers, de-noising algorithms, encoding schemes, and voice activity detectors. Moreover, the configuration parameters of each resampling technique, such as window features, number of samples, and cutoff frequency, influence its behavior.
The histograms given in Figure 9b present the average MOS-LQOWB scores produced by each examined resampling technique. As can be noted, polyphase outperforms the other candidate resampling techniques. This explains why the polyphase resampling technique has been used to down-sample our original speech material.
Apart from the perceived effect of the resampling technique, it is necessary to consider the VAD (Voice Activity Detector) algorithm included in the ITU-T G.729 CODEC (see endnote h), which discriminates between active and silent sections of the speech wave. This allows suspending packet delivery during silence periods, which is highly recommended for an efficient utilization of network resources. The shortcoming of such a procedure is that it generates a mute-like signal between successive active periods, which can disturb the other party. To generate a more natural silence, the ITU-T G.729 speech CODEC has been equipped with a CN capability. This option periodically sends, at a low rate, Silence Insertion Descriptor (SID) packets that contain a description of the ambient background noise. As a result, the receiver is able to generate a more natural background noise.
Performance analysis of bursty-loss aware SQA algorithms
As can be seen from (5), under the no-loss condition, the utilized Ie model induces a distortion amount equal to 22.45 rather than 11, which has been suggested based on earlier subjective testing. Moreover, following ITU-T Rec. G.107, the values of Ie should lie in the interval [0...40]. However, the Ie model given in (5) can generate distortion measures as high as 73 for a PLR greater than 30%. Following our preliminary tests, this value may be considered as the upper bound that can be accurately obtained using the PESQ algorithm. As such, for PLR values higher than 30%, a value equal to 73 is assigned to Ie. For a fair comparison, we set the lower and upper bounds of the E-Model to 22.45 (no-loss condition) and 73 (PLR higher than 30%), respectively. No further calibration is needed for Genome, since it was initially developed based on PESQ.
Sequence-by-sequence methodology: It consists of directly computing the ρ and Δ values using the measured scores and the corresponding estimated scores. This strategy gives some understanding of the sensitivity of a given SQA algorithm with respect to a specific bursty packet loss pattern and the speech content of a given sequence.
Cluster-by-cluster methodology: It consists of creating a set of groups of measured scores according to shared features, such as PLR, MBLS, and active and silence durations. For each measure and examined SQA algorithm, the estimated score is inserted into the group corresponding to the cluster of the measured score. Finally, we calculate the average of the measured and estimated scores of each produced cluster. The values of ρ and Δ are obtained by processing the averaged scores of the clusters. This strategy filters out deviations caused by speech content and specific packet loss distributions, which may be required to satisfy the specific needs of some applications and service providers, especially for planning purposes.
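The two methodologies above can be contrasted with a short sketch. The definitions of ρ and Δ are not reproduced in this text, so the code assumes that ρ is the Pearson correlation coefficient and Δ the mean absolute difference between measured and estimated scores; clustering is done on the (PLR, MBLS) pair only, and the sample records are hypothetical.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed definition of rho)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return float("nan")  # correlation undefined for constant series
    return sxy / math.sqrt(sxx * syy)


def mean_abs_deviation(xs, ys):
    """Mean absolute difference (assumed definition of Delta)."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def cluster_means(records):
    """Average measured and estimated scores per (PLR, MBLS) cluster.

    records: iterable of (plr, mbls, measured, estimated) tuples.
    """
    groups = defaultdict(list)
    for plr, mbls, measured, estimated in records:
        groups[(plr, mbls)].append((measured, estimated))
    measured_means, estimated_means = [], []
    for key in sorted(groups):
        pairs = groups[key]
        measured_means.append(sum(m for m, _ in pairs) / len(pairs))
        estimated_means.append(sum(e for _, e in pairs) / len(pairs))
    return measured_means, estimated_means


# Hypothetical rating factors for two conditions and two sequences each.
records = [(5, 1, 80.0, 78.0), (5, 1, 76.0, 78.0),
           (20, 2, 50.0, 55.0), (20, 2, 54.0, 55.0)]

measured = [r[2] for r in records]
estimated = [r[3] for r in records]
print("sequence-by-sequence:", pearson(measured, estimated),
      mean_abs_deviation(measured, estimated))

m_avg, e_avg = cluster_means(records)
print("cluster-by-cluster:", pearson(m_avg, e_avg),
      mean_abs_deviation(m_avg, e_avg))
```

In this toy data set the cluster-level deviation is smaller than the sequence-level one, because averaging removes the per-sequence scatter that a parametric estimator cannot see.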
In the following, E-Model(1) and E-Model(2) denote the E-Model designed to consider independently dropped packets and bursty dropped packets, respectively. Q-Model(1) and Q-Model(2) refer to the Q-Model where local burstiness increases linearly and exponentially, respectively, as a function of the inter-loss gap (see 'Genome' section).
The histograms given in Figure 11b summarize the obtained values of Δ using the sequence-by-sequence and cluster-by-cluster measurement strategies. As can be seen, the examined SQA algorithms induce a significant deviation between measured and estimated scores. E-Model(1) induces the maximal mean deviation, which is expected since it has been designed for randomly removed packets. Q-Model(2) achieves the minimum average deviation. The accuracy of E-Model(2) is better than that of E-Model(1), since it accounts more properly for packet loss burstiness. The minimum value of Δ is roughly equal to 6, which in our opinion is still substantial. This constitutes the principal weakness and limitation of the examined SQA algorithms, which should be comprehensively tackled in future work.
Summary of calibrated models and their performance.
where a and b are the fitting coefficients that minimize the RMSE. RT and RR stand for transformed and raw rating factors, respectively. As we can see, Q-Model(1) and Q-Model(2) slightly outperform other competing strategies. The transformed (improved) models can be utilized for a better estimation of measured rating factor.
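The calibration step can be sketched as an ordinary least-squares fit. The transformation equation itself is not reproduced in this text, so the linear form RT = a × RR + b used below is an assumption; for that form, the least-squares solution is also the one that minimizes the RMSE. Function names and sample values are hypothetical.

```python
import math


def fit_linear(raw, measured):
    """Return (a, b) minimizing the RMSE between a * raw + b and measured."""
    n = len(raw)
    mean_raw = sum(raw) / n
    mean_meas = sum(measured) / n
    sxx = sum((x - mean_raw) ** 2 for x in raw)
    sxy = sum((x - mean_raw) * (y - mean_meas) for x, y in zip(raw, measured))
    a = sxy / sxx
    b = mean_meas - a * mean_raw
    return a, b


def rmse(estimated, measured):
    """Root mean squared error between two score lists."""
    n = len(estimated)
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(estimated, measured)) / n)


# Hypothetical raw rating factors of one assessor and the measured reference.
raw_scores = [40.0, 50.0, 60.0, 70.0]
measured_scores = [38.0, 44.0, 54.0, 60.0]

a, b = fit_linear(raw_scores, measured_scores)
transformed = [a * r + b for r in raw_scores]
print(a, b, rmse(raw_scores, measured_scores), rmse(transformed, measured_scores))
```

By construction the transformed scores never have a larger RMSE than the raw ones on the fitting set; whether the gain carries over to unseen conditions has to be checked on separate data.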
Performance of bursty-aware SQA algorithms under a large space.
Concluding remarks and perspectives
Existing bursty-aware SQA algorithms are basically designed to approximate, on average, the subjective score of a given impairment configuration. This means that they are unsuitable for accurately estimating speech quality on a sequence-by-sequence basis.
The strategy of the Q-Model achieves a consistent and reasonable performance under a wide range of conditions. Further investigation is necessary for a better and dynamic calibration. The Q-Model assures an elegant trade-off to subsume the perceived effect of packet loss at short- and long-terms. In our opinion, it constitutes a solid base for the development of a sequence-by-sequence SQA strategy, which considers speech content, packet loss burstiness, and 'recent' effect.
VQmon and E-Model(2) need more improvement to accurately judge perceived quality. Indeed, they seem to be more suitable for assessments over long periods since they utilize characterization parameters that need an important amount of measures to be stabilized. Moreover, both strategies definitely ignore temporal distribution details of loss instances.
The statistical nature of Genome leads to some inaccuracy in the estimated scores. Preliminary experiments revealed that it is insensitive to the distribution of (inter-loss, loss) pairs.
As future work, we strongly believe that a hybrid speech quality assessor that utilizes additional meta-data about the speech wave, such as silence/active patterns and features of removed signals (e.g., voiced or unvoiced), is required to improve the accuracy of existing SQA algorithms. Moreover, the location of a given loss instance should be considered during the evaluation process. We believe that a perceptual packet loss pattern should be determined according to the concrete packet loss pattern and the sequence features. Furthermore, it is crucial to extend existing speech quality assessors to cover a wide range of speech CODECs using subjective tests under longer bursty packet loss processes. This will enable identifying which assessment methodology is better as a function of the running speech coding scheme. The goal is the development of a versatile and highly accurate speech quality assessor of VoIP service on a call-by-call basis.
Finally, it is important to note that the authors realize that extensive subjective testing should be done to tune, validate, and improve the competitive speech-quality assessment technologies. This constitutes a principal priority that will be addressed in our future work.
On Packet Loss Modeling over VoIP Networks
Measurements of packet loss during VoIP calls show that voice packets are removed in bursts. Basically, bursty packet loss processes are modeled using either discrete- or continuous-time Markov chains. A simple, yet accurate 2-state discrete-time Markov chain, referred to as the Gilbert model, or sometimes the simplified Gilbert model, has been well explored in the literature (see Figure S1a, Additional file 1). It was originally proposed to analyze noisy channels that introduce bursty bit errors. It has been subsequently extended to model bursty packet loss processes.
Besides capturing the features of bursty packet loss processes, the Gilbert chain can be utilized to synthesize packet loss patterns following user-defined PLR and MBLS values. Notice that a large number of packets should be generated to produce packet loss patterns that respect the PLR and MBLS values given by the user. Figure S2, Additional file 1 illustrates the average deviation between specified and measured PLR and MBLS of ten packet loss patterns generated using distinct seed values, as a function of the number of generated packets. As can be observed, the greater the number of generated packets, the lower the deviation between specified and measured PLR and MBLS. This series of experiments showed that more than 3000 packets are needed to achieve sufficient agreement between target and measured PLR and MBLS values.
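The synthesis procedure and the deviation experiment can be sketched as follows. This is a minimal illustration, not the authors' packet killer: the transition probabilities are derived from the target PLR and MBLS through the standard 2-state chain relations (PLR = p / (p + q), MBLS = 1 / q), and all function names are hypothetical.

```python
import random


def gilbert_pattern(n_packets, plr, mbls, seed=None):
    """Generate n_packets loss flags (True = lost) from a 2-state Gilbert chain."""
    rng = random.Random(seed)
    q = 1.0 / mbls                 # P(LOSS -> NO-LOSS)
    p = plr * q / (1.0 - plr)      # P(NO-LOSS -> LOSS)
    in_loss = rng.random() < plr   # start from the stationary distribution
    lost = []
    for _ in range(n_packets):
        lost.append(in_loss)
        if in_loss:
            in_loss = rng.random() >= q   # stay in LOSS with probability 1 - q
        else:
            in_loss = rng.random() < p
    return lost


def measure(lost):
    """Return the (PLR, MBLS) actually realized by a loss pattern."""
    n_lost = sum(1 for x in lost if x)
    bursts = sum(1 for i, x in enumerate(lost)
                 if x and (i == 0 or not lost[i - 1]))
    return n_lost / len(lost), (n_lost / bursts if bursts else 0.0)


def mean_deviation(n_packets, plr, mbls, seeds=range(10)):
    """Average absolute deviation of measured PLR and MBLS over several seeds."""
    dev_plr = dev_mbls = 0.0
    for seed in seeds:
        m_plr, m_mbls = measure(gilbert_pattern(n_packets, plr, mbls, seed))
        dev_plr += abs(m_plr - plr)
        dev_mbls += abs(m_mbls - mbls)
    return dev_plr / len(seeds), dev_mbls / len(seeds)


# Deviation for an 8-s sequence (400 packets) versus longer patterns.
for n in (400, 3000, 30000):
    print(n, mean_deviation(n, plr=0.10, mbls=2.0))
```

Running the loop shows the trend described above: the deviation shrinks as the number of generated packets grows, which is why a 400-packet sequence rarely matches its nominal PLR and MBLS.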
Besides the discrete-time Gilbert model, a continuous-time 2-state Markov Modulated Poisson Process (MMPP-2) can be used to characterize time-varying packet loss processes that alternate between low and high packet loss periods (see Figure S1b, Additional file 1). In state 0 (resp. 1), packet loss instances are introduced into the rendered packet stream following a Bernoulli process with average value equal to PLRLOW (resp. PLRHIGH). The parameters of the MMPP-2 model can be estimated at run time for a given data trace using a maximum likelihood estimator (MLE). Multiple variants of the expectation-maximization (EM) algorithm have been utilized by statisticians to obtain such values. Li developed freely downloadable code implementing a variety of EM algorithms dedicated to calibrating the MMPP model. The calibrated model can be utilized to judge the severity of packet loss burstiness and its variability.
To generate packet loss patterns using the MMPP-2 model, the PLR values can be randomly selected at the start time of each new period among a set of user-defined values. The sojourn period in each state follows an exponential distribution that should be parameterized by users. Figure S3, Additional file 1 shows multiple profiles generated using the MMPP-2 model described previously under several settings. As we can observe, MMPP-2 produces more realistic packet loss profiles under a large observation interval.
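The generation procedure described above can be sketched as follows. For brevity, this illustration uses a single PLR value per state instead of drawing it from a user-defined set at each new period, and it assumes 20-ms packets; the function and parameter names are hypothetical.

```python
import random


def mmpp2_pattern(duration_s, plr_low, plr_high,
                  mean_sojourn_low_s, mean_sojourn_high_s,
                  packet_interval_s=0.02, seed=None):
    """Generate loss flags (True = lost) from a 2-state modulated loss process.

    The chain starts in the low-loss state; sojourn times are exponentially
    distributed, and within a state each packet is lost independently
    (Bernoulli) with the loss ratio of that state.
    """
    rng = random.Random(seed)
    n_packets = int(round(duration_s / packet_interval_s))
    state = 0  # 0 = low-loss period, 1 = high-loss period
    next_switch = rng.expovariate(1.0 / mean_sojourn_low_s)
    lost = []
    for k in range(n_packets):
        t = k * packet_interval_s
        while t >= next_switch:
            state = 1 - state
            mean = mean_sojourn_high_s if state == 1 else mean_sojourn_low_s
            next_switch += rng.expovariate(1.0 / mean)
        plr = plr_high if state == 1 else plr_low
        lost.append(rng.random() < plr)
    return lost


# 60 s of speech alternating between 1% and 30% loss periods of about 5 s each.
pattern = mmpp2_pattern(60.0, 0.01, 0.30, 5.0, 5.0, seed=1)
print(len(pattern), sum(pattern) / len(pattern))
```

Over a long observation interval the overall loss ratio settles between the two per-state ratios, weighted by the mean sojourn times, while short windows can look almost loss-free or heavily impaired.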
A loss instance that comprises more than two consecutive missing packets.
A single missing packet preceded by a loss event that occurred at a distance smaller than a given constant gmin. Clark recommends using a value equal to 16 10-ms voice packets.
A transition from sub-chain 2 to sub-chain 1 happens once an isolated packet loss instance preceded by gmin successfully received packets is detected. Clark developed an efficient packet loss driven algorithm that calibrates the proposed model at run-time. A set of metrics can be extracted from the Clark model at the end of a monitoring period, e.g., the PLR during gap and bursty loss periods and their corresponding durations. As depicted in Figure S4, Additional file 1, Clark accounted for the effect of packets discarded at the de-jittering buffer because of late arrivals.
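The burst/gap classification can be approximated offline with a simple grouping rule. The sketch below is a simplification and not Clark's run-time algorithm: it assumes that losses separated by fewer than gmin received packets belong to the same bursty period, that a period containing a single lost packet counts as an isolated loss, and it ignores discarded late packets. All names are hypothetical.

```python
def segment_bursts(lost, gmin=16):
    """Split a loss pattern into bursty periods and isolated losses.

    Returns (bursts, isolated): bursts is a list of inclusive (start, end)
    index pairs, each starting and ending with a lost packet; isolated is a
    list of indices of lost packets that stand alone.
    """
    clusters = []
    for i, is_lost in enumerate(lost):
        if not is_lost:
            continue
        # Number of received packets since the previous loss: i - last - 1.
        if clusters and (i - clusters[-1][-1] - 1) < gmin:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    bursts = [(c[0], c[-1]) for c in clusters if len(c) >= 2]
    isolated = [c[0] for c in clusters if len(c) == 1]
    return bursts, isolated


def burst_gap_metrics(lost, gmin=16):
    """Loss ratios and durations (in packets) of bursty and gap periods."""
    bursts, isolated = segment_bursts(lost, gmin)
    burst_packets = sum(end - start + 1 for start, end in bursts)
    burst_lost = sum(sum(1 for x in lost[start:end + 1] if x)
                     for start, end in bursts)
    gap_packets = len(lost) - burst_packets
    return {
        "burst_plr": burst_lost / burst_packets if burst_packets else 0.0,
        "gap_plr": len(isolated) / gap_packets if gap_packets else 0.0,
        "burst_packets": burst_packets,
        "gap_packets": gap_packets,
    }


# Toy trace: a bursty period around packets 10..13 and one isolated loss at 60.
trace = [False] * 100
for index in (10, 12, 13, 60):
    trace[index] = True
print(segment_bursts(trace))
print(burst_gap_metrics(trace))
```

The resulting per-period loss ratios and durations are the kind of metrics that a VQmon-style assessor would consume at the end of a monitoring period.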
Endnotes
a. A loss instance is defined as a block of consecutive missing packets delimited by two successfully received ones.
b. The initial version of VQmon suggests the use of time constants τ1 and τ2 equal to 5 and 15 s, respectively. Recently, a more elaborate analysis conducted by Raake indicated that time constants τ1 and τ2 equal to 9 and 22 s, respectively, mimic users' rating behavior more accurately.
c. This definition implies that the delivery network introduces independent (resp. bursty) packet losses when BurstR is equal to (resp. greater than) one. As a rule of thumb, the greater the value of BurstR above 1, the higher the intensity of packet loss burstiness. Notice that the MBLS value of the expected independent packet loss process is equal to 1/(1 - PLR), where the value of PLR is set to the measured packet loss ratio.
d. The variable CLP refers to the probability of losing a packet given that the previous one is lost.
e. A packet loss process that periodically drops a static number of consecutive speech frames preceded by a given inter-loss gap size.
f. Precisely, the value of α_n is set to 1 if the packet loss ratio up to the nth packet is below 4%; otherwise it is set to -1/2.
g. The recommended value of the window size is equal to 8 20-ms voice packets.
h. Basically, all emerging speech CODECs include a built-in VAD.
Abbreviations
MBLS: Mean Burst Loss Size
PESQ: Perceptual Evaluation of Speech Quality
PLR: Packet Loss Ratio
RMSE: root mean squared error
SID: Silence Insertion Descriptor
SQA: speech quality assessment
VoIP: Voice over IP.
Acknowledgements
We would like to express our sincere thanks to the anonymous reviewers for their constructive comments, which helped us to improve the paper during the submission process. In particular, the authors are committed to pursuing the investigation of some specific issues according to the reviewers' recommendations.
- Rix A, Beerends J, Kim D, Kroon P, Ghitza O: Objective Assessment of Speech and Audio Quality: Technology and Applications. IEEE Trans Audio Speech Language Process 2006, 14(6):1890-1901.
- Jelassi S, Youssef H, Pujolle G: Perceptual Quality Assessment of Packet-Based Voice Conversations over Wireless Networks: Methodologies and Applications. In Quality of Service Architectures for Wireless Networks: Performance Metrics and Management. IGI Global Publisher; 2009.
- Raake A: Short- and long-term packet loss behavior: towards speech quality prediction for arbitrary loss distributions. IEEE Trans Audio Speech Language Process 2006, 14(6):1957-1968.
- Mohamed S, Rubino G, Varela M: Performance evaluation of real-time speech through a packet network: a random neural networks-based approach. Perform Eval 2004, 57(2):141-162. 10.1016/j.peva.2003.10.007
- Clark A: Modeling the effects of burst packet loss and recency on subjective voice quality. In Proceedings of 2nd IP-Telephony Workshop (IPTel'2001). Columbia University, New York City, USA; 2001.
- ITU-T: Study the relationship between instantaneous and overall subjective speech quality for time-varying speech sequence: influence of a recency effect. 2000.
- Jelassi S, Youssef H, Hoene C, Pujolle G: Voicing-aware parametric speech quality models over VoIP networks. In Proceedings of 2nd IEEE Global Information Infrastructure Symposium (GIIS 2009). Hammamet, Tunisia; 2009.
- ITU-T: The E-Model, a computational model for use in transmission planning. Recommendation G.107 2005.
- Cole RG, Rosenbluth JH: Voice over IP performance monitoring. In Comput Commun Rev. Volume 31. ACM SIGCOMM; 2001:9-24. 10.1145/505666.505669
- Roychoudhuri L, Al-Shaer E: Real-time audio quality evaluation for adaptive multimedia protocols. In Proceedings of Multimedia Networks and Services (MMNS 2005). Spain; 2005.
- Zhang H, Xie L, Byun J, Flynn P, Shim C: Packet loss burstiness and enhancement to the E-Model. In Proceedings of the 6th IEEE International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. Towson, Maryland, USA; 2005.
- ITU-T: Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. Recommendation P.862 2001.
- Turunen J, Loula P, Lipping T: Assessment of objective voice quality over best-effort networks. Comput Netw 2005, 28(5):582-588.
- Jelassi S, Youssef H, Pujolle G: Parametric speech quality models for measuring the perceptual effect of network delay jitter. In Proceedings of 34th Annual IEEE Conference on Local Computer Networks (LCN 2009). Zürich, Switzerland; 2009.
- Basterrech S, Rubino G, Varela M: Single-sided real-time PESQ score estimation. In Proceedings of Measurement of Speech, Audio, and Video Quality in Networks (MESAQIN2009). Prague, Czech Republic; 2009.
- Couto-da-Silva A, Rodriguez-Bocca P, Rubino G: Optimal quality-of-experience design for a P2P multi-source video streaming. In Proceedings of ICC'08. Beijing, China; 2008.
- ITU-T: Coded-speech database. Recommendation P.Supplement 23 1998.
- ITU-T: Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP). Recommendation G.729 2007.
- Sun L, Ifeachor E: New models for perceived voice quality prediction and their applications in playout buffer optimization for VoIP networks. In Proceedings of IEEE International Conference on Communications (ICC 2004). Paris, France; 2004:1478-1483.
- Jelassi S, Youssef H, Sun L, Pujolle G: NIDA: a parametric vocal quality assessment algorithm over transient connections. In Proceedings of 12th IFIP/IEEE International Conference on Management of Multimedia and Mobile Networks and Services (MMNS 2009). Venice, Italy; 2009.
- Sanneck H: Packet Loss Recovery and Control for Voice Transmission over the Internet. Technical University of Berlin; 2000.
- Rydén T: An EM algorithm for estimation in Markov-modulated Poisson processes. Elsevier Comput Stat Data Anal 1992, 21(4):431-447.
- Hui L: Workload Modeling in Grid Computing Environments. 2010. [http://www.liacs.nl/~hli/gwm/index.htm]
- Carvalho L, Mota E, Aguiar R, Lima AF, de Souza JN, Barreto A: An E-Model implementation for speech quality evaluation in VoIP systems. In Proceedings of the 10th IEEE Symposium on Computers and Communications (ISCC'05). La Manga del Mar Menor, Cartagena, Spain; 2005.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.