The most popular KPI for assessing call quality is the Mean Opinion Score (MOS). MOS is a numerical indicator of the overall quality of an event or experience, and it is particularly prominent in telecommunications, where it gauges the quality of voice and video sessions.
Typically assessed on a scale ranging from 1 (poor) to 5 (excellent), MOS derives its value from the average of individual ratings given by human judges. While initially reliant on surveys conducted by expert observers, contemporary MOS calculation often employs objective measurement methods that mimic human assessments.
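As a minimal illustration of that averaging step, the sketch below computes a MOS from a hypothetical set of listener ratings on the 1-to-5 scale; the ratings and the function name are illustrative, not taken from any standard.

```python
def mean_opinion_score(ratings):
    """Average a list of 1-5 opinion ratings into a single MOS value."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return sum(ratings) / len(ratings)

# Hypothetical panel of listener ratings for one audio clip.
ratings = [4, 5, 3, 4, 4, 2, 5]
print(f"MOS = {mean_opinion_score(ratings):.2f}")  # MOS = 3.86
```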
Specifically designed for assessing voice quality, MOS has long been integral to telephony, offering a means to gauge users’ opinions on call quality. Widely applied in Voice over Internet Protocol (VoIP) networks, MOS helps ensure quality voice transmission, detect issues, and assess voice degradation and performance.
With the surging popularity of VoIP services, MOS scoring has become indispensable for ensuring client satisfaction and fostering sustained network growth. As the industry standard for measuring VoIP call quality, MOS considers factors such as jitter, latency, and packet loss, presenting a numerical representation of the subjective auditory experience.
Is MOS an objective or subjective measure?
The determination of whether MOS is an objective or subjective measure has undergone a transformative journey. MOS began as a subjective rating system for audio quality, with listeners in a controlled environment assigning scores on a scale from 1 to 5 based on their perception. However, recognizing the impracticality of relying solely on human opinions, especially in contexts like customer experience management and Service Level Agreement (SLA) monitoring, the concept of MOS expanded. Today, MOS is not exclusively derived from empirical studies with human participants but also encompasses computer-based analyses, marking a shift toward objectivity.
Originally, individuals in quiet rooms assessed audio quality through empirical subjective measurements. As user experience measurement demanded continuous evaluation, the scalability issues of human-based assessments became apparent. Consequently, MOS calculation evolved to leverage automated computer-based methods, replacing human assessors. Call-quality monitoring tools now employ algorithms to emulate the human scoring process, striving to replicate the subjective experience objectively. This adaptation addresses scalability challenges and privacy concerns, positioning MOS as a hybrid measure that combines its subjective origins with the efficiency of computer-based analyses.
The properties of MOS
The Mean Opinion Score (MOS) is characterized by specific mathematical properties and inherent biases, sparking ongoing debates regarding its utility as a single scalar value for quantifying Quality of Experience.
When MOS is obtained through categorical rating scales, akin to Likert scales, it operates on an ordinal scale, where item ranking is known, but interval values are not. Mathematically, calculating the mean over individual ratings is deemed incorrect; instead, the median is recommended. Despite this, practical applications and MOS definitions often accept the arithmetic mean.
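A short, hypothetical example of why that distinction matters: on an ordinal scale the mean and the median of the same ratings can point in different directions, and only the median is strictly justified. The two rating sets below are invented for illustration.

```python
from statistics import mean, median

# Hypothetical ratings from two test conditions with the same median
# but different means; treating the scale as interval data would rank
# them differently than treating it as ordinal data.
condition_a = [3, 3, 4, 4, 5]
condition_b = [1, 4, 4, 4, 4]

print(mean(condition_a), median(condition_a))   # 3.8  4
print(mean(condition_b), median(condition_b))   # 3.4  4
```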
Studies have shown that the labels on the rating scale are not perceived as evenly spaced, partly as a result of factors such as language translation. Biases extend beyond non-linear scales to include a “range-equalization bias,” where subjects tend to span the entire rating scale during experiments.
Consequently, MOS becomes a relative rather than absolute measure of quality, making direct comparisons between values acquired in different contexts or test designs impractical. ITU-T P.800.2 underscores the importance of contextual reporting, emphasizing that MOS values from separate experiments should not be directly compared unless explicitly designed for comparison, and statistical analysis is employed to validate such comparisons.
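One simple form such statistical analysis can take is a confidence interval around each experiment's MOS. The normal-approximation interval below is a common textbook choice, not something P.800.2 prescribes, and the rating sets are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mos_confidence_interval(ratings, z=1.96):
    """Approximate 95% confidence interval for the MOS of one experiment
    (normal approximation; assumes enough ratings for it to be reasonable)."""
    m = mean(ratings)
    half_width = z * stdev(ratings) / sqrt(len(ratings))
    return m - half_width, m + half_width

experiment_1 = [4, 4, 5, 3, 4, 4, 5, 3, 4, 4]
experiment_2 = [3, 4, 3, 3, 4, 2, 3, 4, 3, 3]

print(mos_confidence_interval(experiment_1))
print(mos_confidence_interval(experiment_2))
# Heavily overlapping intervals would suggest the difference is not conclusive.
```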
What Affects VoIP MOS Scores?
Various factors play a crucial role in influencing VoIP Mean Opinion Scores (MOS), impacting the overall quality of voice calls:
- One pivotal factor is bandwidth, as higher bandwidth enhances voice quality by reducing congestion and allowing smoother data transmission.
- Network latency, associated with bandwidth and congestion, is another determinant, with high latencies adversely affecting voice quality and subsequently lowering MOS.
- Jitter, characterized by variations in packet delay, causes packets to arrive unevenly or out of order; when the receiving side cannot compensate, audio is dropped and VoIP call quality degrades.
- Packet loss, a consequence of network competition for bandwidth, results in degraded voice quality and lower MOS scores.
- Network congestion, a temporary issue arising from a sudden surge in traffic, affects VoIP and various applications, contributing to a decline in MOS.
- Collisions, occurring when packets contend for network access, stem from faulty equipment or inadequate cabling, causing delays and congestion.
These interconnected factors collectively underscore the sensitivity of VoIP MOS scores to the intricate dynamics of bandwidth, latency, jitter, packet loss, network congestion, and collisions.
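The sketch below turns commonly cited rules of thumb for these factors (roughly 150 ms of one-way latency, 30 ms of jitter, and 1% packet loss as comfort limits for VoIP) into a simple diagnostic. The thresholds are conventional guidance, not values defined by MOS itself, and the sample metrics are invented.

```python
# Commonly cited comfort limits for VoIP; treat them as rough guidance.
THRESHOLDS = {
    "latency_ms": 150.0,     # one-way delay
    "jitter_ms": 30.0,       # delay variation
    "packet_loss_pct": 1.0,  # lost packets as a percentage
}

def flag_degradation(metrics):
    """Return the factors in `metrics` that exceed their rule-of-thumb limit."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0.0) > limit]

sample = {"latency_ms": 210.0, "jitter_ms": 12.0, "packet_loss_pct": 2.5}
print(flag_degradation(sample))  # ['latency_ms', 'packet_loss_pct']
```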
The limitations of Mean Opinion Score (MOS)
While Mean Opinion Scores (MOS) offer valuable insights and serve as a convenient reference for evaluating call quality, their application is not without limitations, particularly in the context of VoIP service metrics.
- Temporal information: One notable constraint is the insufficient representation of temporal information. A MOS average does not reveal whether packet loss occurs uniformly throughout a call or in concentrated bursts severe enough to disrupt the conversation or even end the call, as the sketch after this list illustrates.
- Comparability: Comparing MOS values becomes challenging when evaluating calls of varying durations, lacking a standard approach for comparability between, for instance, a 30-second call and a 10-minute call.
- Aggregation: The aggregation of a singular MOS value for calls of different lengths hampers the utilization of advanced statistics and Key Performance Indicators (KPIs) tailored to specific purposes. Condensing the entirety of a call and its nuances into a solitary quality metric is inherently limited, as a single MOS value may convey divergent narratives.
In light of these limitations, the reliance on a singular average MOS can be misleading, emphasizing the need for a more nuanced approach to capturing the multifaceted nature of the user experience in VoIP calls.
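To make the temporal-information point concrete, the hypothetical sketch below compares two calls with nearly identical average MOS: one degrades mildly throughout, the other collapses during a short burst. The per-window values are invented for illustration.

```python
from statistics import mean

# Hypothetical per-10-second MOS estimates for two calls.
call_uniform = [3.9, 3.8, 3.9, 3.8, 3.9, 3.8]   # mild, even degradation
call_bursty  = [4.4, 4.4, 1.3, 4.4, 4.4, 4.4]   # one severe burst of loss

print(round(mean(call_uniform), 2), min(call_uniform))  # 3.85 3.8
print(round(mean(call_bursty), 2), min(call_bursty))    # 3.88 1.3
# Nearly identical averages, but the second call likely felt far worse
# during the burst -- information a single MOS value hides.
```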
The applications and uses of Mean Opinion Score (MOS)
Mean Opinion Score (MOS) finds diverse applications in the realm of telecommunications, with notable uses spanning quality monitoring and alerts, fault isolation, and Service Level Agreements (SLAs).
In the domain of quality monitoring and alerts, MOS serves to swiftly notify operational and support teams of potential issues, aiming to pre-empt customer helpdesk calls and minimize operational costs. It is well suited to this role because it represents overall quality, captures diverse problems, and can be produced for short content segments, allowing short-term quality fluctuations to be detected with minimal delay.
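A minimal sketch of such an alert, assuming a monitoring tool already emits a MOS estimate per short segment; the 3.5 floor, the three-segment window, and the notification step are placeholders to be tuned per service.

```python
ALERT_THRESHOLD = 3.5      # assumed acceptable floor; tune per service
CONSECUTIVE_SEGMENTS = 3   # require a sustained dip before alerting

def should_alert(segment_mos_values):
    """Trigger when the last few short-segment MOS values all fall below the floor."""
    recent = segment_mos_values[-CONSECUTIVE_SEGMENTS:]
    return (len(recent) == CONSECUTIVE_SEGMENTS
            and all(v < ALERT_THRESHOLD for v in recent))

stream = [4.2, 4.1, 3.4, 3.3, 3.2]
if should_alert(stream):
    print("notify operations: sustained MOS degradation")  # placeholder action
```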
In fault isolation, while MOS can identify the existence of a problem, its limitations in providing detailed information necessitate additional measurements to pinpoint the root cause for effective troubleshooting. The multidimensional nature of media quality is emphasized, urging the need for explicit measurement and reporting of different dimensions.
MOS also plays a crucial role in SLAs, where it defines service levels and aids service providers in offering quality-based SLAs with content providers or customers. Long-term quality behavior analysis and trend monitoring, extending from single programs to weekly or monthly scales, are integral for SLA applications, enabling providers to differentiate pricing plans and compare service quality against competitors.
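For SLA reporting, one common pattern is to aggregate per-call MOS values over a billing period, for example the share of calls at or above an agreed floor. The 4.0 target and the monthly figures below are hypothetical.

```python
def sla_compliance(call_mos_values, target=4.0):
    """Fraction of calls in the period whose MOS met the agreed target."""
    if not call_mos_values:
        return 0.0
    return sum(1 for v in call_mos_values if v >= target) / len(call_mos_values)

monthly_calls = [4.3, 4.1, 3.6, 4.4, 4.0, 3.2, 4.2]
print(f"{sla_compliance(monthly_calls):.1%} of calls met the MOS target")  # 71.4%
```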
What affects VoIP MOS test scores?
The VoIP Mean Opinion Score (MOS) test scores are influenced by various factors, providing a nuanced perspective on call quality beyond a singular value. Three essential dimensions—Listening Quality (LQ), Conversational Quality (CQ), and Talking Quality (TQ)—shed light on specific aspects of the call experience, considering how it is perceived by the listening and talking parties during the conversation.
The MOS story becomes even more intricate when considering the methodology used to derive the final value. Subjective MOS involves an empirical study, capturing the average scores assigned by a group of people. Objective MOS relies on automated calculations based on end-to-end quality measurements, ensuring an output rooted solely in measurements. Estimated MOS utilizes network planning models, exemplified by the E-Model in ITU-T G.107, providing an estimate based on objective measurements. Recommendation P.800.1 from the ITU-T delineates these descriptors.
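A hedged sketch of the estimated-MOS path: the R-factor-to-MOS mapping below is the one given in ITU-T G.107, while the R calculation here is drastically simplified (a default Ro of about 93.2 minus illustrative delay and equipment impairments); a real E-Model implementation derives these impairments from many more parameters.

```python
def r_to_mos(r):
    """Map an E-Model R factor to an estimated MOS (mapping from ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6

def estimated_mos(default_r=93.2, delay_impairment=0.0, equipment_impairment=0.0):
    """Very simplified R calculation: R = Ro - Id - Ie_eff, other terms omitted.
    The impairment values must come from measurements or planning assumptions."""
    r = default_r - delay_impairment - equipment_impairment
    return r_to_mos(r)

# Hypothetical impairments for a call with moderate delay and some packet loss.
print(round(estimated_mos(delay_impairment=10.0, equipment_impairment=20.0), 2))  # about 3.26
```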
To comprehensively assess call quality, it is imperative to consider the interplay of the three quality types and three processes, yielding nine distinct MOS values that collectively contribute to a comprehensive understanding of the VoIP experience.
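If it helps to see those combinations spelled out, the snippet below enumerates the nine identifiers in the style of the P.800.1 descriptors (e.g. MOS-LQO for objectively derived listening quality); treat the labels as a convenience and consult P.800.1 for the normative definitions.

```python
from itertools import product

quality_types = {"LQ": "Listening Quality", "CQ": "Conversational Quality", "TQ": "Talking Quality"}
methods = {"S": "Subjective", "O": "Objective", "E": "Estimated"}

# Nine combinations, written in the style of the P.800.1 descriptors (e.g. MOS-LQO).
for (q, q_name), (m, m_name) in product(quality_types.items(), methods.items()):
    print(f"MOS-{q}{m}: {m_name} {q_name}")
```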