Mobile Network Failures – Some Causes to Think About
Technology is moving fast as communication services connect more users to smart devices. Leading telecom companies around the globe are building out 5G networks to increase bandwidth and speed, yet many telecom users worldwide still face poor or non-existent mobile network coverage.
A 2016 Heavy Reading survey (1) of 54 mobile operators reports that hardly any mobile operator gets through a year without being impacted by an outage or degradation. The most successful operators kept incidents to between one and three a year, whereas the least successful averaged 15 or more a year.
Mobile operators are spending around $20 billion a year dealing with incidents of network outages and service degradations.
Overview of the Cause of Outages
Identifying the root cause of an outage is a challenge for mobile operators, and how well they manage it depends on investment and operational competence. Only 21% of operators stated that their company had an excellent ability to understand the cause of outages and degradations. These operators relied on multiple inputs and an automated notification system.
The top three causes of network outages identified by mobile operators were network failures, physical link failures, and network congestion/overloads. Among the specific causes raised in interviews are chip failures in network equipment due to broken air conditioning, board faults in network equipment, damage from theft of site equipment, power outages, and transmission equipment not working.
Most faults remain unresolved for more than three days, resulting in:
- High costs from revenue lost during the service interruption
- Increased maintenance cost
- Quality of Service (QoS) impact
A Root Cause Analysis of Mobile Site Outages
Telecom networks consist of thousands of hardware elements of varied types: base stations, servers, routers, modems, switching units, cables, fittings, cooling elements, energy elements, etc. Mesfin Geremew and Ephrem Teshale of Addis Ababa University conducted an interesting root cause analysis of mobile site outages using a Bayesian network model (2). According to Geremew, discovering the root cause of a problem is a challenge for technicians because the possible relationships between network elements are not explicitly defined. Here is one example emphasized in the paper:
The relationship between the cooling system that controls a room temperature and all the hardware installed in the room is not defined though a failure in the air conditioner will probably affect a smooth running of the hardware.
This makes it challenging to use traditional programming solutions to automatically link failures to a root cause in incident management systems, also known as Trouble Ticket (TT) systems. A root cause is needed to link an incident produced on a network element to another existing incident, creating a child-parent relation. The outcome of Geremew’s analysis is the “Model based Root Causes Analysis of BTS mobile site network outage.” The model can help inform network technicians of the real scope of failures and the probable existence of root problems, which optimizes resources and reduces recovery time.
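The kind of inference a Bayesian network performs can be illustrated with a single application of Bayes' rule to the air conditioner example. This is a minimal sketch, not the model from the paper: the probabilities, ticket IDs, threshold, and function names are all hypothetical and chosen only for illustration.

```python
def posterior_root_cause(prior, p_alarm_given_cause, p_alarm_given_no_cause):
    """Return P(cause | alarm) using Bayes' rule."""
    evidence = (p_alarm_given_cause * prior
                + p_alarm_given_no_cause * (1.0 - prior))
    if evidence == 0:
        raise ValueError("the alarm has zero probability under these inputs")
    return p_alarm_given_cause * prior / evidence


def link_ticket(child_id, parent_id, posterior, threshold=0.5):
    """Attach a child ticket to a parent only if the root cause is probable."""
    parent = parent_id if posterior >= threshold else None
    return {"ticket": child_id, "parent": parent}


# Hypothetical probabilities, not measured values.
P_AC_FAILURE = 0.05              # prior: the air conditioner has failed
P_ALARM_GIVEN_AC_FAILURE = 0.80  # board alarm raised when cooling is down
P_ALARM_GIVEN_AC_OK = 0.02       # board alarm raised for any other reason

posterior = posterior_root_cause(
    P_AC_FAILURE, P_ALARM_GIVEN_AC_FAILURE, P_ALARM_GIVEN_AC_OK
)
print(f"P(air conditioner failed | board alarm) = {posterior:.3f}")

# "TT-1001" (cooling incident) and "TT-1002" (board alarm) are made-up IDs.
print(link_ticket("TT-1002", "TT-1001", posterior))
```

With these assumed numbers, a board alarm raises the probability of a cooling failure from 5% to roughly 68%, which is enough to link the board ticket to the cooling ticket as its parent. A full Bayesian network chains many such relationships across network elements.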
Short-Time Cell Outages in Mobile Cellular Networks
In most cases, mobile operator technicians detect outages in real time through automated failure logs sent to the Network Operations Center (NOC). Some outages, however, cannot be detected directly through logs; these are known as hidden outages. Mobile companies typically become aware of them only indirectly, through subscriber complaints or a decrease in service traffic.
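One way to turn a traffic decrease into an early signal is to compare a cell's current traffic with its typical traffic for the same period. The following is a rough sketch under stated assumptions: the function name, the sample values, and the 50% drop threshold are hypothetical, and a production system would also need to account for time of day and seasonality.

```python
from statistics import median


def looks_like_hidden_outage(baseline_traffic, current_traffic, drop_ratio=0.5):
    """Flag a cell whose traffic falls below a fraction of its typical level.

    baseline_traffic: past traffic samples for the same hour (any unit)
    current_traffic:  the latest sample, in the same unit
    drop_ratio:       assumed threshold; 0.5 means "less than half of typical"
    """
    if not baseline_traffic:
        raise ValueError("at least one baseline sample is required")
    typical = median(baseline_traffic)
    return typical > 0 and current_traffic < drop_ratio * typical


# Made-up samples for one cell during the same hour on previous days.
history = [100, 120, 110, 105]

print(looks_like_hidden_outage(history, 20))  # large drop, worth investigating
print(looks_like_hidden_outage(history, 98))  # within the normal range
```

The median is used instead of the mean so that a single unusual day in the baseline does not distort the comparison.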
Josip Lorincz, Luca Chiaraviglio, and Francesca Cuomo, from the University of Split and the University of Rome, have studied the Short-Time Cell Outage (STCO) phenomenon affecting base stations (BS) in a mobile cellular operator network (3). They define an STCO as:
A short-time outage of all or some BS cells (sectors) that lasts up to 30 minutes in a day, thus still guaranteeing more than 98% of operation. A type of outage that can’t be detected directly through an operator network monitoring system.
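The definition above can be expressed as a simple check on a cell's total outage time per day. This is a minimal sketch and not the researchers' method: the input format (outage intervals as start and end minutes within one day) and the function names are assumptions made for illustration.

```python
MINUTES_PER_DAY = 24 * 60
STCO_MAX_MINUTES = 30  # upper bound taken from the definition quoted above


def daily_outage_minutes(intervals):
    """Sum non-overlapping (start_minute, end_minute) outage intervals."""
    return sum(end - start for start, end in intervals)


def classify_day(intervals):
    """Return (outage minutes, availability, is_stco) for one cell and day."""
    down = daily_outage_minutes(intervals)
    availability = 1 - down / MINUTES_PER_DAY
    is_stco = 0 < down <= STCO_MAX_MINUTES
    return down, availability, is_stco


# Made-up example: a 2-minute outage at 10:00 and a 5-minute outage at 15:05.
down, availability, is_stco = classify_day([(600, 602), (905, 910)])
print(f"{down} minutes down, availability {availability:.2%}, STCO: {is_stco}")
```

At the 30-minute limit the computed availability is 1 - 30/1440, or about 97.9%, which is in line with the roughly 98% figure in the definition.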
The researchers performed a statistical analysis of STCOs based on BS measurements of a complete operator mobile network, and their results showed that:
- STCOs impact the everyday life of an operator network
- More STCOs are recorded in urban areas than in rural ones
- The impact of STCOs on users is higher in rural areas than in urban ones
- The STCOs are correlated with the transferred traffic
The study’s results show that STCOs affect the everyday behavior of the network, and that their impact tends to be higher during peak traffic hours and in adverse weather conditions. On average, an STCO lasts less than 2 minutes, which makes it difficult to detect STCOs properly in real time. STCO occurrence also varies by area: urban areas experience a larger number of STCOs, while STCOs in rural areas have a stronger impact on users.
Improving Your Network’s Resilience and Visibility
Network outages are a top factor influencing telecom subscriber churn. Mobile network outages occur for many different reasons, and to some extent, BTS outages and hardware element failures are inevitable when operating a network. Because these failures frequently result in degradation or complete service interruption, causing operator revenue losses and a bad user experience, it matters that there are things telecom businesses can do to understand and reduce their impact.
Telecom operators have gained visibility into an array of possible root causes by implementing a telecom site automation solution. These solutions provide much better visibility into the performance of all the different equipment at remote telecom sites. They also offer the added benefit of correcting problems remotely and quickly, without a service visit. Everyone’s goal is a more resilient and efficient network, and it is worth the effort to learn more about telecom site automation and its benefits. Take the time to explore our website, case studies, and white papers, and contact us to learn more about the advantages of telecom site automation.
Resources:
(1) Spirent. “Mobile Network Outages & Service Degradations – A Heavy Reading Survey Analysis.” https://www.spirent.com/assets/wp/wp_mobile-network-outages-service-degradations
(2) Geremew, Mesfin. Teshale, Ephrem. “Root Cause Analysis of Mobile Site Outage Using Bayesian Network: the Case of ethio telecom.” https://pdfs.semanticscholar.org/4071/6012e0df4a051abfdfa6861fd5eaf7b68039.pdf
(3) Lorincz, Josip. Chiaraviglio, Luca. Cuomo, Francesca. “A Measurement Study of Short-time Cell Outages in Mobile Cellular Networks.” https://www.researchgate.net/publication/288933226_A_Measurement_Study_of_Short-time_Cell_Outages_in_Mobile_Cellular_Networks