Improving Process Safety Through Better Alarm Management

Dr. Gopal Jayaraman

Mr. Kaushik Jayaraman

When dealing with Process Safety, it is quite easy to envision a strong path to protection through better systems and maintaining their integrity, while overlooking the human factors involved in the continuous operation and maintenance of those systems. In an age of computerized digital information overload, it becomes prudent to cater to the personnel who actually handle that information. Alarms are a step up in informing these priority stakeholders when they need to sit up and take notice.

In our quest to improve systems and add technological advancements in bulk, we have come across a new labyrinth of hardware and related process information. The flood of information coming in to the operator leaves many baffled, and it is easy to understand how someone may miss a couple of signals in the mayhem. It is therefore necessary to control the flow of process data to the operator so that actions can be taken where and when they are required. This, in essence, is the crux of the Alarm Management credo: alarms should alert operators to a change that requires operator action at that point in time.

The implementation of a Process Safety Management (PSM) System revolves around the premise of preventing the Loss of Primary Containment (LOPC) of Highly Hazardous Chemicals (HHC) in a process. One of the key factors governing this is maintenance of the process within the intended operating envelope. The focus of alarms thus should be built around the identified safe operating process envelope. This paper showcases the issues faced by Essar Oil Limited at its Vadinar Refinery while moving towards better process control and safety through Process Alarm Management.

HISTORICAL RELEVANCE

Alarm systems were offshoots of the control systems developed to keep processes operating successfully to create desired products. This started with "panel boards" in control rooms loaded with a slew of control instruments and indicators. These were in turn connected to sensors in the field, which relayed information through 4-20 mA current loops. The panel boards were initially designed only to relay information, but the need to control process parameters shifted the focus towards designing them around human factors and limitations. Alarms took the form of a light beacon, usually coupled with an audible horn (i.e. visual and audio signals), to notify the panel operator of a deviation from the desired operational envelope, either after the deviation or just on the verge of it. These were laid out in groups reflecting the plant layout, making it easier to recognize problems and initiate corrective actions. It was deemed a simple matter to look at the entire panel board and decide where action was required.

But as plant complexity grew, these panels became more difficult to keep track of, and more operators and controllers were required. This increased the threat of human error and the chances of failure, as evidenced in high-profile incidents such as the Three Mile Island nuclear accident (1979), the Bhopal Gas Disaster (1984) and the Chernobyl Disaster (1986). In the drive to improve process integrity and process and equipment monitoring, the advent of the Distributed Control System (DCS) both helped and complicated the alarm scenario.

FIGURE 1: CONTROL ROOM (GULF OIL CO., PORT ARTHUR)
[COURTESY: PETROLEUM REFINER (MARCH 1949)/ HYDROCARBON PROCESSING (JAN 2012)]

FIGURE 2: MODERN DCS-EQUIPPED CONTROL ROOM WITH CRT.
[COURTESY: ABB]

FIGURE 3: THREE-MILE ISLAND, 1979

FIGURE 4: UNION CARBIDE, BHOPAL, 1984

FIGURE 5: CHERNOBYL, 1986

With the DCS, the engineer no longer needed to spend extra time, money and space on the alarm set-up; all that needed to be done was to type in a command location and a set-value for the parameter to be monitored. This led to everyone alarming everything they could get their hands on. Instead of increasing process integrity, this left operators over-burdened and ignoring relevant alarms because of redundancy or flood situations. It was not until several high-consequence process events had occurred that people stopped to take notice. Milford Haven (1994) and BP Texas City (2005) are but two in a long line of incidents involving improper Alarm Management, where alarms were either acknowledged and forgotten or dismissed as the usual effect. Another consequence noted was the restriction on display size: multiple screens represented a single unit, and no complete picture emerged from any single display (BP Texas City). Alarms had to become reliable and fast, yet on average every parameter had its high and low alarms at 80% and 20% respectively, which was of little use to stressed operators.

ALARM MANAGEMENT PRINCIPLES

Alarm Management is essentially the scientific application of human-factors knowledge to the engineering of plant instrumentation and system information, so that alarms are designed to be more usable in managing abnormal situations.

In most processes, it is not possible to keep every parameter exactly at its set value all the time. This is why identifying and establishing a Process Envelope becomes very important. Most often, abnormal situations are caused by process parameters running out of control, i.e. disturbances beyond the "Process / Operating Envelope" (normal operating range), which may be of minimal or catastrophic consequence.

It is the responsibility of the Operations team to identify the cause of the situation quickly and to execute corrective actions in a timely and efficient manner. For this to be possible, they should have an idea of what could go wrong, and an indication of when it does go wrong. This is the fundamental principle behind an alarm. The ultimate objective is to prevent, or at least minimize, physical and economic loss through operator intervention in response to the alarmed condition. Alarms should be set up if and only if there are relevant operator actions connected to them, but ultimate plant safety should not depend on operator response to an alarm.

Many process plants devote considerable resources to rationalizing their alarm systems, which allows operators to manage the process effectively instead of merely responding to alarms throughout the shift. A well-designed and adequately functioning alarm system is crucial to plant process safety. But simply staying within the limits is neither sufficient nor useful without knowledge of the critical operating limits. Figure 6 shows the limits surrounding a normal operating parameter, while Figure 7 shows the corresponding alarm set points and the actions they induce.
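To make the nested limits of Figures 6 and 7 concrete, the following is a minimal Python sketch assuming a simplified envelope with one alarm limit and one trip (critical) limit on each side of the normal operating range. The class `Envelope`, the function `classify` and the example numbers are hypothetical illustrations, not the Refinery's configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical limits around one process parameter, all in engineering units."""
    low_trip: float    # critical limit: automatic protection (ESD) acts at or beyond this
    low_alarm: float   # operator is alerted here and is expected to act
    high_alarm: float
    high_trip: float

    def __post_init__(self) -> None:
        # Limits must nest: trip limits lie outside the alarm limits.
        if not (self.low_trip < self.low_alarm < self.high_alarm < self.high_trip):
            raise ValueError("require low_trip < low_alarm < high_alarm < high_trip")


def classify(value: float, env: Envelope) -> str:
    """Return the envelope zone a measured value falls in."""
    if value <= env.low_trip or value >= env.high_trip:
        return "TRIP"    # automatic protection region; safety must not rely on the operator
    if value <= env.low_alarm or value >= env.high_alarm:
        return "ALARM"   # outside the normal operating range: operator action required
    return "NORMAL"


# Illustrative numbers only: a drum level in % of range.
drum_level = Envelope(low_trip=10.0, low_alarm=25.0, high_alarm=75.0, high_trip=90.0)
for v in (50.0, 80.0, 95.0):
    print(v, classify(v, drum_level))
```

Keeping the alarm limits strictly inside the trip limits reflects the principle above: the alarm gives the operator room to act, while ultimate safety rests on the automatic layer.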

In essence, the Alarm indicating an "abnormal situation" should be easy to understand and presented at a rate that the operator will be able to deal with to initiate a corrective action. Every alarm should have a related corrective action.

Studies by the Abnormal Situation Management (ASM) Consortium have shown that worker actions cause 42% of the abnormal situations or upsets in process operations. A prime example would be the Three Mile Island Incident (1979), when operators could not understand the exact fault due to a "lit-up" panel and took corrective actions that actually led to the incident. The ASM Consortium also notes that 36% of the upsets can be related to Equipment problems, with half of these attributable to the equipment or process units functioning outside of their "operating envelope".

ISSUES & CHALLENGES

Alarm Management should start from the design phase itself, so that it can inform the plant console design, layout, training, manual preparation, etc. In most cases, however, the plants considered for alarm management are older, and the whole study has to be started afresh. Besides, before a plant starts operating, it is difficult to design or configure optimal alarm settings. There may be too many alarms where a single parameter could reliably govern the process (consequential alarms), or too few to give the operator an accurate view of the actual process variance.

FIGURE 6: ILLUSTRATION OF THE LIMITS SURROUNDING A PARAMETER (OPERATING ENVELOPE)

FIGURE 7: OPERATION LIMITS AND ALARM SET POINTS

Most standards and user documents, including ANSI/ISA 18.2 (Management of Alarm Systems for the Process Industries) from the American National Standards Institute (ANSI) and the Instrumentation, Systems, and Automation Society (ISA), detail the "What" and not the "How" of Alarm Management. Nevertheless, these standards and engineering practices (including the Engineering Equipment and Materials Users Association (EEMUA) 191 and the International Electro-technical Commission (IEC) 61508 and 61511) provide the basic framework for implementing an alarm management system at a facility. In practice, most Distributed Control Systems (DCS) come with an abundance of unstructured alarms.

A refinery, like any other Hydrocarbon Processing Industry (HPI) facility, undergoes constant change in an effort to improve productivity and market share. This, in turn, changes the operating envelope. A process/ operating envelope (Figure 6) is a collection of boundary limits that, when exceeded, put the integrity of assets at risk. The challenge is to track changes in the envelope and keep the alarm levels at the point where operator intervention is meaningful.


FIGURE 9: DECISION ON ALARM SETTINGS OFTEN IS A DELICATE BALANCING JOB

The main issues and challenges encountered include:
- Operating Process Envelope
- Process Safety Information
- Plant Ageing/ Change
- Framework/ Methodology
- Over-alarming
- Nuisance Alarms (Chattering, Consequential, Duplicate, Stale)
- Alarm Prioritization
- Routing
- Suppression
- Spurious Alarms & Floods
- Inappropriate configuration
- Operator Stress
- Training & Response
- Operational Burden, etc.

There is a lack of Process Safety Information (PSI) systems that define the operating parameters, their limits, the alarm set points, the effects of variance beyond the maximum and minimum values, the reaction time or priority, etc. Building such a database demands a great deal of time from personnel who are otherwise occupied. It often becomes a question of quantity vs. quality: should all available manpower be devoted to improving production, or could fine-tuning the alarm system result in better control and response in the future? Such an analysis often tends to favour production over the intangible, less visible safety benefits.

Alarm floods are situations in which operators receive so many alarms that they do not know which actions to take first (over-alarming). This is where Alarm Prioritization and Rationalization come in. Even so, current systems may take quite some time to reach effective levels and need constant monitoring and correction.
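Floods are usually identified from the alarm journal rather than by feel. The sketch below is a hypothetical helper that flags fixed 10-minute windows containing more than 10 alarms and reports the share of time spent in flood; the threshold of more than 10 alarms per 10 minutes is an assumption taken from common industry practice, not a figure from this study.

```python
from collections import Counter


def flood_windows(timestamps_s, window_s=600, threshold=10):
    """Start times (s) of fixed windows whose alarm count exceeds threshold.

    Assumption: a flood is more than `threshold` alarms in one `window_s`
    window for a single console; windows are fixed, not sliding.
    """
    counts = Counter(int(t // window_s) for t in timestamps_s)
    return sorted(b * window_s for b, n in counts.items() if n > threshold)


def percent_time_in_flood(timestamps_s, duration_s, window_s=600, threshold=10):
    """Percentage of the logged period covered by flood windows."""
    flooded = len(flood_windows(timestamps_s, window_s, threshold))
    return 100.0 * flooded * window_s / duration_s


# Example: 15 alarms in the first 150 s, then two isolated alarms.
journal = [i * 10 for i in range(15)] + [700, 800]
print(flood_windows(journal), percent_time_in_flood(journal, duration_s=3600))
```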

The relevance of alarms also plays a major role in keeping operators focussed. For example, an alarm with a lower set-point may give early indication of a developing abnormal situation, but a set-point that is too low may only add to the nuisance value. The abnormal situation could take hours to develop, and operators may feel free to ignore, or become desensitized to, the alarm until values reach their own self-managed set-points (stale alarms).
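Stale alarms can be picked out of the alarm journal by how long they stay active. The minimal sketch below assumes a 24-hour limit (a common rule of thumb, not a value from this paper) and a hypothetical event format of (tag, activation time, clear time or None), with times in seconds; the tag names are illustrative.

```python
def stale_alarms(events, now_s, limit_s=24 * 3600):
    """Tags whose alarm stayed active longer than limit_s.

    events: iterable of (tag, activation_time_s, clear_time_s or None if still active).
    The 24-hour default is an assumption, not a value from this paper.
    """
    stale = []
    for tag, on_s, off_s in events:
        end_s = now_s if off_s is None else off_s  # still-active alarms run up to "now"
        if end_s - on_s > limit_s:
            stale.append(tag)
    return stale


# Hypothetical tags: one short alarm, one still standing, one cleared after ~25 h.
events = [("TI-101", 0, 3600), ("LI-202", 0, None), ("PI-303", 0, 90000)]
print(stale_alarms(events, now_s=100000))
```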

Alarms have been observed to chatter when the process operates too close to the envelope limits. Chattering may also inflate the number of alarms/ deviations recorded against the operating envelope. Control room personnel must be able to distinguish between false and genuine boundary excursions and ensure that the deviations are relevant. This makes it possible to minimize process upsets or loss of containment.
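Chattering can likewise be detected from the journal. This hypothetical sketch treats a tag as chattering if it activates 3 or more times within 60 seconds; both numbers are assumptions for illustration and should be tuned to the site's own alarm philosophy.

```python
from collections import defaultdict


def chattering_tags(activations, window_s=60.0, min_count=3):
    """Tags with at least min_count activations inside any window_s span.

    activations: iterable of (tag, time_s) alarm activations (assumed format).
    """
    by_tag = defaultdict(list)
    for tag, t in activations:
        by_tag[tag].append(t)
    result = set()
    for tag, times in by_tag.items():
        times.sort()
        # Compare each activation with the one (min_count - 1) positions later.
        for i in range(len(times) - min_count + 1):
            if times[i + min_count - 1] - times[i] <= window_s:
                result.add(tag)
                break
    return sorted(result)


log = [("FI-1", 0), ("FI-1", 20), ("FI-1", 45), ("TI-2", 0), ("TI-2", 100), ("TI-2", 300)]
print(chattering_tags(log))
```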

Operators must sometimes bypass or temporarily render alarms and emergency shut-down devices inoperative so they can either be tested to ensure dependable operation or repaired. Because the process unit typically remains in operation while these alarms or emergency shut-down devices are temporarily out of service, the ability to monitor the process units during this period for possible process upsets or possible need for shutdown of the process is diminished. As a result, it is important for a refinery to minimize the bypass time, communicate awareness of the degraded operational safety condition to all refinery personnel who need to know, and keep records documenting the rationale for, and confirming the restoration of, the bypassed components.
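The record-keeping requirement above can be supported by a simple bypass register. The sketch below is hypothetical (the `BypassRecord` fields and the `open_bypasses` helper are illustrative, not a specific software product): it keeps the rationale and authorization with each bypass and lists those still open beyond an allowed duration, so the degraded condition can be escalated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class BypassRecord:
    tag: str
    reason: str                         # documented rationale for the bypass
    authorized_by: str
    started: datetime
    restored: Optional[datetime] = None  # set once the device is confirmed back in service


def open_bypasses(records, now, max_duration):
    """Bypasses still in force for longer than max_duration, longest first."""
    overdue = [r for r in records if r.restored is None and now - r.started > max_duration]
    return sorted(overdue, key=lambda r: r.started)  # earliest start = longest bypass


now = datetime(2014, 4, 1, 12, 0)
register = [
    BypassRecord("ESD-01", "transmitter repair", "Shift In-charge", now - timedelta(hours=10)),
    BypassRecord("PSH-02", "proof test", "Shift In-charge", now - timedelta(hours=2)),
]
print([r.tag for r in open_bypasses(register, now, timedelta(hours=8))])
```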

STEPS IN ALARM MANAGEMENT

Alarm Management may start at any stage of a Processing Plant's lifecycle. For Essar Oil, the journey started in 2009, with the establishment of a Process Safety Management (PSM) System, and the creation of a PSI database. As this neared completion in 2010-2011, the need for a proper Alarm Management system was felt, and acknowledged through various high-level committee meetings, and Internal and External Audits. Any facility hoping to move along this path must first acknowledge the problems that exist. At the Refinery, Alarm Management was done in three phases, viz.

PHASE 1:

ALARM SYSTEM PERFORMANCE STUDY

1. Assembling the data from logs and system backups. The creation of the Process Safety Information (PSI) database went a long way in speeding up this step.
2. Study of the existing Alarm Philosophy and settings, which is essential for an in-depth understanding of the methodology employed by the package vendor or licensor for the plant. It may be prudent to refer to any Process Hazard Analysis (PHA) studies conducted for the facility; this helps identify the alarms that have been set as safeguards and need to be addressed with higher priority.
3. Analysis of the system against performance benchmarks/ standards. While it was good to know where we currently were, the comparison was crucial in giving a perspective on where we could be. The standards chosen for the purpose were the contemporary EEMUA 191 and IEC 61508. Both give clear limits and Recognized and Generally Accepted Good Engineering Practices (RAGAGEP) concerning the acceptable alarm rate for normal operation and for shutdown/start-up cases.
As part of the analysis, alarms were categorized and any existing "Bad Actors" were resolved. Further assistance for this was received from DCS vendors such as ABB and Yokogawa.

TIME PERIOD	ACCEPTABLE (ALARMS)	MANAGEABLE (ALARMS)
10 minutes	1	2
1 hour	6	12
1 day	150	300
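The table can be applied directly to an alarm count. The following minimal sketch averages the alarm rate over a logged period and grades it against the acceptable/manageable values above; the "overloaded" label for rates beyond "manageable" and the function name are assumptions. Averaging hides peaks, so such a check complements rather than replaces a flood check.

```python
# Benchmarks from the table above: period in seconds -> (acceptable, manageable) alarm counts.
BENCHMARKS = {600: (1, 2), 3600: (6, 12), 86400: (150, 300)}


def rate_assessment(n_alarms: int, duration_s: float) -> dict:
    """Average alarm rate per benchmark period, graded against the table."""
    result = {}
    for period_s, (acceptable, manageable) in BENCHMARKS.items():
        rate = n_alarms * period_s / duration_s
        if rate <= acceptable:
            grade = "acceptable"
        elif rate <= manageable:
            grade = "manageable"
        else:
            grade = "overloaded"  # hypothetical label for rates beyond 'manageable'
        result[period_s] = (round(rate, 2), grade)
    return result


# Example: 240 alarms logged over one day on a single console.
print(rate_assessment(240, 86400))
```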

PHASE 2:
PERFORMANCE IMPROVEMENT PROJECT

Creation of a DCS Alarm Management Guideline to establish standards for managing the alarms generated in the process units and to ensure, through analysis, optimization and redesign where required, that operators do not miss alarms.
Setting up clear Alarm Philosophies. After comparing the existing alarm rates with the benchmarks, a clear strategy for meeting or bettering the benchmarks has to be created. This was achieved through the Guideline document, which provided clarity on the Rationalization and Prioritization methodologies. Three features were emphasized:

Alarms must require a response from the operator
Multiple alarms should not signify the same or similar conditions/ parameters
An alarm must only be activated in an abnormal situation, not in an expected/ sequential phase of operation

Training and Awareness sessions on Alarm Management were conducted, especially for Operations and Maintenance personnel.
Alarm Management System Documentation - Full system redesign by Process and Instrumentation departments
Alarm Rationalization studies focused on identifying and reducing the percentage of alarms which:

do not require operator action. These can be kept in history modules as records.
are repetitive. Actions can be in the form of setting dead bands, changing set-points, suppression/ activation, etc.
have incorrect set-points. A periodic review (at least once a year) should be done to ensure that the set-points are accurate. An initial study by the Process department ensured that the database was correct; a few discrepancies between the field and the records were noted and corrected. Setting up dead-bands may also provide more tolerance against chattering, especially for analog field instruments. The following dead-bands were used with the ABB systems:

SIGNAL TYPE	FLOW	LEVEL	PRESSURE	TEMPERATURE
Dead Band %	5	5	2	1

are consequential, or are multiple alarms related to the same parameter. Deviation alarms (range: 5%) were used for parameters measured through multiple tags (as is the case with 2oo3, 2oo2 and 1oo2 transmitters).
are related to non-functional/ idle/ stand-by/ out-of-service equipment. Alarm suppression schemes can be used to automatically suppress, inhibit, or activate alarms based on the equipment or plant status.
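The effect of a dead-band on chatter can be shown with a high alarm that is raised at the set-point but cleared only when the value falls below the set-point minus the dead-band. This sketch uses the percentages from the dead-band table above and assumes they are expressed as a percentage of instrument span; the function names are hypothetical, and the actual ABB configuration parameters are not reproduced here.

```python
# Dead-band percentages from the table above, assumed to be % of instrument span.
DEADBAND_PCT = {"flow": 5.0, "level": 5.0, "pressure": 2.0, "temperature": 1.0}


def high_alarm_states(values, setpoint, span, signal_type):
    """Alarm state after each sample for a high alarm with dead-band (hysteresis)."""
    deadband = DEADBAND_PCT[signal_type] / 100.0 * span
    active = False
    states = []
    for pv in values:
        if not active and pv >= setpoint:
            active = True                  # raise at the set-point
        elif active and pv < setpoint - deadband:
            active = False                 # clear only once clear of the dead-band
        states.append(active)
    return states


def count_activations(states):
    """Number of False -> True transitions, i.e. alarms annunciated to the operator."""
    return sum(1 for prev, cur in zip([False] + states, states) if cur and not prev)


# A flow hovering around an 80% set-point on a 0-100 span.
states = high_alarm_states([79, 80.5, 79.5, 80.2, 76, 74], setpoint=80, span=100, signal_type="flow")
print(states, count_activations(states))
```

Without the dead-band, the same samples would raise the alarm twice; with it, the operator sees a single alarm that clears only when the flow has genuinely returned.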

All of the results were tabulated for records.
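The deviation alarm used for redundant transmitters in the rationalization list above can be sketched as a spread check. Assumptions: the 5% range is taken as 5% of instrument span, and the median is used to point to the transmitter most likely at fault when three are installed (as in 2oo3 voting). The function names are hypothetical.

```python
from statistics import median


def deviation_alarm(readings, span, limit_pct=5.0):
    """True if redundant transmitters for one parameter disagree by more than limit_pct of span."""
    if len(readings) < 2:
        raise ValueError("need at least two redundant readings")
    return (max(readings) - min(readings)) > limit_pct / 100.0 * span


def suspect_transmitter(readings):
    """Index of the reading furthest from the median (most useful with three transmitters)."""
    m = median(readings)
    return max(range(len(readings)), key=lambda i: abs(readings[i] - m))


# Three pressure transmitters on a 0-100 span; the third has drifted.
readings = [50.0, 51.0, 58.0]
print(deviation_alarm(readings, span=100.0), suspect_transmitter(readings))
```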

Alarm Prioritization is an important factor in determining system effectiveness, as it helps the panel operator decide where to focus. Prioritization should be based on the consequences of the alarmed condition, i.e., whether the ultimate consequence would affect Human/ Process Safety, the Environment, or Finances; and on how much time the operator needs to respond compared with the time available before shutdown.
The Alarm Prioritization was also done using a consequence matrix, consistent with our existing Incident Reporting Matrix, while assigning high priorities to certain critical tags such as the Emergency Shutdown (ESD) systems, Fire & Gas (F&G) Detection systems, etc. (Figure 10)
The rationalization and prioritization decisions were recorded, and any change to the system had to go through a Management of Change (MOC) process, especially changes to alarms associated with critical parameters, ESD, and F&G systems. All other changes are monitored through an Alarm Change Record.
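The consequence-matrix approach can be illustrated with a score that combines consequence severity with the time available for the operator to respond. The severity categories, time bands and score thresholds below are hypothetical placeholders, not the Refinery's matrix (Figure 10); only the override forcing ESD and F&G tags to High priority follows the text above.

```python
# Hypothetical severity scale; a real site would map its own consequence matrix here.
SEVERITY = {"minor": 1, "moderate": 2, "major": 3, "severe": 4}


def urgency(minutes_available: float) -> int:
    """Hypothetical urgency bands for the time available before the consequence or shutdown."""
    if minutes_available < 5:
        return 3
    if minutes_available < 15:
        return 2
    return 1


def alarm_priority(severity: str, minutes_available: float, critical_system: bool = False) -> str:
    """Return 'High', 'Medium' or 'Low' from a severity x urgency score."""
    if critical_system:  # e.g. ESD or F&G tags are assigned High priority directly
        return "High"
    score = SEVERITY[severity] * urgency(minutes_available)  # ranges from 1 to 12
    if score >= 9:
        return "High"
    if score >= 4:
        return "Medium"
    return "Low"


print(alarm_priority("severe", 3), alarm_priority("moderate", 10), alarm_priority("minor", 60))
```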

PHASE 3:

LIFECYCLE MAINTENANCE & PERFORMANCE MONITORING

FIGURE 10: ALARM PRIORITY THROUGH CONSEQUENCE MATRIX
[ANNEXURE-3: CONSEQUENCE MATRIX]

Statistical analysis of alarm system performance typically reveals a few rogue alarms, and it is very common to find that 80% of the problem comes down to 20% (or fewer) of the alarms. As per EEMUA guidelines, alarm priorities should be distributed so that High Priority alarms constitute only 5% of the total alarms. Our efforts were directed towards achieving this objective.
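The "80/20" observation suggests a simple Pareto ranking of tags by alarm count. The hypothetical helper below returns the smallest set of most frequent tags that together account for a chosen share of all alarms, giving a starting list of bad actors to review; the 80% default share is an assumption mirroring the rule of thumb above.

```python
from collections import Counter


def bad_actors(alarm_tags, share=0.8):
    """Most frequent tags that together produce at least `share` of all alarms."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    selected, cumulative = [], 0
    for tag, n in counts.most_common():  # highest alarm count first
        if cumulative >= share * total:
            break
        selected.append(tag)
        cumulative += n
    return selected


# Hypothetical journal: two tags generate most of the load.
journal = ["A"] * 50 + ["B"] * 30 + ["C"] * 10 + ["D"] * 5 + ["E"] * 5
print(bad_actors(journal))
```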

FIGURE 11: DEGREE OF OPERATOR RESPONSE
The EEMUA reference standard also suggests the following additional performance benchmarks:
- fewer than 10 Low Priority alarms per hour;
- fewer than 2 Medium Priority alarms per hour; and
- fewer than 5 High Priority alarms per shift.
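The priority-related benchmarks above, together with the 5% High Priority share from EEMUA, can be checked from a log of alarm priorities. This is a minimal sketch; the 8-hour shift length is an assumption, as are the function and key names.

```python
from collections import Counter


def priority_benchmarks(priorities, hours, shift_hours=8.0):
    """Check a log of alarm priorities ('Low'/'Medium'/'High') against the benchmarks above."""
    counts = Counter(priorities)
    total = len(priorities)
    high_share = counts["High"] / total if total else 0.0
    return {
        "low_per_hour_ok": counts["Low"] / hours < 10,
        "medium_per_hour_ok": counts["Medium"] / hours < 2,
        "high_per_shift_ok": counts["High"] / hours * shift_hours < 5,  # shift length assumed
        "high_share_ok": high_share <= 0.05,  # High Priority about 5% of all alarms
    }


# One assumed 8-hour shift of alarms.
log = ["Low"] * 60 + ["Medium"] * 10 + ["High"] * 2
print(priority_benchmarks(log, hours=8.0))
```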
Currently the Refinery uses two different Alarm Information Management Software (AIMS) packages: Matrikon AIMS for the majority of the DCS and ESD systems, and SSM Infotec AIMS for some ESD systems, essentially to ensure that the Alarm Management work continues. The software provides various analyses, including alarm rates, distribution, performance, chattering and other stand-alone issues.
After the Alarm Management system has reached the normal operation stage, it is essential to keep monitoring it for changes and to guard against complacency. The AIMS systems support continual improvement through constant monitoring: with experience, a longer history of operation and equipment behaviour builds up and helps streamline the process. This has been very visible at the Refinery, given the volume of changes monitored during the Revamp (2011), Expansion, and Commissioning (2012 - 2013) activities.

EFFECTS/ BENEFITS

The major actions taken at our site to achieve adequate alarm rates (in addition to the above) have been to disable or suppress alarms where no operator action was required, to set alarm on/off delay times to deal with chatter and, in some cases, to change the logic. A drastic reduction in the number of alarms was noticed. At the Refinery, the number fell from an initial 2456 alarms per hour across 19 units to 455 alarms (June to August 2012). At the Power Plant, the change was even more drastic, from an initial figure of around 9000 to around 500 alarms per day. The process is still in progress and under review, given the almost constant modification and expansion activities under way at Vadinar.
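Alarm on/off delays filter chatter in time rather than in value: the raw condition must persist for the on-delay before the alarm is annunciated, and must stay clear for the off-delay before it resets. The sketch below assumes a fixed sample interval and uses hypothetical names; it is not the DCS vendors' implementation.

```python
def apply_on_off_delay(condition, sample_s, on_delay_s, off_delay_s):
    """Filter a sampled raw alarm condition (list of bools) with on/off delay timers."""
    active = False
    timer = 0.0  # how long the raw condition has disagreed with the annunciated state
    out = []
    for raw in condition:
        if raw != active:
            timer += sample_s
            needed = on_delay_s if raw else off_delay_s
            if timer >= needed:
                active = raw  # condition persisted long enough: change state
                timer = 0.0
        else:
            timer = 0.0  # raw condition agrees with state again: restart the timer
        out.append(active)
    return out


# 1 s samples, 3 s on-delay, 2 s off-delay: a one-sample glitch never reaches the operator.
print(apply_on_off_delay([True, False, False, False], 1.0, 3.0, 2.0))
print(apply_on_off_delay([True] * 4, 1.0, 3.0, 2.0))
```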

It should be noted that the whole process has given Operations and Maintenance personnel a better understanding of the system. It has improved operator response time and complemented the Process Safety Information. Monitoring of process excursions has improved and is being developed further to track Critical Operating Parameters (COP). The reduction in alarms has also improved the response to operational situations, with a positive effect on production and emergency response activities, as noted during process mock drills conducted in the plants. Another benefit has been increased reliability of equipment and the process: the alarm studies led to the identification and implementation of a few logic changes consistent with design and operational safety.

CONCLUSION

The demands on Operations are increasing due to a variety of factors, such as: (A) the need for process operation close to maximum efficiency; (B) higher costs of process interruptions; (C) more complex processes; (D) lower safety margins; (E) environmental regulations; (F) fewer operators; and (G) higher staff movement (less experienced operators).

Alarms and instruments form a vital communication link between important parts of the process and the operator. Without properly functioning alarms and instruments, it is difficult to know the operating status of the process and safety equipment. It is essential that dedicated programs be in place for the care and attention of these alarms and instruments. The Mechanical Integrity program should encompass them, and any bypass should be carried out only after a risk assessment and authorization at the appropriate level. Any alarm change should go through a proper Management of Change process.

While alarm rationalization and prioritization studies often end up reducing the alarm rate, the essential thing to remember is that Alarm Management is not aimed only at reducing the number of alarms; it aims to optimize them so that operators are able to react properly and avoid any loss of containment. Resorting to suppression is not recommended, because nuisance alarms can be significant early warning signs of maintenance issues on critical plant process and safety equipment.

Monitoring the alarm rate as part of the Process Safety Performance Indicators (PSPI) can go a long way towards maintaining and fine-tuning existing systems. In newer systems, Alarm Management is becoming an integral component of the initial design itself and is being incorporated into vendor packages, as advertised by ABB, Honeywell, etc. Establishing Critical Operating Parameters (COP) and the Process Envelope is also beneficial for understanding the process better, leading to better monitoring of process excursions beyond the stated envelope and, ultimately, to reducing or preventing Loss of Primary Containment. It should always be remembered that alarms are part of the plant's overall Layers of Protection and, as such, should be maintained reliably.

REFERENCES

OISD-STD-152:
International Electro-technical Commission, IEC 61508: "Functional Safety of electrical/ electronic/ programmable electronic safety-related systems"
Engineering Equipment & Materials Users Association, EEMUA 191: "Alarm Systems - A Guide to Design, Management, and Procurement"
UK Health and Safety Executive (HSE), and EEMUA: "Better Alarm Handling" presentation notes
American National Standards Institute/ Instrumentation, Systems, and Automation Society, ANSI/ ISA 18.2: "Management of Alarm Systems for the Process Industries"
Abnormal Situation Management (ASM) Consortium: "Effective Alarm Management Guidelines"
Chemical Processing: "Heed the Warning: Optimizing plant performance", Chris Stearns, Honeywell Process Solutions
ABB Limited: "Alarm Management: A Practical Guide to Users", Peter Bruce, John Noon, ABB Engineering Services
ABB Limited: "Oil & Gas Case Study: Don't be alarmed! - Effective control system analysis"
ABB Limited: "Human Factors in the Control Room" factsheet
IChemE Webinar: "Alarm Management and Operator Graphics", Peter Andow, Honeywell Process Solutions
ARC Advisory Group, Yokogawa ARC Strategies: "Alarm Management Strategies"
Honeywell Process Solutions: "The Early Event Detection Toolkit", Don Morrison, Wendy Foslien, Ward McArthur, Peter Jofriet, P. Eng
Honeywell Webinar: "Return on Imagination: Operations Excellence", Chris Stearns, Honeywell
ASM Consortium: "Operator Interface Requirements: Going Beyond the Obvious to Achieve Excellence", Peter Bullemer, Dal Vernon Reising (Human Centered Solutions, LLP), and Melvin Jones (Sasol Synfuels, Ltd)
Control Magazine: "Alarm Management improves Plant Operations", Dan Hebert (Editorial)
OECD-CCA Workshop on Human Factors in Chemical Accidents and Incidents (Germany, 2007): "Alarm Management in Process Industries", Dr. Hasso Drathen (Bayer Technology Services), Dr. Hans Kruz (Degussa), presentation notes
Invensys: "Why is Alarm Management Required in Modern Plants?", Stan DeVries, Director, Energy Management Solutions, Invensys Operations Management
PAS and NovaTech LLC: "ANSI / ISA 18.2 Alarm Management Webinar Series", Joe Shingara, Chris Kourliouros, Kevin Johnson (NovaTech), Bill Hollifield (PAS)

[*Gist of the Technical Paper presented on ALARM MANAGEMENT and the Experience in ESSAR OIL REFINERY during a PSM SEMINAR at Kuala Lumpur, Malaysia in April 2014.]

AUTHORS:

DR. GOPAL JAYARAMAN, Head (HSE), Energy Business, Member: ASSE
The author has helped develop the HSEF Department together with the Essar Refinery Integrated Management System for the Refinery and is currently working on establishing World Class Safety Performance for Essar Energy Business (Essar Refinery, E&P and Power). He has published papers and received numerous awards for the Essar Group.
Dr. Jayaraman graduated in Chemical Engineering (1971) and obtained a PhD in Environment Science & Ecology with a special focus on the Oil & Gas Sector. He has over 40 years of Industrial Experience in Operation, Project Execution and HSE in Oil, Gas & Petrochemicals (Upstream & Downstream).
He can be reached by mailing to Jayaraman.Gopal@essar.com/ jayagopal6@gmail.com (LinkedIn/ Facebook Profiles)

MR. KAUSHIK JAYARAMAN
Dy. Manager (Process Safety Management), Essar Refinery
Member: ASSE, IChemE, IIChE
The author has been a part of the Process Safety Management division since 2011 and has worked in development and implementation of the Process Safety Management System at the Refinery.
Mr. Kaushik completed his undergraduate (B. Tech, Petrochemical Technology) and postgraduate (M. Tech, Gas Engineering) studies with Gold Medals and holds Diplomas in Fire Safety, Industrial Safety, Project Planning Management, and Work Place Safety. He has Industrial Experience in Pipelines, Refining, and Safety (Operational, Process, and Occupational) in Oil & Gas.
He can be reached at Kaushik.Jayaraman@essar.com / jkeins@gmail.com (LinkedIn/ Facebook Profiles)
Other Contributors
1. Mr. Rajesh Shah (Head-Safety, Essar Oil Ltd),
2. Mr. Prakash Pathak (JGM, Instrumentation, Essar Oil Ltd),
3. Mr. Rajesh Mandaliya (Sr. Manager, Instrumentation, Essar Oil Ltd),
4. Mr. M. K. Sharma (Head-Operations, Essar Power – Vadinar Power Company Ltd)

Responsible Care (RC) is a Chemical Industry Initiative, which calls on Companies to demonstrate their commitment to improve all aspects of performance, which relate to protection of health, safety and environment.