PSM Lessons from a Major Reformer Furnace Failure



RDC Prior, SHExcellence CC, Johannesburg, South Africa


In May 2013 a major failure occurred during the start-up of a Primary Reformer Furnace in an oil refinery in Southern Africa. Whilst no human injuries or environmental damage occurred, the damage to the Furnace amounted to about R70 million (GBP 5 million). Downtime is expected to be at least 6 months. The author investigated the incident on behalf of the company and found that an inadequate understanding of process safety management principles and, in particular, of the impact of Human Factors were major contributors to the incident.

Reformer furnace incidents are reasonably common. Special circumstances in the May start-up of the Furnace made that particular start-up very sensitive to errors. The investigator used the “Layers of Protection” (barriers) methodology to get a full understanding of the failure. The barriers of Hardware (plant, equipment, instrumentation), Systems (procedures) and People were each examined to establish whether an appropriate standard was in place and how well the required standards were adhered to at the time of the incident. The investigation showed that multiple failures of the three types of barriers took place on the morning of the Reformer start-up. Whilst mistakes were made on the fateful morning, the seeds of the incident were sown years before.

Root causes of the incident included:

 Failure to upgrade and improve critical instrumentation over 24 years
 Failure to directly measure key parameters
 No suitable and relevant operating instruction for the start-up
 Inadequate hazard identification and resulting control measures
 No understanding of “defence in depth” using multiple barriers
 Use of trainee for a critical start-up
 Failure to interpret operating information correctly

The author will describe the incident and the investigation. The paper will explain how lack of understanding of the “Layers of Protection” concept, long and short term failures in the individual barriers and, in particular, Human Factor failures contributed to the massive equipment loss. Graphs of the key measurements and photographs will be used to illustrate important points.

Keywords: Systems failure, Layers of Protection, human factors, outdated design, PSM understanding


A Southern African refinery receives natural gas and condensate from an offshore oil rig and processes these materials into a variety of oils and petrochemicals. The refinery has been operating for about 24 years. The refinery also receives imported condensate for processing. Following an initial process step of recovering propane and butane, the remaining gas, called Lean Natural Gas (consisting mainly of methane), is passed through a two-stage Reforming process. In the Primary Reformer Furnace the methane gas is partially converted to hydrogen, carbon monoxide and carbon dioxide using steam and a nickel catalyst. The catalyst is loaded inside the furnace tubes of the Primary Reformer. In the Secondary Reformer further conversion takes place using a Fischer-Tropsch catalyst. There are three identical Reformer Trains.

On Friday 17 May 2013 the Reformer Train 1 was being started up when an operator, on checking the furnace tubes prior to installing more burners, noticed that some tubes were bent and ruptured. The plant was tripped immediately.

When cold, the Primary Furnace was inspected and of the 228 tubes, 139 were visually damaged, 23 were damaged as measured by Eddy Current tests leaving 66 still serviceable. The gas feed line to the Reformer tubes was distorted and there was some damage to the feed superheater. Apart from loss of production the material replacement cost of the tubes alone is expected to exceed R70 million (GBP 5 million). The Primary Reformer and some of the damaged tubes are shown in Photographs 1 and 2.

Photograph 1 – Primary Reformer Furnace


Photograph 2 – Damaged tubes in Furnace

The Management of the Refinery wanted an independent investigation of the event to establish the root causes and, in particular, the broader issues reflected in the incident. The author, a Chartered Chemical Engineer, with extensive experience in running hazardous plants and doing investigations into major incidents was contracted to investigate the incident. Two site visits took place after the incident. This paper concentrates on the Process Safety failings and possible solutions rather than the investigation itself.

Detailed Description of Primary Reformer

The main flows and instrumentation associated with the Primary Reformer are shown below in Fig 1.

Fig 1 Detailed process flow through Primary Reformer

In normal operation the feed gas is fed via a heat exchanger (ES105) and two preheaters (EM104, EM101) to the furnace tubes. Steam is added to this line before EM101. The steam and gas undergo the endothermic water gas reaction in the tubes. Heat is provided by up to 84 gas burners which heat up the furnace space. The reaction gases pass through 228 tubes in parallel. The flue gases are extracted via an extraction fan and vent to atmosphere. A slight vacuum is maintained in the furnace. In the start-up mode nitrogen gas is introduced into the gas feed line and is heated up so as to heat up the furnace. There are limits to the rate at which the temperature can be raised so as not to damage the furnace and the tubes.

The tubes are 12 m long and 101.6 mm in diameter. The tubes are made of austenitic stainless steel and most of the tubes were of G-X40NiCrNb3525 composition. This material resists high temperature creep by the inclusion of niobium and other carbide-stabilizing elements. A new material containing some titanium was introduced in 2013 with the expected advantage of longer tube life. In the February 2013 shutdown 18 tubes were replaced with the new material. The 228 tubes in the furnace at the time of the incident were a mixture of tubes of different ages and of new and old material types. The table below shows the situation at the time.

Table 1 – Primary Reformer tube history


Layers of Protection

The risks of process hazards resulting in major events (fires, explosions and toxic releases) are, in principle, minimized by good design and the application of process safety principles. Older plant was not subjected to the same level of scrutiny that new plant is today. The adoption of process safety principles is influenced by the country legislation, the linkages of the organisation to global companies and the vision / policies of the particular organisation.

There are a number of models which illustrate the idea of “Layers of Protection”. The basic unmitigated risk posed by a process hazard is reduced by a number of “barriers” which either prevent the hazard from materializing and/or mitigate the effect of the hazard once the event has happened. A simple and popular descriptive model, the “Swiss Cheese Model”, is shown below to illustrate the concept.

The barriers are also known as safeguards, controls, safety measures or “Layers of Protection”. Each barrier has a finite probability of failure so multiple barriers are required to reduce the risk of the event taking place to an acceptable level. The barriers need to be fully independent of each other or else they may be subject to a form of common mode failure. The barriers need to be maintained so that their reliability does not decline. The barriers may be of three major types. They may be hardware (e.g. pressure relief valve), systems (e.g. operating procedure), and people (e.g. training).
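The arithmetic behind the need for multiple independent barriers can be sketched as follows. This is a minimal illustration only: the function name, the initiating frequency and the probabilities of failure on demand (PFD) are hypothetical values chosen for the example and are not taken from the investigation. The multiplication is only valid if the barriers are independent, as noted above.

```python
from math import prod


def mitigated_frequency(initiating_frequency_per_year, barrier_pfds):
    """Frequency of the unwanted event, assuming every barrier fails independently."""
    pfds = list(barrier_pfds)
    for pfd in pfds:
        if not 0.0 <= pfd <= 1.0:
            raise ValueError("probability of failure on demand must be in [0, 1]")
    return initiating_frequency_per_year * prod(pfds)


# Hypothetical values for illustration only (not plant data).
initiating_frequency = 0.5  # abnormal start-ups per year
barriers = {"hardware trip": 0.01, "procedure": 0.1, "operator response": 0.1}

all_sound = mitigated_frequency(initiating_frequency, barriers.values())

# A barrier that is absent or ineffective has a PFD of 1.0, so if the hardware
# and systems barriers are missing only the people barrier reduces the risk.
people_only = mitigated_frequency(initiating_frequency, [1.0, 1.0, 0.1])

print(f"all barriers sound: {all_sound:.1e} per year")
print(f"people barrier only: {people_only:.1e} per year")
```

With these assumed numbers the event frequency rises by three orders of magnitude when two of the three barriers are missing, which is the situation the model is intended to make visible.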

What happened

On Friday 17th May 2013 the No 1 Train of the Reformer was being started up. This cold start-up had been initiated on the earlier night shift when nitrogen was introduced into the train and through the 228 tubes in the Primary Reformer Furnace. 10 Gas Burners were lit late on the night shift to start the warming up of the nitrogen gas and the Primary Reformer to operating temperatures. A maximum nitrogen flow of 14500 Nm3/hr was established instead of the required flow of 25000 Nm3/hr due to limited nitrogen availability. The fuel gas used for start-up was Rich Fuel Gas (RFG) instead of the normal Lean Fuel Gas (LFG). The calorific value of the RFG was in excess of 40 MJ/m3 whilst the LFG calorific value is around 15 MJ/m3.

The morning shift lit another 10 gas burners between 06.30 hrs and 07.00 hrs to continue the heating. The RFG fuel gas flow was increased from 1800 Nm3/hr to 5400 Nm3/hr at around 06.45 hrs. Temperatures of the nitrogen gas feed to the Primary Reformer Furnace tubes (TI1121/TI1140), nitrogen gas exiting the tubes (TI1126, TI1128, TI1130) and the Furnace flue gas increased steadily until 09.00 hrs.

Another 10 gas burners were lit between 08.30 hrs and 09.00 hrs. All temperatures continued to increase but at a faster rate than before. At 09.12 hrs the high temperature alarm for the nitrogen feed, which is set at 570°C, sounded. The fuel gas rate was decreased from a peak of 6690 Nm3/hr to 5556 Nm3/hr in an erratic fashion. The fuel gas rate was held at this lower value from about 10.35 hrs to about 11.15 hrs. An attempt was made to decrease nitrogen temperatures by adding quench water to the steam line via the MM104. After 09.00 hrs an attempt was made to put steam through the tubes as part of the start-up procedure. The nitrogen feed temperature reached a maximum of 760°C when tube failures started to occur at 10.51 hrs.

Between 11.00 hrs and 11.30 hrs (probably closer to 11.30 hrs) the outside operator checked the conditions inside the Furnace via the inspection hatch in the side wall. He observed tubes bending and all tubes glowing red and appearing transparent. The control room was informed and after tubes were seen to be collapsing the Primary Reformer was shut down at 11.39 hrs.

After the Unit had cooled down it was observed that many tubes had ruptured and the feed pipe from the Superheater 06-EM-101 to the Furnace tubes was distorted. Later metallurgical tests showed that only 66 Furnace tubes were still serviceable and that damage had been observed on the heater 06-EM-101. The Furnace walls and roof were essentially undamaged. A few concrete slabs on the floor had broken. The cost of replacement of the damaged equipment is likely to be in excess of R70 million (GBP 5 million).

The graphs in Appendices 2 and 3 chart the behaviour of the rate of temperature change (ROC) and of the key temperatures respectively.

A detailed investigation was carried out into the accident. Key people were interviewed, documents and operating data reviewed and physical evidence such as damaged tubes subjected to metallurgical analysis. The timeline was constructed. A root cause analysis was carried out. This is depicted in Appendix 1. It became very clear that the furnace tube damage was due to prolonged overheating during the nitrogen heating phase of the start-up procedure. The catalyst in the furnace was green in colour indicating that the catalyst had been exposed to water (condensate). This has serious effects on catalyst performance and could affect gas flow and temperatures in the furnace. Root causes were established in the Hardware, Systems and People sectors of the analysis.


Barriers not in place or which failed


Hardware (plant, equipment, instrumentation)

The plant is 24 years old and has not been upgraded with respect to measurement and automation.

 No burner gas temperature measurement was available in the lower part of the reformer furnace where the highest temperatures are found.
 No furnace tube skin measurements were provided for.
 The Wobbe indicator was unable to measure the calorific value of the Rich Fuel Gas
 There was no automatic trip of the heating supply on reaching a critical temperature (the trips on gas exit temperatures are ineffective)
 The start-up was almost totally manual – modern plants have automatic start-ups
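The kind of automatic high temperature trip that was missing can be sketched in a few lines. This is an illustration under stated assumptions, not a design for the plant: the class name, the two-out-of-three voting arrangement and the 600°C trip point are all hypothetical choices made for the example, and a real trip would be specified through a formal safety integrity assessment.

```python
from dataclasses import dataclass


@dataclass
class HighTemperatureTrip:
    """Latching trip that votes two-out-of-three on redundant temperature readings."""

    trip_point_c: float
    tripped: bool = False

    def update(self, readings_c):
        """Latch the trip if at least two of the three readings reach the trip point."""
        if len(readings_c) != 3:
            raise ValueError("expected exactly three readings")
        votes = sum(1 for reading in readings_c if reading >= self.trip_point_c)
        if votes >= 2:
            self.tripped = True  # stays tripped until deliberately reset
        return self.tripped


# Hypothetical trip point and readings, for illustration only.
trip = HighTemperatureTrip(trip_point_c=600.0)
one_high = trip.update([560.0, 565.0, 610.0])    # only one reading is high
two_high = trip.update([605.0, 612.0, 590.0])    # two readings are high
after_cooling = trip.update([500.0, 500.0, 500.0])  # trip remains latched
```

Voting on redundant measurements is one common way to avoid both spurious trips from a single faulty sensor and missed trips when one sensor fails low. The latch means that the fuel supply would stay shut off until an operator deliberately resets the trip.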

Systems (procedures)

A number of Systems should have prevented or mitigated the accident. They included:

 The PHA (Process Hazard Analysis) for the Unit did not identify three overheating hazards: overheating because of the use of RFG, overheating because of the combination of RFG and a low nitrogen feed rate, and overheating of the feed gas to the furnace tubes. The PHA was overdue for revision and did not reflect the Refinery’s own history before 2006.
 A comprehensive 107 page Start-Up and Shutdown Procedure was available to assist operators. The Cold Start-Up section comprised some 170 steps. The Procedure did not cover starting up under the unusual conditions of using RFG for the burners and a limited nitrogen flow. This combination resulted in a heat flux eight times that of a conventional start-up. There was a single reference to RFG in the Procedure: “Be careful when using RFG”. The Start-Up Procedure requires the Steps to be filled in and signed off as they are completed. This was reasonably well done.
 The Shift Handover Log did not include a reference to RFG being used and that nitrogen was available in a limited supply.
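The effect of combining RFG with a limited nitrogen flow can be illustrated with a rough comparison of the heat released by the burners per unit of nitrogen carrier gas. This is a sketch only. The flows and calorific values are the rounded figures quoted in this paper. The conventional case assumes, purely for illustration, the same volumetric fuel flow but with LFG and the required nitrogen flow. The comparison is not intended to reproduce the heat flux factor quoted above, whose basis involves more than these two parameters.

```python
def firing_per_unit_carrier(fuel_flow_nm3_h, calorific_value_mj_m3, nitrogen_flow_nm3_h):
    """Heat released by the burners per normal cubic metre of nitrogen (MJ/Nm3)."""
    if nitrogen_flow_nm3_h <= 0:
        raise ValueError("nitrogen flow must be positive")
    return fuel_flow_nm3_h * calorific_value_mj_m3 / nitrogen_flow_nm3_h


# Day of the incident: RFG at about 40 MJ/m3, 5400 Nm3/hr fuel, 14500 Nm3/hr nitrogen.
incident = firing_per_unit_carrier(5400, 40, 14500)

# ASSUMPTION for comparison: same fuel volume flow, but LFG at about 15 MJ/m3
# and the required nitrogen flow of 25000 Nm3/hr.
conventional = firing_per_unit_carrier(5400, 15, 25000)

ratio = incident / conventional
print(f"incident: {incident:.1f} MJ/Nm3, conventional: {conventional:.2f} MJ/Nm3")
print(f"ratio: {ratio:.1f}")
```

Even on this simplified basis the nitrogen carried several times more heat per unit volume than in a conventional start-up, which is why a procedure written for LFG and full nitrogen flow gave no protection on the day.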


People

 A trainee started up the plant. He had about 3 months’ experience on the control panel. He had experience of running other plants. The trainee had never started up the plant under the abnormal conditions of RFG and low nitrogen flow. An interview with the trainee revealed that he had a limited understanding of the plant and was unaware of the specific risks of this particular start-up.
 An experienced Senior Process Controller (SPC) was responsible for the activities of the trainee and was present for most of the start-up. In principle the trainee was not permitted to make any independent decisions. The SPC had experience of previous start-ups under similar abnormal conditions.
 The trainee and SPC had a poor interpersonal relationship. The SPC followed an ambivalent approach of “hands off” and then getting involved at other times. For this particular start-up the SPC left the trainee to his own devices after advising him to just follow the written Procedure. The trainee did this with his limited understanding.
 Training was largely hands-on. There was no simulation of abnormal events including the situation on the day of the accident.
 There was a large reliance on people for the start-up as the plant was inadequately instrumented.

 The Shift Supervisor was not present at the start-up as he was required at a meeting. The Process Engineer was also not present. There was a general consensus that two experienced people should be present for the start-up.
 The Rate of Change (ROC) of the temperature of the feed gas exceeded the prescribed limit of 50°C per hour by significant amounts for extended periods. The graph in Appendix 2 shows the ROC for the morning period. The operators did not react to this.
 The inlet gas temperature upper limit of 570°C was exceeded by several hundred degrees for some hours during the start-up. This information was displayed on the control panel but was not seen as critical. The graph in Appendix 3 shows the temperatures.
 Different operators use different parameters to guide the start-up.
 The lessons from a similar incident in 2001 were not known by the current operating team. The PHA did not reflect this incident. The many major failures of Reformer Furnaces in the rest of the world were not known amongst the operating staff.
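The two limits that were exceeded lend themselves to a simple automated check on logged temperatures. The following is a minimal sketch and not the Refinery’s system: the function name and the sample readings are hypothetical, and only the two limit values (50°C per hour and 570°C) come from this paper.

```python
from datetime import datetime

ROC_LIMIT_C_PER_H = 50.0    # prescribed rate of change limit quoted above
HIGH_TEMP_LIMIT_C = 570.0   # inlet gas temperature upper limit quoted above


def check_readings(readings):
    """Return (time, message) findings for a time-sorted list of (datetime, temp_c)."""
    findings = []
    for i, (time, temp_c) in enumerate(readings):
        if temp_c > HIGH_TEMP_LIMIT_C:
            findings.append((time, f"temperature {temp_c:.0f} C above limit"))
        if i > 0:
            prev_time, prev_temp_c = readings[i - 1]
            hours = (time - prev_time).total_seconds() / 3600.0
            if hours <= 0:
                raise ValueError("readings must be strictly increasing in time")
            roc = (temp_c - prev_temp_c) / hours
            if roc > ROC_LIMIT_C_PER_H:
                findings.append((time, f"rate of change {roc:.0f} C/h above limit"))
    return findings


# Hypothetical readings for illustration only, NOT the plant data.
sample = [
    (datetime(2000, 1, 1, 8, 0), 400.0),
    (datetime(2000, 1, 1, 8, 30), 420.0),   # 40 C/h, within limit
    (datetime(2000, 1, 1, 9, 0), 480.0),    # 120 C/h, too fast
    (datetime(2000, 1, 1, 9, 30), 580.0),   # 200 C/h and above 570 C
]
findings = check_readings(sample)
for time, message in findings:
    print(time.strftime("%H.%M"), message)
```

A check of this kind does not replace a trip, but it would turn a trend that operators had to notice on a graph into an explicit and repeated warning.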


Why did so many barriers fail?

Many of the major process safety accidents (Texas City, Deepwater Horizon, Piper Alpha) have been characterised by the failure of many barriers (8-12 in some cases). It has also been stated that if any one of the barriers had worked then the accident may well have been prevented.

The accident described in this article is not in the league of these major accidents but there are many similarities.

Failure to identify the hazard

The hazard of overheating the furnace tubes during start-up was not recognized. If a hazard is not identified then appropriate barriers or “Layers of Protection” cannot be formally and systematically put in place to prevent the hazard from being realized or the consequences being extreme. There was a failure to identify the hazard through an analytic approach (PHA) and an inability to learn from the organisation’s own history and the experiences of others.

Lack of understanding of the basic Process Safety approach

At the heart of good Process Safety is the concept of identifying all hazards, minimizing them where possible and then putting in sufficient barriers to reduce the risk to acceptable levels. Of course, the barriers need to be properly maintained to ensure they will function correctly when required. This simple model of keeping the “tiger in the cage” was not understood by the organisation. In addition, the need to keep the barriers sound and the specific responsibilities for doing so were not understood. The implications of barriers not being present or being weakened in any way were not understood. With the Hardware and Systems barriers being non-existent or ineffective, total reliance was placed on the People barrier. When the accident occurred this barrier was at its weakest and failed to detect and prevent the problems that were developing.

The Basis of Safety is a tool used to describe condensed and related information on hazards and barriers for a particular process. This was not used at the Refinery. It is particularly well used in the explosives industry.

Human Factors (People)

People are involved in all stages of process design, development, operation and maintenance of equipment. Thus all failures can be attributed to people but this is not necessarily a useful observation.

Plant Design

The plant, as originally designed, could not be described as inherently safe. There was no direct measurement of the highest temperatures in the furnace either on the furnace side or by skin temperatures. Indirect temperatures were measured. There was no automatic start-up possible and no trip system for preventing disaster. There was no modernisation of the furnace controls. These aspects were either a result of decisions made by various people or failure to make any decisions.


There were a multitude of failures during the actual start-up. The pairing of the two operators was unfortunate and should have been dealt with a long time previously. Although some attempts had been made to deal with the relationship (coaching) they were not successful. The absence of any other level of supervision / management or technical support ensured that the start-up was solely in the hands of the operating pair. An attempt had been made to revise the Start-up Procedure some months before the accident. A draft copy was found with some useful additions but this review had never been formalised. The actual start-up contravened several limitations on temperatures and rates of temperature increase. The operators used other temperature measurements to guide decisions but were totally unaware of the developing incident. Some high temperature alarms sounded during the start-up but these were merely cancelled.


Apart from the accident specifics, there was an element of blindness to major risk in key parts of the organisation. The phrase “risk blindness” was coined by Andrew Hopkins in his book on the Texas City disaster (Hopkins, 2008). Process Safety Management is a new concept in the organisation and little training has taken place. No formal implementation of a Process Safety Programme has been planned although some islands of the discipline exist. The PHA process is flawed, learning from previous incidents is minimal and there are no process safety indicators to alert all that problems exist or are developing.


Mindfulness has been defined as follows: “Mindfulness preserves the capability to see the significant meaning of weak signals and to give a strong response to weak signals” (Weick, 2001). This leads to organisations organising themselves in such a way as to be better able to notice the unexpected in the making and halt its development. As Andrew Hopkins comments: “these structures that might be put in place include reporting systems, auditing systems, training systems, maintenance systems etc.”

The Refinery where the accident took place did not have this characteristic and most of the individuals did not possess this quality. In general there was no shared feeling that if risks were not well managed then catastrophe could be lurking just below the surface. In “Mindful Organisations” senior management is sensitive to the major risks and takes steps to check that the unease is not well founded.

Auditing is one tool to detect issues. The organisation did carry out conventional auditing but without a PSM System formally in place, there was no auditing of the System. Auditing has failed in many well-known major disasters (Texas City & Piper Alpha) so the form and quality of auditing has to reveal the potential for a major incident. A different approach would be to take the barriers / Layers of Protection for a particular hazard and test the completeness of these measures, appropriateness and state of readiness to deal with the unwanted event.

Lack of mindfulness was seen at different levels of management / supervision / operation during the fateful start-up. No one sensed that this particular start-up was particularly hazardous either before or during the event. Information and signals were available that could have alerted many people.


The accident at the Refinery resulted from a number of factors of both an immediate and long term nature. The design of the 24 year old Primary Reformer was inadequate in that direct temperature measurement was not provided for and no trip was available to save the plant from serious damage. The hazard of overheating the furnace tubes during start-up was not recognised via the PHA process or by learning from other furnace tube failures. Failure to recognise the hazard meant that safeguards or layers of protection were not formally put in place.

The start-up procedure was totally ineffective as a barrier because it did not apply to the specific conditions on the day.

Failure of people undermined the entire risk management process. Human failure was most evident when the plant was effectively started up by a trainee with totally inadequate guidance by the experienced SPC or other competent people. Whilst people represent a source of failure they also are the last barrier to detect problems and save the day. In this accident this last barrier was extremely weak.

The basic Process Safety Model of a hazard protected by a series of protective barriers is not understood by the Refinery and support personnel. Hence the activities to ensure the barriers remain sound were not carried out.

The Refinery requires a number of urgent actions to reduce the risk of further accidents. These actions would include:

 Audit of Operation for current situation and “major gaps”
 Introduction of a formal PSM programme focussing on the “major gaps”
 Urgently train a critical mass of people in the principles of PSM
 Ensure that the Swiss Cheese Model of barrier protection is universally known and understood
 Upgrade the measurement capability in the furnaces and install appropriate trips to prevent overheating of tubes
 Revise the Start-Up Procedure to cover conditions of low nitrogen flow and use of RFG
 Review the training methods for new people and define the boundaries of trainee actions
 Revise PHAs across the Refinery using an outside resource and building in all relevant history both internal and external.


Hopkins, A., 2008, Failure to Learn – the BP Texas City Refinery disaster, CCH Australia Limited

Weick, K. and Sutcliffe, K., 2003, Managing the unexpected: assuring high performance in an age of complexity, San Francisco, Jossey-Bass, p3


Appendix 1 – Fault Tree analysis for Furnace Tube Failure – May 2013



Appendix 2 – ROC of temperatures during start-up of Furnace no 1




Appendix 3 – Key temperatures recorded during start-up of Furnace No 1 in May 2013



