JAX London Blog

Troubleshooting HTTP 502 Bad Gateway in AWS EBS

Sep 20, 2022

The application discussed in this post was running on the Elastic Beanstalk service in Amazon Web Services (AWS). Throughout this post, Elastic Beanstalk is abbreviated as EBS (not to be confused with Amazon Elastic Block Store, which uses the same acronym). Intermittently, this application was throwing HTTP 502 Bad Gateway errors. In this post, let’s discuss how we troubleshot and resolved this error.

AWS Elastic Beanstalk architecture

This application was running behind an AWS Elastic Load Balancer, with Nginx 1.18.0, Java 8, and Tomcat 8 on Amazon Linux, all within the AWS Elastic Beanstalk (EBS) service. For those who are not familiar with EBS, its high-level architecture is shown below.

Fig. 1: AWS Elastic Beanstalk architecture

An AWS Elastic Load Balancer sits at the front. This load balancer distributes traffic to a set of EC2 instances (which can be auto-scaled). Each EC2 instance runs an Nginx web server and a Tomcat application server. Requests sent by the Elastic Load Balancer are first handled by the Nginx server, which then forwards them to the Tomcat server.

HTTP 502 Bad Gateway error

Intermittently (not always), this application was throwing HTTP 502 Bad Gateway errors. A few seconds later, service would resume and things would function normally again. It wasn’t clear what was causing these errors in the AWS Elastic Beanstalk environment.

We first need to understand what the HTTP 502 Bad Gateway error means. It is returned by a server acting as a gateway or proxy when it receives an invalid response from the backend (upstream) server it is talking to.




HTTP 502 thrown by Nginx in AWS EBS

The EBS stack has three primary components:

  • Elastic Load Balancer
  • Nginx web server
  • Tomcat Application server

Which of these three components is throwing the HTTP 502 Bad Gateway error?

Fig. 2: Screenshot of the HTTP 502 Bad gateway error thrown by Nginx server

Above is a screenshot of the HTTP 502 Bad Gateway error that we were receiving. It contains a clue about who is throwing the error: the highlighted part of the screen shows that this HTTP 502 Bad Gateway error is thrown by the Nginx server.

As per the HTTP 502 definition, Nginx should throw this error only if it got an invalid response from the Tomcat server. This clue helped us narrow down the source of the problem to the Tomcat server.
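The same clue can be checked programmatically rather than by eye. The following is a minimal sketch, not the method used in this investigation: it assumes the 502 comes from Nginx's built-in error page, whose footer carries an `nginx` signature, or that the response has a `Server` header starting with `nginx`. The function name `looks_like_nginx_502` and the sample body are hypothetical, written only for illustration.

```python
import re

# Footer line of nginx's built-in error pages, e.g. "<hr><center>nginx/1.18.0</center>".
# The version part is optional because it is hidden when server_tokens is off.
NGINX_SIGNATURE = re.compile(r"<center>nginx(?:/[\d.]+)?</center>", re.IGNORECASE)


def looks_like_nginx_502(status, headers, body):
    """Return True if the response looks like a 502 generated by Nginx itself."""
    if status != 502:
        return False
    # Header names are case-insensitive, so normalise before comparing.
    server = ""
    for name, value in headers.items():
        if name.lower() == "server":
            server = value.lower()
    return server.startswith("nginx") or bool(NGINX_SIGNATURE.search(body))


# Assumed reconstruction of a default Nginx 502 page (not copied from the screenshot).
sample_body = (
    "<html>\n<head><title>502 Bad Gateway</title></head>\n"
    "<body>\n<center><h1>502 Bad Gateway</h1></center>\n"
    "<hr><center>nginx/1.18.0</center>\n</body>\n</html>\n"
)

print(looks_like_nginx_502(502, {"Server": "nginx/1.18.0"}, sample_body))
```

If a custom error page is configured, or a different component generated the 502, neither signal would be present and the function would return False.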

Out of memory: Kill process or sacrifice child

To identify the source of the problem, we ran the open source yCrash script on the EC2 instance on which the Tomcat server was running. The yCrash script captures 16 different artifacts from the technology stack, including the garbage collection log, thread dump, heap dump, ps, top, top -H, vmstat, netstat, and more. We uploaded the captured artifacts to the yCrash server for analysis.

One of the artifacts that the yCrash script captures is the kernel log file, which records events that happened in the Linux kernel. yCrash pulls out critical errors and warnings from this log file and presents them. The kernel log analysis report generated by yCrash is below.

Fig. 3: yCrash’s Kernel log analysis reporting ‘Out of memory: kill process or sacrifice child’

Please see the highlighted error message in the kernel log:

[Sat May 21 17:31:00 2022] Out of memory: Kill process 24339 (java) score 874 or sacrifice child

It indicates that the Tomcat server, which is a Java process (PID 24339), was terminated by the Linux kernel. When the system runs out of memory, the kernel’s out-of-memory (OOM) killer picks a process to terminate based on a score that largely reflects how much memory the process is using; here the java process had a score of 874. This is exactly what was happening in this application: whenever memory consumption on the instance exceeded the available RAM, the Linux kernel terminated the Tomcat server.
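For readers who want to scan a kernel log for this kind of event themselves, here is a minimal sketch. It is not how yCrash works internally; it only matches the exact message format shown above, and other kernel versions word the message differently. The names `OOM_KILL` and `find_oom_kills` are hypothetical.

```python
import re

# Matches lines such as:
# [Sat May 21 17:31:00 2022] Out of memory: Kill process 24339 (java) score 874 or sacrifice child
OOM_KILL = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+Out of memory: Kill process "
    r"(?P<pid>\d+) \((?P<name>[^)]+)\) score (?P<score>\d+)"
)


def find_oom_kills(log_text):
    """Return one dict per OOM-killer line found in the given kernel log text."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_KILL.match(line.strip())
        if match:
            kills.append(
                {
                    "timestamp": match.group("timestamp"),
                    "pid": int(match.group("pid")),
                    "name": match.group("name"),
                    "score": int(match.group("score")),
                }
            )
    return kills


# The log line quoted in this post.
sample_log = (
    "[Sat May 21 17:31:00 2022] Out of memory: "
    "Kill process 24339 (java) score 874 or sacrifice child"
)

for kill in find_oom_kills(sample_log):
    print(kill["timestamp"], kill["name"], kill["pid"], kill["score"])
```

On a real instance, the input could be the text of the kernel log with human-readable timestamps; the sketch assumes that bracketed timestamp style.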


Root cause – Lack of RAM

Now the question is: how can Linux terminating the Tomcat server result in intermittent HTTP 502 Bad Gateway errors? Shouldn’t it cause a complete outage? It’s a fair question.

If you recall, this application is running on the AWS Elastic Beanstalk (EBS) service, which automatically restarts the Tomcat server whenever it gets terminated. The result is an almost comical loop: Linux terminates the Tomcat server, and EBS restarts it. In the window between termination and restart, Nginx has no working Tomcat server to forward requests to, so customers were experiencing HTTP 502 Bad Gateway errors.

Solution – Upgrading EC2 instance RAM capacity

The application was running on EC2 instances that had only 1 GB of RAM. That wasn’t sufficient memory to run the Tomcat server, the Nginx server, and the other processes on the instance. When the application was moved to EC2 instances with 2 GB of RAM, the problem was resolved.
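One way to see how close an instance is to its RAM limit is to read `/proc/meminfo` on the Linux host. The sketch below is an illustration under stated assumptions, not part of the original investigation: the helper names are hypothetical, and the sample values are made up rather than taken from the affected instances.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> number.

    Most fields (including MemTotal and MemAvailable) are reported in kB.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_fraction(meminfo):
    """Fraction of total RAM that the kernel estimates is still available."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]


# Made-up sample in the /proc/meminfo layout, for illustration only.
SAMPLE_MEMINFO = """MemTotal:        1000000 kB
MemFree:           50000 kB
MemAvailable:     100000 kB
"""

info = parse_meminfo(SAMPLE_MEMINFO)
print(f"{available_fraction(info):.0%} of RAM available")
```

On a real instance, the text would come from reading the file `/proc/meminfo`; a consistently low available fraction suggests the instance is at risk of the kernel terminating processes, as described above.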
