Recently I was working on a nodejs/express application to troubleshoot a memory leak. Those were the steps and tools I followed to identify the root cause.
1. Reproduce the issue
We had it in two modules, one a http server and the other a sqs consumer.
For the http server
Created a k6 script for a small performance test:
// script.js
import http from 'k6/http';
import { check, sleep } from 'k6';
export default function () {
const url = 'http://localhost:3000/test';
const res = http.get(url);
check(res, { success: (r) => r.status === 200 });
}
And ran it with:
# 100 users, 1 hour
k6 run -u 100 -d 1h script.js
For the SQS consumer
Built a send command (note: I am using localstack):
# send.sh
AWS_REGION: "us-east-1" aws --endpoint-url=http://localhost:4566
sqs send-message
--region "us-east-1"
--queue-url http://localhost:4566/000000000000/queue
--message-body '{"my": "message"}'
and ran with with watch:
watch -n .1 ./send.sh
2. Next step, collect the data
Initially I was using node itself for generating it, e.g:
import v8 from 'node:v8';
// ...
app.get('/heapdump', (req, res) => {
const fileName = v8.writeHeapSnapshot();
res.send({ fileName });
});
but we can also use chrome devtools for this (I am using brave, but sames applies):
- run the app with the flag
--inspect
- go to
chrome://inspect
/brave://inspect
, connect - go to
Memory
tab, click onTake Heap Snapshot
3. Analyze the data
That is the tricky part. Pick the initial snapshot, run the test, pick the final snapshot, and compare.
Try to correlate what you have there with the code. Allocation on timeline
was also helpful for me.
I strongly recommend take a look on this presentation for detailed explanation and showcase:
Next?
Now is the time to apply a fix and re-run the tests.
We found a lot of DerivedLogger in the memory. It was because the code was calling winston.createLogger
on each call to enrich the log. We switched to child call and problem solved.
There was no need to use clinic.js this time.