In our project, a co-worker told me that the dockerd process was consuming a lot of CPU, sometimes 90% or more. This made the machine so slow that it could not handle our business workload properly. We needed to find out why dockerd was using so much CPU and how to reduce it.
What
We can get a full picture of the machine with htop. It is a colorful version of the top command with many more options.
As the htop screenshot shows, the Docker daemon process (dockerd) takes 77% CPU, the highest usage of any process on the machine.
Why
But why? What is the process doing that takes so much CPU? We can use the perf command to analyze the process in real time. The command below shows where the process with the given pid is spending its CPU time, updated live:
perf top -g -p pid
The output changes very quickly because it reflects the live state. Instead, we can record the data with perf record -p pid (press Ctrl-C to stop recording), which creates a perf.data file in the current directory, and then analyze it with perf report:
perf record -p pid
perf report
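The columns in the perf report output mean the following:
Children is the total CPU usage of a function including everything it calls. This column is only shown when call graphs were recorded, for example with perf record -g.
Self is the CPU usage of the function itself, excluding the functions it calls.
Shared Object is the binary, library, or kernel image that contains the code using the CPU.
Symbol is the function name or symbol. [k] means the symbol is in kernel space, [.] means it is in user space.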
Besides perf, we can get more detail about the process with pidstat (-u reports CPU usage, -d reports disk I/O):
pidstat -u -d -p pid
As the perf output shows, the dockerd process is busy with file processing, and a Google search led us to a related GitHub issue. Our guess is that the containers are busy and produce a lot of logs, so dockerd spends most of its time processing the JSON log files.
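One way to test this guess is to look at how large the container log files actually are. The following is a minimal Python sketch, not part of the original investigation. It assumes the json-file driver's usual layout of <data-root>/containers/<id>/<id>-json.log, and the function name largest_container_logs is made up for illustration. Reading these files normally requires root.

```python
from pathlib import Path


def largest_container_logs(data_root, top=5):
    """Return (size_in_bytes, path) for the largest json-file logs, biggest first."""
    containers = Path(data_root) / "containers"
    logs = []
    # Assumed layout: <data-root>/containers/<id>/<id>-json.log
    # Rotated files carry an extra suffix such as -json.log.1, so match those too.
    for path in containers.glob("*/*-json.log*"):
        if path.is_file():
            logs.append((path.stat().st_size, str(path)))
    logs.sort(reverse=True)
    return logs[:top]


if __name__ == "__main__":
    # /data/docker is the data-root used in this article;
    # Docker's default data-root is /var/lib/docker.
    for size, path in largest_container_logs("/data/docker"):
        print(f"{size / 1024 / 1024:10.1f} MiB  {path}")
```

If one or two containers own log files of several gigabytes, that supports the idea that log handling is what keeps dockerd busy.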
How
Based on what we observed and the issue we found, we set the max-file and max-size log options to reasonable values.
{
  "data-root": "/data/docker",
  "insecure-registries": [
    "registry.asteria.com:5000"
  ],
  "live-restore": true,
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "2m",
    "max-file": "1"
  }
}
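Update the /etc/docker/daemon.json file with these log-opts settings, then restart the dockerd process so that they take effect:
systemctl restart docker
Note that the new log options only apply to containers created after the restart. Existing containers keep their old logging configuration until they are recreated.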
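Before restarting, it can help to check that the file is still valid JSON and to work out how much log data the settings allow per container, which is max-size multiplied by max-file. Below is a small Python sketch of that arithmetic. The helper names parse_size and log_budget are hypothetical, only the single-letter suffixes k, m and g are handled, and treating them as binary multiples (1m = 1024 * 1024 bytes) is an assumption about how Docker interprets them.

```python
import json

# Size suffixes accepted by this sketch; assumed to be binary multiples.
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text):
    """Convert a max-size value such as "2m" to a number of bytes."""
    text = text.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # a bare number is taken as bytes


def log_budget(config_text):
    """Return the maximum log bytes kept per container, or None if unbounded."""
    config = json.loads(config_text)  # raises ValueError on invalid JSON
    opts = config.get("log-opts", {})
    if "max-size" not in opts:
        return None  # without max-size the log file is never rotated
    max_file = int(opts.get("max-file", "1"))
    return parse_size(opts["max-size"]) * max_file
```

Passing the contents of /etc/docker/daemon.json to log_budget gives 2097152 for the settings above, so each container keeps at most about 2 MiB of logs. A ValueError means the file is not valid JSON and should be fixed before the restart.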
After these steps, we monitored the machine with htop again for a while. The dockerd process now uses at most 40% CPU at any moment.
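To compare before and after without watching htop, we can also sample the process's CPU time from /proc ourselves. The sketch below is only an illustration of how such a percentage can be derived; it is not how htop is implemented. It assumes Linux, where fields 14 and 15 of /proc/<pid>/stat are utime and stime in clock ticks, and the function names are made up.

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain spaces,
    # so cut the line after the last ")" before splitting on whitespace.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state), so fields 14 and 15 sit at indexes 11 and 12.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """CPU usage over the interval; 100 means one fully busy core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def sample_cpu_percent(pid, interval_s=1.0):
    """Measure one process's CPU usage over interval_s seconds (Linux only)."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = parse_cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, ticks_per_s)
```

For example, with 100 ticks per second, a process that accumulates 77 ticks during one second was using 77% of one core. Calling sample_cpu_percent with the dockerd pid every few seconds and logging the result gives a simple record of whether the change holds up over time.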
Conclusions
We walked through how we spotted the issue, dug into it to find the cause, and finally fixed it based on that cause.
Linux performance is a huge topic, and we will keep running into situations we have not seen before. It pays to learn the commands and other tools that help us locate issues quickly.
That’s all. Thanks for reading it.
If anything here is wrong, please let me know.