Get a file from a pod with the kubectl cp command or the Kubernetes API server
1. kubectl cp
We can copy a file from a pod to a local destination with kubectl cp.
# Copy /tmp/foo_dir local directory to /tmp/bar_dir
# in a remote pod in the default namespace
kubectl cp /tmp/foo_dir <some-pod>:/tmp/bar_dir
# Copy /tmp/foo local file to /tmp/bar in a remote pod
# in a specific container
kubectl cp /tmp/foo <some-pod>:/tmp/bar -c <specific-container>
# Copy /tmp/foo local file to /tmp/bar in a remote pod
# in namespace <some-namespace>
kubectl cp /tmp/foo <some-namespace>/<some-pod>:/tmp/bar
# Copy /tmp/foo from a remote pod to /tmp/bar locally
kubectl cp <some-namespace>/<some-pod>:/tmp/foo /tmp/bar
kubectl cp is implemented with tar internally; you can read the source code at kubectl/pkg/cmd/cp/cp.go in the kubernetes/kubectl repository on GitHub.
2. Kubernetes API server
const kubernetesClient = new KubernetesClientAPI('<cluster-name>', '<k8sCfg>')
const cpApiIns = kubernetesClient.getCpApiIns()
// Copy filePath inside the pod to localFileName on local disk
await cpApiIns.cpFromPod(namespace, podName, container, filePath, localFileName)
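Note that KubernetesClientAPI above is a project-local wrapper, not a published package. With the official @kubernetes/client-node library, the equivalent copy is done through its Cp helper; here is a minimal sketch, assuming the kubeconfig can be loaded from the default location:
import { KubeConfig, Cp } from '@kubernetes/client-node'

const kc = new KubeConfig()
kc.loadFromDefault() // or kc.loadFromFile('<k8sCfg>')
const cp = new Cp(kc)

// Same shape as the wrapper call above:
// copy filePath inside the pod to localFileName on local disk
await cp.cpFromPod(namespace, podName, container, filePath, localFileName)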
3. Stream file to Response
import fs from 'fs'
import { stat, unlink } from 'fs/promises'

// Tell the browser to download the file, and how big it is
res.setHeader('Content-Disposition', `attachment; filename=${localFileName};`)
const statInfo = await stat(localFileName)
res.setHeader('Content-Length', statInfo.size)

// Stream the local copy to the response, then delete it
const readable = fs.createReadStream(localFileName)
return readable
  .on('error', err => {
    console.error(`get readable stream for ${localFileName} error = %o`, err)
    res.status(500).json(wrapError({ message: err.message } as any))
  })
  .pipe(res)
  .on('finish', () => unlink(localFileName))
With this approach we have to download the pod file to local disk first, and then send it to the user. If the file is huge, this can be very slow: the file is transferred twice, written to disk, and then read back into res. It also costs some CPU to run the tar command in the pod, and some more CPU locally to untar the stream.
Stream the file from the pod to res directly
The above kubectl cp commands can be replaced by kubectl exec combined with the tar command.
# Copy /tmp/foo_dir local directory to /tmp/bar_dir
# in a remote pod in the default namespace
tar cf - /tmp/foo_dir | kubectl exec -i <some-pod> -- tar xf - -C /tmp/bar_dir
# Copy /tmp/foo local file to /tmp/bar in a remote pod
# in a specific container
tar cf - /tmp/foo | kubectl exec -i <some-pod> -c <specific-container> -- tar xf - -C /tmp/bar
# Copy /tmp/foo local file to /tmp/bar in a remote pod
# in namespace <some-namespace>
tar cf - /tmp/foo | kubectl exec -i -n <some-namespace> <some-pod> -- tar xf - -C /tmp/bar
# Copy /tmp/foo from a remote pod to /tmp/bar locally
kubectl exec -n <some-namespace> <some-pod> -- tar cf - /tmp/foo | tar xf - -C /tmp/bar
# You can even copy a file from one pod to another!
kubectl exec <some-pod> -- tar cf - /tmp/foo | kubectl exec -i <other-pod> -- tar xf -
If we can stream the file to the response directly, without saving it to local disk, we can also shrink the transfer with gzip on the fly. This method gives the best performance.
const kubernetesClient = new KubernetesClientAPI('<cluster-name>', '<k8sCfg>')
const execIns = kubernetesClient.getExecApiIns()

// tar + gzip the file inside the pod and write the archive to stdout
const tarCmd = ['tar', '-czf', '-', filePath]
const fileName = filePath.split('/').pop()
res.setHeader('Content-Disposition', `attachment; filename=${fileName}.tar.gz;`)

// stdout is `res`, so the archive streams straight to the client;
// stderrStream is a Writable that buffers the pod's stderr (sketched below)
await execIns.exec(
  namespace,
  name,
  container,
  tarCmd,
  res,          // stdout -> HTTP response
  stderrStream, // stderr -> in-memory buffer
  null,         // no stdin
  false,        // no tty
  (status) => {
    const { status: statusCode } = status
    if (statusCode === 'Success') return
    const stderr = stderrStream.getResult()
    console.error(
      `execute command: '${tarCmd.join(' ')}' failed, the error = %o, \n stderr = %s`,
      status,
      stderr
    )
    if (!res.headersSent) res.writeHead(500)
    return res.end(`download file error: ${stderr}`)
  },
)
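The stderrStream passed to exec can be any Writable that buffers what the pod writes to stderr, so the status callback can read it back with getResult(). A minimal sketch (the class name and getResult method are this post's own convention, not a library API):
import { Writable } from 'stream'

class StderrBuffer extends Writable {
  private chunks: Buffer[] = []

  _write(chunk: Buffer, _encoding: string, callback: (error?: Error | null) => void) {
    this.chunks.push(chunk)
    callback()
  }

  // Return everything the pod has written to stderr so far
  getResult(): string {
    return Buffer.concat(this.chunks).toString('utf8')
  }
}

const stderrStream = new StderrBuffer()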
We use the exec API to run tar -czf - filePath in the pod, which writes a gzipped tar of filePath to stdout, and we pipe that stdout directly to res.
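As with the copy example, the wrapper maps onto the official @kubernetes/client-node client, whose Exec helper takes the same arguments in the same order; a minimal sketch, again assuming a default kubeconfig:
import { KubeConfig, Exec } from '@kubernetes/client-node'

const kc = new KubeConfig()
kc.loadFromDefault()
const execIns = new Exec(kc)

// stdout -> res, stderr -> stderrStream, no stdin, no tty;
// statusCallback is the handler shown in the block above
await execIns.exec(namespace, name, container, tarCmd, res, stderrStream, null, false, statusCallback)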
Compared with the first method, we don't save the file and read it back into res. The transferred size drops from 3.9G to 638M, an 84% reduction, so the download is much faster than before. And we don't untar the stream; on the contrary, we pipe the gzipped tar stream to the response as-is.
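On the client side the download is an ordinary .tar.gz, so it can be fetched and extracted in one step (the URL is a hypothetical endpoint for this handler):
curl -o foo.tar.gz 'https://<your-service>/download?path=/tmp/foo'
tar -xzf foo.tar.gz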
Conclusion
We should avoid writing a file to disk only to read it again: reduce I/O operations and take advantage of in-memory streams instead. This speeds up the whole process and saves a lot of system resources.