Download a file or folder from a pod through a stream

zhi tao
Jul 11, 2023


We have an ops platform based on Kubernetes. It manages multiple Kubernetes clusters and includes a file-assistant feature that can download files from pods in any managed cluster.

Get a file from a pod with the kubectl cp command or the Kubernetes API server

1. kubectl cp

We can copy a file from a pod to a local destination with kubectl cp.

# Copy /tmp/foo_dir local directory to /tmp/bar_dir
# in a remote pod in the default namespace
kubectl cp /tmp/foo_dir <some-pod>:/tmp/bar_dir

# Copy /tmp/foo local file to /tmp/bar in a remote pod
# in a specific container
kubectl cp /tmp/foo <some-pod>:/tmp/bar -c <specific-container>

# Copy /tmp/foo local file to /tmp/bar in a remote pod
# in namespace <some-namespace>
kubectl cp /tmp/foo <some-namespace>/<some-pod>:/tmp/bar

# Copy /tmp/foo from a remote pod to /tmp/bar locally
kubectl cp <some-namespace>/<some-pod>:/tmp/foo /tmp/bar

kubectl cp is implemented with tar internally; you can read the source code in kubectl/pkg/cmd/cp/cp.go in the kubernetes/kubectl repository on GitHub.

2. Kubernetes API server

// KubernetesClientAPI is our platform's wrapper around the Kubernetes client
const kubernetesClient = new KubernetesClientAPI('<cluster-name>', '<k8sCfg>')
const cpApiIns = kubernetesClient.getCpApiIns()
// copy the file from the pod to localFileName on local disk
await cpApiIns.cpFromPod(namespace, podName, container, filePath, localFileName)
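
For reference, the official @kubernetes/client-node library exposes a similar helper through its Cp class; here is a minimal sketch, assuming the default kubeconfig and placeholder names:

import { KubeConfig, Cp } from '@kubernetes/client-node'

const kc = new KubeConfig()
kc.loadFromDefault()

const cp = new Cp(kc)
// Copy /tmp/foo from the pod's container into ./foo on local disk
await cp.cpFromPod('default', 'some-pod', 'some-container', '/tmp/foo', './foo')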

3. Stream the file to the response

import fs from 'fs'
import { stat, unlink } from 'fs/promises'

res.setHeader('Content-Disposition', `attachment; filename=${localFileName};`)
const statInfo = await stat(localFileName)
res.setHeader('Content-Length', statInfo.size)
const readable = fs.createReadStream(localFileName)
return readable
  .on('error', err => {
    console.error(`get readable stream for ${localFileName} error = %o`, err)
    res.status(500).json(wrapError({ message: err.message } as any))
  })
  .pipe(res)
  .on('finish', () => unlink(localFileName)) // delete the temp file once sent
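
If either stream fails mid-transfer, the chained .on handlers above can leak the temp file. A more defensive variant uses pipeline from Node's stream/promises module, which propagates errors from both sides; this is a minimal sketch, and sendAndCleanup is a hypothetical helper name:

import fs from 'fs'
import { stat, unlink } from 'fs/promises'
import { pipeline } from 'stream/promises'
import type { Response } from 'express'

async function sendAndCleanup(res: Response, localFileName: string) {
  res.setHeader('Content-Disposition', `attachment; filename=${localFileName};`)
  const { size } = await stat(localFileName)
  res.setHeader('Content-Length', size)
  try {
    // pipeline rejects if either the read stream or the response errors out
    await pipeline(fs.createReadStream(localFileName), res)
  } finally {
    await unlink(localFileName) // remove the temp file whether or not the send succeeded
  }
}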

We have to download the pod's file to local disk first and then send it to the user. If the file is huge, this can be very slow: the file is effectively transferred twice, written to disk, and finally read back into res.

It also costs some CPU to execute the tar command in the pod, and more CPU locally to untar the stream.

Stream the file from the pod to res directly

The kubectl cp commands above can be replaced by kubectl exec combined with tar.

# Copy /tmp/foo_dir local directory to /tmp/bar_dir
# in a remote pod in the default namespace
tar cf - /tmp/foo_dir | kubectl exec -i <some-pod> -- tar xf - -C /tmp/bar_dir

# Copy /tmp/foo local file to /tmp/bar in a remote pod
# in a specific container
tar cf - /tmp/foo | kubectl exec -i <some-pod> -c <specific-container> -- tar xf - -C /tmp/bar

# Copy /tmp/foo local file to /tmp/bar in a remote pod
# in namespace <some-namespace>
tar cf - /tmp/foo | kubectl exec -i -n <some-namespace> <some-pod> -- tar xf - -C /tmp/bar

# Copy /tmp/foo from a remote pod to /tmp/bar locally
kubectl exec -n <some-namespace> <some-pod> -- tar cf - /tmp/foo | tar xf - -C /tmp/bar

# You can even copy a file from one pod to another!
kubectl exec <some-pod> -- tar cf - /tmp/foo | kubectl exec -i <other-pod> -- tar xf -

If we can stream the file to the response directly, we never have to save it to local disk, and we can shrink the transfer with gzip along the way. This method gives the best performance.

const kubernetesClient = new KubernetesClientAPI('<cluster-name>', '<k8sCfg>')
const execIns = kubernetesClient.getExecApiIns()
// tar and gzip filePath inside the pod, writing the archive to stdout
const tarCmd = ['tar', '-czf', '-', filePath]
const fileName = filePath.split('/').pop()
res.setHeader('Content-Disposition', `attachment; filename=${fileName}.tar.gz;`)
await execIns.exec(
  namespace,
  name,
  container,
  tarCmd,
  res,          // stdout: piped straight into the HTTP response
  stderrStream, // stderr: buffered so we can report errors
  null,         // stdin: not needed
  false,        // tty: off
  (status) => {
    const { status: statusCode } = status
    if (statusCode === 'Success') return
    const stderr = stderrStream.getResult()
    console.error(
      `execute command: '${tarCmd.join(' ')}' failed, the error = %o, \n stderr = %s`,
      status,
      stderr
    )
    if (!res.headersSent) res.writeHead(500)
    return res.end(`download file error: ${stderr}`)
  },
)
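
The stderrStream above is not defined in this snippet; it is simply a writable stream that buffers whatever the pod writes to stderr so we can report it on failure. A minimal sketch of such a buffer might look like this:

import { Writable } from 'stream'

// Buffers all chunks written to it; getResult() returns them as one string
class BufferedWritable extends Writable {
  private chunks: Buffer[] = []

  _write(chunk: any, _encoding: BufferEncoding, callback: (error?: Error | null) => void) {
    this.chunks.push(Buffer.from(chunk))
    callback()
  }

  getResult(): string {
    return Buffer.concat(this.chunks).toString('utf8')
  }
}

const stderrStream = new BufferedWritable()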

We use the exec API to run tar -czf - filePath, which tars and gzips filePath to stdout, and we pipe that stdout directly to res.
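
If you call the official @kubernetes/client-node library directly instead of a wrapper, its Exec class takes the same shape of arguments; here is a minimal sketch, with namespace, podName, container, filePath, res, and stderrStream standing in as placeholders from the surrounding code:

import { KubeConfig, Exec } from '@kubernetes/client-node'

const kc = new KubeConfig()
kc.loadFromDefault()

const exec = new Exec(kc)
await exec.exec(
  namespace,
  podName,
  container,
  ['tar', '-czf', '-', filePath],
  res,          // stdout: the gzipped tar stream goes straight to the HTTP response
  stderrStream, // stderr: buffered for error reporting
  null,         // stdin: not needed
  false,        // tty: off, so stdout and stderr stay separate streams
  (status) => {
    if (status.status !== 'Success') {
      console.error('tar in pod failed: %o', status)
    }
  },
)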

Compared with the first method, we no longer save the file to disk and read it back into res. Thanks to gzip, the transferred size dropped from 3.9G to 638M, an 84% reduction, so the download also finishes much faster.

Note that we never untar the stream on the server; on the contrary, we pipe the gzipped tar stream to the response as-is, and the user extracts the downloaded .tar.gz locally.

Conclusion

We should avoid writing a file to disk only to read it back again; reducing I/O operations and taking advantage of in-memory streams instead speeds up the whole process and saves a lot of system resources.
