arena submit tfjob

Submit a TFJob as a training job.

Synopsis

Submit a TFJob as a training job.

arena submit tfjob [flags]
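
As a minimal sketch of the synopsis above, the snippet below assembles a single-worker submission and prints it instead of running it. The job name, image, and training command are hypothetical placeholders. The trailing quoted string is assumed to be the command executed inside the container; the synopsis itself does not spell this out.

```shell
# Hypothetical values; replace them with your own.
JOB_NAME="tf-standalone"
IMAGE="example.com/demo/tf-train:latest"
TRAIN_CMD="python train.py"   # assumption: passed as a trailing argument

# One worker (the default) with a single GPU.
CMD="arena submit tfjob --name=$JOB_NAME --gpus=1 --image=$IMAGE \"$TRAIN_CMD\""

# Print the command for review; paste it into a shell to submit the job.
echo "$CMD"
```

Printing first makes it easy to check the flags before anything is sent to the cluster.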

Options

  -a, --annotation stringArray     the annotations
      --chief                      enable chief, which is required for estimator.
      --chief-cpu string           the cpu resource to use for the Chief, like 1 for 1 core.
      --chief-memory string        the memory resource to use for the Chief, like 1Gi.
      --chief-port int             the port of the chief.
      --clean-task-policy string   how to clean up tasks after training is done; only Running and None are supported. (default "Running")
  -d, --data stringArray           specify the datasource to mount to the job, like <name_of_datasource>:<mount_point_on_job>
      --data-dir stringArray       the data dir. Specifying /data mounts the hostpath /data at the container path /data
  -e, --env stringArray            the environment variables
      --evaluator                  enable evaluator, which is optional for estimator.
      --evaluator-cpu string       the cpu resource to use for the evaluator, like 1 for 1 core.
      --evaluator-memory string    the memory resource to use for the evaluator, like 1Gi.
      --gpus int                   the GPU count of each worker to run the training.
  -h, --help                       help for tfjob
      --image string               the docker image name of training job
      --logdir string              the training logs dir (default "/training_logs")
      --name string                override name
      --ps int                     the number of the parameter servers.
      --ps-cpu string              the cpu resource to use for the parameter servers, like 1 for 1 core.
      --ps-image string            the docker image for the parameter servers
      --ps-memory string           the memory resource to use for the parameter servers, like 1Gi.
      --ps-port int                the port of the parameter server.
      --rdma                       enable RDMA
      --retry int                  the number of retries.
      --sync-image string          the docker image used to sync the code
      --sync-mode string           the sync mode; supported values are rsync, hdfs, and git
      --sync-source string         sync-source: for rsync, it's like 10.88.29.56::backup/data/logoRecoTrain.zip; for git, it's like https://github.com/kubeflow/tf-operator.git
      --tensorboard                enable tensorboard
      --tensorboard-image string   the docker image for tensorboard (default "registry.cn-zhangjiakou.aliyuncs.com/tensorflow-samples/tensorflow:1.12.0-devel")
      --worker-cpu string          the cpu resource to use for the worker, like 1 for 1 core.
      --worker-image string        the docker image for tensorflow workers
      --worker-memory string       the memory resource to use for the worker, like 1Gi.
      --worker-port int            the port of the worker.
      --workers int                the number of workers to run the distributed training. (default 1)
      --working-dir string         working directory to extract the code into. If --sync-mode is used, the code is placed in $workingDir/code (default "/root")
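
Several of the options above are typically combined for a distributed job. The following is a sketch only: it builds the command step by step and prints it rather than running it. The job name, image, datasource name, resource sizes, and training script path are hypothetical, the git URL is the sample value from the --sync-source description, and the trailing quoted training command is an assumption not stated in the synopsis.

```shell
# Hypothetical values; replace them with your own.
JOB_NAME="tf-dist"
IMAGE="example.com/demo/tf-train:latest"
DATASOURCE="training-data:/data"          # <name_of_datasource>:<mount_point_on_job>
TRAIN_CMD="python /root/code/train.py"    # hypothetical path under $workingDir/code

CMD="arena submit tfjob --name=$JOB_NAME --image=$IMAGE"
# Two workers with one GPU each, plus one parameter server.
CMD="$CMD --workers=2 --gpus=1 --ps=1"
# Per-role resources, in the formats shown in the option descriptions.
CMD="$CMD --worker-cpu=2 --worker-memory=4Gi --ps-cpu=1 --ps-memory=2Gi"
# Fetch the code with git; with the default --working-dir it lands in /root/code.
CMD="$CMD --sync-mode=git --sync-source=https://github.com/kubeflow/tf-operator.git"
# Mount a datasource and enable tensorboard on the default log directory.
CMD="$CMD --data=$DATASOURCE --tensorboard --logdir=/training_logs"
# Assumption: the training command is passed as a trailing quoted argument.
CMD="$CMD \"$TRAIN_CMD\""

# Print the command for review; paste it into a shell to submit the job.
echo "$CMD"
```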

Options inherited from parent commands

      --arena-namespace string   The namespace of arena system service, like tf-operator (default "arena-system")
      --config string            Path to a kube config. Only required if out-of-cluster
      --loglevel string          Set the logging level. One of: debug|info|warn|error (default "info")
  -n, --namespace string         the namespace of the job (default "default")
      --pprof                    enable cpu profile
      --trace                    enable trace
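
The inherited options can be mixed with the tfjob flags. As a rough sketch, the snippet below targets a non-default namespace and raises the log level, then prints the resulting command. The namespace, job name, and image are hypothetical.

```shell
# Hypothetical values; replace them with your own.
NAMESPACE="team-a"
LOGLEVEL="debug"    # one of: debug|info|warn|error
IMAGE="example.com/demo/tf-train:latest"

CMD="arena submit tfjob --namespace=$NAMESPACE --loglevel=$LOGLEVEL --name=tf-debug --image=$IMAGE"

# Print the command for review; paste it into a shell to submit the job.
echo "$CMD"
```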

Auto generated by spf13/cobra on 24-Apr-2019