- Resource: Cluster
- VirtualClusterConfig
- KubernetesClusterConfig
- GkeClusterConfig
- NamespacedGkeDeploymentTarget
- GkeNodePoolTarget
- Role
- GkeNodePoolConfig
- GkeNodeConfig
- GkeNodePoolAcceleratorConfig
- GkeNodePoolAutoscalingConfig
- KubernetesSoftwareConfig
- AuxiliaryServicesConfig
- SparkHistoryServerConfig
- ClusterStatus
- State
- Substate
- ClusterMetrics
- Methods
Resource: Cluster
Describes the identifying information, config, and status of a Dataproc cluster.
JSON representation |
---|
{ "projectId": string, "clusterName": string, "config": { object ( |
Fields | |
---|---|
projectId | Required. The Google Cloud Platform project ID that the cluster belongs to. |
clusterName | Required. The cluster name, which must be unique within a project. The name must start with a lowercase letter, and can contain up to 51 lowercase letters, numbers, and hyphens. It cannot end with a hyphen. The name of a deleted cluster can be reused. |
config | Optional. The cluster config for a cluster of Compute Engine instances. Note that Dataproc may set default values, and values may change when clusters are updated. Exactly one of ClusterConfig or VirtualClusterConfig must be specified. |
virtualClusterConfig | Optional. The virtual cluster config is used when creating a Dataproc cluster that does not directly control the underlying compute resources, for example, when creating a Dataproc-on-GKE cluster. Dataproc may set default values, and values may change when clusters are updated. Exactly one of config or virtualClusterConfig must be specified. |
labels | Optional. The labels to associate with this cluster. Label keys must contain 1 to 63 characters, and must conform to RFC 1035. Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035. No more than 32 labels can be associated with a cluster. An object containing a list of "key": value pairs. |
status | Output only. Cluster status. |
statusHistory[] | Output only. The previous cluster status. |
clusterUuid | Output only. A cluster UUID (universally unique identifier). Dataproc generates this value when it creates the cluster. |
metrics | Output only. Contains cluster daemon metrics such as HDFS and YARN stats. Beta Feature: This report is available for testing purposes only. It may be changed before final release. |
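The naming and label constraints described for the cluster name and labels fields can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Dataproc API: the helper names are hypothetical, the 51-character limit is assumed to cover the whole name including its first letter, and "conform to RFC 1035" is assumed to mean the RFC 1035 label grammar (starts with a letter, ends with a letter or digit, hyphens allowed in between).

```python
import re

# Assumption: the 51-character limit applies to the entire cluster name.
_CLUSTER_NAME = re.compile(r"[a-z](?:[a-z0-9-]{0,49}[a-z0-9])?")

# Assumption: RFC 1035 <label> grammar, 1 to 63 characters.
_RFC1035_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_cluster_name(name: str) -> bool:
    """Lowercase letter first, then lowercase letters, digits, hyphens; no trailing hyphen."""
    return _CLUSTER_NAME.fullmatch(name) is not None


def label_errors(labels: dict) -> list:
    """Return a list of human-readable problems found in a labels mapping."""
    errors = []
    if len(labels) > 32:
        errors.append("more than 32 labels")
    for key, value in labels.items():
        if not _RFC1035_LABEL.fullmatch(key):
            errors.append(f"invalid key: {key!r}")
        # Label values may be empty; only non-empty values are checked.
        if value and not _RFC1035_LABEL.fullmatch(value):
            errors.append(f"invalid value for {key!r}: {value!r}")
    return errors
```

The server remains the authority on validity; a check like this only catches obvious mistakes early.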
VirtualClusterConfig
The Dataproc cluster config for a cluster that does not directly control the underlying compute resources, such as a Dataproc-on-GKE cluster.
JSON representation |
---|
{ "stagingBucket": string, "auxiliaryServicesConfig": { object ( |
Fields | |
---|---|
stagingBucket | Optional. A Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket (see Dataproc staging and temp buckets). This field requires a Cloud Storage bucket name, not a gs://... URI to a Cloud Storage bucket. |
auxiliaryServicesConfig | Optional. Configuration of auxiliary services used by this cluster. |
Union field | |
kubernetesClusterConfig | Required. The configuration for running the Dataproc cluster on Kubernetes. |
KubernetesClusterConfig
The configuration for running the Dataproc cluster on Kubernetes.
JSON representation |
---|
{ "kubernetesNamespace": string, "kubernetesSoftwareConfig": { object ( |
Fields | |
---|---|
kubernetesNamespace | Optional. A namespace within the Kubernetes cluster to deploy into. If this namespace does not exist, it is created. If it exists, Dataproc verifies that another Dataproc VirtualCluster is not installed into it. If not specified, the name of the Dataproc Cluster is used. |
kubernetesSoftwareConfig | Optional. The software configuration for this Dataproc cluster running on Kubernetes. |
Union field | |
gkeClusterConfig | Required. The configuration for running the Dataproc cluster on GKE. |
GkeClusterConfig
The cluster's GKE config.
JSON representation |
---|
{ "namespacedGkeDeploymentTarget": { object ( |
Fields | |
---|---|
namespacedGkeDeploymentTarget | Optional. Deprecated. Use gkeClusterTarget. Used only for the deprecated beta. A target for the deployment. |
gkeClusterTarget | Optional. A target GKE cluster to deploy to. It must be in the same project and region as the Dataproc cluster (the GKE cluster can be zonal or regional). Format: 'projects/{project}/locations/{location}/clusters/{cluster_id}' |
nodePoolTarget[] | Optional. GKE node pools where workloads will be scheduled. At least one node pool must be assigned the DEFAULT role. |
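To show how the nested configs fit together, the sketch below assembles a Cluster body for a Dataproc-on-GKE virtual cluster as a plain Python dictionary and checks the "exactly one of config or virtualClusterConfig" rule. It is illustrative only: the function names are hypothetical, the nesting assumes the full field names virtualClusterConfig, kubernetesClusterConfig, gkeClusterConfig, gkeClusterTarget, and nodePoolTarget, and SPARK is assumed to be a valid KubernetesComponent key. The component version is left as a caller-supplied parameter.

```python
def build_gke_virtual_cluster(project_id, cluster_name, gke_cluster, node_pool, spark_version):
    """Return a Cluster body that targets an existing GKE cluster and one node pool."""
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "virtualClusterConfig": {
            "kubernetesClusterConfig": {
                "gkeClusterConfig": {
                    # Format: projects/{project}/locations/{location}/clusters/{cluster_id}
                    "gkeClusterTarget": gke_cluster,
                    # A single node pool carrying the DEFAULT role.
                    "nodePoolTarget": [{"nodePool": node_pool, "roles": ["DEFAULT"]}],
                },
                # At least one componentVersion entry must be specified.
                "kubernetesSoftwareConfig": {"componentVersion": {"SPARK": spark_version}},
            }
        },
    }


def has_exactly_one_config(cluster):
    """True when exactly one of config / virtualClusterConfig is present."""
    return ("config" in cluster) != ("virtualClusterConfig" in cluster)
```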
NamespacedGkeDeploymentTarget
Deprecated. Used only for the deprecated beta. A full, namespace-isolated deployment target for an existing GKE cluster.
JSON representation |
---|
{ "targetGkeCluster": string, "clusterNamespace": string } |
Fields | |
---|---|
targetGkeCluster | Optional. The target GKE cluster to deploy to. Format: 'projects/{project}/locations/{location}/clusters/{cluster_id}' |
clusterNamespace | Optional. A namespace within the GKE cluster to deploy into. |
GkeNodePoolTarget
GKE node pools that Dataproc workloads run on.
JSON representation |
---|
{ "nodePool": string, "roles": [ enum ( |
Fields | |
---|---|
nodePool | Required. The target GKE node pool. Format: 'projects/{project}/locations/{location}/clusters/{cluster}/nodePools/{nodePool}' |
roles[] | Required. The roles associated with the GKE node pool. |
nodePoolConfig | Input only. The configuration for the GKE node pool. If specified, Dataproc attempts to create a node pool with the specified shape. If a node pool with the same name already exists, it is verified against all specified fields. If a field differs, the virtual cluster creation fails. If omitted, any node pool with the specified name is used. If a node pool with the specified name does not exist, Dataproc creates a node pool with default values. This is an input-only field. It is not returned by the API. |
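The node pool resource name format given above can be split into its parts with a regular expression. This is a small sketch with a hypothetical helper name; it assumes that no path segment contains a slash and does not validate the individual IDs any further.

```python
import re

# Mirrors: projects/{project}/locations/{location}/clusters/{cluster}/nodePools/{nodePool}
_NODE_POOL_NAME = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/clusters/(?P<cluster>[^/]+)"
    r"/nodePools/(?P<nodePool>[^/]+)"
)


def parse_node_pool_name(name):
    """Return the components of a node pool resource name, or raise ValueError."""
    match = _NODE_POOL_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a node pool resource name: {name!r}")
    return match.groupdict()
```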
Role
Role specifies the tasks that will run on the node pool. Roles can be specific to workloads. Exactly one GkeNodePoolTarget within the virtual cluster must have the DEFAULT role, which is used to run all workloads that are not associated with a node pool.
Enums | |
---|---|
ROLE_UNSPECIFIED | Role is unspecified. |
DEFAULT | At least one node pool must have the DEFAULT role. Work assigned to a role that is not associated with a node pool is assigned to the node pool with the DEFAULT role. For example, work assigned to the CONTROLLER role will be assigned to the node pool with the DEFAULT role if no node pool has the CONTROLLER role. |
CONTROLLER | Run work associated with the Dataproc control plane (for example, controllers and webhooks). Very low resource requirements. |
SPARK_DRIVER | Run work associated with a Spark driver of a job. |
SPARK_EXECUTOR | Run work associated with a Spark executor of a job. |
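The DEFAULT fallback described in the table can be modeled in a few lines. The sketch below is a hypothetical client-side illustration of the rule, not Dataproc's scheduler: it takes a list of GkeNodePoolTarget-shaped dictionaries and returns the node pool that would receive work for a given role.

```python
def pool_for_role(targets, role):
    """Return the nodePool that handles `role`, falling back to the DEFAULT pool."""
    # Prefer a node pool that explicitly carries the requested role.
    for target in targets:
        if role in target["roles"]:
            return target["nodePool"]
    # Otherwise the work goes to the node pool with the DEFAULT role.
    for target in targets:
        if "DEFAULT" in target["roles"]:
            return target["nodePool"]
    raise ValueError("no node pool has the DEFAULT role")
```

For example, with one DEFAULT pool and one SPARK_EXECUTOR pool, CONTROLLER and SPARK_DRIVER work both resolve to the DEFAULT pool.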
GkeNodePoolConfig
The configuration of a GKE node pool used by a Dataproc-on-GKE cluster.
JSON representation |
---|
{ "config": { object ( |
Fields | |
---|---|
config | Optional. The node pool configuration. |
locations[] | Optional. The list of Compute Engine zones where node pool nodes associated with a Dataproc on GKE virtual cluster will be located. Note: All node pools associated with a virtual cluster must be located in the same region as the virtual cluster, and they must be located in the same zone within that region. If a location is not specified during node pool creation, Dataproc on GKE will choose the zone. |
autoscaling | Optional. The autoscaler configuration for this node pool. The autoscaler is enabled only when a valid configuration is present. |
GkeNodeConfig
Parameters that describe cluster nodes.
JSON representation |
---|
{
"machineType": string,
"localSsdCount": integer,
"preemptible": boolean,
"accelerators": [
{
object ( |
Fields | |
---|---|
machineType | Optional. The name of a Compute Engine machine type. |
localSsdCount | Optional. The number of local SSD disks to attach to the node, which is limited by the maximum number of disks allowable per zone (see Adding Local SSDs). |
preemptible | Optional. Whether the nodes are created as legacy preemptible VM instances. Also see spot. |
accelerators[] | Optional. A list of hardware accelerators to attach to each node. |
minCpuPlatform | Optional. Minimum CPU platform to be used by this instance. The instance may be scheduled on the specified or a newer CPU platform. Specify the friendly names of CPU platforms, such as "Intel Haswell" or "Intel Sandy Bridge". |
spot | Optional. Whether the nodes are created as Spot VM instances. Spot VMs are the latest update to legacy preemptible VMs. |
GkeNodePoolAcceleratorConfig
A GkeNodePoolAcceleratorConfig represents a hardware accelerator request for a node pool.
JSON representation |
---|
{ "acceleratorCount": string, "acceleratorType": string, "gpuPartitionSize": string } |
Fields | |
---|---|
acceleratorCount | The number of accelerator cards exposed to an instance. |
acceleratorType | The accelerator type resource name (see GPUs on Compute Engine). |
gpuPartitionSize | Size of partitions to create on the GPU. Valid values are described in the NVIDIA MIG user guide. |
GkeNodePoolAutoscalingConfig
GkeNodePoolAutoscalingConfig contains information the cluster autoscaler needs to adjust the size of the node pool to the current cluster usage.
JSON representation |
---|
{ "minNodeCount": integer, "maxNodeCount": integer } |
Fields | |
---|---|
minNodeCount | The minimum number of nodes in the node pool. Must be >= 0 and <= maxNodeCount. |
maxNodeCount | The maximum number of nodes in the node pool. Must be >= minNodeCount, and must be > 0. Note: Quota must be sufficient to scale up the cluster. |
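The two bounds above can be verified before a node pool config is submitted. This is a minimal sketch with a hypothetical function name; treating a missing field as 0 is an assumption made for the example, not documented API behavior.

```python
def autoscaling_errors(config):
    """Check a GkeNodePoolAutoscalingConfig-shaped dict against the documented bounds."""
    # Assumption: an omitted count is treated as 0 for validation purposes.
    min_nodes = config.get("minNodeCount", 0)
    max_nodes = config.get("maxNodeCount", 0)
    errors = []
    if min_nodes < 0:
        errors.append("minNodeCount must be >= 0")
    if max_nodes <= 0:
        errors.append("maxNodeCount must be > 0")
    if min_nodes > max_nodes:
        errors.append("minNodeCount must be <= maxNodeCount")
    return errors
```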
KubernetesSoftwareConfig
The software configuration for this Dataproc cluster running on Kubernetes.
JSON representation |
---|
{ "componentVersion": { string: string, ... }, "properties": { string: string, ... } } |
Fields | |
---|---|
componentVersion | The components that should be installed in this Dataproc cluster. The key must be a string from the KubernetesComponent enumeration. The value is the version of the software to be installed. At least one entry must be specified. An object containing a list of "key": value pairs. |
properties | The properties to set on daemon config files. Property keys are specified in prefix:property format. For more information, see Cluster properties. An object containing a list of "key": value pairs. |
AuxiliaryServicesConfig
Auxiliary services configuration for a Cluster.
JSON representation |
---|
{ "metastoreConfig": { object ( |
Fields | |
---|---|
metastoreConfig | Optional. The Hive Metastore configuration for this workload. |
sparkHistoryServerConfig | Optional. The Spark History Server configuration for the workload. |
SparkHistoryServerConfig
Spark History Server configuration for the workload.
JSON representation |
---|
{ "dataprocCluster": string } |
Fields | |
---|---|
dataprocCluster | Optional. Resource name of an existing Dataproc Cluster to act as a Spark History Server for the workload. |
ClusterStatus
The status of a cluster and its instances.
JSON representation |
---|
{ "state": enum ( |
Fields | |
---|---|
state | Output only. The cluster's state. |
detail | Optional. Output only. Details of the cluster's state. |
stateStartTime | Output only. Time when this state was entered (see JSON representation of Timestamp). A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z". |
substate | Output only. Additional state information that includes status reported by the agent. |
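Timestamps with up to nine fractional digits exceed the microsecond precision of Python's datetime, so a client may want to keep the nanoseconds separately. The sketch below is one way to do that under stated assumptions: the function name is hypothetical, and only the UTC "Zulu" form with a trailing Z is accepted.

```python
import re
from datetime import datetime, timezone

# Seconds part, then an optional fraction of 1 to 9 digits, then a literal Z.
_RFC3339_ZULU = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z")


def parse_zulu_timestamp(value):
    """Return (datetime truncated to microseconds, nanoseconds within the second)."""
    match = _RFC3339_ZULU.fullmatch(value)
    if match is None:
        raise ValueError(f"not an RFC 3339 Zulu timestamp: {value!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # Right-pad the fraction to nine digits so it reads as nanoseconds.
    nanos = int((match.group(2) or "").ljust(9, "0"))
    return base.replace(microsecond=nanos // 1000), nanos
```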
State
The cluster state.
Enums | |
---|---|
UNKNOWN | The cluster state is unknown. |
CREATING | The cluster is being created and set up. It is not ready for use. |
RUNNING | The cluster is currently running and healthy. It is ready for use. Note: The cluster state changes from "creating" to "running" status after the master node(s), first two primary worker nodes (and the last primary worker node if primary workers > 2) are running. |
ERROR | The cluster encountered an error. It is not ready for use. |
ERROR_DUE_TO_UPDATE | The cluster has encountered an error while being updated. Jobs can be submitted to the cluster, but the cluster cannot be updated. |
DELETING | The cluster is being deleted. It cannot be used. |
UPDATING | The cluster is being updated. It continues to accept and process jobs. |
STOPPING | The cluster is being stopped. It cannot be used. |
STOPPED | The cluster is currently stopped. It is not ready for use. |
STARTING | The cluster is being started. It is not ready for use. |
SCHEDULED | Cluster creation is currently waiting for resources to be available. Once all resources are available, it will transition to CREATING and then RUNNING. |
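A client that submits jobs usually needs to know only whether the current state accepts them. The sketch below groups the states by the descriptions in the table; the grouping and the names are an interpretation made for this example, not an official classification.

```python
# States whose descriptions say the cluster is ready or that jobs can be submitted.
ACCEPTS_JOBS = frozenset({"RUNNING", "UPDATING", "ERROR_DUE_TO_UPDATE"})

# States that are expected to move toward RUNNING without caller action (an assumption).
PENDING = frozenset({"SCHEDULED", "CREATING", "STARTING"})


def accepts_jobs(state):
    """True when the state description indicates that jobs can be submitted."""
    return state in ACCEPTS_JOBS


def worth_waiting_for(state):
    """True when polling again may find the cluster in a usable state."""
    return state in PENDING
```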
Substate
The cluster substate.
Enums | |
---|---|
UNSPECIFIED | The cluster substate is unknown. |
UNHEALTHY | The cluster is known to be in an unhealthy state (for example, critical daemons are not running or HDFS capacity is exhausted). Applies to RUNNING state. |
STALE_STATUS | The agent-reported status is out of date (may occur if Dataproc loses communication with the agent). Applies to RUNNING state. |
ClusterMetrics
Contains cluster daemon metrics, such as HDFS and YARN stats.
Beta Feature: This report is available for testing purposes only. It may be changed before final release.
JSON representation |
---|
{ "hdfsMetrics": { string: string, ... }, "yarnMetrics": { string: string, ... } } |
Fields | |
---|---|
hdfsMetrics | The HDFS metrics. An object containing a list of "key": value pairs. |
yarnMetrics | YARN metrics. An object containing a list of "key": value pairs. |
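Because the JSON representation carries metric values as strings, numeric comparisons need a conversion step. The sketch below assumes every value is a decimal integer encoded as a string; the helper name and the metric key used in the usage note are hypothetical.

```python
def metrics_as_ints(metrics):
    """Convert a metrics map of string values into a map of integers."""
    # Assumption: each value is a base-10 integer string; int() raises ValueError otherwise.
    return {key: int(value) for key, value in metrics.items()}
```

For instance, `metrics_as_ints({"example-metric": "4096"})` yields a dictionary whose single value is the integer 4096.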
Methods | |
---|---|
create | Creates a cluster in a project. |
delete | Deletes a cluster in a project. |
diagnose | Gets cluster diagnostic information. |
get | Gets the resource representation for a cluster in a project. |
getIamPolicy | Gets the access control policy for a resource. |
list | Lists all regions/{region}/clusters in a project alphabetically. |
patch | Updates a cluster in a project. |
setIamPolicy | Sets the access control policy on the specified resource. |
start | Starts a cluster in a project. |
stop | Stops a cluster in a project. |
testIamPermissions | Returns permissions that a caller has on the specified resource. |