REST Resource: projects.regions.clusters

Resource: Cluster

Describes the identifying information, config, and status of a Dataproc cluster.

JSON representation
{
  "projectId": string,
  "clusterName": string,
  "config": {
    object (ClusterConfig)
  },
  "virtualClusterConfig": {
    object (VirtualClusterConfig)
  },
  "labels": {
    string: string,
    ...
  },
  "status": {
    object (ClusterStatus)
  },
  "statusHistory": [
    {
      object (ClusterStatus)
    }
  ],
  "clusterUuid": string,
  "metrics": {
    object (ClusterMetrics)
  }
}
Fields
projectId

string

Required. The Google Cloud Platform project ID that the cluster belongs to.

clusterName

string

Required. The cluster name, which must be unique within a project. The name must start with a lowercase letter followed by up to 51 lowercase letters, numbers, and hyphens, and cannot end with a hyphen. The name of a deleted cluster can be reused.

config

object (ClusterConfig)

Optional. The cluster config for a cluster of Compute Engine instances. Note that Dataproc may set default values, and values may change when clusters are updated.

Exactly one of config or virtualClusterConfig must be specified.
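For example, a minimal Cluster body for a Compute Engine-based cluster might look like the following sketch. The project ID, cluster name, and zone are placeholders, and gceClusterConfig is a ClusterConfig field documented on its own reference page:

{
  "projectId": "my-project",
  "clusterName": "example-cluster",
  "config": {
    "gceClusterConfig": {
      "zoneUri": "us-central1-a"
    }
  }
}

Note that the cluster name follows the rules above: it starts with a lowercase letter and does not end with a hyphen.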

virtualClusterConfig

object (VirtualClusterConfig)

Optional. The virtual cluster config is used when creating a Dataproc cluster that does not directly control the underlying compute resources, for example, when creating a Dataproc-on-GKE cluster. Dataproc may set default values, and values may change when clusters are updated. Exactly one of config or virtualClusterConfig must be specified.

labels

map (key: string, value: string)

Optional. The labels to associate with this cluster. Label keys must contain 1 to 63 characters, and must conform to RFC 1035. Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035. No more than 32 labels can be associated with a cluster.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

status

object (ClusterStatus)

Output only. Cluster status.

statusHistory[]

object (ClusterStatus)

Output only. The previous cluster status.

clusterUuid

string

Output only. A cluster UUID (Unique Universal Identifier). Dataproc generates this value when it creates the cluster.

metrics

object (ClusterMetrics)

Output only. Contains cluster daemon metrics such as HDFS and YARN stats.

Beta Feature: This report is available for testing purposes only. It may be changed before final release.

VirtualClusterConfig

The Dataproc cluster config for a cluster that does not directly control the underlying compute resources, such as a Dataproc-on-GKE cluster.

JSON representation
{
  "stagingBucket": string,
  "auxiliaryServicesConfig": {
    object (AuxiliaryServicesConfig)
  },

  // Union field infrastructure_config can be only one of the following:
  "kubernetesClusterConfig": {
    object (KubernetesClusterConfig)
  }
  // End of list of possible types for union field infrastructure_config.
}
Fields
stagingBucket

string

Optional. A Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket (see Dataproc staging and temp buckets). This field requires a Cloud Storage bucket name, not a gs://... URI to a Cloud Storage bucket.

auxiliaryServicesConfig

object (AuxiliaryServicesConfig)

Optional. Configuration of auxiliary services used by this cluster.

Union field infrastructure_config.

infrastructure_config can be only one of the following:

kubernetesClusterConfig

object (KubernetesClusterConfig)

Required. The configuration for running the Dataproc cluster on Kubernetes.
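As a sketch, a VirtualClusterConfig for a Dataproc-on-GKE cluster might look like the following. The bucket and resource names are placeholders; note that stagingBucket is a bare bucket name, not a gs://... URI:

{
  "stagingBucket": "my-staging-bucket",
  "kubernetesClusterConfig": {
    "kubernetesNamespace": "example-cluster",
    "gkeClusterConfig": {
      "gkeClusterTarget": "projects/my-project/locations/us-central1/clusters/my-gke-cluster"
    }
  }
}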

KubernetesClusterConfig

The configuration for running the Dataproc cluster on Kubernetes.

JSON representation
{
  "kubernetesNamespace": string,
  "kubernetesSoftwareConfig": {
    object (KubernetesSoftwareConfig)
  },

  // Union field config can be only one of the following:
  "gkeClusterConfig": {
    object (GkeClusterConfig)
  }
  // End of list of possible types for union field config.
}
Fields
kubernetesNamespace

string

Optional. A namespace within the Kubernetes cluster to deploy into. If this namespace does not exist, it is created. If it exists, Dataproc verifies that another Dataproc VirtualCluster is not installed into it. If not specified, the name of the Dataproc Cluster is used.

kubernetesSoftwareConfig

object (KubernetesSoftwareConfig)

Optional. The software configuration for this Dataproc cluster running on Kubernetes.

Union field config.

config can be only one of the following:

gkeClusterConfig

object (GkeClusterConfig)

Required. The configuration for running the Dataproc cluster on GKE.

GkeClusterConfig

The cluster's GKE config.

JSON representation
{
  "namespacedGkeDeploymentTarget": {
    object (NamespacedGkeDeploymentTarget)
  },
  "gkeClusterTarget": string,
  "nodePoolTarget": [
    {
      object (GkeNodePoolTarget)
    }
  ]
}
Fields
namespacedGkeDeploymentTarget
(deprecated)

object (NamespacedGkeDeploymentTarget)

Optional. Deprecated. Use gkeClusterTarget. Used only for the deprecated beta. A target for the deployment.

gkeClusterTarget

string

Optional. A target GKE cluster to deploy to. It must be in the same project and region as the Dataproc cluster (the GKE cluster can be zonal or regional). Format: 'projects/{project}/locations/{location}/clusters/{cluster_id}'

nodePoolTarget[]

object (GkeNodePoolTarget)

Optional. GKE node pools where workloads will be scheduled. At least one node pool must be assigned the DEFAULT GkeNodePoolTarget.Role. If a GkeNodePoolTarget is not specified, Dataproc constructs a DEFAULT GkeNodePoolTarget. Each role can be given to only one GkeNodePoolTarget. All node pools must have the same location settings.
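For illustration, a nodePoolTarget list might assign the DEFAULT role to one pool and both Spark roles to another. The node pool names below are placeholders, and both pools must share the same location settings, as required above:

"nodePoolTarget": [
  {
    "nodePool": "projects/my-project/locations/us-central1/clusters/my-gke-cluster/nodePools/control-pool",
    "roles": ["DEFAULT"]
  },
  {
    "nodePool": "projects/my-project/locations/us-central1/clusters/my-gke-cluster/nodePools/spark-pool",
    "roles": ["SPARK_DRIVER", "SPARK_EXECUTOR"]
  }
]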

NamespacedGkeDeploymentTarget

Deprecated. Used only for the deprecated beta. A full, namespace-isolated deployment target for an existing GKE cluster.

JSON representation
{
  "targetGkeCluster": string,
  "clusterNamespace": string
}
Fields
targetGkeCluster

string

Optional. The target GKE cluster to deploy to. Format: 'projects/{project}/locations/{location}/clusters/{cluster_id}'

clusterNamespace

string

Optional. A namespace within the GKE cluster to deploy into.

GkeNodePoolTarget

GKE node pools that Dataproc workloads run on.

JSON representation
{
  "nodePool": string,
  "roles": [
    enum (Role)
  ],
  "nodePoolConfig": {
    object (GkeNodePoolConfig)
  }
}
Fields
nodePool

string

Required. The target GKE node pool. Format: 'projects/{project}/locations/{location}/clusters/{cluster}/nodePools/{nodePool}'

roles[]

enum (Role)

Required. The roles associated with the GKE node pool.

nodePoolConfig

object (GkeNodePoolConfig)

Input only. The configuration for the GKE node pool.

If specified, Dataproc attempts to create a node pool with the specified shape. If one with the same name already exists, it is verified against all specified fields. If a field differs, the virtual cluster creation will fail.

If omitted, any node pool with the specified name is used. If a node pool with the specified name does not exist, Dataproc creates a node pool with default values.

This is an input-only field. It will not be returned by the API.

Role

Role specifies the tasks that will run on the node pool. Roles can be specific to workloads. Exactly one GkeNodePoolTarget within the virtual cluster must have the DEFAULT role, which is used to run all workloads that are not associated with a node pool.

Enums
ROLE_UNSPECIFIED Role is unspecified.
DEFAULT At least one node pool must have the DEFAULT role. Work assigned to a role that is not associated with a node pool is assigned to the node pool with the DEFAULT role. For example, work assigned to the CONTROLLER role will be assigned to the node pool with the DEFAULT role if no node pool has the CONTROLLER role.
CONTROLLER Run work associated with the Dataproc control plane (for example, controllers and webhooks). Very low resource requirements.
SPARK_DRIVER Run work associated with a Spark driver of a job.
SPARK_EXECUTOR Run work associated with a Spark executor of a job.

GkeNodePoolConfig

The configuration of a GKE node pool used by a Dataproc-on-GKE cluster.

JSON representation
{
  "config": {
    object (GkeNodeConfig)
  },
  "locations": [
    string
  ],
  "autoscaling": {
    object (GkeNodePoolAutoscalingConfig)
  }
}
Fields
config

object (GkeNodeConfig)

Optional. The node pool configuration.

locations[]

string

Optional. The list of Compute Engine zones where node pool nodes associated with a Dataproc on GKE virtual cluster will be located.

Note: All node pools associated with a virtual cluster must be located in the same region as the virtual cluster, and they must be located in the same zone within that region.

If a location is not specified during node pool creation, Dataproc on GKE will choose the zone.

autoscaling

object (GkeNodePoolAutoscalingConfig)

Optional. The autoscaler configuration for this node pool. The autoscaler is enabled only when a valid configuration is present.
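A sketch of a GkeNodePoolConfig that pins the pool to a single zone and enables autoscaling; the machine type and zone are placeholders:

{
  "config": {
    "machineType": "n1-standard-4"
  },
  "locations": [
    "us-central1-a"
  ],
  "autoscaling": {
    "minNodeCount": 1,
    "maxNodeCount": 10
  }
}

Because a valid autoscaling block is present, the autoscaler is enabled for this pool.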

GkeNodeConfig

Parameters that describe cluster nodes.

JSON representation
{
  "machineType": string,
  "localSsdCount": integer,
  "preemptible": boolean,
  "accelerators": [
    {
      object (GkeNodePoolAcceleratorConfig)
    }
  ],
  "minCpuPlatform": string,
  "spot": boolean
}
Fields
machineType

string

Optional. The name of a Compute Engine machine type.

localSsdCount

integer

Optional. The number of local SSD disks to attach to the node, which is limited by the maximum number of disks allowable per zone (see Adding Local SSDs).

preemptible

boolean

Optional. Whether the nodes are created as legacy preemptible VM instances. Also see Spot VMs, preemptible VM instances without a maximum lifetime. Legacy and Spot preemptible nodes cannot be used in a node pool with the CONTROLLER role or in the DEFAULT node pool if the CONTROLLER role is not assigned (the DEFAULT node pool will assume the CONTROLLER role).

accelerators[]

object (GkeNodePoolAcceleratorConfig)

Optional. A list of hardware accelerators to attach to each node.

minCpuPlatform

string

Optional. Minimum CPU platform to be used by this instance. The instance may be scheduled on the specified or a newer CPU platform. Specify the friendly names of CPU platforms, such as "Intel Haswell" or "Intel Sandy Bridge".

spot

boolean

Optional. Whether the nodes are created as Spot VM instances. Spot VMs are the latest update to legacy preemptible VMs. Spot VMs do not have a maximum lifetime. Legacy and Spot preemptible nodes cannot be used in a node pool with the CONTROLLER role or in the DEFAULT node pool if the CONTROLLER role is not assigned (the DEFAULT node pool will assume the CONTROLLER role).
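For example, a GkeNodeConfig for a Spot VM worker pool might look like the following sketch; the machine type and CPU platform are placeholder values:

{
  "machineType": "n1-standard-8",
  "localSsdCount": 1,
  "spot": true,
  "minCpuPlatform": "Intel Haswell"
}

Since spot is set, this configuration should not be used for a node pool that holds the CONTROLLER role, or for the DEFAULT pool when no pool has the CONTROLLER role.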

GkeNodePoolAcceleratorConfig

A GkeNodePoolAcceleratorConfig represents a hardware accelerator request for a node pool.

JSON representation
{
  "acceleratorCount": string,
  "acceleratorType": string,
  "gpuPartitionSize": string
}
Fields
acceleratorCount

string (int64 format)

The number of accelerator cards exposed to an instance.

acceleratorType

string

The accelerator type resource name (see GPUs on Compute Engine).

gpuPartitionSize

string

Size of partitions to create on the GPU. Valid values are described in the NVIDIA MIG user guide.
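For example, a request for a single MIG partition on an A100 GPU might look like this; the accelerator type and partition size are illustrative, and valid partition sizes depend on the GPU model:

{
  "acceleratorCount": "1",
  "acceleratorType": "nvidia-tesla-a100",
  "gpuPartitionSize": "1g.5gb"
}

Note that acceleratorCount is a string in int64 format, not a JSON number.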

GkeNodePoolAutoscalingConfig

GkeNodePoolAutoscalingConfig contains information the cluster autoscaler needs to adjust the size of the node pool to the current cluster usage.

JSON representation
{
  "minNodeCount": integer,
  "maxNodeCount": integer
}
Fields
minNodeCount

integer

The minimum number of nodes in the node pool. Must be >= 0 and <= maxNodeCount.

maxNodeCount

integer

The maximum number of nodes in the node pool. Must be >= minNodeCount, and must be > 0. Note: Quota must be sufficient to scale up the cluster.

KubernetesSoftwareConfig

The software configuration for this Dataproc cluster running on Kubernetes.

JSON representation
{
  "componentVersion": {
    string: string,
    ...
  },
  "properties": {
    string: string,
    ...
  }
}
Fields
componentVersion

map (key: string, value: string)

The components that should be installed in this Dataproc cluster. The key must be a string from the KubernetesComponent enumeration. The value is the version of the software to be installed. At least one entry must be specified.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

properties

map (key: string, value: string)

The properties to set on daemon config files.

Property keys are specified in prefix:property format, for example spark:spark.kubernetes.container.image. The following are supported prefixes and their mappings:

  • spark: spark-defaults.conf

For more information, see Cluster properties.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.
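As a sketch, a KubernetesSoftwareConfig might pin the Spark component version and set one Spark property. The version string and container image below are placeholders; consult the Dataproc-on-GKE release notes for valid component versions:

{
  "componentVersion": {
    "SPARK": "3.1-dataproc-7"
  },
  "properties": {
    "spark:spark.kubernetes.container.image": "us-docker.pkg.dev/my-project/my-repo/my-spark-image"
  }
}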

AuxiliaryServicesConfig

Auxiliary services configuration for a Cluster.

JSON representation
{
  "metastoreConfig": {
    object (MetastoreConfig)
  },
  "sparkHistoryServerConfig": {
    object (SparkHistoryServerConfig)
  }
}
Fields
metastoreConfig

object (MetastoreConfig)

Optional. The Hive Metastore configuration for this workload.

sparkHistoryServerConfig

object (SparkHistoryServerConfig)

Optional. The Spark History Server configuration for the workload.

SparkHistoryServerConfig

Spark History Server configuration for the workload.

JSON representation
{
  "dataprocCluster": string
}
Fields
dataprocCluster

string

Optional. Resource name of an existing Dataproc Cluster to act as a Spark History Server for the workload.

Example:

  • projects/[projectId]/regions/[region]/clusters/[clusterName]
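In JSON, the corresponding configuration looks like the following; the project, region, and cluster name are placeholders:

{
  "dataprocCluster": "projects/my-project/regions/us-central1/clusters/my-history-cluster"
}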

ClusterStatus

The status of a cluster and its instances.

JSON representation
{
  "state": enum (State),
  "detail": string,
  "stateStartTime": string,
  "substate": enum (Substate)
}
Fields
state

enum (State)

Output only. The cluster's state.

detail

string

Optional. Output only. Details of the cluster's state.

stateStartTime

string (Timestamp format)

Output only. Time when this state was entered (see JSON representation of Timestamp).

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".

substate

enum (Substate)

Output only. Additional state information that includes status reported by the agent.
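For reference, a ClusterStatus returned for a healthy running cluster might look like the following sketch (the timestamp reuses the format example above):

{
  "state": "RUNNING",
  "stateStartTime": "2014-10-02T15:01:23.045123456Z",
  "substate": "UNSPECIFIED"
}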

State

The cluster state.

Enums
UNKNOWN The cluster state is unknown.
CREATING The cluster is being created and set up. It is not ready for use.
RUNNING

The cluster is currently running and healthy. It is ready for use.

Note: The cluster state changes from "creating" to "running" after the master node(s) and the first two primary worker nodes (plus the last primary worker node, if there are more than two primary workers) are running.

ERROR The cluster encountered an error. It is not ready for use.
ERROR_DUE_TO_UPDATE The cluster has encountered an error while being updated. Jobs can be submitted to the cluster, but the cluster cannot be updated.
DELETING The cluster is being deleted. It cannot be used.
UPDATING The cluster is being updated. It continues to accept and process jobs.
STOPPING The cluster is being stopped. It cannot be used.
STOPPED The cluster is currently stopped. It is not ready for use.
STARTING The cluster is being started. It is not ready for use.
SCHEDULED Cluster creation is currently waiting for resources to be available. Once all resources are available, it will transition to CREATING and then RUNNING.

Substate

The cluster substate.

Enums
UNSPECIFIED The cluster substate is unknown.
UNHEALTHY

The cluster is known to be in an unhealthy state (for example, critical daemons are not running or HDFS capacity is exhausted).

Applies to RUNNING state.

STALE_STATUS

The agent-reported status is out of date (this may occur if Dataproc loses communication with the agent).

Applies to RUNNING state.

ClusterMetrics

Contains cluster daemon metrics, such as HDFS and YARN stats.

Beta Feature: This report is available for testing purposes only. It may be changed before final release.

JSON representation
{
  "hdfsMetrics": {
    string: string,
    ...
  },
  "yarnMetrics": {
    string: string,
    ...
  }
}
Fields
hdfsMetrics

map (key: string, value: string (int64 format))

The HDFS metrics.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

yarnMetrics

map (key: string, value: string (int64 format))

YARN metrics.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.
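The shape of the object is sketched below. The metric keys are hypothetical, since the available names depend on the daemons running on the cluster, and every value is an int64 encoded as a string:

{
  "hdfsMetrics": {
    "dfs-capacity-total": "1073741824"
  },
  "yarnMetrics": {
    "yarn-apps-running": "3"
  }
}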

Methods

create

Creates a cluster in a project.

delete

Deletes a cluster in a project.

diagnose

Gets cluster diagnostic information.

get

Gets the resource representation for a cluster in a project.

getIamPolicy

Gets the access control policy for a resource.

list

Lists all regions/{region}/clusters in a project alphabetically.

patch

Updates a cluster in a project.

setIamPolicy

Sets the access control policy on the specified resource.

start

Starts a cluster in a project.

stop

Stops a cluster in a project.

testIamPermissions

Returns permissions that a caller has on the specified resource.