Job's Task is Legacy (spark_submit_task)
- Query id: 375cdab9-3f94-4ae0-b1e3-8fbdf9cdf4d7
- Query name: Job's Task is Legacy (spark_submit_task)
- Platform: Terraform
- Severity: Medium
- Category: Best Practices
Description
The Databricks job defines a task using the legacy spark_submit_task type. Prefer one of the dedicated task types instead, such as spark_jar_task, notebook_task, or pipeline_task.
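Below is a minimal remediation sketch (the resource name and DBFS jar path are illustrative): the legacy spark_submit_task is replaced with a spark_jar_task, and the application JAR is attached through the task's library block, mirroring the negative samples further down.
resource "databricks_job" "remediated" {
  name = "Job without legacy task types"

  task {
    task_key            = "b"
    existing_cluster_id = databricks_cluster.shared.id

    // spark_jar_task is the non-legacy replacement for a spark-submit run
    spark_jar_task {
      main_class_name = "com.acme.data.Main"
    }

    // the JAR is attached as a task library instead of being passed on the
    // spark-submit command line (path below is illustrative)
    library {
      jar = "dbfs:/FileStore/jars/app.jar"
    }
  }
}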
Code samples
Code samples with security vulnerabilities
Positive test num. 1 - tf file
resource "databricks_job" "positive" {
name = "Job with multiple tasks"
job_cluster {
job_cluster_key = "j"
new_cluster {
num_workers = 2
spark_version = data.databricks_spark_version.latest.id
node_type_id = data.databricks_node_type.smallest.id
}
}
task {
task_key = "a"
new_cluster {
num_workers = 1
spark_version = data.databricks_spark_version.latest.id
node_type_id = data.databricks_node_type.smallest.id
}
notebook_task {
notebook_path = databricks_notebook.this.path
}
}
task {
task_key = "b"
//this task will only run after task a
depends_on {
task_key = "a"
}
existing_cluster_id = databricks_cluster.shared.id
spark_submit_task {
main_class_name = "com.acme.data.Main"
}
}
task {
task_key = "c"
job_cluster_key = "j"
notebook_task {
notebook_path = databricks_notebook.this.path
}
}
//this task starts a Delta Live Tables pipline update
task {
task_key = "d"
pipeline_task {
pipeline_id = databricks_pipeline.this.id
}
}
}
Positive test num. 2 - tf file
resource "databricks_job" "positive" {
name = "Job with multiple tasks"
job_cluster {
job_cluster_key = "j"
new_cluster {
num_workers = 2
spark_version = data.databricks_spark_version.latest.id
node_type_id = data.databricks_node_type.smallest.id
}
}
task {
task_key = "a"
existing_cluster_id = databricks_cluster.shared.id
spark_submit_task {
main_class_name = "com.acme.data.Main"
}
}
}
Code samples without security vulnerabilities
Negative test num. 1 - tf file
resource "databricks_job" "negative1" {
name = "Job with multiple tasks"
job_cluster {
job_cluster_key = "j"
new_cluster {
num_workers = 2
spark_version = data.databricks_spark_version.latest.id
node_type_id = data.databricks_node_type.smallest.id
}
}
task {
task_key = "a"
new_cluster {
num_workers = 1
spark_version = data.databricks_spark_version.latest.id
node_type_id = data.databricks_node_type.smallest.id
}
notebook_task {
notebook_path = databricks_notebook.this.path
}
}
task {
task_key = "b"
//this task will only run after task a
depends_on {
task_key = "a"
}
existing_cluster_id = databricks_cluster.shared.id
spark_jar_task {
main_class_name = "com.acme.data.Main"
}
}
task {
task_key = "c"
job_cluster_key = "j"
notebook_task {
notebook_path = databricks_notebook.this.path
}
}
//this task starts a Delta Live Tables pipline update
task {
task_key = "d"
pipeline_task {
pipeline_id = databricks_pipeline.this.id
}
}
}
Negative test num. 2 - tf file
resource "databricks_job" "negative1" {
name = "Job with multiple tasks"
job_cluster {
job_cluster_key = "j"
new_cluster {
num_workers = 2
spark_version = data.databricks_spark_version.latest.id
node_type_id = data.databricks_node_type.smallest.id
}
}
task {
task_key = "a"
new_cluster {
num_workers = 1
spark_version = data.databricks_spark_version.latest.id
node_type_id = data.databricks_node_type.smallest.id
}
notebook_task {
notebook_path = databricks_notebook.this.path
}
}
}