Merge pull request #115 from databricks-industry-solutions/new_sat_setup
New SAT Setup
arunpamulapati authored Jun 4, 2024
2 parents 465fd5c + ff78bf6 commit 72a66a1
Showing 47 changed files with 7,724 additions and 1,116 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -142,3 +142,9 @@ dmypy.json

# Pyre type checker
.pyre/

# DABs Generated Template
dabs/dabs_template/template/tmp

**/.terraform*
**/terraform.tfstate*
7 changes: 5 additions & 2 deletions README.md
@@ -7,7 +7,10 @@
# Security Analysis Tool (SAT)
<img src="./images/sat_icon.jpg" width="10%" height="10%">

Refer to [manual setup guide](./docs/setup.md) or [Terraform](./terraform) to setup and optional [video](https://www.youtube.com/watch?v=xAav6GslSd8) overview with follow along instruction.
Refer to the guide for your specific use case:
- [Standard setup guide](./docs/setup.md)
- [Terraform](./terraform/README.md)
- [Deprecated: Manual setup](./docs/deprecated_old_setup.md)

## Introduction

@@ -85,7 +88,7 @@ For example, The diagram below shows the individual checks in various categories


## Configuration and Usage instructions
Refer to [manul setup guide](./docs/setup.md) or [Terraform](./terraform) to setup
Refer to the [Standard setup guide](./docs/setup.md) or [Terraform](./terraform) to set up

## Project support

36 changes: 36 additions & 0 deletions configs/sat_dasf_mapping.csv
@@ -0,0 +1,36 @@
sat_id,dasf_control_id,dasf_control_name
1,DASF 33,Manage credentials securely
2,DASF 8,Encrypt data at rest
2,DASF 30,Encrypt models
3,DASF 8,Encrypt data at rest
3,DASF 30,Encrypt models
4,DASF 8,Encrypt data at rest
8,DASF 14,Audit actions performed on datasets
8,DASF 55,Monitor Audit logs
9,DASF 38,Platform security — vulnerability management
10,DASF 38,Platform security — vulnerability management
18,DASF 1,SSO with IdP and MFA
19,DASF 2,Sync users and groups
29,DASF 43,Use access control lists
30,DASF 43,Use access control lists
31,DASF 43,Use access control lists
32,DASF 52,Source code control
35,DASF 4,Restrict access using private link
37,DASF 3,Restrict access using IP access lists
52,DASF 52,Source code control
53,DASF 5,Control access to data and other objects
53,DASF 16,Secure model features
53,DASF 24,Control access to models and model assets
53,DASF 43,Use access control lists
54,DASF 51,Share data and AI assets securely
55,DASF 51,Share data and AI assets securely
56,DASF 51,Share data and AI assets securely
89,DASF 31,Secure model serving endpoints
90,DASF 32,Streamline the usage and management of various large language model (LLM) providers
101,DASF 46,Store and retrieve embeddings securely
103,DASF 50,Platform compliance
104,DASF 53,Third-party library control
105,DASF 55,Monitor Audit logs
107,DASF 38,Platform security — vulnerability management
108,DASF 50,Platform compliance
109,DASF 50,Platform compliance
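
The mapping is many-to-many: one SAT check can correspond to several DASF controls (see sat_id 53 above). Below is a minimal sketch of consuming it, assuming pandas and the repo-relative path (neither is mandated by this commit):

```python
# Hedged sketch: load the SAT-to-DASF mapping and group the DASF
# controls behind each SAT check id. pandas and the path are assumptions.
import pandas as pd

mapping = pd.read_csv("configs/sat_dasf_mapping.csv")

# One SAT check id can map to several DASF controls (e.g. sat_id 53).
by_sat_id = mapping.groupby("sat_id")["dasf_control_id"].apply(list).to_dict()

print(by_sat_id[53])  # ['DASF 5', 'DASF 16', 'DASF 24', 'DASF 43']
```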
2 changes: 1 addition & 1 deletion configs/security_best_practices.csv
@@ -66,7 +66,7 @@ id,check_id,category,check,evaluation_value,severity,recommendation,aws,azure,gc
65,GOV-26,Governance,Legacy cluster-named init scripts,-1,High,Databricks recommends that you migrate legacy cluster-named init scripts to the new cluster-scoped init scripts framework and then disable legacy cluster named init scripts,1,1,0,1,0,Check workspace-conf for enableDeprecatedClusterNamedInitScripts setting,curl -n -X GET 'https://<workspace_url>/api/2.0/preview/workspace-conf?keys=enableDeprecatedClusterNamedInitScripts',https://docs.databricks.com/clusters/init-scripts.html#cluster-scoped-init-scripts,https://learn.microsoft.com/en-us/azure/databricks/clusters/init-scripts,N/A
78,GOV-28,Governance,Govern model assets,-1,Medium,Manage model lifecycle in Unity Catalog,1,1,1,1,0, List the registered models and check if there are any models in UC,curl -n -X GET 'https://<workspace_url>/api/2.1/unity-catalog/models',https://docs.databricks.com/en/machine-learning/manage-model-lifecycle/index.html,https://learn.microsoft.com/en-us/azure/databricks/machine-learning/manage-model-lifecycle/,https://docs.gcp.databricks.com/en/machine-learning/manage-model-lifecycle/index.html
89,NS-7,Network Security,Secure model serving endpoints,-1,High,Secure endpoints of the underlying models served as REST API endpoints using Private connectivity or IP Access Lists,1,1,0,1,0,Check if there are any model serving endpoints configured then check if IP Access lists or PL is deployed to prevent the endpoints from being accessed from the public internet,curl -n -X GET 'https://<workspace_url>/api/2.0/serving-endpoints',https://docs.databricks.com/en/machine-learning/model-serving/manage-serving-endpoints.html,https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/manage-serving-endpoints,N/A
90,INFO-29,Informational,Streamline the usage and management of various large language model(LLM) providers,-1, Medium,Secure third-party models hosted outside of Databricks with Model Serving to streamline the usage and management of various large language model (LLM) providers such as OpenAI and Anthropic within an organization by storing API keys in one secure location,1,1,0,1,0,Check if there is not at least one EXTERNAL_MODEL in model serving endpoints configured and alert incase of none,curl -n -X GET 'https://<workspace_url>/api/2.0/serving-endpoints',https://docs.databricks.com/en/generative-ai/external-models/index.html,https://learn.microsoft.com/en-us/azure/databricks/generative-ai/external-models,N/A
90,INFO-29,Informational,Streamline the usage and management of various large language model(LLM) providers,-1,Medium,Secure third-party models hosted outside of Databricks with Model Serving to streamline the usage and management of various large language model (LLM) providers such as OpenAI and Anthropic within an organization by storing API keys in one secure location,1,1,0,1,0,Check if there is not at least one EXTERNAL_MODEL in model serving endpoints configured and alert incase of none,curl -n -X GET 'https://<workspace_url>/api/2.0/serving-endpoints',https://docs.databricks.com/en/generative-ai/external-models/index.html,https://learn.microsoft.com/en-us/azure/databricks/generative-ai/external-models,N/A
101,DP-14,Data Protection,Store and retrieve embeddings securely,-1,Low,Store and retrieve embeddings securely using the Vector Search,1,1,0,1,0,List all the Vector Search endpoints and see if at least one endpoint is configured,curl -n -X GET 'https://<workspace_url>/api/2.0/vector-search/endpoints',https://docs.databricks.com/en/generative-ai/vector-search.html,https://learn.microsoft.com/en-us/azure/databricks/generative-ai/vector-search,N/A
103,INFO-37,Informational,Compliance security profile for new workspaces,-1,Low,Validate and deploy on a platform that has put in place controls to meet the unique compliance needs of highly regulated industries,1,0,0,1,0,Check if compliance security profile for new workspaces is enabled,curl -n -X GET 'https://accounts.cloud.databricks.com/api/2.0/accounts/<account_id>/settings/types/shield_csp_enablement_ac/names/default',https://docs.databricks.com/en/security/privacy/security-profile.html,https://learn.microsoft.com/en-us/azure/databricks/security/privacy/security-profile,N/A
104,INFO-38,Informational,Third-party library control,-1,Low,Add libraries and init scripts to the allowlist in Unity Catalog,1,1,1,1,0,Get the artifact allowlist and check if any allowed artifacts are configured,curl -n -X GET 'https://<workspace_url>/api/2.1/unity-catalog/artifact-allowlists/{artifact_type}',https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html,https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/manage-privileges/allowlist,https://docs.gcp.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html
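
To make the row format concrete, here is a hedged sketch of the INFO-29 logic above: list the serving endpoints and alert when no EXTERNAL_MODEL endpoint exists. The `requests` usage, the env-var names, and the response-shape assumptions are illustrative, not SAT's actual implementation:

```python
# Illustrative only: env vars, the response shape, and the external_model
# marker are assumptions about the serving-endpoints API, not SAT code.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace_url>
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.get(
    f"{host}/api/2.0/serving-endpoints",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()
endpoints = resp.json().get("endpoints", [])

has_external = any(
    "external_model" in (entity or {})
    for ep in endpoints
    for entity in ep.get("config", {}).get("served_entities", [])
)
if not has_external:
    print("INFO-29: no EXTERNAL_MODEL serving endpoint configured")
```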
6 changes: 3 additions & 3 deletions configs/self_assessment_checks.yaml
@@ -3,15 +3,15 @@
  check: Object storage encryption for data sources

- id: 18
  enabled: true
  enabled: false
  check: Enable single sign-on

- id: 19
  enabled: true
  enabled: false
  check: SCIM for user provisioning

- id: 20
  enabled: true
  enabled: false
  check: Table Access Control for clusters that don't use Unity Catalog

- id: 28
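
For reference, a minimal sketch of how these toggles could be consumed, assuming PyYAML and that the file is a top-level list of `{id, enabled, check}` entries as the hunk suggests:

```python
# Hedged sketch: collect the self-assessment checks left enabled.
# PyYAML, the path, and the top-level-list shape are assumptions.
import yaml

with open("configs/self_assessment_checks.yaml") as fh:
    checks = yaml.safe_load(fh)

enabled = {c["id"]: c["check"] for c in checks if c.get("enabled")}
print(enabled)  # after this commit, ids 18, 19, and 20 drop out
```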
26 changes: 26 additions & 0 deletions dabs/dabs_template/databricks_template_schema.json
@@ -0,0 +1,26 @@
{
  "welcome_message": "",
  "properties": {
    "catalog": {
      "type": "string",
      "description": "The catalog for SAT"
    },
    "cloud": {
      "type": "string",
      "description": "Cloud type"
    },
    "google_service_account": {
      "type": "string",
      "description": "Google service account"
    },
    "latest_lts": {
      "type": "string",
      "description": "Latest LTS version"
    },
    "node_type": {
      "type": "string",
      "description": "Node Type"
    }
  },
  "success_message": ""
}
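
These five properties line up with the `tmp_config.json` that `dabs/main.py` (below) generates and feeds to the template. A hand-written equivalent might look like the following; the values are placeholders, and passing such a file via the CLI's `--config-file` flag is the usual DABs template flow:

```json
{
  "catalog": "hive_metastore",
  "cloud": "azure",
  "google_service_account": "",
  "latest_lts": "14.3.x-scala2.12",
  "node_type": "Standard_DS3_v2"
}
```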
139 changes: 139 additions & 0 deletions dabs/dabs_template/initialize.py.tmpl
@@ -0,0 +1,139 @@
# Databricks notebook source
# MAGIC %md
# MAGIC **Notebook name:** initialize
# MAGIC **Functionality:** initializes the necessary configuration values for the rest of the process into a JSON

# COMMAND ----------

# MAGIC %run ./common

# COMMAND ----------

# replace values for accounts exec
hostname = (
    dbutils.notebook.entry_point.getDbutils()
    .notebook()
    .getContext()
    .apiUrl()
    .getOrElse(None)
)
cloud_type = getCloudType(hostname)

# COMMAND ----------

# MAGIC %md
# MAGIC ##### Modify JSON values
# MAGIC * **account_id** Account ID. Can get this from the accounts console
# MAGIC * **sql_warehouse_id** SQL Warehouse ID to import dashboard
# MAGIC * **verbosity** (optional): debug, info, warning, error, critical
# MAGIC * **master_name_scope** Secret Scope for Account Name
# MAGIC * **master_name_key** Secret Key for Account Name
# MAGIC * **master_pwd_scope** Secret Scope for Account Password
# MAGIC * **master_pwd_key** Secret Key for Account Password
# MAGIC * **workspace_pat_scope** Secret Scope for Workspace PAT
# MAGIC * **workspace_pat_token_prefix** Secret Key prefix for Workspace PAT. Workspace ID will automatically be appended to this per workspace
# MAGIC * **use_mastercreds** (optional) Use master account credentials for all workspaces

# COMMAND ----------

import json

json_ = {
    "account_id": dbutils.secrets.get(scope="sat_scope", key="account-console-id"),
    "sql_warehouse_id": dbutils.secrets.get(scope="sat_scope", key="sql-warehouse-id"),
    "analysis_schema_name": "{{.catalog}}.security_analysis",
    "verbosity": "info",
}

# COMMAND ----------

json_.update(
    {
        "master_name_scope": "sat_scope",
        "master_name_key": "user",
        "master_pwd_scope": "sat_scope",
        "master_pwd_key": "pass",
        "workspace_pat_scope": "sat_scope",
        "workspace_pat_token_prefix": "sat-token",
        "dashboard_id": "317f4809-8d9d-4956-a79a-6eee51412217",
        "dashboard_folder": f"{basePath()}/dashboards/",
        "dashboard_tag": "SAT",
        "use_mastercreds": True,
        "use_parallel_runs": True,
    }
)


# COMMAND ----------

# DBTITLE 1,GCP configurations
if cloud_type == "gcp":
json_.update(
{
"service_account_key_file_path": dbutils.secrets.get(
scope="sat_scope", key="gs-path-to-json"
),
"impersonate_service_account": dbutils.secrets.get(
scope="sat_scope", key="impersonate-service-account"
),
"use_mastercreds": False,
}
)


# COMMAND ----------

# DBTITLE 1,Azure configurations
if cloud_type == "azure":
json_.update(
{
"account_id": "azure",
"subscription_id": dbutils.secrets.get(
scope="sat_scope", key="subscription-id"
), # Azure subscriptionId
"tenant_id": dbutils.secrets.get(
scope="sat_scope", key="tenant-id"
), # The Directory (tenant) ID for the application registered in Azure AD.
"client_id": dbutils.secrets.get(
scope="sat_scope", key="client-id"
), # The Application (client) ID for the application registered in Azure AD.
"client_secret_key": "client-secret", # The secret generated by AAD during your confidential app registration
"use_mastercreds": True,
}
)


# COMMAND ----------

# DBTITLE 1,AWS configurations
if cloud_type == "aws":
sp_auth = {
"use_sp_auth": "False",
"client_id": "",
"client_secret_key": "client-secret",
}
try:
use_sp_auth = (
dbutils.secrets.get(scope="sat_scope", key="use-sp-auth").lower() == "true"
)
if use_sp_auth:
sp_auth["use_sp_auth"] = "True"
sp_auth["client_id"] = dbutils.secrets.get(
scope="sat_scope", key="client-id"
)
except:
pass
json_.update(sp_auth)

# COMMAND ----------

create_schema()
create_security_checks_table()
create_account_info_table()
create_account_workspaces_table()
create_workspace_run_complete_table()

# COMMAND ----------

# Initialize best practices if not already loaded into database
readBestPracticesConfigsFile()
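
The notebook assumes every secret it reads already exists in `sat_scope`. A hedged sketch of pre-creating them with the databricks-sdk (the key names come from the template above; the profile and placeholder values are assumptions):

```python
# Hedged sketch: provision the sat_scope secrets the notebook reads.
# Key names mirror the template above; values and profile are placeholders.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(profile="DEFAULT")

scope = "sat_scope"
if scope not in [s.name for s in w.secrets.list_scopes()]:
    w.secrets.create_scope(scope)

for key, value in {
    "account-console-id": "<databricks-account-id>",
    "sql-warehouse-id": "<sql-warehouse-id>",
    "user": "<account-admin-user>",
    "pass": "<account-admin-password>",
}.items():
    w.secrets.put_secret(scope=scope, key=key, string_value=value)
```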
19 changes: 19 additions & 0 deletions dabs/dabs_template/template/tmp/databricks.yml.tmpl
@@ -0,0 +1,19 @@
bundle:
  name: SAT

include:
  - resources/*.yml

targets:
  sat:
    default: true
    mode: production
    workspace:
      host: {{workspace_host}}
      root_path: /Applications/${bundle.name}/
    run_as:
      {{- if is_service_principal}}
      service_principal_name: {{user_name}}
      {{- else}}
      user_name: {{user_name}}
      {{- end}}
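
`workspace_host`, `user_name`, and `is_service_principal` are helpers resolved when the DABs template is initialized. For a human user, the rendered file would look roughly like this (host and e-mail are placeholders):

```yaml
bundle:
  name: SAT

include:
  - resources/*.yml

targets:
  sat:
    default: true
    mode: production
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net
      root_path: /Applications/${bundle.name}/
    run_as:
      user_name: someone@example.com
```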
27 changes: 27 additions & 0 deletions dabs/dabs_template/template/tmp/resources/sat_driver_job.yml.tmpl
@@ -0,0 +1,27 @@
resources:
  jobs:
    sat_driver:
      name: "SAT Driver Notebook"
      schedule:
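        # Quartz cron below (sec min hour day-of-month month day-of-week):
        # run at 08:00 every Mon/Wed/Fri, America/New_York time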
        quartz_cron_expression: "0 0 8 ? * Mon,Wed,Fri"
        timezone_id: "America/New_York"
      tasks:
        - task_key: "sat_initializer"
          job_cluster_key: job_cluster
          libraries:
            - pypi:
                package: dbl-sat-sdk
          notebook_task:
            notebook_path: "../notebooks/security_analysis_driver.py"

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            num_workers: 5
            spark_version: {{.latest_lts}}
            runtime_engine: "PHOTON"
            node_type_id: {{.node_type}}
            {{- if eq .cloud "gcp" }}
            gcp_attributes:
              google_service_account: {{.google_service_account}}
            {{- end }}
@@ -0,0 +1,25 @@
resources:
  jobs:
    sat_initializer:
      name: "SAT Initializer Notebook (one-time)"

      tasks:
        - task_key: "sat_initializer"
          job_cluster_key: job_cluster
          libraries:
            - pypi:
                package: dbl-sat-sdk
          notebook_task:
            notebook_path: "../notebooks/security_analysis_initializer.py"

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            num_workers: 5
            spark_version: {{.latest_lts}}
            runtime_engine: "PHOTON"
            node_type_id: {{.node_type}}
            {{- if eq .cloud "gcp" }}
            gcp_attributes:
              google_service_account: {{.google_service_account}}
            {{- end }}
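
`setup.sh` itself is not part of this diff, but with a rendered bundle the standard DABs workflow for these two jobs would presumably be (the target name `sat` comes from databricks.yml.tmpl above; the exact flow is an assumption):

```sh
databricks bundle deploy -t sat -p <profile>
databricks bundle run sat_initializer -t sat -p <profile>   # one-time setup
```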
52 changes: 52 additions & 0 deletions dabs/main.py
@@ -0,0 +1,52 @@
import json
import os
import subprocess

from databricks.sdk import WorkspaceClient
from sat.config import form, generate_secrets
from sat.utils import cloud_type


def install(client: WorkspaceClient, answers: dict, profile: str):
    cloud = cloud_type(client)
    generate_secrets(client, answers, cloud)
    config = {
        "catalog": answers.get("catalog", None),
        "cloud": cloud,
        "google_service_account": answers.get("gcp-impersonate-service-account", None),
        "latest_lts": client.clusters.select_spark_version(
            long_term_support=True,
            latest=True,
        ),
        "node_type": client.clusters.select_node_type(
            local_disk=True,
            min_cores=4,
            gb_per_core=8,
            photon_driver_capable=True,
            photon_worker_capable=True,
        ),
    }

    config_file = "tmp_config.json"
    with open(config_file, "w") as fp:
        json.dump(config, fp)

    os.system("clear")
    subprocess.call(f"sh ./setup.sh tmp {profile} {config_file}".split(" "))
    print("Installation complete.")
    print(f"Review workspace -> {client.config.host}")


def setup():
    try:
        client, answers, profile = form()
        install(client, answers, profile)
    except KeyboardInterrupt:
        print("Installation aborted.")
    except Exception as e:
        print(f"An error occurred: {e}")


if __name__ == "__main__":
    os.system("clear")
    setup()
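
The two `clusters` helpers used by `install()` come from the databricks-sdk and resolve workspace-specific values at install time. A hedged standalone illustration (the profile name is a placeholder):

```python
# Hedged illustration of the databricks-sdk helpers used by install():
# they pick a concrete LTS Spark version and a matching node type from
# what the target workspace offers. Values differ per cloud/workspace.
from databricks.sdk import WorkspaceClient

client = WorkspaceClient(profile="DEFAULT")

latest_lts = client.clusters.select_spark_version(
    long_term_support=True, latest=True
)
node_type = client.clusters.select_node_type(
    local_disk=True,
    min_cores=4,
    gb_per_core=8,
    photon_driver_capable=True,
    photon_worker_capable=True,
)
print(latest_lts, node_type)  # e.g. "14.3.x-scala2.12", "Standard_DS3_v2"
```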