Merge pull request #171 from databricks-industry-solutions/release/0.3.2
Release/0.3.2
arunpamulapati authored Oct 23, 2024
2 parents fd12bdd + 6df9351 commit 61b9122
Showing 22 changed files with 98 additions and 218 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -147,4 +147,5 @@ dmypy.json
dabs/dabs_template/template/tmp

**/.terraform*
**/terraform.tfvars
**/terraform.tfstate*
2 changes: 1 addition & 1 deletion README.md
@@ -21,7 +21,7 @@ Security Analysis Tool (SAT) analyzes customer's Databricks account and workspac

Databricks has worked with thousands of customers to securely deploy the Databricks platform, with the appropriate security features that meet their architecture requirements. While many organizations deploy security differently, there are guidelines and features that are commonly used by organizations that need a high level of security. This tool checks for typical security features that are deployed by most high-security organizations, and reviews the largest risks and the risks that customers ask about most often. It will then provide a security configuration reference link to Databricks documentation along with a recommendation.

Note: SAT is a productivity tool to help verify security configurations against security best practices of Databricks, its not meant to be used as a certification or an attestation of your deployments. Please review the SAT report with your business stakeholders, administrators, security team and auditors about SAT report and assess your organizational security requirements before making any security improvments bases on the report, not all deviations required to be mitigated. Some of the recommendations may have cost implications, some of the security features recommneded may have dependecny feature limitations, please thorougly review individual feature doucmentation before making changes to your security configurations. SAT project is being regulary updated to improve correctness of checks, add new checks, fix bugs. Please send your feedback and comments to [email protected] or open a git issue.
SAT is a productivity tool to help verify security configurations against Databricks security best practices; it's not meant to be used as a certification or an attestation of your deployments. Please review the SAT report with your business stakeholders, administrators, security team, and auditors, and assess your organizational security requirements before making any security improvements based on the report; not all deviations need to be mitigated. Some recommendations may have cost implications, and some recommended security features may have dependent-feature limitations, so please thoroughly review the individual feature documentation before making changes to your security configurations. The SAT project is regularly updated to improve the correctness of checks, add new checks, and fix bugs. Please send your feedback and comments to [email protected] or open a git issue.

## Functionality
Security Analysis Tool (SAT) is an observability tool that aims to improve the security hardening of Databricks deployments by making customers aware of deviations from established security best practices and by helping them easily monitor the security health of their Databricks account workspaces. There is a need for a master checklist that prioritizes the checks by severity, and running SAT as a routine scan for all workspaces helps ensure continuous adherence to best practices. This also helps to build confidence to onboard sensitive datasets.
1 change: 0 additions & 1 deletion configs/sat_dasf_mapping.csv
@@ -14,7 +14,6 @@ sat_id,dasf_control_id,dasf_control_name
32,DASF-52:Source code control,
35,DASF-4:Restrict access using private link,
37,DASF-3:Restrict access using IP access lists,
52,DASF-52:Source code control,
53,"DASF-5:Control access to data and other objects, DASF-16:Secure model features, DASF-24:Control access to models and model assets, DASF:43-Use access control lists",
54,DASF-51:Share data and AI assets securely,
55,DASF-51:Share data and AI assets securely,
1 change: 0 additions & 1 deletion configs/security_best_practices.csv
@@ -50,7 +50,6 @@ id,check_id,category,check,evaluation_value,severity,recommendation,aws,azure,gc
49,DP-8,Data Protection,Enable storing interactive notebook results only in the customer account,-1,Medium,Enable store interactive notebook results in your account,1,1,1,1,0,Check workspace-conf for storeInteractiveNotebookResultsInCustomerAccount setting,curl -n -X GET 'https://<workspace_url>/api/2.0/preview/workspace-conf?keys=storeInteractiveNotebookResultsInCustomerAccount',https://docs.databricks.com/administration-guide/workspace/notebooks.html#manage-where-notebook-results-are-stored,https://learn.microsoft.com/en-us/azure/databricks/administration-guide/workspace/notebooks#manage-where-notebook-results-are-stored,https://docs.gcp.databricks.com/administration-guide/workspace/notebooks.html#manage-where-notebook-results-are-stored
50,GOV-15,Governance,"Enable verbose audit logs (on Azure, diagnostic logs)",-1,Medium,"Enable verbose audit logs (on Azure, diagnostic logs)",1,1,1,1,0,Check workspace-conf for enableVerboseAuditLogs setting,curl -n -X GET 'https://<workspace_url>/api/2.0/preview/workspace-conf?keys=enableVerboseAuditLogs',https://docs.databricks.com/en/admin/account-settings/verbose-logs.html,https://learn.microsoft.com/en-us/azure/databricks/admin/account-settings/verbose-logs,https://docs.gcp.databricks.com/en/admin/account-settings/verbose-logs.html
51,DP-9,Data Protection,FileStore endpoint for HTTPS file serving,-1,Medium,Review and disable FileStore endpoint in admin console workspace settings,1,1,1,1,0,Check workspace-conf for enableFileStoreEndpoint setting,curl -n -X GET 'https://<workspace_url>/api/2.0/preview/workspace-conf?keys=enableFileStoreEndpoint',https://docs.databricks.com/dbfs/filestore.html#filestore,https://learn.microsoft.com/en-us/azure/databricks/dbfs/filestore,https://docs.gcp.databricks.com/dbfs/filestore.html#filestore
52,INFO-15,Informational,Store code in Git for notebooks,-1,High,Enable git versioning for notebooks,1,1,1,0,0,Check workspace-conf for enableNotebookGitVersioning setting,curl -n -X GET 'https://<workspace_url>/api/2.0/preview/workspace-conf?keys=enableNotebookGitVersioning',https://docs.databricks.com/repos/index.html,https://learn.microsoft.com/en-us/azure/databricks/repos/index,https://docs.gcp.databricks.com/repos/index.html
53,GOV-16,Governance,Workspace Unity Catalog metastore assignment,-1,Medium,Enable a workspace for Unity Catalog by assigning a Unity Catalog metastore,1,1,1,1,0,Check if current-metastore-assignment has the workspace assigned to metastore_id,curl --netrc -X GET \ https://<workspace_url>/api/2.1/unity-catalog/current-metastore-assignment,https://docs.databricks.com/data-governance/unity-catalog/enable-workspaces.html,https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/enable-workspaces,https://docs.gcp.databricks.com/data-governance/unity-catalog/enable-workspaces.html
54,GOV-17,Governance,Limit the lifetime (expiration) of metastore Delta Sharing recipient token,-1,High,Set the lifetime of Delta Sharing recipient tokens,1,1,1,1,0,Check if delta_sharing_recipient_token_lifetime_in_seconds is set less than 90 days where delta_sharing_scope is INTERNAL_AND_EXTERNAL,curl --netrc -X GET \ https://<workspace_url>/api/2.1/unity-catalog/metastore_summary,https://docs.databricks.com/data-sharing/create-recipient.html#modify-the-recipient-token-lifetime,https://learn.microsoft.com/en-us/azure/databricks/data-sharing/create-recipient#modify-recipient-token-lifetime,https://docs.gcp.databricks.com/data-sharing/create-recipient.html#modify-the-recipient-token-lifetime
55,GOV-18,Governance,Delta Sharing IP access lists,-1,Medium,Configure Delta Sharing IP access lists to restrict recipient access to trusted IP addresses,1,1,1,1,0,"Check if ip_access_list is present on share recipients for authentication_type ""TOKEN""",curl --netrc -X GET \ https://<workspace_url>/api/2.1/unity-catalog/recipients,https://docs.databricks.com/data-sharing/access-list.html#use-ip-access-lists-to-restrict-delta-sharing-recipient-access-open-sharing,https://learn.microsoft.com/en-gb/azure/databricks/data-sharing/access-list,https://docs.gcp.databricks.com/data-sharing/create-recipient.html#security-considerations-for-tokens
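These rows all drive the same probe: read one workspace-conf key over REST and compare it to the expected value. As an illustration only — the workspace URL and token below are placeholders, not values from this repo — a Python sketch of such a probe might look like:

```
import requests

# Placeholders: substitute a real workspace URL and token before running.
WORKSPACE_URL = "https://<workspace_url>"
TOKEN = "<personal_access_token>"

def get_workspace_conf(key: str) -> dict:
    """Read a single workspace-conf setting, e.g. enableVerboseAuditLogs (GOV-15)."""
    resp = requests.get(
        f"{WORKSPACE_URL}/api/2.0/preview/workspace-conf",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"keys": key},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"enableVerboseAuditLogs": "true"}

print(get_workspace_conf("enableVerboseAuditLogs"))
```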
2 changes: 1 addition & 1 deletion docs/setup.md
@@ -4,7 +4,7 @@
> **Note**: SAT requires at least one SAT setup in a workspace per **account** in AWS or GCP, and at least one SAT setup in a workspace per Azure **subscription**.
> Please make sure to review the SAT report with your business stakeholders, administrators, security team and auditors about SAT report and assess your organizational security requirements before making any security improvments bases on the report, not all deviations required to be mitigated. Some of the recommendations may have cost implications, some of the security features recommneded may have dependecny feature limitations, please thorougly review individual feature doucmentation before making changes to your security configurations.
> Please make sure to review the SAT report with your business stakeholders, administrators, security team, and auditors, and assess your organizational security requirements before making any security improvements based on the report; not all deviations need to be mitigated. Some recommendations may have cost implications, and some recommended security features may have dependent-feature limitations, so please thoroughly review the individual feature documentation before making changes to your security configurations.
Follow this guide to setup the Security Analysis Tool (SAT) on your Databricks workspace.

2 changes: 1 addition & 1 deletion docs/setup/faqs_and_troubleshooting.md
@@ -118,7 +118,7 @@ We created diagnosis notebooks for respective clouds to help troubleshoot your S
%pip install msal --find-links /dbfs/FileStore/wheels/msal-1.22.0-py2.py3-none-any.whl
%pip install dbl-sat-sdk==0.1.34 --find-links /dbfs/FileStore/wheels/dbl_sat_sdk-0.1.34-py3-none-any.whl
%pip install dbl-sat-sdk==0.1.37 --find-links /dbfs/FileStore/wheels/dbl_sat_sdk-0.1.37-py3-none-any.whl
```
7. Make sure the versions for the above libraries match.
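A quick way to verify the installed version — a sketch assuming a Python 3.8+ notebook environment:

```
# Confirm the installed SDK matches the version pinned in the %pip install line.
from importlib.metadata import version

installed = version("dbl-sat-sdk")
assert installed == "0.1.37", f"expected dbl-sat-sdk 0.1.37, found {installed}"
print(f"dbl-sat-sdk {installed} OK")
```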
4 changes: 2 additions & 2 deletions notebooks/Includes/install_sat_sdk.py
@@ -22,11 +22,11 @@

# COMMAND ----------

SDK_VERSION='0.1.35'
SDK_VERSION='0.1.37'

# COMMAND ----------

#%pip install dbl-sat-sdk=={SDK_VERSION} --find-links /dbfs/FileStore/tables/dbl_sat_sdk-0.1.28-py3-none-any.whl
#%pip install dbl-sat-sdk=={SDK_VERSION} --find-links /dbfs/FileStore/tables/dbl_sat_sdk-0.1.37-py3-none-any.whl

# COMMAND ----------

24 changes: 0 additions & 24 deletions notebooks/Includes/workspace_settings.py
@@ -430,30 +430,6 @@ def enableFileStoreEndpoint(df):

# COMMAND ----------

id = '52' # Enable git versioning for notebooks
enabled, sbp_rec = getSecurityBestPracticeRecord(id, cloud_type)

def enableNotebookGitVersioning(df):
    value = 'false'
    defn = {'defn' : ''}
    for row in df.rdd.collect():
        value = row.value
        defn = {'defn' : row.defn.replace("'", '')}
    if(value == None or value == 'true'):
        return (id, 0, defn)
    else:
        return (id, 1, defn)

if enabled:
    tbl_name = 'global_temp.workspacesettings' + '_' + workspace_id
    sql = f'''
    SELECT * FROM {tbl_name}
    WHERE name="enableNotebookGitVersioning"
    '''
    sqlctrl(workspace_id, sql, enableNotebookGitVersioning)

# COMMAND ----------

id = '63' # Legacy Global Init Scripts
enabled, sbp_rec = getSecurityBestPracticeRecord(id, cloud_type)

32 changes: 2 additions & 30 deletions notebooks/diagnosis/sat_diagnosis_azure.py
@@ -58,8 +58,7 @@
    dbutils.secrets.get(scope=json_['master_name_scope'], key='tenant-id')
    dbutils.secrets.get(scope=json_['master_name_scope'], key='client-id')
    dbutils.secrets.get(scope=json_['master_name_scope'], key='client-secret')
    tokenkey = f"{json_['workspace_pat_token_prefix']}-{current_workspace}"
    dbutils.secrets.get(scope=json_['master_name_scope'], key=tokenkey)
    dbutils.secrets.get(scope=json_['master_name_scope'], key="analysis_schema_name")
    print("Your SAT configuration has required secret names")
except Exception as e:
    dbutils.notebook.exit(f'Your SAT configuration is missing required secret, please review setup instructions {e}')
@@ -79,33 +78,6 @@
print(" ".join(secretvalue))


# COMMAND ----------

# MAGIC %md
# MAGIC ### Check to see if the PAT token are valid

# COMMAND ----------

import requests

access_token = dbutils.secrets.get(scope=json_['master_name_scope'], key=tokenkey)

# Define the URL and headers
workspaceUrl = spark.conf.get('spark.databricks.workspaceUrl')


url = f'https://{workspaceUrl}/api/2.0/clusters/spark-versions'
headers = {
    'Authorization': f'Bearer {access_token}'
}

# Make the GET request
response = requests.get(url, headers=headers)

# Print the response
print(response.json())


# COMMAND ----------

# MAGIC %md
@@ -194,7 +166,7 @@

# Define the URL and headers
DATABRICKS_ACCOUNT_ID = dbutils.secrets.get(scope=sat_scope, key="account-console-id")
url = f'https://accounts.azuredatabricks.net/api/2.0/accounts/{DATABRICKS_ACCOUNT_ID}'
url = f'https://accounts.azuredatabricks.net/api/2.0/accounts/{DATABRICKS_ACCOUNT_ID}/workspaces'

## Note: The access token should be generated for a SP which is an account admin to run this command.

53 changes: 11 additions & 42 deletions notebooks/diagnosis/sat_diagnosis_gcp.py
@@ -39,14 +39,6 @@



# COMMAND ----------

import json
#Get current workspace id
context = json.loads(dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson())
current_workspace = context['tags']['orgId']
print(current_workspace)

# COMMAND ----------

# MAGIC %md
@@ -59,27 +51,19 @@
    dbutils.secrets.get(scope=json_['master_name_scope'], key='sql-warehouse-id')
    dbutils.secrets.get(scope=json_['master_name_scope'], key='gs-path-to-json')
    dbutils.secrets.get(scope=json_['master_name_scope'], key='impersonate-service-account')
    tokenkey = f"{json_['workspace_pat_token_prefix']}-{current_workspace}"
    dbutils.secrets.get(scope=json_['master_name_scope'], key=tokenkey)
    dbutils.secrets.get(scope=json_['master_name_scope'], key="analysis_schema_name")
    print("Your SAT configuration has required secret names")
except Exception as e:
    dbutils.notebook.exit(f'Your SAT configuration is missing required secret, please review setup instructions {e}')

# COMMAND ----------

# MAGIC %md
# MAGIC ### Validate the following Values and make sure they are correct

# COMMAND ----------

sat_scope = json_['master_name_scope']
import requests,json

for key in dbutils.secrets.list(sat_scope):
    if key.key == tokenkey or not key.key.startswith("sat-token-"):
        print(key.key)
        secretvalue = dbutils.secrets.get(scope=sat_scope, key=key.key)
        print(" ".join(secretvalue))

# Define the URL and headers
workspaceUrl = json.loads(dbutils.notebook.entry_point.getDbutils().notebook() \
.getContext().toJson())['tags']['browserHostName']

# COMMAND ----------

@@ -90,25 +74,11 @@

import requests,json

access_token = dbutils.secrets.get(scope=json_['master_name_scope'], key=tokenkey)

# Define the URL and headers
workspaceUrl = json.loads(dbutils.notebook.entry_point.getDbutils().notebook() \
.getContext().toJson())['tags']['browserHostName']


url = f'https://{workspaceUrl}/api/2.0/clusters/spark-versions'
headers = {
    'Authorization': f'Bearer {access_token}'
}

# Make the GET request
response = requests.get(url, headers=headers)

# Print the response
print(response.json())


# COMMAND ----------

gcp_accounts_url = 'https://accounts.gcp.databricks.com'
@@ -197,7 +167,7 @@ def getGCSAccessToken(cred_file_path,target_principal):
import requests

# Define the URL and headers
DATABRICKS_ACCOUNT_ID = dbutils.secrets.get(scope=sat_scope, key="account-console-id")
DATABRICKS_ACCOUNT_ID = dbutils.secrets.get(scope=json_['master_name_scope'], key="account-console-id")
url = f'https://accounts.gcp.databricks.com/api/2.0/accounts/{DATABRICKS_ACCOUNT_ID}/workspaces'

## Note: The access token should be generated for a SP which is an account admin to run this command.
@@ -228,7 +198,7 @@ def getGCSAccessToken(cred_file_path,target_principal):
# COMMAND ----------

# MAGIC %sh
# MAGIC curl -X GET --header 'Authorization: Bearer <identity_token>' --header 'X-Databricks-GCP-SA-Access-Token: <access_token>' https://accounts.gcp.databricks.com/api/2.0/accounts/<accounts_console_id>/workspaces
# MAGIC #curl -X GET --header 'Authorization: Bearer <identity_token>' --header 'X-Databricks-GCP-SA-Access-Token: <access_token>' https://accounts.gcp.databricks.com/api/2.0/accounts/<accounts_console_id>/workspaces
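A Python equivalent of the commented-out curl above, kept as a sketch — all three values are placeholders, and the dual-header scheme (identity token plus service-account access token) follows the curl:

```
import requests

identity_token = "<identity_token>"    # placeholder
sa_access_token = "<access_token>"     # placeholder
account_id = "<accounts_console_id>"   # placeholder

response = requests.get(
    f"https://accounts.gcp.databricks.com/api/2.0/accounts/{account_id}/workspaces",
    headers={
        "Authorization": f"Bearer {identity_token}",
        "X-Databricks-GCP-SA-Access-Token": sa_access_token,
    },
    timeout=30,
)
print(response.status_code, response.json())
```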

# COMMAND ----------

@@ -274,7 +244,6 @@

import requests,json

access_token = dbutils.secrets.get(scope=json_['master_name_scope'], key=tokenkey)

# Define the URL and headers
workspaceUrl = json.loads(dbutils.notebook.entry_point.getDbutils().notebook() \
@@ -305,9 +274,9 @@

# COMMAND ----------

# MAGIC %sh
# MAGIC
# MAGIC curl -X GET --header 'Authorization: Bearer access_token' 'https://<workspace_id>.gcp.databricks.com/api/2.0/clusters/list'
#%sh

#curl -X GET --header 'Authorization: Bearer access_token' 'https://<workspace_id>.gcp.databricks.com/api/2.0/clusters/list'

# COMMAND ----------

@@ -348,7 +317,7 @@ def openssl_connect(host, port):

# COMMAND ----------

openssl_connect('accounts.azuredatabricks.net', 443)
openssl_connect('accounts.gcp.databricks.com', 443)
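`openssl_connect` is defined earlier in this notebook and is not shown in full in this diff; a minimal sketch of such a TLS connectivity probe, assuming the `openssl` CLI is available on the driver, could look like:

```
import subprocess

def openssl_connect(host: str, port: int) -> None:
    """Probe TLS connectivity to host:port via `openssl s_client` and print the output."""
    result = subprocess.run(
        ["openssl", "s_client", "-connect", f"{host}:{port}"],
        input="Q\n",  # close the interactive session right after the handshake
        capture_output=True,
        text=True,
        timeout=30,
    )
    print(result.stdout or result.stderr)

openssl_connect("accounts.gcp.databricks.com", 443)
```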

# COMMAND ----------

12 changes: 9 additions & 3 deletions src/securityanalysistoolproject/clientpkgs/ws_settings_client.py
@@ -1,10 +1,16 @@
'''Workspace settings module'''
from core.dbclient import SatDBClient
import json
from core.logging_utils import LoggingUtils


LOGGR=None

if LOGGR is None:
    LOGGR = LoggingUtils.get_logger()

class WSSettingsClient(SatDBClient):
    '''workspace setting helper'''

    def get_wssettings_list(self):
        """
        Returns an array of json objects for workspace settings.
@@ -50,7 +56,6 @@ def get_wssettings_list(self):
{"name": "enableLibraryAndInitScriptOnSharedCluster", "defn":"Enable libraries and init scripts on shared Unity Catalog clusters"}
]
# pylint: enable=line-too-long

for keyn in ws_keymap:
    valn={}
    try:
@@ -62,7 +67,8 @@ def get_wssettings_list(self):
    valins = {}
    valins['name']=keyn['name']
    valins['defn']=keyn['defn']
    valins['value']=None if valn[keyn['name']] is None else valn[keyn['name']]
    #fixed feature/SFE-3483
    valins['value']=None if keyn['name'] not in valn or valn[keyn['name']] is None else valn[keyn['name']]
    all_result.append(valins)
return all_result
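The fixed expression guards against keys that are absent from the API response as well as keys explicitly set to `None`. A small demonstration with hypothetical data — not the shipped code — showing that `dict.get` expresses the same logic more compactly:

```
valn = {"enableVerboseAuditLogs": "true"}       # hypothetical API response
keyn = {"name": "enableNotebookGitVersioning"}  # key absent from valn

# Fixed expression from this commit:
value = None if keyn['name'] not in valn or valn[keyn['name']] is None else valn[keyn['name']]

# Equivalent, more idiomatic form:
value_idiomatic = valn.get(keyn['name'])

assert value == value_idiomatic  # both None when the key is missing
```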

