This repository has been archived by the owner on Aug 2, 2024. It is now read-only.

Create a silo with compute behind a vnet accessing your existing storage account

This tutorial applies when you want to create a compute to process data from an existing storage account located in the same tenant (possibly in a different subscription or resource group).

Typically, this happens when using internal silos corresponding to various regions, with storage accounts that are not managed by a single entity but scattered across multiple subscriptions within a single company.

Prerequisites

To run these deployment options, you first need:

  • an existing Azure ML workspace (see cookbook)
  • an existing private DNS zone for storage, named privatelink.blob.core.windows.net (see below)
  • permissions to create resources, set permissions, and create identities in this subscription (or at least in one resource group)
    • Note that to set permissions, you typically need the Owner role in the subscription or resource group; the Contributor role is not enough. This is key to being able to secure the setup.
  • Optional: install the Azure CLI.

To create a private DNS zone
If you don't already have one, you will need to manually create a private DNS zone for the storage account and compute of this pair.
To do that, go to the Azure portal, and in the resource group of your AzureML workspace, create a new private DNS zone named privatelink.blob.core.windows.net.
You only need one unique zone for all the pairs you create (both orchestrator and silos). All private DNS entries will be written in that single zone.
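If you prefer the command line to the portal, the zone creation described above can be sketched with the Azure CLI. This is a minimal sketch, assuming the CLI is installed and you are logged in; the function name `ensure_storage_dns_zone` is hypothetical and only there for illustration.

```shell
# Sketch: create the shared private DNS zone for blob storage unless it exists.
# Assumes the Azure CLI is installed and `az login` has been run.

ZONE_NAME="privatelink.blob.core.windows.net"

# ensure_storage_dns_zone <resource-group>   (hypothetical helper name)
ensure_storage_dns_zone() {
  resource_group="$1"
  # `show` exits with a non-zero status when the zone cannot be found
  if az network private-dns zone show \
       --resource-group "$resource_group" \
       --name "$ZONE_NAME" >/dev/null 2>&1; then
    echo "zone $ZONE_NAME already exists in $resource_group"
  else
    az network private-dns zone create \
      --resource-group "$resource_group" \
      --name "$ZONE_NAME"
  fi
}
```

Call it once with the resource group of your AzureML workspace, for example `ensure_storage_dns_zone <resource group name>`. Checking first keeps the call safe to repeat when you add more pairs later.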

Important: understand the design

The design used in this tutorial is identical to that of a silo provisioned with a new storage account. The only difference is that we do not provision the storage; instead, we rely on your previously configured storage account.

It is important in this case that you set the following on your existing storage account:

  • make sure the storage account is in the same tenant as the AzureML workspace
  • in the networking settings of the storage, it is recommended to set Public network access to "Disabled" (access will be allowed only via a private endpoint)
  • create a container in this storage account for your FL data
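The three settings above can also be checked and applied from the Azure CLI. The following is a sketch under assumptions: the helper names `same_tenant` and `prepare_existing_storage` are hypothetical, and `--auth-mode login` assumes your own account holds a data-plane role (such as Storage Blob Data Contributor) on the storage account.

```shell
# Sketch: prepare an existing storage account for use by a silo.
# Assumes the Azure CLI is installed and `az login` has been run.

# same_tenant <subscription-a> <subscription-b>   (hypothetical helper)
# Succeeds only when both subscriptions belong to the same tenant.
same_tenant() {
  t1=$(az account show --subscription "$1" --query tenantId --output tsv) || return 1
  t2=$(az account show --subscription "$2" --query tenantId --output tsv) || return 1
  [ "$t1" = "$t2" ]
}

# prepare_existing_storage <subscription-id> <resource-group> <account> <container>
prepare_existing_storage() {
  subscription="$1"; resource_group="$2"; account="$3"; container="$4"

  # Create the container first: once public access is disabled, this call
  # only succeeds from a network that can reach the account privately.
  az storage container create \
    --subscription "$subscription" \
    --account-name "$account" \
    --name "$container" \
    --auth-mode login || return 1

  # Then disable public network access (private endpoint access only).
  az storage account update \
    --subscription "$subscription" \
    --resource-group "$resource_group" \
    --name "$account" \
    --public-network-access Disabled
}
```

The order matters in this sketch: disabling public access before creating the container would lock out a CLI session running outside the vnet.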

Create a compute pair for the silo, attach storage as datastore

Important: make sure the subnet address space does not overlap with any other subnet in your vnet, and in particular that it is unique across all your silos and the orchestrator. For instance, you can use 10.0.0.0/24 for the orchestrator, then 10.0.N.0/24 for each silo, with a distinct N value.
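One way to keep the prefixes distinct is to derive each one from the silo's index. The helper below is a small sketch of the 10.0.N.0/24 numbering scheme suggested above; the function name `silo_subnet_prefix` is hypothetical, and it reserves N=0 for the orchestrator as in the example.

```shell
# silo_subnet_prefix <N>   (hypothetical helper)
# Prints 10.0.N.0/24 for a silo index N between 1 and 255.
# N=0 is rejected because 10.0.0.0/24 is used by the orchestrator in this scheme.
silo_subnet_prefix() {
  n="$1"
  # Reject empty or non-numeric input
  case "$n" in
    ''|*[!0-9]*) echo "N must be an integer between 1 and 255" >&2; return 1 ;;
  esac
  if [ "$n" -lt 1 ] || [ "$n" -gt 255 ]; then
    echo "N must be an integer between 1 and 255" >&2
    return 1
  fi
  echo "10.0.$n.0/24"
}
```

For example, `silo_subnet_prefix 1` prints `10.0.1.0/24`, and the result can be passed as the subnet prefix when deploying that silo.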

Using one click deployment

  1. Click on Deploy to Azure

  2. Adjust parameters, in particular:

    • Region: this will be set by Azure to the region of your resource group.
    • Machine Learning Name: must match the name of the AzureML workspace in the resource group.
    • Machine Learning Region: the region in which the AzureML workspace was deployed (default: same as resource group).
    • Pair Region: the region where the compute and storage will be deployed (default: same as resource group). Make sure this matches the location of your storage account.
    • Pair Base Name: a unique name for the silo, for example silo1-westus. This will be used to name all other resources (storage name, compute name, etc.).
    • Existing Storage Account Name: name of the storage account to attach to this silo.
    • Existing Storage Account Resource Group: name of the resource group in which the storage is provisioned.
    • Existing Storage Account Subscription Id: id of the subscription in which the storage is provisioned.
    • Existing Storage Container Name: name of container where the data will be located.

Using az cli

In the resource group of your AzureML workspace, use the following command with parameters corresponding to your setup:

az deployment group create --template-file ./mlops/bicep/modules/fl_pairs/vnet_compute_existing_storage.bicep --resource-group <resource group name> --parameters pairBaseName="silo1-westus" pairRegion="westus" machineLearningName="aml-fldemo" machineLearningRegion="eastus" subnetPrefix="10.0.1.0/24" existingStorageAccountName="..." existingStorageAccountResourceGroup="..." existingStorageAccountSubscriptionId="..."

Make sure pairRegion matches the region of your storage account.
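To check the region before deploying, you can query the storage account's location with the Azure CLI and compare it with the value you plan to pass as pairRegion. This is a sketch, assuming you are logged in and can read the storage account; the function name `check_pair_region` is hypothetical.

```shell
# check_pair_region <pair-region> <subscription-id> <resource-group> <account>
# (hypothetical helper) Compares pairRegion with the storage account's location.
check_pair_region() {
  pair_region="$1"
  storage_region=$(az storage account show \
    --subscription "$2" \
    --resource-group "$3" \
    --name "$4" \
    --query location --output tsv) || return 1

  if [ "$pair_region" = "$storage_region" ]; then
    echo "OK: pairRegion and storage account are both in $storage_region"
  else
    echo "MISMATCH: pairRegion=$pair_region, storage account is in $storage_region" >&2
    return 1
  fi
}
```

The comparison is a plain string match, so pass the region in the short form the CLI returns (for example `westus`, not `West US`).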

Set up interactions within the silo

Let's set the required permissions between the silo's compute and the silo's existing storage account.

  1. In the Azure portal, navigate to your resource group.

  2. Look for a resource of type Managed Identity in the region of the silo, named uai-<pairBaseName>. It should have been created by the instructions above.

  3. Open this identity and click on Azure role assignments. You should see the list of assignments for this identity.

    It should contain 3 roles towards the storage account of the silo itself:

    • Storage Blob Data Contributor
    • Reader and Data Access
    • Storage Account Key Operator Service Role
  4. Click on Add role assignment and add each of these same roles towards your existing storage account.
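The portal steps above can also be sketched with the Azure CLI. This is an illustration under assumptions, not the tutorial's official procedure: the helper names are hypothetical, the identity is assumed to be named uai-<pairBaseName> as described in step 2, and the scope is the full resource ID of the storage account you are granting access to.

```shell
# identity_principal_id <resource-group> <pairBaseName>   (hypothetical helper)
# Prints the principal (object) ID of the silo's user-assigned identity.
identity_principal_id() {
  az identity show \
    --resource-group "$1" \
    --name "uai-$2" \
    --query principalId --output tsv
}

# grant_storage_roles <principal-id> <storage-account-resource-id>
# Assigns the three roles listed above at the scope of one storage account.
grant_storage_roles() {
  for role in \
    "Storage Blob Data Contributor" \
    "Reader and Data Access" \
    "Storage Account Key Operator Service Role"
  do
    az role assignment create \
      --assignee-object-id "$1" \
      --assignee-principal-type ServicePrincipal \
      --role "$role" \
      --scope "$2" || return 1
  done
}
```

The storage account's resource ID can be obtained with `az storage account show --query id --output tsv`, adding `--subscription` when the account lives in a different subscription. As noted in the prerequisites, creating role assignments typically requires the Owner role on that scope.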

Set up interactions with the orchestrator

Option 1: public storage account

All you need to set are permissions for the silo's compute to read from and write to the orchestrator's storage account.

  1. In the Azure portal, navigate to your resource group.

  2. Look for a resource of type Managed Identity in the region of the silo, named uai-<pairBaseName>. It should have been created by the instructions above.

  3. Open this identity and click on Azure role assignments. You should see the list of assignments for this identity.

    It should contain 3 roles towards the storage account of the silo itself:

    • Storage Blob Data Contributor
    • Reader and Data Access
    • Storage Account Key Operator Service Role
  4. Click on Add role assignment and add each of these same roles towards the storage account of your orchestrator.
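Once the roles have been added, you may want to confirm that none is missing on the orchestrator's storage account. The sketch below lists the identity's role assignments at a given scope and prints the names of the three roles that are not yet there. The function name `missing_storage_roles` is hypothetical, and the arguments are assumed to be the identity's principal ID and the storage account's full resource ID.

```shell
# missing_storage_roles <principal-id> <storage-account-resource-id>
# (hypothetical helper) Prints each of the three required roles that is
# not assigned to the identity at that scope; prints nothing when all are set.
missing_storage_roles() {
  assigned=$(az role assignment list \
    --assignee "$1" \
    --scope "$2" \
    --query "[].roleDefinitionName" --output tsv) || return 1

  for role in \
    "Storage Blob Data Contributor" \
    "Reader and Data Access" \
    "Storage Account Key Operator Service Role"
  do
    # -F: fixed string, -x: whole-line match, -q: quiet
    printf '%s\n' "$assigned" | grep -Fxq "$role" || echo "$role"
  done
}
```

An empty output means all three roles are present at that scope. Role assignments can take a few minutes to propagate, so a role reported as missing right after step 4 is not necessarily an error.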

Option 2: private storage with endpoints

🚧 work in progress 🚧