Azure self-hosted onboarding checklist
To proceed with a self-hosted deployment, your organization must first sign a self-hosting agreement with Unstructured.
If you have not yet signed this agreement, stop here, and begin the self-hosting agreement process by contacting your Unstructured sales representative, emailing Unstructured Sales at sales@unstructured.io, or filling out the contact form on the Unstructured website.
After your organization has signed the self-hosting agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. Choose one of the following setup options:
- Do it all for me: Have Unstructured set up the required infrastructure in your Azure account and then deploy the Unstructured UI and API into that newly created infrastructure.
- Bring my own infrastructure: Set up the required infrastructure yourself in your Azure account, and then have Unstructured deploy the Unstructured UI and API into your existing infrastructure.
Questions? Need help?
If you have questions or need help as you go, contact your Unstructured sales representative or technical enablement contact. If you do not know who they are, email Unstructured Sales at sales@unstructured.io, or fill out the contact form on the Unstructured website, and a member of the Unstructured sales or technical enablement teams will get back to you as soon as possible.
Do it all for me
If you want Unstructured to set up the required infrastructure in your Azure account and then deploy the Unstructured UI and API into that newly created infrastructure, provide your Unstructured sales representative or technical enablement contact with the access credentials for a Microsoft Entra ID user or service principal in your Azure account that has the following required permissions.
Subscription and resource group
- `Microsoft.Resources/subscriptions/resourceGroups/write` (to create the resource group)
- `Microsoft.Resources/subscriptions/resourceGroups/read` (to read the resource group)
VNet and networking
- `Microsoft.Network/virtualNetworks/write` (to create the VNet)
- `Microsoft.Network/virtualNetworks/read` (to read the VNet)
- `Microsoft.Network/publicIPAddresses/write` (to create the public IPs)
- `Microsoft.Network/publicIPAddresses/read` (to read the public IPs)
- `Microsoft.Network/natGateways/write` (to create the NAT Gateway)
- `Microsoft.Network/natGateways/read` (to read the NAT Gateway)
- `Microsoft.Network/routeTables/write` (to create the route tables)
- `Microsoft.Network/routeTables/read` (to read the route tables)
- `Microsoft.Network/networkSecurityGroups/write` (to create the NSGs)
- `Microsoft.Network/networkSecurityGroups/read` (to read the NSGs)
AKS cluster
- `Microsoft.ContainerService/managedClusters/write` (to create the AKS cluster)
- `Microsoft.ContainerService/managedClusters/read` (to read the AKS cluster)
- `Microsoft.ContainerService/agentPools/write` (to create the node pools)
- `Microsoft.ContainerService/agentPools/read` (to read the node pools)
Managed identities and RBAC
- `Microsoft.ManagedIdentity/userAssignedIdentities/write` (to create the managed identities)
- `Microsoft.ManagedIdentity/userAssignedIdentities/read` (to read the managed identities)
- Assign built-in roles such as:
  - Contributor or scoped Network Contributor for the AKS cluster identity
  - Monitoring Metrics Publisher, AcrPull, and Storage Blob Data Reader for the node pool identity
  - Storage Blob Data Contributor for workload identities
Kubernetes add-ons
Permissions depend on the Helm/YAML installation, but Azure RBAC integration requires `Microsoft.ContainerService/managedClusters/accessProfiles/*/read` (to access kubeconfig).
Storage class
- `Microsoft.Storage/storageAccounts/write` (to create the storage account for CSI driver provisioning)
- `Microsoft.Storage/storageAccounts/read` (to read the storage account)
PostgreSQL database
- `Microsoft.DBforPostgreSQL/flexibleServers/write` (to create the PostgreSQL server)
- `Microsoft.DBforPostgreSQL/flexibleServers/read` (to read the PostgreSQL server)
- NSG permissions for database access: allow traffic from the VNet CIDR
Bring my own infrastructure
If you want to set up the required infrastructure yourself, configure the following within your Azure account so that Unstructured can deploy the Unstructured UI and API into it.
You must also provide your Unstructured sales representative or technical enablement contact with the access credentials for a Microsoft Entra ID user or service principal in your Azure account that has access to the target Azure Kubernetes Service (AKS) cluster into which the Unstructured UI and API will be deployed.
Azure subscription and resource group
- Subscription
  - Ensure you have access to a valid Azure subscription
  - You will need the `subscription_id` if deploying via CLI or Pulumi
- Resource Group
  - Name: `u10d-{env}-rg`
  - Region: e.g., `eastus2`
  - All resources (VNet, AKS, PostgreSQL, Storage, etc.) will be created inside this group
VNet and networking
- Virtual Network (VNet)
  - Address space: `10.0.0.0/16`
  - DNS Hostnames: Enabled
  - DNS Support: Enabled
- Internet Access
  - Handled via Azure’s default gateway and public IPs
- Public Subnet
  - Address: `10.0.0.0/24`
  - Assign Public IP: true
  - Availability Zone: `${region}a`
- NAT Gateway + Public IP
  - NAT Gateway in the public subnet
  - Public IP resource attached
- Private Subnets (x2)
  - Addresses: `10.0.1.0/24`, `10.0.2.0/24`
  - AZs: `${region}a` and `${region}b`
- Route Tables
  - Public: route `0.0.0.0/0` via internet
  - Private: route `0.0.0.0/0` via NAT Gateway
Managed identities and RBAC
- AKS Cluster Managed Identity
  - Assign roles: `Contributor`, or the more narrowly scoped role `Network Contributor`
- Node Pool Managed Identity
  - Assign roles: `Monitoring Metrics Publisher`, `AcrPull` (if using ACR), `Storage Blob Data Reader`
- Workload Identity Bindings (x3)
  - Namespaces: `recommender`, `etl-operator`, `data-broker`
  - Use Azure AD Workload Identity Federation
  - Assign `Storage Blob Data Contributor` to required containers
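As an illustration of one of the three workload identity bindings, the following is a minimal sketch of a Kubernetes ServiceAccount annotated for Azure AD Workload Identity in the `recommender` namespace. The ServiceAccount name and the client ID are hypothetical placeholders, not values supplied by Unstructured; substitute the client ID of the user-assigned managed identity that holds the `Storage Blob Data Contributor` role.

```yaml
# Sketch only: the ServiceAccount name and client ID below are assumptions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: recommender-sa              # hypothetical name
  namespace: recommender            # one of the three namespaces listed above
  annotations:
    # Client ID of the user-assigned managed identity (placeholder value).
    azure.workload.identity/client-id: "00000000-0000-0000-0000-000000000000"
```

For the binding to take effect, the managed identity also needs a federated credential whose subject matches `system:serviceaccount:recommender:recommender-sa`, and pods that use the ServiceAccount need the label `azure.workload.identity/use: "true"`. The same pattern would be repeated for the `etl-operator` and `data-broker` namespaces.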
AKS Cluster
- Control Plane
  - Version: `1.31` or higher
  - API authorized IPs: optional
  - Private cluster networking recommended
- Node Pool
  - VM Size: `Standard_D16s_v5`
  - Disk Size: 100 GB
  - Desired Size: 2 (min: 2, max: 5)
  - SSH: Enabled via key pair
  - SSH key exported in PEM format
- NSGs (Network Security Groups)
  - Allow intra-cluster traffic (`10.0.0.0/16`)
  - Allow all egress
Kubernetes Add-ons
Install via Helm or YAML:
- Workload Identity Webhook
- Metrics Server: `v0.7.2`
- Azure Disk CSI Driver
  - Provisioner: `disk.csi.azure.com`
Storage class
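A storage class backed by the Azure Disk CSI Driver could be declared along the following lines. This is a minimal sketch: only the `disk.csi.azure.com` provisioner comes from this checklist, while the class name, disk SKU, reclaim policy, and binding mode are assumptions to adjust for your environment.

```yaml
# Sketch only: name, skuName, and policies are assumed values.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: unstructured-managed-disk   # hypothetical name
provisioner: disk.csi.azure.com     # Azure Disk CSI Driver provisioner
parameters:
  skuName: StandardSSD_LRS          # assumed disk SKU
reclaimPolicy: Delete
# Delay volume creation until a pod is scheduled, so the disk is
# provisioned in the same zone as the node that mounts it.
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```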
Secrets and ConfigMaps
After your infrastructure is set up, but before Unstructured can deploy the Unstructured UI and API into your infrastructure, Unstructured will need to know the values of the following Secrets and ConfigMaps. These must be provided to Unstructured as a set of YAML files in Kubernetes Secret and ConfigMap format.
Capture these during setup
- DB host, username, password
- Container names
- SSH private key
- Auth secrets
The Secrets are as follows.
Blob storage credentials (Azure)
BLOB_STORAGE_ADAPTER_ACCOUNT_NAME
BLOB_STORAGE_ADAPTER_ACCOUNT_KEY
BLOB_STORAGE_ADAPTER_CONTAINER_REGION (optional)
Database credentials
DB_USERNAME
DB_PASSWORD
DB_HOST
DB_NAME
DB_DATABASE
Authentication
JWT_SECRET_KEY
AUTH_STRATEGY
SESSION_SECRET
SHARED_SECRET
KEYCLOAK_CLIENT_SECRET
KEYCLOAK_ADMIN_SECRET
KEYCLOAK_ADMIN
KEYCLOAK_ADMIN_PASSWORD
API_BEARER_TOKEN
The ConfigMaps are as follows.
Blob storage settings
BLOB_STORAGE_ADAPTER_TYPE: azure
BLOB_STORAGE_ADAPTER_BUCKET
ETL_BLOB_CACHE_BUCKET_NAME
ETL_API_BLOB_STORAGE_ADAPTER_BUCKET
ETL_API_BLOB_STORAGE_ADAPTER_TYPE: azure
ETL_API_DB_REMOTE_BUCKET_NAME
ETL_API_JOB_STATUS_DEST_BUCKET_NAME
JOB_STATUS_BUCKET_NAME
JOB_DB_BUCKET_NAME
Environment
ENV, ENVIRONMENT
JOB_ENV, JOB_ENVIRONMENT
Observability and OpenTelemetry (OTel)
JOB_OTEL_EXPORTER_OTLP_ENDPOINT
JOB_OTEL_METRICS_EXPORTER
JOB_OTEL_TRACES_EXPORTER
OTEL_EXPORTER_OTLP_ENDPOINT
OTEL_METRICS_EXPORTER
OTEL_TRACES_EXPORTER
Unstructured API and authentication
UNSTRUCTURED_API_URL
JWKS_URL
JWT_ISSUER
JWT_AUDIENCE
SINGLE_PLANE_DEPLOYMENT
Front end and dashboard
API_BASE_URL
API_CLIENT_BASE_URL
API_URL
APM_SERVICE_NAME
APM_SERVICE_NAME_CLIENT
AUTH_STRATEGY
FRONTEND_BASE_URL
KEYCLOAK_CALLBACK_URL
KEYCLOAK_CLIENT_ID
KEYCLOAK_DOMAIN
KEYCLOAK_REALM
KEYCLOAK_SSL_ENABLED
KEYCLOAK_TRUST_ISSUER
PUBLIC_BASE_URL
PUBLIC_RELEASE_CHANNEL
Redis
REDIS_DSN
Other
IMAGE_PULL_SECRETS
PRIVATE_KEY_SECRETS_ADAPTER_TYPE: azure
PRIVATE_KEY_SECRETS_ADAPTER_AZURE_REGION
SECRETS_ADAPTER_TYPE: azure
SECRETS_ADAPTER_AZURE_REGION
The preceding Secrets and ConfigMaps must be added to the following files:
File Name | Type | Resource name | Namespace | Data keys |
---|---|---|---|---|
data-broker-env-cm.yaml | ConfigMap | data-broker-env | api | JOB_STATUS_BUCKET_NAME , JOB_DB_BUCKET_NAME , BLOB_STORAGE_ADAPTER_TYPE |
data-broker-env-secret.yaml | Secret | data-broker-env | api | BLOB_STORAGE_ADAPTER_ACCOUNT_NAME , BLOB_STORAGE_ADAPTER_ACCOUNT_KEY , BLOB_STORAGE_ADAPTER_CONTAINER_REGION |
dataplane-api-env-cm.yaml | Secret | dataplane-api-env | api | DB_PASSWORD , DB_USERNAME , DB_HOST , DB_NAME |
etl-operator-env-cm.yaml | ConfigMap | etl-operator-env | etl-operator | BLOB_STORAGE_ADAPTER_BUCKET , JOB_STATUS_BUCKET_NAME , JOB_DB_BUCKET_NAME , BLOB_STORAGE_ADAPTER_TYPE , ENV , ENVIRONMENT , REDIS_DSN , ETL_API_BLOB_STORAGE_ADAPTER_BUCKET , ETL_API_BLOB_STORAGE_ADAPTER_TYPE , ETL_API_DB_REMOTE_BUCKET_NAME , ETL_API_JOB_STATUS_DEST_BUCKET_NAME (x2), ETL_BLOB_CACHE_BUCKET_NAME , IMAGE_PULL_SECRETS , JOB_ENV , JOB_ENVIRONMENT , JOB_OTEL_EXPORTER_OTLP_ENDPOINT , JOB_OTEL_METRICS_EXPORTER , JOB_OTEL_TRACES_EXPORTER , OTEL_EXPORTER_OTLP_ENDPOINT , OTEL_METRICS_EXPORTER , OTEL_TRACES_EXPORTER , UNSTRUCTURED_API_URL |
etl-operator-env-secret.yaml | Secret | etl-operator-env | etl-operator | BLOB_STORAGE_ADAPTER_ACCOUNT_NAME , BLOB_STORAGE_ADAPTER_ACCOUNT_KEY , BLOB_STORAGE_ADAPTER_CONTAINER_REGION |
frontend-env-cm.yaml | ConfigMap | frontend-env | www | API_BASE_URL , API_CLIENT_BASE_URL , API_URL , APM_SERVICE_NAME , APM_SERVICE_NAME_CLIENT , AUTH_STRATEGY , ENV , FRONTEND_BASE_URL , KEYCLOAK_CALLBACK_URL , KEYCLOAK_CLIENT_ID , KEYCLOAK_DOMAIN , KEYCLOAK_REALM , KEYCLOAK_SSL_ENABLED , KEYCLOAK_TRUST_ISSUER , PUBLIC_BASE_URL , PUBLIC_RELEASE_CHANNEL , SENTRY_DSN , SENTRY_SAMPLE_RATE , WORKFLOW_NODE_EDITOR_FF_REQUEST_FORM , CUSTOM_WORKFLOW_FF_REQUEST_FORM |
frontend-env-secret.yaml | Secret | frontend-env | www | API_BEARER_TOKEN , KEYCLOAK_ADMIN_SECRET , KEYCLOAK_CLIENT_SECRET , SESSION_SECRET , SHARED_SECRET |
keycloak-secret.yaml | Secret | phasetwo-keycloak-env | www | KEYCLOAK_ADMIN , KEYCLOAK_ADMIN_PASSWORD |
platform-api-env-cm.yaml | ConfigMap | platform-api-env | api | JWKS_URL , JWT_ISSUER , JWT_AUDIENCE , SINGLE_PLANE_DEPLOYMENT |
platform-api-env-secret.yaml | Secret | platform-api-env | api | DB_PASSWORD , DB_USERNAME , DB_HOST , DB_NAME , DB_DATABASE , JWT_SECRET_KEY , AUTH_STRATEGY |
recommender-env-cm.yaml | ConfigMap | recommender-env | recommender | BLOB_STORAGE_ADAPTER_TYPE , ETL_BLOB_CACHE_BUCKET_NAME |
recommender-env-secret.yaml | Secret | recommender-env | recommender | BLOB_STORAGE_ADAPTER_ACCOUNT_NAME , BLOB_STORAGE_ADAPTER_ACCOUNT_KEY , BLOB_STORAGE_ADAPTER_CONTAINER_REGION |
secret-provider-api-env-cm.yaml | ConfigMap | secrets-provider-api-env | secrets | ENV , ENVIRONMENT , OTEL_EXPORTER_OTLP_ENDPOINT , OTEL_METRICS_EXPORTER , OTEL_TRACES_EXPORTER , PRIVATE_KEY_SECRETS_ADAPTER_AZURE_REGION , PRIVATE_KEY_SECRETS_ADAPTER_TYPE , SECRETS_ADAPTER_AZURE_REGION , SECRETS_ADAPTER_TYPE |
secret-provider-api-env-secret.yaml | Secret | secrets-provider-api-env | secrets | BLOB_STORAGE_ADAPTER_ACCOUNT_NAME , BLOB_STORAGE_ADAPTER_ACCOUNT_KEY , BLOB_STORAGE_ADAPTER_CONTAINER_REGION |
usage-collector-env-secret.yaml | Secret | usage-collector-env | api | DB_PASSWORD , DB_USERNAME , DB_HOST , DB_NAME , BLOB_STORAGE_ADAPTER_TYPE |
For example, for the `data-broker-env-cm.yaml` ConfigMap file, the contents would look like this:
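A minimal sketch of a `data-broker-env-cm.yaml` manifest, built from the resource name, namespace, and data keys listed for that file in the table above. The two container names are placeholder assumptions; replace them with the names of the containers in your own storage account.

```yaml
# Sketch only: the container names are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: data-broker-env
  namespace: api
data:
  JOB_STATUS_BUCKET_NAME: "<job-status-container-name>"
  JOB_DB_BUCKET_NAME: "<job-db-container-name>"
  BLOB_STORAGE_ADAPTER_TYPE: "azure"
```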
The `data-broker-env-secret.yaml` Secret file would look like this:
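A minimal sketch of a `data-broker-env-secret.yaml` manifest, again using the resource name, namespace, and data keys from the table above. All three values are placeholder assumptions. The sketch uses `stringData` so that values can be written in plain text; if you use `data` instead, each value must be base64-encoded.

```yaml
# Sketch only: every value below is a placeholder.
apiVersion: v1
kind: Secret
metadata:
  name: data-broker-env
  namespace: api
type: Opaque
stringData:
  BLOB_STORAGE_ADAPTER_ACCOUNT_NAME: "<storage-account-name>"
  BLOB_STORAGE_ADAPTER_ACCOUNT_KEY: "<storage-account-key>"
  BLOB_STORAGE_ADAPTER_CONTAINER_REGION: "<region>"   # optional
```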