Azure Integration

Native Azure integration for importing Parquet files from AWS S3

This guide provides step-by-step instructions for integrating Azure with Haltian IoT to automatically retrieve Parquet files from AWS S3 and upload them to your Azure infrastructure.

Overview

The integration uses an Azure Function App to:

  1. List and retrieve new Parquet files from the Haltian IoT S3 bucket
  2. Authenticate using Microsoft Entra ID (Azure AD)
  3. Upload files to your chosen destination

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#73F9C1', 'primaryTextColor': '#143633', 'primaryBorderColor': '#143633', 'lineColor': '#143633', 'secondaryColor': '#F6FAFA', 'tertiaryColor': '#ffffff', 'background': '#ffffff', 'mainBkg': '#73F9C1', 'secondBkg': '#F6FAFA' }}}%%
flowchart TB
    subgraph "Haltian IoT"
        S3["AWS S3 Bucket<br/><i>Parquet Files</i>"]
    end

    subgraph "Customer Azure"
        FUNC["Azure Function App<br/><i>Scheduled Transfer</i>"]

        subgraph "Destination Options"
            ONELAKE["Microsoft Fabric<br/><i>OneLake</i>"]
            STORAGE["Azure Storage Account<br/><i>Blob Storage</i>"]
        end
    end

    S3 -->|"List & Get Objects"| FUNC
    FUNC -->|"DFS API"| ONELAKE
    FUNC -->|"Blob Service"| STORAGE
```

Destination Options

Choose the destination that fits your analytics infrastructure:

| Option | Best For | Features |
|---|---|---|
| Microsoft Fabric OneLake | Enterprise analytics, Power BI | Unified lakehouse, built-in analytics |
| Azure Storage Account | Custom pipelines, flexibility | Lower cost, broad compatibility |

Prerequisites

Before starting, ensure you have:

Required

  • Terraform (≥ 1.5.0)
  • Azure CLI (az) installed and authenticated
  • Azure subscription with permissions to create resource groups
  • IAM credentials from Haltian for S3 bucket access

For OneLake Destination

  • Microsoft Fabric capacity (existing or new)
  • Global Administrator access to grant the required AAD app permissions

For Storage Account Destination

  • Storage Account Contributor permissions

Optional (for development)

  • Python 3.10+
  • Azure Functions Core Tools

S3 Access Credentials

Haltian provides two options for S3 bucket access:

| Method | Setup | Best For |
|---|---|---|
| Access Key + Secret | Haltian provides credentials | Quick setup |
| Bring-your-own IAM Role | You provide ARN to Haltian | Enterprise security policies |
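
With the access key + secret option, the function reaches the bucket with standard AWS credentials. The snippet below is a quick connectivity check, as a sketch: angle-bracket values are placeholders, and keying objects under your organization ID is an assumption, not documented behavior.

```python
import boto3

# Credentials supplied by Haltian (access key + secret option).
s3 = boto3.client(
    "s3",
    aws_access_key_id="<s3_access_key>",
    aws_secret_access_key="<s3_secret_key>",
    region_name="<s3_region>",
)

# List a handful of objects; assumes objects are keyed under your organization ID.
resp = s3.list_objects_v2(Bucket="<s3_bucket_name>", Prefix="<organization_id>/", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["LastModified"])
```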

Infrastructure Setup

Terraform Modules

The integration uses these Terraform modules:

| Module | Purpose |
|---|---|
| azure-function/terraform | Azure Function App with dual upload modes |
| infra/onelake | Fabric Capacity, Workspace, Lakehouse, AAD app |
| infra/storageaccount | Storage Account with data containers |

Deployment Steps

Option A: OneLake Destination

  1. Deploy infrastructure

    Navigate to the infra/onelake module and configure:

    • Azure subscription and tenant
    • Resource group naming
    • Fabric capacity (create new or use existing)
    cd infra/onelake
    terraform init
    terraform plan
    terraform apply
    
  2. Deploy Function App

    Configure the azure-function/terraform module with:

    • S3 credentials (from Haltian)
    • Fabric/OneLake credentials
    • Upload mode: onelake
    cd azure-function/terraform
    terraform init
    terraform plan
    terraform apply
    

Option B: Storage Account Destination

  1. Deploy infrastructure

    Navigate to the infra/storageaccount module:

    cd infra/storageaccount
    terraform init
    terraform plan
    terraform apply
    
  2. Deploy Function App

    Configure with upload mode: storage_account

    cd azure-function/terraform
    terraform init
    terraform plan
    terraform apply
    

Configuration Reference

Azure Function Variables

| Variable | Description | Required |
|---|---|---|
| s3_bucket_name | Haltian S3 bucket name | Yes |
| s3_access_key | AWS access key (from Haltian) | Yes |
| s3_secret_key | AWS secret key (from Haltian) | Yes |
| s3_region | S3 bucket region | Yes |
| upload_mode | onelake or storage_account | Yes |
| organization_id | Your Haltian organization UUID | Yes |
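
At runtime these Terraform variables typically surface as Function App application settings, readable from the environment. A brief sketch of loading them; the exact setting names here are an assumption, not a documented contract:

```python
import os

# Assumed application-setting names; adjust to match your deployment.
S3_BUCKET_NAME  = os.environ["S3_BUCKET_NAME"]
S3_REGION       = os.environ["S3_REGION"]
UPLOAD_MODE     = os.environ["UPLOAD_MODE"]       # "onelake" or "storage_account"
ORGANIZATION_ID = os.environ["ORGANIZATION_ID"]
```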

OneLake-Specific Variables

| Variable | Description |
|---|---|
| fabric_workspace_id | Fabric workspace GUID |
| fabric_lakehouse_id | Lakehouse GUID |
| aad_client_id | Azure AD app client ID |
| aad_client_secret | Azure AD app secret |
| aad_tenant_id | Azure AD tenant ID |
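
These variables feed the MSAL client-credentials flow and the OneLake path. Below is a sketch of one plausible upload routine over OneLake's ADLS Gen2 (DFS) endpoint; the endpoint pattern and create/append/flush sequence follow the public ADLS Gen2 REST API, but treat the details as an assumption rather than Haltian's exact implementation.

```python
import msal
import requests

# Placeholders for the OneLake-specific Terraform variables above.
AAD_TENANT_ID     = "<aad_tenant_id>"
AAD_CLIENT_ID     = "<aad_client_id>"
AAD_CLIENT_SECRET = "<aad_client_secret>"
WORKSPACE_ID      = "<fabric_workspace_id>"
LAKEHOUSE_ID      = "<fabric_lakehouse_id>"

def upload_to_onelake(file_name: str, data: bytes) -> None:
    # App-only token for the storage resource via the client-credentials flow.
    app = msal.ConfidentialClientApplication(
        AAD_CLIENT_ID,
        authority=f"https://login.microsoftonline.com/{AAD_TENANT_ID}",
        client_credential=AAD_CLIENT_SECRET,
    )
    token = app.acquire_token_for_client(scopes=["https://storage.azure.com/.default"])
    headers = {"Authorization": f"Bearer {token['access_token']}"}

    # OneLake speaks the ADLS Gen2 (DFS) protocol: create the file,
    # append the bytes, then flush to commit.
    url = (f"https://onelake.dfs.fabric.microsoft.com/"
           f"{WORKSPACE_ID}/{LAKEHOUSE_ID}/Files/{file_name}")
    requests.put(f"{url}?resource=file", headers=headers).raise_for_status()
    requests.patch(f"{url}?action=append&position=0",
                   headers=headers, data=data).raise_for_status()
    requests.patch(f"{url}?action=flush&position={len(data)}",
                   headers=headers).raise_for_status()
```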

Storage Account Variables

| Variable | Description |
|---|---|
| storage_account_name | Target storage account |
| storage_container_name | Container for Parquet files |
| storage_connection_string | Connection string |
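
For the storage_account mode the connection string alone is sufficient. A minimal sketch with the azure-storage-blob SDK (angle-bracket values are placeholders):

```python
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = "<storage_connection_string>"
CONTAINER_NAME    = "<storage_container_name>"

def upload_to_blob(file_name: str, data: bytes) -> None:
    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    blob = service.get_blob_client(container=CONTAINER_NAME, blob=file_name)
    # overwrite=True keeps re-runs idempotent for the same object key.
    blob.upload_blob(data, overwrite=True)
```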

Data Flow

Once deployed, the Azure Function runs on a schedule (configurable, default: hourly):

  1. List Objects - Queries S3 for new Parquet files since last run
  2. Download - Retrieves each new file from S3
  3. Authenticate - Uses MSAL client credentials for Azure auth
  4. Upload - Transfers to OneLake (DFS API) or Storage Account (Blob API)
  5. Track Progress - Records last processed timestamp
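
Steps 1, 2, and 5 amount to filtering S3 objects by LastModified against a persisted watermark. A condensed sketch follows; the local-file checkpoint is an assumption (the real function may persist its timestamp elsewhere), and upload_to_destination stands in for the OneLake/Blob sketches above.

```python
import datetime
import json
import pathlib

import boto3

CHECKPOINT = pathlib.Path("last_run.json")  # assumed persistence; could be a blob/table

def upload_to_destination(key: str, data: bytes) -> None:
    ...  # OneLake or Blob upload, per upload_mode (see earlier sketches)

def load_watermark() -> datetime.datetime:
    if CHECKPOINT.exists():
        return datetime.datetime.fromisoformat(json.loads(CHECKPOINT.read_text())["ts"])
    return datetime.datetime.min.replace(tzinfo=datetime.timezone.utc)

def run_transfer(s3, bucket: str, prefix: str) -> None:
    watermark = load_watermark()
    newest = watermark
    # Paginate so buckets with many objects are handled correctly.
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".parquet") and obj["LastModified"] > watermark:
                body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
                upload_to_destination(obj["Key"], body)
                newest = max(newest, obj["LastModified"])
    # Step 5: record the newest timestamp processed in this run.
    CHECKPOINT.write_text(json.dumps({"ts": newest.isoformat()}))
```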

Verification

After deployment, verify the integration:

# Check Function App status
az functionapp show --name <function-app-name> --resource-group <rg-name>

# View recent executions
az monitor activity-log list --resource-group <rg-name> --offset 1h

# Check uploaded files (Storage Account)
az storage blob list --container-name <container> --account-name <account>

For OneLake, verify files in the Fabric workspace Lakehouse.
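
To check OneLake programmatically instead, here is a hedged sketch using the azure-storage-file-datalake SDK against the OneLake endpoint; the workspace/lakehouse addressing follows OneLake's ADLS-compatible layout, and angle-bracket values are placeholders.

```python
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = ClientSecretCredential(
    tenant_id="<aad_tenant_id>",
    client_id="<aad_client_id>",
    client_secret="<aad_client_secret>",
)
service = DataLakeServiceClient("https://onelake.dfs.fabric.microsoft.com",
                                credential=credential)

# The workspace maps to the file system; the lakehouse's Files folder
# holds the transferred Parquet files.
fs = service.get_file_system_client("<fabric_workspace_id>")
for path in fs.get_paths(path="<fabric_lakehouse_id>/Files"):
    print(path.name, path.last_modified)
```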

Cleanup

To remove all deployed resources:

cd azure-function/terraform
terraform destroy

cd ../../infra/onelake  # or infra/storageaccount
terraform destroy

Next Steps

Once data is flowing to Azure:

  • OneLake: Create Power BI reports, run Spark notebooks, use SQL analytics
  • Storage Account: Configure Azure Data Factory, Synapse, or Databricks pipelines

Troubleshooting

| Issue | Solution |
|---|---|
| Function not triggering | Check timer trigger configuration and Function App logs |
| S3 access denied | Verify IAM credentials with Haltian |
| OneLake upload fails | Check AAD app permissions and Fabric workspace access |
| Missing files | Verify organization ID matches your Haltian organization |