Azure Integration

Native Azure integration for importing Parquet files from AWS S3

This guide provides step-by-step instructions for integrating Azure with Haltian IoT to automatically retrieve Parquet files from AWS S3 and upload them to your Azure infrastructure.

Overview

The integration uses an Azure Function App to:

  1. List and retrieve new Parquet files from the Haltian IoT S3 bucket
  2. Authenticate using Microsoft Entra ID (Azure AD)
  3. Upload files to your chosen destination

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#73F9C1', 'primaryTextColor': '#143633', 'primaryBorderColor': '#143633', 'lineColor': '#143633', 'secondaryColor': '#F6FAFA', 'tertiaryColor': '#ffffff', 'background': '#ffffff', 'mainBkg': '#73F9C1', 'secondBkg': '#F6FAFA' }}}%%
flowchart TB
    subgraph "Haltian IoT"
        S3["AWS S3 Bucket<br/><i>Parquet Files</i>"]
    end

    subgraph "Customer Azure"
        FUNC["Azure Function App<br/><i>Scheduled Transfer</i>"]

        subgraph "Destination Options"
            ONELAKE["Microsoft Fabric<br/><i>OneLake</i>"]
            STORAGE["Azure Storage Account<br/><i>Blob Storage</i>"]
        end
    end

    S3 -->|"List & Get Objects"| FUNC
    FUNC -->|"DFS API"| ONELAKE
    FUNC -->|"Blob Service"| STORAGE
```

Destination Options

Choose the destination that fits your analytics infrastructure:

| Option | Best For | Features |
|---|---|---|
| Microsoft Fabric OneLake | Enterprise analytics, Power BI | Unified lakehouse, built-in analytics |
| Azure Storage Account | Custom pipelines, flexibility | Lower cost, broad compatibility |

Prerequisites

Before starting, ensure you have:

Required

  • Terraform (≥ 1.5.0)
  • Azure CLI (az) installed and authenticated
  • Azure subscription with permissions to create resource groups
  • IAM credentials from Haltian for S3 bucket access

For OneLake Destination

  • Microsoft Fabric capacity (existing or new)
  • Global Administrator access to grant the required AAD app permissions

For Storage Account Destination

  • Storage Account Contributor permissions

Optional (for development)

  • Python 3.10+
  • Azure Functions Core Tools

S3 Access Credentials

Haltian provides two options for S3 bucket access:

| Method | Setup | Best For |
|---|---|---|
| Access Key + Secret | Haltian provides credentials | Quick setup |
| Bring-your-own IAM Role | You provide ARN to Haltian | Enterprise security policies |
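
With the access key + secret option, the function reaches the bucket with standard AWS credentials. The snippet below is a quick connectivity check, as a sketch: angle-bracket values are placeholders, and keying objects under your organization ID is an assumption, not documented behavior.

```python
import boto3

# Credentials supplied by Haltian (access key + secret option).
s3 = boto3.client(
    "s3",
    aws_access_key_id="<s3_access_key>",
    aws_secret_access_key="<s3_secret_key>",
    region_name="<s3_region>",
)

# List a handful of objects; assumes objects are keyed under your organization ID.
resp = s3.list_objects_v2(Bucket="<s3_bucket_name>", Prefix="<organization_id>/", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["LastModified"])
```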

Infrastructure Setup

Terraform Modules

The integration uses these Terraform modules:

| Module | Purpose |
|---|---|
| azure-function/terraform | Azure Function App with dual upload modes |
| infra/onelake | Fabric Capacity, Workspace, Lakehouse, AAD app |
| infra/storageaccount | Storage Account with data containers |

Deployment Steps

Option A: OneLake Destination

  1. Deploy infrastructure

    Navigate to the infra/onelake module and configure:

    • Azure subscription and tenant
    • Resource group naming
    • Fabric capacity (create new or use existing)
    cd infra/onelake
    terraform init
    terraform plan
    terraform apply
    
  2. Deploy Function App

    Configure the azure-function/terraform module with:

    • S3 credentials (from Haltian)
    • Fabric/OneLake credentials
    • Upload mode: onelake
    cd azure-function/terraform
    terraform init
    terraform plan
    terraform apply
    

Option B: Storage Account Destination

  1. Deploy infrastructure

    Navigate to the infra/storageaccount module:

    cd infra/storageaccount
    terraform init
    terraform plan
    terraform apply
    
  2. Deploy Function App

    Configure with upload mode: storage_account

    cd azure-function/terraform
    terraform init
    terraform plan
    terraform apply
    

Configuration Reference

Azure Function Variables

| Variable | Description | Required |
|---|---|---|
| s3_bucket_name | Haltian S3 bucket name | Yes |
| s3_access_key | AWS access key (from Haltian) | Yes |
| s3_secret_key | AWS secret key (from Haltian) | Yes |
| s3_region | S3 bucket region | Yes |
| upload_mode | onelake or storage_account | Yes |
| organization_id | Your Haltian organization UUID | Yes |
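
At runtime these Terraform variables typically surface as Function App application settings, readable from the environment. A brief sketch of loading them; the exact setting names here are an assumption, not a documented contract:

```python
import os

# Assumed application-setting names; adjust to match your deployment.
S3_BUCKET_NAME  = os.environ["S3_BUCKET_NAME"]
S3_REGION       = os.environ["S3_REGION"]
UPLOAD_MODE     = os.environ["UPLOAD_MODE"]       # "onelake" or "storage_account"
ORGANIZATION_ID = os.environ["ORGANIZATION_ID"]
```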

OneLake-Specific Variables

| Variable | Description |
|---|---|
| fabric_workspace_id | Fabric workspace GUID |
| fabric_lakehouse_id | Lakehouse GUID |
| aad_client_id | Azure AD app client ID |
| aad_client_secret | Azure AD app secret |
| aad_tenant_id | Azure AD tenant ID |
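
These variables feed the MSAL client-credentials flow and the OneLake path. Below is a sketch of one plausible upload routine over OneLake's ADLS Gen2 (DFS) endpoint; the endpoint pattern and create/append/flush sequence follow the public ADLS Gen2 REST API, but treat the details as an assumption rather than Haltian's exact implementation.

```python
import msal
import requests

# Placeholders for the OneLake-specific Terraform variables above.
AAD_TENANT_ID     = "<aad_tenant_id>"
AAD_CLIENT_ID     = "<aad_client_id>"
AAD_CLIENT_SECRET = "<aad_client_secret>"
WORKSPACE_ID      = "<fabric_workspace_id>"
LAKEHOUSE_ID      = "<fabric_lakehouse_id>"

def upload_to_onelake(file_name: str, data: bytes) -> None:
    # App-only token for the storage resource via the client-credentials flow.
    app = msal.ConfidentialClientApplication(
        AAD_CLIENT_ID,
        authority=f"https://login.microsoftonline.com/{AAD_TENANT_ID}",
        client_credential=AAD_CLIENT_SECRET,
    )
    token = app.acquire_token_for_client(scopes=["https://storage.azure.com/.default"])
    headers = {"Authorization": f"Bearer {token['access_token']}"}

    # OneLake speaks the ADLS Gen2 (DFS) protocol: create the file,
    # append the bytes, then flush to commit.
    url = (f"https://onelake.dfs.fabric.microsoft.com/"
           f"{WORKSPACE_ID}/{LAKEHOUSE_ID}/Files/{file_name}")
    requests.put(f"{url}?resource=file", headers=headers).raise_for_status()
    requests.patch(f"{url}?action=append&position=0",
                   headers=headers, data=data).raise_for_status()
    requests.patch(f"{url}?action=flush&position={len(data)}",
                   headers=headers).raise_for_status()
```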

Storage Account Variables

| Variable | Description |
|---|---|
| storage_account_name | Target storage account |
| storage_container_name | Container for Parquet files |
| storage_connection_string | Connection string |
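
For the storage_account mode the connection string alone is sufficient. A minimal sketch with the azure-storage-blob SDK (angle-bracket values are placeholders):

```python
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = "<storage_connection_string>"
CONTAINER_NAME    = "<storage_container_name>"

def upload_to_blob(file_name: str, data: bytes) -> None:
    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    blob = service.get_blob_client(container=CONTAINER_NAME, blob=file_name)
    # overwrite=True keeps re-runs idempotent for the same object key.
    blob.upload_blob(data, overwrite=True)
```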

Data Flow

Once deployed, the Azure Function runs on a schedule (configurable, default: hourly):

  1. List Objects - Queries S3 for new Parquet files since last run
  2. Download - Retrieves each new file from S3
  3. Authenticate - Uses MSAL client credentials for Azure auth
  4. Upload - Transfers to OneLake (DFS API) or Storage Account (Blob API)
  5. Track Progress - Records last processed timestamp
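
Steps 1, 2, and 5 amount to filtering S3 objects by LastModified against a persisted watermark. A condensed sketch follows; the local-file checkpoint is an assumption (the real function may persist its timestamp elsewhere), and upload_to_destination stands in for the OneLake/Blob sketches above.

```python
import datetime
import json
import pathlib

import boto3

CHECKPOINT = pathlib.Path("last_run.json")  # assumed persistence; could be a blob/table

def upload_to_destination(key: str, data: bytes) -> None:
    ...  # OneLake or Blob upload, per upload_mode (see earlier sketches)

def load_watermark() -> datetime.datetime:
    if CHECKPOINT.exists():
        return datetime.datetime.fromisoformat(json.loads(CHECKPOINT.read_text())["ts"])
    return datetime.datetime.min.replace(tzinfo=datetime.timezone.utc)

def run_transfer(s3, bucket: str, prefix: str) -> None:
    watermark = load_watermark()
    newest = watermark
    # Paginate so buckets with many objects are handled correctly.
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".parquet") and obj["LastModified"] > watermark:
                body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
                upload_to_destination(obj["Key"], body)
                newest = max(newest, obj["LastModified"])
    # Step 5: record the newest timestamp processed in this run.
    CHECKPOINT.write_text(json.dumps({"ts": newest.isoformat()}))
```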

Verification

After deployment, verify the integration:

# Check Function App status
az functionapp show --name <function-app-name> --resource-group <rg-name>

# View recent executions
az monitor activity-log list --resource-group <rg-name> --offset 1h

# Check uploaded files (Storage Account)
az storage blob list --container-name <container> --account-name <account>

For OneLake, verify files in the Fabric workspace Lakehouse.
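
To check OneLake programmatically instead, here is a hedged sketch using the azure-storage-file-datalake SDK against the OneLake endpoint; the workspace/lakehouse addressing follows OneLake's ADLS-compatible layout, and angle-bracket values are placeholders.

```python
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = ClientSecretCredential(
    tenant_id="<aad_tenant_id>",
    client_id="<aad_client_id>",
    client_secret="<aad_client_secret>",
)
service = DataLakeServiceClient("https://onelake.dfs.fabric.microsoft.com",
                                credential=credential)

# The workspace maps to the file system; the lakehouse's Files folder
# holds the transferred Parquet files.
fs = service.get_file_system_client("<fabric_workspace_id>")
for path in fs.get_paths(path="<fabric_lakehouse_id>/Files"):
    print(path.name, path.last_modified)
```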

Cleanup

To remove all deployed resources:

cd azure-function/terraform
terraform destroy

cd ../../infra/onelake  # or infra/storageaccount
terraform destroy

Next Steps

Once data is flowing to Azure:

  • OneLake: Create Power BI reports, run Spark notebooks, use SQL analytics
  • Storage Account: Configure Azure Data Factory, Synapse, or Databricks pipelines

Troubleshooting

| Issue | Solution |
|---|---|
| Function not triggering | Check timer trigger configuration and Function App logs |
| S3 access denied | Verify IAM credentials with Haltian |
| OneLake upload fails | Check AAD app permissions and Fabric workspace access |
| Missing files | Verify organization ID matches your Haltian organization |