Data API (Parquet)

Bulk data export in Apache Parquet format for analytics and data lakes

Overview

The Haltian IoT Data API provides data exports in Apache Parquet format — a columnar storage format optimized for analytics workloads.

Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides high performance compression and encoding schemes to handle complex data in bulk.

Design Philosophy

The Data API is designed as intermediate data storage for customers to download and import into their own data infrastructure:

| Aspect | Approach |
| --- | --- |
| Purpose | Download and import, not query in place |
| Control | Customer-controlled processing in their preferred tools |
| Structure | Simple folder hierarchy for easy understanding |
| Partitioning | `{year}/{month}/` format (not Hive-style `year=YYYY/`) |
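The partition layout above can be built programmatically. A minimal sketch, assuming the folder pattern `{entity}/{year}/{month}/` described in the table (the `partition_prefix` helper name is illustrative, not part of the API):

```python
from datetime import datetime

def partition_prefix(entity: str, ts: datetime) -> str:
    """Build the {entity}/{year}/{month}/ folder prefix (not Hive-style)."""
    return f"{entity}/{ts:%Y/%m}/"

print(partition_prefix("device", datetime(2026, 1, 15, 8)))  # device/2026/01/
```

Because the layout is plain `{year}/{month}/` rather than Hive-style `year=YYYY/`, engines that auto-discover Hive partitions will not infer partition columns from the path; derive them from the prefix if needed.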

Supported Platforms

Data can be imported into any Parquet-compatible platform.

Export Schedule

| Setting | Value |
| --- | --- |
| Minimum frequency | Daily |
| Maximum frequency | Hourly |
| Format | Full snapshots |
| Delivery | AWS S3 bucket |
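Since each export is a full snapshot, the newest file per entity and month typically supersedes earlier ones (an assumption here; confirm retention behavior for your bucket). Timestamped file names like `2026_01_15_08_device.parquet` sort chronologically, so the latest snapshot can be picked with a plain sort:

```python
def latest_snapshot(files):
    """Return the newest snapshot path, or None if the list is empty.

    Timestamped names (YYYY_MM_DD_HH_...) sort chronologically as strings.
    """
    return max(files, default=None)

files = [
    "device/2026/01/2026_01_14_08_device.parquet",
    "device/2026/01/2026_01_15_08_device.parquet",
]
print(latest_snapshot(files))  # device/2026/01/2026_01_15_08_device.parquet
```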

Data Categories

The Data API exports three categories of data:

Entity Data

Core business entities and their relationships:

| Entity | Description |
| --- | --- |
| Organization | Organization metadata |
| Device | IoT devices and sensors |
| Device Group | Collections of related devices |
| Device Keyword | Tags and labels for devices |
| Space | Locations, buildings, floors |
| Zone | Areas within spaces |

Measurement Data

Time-series sensor data from devices:

| Measurement | Description |
| --- | --- |
| Ambient Temperature | Temperature readings (°C) |
| Battery Percentage | Battery level (0-100%) |
| Battery Voltage | Battery voltage (V) |
| Boot Count | Device restart events |
| CO₂ | Carbon dioxide (ppm) |
| Directional Movement | Entry/exit counts |
| Distance | Distance measurements |
| Occupancy Seconds | Occupancy duration |
| Occupancy Status | Occupied/unoccupied flag |
| Occupants Count | Number of people |
| Position | Device location |
| Position Zone | Zone-relative position |
| TVOC | Volatile organic compounds |

Relationship Data

Junction tables linking entities:

| Table | Description |
| --- | --- |
| Device Group Devices | Device-to-group mappings |
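A junction table like Device Group Devices resolves to group membership with a plain join. A minimal sketch using pandas; the column names (`deviceId`, `deviceGroupId`) are assumptions here, so check the Schema Reference for the actual definitions:

```python
import pandas as pd

# Illustrative frames; real column names are in the Schema Reference
devices = pd.DataFrame({"deviceId": ["d1", "d2", "d3"],
                        "name": ["Sensor A", "Sensor B", "Sensor C"]})
group_devices = pd.DataFrame({"deviceGroupId": ["g1", "g1", "g2"],
                              "deviceId": ["d1", "d2", "d3"]})

# Resolve each group's member devices via the junction table
members = group_devices.merge(devices, on="deviceId", how="left")
print(members[members["deviceGroupId"] == "g1"]["name"].tolist())
# ['Sensor A', 'Sensor B']
```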

Understanding Event-Based Measurements

Key Principles

| Principle | Description |
| --- | --- |
| Value Persistence | A measurement value is considered valid until the next event for that device. If a sensor reports `status=1` (occupied) at 10:00, that status remains true until a new event changes it. |
| No Regular Intervals | Events occur only when state changes, not at fixed time intervals. Gaps between events mean "no change", not "missing data". |
| Sparse Storage | This event-driven approach minimizes storage by recording only changes, keeping Parquet files compact and efficient. |

Measurement Types & Behavior

| Measurement | Event Trigger | Interpretation |
| --- | --- | --- |
| `measurementOccupancyStatus` | State change (occupied ↔ unoccupied) | Binary value valid until next change |
| `measurementOccupantsCount` | Count change | Last known count is current count |
| `measurementAmbientTemperature` | Periodic or threshold | Interpolate between readings |
| `measurementBatteryPercentage` | Significant change (~1%) | Last value is current state |
| `measurementOccupancySeconds` | Aggregated periodically | Already time-bucketed; use directly |
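The "value valid until next change" semantics map directly onto an as-of join: for any query timestamp, take the most recent event at or before it. A minimal sketch with `pandas.merge_asof`, using illustrative `ts`/`status` column names (check the Schema Reference for the actual columns):

```python
import pandas as pd

# Sparse occupancy events: each status holds until the next event
events = pd.DataFrame({
    "ts": pd.to_datetime(["2026-01-15 10:00", "2026-01-15 12:30"]),
    "status": [1, 0],
})

# What was the status at 11:45? Take the latest event at or before that time
query = pd.DataFrame({"ts": pd.to_datetime(["2026-01-15 11:45"])})
state = pd.merge_asof(query, events, on="ts", direction="backward")
print(int(state["status"].iloc[0]))  # 1 (still occupied)
```

Both frames must be sorted by the join key; `direction="backward"` implements exactly the value-persistence rule from the table above.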

Calculating Time-in-State

When analyzing event-based data, calculate duration from one event to the next:

```python
import glob

import pandas as pd

# Load occupancy events for January 2026
# (pd.read_parquet does not expand wildcards, so glob and concatenate)
files = sorted(glob.glob("measurementOccupancyStatus/2026/01/*.parquet"))
df = pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)

# Sort by device and timestamp
df = df.sort_values(["deviceId", "ts"])

# Calculate duration until the next event (per device)
df["next_ts"] = df.groupby("deviceId")["ts"].shift(-1)
df["duration"] = df["next_ts"] - df["ts"]

# For the last event of each device, assume one hour (or clip to the period end)
df["duration"] = df["duration"].fillna(pd.Timedelta(hours=1))

# Total occupied time per device
occupied_time = df[df["status"] == 1].groupby("deviceId")["duration"].sum()
```

Best Practices

  1. Don’t assume missing data — No events between timestamps means the previous value is still valid
  2. Forward-fill for aggregations — When creating time-series charts, use forward-fill to carry values across time
  3. Consider period boundaries — The state at the start of a period may come from an earlier event
  4. Use time-weighted averages — For numeric values, weight by duration between events
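Practice 2 above (forward-fill for aggregations) can be sketched by resampling sparse events onto a regular grid, so every time bucket carries the last known value. Column names are illustrative:

```python
import pandas as pd

# Sparse events for one device: occupied at 10:00, vacated at 12:30
events = pd.DataFrame({
    "ts": pd.to_datetime(["2026-01-15 10:00", "2026-01-15 12:30"]),
    "status": [1, 0],
}).set_index("ts")

# Resample to a regular 30-minute grid; gaps carry the previous value forward
grid = events.resample("30min").ffill()
print(grid["status"].tolist())  # [1, 1, 1, 1, 1, 0]
```

For practice 3, seed the first grid point of a reporting period from the last event before the period starts; otherwise the opening buckets are empty even though the state is known.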

Entity Relationships

```mermaid
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#F6FAFA', 'primaryTextColor': '#143633', 'primaryBorderColor': '#143633', 'lineColor': '#143633', 'secondaryColor': '#C7FDE6', 'tertiaryColor': '#73F9C1', 'clusterBkg': '#ffffff', 'clusterBorder': '#143633', 'edgeLabelBackground': '#ffffff'}}}%%
erDiagram
    Organization ||--o{ Device : manages
    Organization ||--o{ DeviceGroup : contains
    Organization ||--o{ Space : contains
    Organization ||--o{ Zone : contains

    Device ||--o{ DeviceKeyword : "tagged with"
    Device ||--o{ Measurements : generates

    DeviceGroup ||--o{ DeviceGroupDevices : contains
    Device ||--o{ DeviceGroupDevices : "member of"

    Space ||--o{ Space : "parent-child"
    Space ||--o{ Zone : contains
```

Try It Now

```python
import pandas as pd

# Load device data directly from our GitHub repository
url = "https://raw.githubusercontent.com/haltian/dev-demos/main/haltiansalesdemo-parquet/device/2026/01/2026_01_15_08_device.parquet"
devices = pd.read_parquet(url)
print(devices.head())
```

Quick Start

1. Request Access

Contact Haltian to enable Parquet exports. You’ll receive:

  • AWS S3 bucket credentials or IAM role configuration
  • Your organization ID (UUID)
  • Access instructions

2. Connect to S3

Configure your data platform to access the S3 bucket using the provided credentials.

3. Import Data

Download and import Parquet files into your analytics platform. Files are organized by entity type and time period.

Next Steps

Documentation

Hands-On Examples


AWS SSO Setup

Configure AWS SSO to access the Haltian IoT Parquet S3 bucket

Folder Structure

S3 bucket organization and file naming conventions

Downloading Parquet files with AWS CLI

Browse and download Haltian IoT Parquet files from S3 using the AWS CLI

Schema Reference

Complete column definitions for all Parquet entities and measurements

Accessing Parquet Files via IAM Role

Set up cross-account IAM role access to download Haltian IoT Parquet files from S3

Integration Examples

Code samples for importing Parquet data into common analytics platforms

Sample Data

Explore real Haltian IoT data from our Oulu office, available for immediate experimentation

Web Data Access

Load and analyze Parquet files directly in the browser using JavaScript