Data API (Parquet)

Bulk data export in Apache Parquet format for analytics and data lakes

Overview

The Haltian IoT Data API provides organization-wide data exports in Apache Parquet format, a columnar storage format optimized for analytics workloads.

Apache Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. It provides high-performance compression and encoding schemes for handling complex data in bulk.
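Parquet files are self-describing. According to the Apache Parquet format specification, every file begins and ends with the 4-byte magic string `PAR1`, and the length of the footer metadata is stored as a 4-byte little-endian integer just before the trailing magic. The minimal sketch below relies only on that layout to sanity-check a downloaded file before import. The function names `footer_length` and `looks_like_parquet` are illustrative and are not part of the Data API.

```python
import struct

MAGIC = b"PAR1"  # 4-byte magic at the start and end of every Parquet file


def footer_length(data: bytes) -> int:
    """Return the footer metadata length of a Parquet file, or raise ValueError.

    File layout (Apache Parquet spec):
    PAR1 | column chunks | footer | footer length (4 bytes, little-endian) | PAR1
    """
    # Leading magic + 4-byte length field + trailing magic = 12 bytes minimum
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    (length,) = struct.unpack("<I", data[-8:-4])
    # The footer must fit between the leading magic and the length field
    if length > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return length


def looks_like_parquet(data: bytes) -> bool:
    """Cheap structural check; it does not parse or validate the footer itself."""
    try:
        footer_length(data)
        return True
    except ValueError:
        return False
```

For example, `looks_like_parquet(open(path, "rb").read())` flags truncated or non-Parquet downloads. A real import should still use a Parquet reader such as Apache Arrow, which validates the full metadata.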

Design Philosophy

The Data API is designed as intermediate data storage for customers to download and import into their own data infrastructure:

| Aspect | Approach |
|---|---|
| Purpose | Download and import, not query in place |
| Control | Customer-controlled processing in their preferred tools |
| Structure | Simple folder hierarchy for easy understanding |
| Partitioning | {year}/{month}/ format (not Hive-style year=YYYY/) |
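Because partitions use plain {year}/{month}/ folders, query engines that expect Hive-style directories need the paths translated, or an explicit partition schema. The sketch below builds a partition prefix and rewrites a key into Hive style. It assumes that the year has four digits and that the month is zero-padded to two digits. Both assumptions are unverified, so check them against the actual folder structure. The example key `device/2024/03/part-0.parquet` is hypothetical.

```python
def partition_prefix(year: int, month: int) -> str:
    """Build a {year}/{month}/ prefix. Zero-padding the month is an ASSUMPTION."""
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    return f"{year:04d}/{month:02d}/"


def to_hive_style(path: str) -> str:
    """Rewrite '.../{year}/{month}/...' as '.../year=YYYY/month=MM/...'.

    Treats the first pair of segments that look like a 4-digit year followed by
    a 1- or 2-digit month as the partition. Other paths are returned unchanged.
    """
    parts = path.split("/")
    for i in range(len(parts) - 1):
        year, month = parts[i], parts[i + 1]
        if (len(year) == 4 and year.isdigit()
                and 1 <= len(month) <= 2 and month.isdigit()):
            parts[i], parts[i + 1] = f"year={year}", f"month={month}"
            break
    return "/".join(parts)
```

Translating paths only on your side keeps the export layout as delivered, which matches the download-and-import design.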

Supported Platforms

Data can be imported into any Parquet-compatible platform.

Export Schedule

| Setting | Value |
|---|---|
| Minimum frequency | Daily |
| Maximum frequency | Hourly |
| Format | Organization-wide snapshots |
| Delivery | AWS S3 bucket |
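If you schedule your own import job, you can check that the agreed export interval falls within the supported range and work out how many snapshots to expect per day. The bounds below come from the schedule table: at most hourly and at least daily. The helper names are hypothetical.

```python
from datetime import timedelta

# Bounds from the export schedule table
MIN_INTERVAL = timedelta(hours=1)  # maximum frequency: hourly
MAX_INTERVAL = timedelta(days=1)   # minimum frequency: daily


def is_supported_interval(interval: timedelta) -> bool:
    """True if an export interval lies between hourly and daily, inclusive."""
    return MIN_INTERVAL <= interval <= MAX_INTERVAL


def exports_per_day(interval: timedelta) -> int:
    """Number of snapshots to expect per day for a supported interval."""
    if not is_supported_interval(interval):
        raise ValueError(f"unsupported export interval: {interval}")
    # timedelta / timedelta yields a float; truncate to whole exports
    return int(timedelta(days=1) / interval)
```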

Data Categories

The Data API exports three categories of data:

Entity Data

Core business entities and their relationships:

| Entity | Description |
|---|---|
| Organization | Organization metadata |
| Device | IoT devices and sensors |
| Device Group | Collections of related devices |
| Device Keyword | Tags and labels for devices |
| Space | Locations, buildings, floors |
| Zone | Areas within spaces |

Measurement Data

Time-series sensor data from devices:

| Measurement | Description |
|---|---|
| Ambient Temperature | Temperature readings (°C) |
| Battery Percentage | Battery level (0-100%) |
| Battery Voltage | Battery voltage (V) |
| Boot Count | Device restart events |
| CO₂ | Carbon dioxide (ppm) |
| Directional Movement | Entry/exit counts |
| Distance | Distance measurements |
| Occupancy Seconds | Occupancy duration (s) |
| Occupancy Status | Occupied/unoccupied flag |
| Occupants Count | Number of people |
| Position | Device location |
| Position Zone | Zone-relative position |
| TVOC | Total volatile organic compounds |
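Measurement exports are time series, with one reading per row. A common first step after import is to downsample them, for example to hourly averages per device. In practice you would do this in SQL or a dataframe library. The plain-Python sketch below only shows the grouping logic. The tuple layout `(device_id, measured_at, value)` is a hypothetical stand-in for the real columns listed in the Schema Reference.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# HYPOTHETICAL row shape: (device_id, measured_at, value)
Reading = Tuple[str, datetime, float]


def hourly_means(readings: Iterable[Reading]) -> Dict[Tuple[str, datetime], float]:
    """Average readings per device per clock hour (timestamps truncated to the hour)."""
    buckets: Dict[Tuple[str, datetime], List[float]] = defaultdict(list)
    for device_id, measured_at, value in readings:
        hour = measured_at.replace(minute=0, second=0, microsecond=0)
        buckets[(device_id, hour)].append(value)
    return {key: mean(values) for key, values in buckets.items()}
```

The same pattern applies to any numeric measurement, such as CO₂ or battery voltage. Flags and counts, such as occupancy status or directional movement, usually call for a different aggregate, such as a sum or the share of time occupied.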

Relationship Data

Junction tables linking entities:

| Table | Description |
|---|---|
| Device Group Devices | Device-to-group mappings |
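A junction table models a many-to-many relationship: a device can belong to several groups, and a group can contain many devices. The sketch below indexes junction rows in both directions after import. The column pair `(device_group_id, device_id)` is an assumption, so consult the Schema Reference for the actual column names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def index_memberships(
    rows: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Index ASSUMED (device_group_id, device_id) rows both ways.

    Returns (devices_by_group, groups_by_device).
    """
    devices_by_group: Dict[str, Set[str]] = defaultdict(set)
    groups_by_device: Dict[str, Set[str]] = defaultdict(set)
    for group_id, device_id in rows:
        devices_by_group[group_id].add(device_id)
        groups_by_device[device_id].add(group_id)
    return dict(devices_by_group), dict(groups_by_device)
```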

Entity Relationships

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#73F9C1', 'primaryTextColor': '#143633', 'primaryBorderColor': '#143633', 'lineColor': '#143633', 'secondaryColor': '#F6FAFA', 'tertiaryColor': '#ffffff', 'background': '#ffffff', 'mainBkg': '#73F9C1', 'secondBkg': '#F6FAFA' }}}%%
erDiagram
    Organization ||--o{ Device : manages
    Organization ||--o{ DeviceGroup : contains
    Organization ||--o{ Space : contains
    Organization ||--o{ Zone : contains

    Device ||--o{ DeviceKeyword : "tagged with"
    Device ||--o{ Measurements : generates

    DeviceGroup ||--o{ DeviceGroupDevices : contains
    Device ||--o{ DeviceGroupDevices : "member of"

    Space ||--o{ Space : "parent-child"
    Space ||--o{ Zone : contains
```
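The self-referencing Space relationship forms a hierarchy, such as building, then floor, then room. To report at building level, you typically walk each space up to its root. This sketch assumes that you have built a mapping from each space ID to its parent ID (`None` for a root) from a parent column in the Space export. That column, and the IDs in the usage line, are hypothetical.

```python
from typing import Dict, List, Optional


def space_path(space_id: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return space IDs from the root down to space_id.

    parent_of maps space ID -> parent space ID (None for a root space).
    IDs missing from parent_of are treated as roots.
    """
    path: List[str] = []
    seen = set()
    current: Optional[str] = space_id
    while current is not None:
        # Guard against malformed data that loops back on itself
        if current in seen:
            raise ValueError(f"cycle detected at space {current}")
        seen.add(current)
        path.append(current)
        current = parent_of.get(current)
    path.reverse()
    return path
```

For example, `space_path("room-201", {"building": None, "floor-2": "building", "room-201": "floor-2"})` gives the room's full ancestry, starting from the building.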

Quick Start

1. Request Access

Contact Haltian to enable Parquet exports for your organization. You’ll receive:

  • AWS S3 bucket credentials or IAM role configuration
  • Your organization ID (UUID)
  • Access instructions
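The organization ID is delivered as a UUID. Validating it once when you configure your import pipeline catches copy-paste errors early and gives you a canonical lower-case form for filtering rows. This is a minimal sketch using Python's standard `uuid` module. The helper name is hypothetical.

```python
import uuid


def parse_org_id(value: str) -> uuid.UUID:
    """Validate an organization ID string and return it as a UUID object."""
    try:
        # uuid.UUID raises ValueError for malformed hexadecimal strings
        return uuid.UUID(value.strip())
    except ValueError as exc:
        raise ValueError(f"not a valid organization ID: {value!r}") from exc
```

`str(parse_org_id(...))` returns the canonical hyphenated, lower-case form.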

2. Connect to S3

Configure your data platform to access the S3 bucket using the provided credentials.
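Once credentials are in place, the first task is usually to list the exported files. S3 returns at most 1,000 keys per `ListObjectsV2` call, so the listing must follow pagination. This sketch expects a boto3 S3 client, whose `get_paginator("list_objects_v2")` method handles continuation tokens. The bucket name and prefix are placeholders for the values Haltian provides. Filtering on the `.parquet` suffix is an assumption about file naming.

```python
from typing import Iterator


def iter_parquet_keys(s3_client, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every key ending in .parquet under prefix, across all result pages.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no "Contents" entry when nothing matches the prefix
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(".parquet"):  # ASSUMPTION: files use this suffix
                yield key
```

Usage, with placeholder names: `for key in iter_parquet_keys(boto3.client("s3"), "your-export-bucket", "your-prefix/"): print(key)`. Passing the client in, rather than creating it inside the function, lets the same code work with an assumed IAM role, explicit credentials, or a test double.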

3. Import Data

Download and import Parquet files into your analytics platform. Files are organized by entity type and time period.
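Because exports are refreshed up to hourly, repeated imports should fetch only new or changed files. This sketch compares each object's ETag, as returned in the S3 `ListObjectsV2` response, with the ETags recorded at your last download. It also mirrors the S3 folder layout locally. It assumes that snapshots may be rewritten under an existing key, so it compares ETags rather than keys alone. The key names in the test data are hypothetical.

```python
from pathlib import Path, PurePosixPath
from typing import Dict, List, Tuple


def plan_downloads(
    listed: Dict[str, str],
    downloaded: Dict[str, str],
    dest_dir: Path,
) -> List[Tuple[str, Path]]:
    """Return (key, local_path) pairs for objects that are new or changed.

    listed maps S3 key -> ETag from the current listing.
    downloaded maps S3 key -> ETag recorded when the file was last fetched.
    """
    plan = []
    for key in sorted(listed):
        if downloaded.get(key) == listed[key]:
            continue  # same key and same content: already imported
        rel = PurePosixPath(key)
        # Refuse keys that would escape dest_dir when mirrored locally
        if rel.is_absolute() or ".." in rel.parts:
            raise ValueError(f"unsafe object key: {key}")
        plan.append((key, dest_dir.joinpath(*rel.parts)))
    return plan
```

After each successful download, record the file's ETag in `downloaded` so that the next run skips it.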

Next Steps

  • Folder Structure: S3 bucket organization and file naming conventions
  • Schema Reference: complete column definitions for all Parquet entities and measurements
  • Integration Examples: code samples for importing Parquet data into common analytics platforms
  • Azure Integration: native Azure integration for importing Parquet files from AWS S3