Data API (Parquet)

Bulk data export in Apache Parquet format for analytics and data lakes

Overview

The Haltian IoT Data API provides data exports in Apache Parquet format — a columnar storage format optimized for analytics workloads.

Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides high performance compression and encoding schemes to handle complex data in bulk.

Design Philosophy

The Data API is designed as intermediate data storage for customers to download and import into their own data infrastructure:

| Aspect | Approach |
| --- | --- |
| Purpose | Download and import, not query in place |
| Control | Customer-controlled processing in their preferred tools |
| Structure | Simple folder hierarchy for easy understanding |
| Partitioning | `{year}/{month}/` format (not Hive-style `year=YYYY/`) |
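The partition layout above can be built programmatically. A minimal sketch, assuming the folder pattern `{entity}/{year}/{month}/` described in the table (the `partition_prefix` helper name is illustrative, not part of the API):

```python
from datetime import datetime

def partition_prefix(entity: str, ts: datetime) -> str:
    """Build the {entity}/{year}/{month}/ folder prefix (not Hive-style)."""
    return f"{entity}/{ts:%Y/%m}/"

print(partition_prefix("device", datetime(2026, 1, 15, 8)))  # device/2026/01/
```

Because the layout is plain `{year}/{month}/` rather than Hive-style `year=YYYY/`, engines that auto-discover Hive partitions will not infer partition columns from the path; derive them from the prefix if needed.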

Supported Platforms

Data can be imported into any Parquet-compatible platform.

Export Schedule

| Setting | Value |
| --- | --- |
| Minimum frequency | Daily |
| Maximum frequency | Hourly |
| Format | Full snapshots |
| Delivery | AWS S3 bucket |
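Since each export is a full snapshot, the newest file per entity and month typically supersedes earlier ones (an assumption here; confirm retention behavior for your bucket). Timestamped file names like `2026_01_15_08_device.parquet` sort chronologically, so the latest snapshot can be picked with a plain sort:

```python
def latest_snapshot(files):
    """Return the newest snapshot path, or None if the list is empty.

    Timestamped names (YYYY_MM_DD_HH_...) sort chronologically as strings.
    """
    return max(files, default=None)

files = [
    "device/2026/01/2026_01_14_08_device.parquet",
    "device/2026/01/2026_01_15_08_device.parquet",
]
print(latest_snapshot(files))  # device/2026/01/2026_01_15_08_device.parquet
```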

Data Categories

The Data API exports three categories of data:

Entity Data

Core business entities and their relationships:

| Entity | Description |
| --- | --- |
| Organization | Organization metadata |
| Device | IoT devices and sensors |
| Device Group | Collections of related devices |
| Device Keyword | Tags and labels for devices |
| Space | Locations, buildings, floors |
| Zone | Areas within spaces |

Measurement Data

Time-series sensor data from devices:

| Measurement | Description |
| --- | --- |
| Ambient Temperature | Temperature readings (°C) |
| Battery Percentage | Battery level (0-100%) |
| Battery Voltage | Battery voltage (V) |
| Boot Count | Device restart events |
| CO₂ | Carbon dioxide (ppm) |
| Directional Movement | Entry/exit counts |
| Distance | Distance measurements |
| Occupancy Seconds | Occupancy duration |
| Occupancy Status | Occupied/unoccupied flag |
| Occupants Count | Number of people |
| Position | Device location |
| Position Zone | Zone-relative position |
| TVOC | Volatile organic compounds |

Relationship Data

Junction tables linking entities:

| Table | Description |
| --- | --- |
| Device Group Devices | Device-to-group mappings |
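A junction table like Device Group Devices resolves to group membership with a plain join. A minimal sketch using pandas; the column names (`deviceId`, `deviceGroupId`) are assumptions here, so check the Schema Reference for the actual definitions:

```python
import pandas as pd

# Illustrative frames; real column names are in the Schema Reference
devices = pd.DataFrame({"deviceId": ["d1", "d2", "d3"],
                        "name": ["Sensor A", "Sensor B", "Sensor C"]})
group_devices = pd.DataFrame({"deviceGroupId": ["g1", "g1", "g2"],
                              "deviceId": ["d1", "d2", "d3"]})

# Resolve each group's member devices via the junction table
members = group_devices.merge(devices, on="deviceId", how="left")
print(members[members["deviceGroupId"] == "g1"]["name"].tolist())
# ['Sensor A', 'Sensor B']
```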

Understanding Event-Based Measurements

Key Principles

| Principle | Description |
| --- | --- |
| Value Persistence | A measurement value is considered valid until the next event for that device. If a sensor reports `status=1` (occupied) at 10:00, that status remains true until a new event changes it. |
| No Regular Intervals | Events occur only when state changes, not at fixed time intervals. Gaps between events mean "no change", not "missing data". |
| Sparse Storage | This event-driven approach minimizes storage by recording only changes, keeping Parquet files compact and efficient. |

Measurement Types & Behavior

| Measurement | Event Trigger | Interpretation |
| --- | --- | --- |
| `measurementOccupancyStatus` | State change (occupied ↔ unoccupied) | Binary value valid until next change |
| `measurementOccupantsCount` | Count change | Last known count is current count |
| `measurementAmbientTemperature` | Periodic or threshold | Interpolate between readings |
| `measurementBatteryPercentage` | Significant change (~1%) | Last value is current state |
| `measurementOccupancySeconds` | Aggregated periodically | Already time-bucketed; use directly |
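The "value valid until next change" semantics map directly onto an as-of join: for any query timestamp, take the most recent event at or before it. A minimal sketch with `pandas.merge_asof`, using illustrative `ts`/`status` column names (check the Schema Reference for the actual columns):

```python
import pandas as pd

# Sparse occupancy events: each status holds until the next event
events = pd.DataFrame({
    "ts": pd.to_datetime(["2026-01-15 10:00", "2026-01-15 12:30"]),
    "status": [1, 0],
})

# What was the status at 11:45? Take the latest event at or before that time
query = pd.DataFrame({"ts": pd.to_datetime(["2026-01-15 11:45"])})
state = pd.merge_asof(query, events, on="ts", direction="backward")
print(int(state["status"].iloc[0]))  # 1 (still occupied)
```

Both frames must be sorted by the join key; `direction="backward"` implements exactly the value-persistence rule from the table above.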

Calculating Time-in-State

When analyzing event-based data, calculate duration from one event to the next:

```python
import glob

import pandas as pd

# Load occupancy events for January 2026
# (pd.read_parquet does not expand wildcards, so glob and concatenate)
files = sorted(glob.glob("measurementOccupancyStatus/2026/01/*.parquet"))
df = pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)

# Sort by device and timestamp
df = df.sort_values(["deviceId", "ts"])

# Calculate duration until the next event (per device)
df["next_ts"] = df.groupby("deviceId")["ts"].shift(-1)
df["duration"] = df["next_ts"] - df["ts"]

# For the last event of each device, assume one hour (or clip to the period end)
df["duration"] = df["duration"].fillna(pd.Timedelta(hours=1))

# Total occupied time per device
occupied_time = df[df["status"] == 1].groupby("deviceId")["duration"].sum()
```

Best Practices

  1. Don’t assume missing data — No events between timestamps means the previous value is still valid
  2. Forward-fill for aggregations — When creating time-series charts, use forward-fill to carry values across time
  3. Consider period boundaries — The state at the start of a period may come from an earlier event
  4. Use time-weighted averages — For numeric values, weight by duration between events
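Practice 2 above (forward-fill for aggregations) can be sketched by resampling sparse events onto a regular grid, so every time bucket carries the last known value. Column names are illustrative:

```python
import pandas as pd

# Sparse events for one device: occupied at 10:00, vacated at 12:30
events = pd.DataFrame({
    "ts": pd.to_datetime(["2026-01-15 10:00", "2026-01-15 12:30"]),
    "status": [1, 0],
}).set_index("ts")

# Resample to a regular 30-minute grid; gaps carry the previous value forward
grid = events.resample("30min").ffill()
print(grid["status"].tolist())  # [1, 1, 1, 1, 1, 0]
```

For practice 3, seed the first grid point of a reporting period from the last event before the period starts; otherwise the opening buckets are empty even though the state is known.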

Entity Relationships

```mermaid
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#F6FAFA', 'primaryTextColor': '#143633', 'primaryBorderColor': '#143633', 'lineColor': '#143633', 'secondaryColor': '#C7FDE6', 'tertiaryColor': '#73F9C1', 'clusterBkg': '#ffffff', 'clusterBorder': '#143633', 'edgeLabelBackground': '#ffffff'}}}%%
erDiagram
    Organization ||--o{ Device : manages
    Organization ||--o{ DeviceGroup : contains
    Organization ||--o{ Space : contains
    Organization ||--o{ Zone : contains

    Device ||--o{ DeviceKeyword : "tagged with"
    Device ||--o{ Measurements : generates

    DeviceGroup ||--o{ DeviceGroupDevices : contains
    Device ||--o{ DeviceGroupDevices : "member of"

    Space ||--o{ Space : "parent-child"
    Space ||--o{ Zone : contains
```

Try It Now

```python
import pandas as pd

# Load device data directly from our GitHub repository
url = "https://raw.githubusercontent.com/haltian/dev-demos/main/haltiansalesdemo-parquet/device/2026/01/2026_01_15_08_device.parquet"
devices = pd.read_parquet(url)
print(devices.head())
```

Quick Start

1. Request Access

Contact Haltian to enable Parquet exports. You’ll receive:

  • AWS S3 bucket credentials or IAM role configuration
  • Your organization ID (UUID)
  • Access instructions

2. Connect to S3

Configure your data platform to access the S3 bucket using the provided credentials.

3. Import Data

Download and import Parquet files into your analytics platform. Files are organized by entity type and time period.

Next Steps

Documentation

Hands-On Examples


AWS SSO Setup

Configure AWS SSO to access the Haltian IoT Parquet S3 bucket

Folder Structure

S3 bucket organization and file naming conventions

Downloading Parquet files with AWS CLI

Browse and download Haltian IoT Parquet files from S3 using the AWS CLI

Schema Reference

Complete column definitions for all Parquet entities and measurements

Accessing Parquet Files via IAM Role

Set up cross-account IAM role access to download Haltian IoT Parquet files from S3

Integration Examples

Code samples for importing Parquet data into common analytics platforms

Sample Data

Explore real Haltian IoT data from our Oulu office, available for immediate experimentation

Web Data Access

Load and analyze Parquet files directly in the browser using JavaScript