This API is in alpha. It is under active development and features and schema may change significantly based on testing and feedback.
The Haltian IoT Data API provides data exports in Apache Parquet format — a columnar storage format optimized for analytics workloads.
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides high performance compression and encoding schemes to handle complex data in bulk.
The Data API is designed as intermediate data storage for customers to download and import into their own data infrastructure:
| Aspect | Approach |
|---|---|
| Purpose | Download and import, not query in place |
| Control | Customer-controlled processing in their preferred tools |
| Structure | Simple folder hierarchy for easy understanding |
| Partitioning | {year}/{month}/ format (not Hive-style year=YYYY/) |
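Because the partition layout is a plain `{year}/{month}/` hierarchy rather than Hive-style `key=value` folders, prefixes can be built with simple string formatting. A minimal sketch (the `device` entity folder name follows the sample dataset's layout; `partition_prefix` is an illustrative helper, not part of the API):

```python
from datetime import datetime, timezone

def partition_prefix(entity: str, ts: datetime) -> str:
    """Build the flat {year}/{month}/ prefix (not Hive-style year=YYYY/)."""
    return f"{entity}/{ts.year:04d}/{ts.month:02d}/"

# e.g. the prefix holding device snapshots for January 2026
print(partition_prefix("device", datetime(2026, 1, 15, tzinfo=timezone.utc)))
# → device/2026/01/
```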
Data can be imported into any Parquet-compatible platform. Exports are delivered as follows:
| Setting | Value |
|---|---|
| Minimum frequency | Daily |
| Maximum frequency | Hourly |
| Format | Full snapshots |
| Delivery | AWS S3 bucket |
Parquet exports are not real-time. New files are delivered at configurable intervals, from hourly at the most frequent to daily at the least frequent.
The Data API exports three categories of data:
Core business entities and their relationships:
| Entity | Description |
|---|---|
| Organization | Organization metadata |
| Device | IoT devices and sensors |
| Device Group | Collections of related devices |
| Device Keyword | Tags and labels for devices |
| Space | Locations, buildings, floors |
| Zone | Areas within spaces |
Time-series sensor data from devices:
| Measurement | Description |
|---|---|
| Ambient Temperature | Temperature readings (°C) |
| Battery Percentage | Battery level (0-100%) |
| Battery Voltage | Battery voltage (V) |
| Boot Count | Device restart events |
| CO₂ | Carbon dioxide (ppm) |
| Directional Movement | Entry/exit counts |
| Distance | Distance measurements |
| Occupancy Seconds | Occupancy duration |
| Occupancy Status | Occupied/unoccupied flag |
| Occupants Count | Number of people |
| Position | Device location |
| Position Zone | Zone-relative position |
| TVOC | Volatile organic compounds |
Junction tables linking entities:
| Table | Description |
|---|---|
| Device Group Devices | Device-to-group mappings |
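Junction tables resolve many-to-many relationships with two joins. A minimal sketch on synthetic frames (the `groupId`, `groupName`, and `name` column names here are illustrative assumptions; `deviceId` appears in the document's own examples):

```python
import pandas as pd

# Synthetic frames mirroring the junction-table pattern (illustrative columns)
devices = pd.DataFrame({"deviceId": ["d1", "d2"], "name": ["Sensor A", "Sensor B"]})
groups = pd.DataFrame({"groupId": ["g1"], "groupName": ["Floor 1"]})
group_devices = pd.DataFrame({"groupId": ["g1", "g1"], "deviceId": ["d1", "d2"]})

# Resolve the many-to-many relationship: junction -> devices -> groups
resolved = group_devices.merge(devices, on="deviceId").merge(groups, on="groupId")
print(resolved[["groupName", "name"]])
```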
Parquet measurement data is event-based, not time-series sampled at regular intervals. A measurement value remains valid until the next event occurs.
| Principle | Description |
|---|---|
| Value Persistence | A measurement value is considered valid until the next event for that device. If a sensor reports status=1 (occupied) at 10:00, that status remains true until a new event changes it. |
| No Regular Intervals | Events occur only when state changes, not at fixed time intervals. Gaps between events mean “no change” — not “missing data”. |
| Sparse Storage | This event-driven approach minimizes storage by only recording changes, making Parquet files compact and efficient. |
| Measurement | Event Trigger | Interpretation |
|---|---|---|
| `measurementOccupancyStatus` | State change (occupied ↔ unoccupied) | Binary value valid until next change |
| `measurementOccupantsCount` | Count change | Last known count is current count |
| `measurementAmbientTemperature` | Periodic or threshold | Interpolate between readings |
| `measurementBatteryPercentage` | Significant change (~1%) | Last value is current state |
| `measurementOccupancySeconds` | Aggregated periodically | Already time-bucketed; use directly |
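The "value valid until next event" semantics map naturally onto forward filling in pandas. A minimal sketch on synthetic data (real files carry at least a device identifier and timestamp; the inline events here are invented for illustration):

```python
import pandas as pd

# Synthetic occupancy events for one device (illustrative values)
events = pd.DataFrame({
    "ts": pd.to_datetime(["2026-01-15 10:00", "2026-01-15 12:30", "2026-01-15 13:00"]),
    "status": [1, 0, 1],
})

# Expand sparse events onto a regular 15-minute grid: each value
# stays valid until the next event (forward fill)
grid = events.set_index("ts")["status"].resample("15min").ffill()
print(grid.head())
```

Gaps between events become repeated values on the grid, which matches the "no change, not missing data" interpretation above.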
When analyzing event-based data, calculate duration from one event to the next:

```python
import glob
import pandas as pd

# Load occupancy events for January 2026
# (pd.read_parquet does not expand globs, so expand them first)
files = glob.glob("measurementOccupancyStatus/2026/01/*.parquet")
df = pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)

# Sort by device and timestamp (ts is assumed to be a datetime64 column)
df = df.sort_values(["deviceId", "ts"])

# Calculate duration until the next event (per device)
df["next_ts"] = df.groupby("deviceId")["ts"].shift(-1)
df["duration"] = df["next_ts"] - df["ts"]

# For the last event of each device, assume 1 hour (or clip to the period end)
df["duration"] = df["duration"].fillna(pd.Timedelta(hours=1))

# Calculate total occupied time per device
occupied_time = df[df["status"] == 1].groupby("deviceId")["duration"].sum()
```
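The same duration logic yields a utilization figure. A worked sketch on synthetic events for a single device (the inline data is invented; `deviceId`, `ts`, and `status` are the column names used in the example above):

```python
import pandas as pd

# Synthetic occupancy events for one device (illustrative values)
df = pd.DataFrame({
    "deviceId": ["dev-1"] * 3,
    "ts": pd.to_datetime(["2026-01-15 09:00", "2026-01-15 10:00", "2026-01-15 11:30"]),
    "status": [1, 0, 1],
})
df = df.sort_values(["deviceId", "ts"])

# Each event's duration runs until the next event; assume 1h for the last one
df["next_ts"] = df.groupby("deviceId")["ts"].shift(-1)
df["duration"] = (df["next_ts"] - df["ts"]).fillna(pd.Timedelta(hours=1))

# Occupied time as a share of the observed window
occupied = df.loc[df["status"] == 1, "duration"].sum()
total = df["duration"].sum()
print(f"utilization: {occupied / total:.0%}")
# → utilization: 57%
```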
```mermaid
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#F6FAFA', 'primaryTextColor': '#143633', 'primaryBorderColor': '#143633', 'lineColor': '#143633', 'secondaryColor': '#C7FDE6', 'tertiaryColor': '#73F9C1', 'clusterBkg': '#ffffff', 'clusterBorder': '#143633', 'edgeLabelBackground': '#ffffff'}}}%%
erDiagram
    Organization ||--o{ Device : manages
    Organization ||--o{ DeviceGroup : contains
    Organization ||--o{ Space : contains
    Organization ||--o{ Zone : contains
    Device ||--o{ DeviceKeyword : "tagged with"
    Device ||--o{ Measurements : generates
    DeviceGroup ||--o{ DeviceGroupDevices : contains
    Device ||--o{ DeviceGroupDevices : "member of"
    Space ||--o{ Space : "parent-child"
    Space ||--o{ Zone : contains
```

Explore real Haltian IoT data immediately! We provide a complete sample dataset from our Oulu office that you can access directly from your browser or any Parquet-compatible tool.
```python
import pandas as pd

# Load device data directly from our GitHub repository
url = "https://raw.githubusercontent.com/haltian/dev-demos/main/haltiansalesdemo-parquet/device/2026/01/2026_01_15_08_device.parquet"
devices = pd.read_parquet(url)
print(devices.head())
```
1. Contact Haltian to enable Parquet exports and receive the S3 bucket details and access credentials.
2. Configure your data platform to access the S3 bucket using the provided credentials.
3. Download and import Parquet files into your analytics platform. Files are organized by entity type and time period.
Related documentation:

- Configure AWS SSO to access the Haltian IoT Parquet S3 bucket
- S3 bucket organization and file naming conventions
- Browse and download Haltian IoT Parquet files from S3 using the AWS CLI
- Complete column definitions for all Parquet entities and measurements
- Set up cross-account IAM role access to download Haltian IoT Parquet files from S3
- Code samples for importing Parquet data into common analytics platforms
- Explore real Haltian IoT data from our Oulu office, available for immediate experimentation
- Load and analyze Parquet files directly in the browser using JavaScript