Migration Guide: Old to New Parquet Exporter
This page describes the differences between the old and new Parquet exporters. It covers breaking changes that require updates to your data pipelines, and new capabilities you can take advantage of.
Breaking Changes
The following changes affect all existing pipelines that read from the Parquet S3 bucket. You must update your S3 paths and column references before switching to the new exporter output.
1. S3 Path Prefix Changed
The folder structure under the S3 bucket has changed. A schema version prefix (v1) has been inserted after parquet/.
| Exporter | Path pattern |
|---|---|
| Old | parquet/{organizationId}/{dataType}/{YYYY}/{MM}/{YYYY_MM_DD_HH}_{dataType}.parquet |
| New | parquet/v1/{organizationId}/{dataType}/{YYYY}/{MM}/{YYYY_MM_DD_HHMM}_{YYYY_MM_DD_HHMM}_{dataType}.parquet |
Notable differences:
- v1/ segment added after parquet/
- Filename now uses HHMM precision and includes both start and end timestamps instead of just the start hour
Action: Update all S3 path references, glob patterns, and Athena/Glue table locations in your pipelines.
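As a starting point for updating path handling, the following minimal sketch parses object keys in both formats using regular expressions derived from the path patterns above. The function name `parse_key` and the returned dictionary layout are illustrative, and the sketch assumes the timestamps in file names are UTC, which this guide does not state explicitly.

```python
import re
from datetime import datetime, timezone

# Regular expressions derived from the old and new path patterns.
OLD_KEY = re.compile(
    r"^parquet/(?P<org>[^/]+)/(?P<type>[^/]+)/\d{4}/\d{2}/"
    r"(?P<start>\d{4}_\d{2}_\d{2}_\d{2})_(?P=type)\.parquet$"
)
NEW_KEY = re.compile(
    r"^parquet/v1/(?P<org>[^/]+)/(?P<type>[^/]+)/\d{4}/\d{2}/"
    r"(?P<start>\d{4}_\d{2}_\d{2}_\d{4})_(?P<end>\d{4}_\d{2}_\d{2}_\d{4})"
    r"_(?P=type)\.parquet$"
)


def _ts(text, fmt):
    # Assumption: file name timestamps are in UTC.
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def parse_key(key):
    """Split an S3 object key into organization, data type, and time window."""
    m = NEW_KEY.match(key)
    if m:
        return {
            "version": "v1",
            "org": m["org"],
            "type": m["type"],
            "start": _ts(m["start"], "%Y_%m_%d_%H%M"),
            "end": _ts(m["end"], "%Y_%m_%d_%H%M"),
        }
    m = OLD_KEY.match(key)
    if m:
        return {
            "version": None,  # old keys carry no schema version
            "org": m["org"],
            "type": m["type"],
            "start": _ts(m["start"], "%Y_%m_%d_%H"),
            "end": None,  # old file names only encode the start hour
        }
    raise ValueError(f"unrecognized key: {key}")
```

A parser like this can help a pipeline accept both layouts during the transition, rather than switching every glob pattern at once.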
2. Column Names and Types Changed
Timestamp columns — all entities except deviceKeywords and deviceNote:
| Old column | Type | New column | Type |
|---|---|---|---|
| createdTs | timestamp[ms, tz=UTC] | createdAt | timestamp[ms, tz=UTC] |
| updatedTs | timestamp[ms, tz=UTC] | updatedAt | timestamp[ms, tz=UTC] |
Measurement timestamp column — all measurement types:
| Old column | Type | New column | Type |
|---|---|---|---|
| ts | timestamp[ms, tz=UTC] | measuredAt | timestamp[ms, tz=UTC] |
Entity-specific renames:
| Entity | Old column | Type | New column | Type |
|---|---|---|---|---|
| device | model | string | modelType | string |
| device | bluetoothMac | string | bleMac | string |
| device | bluetoothMac2 | string | bleMac2 | string |
| deviceGroup | model | string | modelType | string |
| space | parentSpaceId | string | parentId | string |
| measurementOccupancyStatus | status | int8 | isOccupied | bool |
Columns removed:
organizationId has been removed from all files (both entity and measurement types). To resolve an organization, join on id in organization.parquet.
| Entity | Removed column |
|---|---|
| device | installationStatus |
| deviceGroup | installationStatus |
Action: Update all column references in queries, schemas, and downstream transformations. See the Schema Reference for the full column definitions of every entity and measurement type.
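The renames and removals listed above can be captured in a small lookup table. The sketch below is one possible way to translate a list of old column names into the new ones; the function name `migrate_columns` is hypothetical, and it assumes that measurement types can be recognized by the `measurement` prefix in their name. It only handles names: the type change from int8 `status` to bool `isOccupied` still has to be handled wherever values are read.

```python
# Entity-specific renames, copied from the tables above.
ENTITY_RENAMES = {
    "device": {"model": "modelType", "bluetoothMac": "bleMac", "bluetoothMac2": "bleMac2"},
    "deviceGroup": {"model": "modelType"},
    "space": {"parentSpaceId": "parentId"},
    "measurementOccupancyStatus": {"status": "isOccupied"},
}

# Entities excluded from the createdTs/updatedTs rename.
TIMESTAMP_EXCEPTIONS = {"deviceKeywords", "deviceNote"}

# Columns removed per entity; organizationId is removed from every file.
ENTITY_REMOVED = {
    "device": {"installationStatus"},
    "deviceGroup": {"installationStatus"},
}


def migrate_columns(data_type, columns):
    """Return the new column names for a list of old column names."""
    renames = {}
    # Assumption: measurement types are identified by their name prefix.
    if data_type.startswith("measurement"):
        renames["ts"] = "measuredAt"
    elif data_type not in TIMESTAMP_EXCEPTIONS:
        renames.update({"createdTs": "createdAt", "updatedTs": "updatedAt"})
    renames.update(ENTITY_RENAMES.get(data_type, {}))

    removed = {"organizationId"} | ENTITY_REMOVED.get(data_type, set())
    return [renames.get(c, c) for c in columns if c not in removed]
```

For example, `migrate_columns("space", ["id", "parentSpaceId", "createdTs"])` yields `["id", "parentId", "createdAt"]`.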
What Changed
| Dimension | Old exporter | New exporter |
|---|---|---|
| Export config | Fixed per-deployment — requires a configuration change to add/remove exporters | Dynamic — configured via GraphQL API |
| Scheduling | Fixed: every hour | Configurable schedule; metadata and measurement exports run on independent intervals |
| Metadata vs. measurement interval | Single interval for all data types | Separate configurable intervals for metadata snapshots and measurements |
| Minimum interval | Fixed - 1 hour | 15 minutes |
| Maximum interval | Fixed - 1 hour | 24 hours |
| S3 path format | parquet/{org}/{type}/{YYYY}/{MM}/{YYYY_MM_DD_HH}_{type}.parquet | parquet/v1/{org}/{type}/{YYYY}/{MM}/{YYYY_MM_DD_HHMM}_{YYYY_MM_DD_HHMM}_{type}.parquet |
| Column naming | camelCase with Ts suffix for timestamps, ts for measurement time | camelCase with At suffix for timestamps, measuredAt for measurement time |
| Metadata deduplication | None | Metadata exports skip S3 write when data is unchanged; contentHash embedded in file footer. Measurement exports always write a file when there are new measurements. |
| Parquet file footer metadata | None | Row count, schema version, data type description, content description in all files; intervalStart/intervalEnd in measurement files; snapshotAt and contentHash in metadata files |
New Capabilities
Configurable Export Intervals
You can set how often metadata and measurements are exported, from every 15 minutes up to once per day. Changes take effect within a couple of minutes, so you can easily adjust the frequency based on your needs and data change patterns.
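If you generate interval settings programmatically, a client-side range check can catch invalid values before they are submitted. The sketch below assumes both bounds are inclusive; the function name `check_interval` is hypothetical and not part of the exporter's API.

```python
from datetime import timedelta

# Bounds taken from this guide; treating them as inclusive is an assumption.
MIN_INTERVAL = timedelta(minutes=15)
MAX_INTERVAL = timedelta(hours=24)


def check_interval(interval):
    """Return the interval unchanged, or raise if it is out of range."""
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError(
            f"interval {interval} is outside {MIN_INTERVAL} to {MAX_INTERVAL}"
        )
    return interval
```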
Automatic Catch-Up for Missed Windows
If the exporter is temporarily unavailable, it automatically detects and reruns missed measurement export windows when it comes back online. No manual intervention is required, and short outages do not create data gaps.
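To illustrate the idea of catch-up, the sketch below lists the complete windows that lie between the end of the last successful export and the current time. This is not the exporter's actual implementation, which this guide does not describe; the function name `missed_windows` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def missed_windows(last_end, now, interval):
    """List (start, end) pairs for every complete window between last_end and now."""
    windows = []
    start = last_end
    # Only windows that have fully elapsed are included.
    while start + interval <= now:
        windows.append((start, start + interval))
        start += interval
    return windows


last_end = datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)
now = datetime(2024, 5, 1, 10, 50, tzinfo=timezone.utc)
pending = missed_windows(last_end, now, timedelta(minutes=15))
```

With a 15-minute interval and a 50-minute outage, `pending` holds three windows; the partial window starting at 10:45 is not yet complete and is left for the next run.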
Metadata Deduplication
For metadata snapshots (devices, spaces, zones, etc.), the exporter computes a content hash of the query result. If the data is identical to the previous export, the S3 write is skipped. This reduces unnecessary S3 operations and storage costs when entity data has not changed. In some cases, such as after a service restart, a file is written regardless of content changes to ensure the latest state is captured. Every exported metadata file includes the contentHash in its Parquet footer metadata, so downstream consumers can determine whether the data actually changed between two exports by comparing hashes.
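The hash-and-skip decision can be sketched as follows. How the exporter serializes the query result before hashing, and whether the hash is stored as a hex string, are not specified in this guide, so the byte payload and hex digest here are assumptions, and the function names are hypothetical.

```python
import hashlib


def content_hash(payload: bytes) -> str:
    """SHA-256 of the serialized data, as a hex string (encoding is an assumption)."""
    return hashlib.sha256(payload).hexdigest()


def should_write(payload, previous_hash, force=False):
    """Write when forced (e.g. after a restart) or when the content changed."""
    return force or content_hash(payload) != previous_hash
```

Downstream consumers can apply the same comparison in reverse: if two consecutive metadata files carry the same contentHash, the second one can be skipped without reading its rows.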
Parquet File Footer Metadata
Every exported file now embeds structured metadata in the Parquet file footer, readable by any Parquet-compatible tool:
| Key | Description |
|---|---|
| queryName | Data type name (e.g. measurementCO2) |
| queryType | metadata or measurements |
| organizationId | Organization UUID |
| schemaVersion | Schema version label (e.g. v1) |
| exportedAt | ISO 8601 UTC timestamp of export |
| rowCount | Number of rows in the file |
| intervalStart / intervalEnd | Measurement window boundaries (measurements only) |
| snapshotAt | Snapshot time (metadata only) |
| contentHash | SHA-256 of the data (metadata only) |
| queryDescription | Human-readable description of the data type |
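One way to read these keys is with pyarrow, whose `pyarrow.parquet.read_metadata` function returns the file metadata without loading any rows. The sketch below assumes pyarrow is installed and that the exporter stores the keys as UTF-8 text; the helper names `decode_footer` and `read_footer` are illustrative.

```python
def decode_footer(raw):
    """Convert a bytes-to-bytes key/value mapping into a str-to-str dict."""
    # pyarrow returns None when a file has no key/value metadata.
    return {k.decode("utf-8"): v.decode("utf-8") for k, v in (raw or {}).items()}


def read_footer(path):
    """Read the footer key/value metadata of a local Parquet file."""
    # Imported lazily so decode_footer stays usable without pyarrow.
    import pyarrow.parquet as pq

    return decode_footer(pq.read_metadata(path).metadata)
```

Calling `read_footer("export.parquet").get("contentHash")` on a metadata file would then return the stored hash, where `export.parquet` stands in for a downloaded file. The result may also contain keys added by the Parquet writer itself, so look up the keys you need rather than iterating over all of them.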
New Data Type: measurementSystemTemperature
The new exporter adds measurementSystemTemperature, which was not exported by the old exporter. It contains device internal (system) temperature measurements.
| Column | Type | Description |
|---|---|---|
| deviceId | string | Device UUID |
| measuredAt | timestamp[ms, tz=UTC] | Time of measurement |
| systemTemperature | float | Device internal temperature in °C |