Migration Guide: Old to New Parquet Exporter

What changed between the old and new Parquet exporter, and what action is required

This page describes the differences between the old and new Parquet exporter. It covers breaking changes that require updates to your data pipelines, and new capabilities you can take advantage of.

Breaking Changes

Action required

The following changes affect all existing pipelines that read from the Parquet S3 bucket. You must update your S3 paths and column references before switching to the new exporter output.

1. S3 Path Prefix Changed

The folder structure under the S3 bucket has changed. A schema version prefix (v1) has been inserted after parquet/.

	Path pattern
Old	`parquet/{organizationId}/{dataType}/{YYYY}/{MM}/{YYYY_MM_DD_HH}_{dataType}.parquet`
New	`parquet/v1/{organizationId}/{dataType}/{YYYY}/{MM}/{YYYY_MM_DD_HHMM}_{YYYY_MM_DD_HHMM}_{dataType}.parquet`

Notable differences:

v1/ segment added after parquet/
Filename now uses HHMM precision and includes both start and end timestamps instead of just the start hour

Action: Update all S3 path references, glob patterns, and Athena/Glue table locations in your pipelines.
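As a starting point for updating path handling, the following minimal sketch rewrites an old-style prefix and parses a new-style object key. The helper names `to_new_prefix` and `parse_new_key` are hypothetical. The sketch also assumes that the first timestamp in the filename is the window start and the second is the window end; the timezone of these timestamps is not stated here, so they are returned as naive datetimes.

```python
import re
from datetime import datetime

# Mirrors the new path pattern from the table above. The same dataType must
# appear both as a folder and at the end of the filename.
NEW_KEY_RE = re.compile(
    r"^parquet/v1/(?P<org>[^/]+)/(?P<data_type>[^/]+)/\d{4}/\d{2}/"
    r"(?P<first>\d{4}_\d{2}_\d{2}_\d{4})_(?P<second>\d{4}_\d{2}_\d{2}_\d{4})"
    r"_(?P=data_type)\.parquet$"
)
TS_FORMAT = "%Y_%m_%d_%H%M"


def to_new_prefix(old_prefix: str) -> str:
    """Insert the v1/ segment after parquet/ in an old-style prefix.

    Intended for prefixes such as table locations. Full object keys cannot be
    mapped one-to-one because the filename format changed as well.
    """
    if old_prefix.startswith("parquet/v1/"):
        return old_prefix  # already migrated
    if not old_prefix.startswith("parquet/"):
        raise ValueError(f"not a Parquet export path: {old_prefix!r}")
    return "parquet/v1/" + old_prefix[len("parquet/"):]


def parse_new_key(key: str) -> dict:
    """Split a new-style object key into its components."""
    match = NEW_KEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a new-style export key: {key!r}")
    return {
        "organization_id": match["org"],
        "data_type": match["data_type"],
        # Assumption: first timestamp = window start, second = window end.
        "start": datetime.strptime(match["first"], TS_FORMAT),
        "end": datetime.strptime(match["second"], TS_FORMAT),
    }
```

For example, `to_new_prefix("parquet/org-123/device/")` returns `"parquet/v1/org-123/device/"`.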

2. Column Names and Types Changed

Timestamp columns — all entities except deviceKeywords and deviceNote:

Old column	Type	New column	Type
`createdTs`	`timestamp[ms, tz=UTC]`	`createdAt`	`timestamp[ms, tz=UTC]`
`updatedTs`	`timestamp[ms, tz=UTC]`	`updatedAt`	`timestamp[ms, tz=UTC]`

Measurement timestamp column — all measurement types:

Old column	Type	New column	Type
`ts`	`timestamp[ms, tz=UTC]`	`measuredAt`	`timestamp[ms, tz=UTC]`

Entity-specific renames:

Entity	Old column	Type	New column	Type
`device`	`model`	`string`	`modelType`	`string`
`device`	`bluetoothMac`	`string`	`bleMac`	`string`
`device`	`bluetoothMac2`	`string`	`bleMac2`	`string`
`deviceGroup`	`model`	`string`	`modelType`	`string`
`space`	`parentSpaceId`	`string`	`parentId`	`string`
`measurementOccupancyStatus`	`status`	`int8`	`isOccupied`	`bool`

Columns removed:

organizationId has been removed from all files (both entity and measurement types). The organization ID is still available in the S3 path and in each file's footer metadata; to resolve an organization, match it against id in organization.parquet.

Entity	Removed column
`device`	`installationStatus`
`deviceGroup`	`installationStatus`

Action: Update all column references in queries, schemas, and downstream transformations. See the Schema Reference for the full column definitions of every entity and measurement type.
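The renames and removals above can be captured in a small lookup, which may help when updating column lists in existing queries. This is a sketch only: `migrate_columns` is a hypothetical helper, and it assumes that measurement types are exactly the data types whose name starts with `measurement`. It maps names only. The `status` to `isOccupied` change also changes the type from `int8` to `bool`, and the value mapping is not described here, so the sketch does not convert values.

```python
# Renames and removals taken from the tables above.
TIMESTAMP_RENAMES = {"createdTs": "createdAt", "updatedTs": "updatedAt"}
NO_TIMESTAMP_RENAME = {"deviceKeywords", "deviceNote"}
ENTITY_RENAMES = {
    "device": {
        "model": "modelType",
        "bluetoothMac": "bleMac",
        "bluetoothMac2": "bleMac2",
    },
    "deviceGroup": {"model": "modelType"},
    "space": {"parentSpaceId": "parentId"},
    "measurementOccupancyStatus": {"status": "isOccupied"},
}
REMOVED_COLUMNS = {
    "device": {"installationStatus"},
    "deviceGroup": {"installationStatus"},
}


def is_measurement(data_type: str) -> bool:
    # Assumption: measurement types are named "measurement...".
    return data_type.startswith("measurement")


def migrate_columns(data_type: str, old_columns: list) -> list:
    """Translate old column names to new ones and drop removed columns."""
    renames = dict(ENTITY_RENAMES.get(data_type, {}))
    if is_measurement(data_type):
        renames["ts"] = "measuredAt"
    elif data_type not in NO_TIMESTAMP_RENAME:
        renames.update(TIMESTAMP_RENAMES)
    # organizationId was removed from every file.
    removed = {"organizationId"} | REMOVED_COLUMNS.get(data_type, set())
    return [renames.get(col, col) for col in old_columns if col not in removed]
```

For example, `migrate_columns("measurementCO2", ["deviceId", "ts", "organizationId"])` returns `["deviceId", "measuredAt"]`.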

What Changed

Dimension	Old exporter	New exporter
Export config	Fixed per-deployment — requires a configuration change to add/remove exporters	Dynamic — configured via GraphQL API
Scheduling	Fixed: every hour	Configurable schedule; metadata and measurement exports run on independent intervals
Metadata vs. measurement interval	Single interval for all data types	Separate configurable intervals for metadata snapshots and measurements
Minimum interval	Fixed: 1 hour	15 minutes
Maximum interval	Fixed: 1 hour	24 hours
S3 path format	`parquet/{org}/{type}/{YYYY}/{MM}/{YYYY_MM_DD_HH}_{type}.parquet`	`parquet/v1/{org}/{type}/{YYYY}/{MM}/{YYYY_MM_DD_HHMM}_{YYYY_MM_DD_HHMM}_{type}.parquet`
Column naming	camelCase with `Ts` suffix for timestamps, `ts` for measurement time	camelCase with `At` suffix for timestamps, `measuredAt` for measurement time
Metadata deduplication	None	Metadata exports skip S3 write when data is unchanged; `contentHash` embedded in file footer. Measurement exports always write a file when there are new measurements.
Parquet file footer metadata	None	Row count, schema version, data type description, content description in all files; `intervalStart`/`intervalEnd` in measurement files; `snapshotAt` and `contentHash` in metadata files

New Capabilities

Configurable Export Intervals

You can set how often metadata and measurements are exported, from every 15 minutes up to once per day. Changes take effect within a couple of minutes, so you can tune each interval to how often your data changes.
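If you generate export configurations programmatically, a client-side range check can catch invalid values before they are submitted. The sketch below covers only the 15-minute and 24-hour bounds stated above. `check_interval` is a hypothetical helper, not part of the exporter or its API.

```python
from datetime import timedelta

MIN_INTERVAL = timedelta(minutes=15)
MAX_INTERVAL = timedelta(hours=24)


def check_interval(interval: timedelta) -> timedelta:
    """Return the interval unchanged, or raise if it is outside 15 min to 24 h."""
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError(
            f"interval {interval} is outside {MIN_INTERVAL} to {MAX_INTERVAL}"
        )
    return interval
```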

Automatic Catch-Up for Missed Windows

If the exporter is temporarily unavailable, it automatically detects and reruns missed measurement export windows when it comes back online. Short outages require no manual intervention and leave no data gaps.
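Downstream pipelines that want to confirm coverage can compare the windows of the files they have received. The following sketch assumes that consecutive measurement windows are contiguous (each window starts where the previous one ended) and that the `(start, end)` pairs come from the filenames or footer metadata. `find_gaps` is a hypothetical helper.

```python
from datetime import datetime


def find_gaps(windows):
    """Return (gap_start, gap_end) pairs not covered by any window.

    windows: iterable of (start, end) datetime pairs, in any order.
    """
    gaps = []
    covered_until = None
    for start, end in sorted(windows):
        if covered_until is not None and start > covered_until:
            gaps.append((covered_until, start))
        if covered_until is None or end > covered_until:
            covered_until = end
    return gaps
```

A gap reported shortly after an outage may simply mean the catch-up run has not finished yet.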

Metadata Deduplication

For metadata snapshots (devices, spaces, zones, etc.), the exporter computes a content hash of the query result. If the data is identical to the previous export, the S3 write is skipped. This reduces unnecessary S3 operations and storage costs when entity data has not changed. In some cases, such as after a service restart, a file is written regardless of content changes to ensure the latest state is captured. Every exported metadata file includes the contentHash in its Parquet footer metadata, so downstream consumers can determine whether the data actually changed between two exports by comparing hashes.
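Because a file can occasionally be written even when nothing changed, a consumer may want to skip snapshots whose hash matches the previous one. The sketch below assumes you have already collected `(snapshotAt, contentHash)` pairs in chronological order from the footers. `changed_snapshots` is a hypothetical helper.

```python
def changed_snapshots(snapshots):
    """Return the snapshot times whose content differs from the previous snapshot.

    snapshots: iterable of (snapshot_at, content_hash) in chronological order.
    The first snapshot is always reported as changed.
    """
    changed = []
    previous_hash = None
    for snapshot_at, content_hash in snapshots:
        if content_hash != previous_hash:
            changed.append(snapshot_at)
        previous_hash = content_hash
    return changed
```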

Parquet File Footer Metadata

Every exported file now embeds structured metadata in the Parquet file footer, readable by any Parquet-compatible tool:

Key	Description
`queryName`	Data type name (e.g. `measurementCO2`)
`queryType`	`metadata` or `measurements`
`organizationId`	Organization UUID
`schemaVersion`	Schema version label (e.g. `v1`)
`exportedAt`	ISO 8601 UTC timestamp of export
`rowCount`	Number of rows in the file
`intervalStart` / `intervalEnd`	Measurement window boundaries (measurements only)
`snapshotAt`	Snapshot time (metadata only)
`contentHash`	SHA-256 of the data (metadata only)
`queryDescription`	Human-readable description of the data type
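One way to read these keys is through the file-level key-value metadata that Parquet libraries expose. The sketch below assumes the keys are stored as UTF-8 key-value pairs in the footer, which is how `pyarrow` returns them (a dict of bytes to bytes). `decode_footer` and `read_footer` are hypothetical helpers; `read_footer` needs the third-party `pyarrow` package and imports it lazily.

```python
from datetime import datetime, timezone


def parse_iso_utc(value: str) -> datetime:
    """Parse an ISO 8601 timestamp and return it as timezone-aware."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption: a value without an offset is UTC, as the table describes.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def decode_footer(raw) -> dict:
    """Decode a bytes-to-bytes metadata dict into strings and typed values."""
    footer = {
        key.decode("utf-8", "replace"): value.decode("utf-8", "replace")
        for key, value in (raw or {}).items()
    }
    if "rowCount" in footer:
        footer["rowCount"] = int(footer["rowCount"])
    if "exportedAt" in footer:
        footer["exportedAt"] = parse_iso_utc(footer["exportedAt"])
    return footer


def read_footer(path: str) -> dict:
    """Read and decode the footer metadata of a local Parquet file."""
    import pyarrow.parquet as pq  # third-party dependency, imported lazily

    return decode_footer(pq.read_metadata(path).metadata)
```

The result may also contain keys written by the Parquet library itself, so look up the keys you need by name rather than iterating over everything.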

New Data Type: `measurementSystemTemperature`

The new exporter adds measurementSystemTemperature, which was not exported by the old exporter. It contains measurements of each device's internal (system) temperature.

Column	Type	Description
`deviceId`	`string`	Device UUID
`measuredAt`	`timestamp[ms, tz=UTC]`	Time of measurement
`systemTemperature`	`float`	Device internal temperature in °C