Migration Guide: Old to New Parquet Exporter

What changed between the old and new Parquet exporter, and what action is required

This page describes the differences between the old and new Parquet exporter. It covers breaking changes that require updates to your data pipelines, and new capabilities you can take advantage of.

Breaking Changes

1. S3 Path Prefix Changed

The folder structure under the S3 bucket has changed. A schema version prefix (v1) has been inserted after parquet/.

| Version | Path pattern |
|---|---|
| Old | parquet/{organizationId}/{dataType}/{YYYY}/{MM}/{YYYY_MM_DD_HH}_{dataType}.parquet |
| New | parquet/v1/{organizationId}/{dataType}/{YYYY}/{MM}/{YYYY_MM_DD_HHMM}_{YYYY_MM_DD_HHMM}_{dataType}.parquet |

Notable differences:

  • v1/ segment added after parquet/
  • Filename now uses HHMM precision and includes both start and end timestamps instead of just the start hour

Action: Update all S3 path references, glob patterns, and Athena/Glue table locations in your pipelines.
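
When updating path references, a small helper that builds and parses keys in the new layout can be useful. The following is a minimal sketch, not part of the exporter: the function names are hypothetical, and it assumes that the {YYYY}/{MM} folders are derived from the window start and that the first timestamp in the filename is the window start and the second is the window end.

```python
import re
from datetime import datetime

# Timestamp layout used in new filenames: YYYY_MM_DD_HHMM
STAMP = "%Y_%m_%d_%H%M"

# Pattern for keys in the new layout (hypothetical helper, not an exporter API).
NEW_KEY = re.compile(
    r"^parquet/v1/(?P<org>[^/]+)/(?P<data_type>[^/]+)/"
    r"(?P<year>\d{4})/(?P<month>\d{2})/"
    r"(?P<start>\d{4}_\d{2}_\d{2}_\d{4})_(?P<end>\d{4}_\d{2}_\d{2}_\d{4})_"
    r"(?P=data_type)\.parquet$"
)


def build_new_key(org, data_type, start, end):
    """Build a key in the new layout.

    Assumption: the year/month folders are taken from the window start.
    """
    return (
        f"parquet/v1/{org}/{data_type}/{start:%Y}/{start:%m}/"
        f"{start.strftime(STAMP)}_{end.strftime(STAMP)}_{data_type}.parquet"
    )


def parse_new_key(key):
    """Return org, data type, and window boundaries, or None if the key does not match."""
    m = NEW_KEY.match(key)
    if m is None:
        return None
    return {
        "org": m["org"],
        "data_type": m["data_type"],
        "start": datetime.strptime(m["start"], STAMP),
        "end": datetime.strptime(m["end"], STAMP),
    }
```

Keys in the old layout do not match the pattern, so parse_new_key returns None for them, which makes it easy to spot references that still need updating.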

2. Column Names and Types Changed

Timestamp columns — all entities except deviceKeywords and deviceNote:

| Old column | Type | New column | Type |
|---|---|---|---|
| createdTs | timestamp[ms, tz=UTC] | createdAt | timestamp[ms, tz=UTC] |
| updatedTs | timestamp[ms, tz=UTC] | updatedAt | timestamp[ms, tz=UTC] |

Measurement timestamp column — all measurement types:

| Old column | Type | New column | Type |
|---|---|---|---|
| ts | timestamp[ms, tz=UTC] | measuredAt | timestamp[ms, tz=UTC] |

Entity-specific renames:

| Entity | Old column | Type | New column | Type |
|---|---|---|---|---|
| device | model | string | modelType | string |
| device | bluetoothMac | string | bleMac | string |
| device | bluetoothMac2 | string | bleMac2 | string |
| deviceGroup | model | string | modelType | string |
| space | parentSpaceId | string | parentId | string |
| measurementOccupancyStatus | status | int8 | isOccupied | bool |

Columns removed:

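organizationId has been removed from all files (both entity and measurement types). To resolve an organization, join on id in organization.parquet. The organization ID is still present in the S3 path and in the organizationId key of the Parquet footer metadata.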

| Entity | Removed column |
|---|---|
| device | installationStatus |
| deviceGroup | installationStatus |

Action: Update all column references in queries, schemas, and downstream transformations. See the Schema Reference for the full column definitions of every entity and measurement type.
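
The renames and removals above can be captured in one mapping and applied to old-format rows during a transition period. This is a minimal sketch under stated assumptions: migrate_row and renames_for are hypothetical helpers, measurement types are recognized by the "measurement" name prefix, and a non-zero old status value is taken to mean "occupied" (the source does not document the old int8 encoding).

```python
# Renames that apply only to specific entities or measurement types.
ENTITY_RENAMES = {
    "device": {"model": "modelType", "bluetoothMac": "bleMac", "bluetoothMac2": "bleMac2"},
    "deviceGroup": {"model": "modelType"},
    "space": {"parentSpaceId": "parentId"},
    "measurementOccupancyStatus": {"status": "isOccupied"},
}

# Columns that no longer exist in the new files, per entity.
REMOVED = {
    "device": {"installationStatus"},
    "deviceGroup": {"installationStatus"},
}

# Entities excluded from the createdTs/updatedTs rename.
TIMESTAMP_EXCEPTIONS = {"deviceKeywords", "deviceNote"}


def renames_for(entity):
    """Collect the old-to-new column renames that apply to one entity."""
    renames = {}
    if entity not in TIMESTAMP_EXCEPTIONS:
        renames.update({"createdTs": "createdAt", "updatedTs": "updatedAt"})
    # Assumption: measurement types are identified by their name prefix.
    if entity.startswith("measurement"):
        renames["ts"] = "measuredAt"
    renames.update(ENTITY_RENAMES.get(entity, {}))
    return renames


def migrate_row(entity, row):
    """Convert one old-format row (a dict) to the new column names."""
    renames = renames_for(entity)
    # organizationId was removed from every file.
    dropped = REMOVED.get(entity, set()) | {"organizationId"}
    new_row = {}
    for name, value in row.items():
        if name in dropped:
            continue
        # Assumption: a non-zero int8 status meant "occupied" in the old export.
        if entity == "measurementOccupancyStatus" and name == "status":
            value = bool(value)
        new_row[renames.get(name, name)] = value
    return new_row
```

The same mapping can drive a column rename in a DataFrame or SQL view; a plain dict is used here only to keep the example dependency-free.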

What Changed

| Dimension | Old exporter | New exporter |
|---|---|---|
| Export config | Fixed per deployment; adding or removing exporters requires a configuration change | Dynamic; configured via the GraphQL API |
| Scheduling | Fixed: every hour | Configurable schedule; metadata and measurement exports run on independent intervals |
| Metadata vs. measurement interval | Single interval for all data types | Separate configurable intervals for metadata snapshots and measurements |
| Minimum interval | Fixed: 1 hour | 15 minutes |
| Maximum interval | Fixed: 1 hour | 24 hours |
| S3 path format | parquet/{org}/{type}/{YYYY}/{MM}/{YYYY_MM_DD_HH}_{type}.parquet | parquet/v1/{org}/{type}/{YYYY}/{MM}/{YYYY_MM_DD_HHMM}_{YYYY_MM_DD_HHMM}_{type}.parquet |
| Column naming | camelCase with Ts suffix for timestamps, ts for measurement time | camelCase with At suffix for timestamps, measuredAt for measurement time |
| Metadata deduplication | None | Metadata exports skip the S3 write when data is unchanged; contentHash is embedded in the file footer. Measurement exports always write a file when there are new measurements. |
| Parquet file footer metadata | None | Row count, schema version, data type description, and content description in all files; intervalStart/intervalEnd in measurement files; snapshotAt and contentHash in metadata files |

New Capabilities

Configurable Export Intervals

You can set how often metadata and measurements are exported, from every 15 minutes up to once per day. Changes take effect within a couple of minutes, so you can easily adjust the frequency based on your needs and data change patterns.

Automatic Catch-Up for Missed Windows

If the exporter is temporarily unavailable, it automatically detects and reruns missed measurement export windows when it comes back online. No manual intervention is required, and short outages do not create data gaps.
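
Consumers who want to confirm coverage after an outage can check the windows encoded in the measurement filenames. The following is a minimal sketch, not an exporter feature: find_gaps is a hypothetical helper, and it assumes that consecutive measurement windows are contiguous (the end of one equals the start of the next), which this page does not state explicitly.

```python
from datetime import datetime


def find_gaps(windows):
    """Return the uncovered (gap_start, gap_end) ranges between (start, end) windows."""
    gaps = []
    covered_until = None
    for start, end in sorted(windows):
        # A window starting after everything covered so far leaves a gap.
        if covered_until is not None and start > covered_until:
            gaps.append((covered_until, start))
        covered_until = end if covered_until is None else max(covered_until, end)
    return gaps


windows = [
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 15)),
    (datetime(2024, 1, 1, 0, 15), datetime(2024, 1, 1, 0, 30)),
    (datetime(2024, 1, 1, 1, 0), datetime(2024, 1, 1, 1, 15)),
]
gaps = find_gaps(windows)  # one gap: 00:30 to 01:00
```

A reported gap is not necessarily a missed export: a window with no new measurements may simply have no file.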

Metadata Deduplication

For metadata snapshots (devices, spaces, zones, etc.), the exporter computes a content hash of the query result. If the data is identical to the previous export, the S3 write is skipped, which reduces unnecessary S3 operations and storage costs when entity data has not changed. In some cases, such as after a service restart, a file is written regardless of content changes to ensure the latest state is captured.

Every exported metadata file includes the contentHash in its Parquet footer metadata, so downstream consumers can determine whether the data actually changed between two exports by comparing hashes.
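
The hash comparison on the consumer side can be sketched as follows. This is a minimal example under stated assumptions: changed_snapshots is a hypothetical helper, each footer is a dict that has already been read from a metadata file, and snapshotAt values are strings in one consistent ISO 8601 UTC format so that they sort chronologically.

```python
def changed_snapshots(footers):
    """Return the footers whose contentHash differs from the preceding snapshot."""
    changed = []
    previous_hash = None
    # Assumption: snapshotAt strings share one ISO 8601 format and sort chronologically.
    for footer in sorted(footers, key=lambda f: f["snapshotAt"]):
        if footer["contentHash"] != previous_hash:
            changed.append(footer)
        previous_hash = footer["contentHash"]
    return changed


footers = [
    {"snapshotAt": "2024-01-01T00:00:00Z", "contentHash": "a"},
    {"snapshotAt": "2024-01-01T01:00:00Z", "contentHash": "a"},
    {"snapshotAt": "2024-01-01T02:00:00Z", "contentHash": "b"},
]
to_process = changed_snapshots(footers)  # first and third snapshot
```

Because unchanged exports are normally skipped, consecutive files with equal hashes should mostly appear after forced writes such as the one following a service restart.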

Parquet File Footer Metadata

Every exported file now embeds structured metadata in the Parquet file footer, readable by any Parquet-compatible tool:

| Key | Description |
|---|---|
| queryName | Data type name (e.g. measurementCO2) |
| queryType | metadata or measurements |
| organizationId | Organization UUID |
| schemaVersion | Schema version label (e.g. v1) |
| exportedAt | ISO 8601 UTC timestamp of export |
| rowCount | Number of rows in the file |
| intervalStart / intervalEnd | Measurement window boundaries (measurements only) |
| snapshotAt | Snapshot time (metadata only) |
| contentHash | SHA-256 of the data (metadata only) |
| queryDescription | Human-readable description of the data type |
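
One way to read these keys is through pyarrow, whose read_metadata function exposes the footer key-value pairs as a dict of bytes. The following is a minimal sketch: decode_footer, read_footer, and missing_keys are hypothetical helpers, and the set of expected keys is taken from the table above. It assumes the values are stored as UTF-8 strings, so numeric values such as rowCount need to be converted by the caller.

```python
# Keys expected in every file, and the extra keys per queryType (from the table above).
COMMON_KEYS = {
    "queryName", "queryType", "organizationId", "schemaVersion",
    "exportedAt", "rowCount", "queryDescription",
}
TYPE_KEYS = {
    "measurements": {"intervalStart", "intervalEnd"},
    "metadata": {"snapshotAt", "contentHash"},
}


def decode_footer(raw):
    """Decode Parquet key-value metadata (bytes to bytes) into a str-to-str dict."""
    return {k.decode("utf-8"): v.decode("utf-8") for k, v in (raw or {}).items()}


def read_footer(path):
    """Read the footer key-value metadata of a local Parquet file."""
    # Third-party dependency, imported lazily so the other helpers work without it.
    import pyarrow.parquet as pq

    return decode_footer(pq.read_metadata(path).metadata)


def missing_keys(footer):
    """Return the expected keys that are absent from a decoded footer, sorted."""
    expected = COMMON_KEYS | TYPE_KEYS.get(footer.get("queryType"), set())
    return sorted(expected - set(footer))
```

A footer may contain additional keys written by the Parquet library itself; missing_keys ignores those and only reports expected keys that are absent.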

New Data Type: measurementSystemTemperature

The new exporter adds measurementSystemTemperature, which was not exported by the old exporter. It contains device internal (system) temperature measurements.

| Column | Type | Description |
|---|---|---|
| deviceId | string | Device UUID |
| measuredAt | timestamp[ms, tz=UTC] | Time of measurement |
| systemTemperature | float | Device internal temperature in °C |