Downloading Parquet files with AWS CLI

Browse and download Haltian IoT Parquet files from S3 using the AWS CLI

What You Need from Haltian

Before you start, contact Haltian to receive the following details. All three values are provided by Haltian; you cannot look them up yourself.

| Value | Example | Description |
| --- | --- | --- |
| Bucket name | haltian-parquet-exports-prod | The S3 bucket where your data is stored |
| Organisation ID | 29e95a47-c992-497a-b78e-072d70aa67a7 | Your organisation UUID |
| Access method | Access Key, IAM Role, or SSO | How you authenticate (see below) |

Access Methods

Haltian provides one of the following access methods depending on your setup:

| Method | What Haltian Provides | Setup |
| --- | --- | --- |
| Static Access Key | AWS Access Key ID + Secret Access Key | Configure with aws configure (see below) |
| IAM Role | Bucket policy update for your role ARN | IAM Role Access guide (access from your own AWS account) |
| AWS SSO | SSO start URL + role assignment | AWS SSO Setup guide (browser-based login via Microsoft Entra ID) |

Option A: Static Access Key

If Haltian provided you with an Access Key ID and Secret Access Key, configure them with:

aws configure --profile haltian-parquet

When prompted, enter:

| Prompt | Value |
| --- | --- |
| AWS Access Key ID | The access key provided by Haltian |
| AWS Secret Access Key | The secret key provided by Haltian |
| Default region name | eu-west-1 |
| Default output format | json |

Then use haltian-parquet as the profile name in all commands below, that is, set ${PROFILE} to haltian-parquet rather than to an SSO profile name such as parquet-access-orgname.
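Before browsing S3, it can help to confirm that the stored credentials are valid. The sketch below wraps the standard `aws sts get-caller-identity` command in a small helper. The function name `check_profile` is hypothetical and not part of any Haltian tooling, and the default profile name assumes you followed Option A above.

```shell
# Hypothetical helper: print the AWS identity behind a named profile.
# The call succeeds only if the credentials stored for that profile are valid.
check_profile() {
  local profile="${1:-haltian-parquet}"
  aws sts get-caller-identity --profile "${profile}" --region eu-west-1
}

# Usage:
#   check_profile haltian-parquet
```

If the command prints an account and ARN, the profile is usable. An error here points at the credentials rather than at the bucket or path.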

Option B: IAM Role

If you have your own AWS account, you can create an IAM role and share its ARN with Haltian. See the IAM Role Access guide for setup instructions.

Option C: AWS SSO

Complete the AWS SSO Setup first — you need a working SSO profile and an active session before running the commands on this page.


1. Understand the S3 data layout

Legacy unversioned layout

s3://<BUCKET_NAME>/parquet/{organizationId}/<table>/<year>/<month>/<start>_<table>.parquet

Where <start> is a UTC timestamp at hour resolution (YYYY_MM_DD_HH). Example:

s3://<BUCKET_NAME>/parquet/{organizationId}/measurementOccupancyStatus/2026/02/2026_02_24_08_measurementOccupancyStatus.parquet

Versioned layout

s3://<BUCKET_NAME>/parquet/v1/{organizationId}/<table>/<year>/<month>/<start>_<end>_<table>.parquet

Where <start> and <end> are UTC timestamps at minute resolution (YYYY_MM_DD_HHMM). The version segment (v1) allows schema changes to be introduced in a new version while keeping the old one available during a transition period. Example:

s3://<BUCKET_NAME>/parquet/v1/{organizationId}/measurementOccupancyStatus/2026/02/2026_02_24_0800_2026_02_24_0900_measurementOccupancyStatus.parquet
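The two layouts differ only in the `v1` segment and in the file name. As an illustration, the following sketch assembles object keys for both layouts from their parts. The helper names `legacy_key` and `versioned_key` are hypothetical, and the sketch assumes the year and month folders always match the `<start>` timestamp, which is what the examples above suggest.

```shell
# Hypothetical helpers that assemble object keys for both layouts.

# Legacy layout: <start> is YYYY_MM_DD_HH (UTC, hour resolution).
legacy_key() {
  local org="$1" table="$2" start="$3"
  local year="${start%%_*}"   # text before the first underscore, e.g. 2026
  local rest="${start#*_}"    # text after the first underscore, e.g. 02_24_08
  local month="${rest%%_*}"   # e.g. 02
  printf 'parquet/%s/%s/%s/%s/%s_%s.parquet\n' \
    "$org" "$table" "$year" "$month" "$start" "$table"
}

# Versioned layout: <start> and <end> are YYYY_MM_DD_HHMM (UTC, minute resolution).
versioned_key() {
  local org="$1" table="$2" start="$3" end="$4"
  local year="${start%%_*}"
  local rest="${start#*_}"
  local month="${rest%%_*}"
  printf 'parquet/v1/%s/%s/%s/%s/%s_%s_%s.parquet\n' \
    "$org" "$table" "$year" "$month" "$start" "$end" "$table"
}

# Usage:
#   legacy_key "$ORG_ID" measurementOccupancyStatus 2026_02_24_08
#   versioned_key "$ORG_ID" measurementOccupancyStatus 2026_02_24_0800 2026_02_24_0900
```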

Available tables

| Category | Table names |
| --- | --- |
| Spatial | space, zone |
| Devices | device, deviceGroup, deviceGroupDevices, deviceKeyword |
| Occupancy | measurementOccupancyStatus, measurementOccupantsCount, measurementOccupancySeconds |
| Movement | measurementDirectionalMovement, measurementPosition, measurementPositionZone |
| Environmental | measurementAmbientTemperature, measurementCO2, measurementTVOC |
| Device health | measurementBatteryPercentage, measurementBatteryVoltage, measurementBootCount, measurementDistance |
| Other | organization, deviceNote |

2. Browse available data
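The commands in the following sections reference the shell variables ${BUCKET}, ${ORG_ID}, and ${PROFILE}. A minimal way to define them for the current shell session is sketched below. The bucket and organisation ID are the example values from "What You Need from Haltian", and the profile name assumes Option A; substitute the values you actually received. `S3_ROOT` is a hypothetical convenience variable that the commands on this page do not use.

```shell
# Values supplied by Haltian (example values shown; replace with your own).
BUCKET="haltian-parquet-exports-prod"
ORG_ID="29e95a47-c992-497a-b78e-072d70aa67a7"

# Name of the AWS CLI profile you configured (access key, IAM role, or SSO).
PROFILE="haltian-parquet"

# Hypothetical convenience variable: the root prefix for your organisation.
S3_ROOT="s3://${BUCKET}/parquet/${ORG_ID}/"
echo "Browsing ${S3_ROOT} with profile ${PROFILE}"
```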

Step 1: List all tables

This shows the table folders available — not individual files:

aws s3 ls "s3://${BUCKET}/parquet/${ORG_ID}/" \
  --profile ${PROFILE} \
  --region eu-west-1

Expected output — a list of table folders:

                           PRE device/
                           PRE deviceGroup/
                           PRE deviceGroupDevices/
                           PRE deviceKeyword/
                           PRE measurementAmbientTemperature/
                           PRE measurementBatteryPercentage/
                           PRE measurementBatteryVoltage/
                           PRE measurementBootCount/
                           PRE measurementCO2/
                           PRE measurementOccupancySeconds/
                           PRE measurementOccupancyStatus/
                           PRE measurementOccupantsCount/
                           PRE organization/
                           PRE space/
                           PRE zone/

PRE means “prefix” — these are folders, not files.

Step 2: List available months for a specific table

Pick a table from the list above and drill into it. Inside each table you will find year folders, and inside those, month folders:

aws s3 ls "s3://${BUCKET}/parquet/${ORG_ID}/measurementOccupancyStatus/" \
  --profile ${PROFILE} \
  --region eu-west-1

Expected output:

                           PRE 2025/
                           PRE 2026/

Drill into a year to see months:

aws s3 ls "s3://${BUCKET}/parquet/${ORG_ID}/measurementOccupancyStatus/2026/" \
  --profile ${PROFILE} \
  --region eu-west-1

Expected output:

                           PRE 01/
                           PRE 02/

Step 3: List individual files in a month

Inside each month folder you will find the actual Parquet files — typically one file per hour:

aws s3 ls "s3://${BUCKET}/parquet/${ORG_ID}/measurementOccupancyStatus/2026/02/" \
  --profile ${PROFILE} \
  --region eu-west-1

Expected output — one file per hour (default export frequency):

2026-02-01 01:15:00      12458 2026_02_01_00_measurementOccupancyStatus.parquet
2026-02-01 02:15:00      11830 2026_02_01_01_measurementOccupancyStatus.parquet
2026-02-01 03:15:00      13102 2026_02_01_02_measurementOccupancyStatus.parquet
...
2026-02-24 09:15:00      14567 2026_02_24_08_measurementOccupancyStatus.parquet

The file name format is YYYY_MM_DD_HH_<table>.parquet, where HH is the hour in UTC.
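Because the date is the first part of each file name, a monthly listing can be narrowed to a single day with a simple text filter. The sketch below is one way to do that. The helper name `files_for_day` is hypothetical, and it assumes the four-column `aws s3 ls` output shown above (date, time, size, name).

```shell
# Hypothetical helper: read `aws s3 ls` output on stdin and print only the
# file names whose <start> timestamp falls on the given UTC day (YYYY_MM_DD).
files_for_day() {
  local day="$1"
  # Column 4 is the object name; "PRE" folder lines have only two columns
  # and are skipped by the NF check.
  awk -v day="${day}" 'NF >= 4 && index($4, day "_") == 1 { print $4 }'
}

# Usage:
#   aws s3 ls "s3://${BUCKET}/parquet/${ORG_ID}/measurementOccupancyStatus/2026/02/" \
#     --profile ${PROFILE} --region eu-west-1 | files_for_day 2026_02_24
```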


3. Download data

Download everything (all tables, all dates)

aws s3 sync \
  "s3://${BUCKET}/parquet/${ORG_ID}/" \
  "data/parquet/${ORG_ID}/" \
  --profile ${PROFILE} \
  --region eu-west-1

Download a single table

aws s3 sync \
  "s3://${BUCKET}/parquet/${ORG_ID}/measurementOccupancyStatus/" \
  "data/parquet/${ORG_ID}/measurementOccupancyStatus/" \
  --profile ${PROFILE} \
  --region eu-west-1

Download a specific month

aws s3 sync \
  "s3://${BUCKET}/parquet/${ORG_ID}/measurementOccupancyStatus/2026/02/" \
  "data/parquet/${ORG_ID}/measurementOccupancyStatus/2026/02/" \
  --profile ${PROFILE} \
  --region eu-west-1
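To narrow a month down to a single day, `aws s3 sync` accepts `--exclude` and `--include` filters. The sketch below wraps that in a helper. The function name `sync_day` is hypothetical, it assumes the legacy file naming (YYYY_MM_DD_HH_<table>.parquet), and it expects ${BUCKET}, ${ORG_ID}, and ${PROFILE} to be set already.

```shell
# Hypothetical helper: sync only the files of one UTC day for one table.
# Arguments: table name, day as YYYY_MM_DD.
sync_day() {
  local table="$1" day="$2"
  local year="${day%%_*}"    # e.g. 2026
  local rest="${day#*_}"     # e.g. 02_24
  local month="${rest%%_*}"  # e.g. 02
  # Later filters take precedence: exclude everything, then re-include the day.
  aws s3 sync \
    "s3://${BUCKET}/parquet/${ORG_ID}/${table}/${year}/${month}/" \
    "data/parquet/${ORG_ID}/${table}/${year}/${month}/" \
    --exclude "*" \
    --include "${day}_*" \
    --profile "${PROFILE}" \
    --region eu-west-1
}

# Usage:
#   sync_day measurementOccupancyStatus 2026_02_24
```

The filters are applied on the client side, so the CLI still lists the whole month prefix before it downloads the matching files.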

Download a single file

Use aws s3 cp with the full path to the file (including year, month, and the exact filename you found in Step 3 above):

aws s3 cp \
  "s3://${BUCKET}/parquet/${ORG_ID}/measurementOccupancyStatus/2026/02/2026_02_24_08_measurementOccupancyStatus.parquet" \
  "data/parquet/${ORG_ID}/measurementOccupancyStatus/2026/02/" \
  --profile ${PROFILE} \
  --region eu-west-1

The full S3 path breaks down as:

s3://haltian-parquet-exports-prod/parquet/29e95a47-c992-497a-b78e-072d70aa67a7/measurementOccupancyStatus/2026/02/2026_02_24_08_measurementOccupancyStatus.parquet
  bucket    haltian-parquet-exports-prod
  org ID    29e95a47-c992-497a-b78e-072d70aa67a7
  table     measurementOccupancyStatus
  year      2026
  month     02
  filename  2026_02_24_08_measurementOccupancyStatus.parquet

Download only new/changed files (incremental sync)

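aws s3 sync compares each S3 object with the local copy and skips files that already exist locally with the same size and a last-modified time that is not older than the S3 object's, so re-running the same command only fetches new or changed data: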

aws s3 sync \
  "s3://${BUCKET}/parquet/${ORG_ID}/" \
  "data/parquet/${ORG_ID}/" \
  --profile ${PROFILE} \
  --region eu-west-1

Download multiple specific tables

for TABLE in space zone device measurementOccupancyStatus; do
  echo "Downloading ${TABLE}..."
  aws s3 sync \
    "s3://${BUCKET}/parquet/${ORG_ID}/${TABLE}/" \
    "data/parquet/${ORG_ID}/${TABLE}/" \
    --profile ${PROFILE} \
    --region eu-west-1
done

Troubleshooting

| Issue | Solution |
| --- | --- |
| Access Denied | Verify your credentials are correct. For SSO: re-run aws sso login. For access keys: check key/secret with aws configure list --profile ${PROFILE} |
| NoSuchBucket | Double-check the bucket name provided by Haltian |
| NoSuchKey | Verify the full file path. Use aws s3 ls to browse and find the exact filename first |
| Empty listing | Your org may not have data for the requested table/period yet. Try listing the root parquet/${ORG_ID}/ first |
| SSO session expired | Run aws sso login --profile ${PROFILE} or see the Troubleshooting section of AWS SSO Setup |
| Slow downloads | Add --only-show-errors to suppress per-file output for faster syncs |
| Need to see what would be downloaded | Add the --dryrun flag to preview without downloading |
| Windows: command not found | Ensure AWS CLI is in your PATH. Restart PowerShell after installation |