Downloading Parquet files with AWS CLI
This page is in alpha status. The content may change without notice.
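The AWS CLI commands and options are the same on all platforms. Install the CLI by following the AWS CLI install guide.
On Windows, the aws commands themselves are unchanged, but the examples on this page use bash syntax for variables (BUCKET="...") and for line continuation (a trailing \). Run them in WSL or Git Bash, or adapt them: in PowerShell, set variables as $BUCKET = "..." and end continued lines with a backtick (`); in Command Prompt, end continued lines with a caret (^).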
What You Need from Haltian
Before you start, contact Haltian to receive the following values. Haltian provides all three; you cannot look them up yourself.
| Value | Example | Description |
|---|---|---|
| Bucket name | haltian-parquet-exports-prod | The S3 bucket where your data is stored |
| Organisation ID | 29e95a47-c992-497a-b78e-072d70aa67a7 | Your organisation UUID |
| Access method | Access Key, IAM Role, or SSO | How you authenticate (see below) |
Access Methods
Haltian provides one of the following access methods depending on your setup:
| Method | What Haltian Provides | Setup |
|---|---|---|
| Static Access Key | AWS Access Key ID + Secret Access Key | Configure with aws configure (see below) |
| IAM Role | Bucket policy update for your role ARN | IAM Role Access guide — access from your own AWS account |
| AWS SSO | SSO start URL + role assignment | AWS SSO Setup guide — browser-based login via Microsoft Entra ID |
Option A: Static Access Key
If Haltian provided you with an Access Key ID and Secret Access Key, configure them with:
aws configure --profile haltian-parquet
When prompted, enter:
| Prompt | Value |
|---|---|
| AWS Access Key ID | The access key provided by Haltian |
| AWS Secret Access Key | The secret key provided by Haltian |
| Default region name | eu-west-1 |
| Default output format | json |
Then set PROFILE="haltian-parquet" when you define the session variables in section 2, so that every command below runs with --profile haltian-parquet.
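If you prefer to script the setup instead of answering the prompts, aws configure set can write the same four settings. A minimal sketch, assuming you have exported the values from Haltian as HALTIAN_KEY_ID and HALTIAN_SECRET (hypothetical variable names, chosen so the secret is not typed into your shell history); configure_haltian_profile is also a hypothetical helper name:

```shell
# Hypothetical helper: writes the Option A profile without interactive prompts.
# Assumes HALTIAN_KEY_ID and HALTIAN_SECRET hold the keys provided by Haltian.
configure_haltian_profile() {
  profile="${1:-haltian-parquet}"
  aws configure set aws_access_key_id "${HALTIAN_KEY_ID:?set HALTIAN_KEY_ID first}" --profile "$profile"
  aws configure set aws_secret_access_key "${HALTIAN_SECRET:?set HALTIAN_SECRET first}" --profile "$profile"
  aws configure set region eu-west-1 --profile "$profile"
  aws configure set output json --profile "$profile"
}
```

After running configure_haltian_profile, aws configure list --profile haltian-parquet shows which (masked) key and region the profile uses.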
Option B: IAM Role
If you have your own AWS account, you can create an IAM role and share its ARN with Haltian. See the IAM Role Access guide for setup instructions.
Option C: AWS SSO
Complete the AWS SSO Setup first — you need a working SSO profile and an active session before running the commands on this page.
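Whichever option you use, you can confirm that the CLI resolves working credentials before browsing the bucket: aws sts get-caller-identity prints the account and ARN a profile authenticates as. The wrapper below is a hypothetical convenience (check_profile is not part of the AWS CLI):

```shell
# Hypothetical helper: succeeds if the profile has usable credentials,
# otherwise prints a hint and returns 1.
check_profile() {
  if aws sts get-caller-identity --profile "$1" --output text >/dev/null 2>&1; then
    echo "Credentials OK for profile $1"
  else
    echo "No valid credentials for profile $1. For SSO, run: aws sso login --profile $1" >&2
    return 1
  fi
}
```

For example, check_profile parquet-access-orgname before running the commands in the next sections.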
1. Understand the S3 data layout
Legacy unversioned layout
This layout is being phased out in favor of the versioned layout below.
s3://<BUCKET_NAME>/parquet/{organizationId}/<table>/<year>/<month>/<start>_<table>.parquet
Where <start> is a UTC timestamp at hour resolution (YYYY_MM_DD_HH). Example:
s3://<BUCKET_NAME>/parquet/{organizationId}/measurementOccupancyStatus/2026/02/2026_02_24_08_measurementOccupancyStatus.parquet
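When scripting downloads for a known hour, you can assemble a legacy-layout key from its parts with printf. A minimal sketch; legacy_key is a hypothetical helper that takes the UTC date and hour as separate, zero-padded arguments and prints the key without the bucket:

```shell
# Hypothetical helper: legacy_key ORG_ID TABLE YYYY MM DD HH
# Prints parquet/<org>/<table>/<YYYY>/<MM>/<YYYY>_<MM>_<DD>_<HH>_<table>.parquet
legacy_key() {
  printf 'parquet/%s/%s/%s/%s/%s_%s_%s_%s_%s.parquet\n' \
    "$1" "$2" "$3" "$4" "$3" "$4" "$5" "$6" "$2"
}
```

For example, legacy_key "$ORG_ID" measurementOccupancyStatus 2026 02 24 08 reproduces the example key above (prefix it with s3://<BUCKET_NAME>/).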
Versioned layout
s3://<BUCKET_NAME>/parquet/v1/{organizationId}/<table>/<year>/<month>/<start>_<end>_<table>.parquet
Where <start> and <end> are UTC timestamps at minute resolution (YYYY_MM_DD_HHMM). The version segment (v1) allows schema changes to be introduced in a new version while keeping the old one available during a transition period. Example:
s3://<BUCKET_NAME>/parquet/v1/{organizationId}/measurementOccupancyStatus/2026/02/2026_02_24_0800_2026_02_24_0900_measurementOccupancyStatus.parquet
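The versioned key can be built the same way from the start and end timestamps. A sketch under one assumption: the year and month folders are taken from <start>, as in the example above (how files spanning a month boundary are placed is not covered here). v1_key is a hypothetical helper:

```shell
# Hypothetical helper: v1_key ORG_ID TABLE START END
# START and END are UTC timestamps in YYYY_MM_DD_HHMM form.
# Assumption: the <year>/<month> folders come from START.
v1_key() {
  year=${3%%_*}       # "2026_02_24_0800" -> "2026"
  rest=${3#*_}        # -> "02_24_0800"
  month=${rest%%_*}   # -> "02"
  printf 'parquet/v1/%s/%s/%s/%s/%s_%s_%s.parquet\n' \
    "$1" "$2" "$year" "$month" "$3" "$4" "$2"
}
```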
Available tables
| Category | Table names |
|---|---|
| Spatial | space, zone |
| Devices | device, deviceGroup, deviceGroupDevices, deviceKeyword |
| Occupancy | measurementOccupancyStatus, measurementOccupantsCount, measurementOccupancySeconds |
| Movement | measurementDirectionalMovement, measurementPosition, measurementPositionZone |
| Environmental | measurementAmbientTemperature, measurementCO2, measurementTVOC |
| Device health | measurementBatteryPercentage, measurementBatteryVoltage, measurementBootCount, measurementDistance |
| Other | organization, deviceNote |
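S3 keys are case-sensitive, so a mistyped table name such as measurementCo2 (instead of measurementCO2) just produces an empty listing. If you script against these tables, a check like the following catches typos early. The list is copied from the table above and is_known_table is a hypothetical helper; update the list if the set of tables changes.

```shell
# Table names copied from the "Available tables" list (case-sensitive).
KNOWN_TABLES="space zone
device deviceGroup deviceGroupDevices deviceKeyword
measurementOccupancyStatus measurementOccupantsCount measurementOccupancySeconds
measurementDirectionalMovement measurementPosition measurementPositionZone
measurementAmbientTemperature measurementCO2 measurementTVOC
measurementBatteryPercentage measurementBatteryVoltage measurementBootCount measurementDistance
organization deviceNote"

# Hypothetical helper: succeeds only if $1 exactly matches a known table name.
is_known_table() {
  for t in $KNOWN_TABLES; do
    [ "$t" = "$1" ] && return 0
  done
  return 1
}
```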
2. Browse available data
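Set these variables once at the start of your session so you can copy and paste the commands below directly. The commands in this section and the next use the legacy layout (parquet/${ORG_ID}/...). If your data is in the versioned layout, insert v1/ after parquet/ in each path, for example s3://${BUCKET}/parquet/v1/${ORG_ID}/.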
BUCKET="haltian-parquet-exports-prod" # Replace with your bucket name
ORG_ID="29e95a47-c992-497a-b78e-072d70aa67a7" # Replace with your org ID
PROFILE="parquet-access-orgname"   # Your AWS CLI profile name (haltian-parquet if you followed Option A)
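To switch between the legacy and versioned layouts in your own scripts without editing every path, you can derive the base prefix from the variables above. A minimal sketch; LAYOUT and s3_base are hypothetical names introduced here, not something Haltian provides:

```shell
# Hypothetical switch: LAYOUT=v1 selects the versioned layout, empty means legacy.
LAYOUT="${LAYOUT:-}"

# Prints the base S3 prefix for your organisation, ending in "/".
s3_base() {
  if [ "$LAYOUT" = "v1" ]; then
    echo "s3://${BUCKET}/parquet/v1/${ORG_ID}/"
  else
    echo "s3://${BUCKET}/parquet/${ORG_ID}/"
  fi
}
```

Usage: aws s3 ls "$(s3_base)" --profile ${PROFILE} --region eu-west-1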
Step 1: List all tables
This shows the table folders available — not individual files:
aws s3 ls "s3://${BUCKET}/parquet/${ORG_ID}/" \
--profile ${PROFILE} \
--region eu-west-1
Example output, a list of table folders (the exact list depends on which tables are exported for your organisation):
PRE device/
PRE deviceGroup/
PRE deviceGroupDevices/
PRE deviceKeyword/
PRE measurementAmbientTemperature/
PRE measurementBatteryPercentage/
PRE measurementBatteryVoltage/
PRE measurementBootCount/
PRE measurementCO2/
PRE measurementOccupancySeconds/
PRE measurementOccupancyStatus/
PRE measurementOccupantsCount/
PRE organization/
PRE space/
PRE zone/
PRE means “prefix”. S3 has no real folders; these are key prefixes that the CLI displays like folders, not files.
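For scripting, you may want the table names without the PRE marker and the trailing slash. One way, assuming a POSIX awk is available (list_tables is a hypothetical helper name):

```shell
# Hypothetical helper: prints one table name per line by keeping only the
# "PRE <name>/" lines of the listing and stripping the trailing slash.
list_tables() {
  aws s3 ls "s3://${BUCKET}/parquet/${ORG_ID}/" \
    --profile "${PROFILE}" \
    --region eu-west-1 |
    awk '$1 == "PRE" { sub(/\/$/, "", $2); print $2 }'
}
```

For example, for TABLE in $(list_tables); do echo "${TABLE}"; done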
Step 2: List available months for a specific table
Pick a table from the list above and drill into it. Inside each table you will find year folders, and inside those, month folders:
aws s3 ls "s3://${BUCKET}/parquet/${ORG_ID}/measurementOccupancyStatus/" \
--profile ${PROFILE} \
--region eu-west-1
Expected output:
PRE 2025/
PRE 2026/
Drill into a year to see months:
aws s3 ls "s3://${BUCKET}/parquet/${ORG_ID}/measurementOccupancyStatus/2026/" \
--profile ${PROFILE} \
--region eu-west-1
Expected output:
PRE 01/
PRE 02/
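Drilling down one level at a time works, but with several years of history it can be quicker to list every month in one call. A sketch assuming the legacy layout: --recursive lists every object under the table prefix (which can take a while for large tables), and awk keeps the two folder names in front of each file name. list_months is a hypothetical helper:

```shell
# Hypothetical helper: list_months TABLE
# Prints each YYYY/MM folder that contains at least one file for TABLE.
list_months() {
  aws s3 ls "s3://${BUCKET}/parquet/${ORG_ID}/$1/" \
    --recursive \
    --profile "${PROFILE}" \
    --region eu-west-1 |
    awk '{ n = split($4, p, "/"); if (n >= 3) print p[n-2] "/" p[n-1] }' |
    sort -u
}
```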
Step 3: List individual files in a month
Inside each month folder you will find the actual Parquet files — typically one file per hour:
aws s3 ls "s3://${BUCKET}/parquet/${ORG_ID}/measurementOccupancyStatus/2026/02/" \
--profile ${PROFILE} \
--region eu-west-1
Expected output — one file per hour (default export frequency):
2026-02-01 01:15:00 12458 2026_02_01_00_measurementOccupancyStatus.parquet
2026-02-01 02:15:00 11830 2026_02_01_01_measurementOccupancyStatus.parquet
2026-02-01 03:15:00 13102 2026_02_01_02_measurementOccupancyStatus.parquet
...
2026-02-24 09:15:00 14567 2026_02_24_08_measurementOccupancyStatus.parquet
In the legacy layout, the file name format is YYYY_MM_DD_HH_<table>.parquet, where HH is the UTC hour at which the file's period starts.
s3://<BUCKET_NAME>/parquet/{organizationId}/
└── measurementOccupancyStatus/                          ← table
    └── 2026/                                            ← year
        └── 02/                                          ← month
            ├── 2026_02_01_00_measurementOccupancyStatus.parquet
            ├── 2026_02_01_01_measurementOccupancyStatus.parquet
            ├── ...                                      ← one file per hour
            └── 2026_02_24_08_measurementOccupancyStatus.parquet
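When post-processing a downloaded month, it can be useful to recover the UTC date, hour, and table from a legacy file name. A POSIX-shell sketch; parse_legacy_name is hypothetical, and it relies on table names containing no underscores, which holds for every table listed in section 1:

```shell
# Hypothetical helper: splits YYYY_MM_DD_HH_<table>.parquet into
# tab-separated fields: YYYY-MM-DD, HH, table.
parse_legacy_name() {
  name=${1##*/}            # drop any leading directory path
  name=${name%.parquet}    # drop the extension
  y=${name%%_*};  name=${name#*_}
  m=${name%%_*};  name=${name#*_}
  d=${name%%_*};  name=${name#*_}
  h=${name%%_*};  table=${name#*_}
  printf '%s-%s-%s\t%s\t%s\n' "$y" "$m" "$d" "$h" "$table"
}
```

For example: for f in data/parquet/${ORG_ID}/measurementOccupancyStatus/2026/02/*.parquet; do parse_legacy_name "$f"; done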
3. Download data
Download everything (all tables, all dates)
aws s3 sync \
"s3://${BUCKET}/parquet/${ORG_ID}/" \
"data/parquet/${ORG_ID}/" \
--profile ${PROFILE} \
--region eu-west-1
This downloads all tables and all history. Depending on how long exports have been running, this could be a large amount of data. Consider starting with a single table or month instead.
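Before syncing everything, you can check how much data is under a prefix. aws s3 ls accepts --recursive, --summarize, and --human-readable; with --summarize the listing ends with a total object count and total size. A sketch (estimate_size is a hypothetical helper) that keeps only those two summary lines:

```shell
# Hypothetical helper: estimate_size S3_PREFIX
# Prints "Total Objects" and "Total Size" for the prefix without downloading.
estimate_size() {
  aws s3 ls "$1" \
    --recursive --summarize --human-readable \
    --profile "${PROFILE}" \
    --region eu-west-1 |
    grep -E 'Total (Objects|Size):'
}
```

Usage: estimate_size "s3://${BUCKET}/parquet/${ORG_ID}/"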
Download a single table
aws s3 sync \
"s3://${BUCKET}/parquet/${ORG_ID}/measurementOccupancyStatus/" \
"data/parquet/${ORG_ID}/measurementOccupancyStatus/" \
--profile ${PROFILE} \
--region eu-west-1
Download a specific month
aws s3 sync \
"s3://${BUCKET}/parquet/${ORG_ID}/measurementOccupancyStatus/2026/02/" \
"data/parquet/${ORG_ID}/measurementOccupancyStatus/2026/02/" \
--profile ${PROFILE} \
--region eu-west-1
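aws s3 sync takes a single source prefix, so downloading a range of months means one call per month. A POSIX-shell sketch for the legacy layout; sync_months is a hypothetical helper. The leading zero is stripped from the month arguments because shells may read 08 and 09 as (invalid) octal numbers in arithmetic:

```shell
# Hypothetical helper: sync_months TABLE START_YYYY START_MM END_YYYY END_MM
# Runs one "aws s3 sync" per month, start and end months inclusive.
sync_months() {
  table=$1
  y=$2
  m=${3#0}                        # "08" -> "8"
  end=$(( $4 * 12 + ${5#0} ))
  while [ $(( y * 12 + m )) -le "$end" ]; do
    mm=$(printf '%02d' "$m")      # back to two digits for the path
    aws s3 sync \
      "s3://${BUCKET}/parquet/${ORG_ID}/${table}/${y}/${mm}/" \
      "data/parquet/${ORG_ID}/${table}/${y}/${mm}/" \
      --profile "${PROFILE}" \
      --region eu-west-1 || return 1
    m=$(( m + 1 ))
    if [ "$m" -gt 12 ]; then m=1; y=$(( y + 1 )); fi
  done
}
```

For example, sync_months measurementOccupancyStatus 2025 11 2026 02 downloads November 2025 through February 2026.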
Download a single file
Use aws s3 cp with the full path to the file (including year, month, and the exact filename you found in Step 3 above):
aws s3 cp \
"s3://${BUCKET}/parquet/${ORG_ID}/measurementOccupancyStatus/2026/02/2026_02_24_08_measurementOccupancyStatus.parquet" \
"data/parquet/${ORG_ID}/measurementOccupancyStatus/2026/02/" \
--profile ${PROFILE} \
--region eu-west-1
The full S3 path breaks down as:
s3://haltian-parquet-exports-prod/parquet/29e95a47-c992-497a-b78e-072d70aa67a7/measurementOccupancyStatus/2026/02/2026_02_24_08_measurementOccupancyStatus.parquet
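| Segment | Value |
|---|---|
| Bucket | haltian-parquet-exports-prod |
| Fixed prefix | parquet |
| Org ID | 29e95a47-c992-497a-b78e-072d70aa67a7 |
| Table | measurementOccupancyStatus |
| Year | 2026 |
| Month | 02 |
| Filename | 2026_02_24_08_measurementOccupancyStatus.parquet |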
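To fetch all hourly files for one UTC day without looking up each file name, you can combine aws s3 sync with the CLI's --exclude and --include filters, where later filters take precedence over earlier ones. A sketch for the legacy layout; sync_day is a hypothetical helper:

```shell
# Hypothetical helper: sync_day TABLE YYYY MM DD
# Excludes everything in the month folder, then re-includes only the
# files whose names start with the requested day.
sync_day() {
  aws s3 sync \
    "s3://${BUCKET}/parquet/${ORG_ID}/$1/$2/$3/" \
    "data/parquet/${ORG_ID}/$1/$2/$3/" \
    --exclude "*" \
    --include "${2}_${3}_${4}_*" \
    --profile "${PROFILE}" \
    --region eu-west-1
}
```

For example, sync_day measurementOccupancyStatus 2026 02 24. Add --dryrun inside the function first if you want to preview the matching files.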
Download only new/changed files (incremental sync)
aws s3 sync only downloads objects that are missing locally, whose size differs from the local copy, or that are newer than the local copy, so re-running the same command fetches only new or changed data:
aws s3 sync \
"s3://${BUCKET}/parquet/${ORG_ID}/" \
"data/parquet/${ORG_ID}/" \
--profile ${PROFILE} \
--region eu-west-1
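For unattended runs, for example from cron, it helps if the sync stays quiet unless something fails and returns a non-zero exit status on failure. A sketch; incremental_sync is a hypothetical wrapper and the crontab line is an illustrative placeholder. SSO sessions expire, so for scheduled jobs a static access key or IAM role profile is usually more practical.

```shell
# Hypothetical wrapper around the incremental sync above.
# Example crontab entry (placeholder script path, runs hourly):
#   30 * * * * /path/to/haltian-sync.sh
incremental_sync() {
  if ! aws s3 sync \
      "s3://${BUCKET}/parquet/${ORG_ID}/" \
      "data/parquet/${ORG_ID}/" \
      --only-show-errors \
      --profile "${PROFILE}" \
      --region eu-west-1; then
    echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') sync failed" >&2
    return 1
  fi
}
```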
Download multiple specific tables
for TABLE in space zone device measurementOccupancyStatus; do
echo "Downloading ${TABLE}..."
aws s3 sync \
"s3://${BUCKET}/parquet/${ORG_ID}/${TABLE}/" \
"data/parquet/${ORG_ID}/${TABLE}/" \
--profile ${PROFILE} \
--region eu-west-1
done
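When this loop runs unattended, a failure for one table (for example an expired session or a permissions error) is easy to miss among the output of the others. A sketch of the same loop as a function that keeps going past failures and reports them at the end; sync_tables is a hypothetical name:

```shell
# Hypothetical variant of the loop above: continues after a failed table,
# lists the failures on stderr and returns 1 if there were any.
sync_tables() {
  failed=""
  for TABLE in "$@"; do
    echo "Downloading ${TABLE}..."
    aws s3 sync \
      "s3://${BUCKET}/parquet/${ORG_ID}/${TABLE}/" \
      "data/parquet/${ORG_ID}/${TABLE}/" \
      --profile "${PROFILE}" \
      --region eu-west-1 || failed="${failed} ${TABLE}"
  done
  if [ -n "$failed" ]; then
    echo "Failed tables:${failed}" >&2
    return 1
  fi
}
```

Usage: sync_tables space zone device measurementOccupancyStatus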
Troubleshooting
| Issue | Solution |
|---|---|
| Access Denied | Verify your credentials. For SSO: re-run aws sso login --profile ${PROFILE}. For access keys: check which key the profile uses with aws configure list --profile ${PROFILE}. aws sts get-caller-identity --profile ${PROFILE} shows the identity the CLI is actually using |
| NoSuchBucket | Double-check the bucket name provided by Haltian |
| NoSuchKey or 404 (Not Found) | Verify the full file path: use aws s3 ls to browse and find the exact filename first |
| Empty listing | Your org may not have data for the requested table or period yet. Try listing the root parquet/${ORG_ID}/ first, and check whether your data is under the versioned prefix parquet/v1/${ORG_ID}/ instead |
| SSO session expired | Run aws sso login --profile ${PROFILE} or see AWS SSO Setup — Troubleshooting |
| Slow downloads | --only-show-errors only reduces console output; it does not speed up transfers. To download more files in parallel, raise the s3 max_concurrent_requests setting in your AWS CLI configuration (the default is 10) |
| Need to see what would be downloaded | Add the --dryrun flag to preview without downloading |
| Windows: aws is not recognized | Ensure the AWS CLI is on your PATH. Restart PowerShell or Command Prompt after installation |
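Several rows of this table can be checked from a script. A hedged sketch (diagnose is a hypothetical helper): it lists your organisation's root prefix and matches the error code that the AWS CLI shows in parentheses, such as (AccessDenied) or (NoSuchBucket). Treat that message format as an assumption rather than a stable interface.

```shell
# Hypothetical diagnostic: maps the result of one listing to a hint.
diagnose() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws not found on PATH: check the installation and restart your shell"
    return 1
  fi
  # Keep only stderr (the error text); the listing itself is discarded.
  err=$(aws s3 ls "s3://${BUCKET}/parquet/${ORG_ID}/" \
    --profile "${PROFILE}" --region eu-west-1 2>&1 >/dev/null)
  status=$?
  case "$err" in
    *AccessDenied*) echo "Access denied: check your credentials (SSO: aws sso login --profile ${PROFILE})" ;;
    *NoSuchBucket*) echo "Bucket not found: double-check the bucket name from Haltian" ;;
    "")
      if [ "$status" -eq 0 ]; then
        echo "Listing OK"
      else
        echo "Nothing listed: no data under this prefix yet, or try parquet/v1/${ORG_ID}/"
      fi ;;
    *) echo "Other error: $err" ;;
  esac
  return "$status"
}
```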