Web Data Access
Modern browsers can read Parquet files directly using JavaScript libraries. This makes it possible to build interactive analytics dashboards with no backend beyond static file hosting.
Browser Libraries
hyparquet (Recommended for Simple Use)
hyparquet is a pure JavaScript Parquet reader that works everywhere:
- No WebAssembly or Workers - works in any browser context
- Simple API for reading Parquet files
- Lightweight (~50KB)
- Best choice for embedding in documentation or simple apps
DuckDB-WASM (For Advanced SQL)
DuckDB-WASM provides full SQL capabilities but has hosting requirements:
- Full SQL support including JOINs, aggregations, window functions
- Requires proper CORS headers and often SharedArrayBuffer support
- ~8MB bundle size
- Best for standalone applications where you control server configuration
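DuckDB-WASM needs specific server headers for full functionality: its multi-threaded build relies on SharedArrayBuffer, which browsers expose only on pages served with the Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy (COOP/COEP) headers. It may not work in embedded documentation demos or restrictive hosting environments. For simple Parquet reading, use hyparquet instead.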
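The trade-off above can be checked at runtime before loading either library. The following is a minimal sketch, not part of hyparquet or DuckDB-WASM: the function names `detectCapabilities` and `chooseParquetReader` and the `needsSql` flag are hypothetical and only illustrate the decision.

```javascript
// Hypothetical helpers (not part of either library).

// Collect the capabilities that matter for DuckDB-WASM from an environment object.
function detectCapabilities(env = globalThis) {
  return {
    hasWorker: typeof env.Worker === 'function',
    hasWasm: typeof env.WebAssembly === 'object',
    // True only when the page is served with COOP/COEP headers.
    crossOriginIsolated: env.crossOriginIsolated === true,
  };
}

// Pick DuckDB-WASM only when SQL is needed and the page can run it;
// otherwise fall back to hyparquet plus plain array methods.
function chooseParquetReader({ needsSql, hasWorker, hasWasm }) {
  if (needsSql && hasWorker && hasWasm) return 'duckdb-wasm';
  return 'hyparquet';
}

const caps = detectCapabilities();
console.log(chooseParquetReader({ needsSql: true, ...caps }));
```

The `crossOriginIsolated` flag is reported separately because it only matters for the multi-threaded build; the sketch does not require it.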
hyparquet Example (Recommended)
Setup
Include via CDN ESM import:
import { parquetRead } from 'https://cdn.jsdelivr.net/npm/hyparquet@1.1.3/+esm';
Or install via npm:
npm install hyparquet
Basic Usage
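import { parquetRead } from 'hyparquet';
const BASE_URL = 'https://raw.githubusercontent.com/haltian/dev-demos/main/haltiansalesdemo-parquet';
// Fetch a Parquet file and parse it into an array of row objects
async function loadParquet(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  const buffer = await response.arrayBuffer();
  let rows = [];
  await parquetRead({
    file: buffer,
    // Return each row as an object keyed by column name (the default is one array per row).
    // Assumes a hyparquet release that supports the rowFormat option.
    rowFormat: 'object',
    onComplete: data => { rows = data; }
  });
  return rows;
}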
// Load zone data
const zones = await loadParquet(`${BASE_URL}/zone/2026/01/2026_01_01_00_zone.parquet`);
console.log('Zones:', zones);
// Load occupancy measurements
const occupancy = await loadParquet(`${BASE_URL}/measurementOccupancyStatus/2026/01/2026_01_15_10_measurementOccupancyStatus.parquet`);
console.log('Occupancy readings:', occupancy.length);
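The file URLs above follow a regular pattern, so they can be generated instead of typed by hand. This is a sketch only: `parquetUrl` is a hypothetical helper, and the layout `<entity>/<YYYY>/<MM>/<YYYY>_<MM>_<DD>_<HH>_<entity>.parquet` is inferred from the example URLs on this page rather than from a documented contract.

```javascript
const BASE_URL = 'https://raw.githubusercontent.com/haltian/dev-demos/main/haltiansalesdemo-parquet';

// Hypothetical helper: build the URL of one hourly Parquet file.
// The path layout is an assumption inferred from the example URLs.
function parquetUrl(entity, year, month, day, hour) {
  const pad = n => String(n).padStart(2, '0');
  const yyyy = String(year);
  const mm = pad(month);
  return `${BASE_URL}/${entity}/${yyyy}/${mm}/${yyyy}_${mm}_${pad(day)}_${pad(hour)}_${entity}.parquet`;
}

console.log(parquetUrl('zone', 2026, 1, 1, 0));
```

The result can be passed straight to loadParquet, for example `loadParquet(parquetUrl('measurementOccupancyStatus', 2026, 1, 15, 10))`.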
Processing Data in JavaScript
hyparquet hands the rows back as a plain JavaScript array, so standard array methods are enough for analysis. The snippets below expect each row to be an object with ts and status fields:
// Filter to business hours (getHours() uses the browser's local time zone)
const businessHours = occupancy.filter(r => {
  const hour = new Date(r.ts).getHours();
  return hour >= 8 && hour <= 17;
});
// Calculate occupancy percentage (0 when there are no readings, to avoid NaN)
const totalReadings = businessHours.length;
const occupiedReadings = businessHours.filter(r => r.status === 1).length;
const occupancyPct = totalReadings ? Math.round(100 * occupiedReadings / totalReadings) : 0;
console.log(`Occupancy: ${occupancyPct}%`);
// Group by hour
const byHour = {};
businessHours.forEach(r => {
const hour = new Date(r.ts).getHours();
if (!byHour[hour]) byHour[hour] = { total: 0, occupied: 0 };
byHour[hour].total++;
if (r.status === 1) byHour[hour].occupied++;
});
Object.entries(byHour).forEach(([hour, stats]) => {
const pct = Math.round(100 * stats.occupied / stats.total);
console.log(`${hour}:00 - ${pct}% occupied`);
});
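The grouping logic above can be wrapped in a reusable function. The sketch below assumes the same row shape as the snippets above (`ts` accepted by `new Date()`, `status` equal to 1 when occupied); `occupancyByHour` and its `utc` option are hypothetical names, and the sample rows are made up for illustration.

```javascript
// Hypothetical helper: occupancy percentage per hour of day.
// Assumes rows shaped like { ts, status } with status === 1 meaning occupied.
function occupancyByHour(rows, { utc = false } = {}) {
  const buckets = new Map();
  for (const row of rows) {
    const date = new Date(row.ts);
    // Local hours depend on the browser's time zone; UTC hours do not.
    const hour = utc ? date.getUTCHours() : date.getHours();
    const stats = buckets.get(hour) ?? { total: 0, occupied: 0 };
    stats.total += 1;
    if (row.status === 1) stats.occupied += 1;
    buckets.set(hour, stats);
  }
  return [...buckets.entries()]
    .sort(([a], [b]) => a - b)
    .map(([hour, { total, occupied }]) => ({
      hour,
      total,
      occupied,
      pct: Math.round(100 * occupied / total),
    }));
}

// Made-up sample rows, not taken from the dataset.
const sample = [
  { ts: '2026-01-15T08:05:00Z', status: 1 },
  { ts: '2026-01-15T08:35:00Z', status: 0 },
  { ts: '2026-01-15T09:10:00Z', status: 1 },
];
console.log(occupancyByHour(sample, { utc: true }));
```

Grouping on UTC hours keeps the result identical for every viewer; switch to local hours when the dashboard should follow the viewer's clock.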
DuckDB-WASM Example (Advanced)
Setup
Import DuckDB-WASM from a CDN as an ES module:
import * as duckdb from 'https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm@1.28.0/+esm';
Or install via npm:
npm install @duckdb/duckdb-wasm
Basic Usage
import * as duckdb from '@duckdb/duckdb-wasm';
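// Initialize DuckDB
const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
// Browsers refuse to construct a Worker from a cross-origin URL,
// so wrap the CDN worker script in a same-origin Blob
const workerUrl = URL.createObjectURL(
  new Blob([`importScripts("${bundle.mainWorker}");`], { type: 'text/javascript' })
);
const worker = new Worker(workerUrl);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
URL.revokeObjectURL(workerUrl);
const conn = await db.connect();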
// Query Parquet files directly from URL
const result = await conn.query(`
SELECT *
FROM 'https://raw.githubusercontent.com/haltian/dev-demos/main/haltiansalesdemo-parquet/device/2026/01/2026_01_15_08_device.parquet'
LIMIT 10
`);
console.log(result.toArray());
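Query results come back as Arrow rows, and 64-bit integer columns (COUNT(*), for instance) generally surface in JavaScript as BigInt values, which JSON.stringify rejects and which do not mix with ordinary numbers. The sketch below converts such values; `normalizeRow` is a hypothetical helper that works on plain objects.

```javascript
// Hypothetical helper: make a result row safe for JSON.stringify and arithmetic.
// BigInt values that fit in a safe integer become numbers; larger ones become strings.
function normalizeRow(row) {
  const out = {};
  for (const [key, value] of Object.entries(row)) {
    if (typeof value === 'bigint') {
      const asNumber = Number(value);
      out[key] = Number.isSafeInteger(asNumber) ? asNumber : value.toString();
    } else {
      out[key] = value;
    }
  }
  return out;
}

console.log(JSON.stringify(normalizeRow({ events: 42n, name: 'desk-1' })));
```

With DuckDB-WASM this would typically be applied as `result.toArray().map(row => normalizeRow(row.toJSON()))`, assuming the Arrow row objects expose toJSON().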
Loading Multiple Files
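Glob patterns let one query span many Parquet files. DuckDB can only expand a glob on storage it can list (local disk, S3); a plain HTTP(S) host such as raw.githubusercontent.com offers no listing, so if the wildcard below fails, pass the file URLs explicitly with read_parquet([...]):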
// Query all January 2026 occupancy data
const occupancyData = await conn.query(`
SELECT
DATE_TRUNC('hour', ts) as hour,
COUNT(*) as events,
SUM(status) as occupied_count
FROM 'https://raw.githubusercontent.com/haltian/dev-demos/main/haltiansalesdemo-parquet/measurementOccupancyStatus/2026/01/*.parquet'
GROUP BY DATE_TRUNC('hour', ts)
ORDER BY hour
`);
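When a host cannot expand wildcards (plain HTTP(S) servers generally cannot list a directory), the file list can be spelled out instead. The sketch below generates a read_parquet([...]) expression for the hourly files of one day; `hourlyFilesSource` is a hypothetical helper, and the file naming scheme is inferred from the example URLs on this page.

```javascript
// Hypothetical helper: SQL source expression listing the hourly files of one day.
// Assumes files are named <YYYY>_<MM>_<DD>_<HH>_<entity>.parquet under <entity>/<YYYY>/<MM>/.
function hourlyFilesSource(baseUrl, entity, year, month, day, hours = [...Array(24).keys()]) {
  const pad = n => String(n).padStart(2, '0');
  const dir = `${baseUrl}/${entity}/${year}/${pad(month)}`;
  const urls = hours.map(
    h => `'${dir}/${year}_${pad(month)}_${pad(day)}_${pad(h)}_${entity}.parquet'`
  );
  return `read_parquet([${urls.join(', ')}])`;
}

const source = hourlyFilesSource(
  'https://raw.githubusercontent.com/haltian/dev-demos/main/haltiansalesdemo-parquet',
  'measurementOccupancyStatus', 2026, 1, 15, [8, 9, 10]
);
console.log(`SELECT COUNT(*) AS events FROM ${source}`);
```

Every listed file must exist, since a missing URL makes the whole query fail; restrict the `hours` argument to the hours that are actually published.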
Joining Entity and Measurement Data
// Join devices with occupancy to get device names
const roomOccupancy = await conn.query(`
WITH devices AS (
SELECT DISTINCT id, name
FROM 'https://raw.githubusercontent.com/haltian/dev-demos/main/haltiansalesdemo-parquet/device/2026/01/2026_01_15_08_device.parquet'
),
occupancy AS (
SELECT
deviceId,
DATE_TRUNC('day', ts) as date,
COUNT(*) as total_readings,
SUM(status) as occupied_readings
FROM 'https://raw.githubusercontent.com/haltian/dev-demos/main/haltiansalesdemo-parquet/measurementOccupancyStatus/2026/01/*.parquet'
GROUP BY deviceId, DATE_TRUNC('day', ts)
)
SELECT
d.name as device_name,
o.date,
o.total_readings,
o.occupied_readings,
ROUND(100.0 * o.occupied_readings / o.total_readings, 1) as occupancy_pct
FROM occupancy o
JOIN devices d ON o.deviceId = d.id
ORDER BY d.name, o.date
`);
Complete HTML Example
<!DOCTYPE html>
<html>
<head>
<title>Haltian Parquet Demo</title>
<style>
body { font-family: system-ui, sans-serif; padding: 20px; }
table { border-collapse: collapse; width: 100%; margin-top: 20px; }
th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
th { background: #f4f4f4; }
.loading { color: #666; font-style: italic; }
</style>
</head>
<body>
<h1>Haltian IoT Data Explorer</h1>
<p class="loading" id="status">Loading DuckDB-WASM...</p>
<div id="results"></div>
<script type="module">
// Load DuckDB-WASM as an ES module so that `duckdb` is defined inside this script
import * as duckdb from 'https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm@1.28.0/+esm';
const BASE_URL = 'https://raw.githubusercontent.com/haltian/dev-demos/main/haltiansalesdemo-parquet';
async function initDuckDB() {
const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
// Workers cannot be constructed from a cross-origin URL,
// so wrap the CDN worker script in a same-origin Blob
const workerUrl = URL.createObjectURL(
new Blob([`importScripts("${bundle.mainWorker}");`], { type: 'text/javascript' })
);
const worker = new Worker(workerUrl);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
URL.revokeObjectURL(workerUrl);
return await db.connect();
}
function renderTable(result) {
const data = result.toArray();
if (data.length === 0) return '<p>No data found</p>';
const columns = Object.keys(data[0]);
let html = '<table><thead><tr>';
columns.forEach(col => html += `<th>${col}</th>`);
html += '</tr></thead><tbody>';
data.forEach(row => {
html += '<tr>';
columns.forEach(col => html += `<td>${row[col]}</td>`);
html += '</tr>';
});
html += '</tbody></table>';
return html;
}
async function main() {
const status = document.getElementById('status');
const results = document.getElementById('results');
try {
status.textContent = 'Initializing DuckDB-WASM...';
const conn = await initDuckDB();
status.textContent = 'Loading device data...';
const devices = await conn.query(`
SELECT id, name, model, installationStatus
FROM '${BASE_URL}/device/2026/01/2026_01_15_08_device.parquet'
WHERE model LIKE '%presence%' OR model LIKE '%pir%'
LIMIT 20
`);
status.textContent = 'Data loaded successfully!';
results.innerHTML = '<h2>Devices</h2>' + renderTable(devices);
// Load occupancy summary
status.textContent = 'Calculating occupancy statistics...';
const occupancy = await conn.query(`
SELECT
DAYNAME(ts) as day_of_week,
HOUR(ts) as hour,
COUNT(*) as readings,
SUM(status) as occupied,
ROUND(100.0 * SUM(status) / COUNT(*), 1) as occupancy_pct
FROM '${BASE_URL}/measurementOccupancyStatus/2026/01/2026_01_15*.parquet'
GROUP BY DAYNAME(ts), HOUR(ts)
ORDER BY hour
`);
results.innerHTML += '<h2>Occupancy by Hour (Jan 15, 2026)</h2>' + renderTable(occupancy);
status.textContent = 'Done!';
} catch (error) {
status.textContent = 'Error: ' + error.message;
console.error(error);
}
}
main();
</script>
</body>
</html>
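The renderTable function in the page above interpolates cell values straight into HTML, so a device name containing markup would be interpreted by the browser. A minimal sketch of an escaping variant follows; `escapeHtml` and `renderRows` are hypothetical names, and `renderRows` takes plain row objects rather than a query result.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical variant of renderTable that escapes every header and cell.
function renderRows(rows) {
  if (rows.length === 0) return '<p>No data found</p>';
  const columns = Object.keys(rows[0]);
  const head = columns.map(col => `<th>${escapeHtml(col)}</th>`).join('');
  const body = rows
    .map(row => `<tr>${columns.map(col => `<td>${escapeHtml(row[col])}</td>`).join('')}</tr>`)
    .join('');
  return `<table><thead><tr>${head}</tr></thead><tbody>${body}</tbody></table>`;
}

console.log(renderRows([{ name: '<b>Lobby</b>', model: 'pir' }]));
```

Building the markup with map and join also avoids the repeated string concatenation of the original loop.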
Apache Arrow Example
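Apache Arrow's JavaScript library does not decode Parquet itself, so a Parquet reader still has to run first. For a single file, read the rows with hyparquet and, if you want columnar access, build an Arrow table from them:
import { tableFromJSON } from 'apache-arrow';
import { parquetRead } from 'hyparquet';
// Fetch and parse a single Parquet file
const response = await fetch(
'https://raw.githubusercontent.com/haltian/dev-demos/main/haltiansalesdemo-parquet/zone/2026/01/2026_01_15_08_zone.parquet'
);
const buffer = await response.arrayBuffer();
await parquetRead({
  file: buffer, // hyparquet expects an ArrayBuffer, not a Uint8Array
  rowFormat: 'object', // assumes a hyparquet release that supports this option
  onComplete: (data) => {
    console.log('Zones:', data);
    // Optional: convert the row objects into an Arrow table.
    // Column types are inferred, so nested or mixed-type columns may need extra handling.
    const table = tableFromJSON(data);
    console.log('Arrow rows:', table.numRows);
  }
});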
Performance Tips
1. Use Column Selection
Only request columns you need:
// Instead of SELECT *
const result = await conn.query(`
SELECT deviceId, ts, status
FROM '${url}'
`);
2. Use Date Filtering
Parquet files are partitioned by date - query specific time ranges:
// Good: specific date range
const result = await conn.query(`
SELECT * FROM '${BASE_URL}/measurementOccupancyStatus/2026/01/2026_01_15*.parquet'
WHERE ts >= '2026-01-15 08:00:00' AND ts < '2026-01-15 18:00:00'
`);
// Avoid: loading every year and month
// const result = await conn.query(`SELECT * FROM '${BASE_URL}/measurementOccupancyStatus/*/*/*.parquet'`);
3. Cache DuckDB Instance
Reuse the database connection across queries:
// Initialize once (initDuckDB() from the complete HTML example resolves to a connection)
const conn = await initDuckDB();
// Reuse for multiple queries
const result1 = await conn.query('...');
const result2 = await conn.query('...');
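One way to guarantee a single initialization, even when several components ask for a connection at the same time, is to cache the promise rather than the connection. This is a sketch: `getConnection` is a hypothetical helper, and `init` stands for any async function that resolves to a connection, such as initDuckDB from the complete HTML example.

```javascript
// Hypothetical helper: run the expensive init at most once and share its promise.
let connectionPromise = null;

function getConnection(init) {
  if (!connectionPromise) {
    connectionPromise = init().catch(err => {
      // Forget the failed attempt so a later call can retry.
      connectionPromise = null;
      throw err;
    });
  }
  return connectionPromise;
}

// Stand-in for initDuckDB(), used here only to show the call pattern.
let initCalls = 0;
const fakeInit = async () => {
  initCalls += 1;
  return { query: async sql => `ran: ${sql}` };
};

const first = getConnection(fakeInit);
const second = getConnection(fakeInit);
first.then(conn => conn.query('SELECT 1')).then(console.log);
```

Both calls receive the same promise, so concurrent callers wait on one download of the WebAssembly bundle instead of starting several.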
CORS Considerations
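When the page and the Parquet files live on different origins, the file host must send CORS headers (at minimum Access-Control-Allow-Origin); otherwise the browser blocks the response. The sample data at developer.haltian.io includes proper CORS headers for browser access.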
For your own S3 buckets, configure CORS:
{
"CORSRules": [{
"AllowedOrigins": ["*"],
"AllowedMethods": ["GET", "HEAD"],
"AllowedHeaders": ["*"],
"ExposeHeaders": ["Content-Length", "Content-Type"]
}]
}
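To verify a bucket's configuration, the response headers of a test request can be inspected, for instance from a script outside the browser. The sketch below is a hypothetical checker, not an official tool: `checkParquetHostHeaders` takes a plain object of response headers and the origin of the page that will load the data. The Accept-Ranges check is included on the assumption that the reader fetches parts of a file with HTTP range requests.

```javascript
// Hypothetical helper: report whether response headers permit browser access.
// `headers` is a plain object, e.g. Object.fromEntries(response.headers).
function checkParquetHostHeaders(headers, pageOrigin) {
  // Header names are case-insensitive, so normalize them first.
  const lower = Object.fromEntries(
    Object.entries(headers).map(([name, value]) => [name.toLowerCase(), String(value)])
  );
  const allowOrigin = lower['access-control-allow-origin'];
  return {
    // The browser accepts either a wildcard or an exact match of the page origin.
    corsOk: allowOrigin === '*' || allowOrigin === pageOrigin,
    // Partial reads need byte-range support on the server.
    rangeOk: (lower['accept-ranges'] || '').toLowerCase() === 'bytes',
  };
}

console.log(checkParquetHostHeaders(
  { 'Access-Control-Allow-Origin': '*', 'Accept-Ranges': 'bytes' },
  'https://dashboard.example'
));
```

The origin `https://dashboard.example` is a placeholder; pass the origin your dashboard is actually served from.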
Next Steps
- Occupancy Dashboard Demo - See a complete browser-based dashboard
- Sample Data - Dataset overview and structure
- Integration Examples - Python, DuckDB, and platform-specific code