
Why Enrich.sh

Why another data pipeline? Because the existing ones are built for a world that no longer exists.

Typical Modern Data Pipeline

┌──────────────────┐
│  Event Producers │
│──────────────────│
│ Frontend         │
│ Backend APIs     │
│ Mobile Apps      │
│ AI Inference     │
└─────────┬────────┘
          │
          ▼
┌──────────────────┐
│  Ingestion Layer │
│──────────────────│
│ Kafka            │
│ HTTP Collectors  │
│ Custom Loaders   │
│ Segment          │
│ Logstash         │
└─────────┬────────┘
          │
          ▼
┌──────────────────┐
│ Stream Processing│
│──────────────────│
│ Kafka Streams    │
│ Flink            │
│ Spark Streaming  │
│ Custom Python    │
└─────────┬────────┘
          │
          ▼
┌──────────────────┐
│ Validation /     │
│ Schema Control   │
│──────────────────│
│ JSON Schema      │
│ Data Contracts   │
│ Custom Checks    │
└─────────┬────────┘
          │
          ▼
┌──────────────────┐
│ Storage Layer    │
│──────────────────│
│ S3 / R2          │
│ Parquet          │
│ Delta Lake       │
└─────────┬────────┘
          │
          ▼
┌──────────────────┐
│ Warehouse / OLAP │
│──────────────────│
│ Snowflake        │
│ BigQuery         │
│ ClickHouse       │
└─────────┬────────┘
          │
          ▼
┌──────────────────┐
│ Analytics / ML   │
│──────────────────│
│ dbt              │
│ BI Tools         │
│ Training Jobs    │
└──────────────────┘

6–12 months of engineering. 3+ full-time engineers. $10K+/month in infra.

And every time a source system changes a field, your pipeline breaks at 2 AM.

With Enrich.sh

We collapse the ingestion, processing, validation, and storage layers into one:

┌──────────────────┐
│  Event Producers │
└─────────┬────────┘
          │
          ▼
┌─────────────────────────────────────┐
│            enrich.sh                │
│─────────────────────────────────────│
│ ✓ HTTP ingestion                    │
│ ✓ Schema definition                 │
│ ✓ Validation (flex/evolve/strict)   │
│ ✓ Dead letter queue                 │
│ ✓ Stream mapping                    │
│ ✓ Enrichment (UA, Geo, IP)          │
│ ✓ Partitioned Parquet to R2         │
└─────────┬───────────────────────────┘
          │
          ▼
┌──────────────────┐
│ Warehouse / OLAP │
└──────────────────┘

No Kafka. No Flink. No Airflow. No connectors. No sync jobs.


Proof Points

| Metric            | Value                                                |
|-------------------|------------------------------------------------------|
| Ingest latency    | <50ms (p99, 300+ edge locations)                     |
| Throughput        | 5,000 events/sec per stream. Ask us for more.        |
| Storage format    | Parquet (Snappy compression, ~10x smaller than JSON) |
| Cold start        | <50ms, zero JVM warm-up                              |
| Warehouse support | ClickHouse, BigQuery, DuckDB, Snowflake, Spark       |
| Protocol          | HTTPS POST (works from anywhere)                     |

Who Uses This

Adtech & Data Companies

Track ad impressions, conversions, and attribution events across millions of daily requests. Schema evolve mode detects when ad networks change their callback formats.

bash
curl -X POST https://enrich.sh/ingest \
  -H "Authorization: Bearer sk_live_your_key" \
  -d '{
    "stream_id": "impressions",
    "data": [{
      "ad_id": "ad_9x7k",
      "campaign": "retarget_q1",
      "placement": "feed_top",
      "bid_price": 0.42,
      "ts": 1738776000
    }]
  }'

AI & ML Teams

Log inference results, model inputs/outputs, and training metrics. Replay historical data for model retraining.

bash
curl -X POST https://enrich.sh/ingest \
  -H "Authorization: Bearer sk_live_your_key" \
  -d '{
    "stream_id": "inferences",
    "data": [{
      "model_id": "gpt-4o-mini",
      "prompt_tokens": 1250,
      "completion_tokens": 340,
      "latency_ms": 892,
      "user_id": "user_abc",
      "ts": 1738776000
    }]
  }'

IoT & Sensor Data

Ingest telemetry from thousands of devices. Evolve mode auto-adapts when new device types send different fields.

bash
curl -X POST https://enrich.sh/ingest \
  -H "Authorization: Bearer sk_live_your_key" \
  -d '{
    "stream_id": "sensors",
    "data": [{
      "device_id": "temp_sensor_042",
      "reading": 23.7,
      "unit": "celsius",
      "battery": 0.89,
      "location": {"lat": 52.52, "lng": 13.405},
      "ts": 1738776000
    }]
  }'
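
For example, if a new device type later reports fields the stream has not seen before, evolve mode picks up the added columns instead of rejecting the batch. The payload below is an illustrative sketch; the endpoint and key are the same as above.

bash
# Same stream, new device type: "humidity_offset" and "firmware" are fields
# the stream has not seen before. In evolve mode the schema adapts rather
# than dropping the batch.
curl -X POST https://enrich.sh/ingest \
  -H "Authorization: Bearer sk_live_your_key" \
  -d '{
    "stream_id": "sensors",
    "data": [{
      "device_id": "humidity_sensor_017",
      "reading": 61.2,
      "unit": "percent",
      "humidity_offset": 1.5,
      "firmware": "2.4.1",
      "ts": 1738776000
    }]
  }'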

Product Analytics & Logs

Track user behavior, feature usage, and application logs without Segment's pricing.

bash
curl -X POST https://enrich.sh/ingest \
  -H "Authorization: Bearer sk_live_your_key" \
  -d '{
    "stream_id": "product_events",
    "data": [{
      "event": "feature_activated",
      "feature": "dark_mode",
      "user_id": "user_789",
      "plan": "pro",
      "ts": 1738776000
    }]
  }'

vs. The Alternatives

|                           | Enrich.sh                      | Segment         | RudderStack        | DIY (Kafka + Flink) |
|---------------------------|--------------------------------|-----------------|--------------------|---------------------|
| Setup time                | 5 minutes                      | 1 hour          | 1 day              | 3–6 months          |
| Monthly cost (10M events) | $49                            | $1,200+         | $500+              | $2,000+ infra       |
| Infrastructure            | Zero (serverless)              | Managed         | Self-host or cloud | Self-managed        |
| Data format               | Parquet (open)                 | Proprietary     | JSON/Parquet       | Your choice         |
| Warehouse access          | Direct S3 read                 | Sync connectors | Sync connectors    | Custom ETL          |
| Schema enforcement        | Flex / Evolve / Strict         | Basic           | Basic              | Manual              |
| Dead Letter Queue         | Built-in                       |                 |                    | Build it yourself   |
| Vendor lock-in            | None (files are Parquet on S3) | High            | Medium             | Low                 |

Pain Trigger → Feature Map

"We're dealing with..."Enrich.sh solves it with
Running Kafka just for event loggingDirect HTTP ingest → Parquet. No brokers.
Pipeline breaks when sources change fieldsevolve mode detects schema drift automatically
Paying $1K+/mo for SegmentSame functionality, 10x cheaper
Building custom S3 writers + Flink jobsBuilt-in buffering, batching, and Parquet compression
No visibility into failed eventsDead Letter Queue — nothing is lost
Can't replay historical dataStream Replay API — re-send any time range
Connecting warehouse to dataDashboard → Connect — copy-paste SQL for any warehouse
GA4 sampling ruining analyticsRaw event data, no sampling, you own the data

How It Works

1. Send Events

POST JSON to /ingest. From any language, any platform, any edge.

2. We Enrich & Store

Events are buffered, enriched with geo/device/session data, compressed as Parquet, and flushed to your dedicated R2 bucket.
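
Once a file has been flushed, you can inspect what landed. A minimal sketch, assuming DuckDB with the httpfs extension and S3-compatible credentials for your R2 bucket; the bucket path reuses the quickstart placeholder below.

bash
# List the columns (your event fields plus the enrichment fields) of the
# stored Parquet files. The bucket path is a placeholder.
duckdb -c "DESCRIBE SELECT * FROM read_parquet('s3://enrich-you/events/2026/**/*.parquet')"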

3. Query From Your Warehouse

Connect ClickHouse, BigQuery, DuckDB, or Snowflake directly to your bucket. No sync jobs. No connectors.
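
As a minimal sketch of what that looks like, the query below runs DuckDB straight against the stored Parquet files. The stream, columns, and bucket path reuse the illustrative product_events example above; swap in your own.

bash
# Feature activations per plan, read directly from the bucket.
# No sync job or connector in between.
duckdb -c "
  SELECT plan, count(*) AS activations
  FROM read_parquet('s3://enrich-you/product_events/2026/**/*.parquet')
  WHERE event = 'feature_activated'
  GROUP BY plan
  ORDER BY activations DESC
"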


Get Started

bash
# 1. Create a stream
curl -X POST https://enrich.sh/streams \
  -H "Authorization: Bearer sk_live_your_key" \
  -d '{ "stream_id": "events", "schema_mode": "evolve" }'

# 2. Send events
curl -X POST https://enrich.sh/ingest \
  -H "Authorization: Bearer sk_live_your_key" \
  -d '{ "stream_id": "events", "data": [{ "event": "signup", "plan": "pro" }] }'

# 3. Query with DuckDB (needs the httpfs extension and S3-compatible
#    credentials for your R2 bucket)
duckdb -c "SELECT * FROM read_parquet('s3://enrich-you/events/2026/**/*.parquet')"

Full quickstart guide →

Serverless data ingestion for developers.