
Why Enrich.sh

Why another data pipeline? Because the existing ones are built for a world that no longer exists.

Typical Modern Data Pipeline

┌──────────────────┐
│  Event Producers │
│──────────────────│
│ Frontend         │
│ Backend APIs     │
│ Mobile Apps      │
│ AI Inference     │
└─────────┬────────┘
          │
          ▼
┌──────────────────┐
│  Ingestion Layer │
│──────────────────│
│ Kafka            │
│ HTTP Collectors  │
│ Custom Loaders   │
│ Segment          │
│ Logstash         │
└─────────┬────────┘
          │
          ▼
┌──────────────────┐
│ Stream Processing│
│──────────────────│
│ Kafka Streams    │
│ Flink            │
│ Spark Streaming  │
│ Custom Python    │
└─────────┬────────┘
          │
          ▼
┌──────────────────┐
│ Validation /     │
│ Schema Control   │
│──────────────────│
│ JSON Schema      │
│ Data Contracts   │
│ Custom Checks    │
└─────────┬────────┘
          │
          ▼
┌──────────────────┐
│ Storage Layer    │
│──────────────────│
│ S3 / R2          │
│ Parquet          │
│ Delta Lake       │
└─────────┬────────┘
          │
          ▼
┌──────────────────┐
│ Warehouse / OLAP │
│──────────────────│
│ Snowflake        │
│ BigQuery         │
│ ClickHouse       │
└─────────┬────────┘
          │
          ▼
┌──────────────────┐
│ Analytics / ML   │
│──────────────────│
│ dbt              │
│ BI Tools         │
│ Training Jobs    │
└──────────────────┘

6–12 months of engineering. 3+ full-time engineers. $10K+/month in infra.

And every time a source system changes a field, your pipeline breaks at 2 AM.

With Enrich.sh

We collapse the ingestion, processing, validation, and storage layers into one:

┌──────────────────┐
│  Event Producers │
└─────────┬────────┘
          │
          ▼
┌─────────────────────────────────────┐
│            enrich.sh                │
│─────────────────────────────────────│
│ ✓ HTTP ingestion                    │
│ ✓ Schema definition                 │
│ ✓ Validation (flex/evolve/strict)   │
│ ✓ Dead letter queue                 │
│ ✓ Stream mapping                    │
│ ✓ Enrichment (UA, Geo, IP)          │
│ ✓ Partitioned Parquet to R2         │
└─────────┬───────────────────────────┘
          │
          ▼
┌──────────────────┐
│ Warehouse / OLAP │
└──────────────────┘

No Kafka. No Flink. No Airflow. No connectors. No sync jobs.


Proof Points

| Metric            | Value                                                |
|-------------------|------------------------------------------------------|
| Ingest latency    | <50ms (p99, 300+ edge locations)                     |
| Throughput        | 5,000 events/sec per stream. Ask us for more.        |
| Storage format    | Parquet (Snappy compression, ~10x smaller than JSON) |
| Cold start        | <50ms, zero JVM warm-up                              |
| Warehouse support | ClickHouse, BigQuery, DuckDB, Snowflake, Spark       |
| Protocol          | HTTPS POST (works from anywhere)                     |

Who Uses This

Adtech & Data Companies

Track ad impressions, conversions, and attribution events across millions of daily requests. Schema evolve mode detects when ad networks change their callback formats.

bash
curl -X POST https://enrich.sh/ingest \
  -H "Authorization: Bearer sk_live_your_key" \
  -d '{
    "stream_id": "impressions",
    "data": [{
      "ad_id": "ad_9x7k",
      "campaign": "retarget_q1",
      "placement": "feed_top",
      "bid_price": 0.42,
      "ts": 1738776000
    }]
  }'

AI & ML Teams

Log inference results, model inputs/outputs, and training metrics. Replay historical data for model retraining.

bash
curl -X POST https://enrich.sh/ingest \
  -H "Authorization: Bearer sk_live_your_key" \
  -d '{
    "stream_id": "inferences",
    "data": [{
      "model_id": "gpt-4o-mini",
      "prompt_tokens": 1250,
      "completion_tokens": 340,
      "latency_ms": 892,
      "user_id": "user_abc",
      "ts": 1738776000
    }]
  }'

IoT & Sensor Data

Ingest telemetry from thousands of devices. Evolve mode auto-adapts when new device types send different fields.

bash
curl -X POST https://enrich.sh/ingest \
  -H "Authorization: Bearer sk_live_your_key" \
  -d '{
    "stream_id": "sensors",
    "data": [{
      "device_id": "temp_sensor_042",
      "reading": 23.7,
      "unit": "celsius",
      "battery": 0.89,
      "location": {"lat": 52.52, "lng": 13.405},
      "ts": 1738776000
    }]
  }'
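
For example, if a new device type later reports fields the stream has not seen before, evolve mode picks up the added columns instead of rejecting the batch. The payload below is an illustrative sketch; the endpoint and key are the same as above.

bash
# Same stream, new device type: "humidity_offset" and "firmware" are fields
# the stream has not seen before. In evolve mode the schema adapts rather
# than dropping the batch.
curl -X POST https://enrich.sh/ingest \
  -H "Authorization: Bearer sk_live_your_key" \
  -d '{
    "stream_id": "sensors",
    "data": [{
      "device_id": "humidity_sensor_017",
      "reading": 61.2,
      "unit": "percent",
      "humidity_offset": 1.5,
      "firmware": "2.4.1",
      "ts": 1738776000
    }]
  }'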

Product Analytics & Logs

Track user behavior, feature usage, and application logs without Segment's pricing.

bash
curl -X POST https://enrich.sh/ingest \
  -H "Authorization: Bearer sk_live_your_key" \
  -d '{
    "stream_id": "product_events",
    "data": [{
      "event": "feature_activated",
      "feature": "dark_mode",
      "user_id": "user_789",
      "plan": "pro",
      "ts": 1738776000
    }]
  }'

vs. The Alternatives

|                           | Enrich.sh                      | Segment         | RudderStack        | DIY (Kafka + Flink) |
|---------------------------|--------------------------------|-----------------|--------------------|---------------------|
| Setup time                | 5 minutes                      | 1 hour          | 1 day              | 3–6 months          |
| Monthly cost (10M events) | $49                            | $1,200+         | $500+              | $2,000+ infra       |
| Infrastructure            | Zero (serverless)              | Managed         | Self-host or cloud | Self-managed        |
| Data format               | Parquet (open)                 | Proprietary     | JSON/Parquet       | Your choice         |
| Warehouse access          | Direct S3 read                 | Sync connectors | Sync connectors    | Custom ETL          |
| Schema enforcement        | Flex / Evolve / Strict         | Basic           | Basic              | Manual              |
| Dead Letter Queue         | Built-in                       |                 |                    | Build it yourself   |
| Vendor lock-in            | None (files are Parquet on S3) | High            | Medium             | Low                 |

Pain Trigger → Feature Map

"We're dealing with..."Enrich.sh solves it with
Running Kafka just for event loggingDirect HTTP ingest → Parquet. No brokers.
Pipeline breaks when sources change fieldsevolve mode detects schema drift automatically
Paying $1K+/mo for SegmentSame functionality, 10x cheaper
Building custom S3 writers + Flink jobsBuilt-in buffering, batching, and Parquet compression
No visibility into failed eventsDead Letter Queue — nothing is lost
Can't replay historical dataStream Replay API — re-send any time range
Connecting warehouse to dataDashboard → Connect — copy-paste SQL for any warehouse
GA4 sampling ruining analyticsRaw event data, no sampling, you own the data

How It Works

1. Send Events

POST JSON to /ingest. From any language, any platform, any edge.

2. We Enrich & Store

Events are buffered, enriched with geo/device/session data, compressed as Parquet, and flushed to your dedicated R2 bucket.
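
Once a file has been flushed, you can inspect what landed. A minimal sketch, assuming DuckDB with the httpfs extension and S3-compatible credentials for your R2 bucket; the bucket path reuses the quickstart placeholder below.

bash
# List the columns (your event fields plus the enrichment fields) of the
# stored Parquet files. The bucket path is a placeholder.
duckdb -c "DESCRIBE SELECT * FROM read_parquet('s3://enrich-you/events/2026/**/*.parquet')"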

3. Query From Your Warehouse

Connect ClickHouse, BigQuery, DuckDB, or Snowflake directly to your bucket. No sync jobs. No connectors.
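
As a minimal sketch of what that looks like, the query below runs DuckDB straight against the stored Parquet files. The stream, columns, and bucket path reuse the illustrative product_events example above; swap in your own.

bash
# Feature activations per plan, read directly from the bucket.
# No sync job or connector in between.
duckdb -c "
  SELECT plan, count(*) AS activations
  FROM read_parquet('s3://enrich-you/product_events/2026/**/*.parquet')
  WHERE event = 'feature_activated'
  GROUP BY plan
  ORDER BY activations DESC
"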


Get Started

bash
# 1. Create a stream
curl -X POST https://enrich.sh/streams \
  -H "Authorization: Bearer sk_live_your_key" \
  -d '{ "stream_id": "events", "schema_mode": "evolve" }'

# 2. Send events
curl -X POST https://enrich.sh/ingest \
  -H "Authorization: Bearer sk_live_your_key" \
  -d '{ "stream_id": "events", "data": [{ "event": "signup", "plan": "pro" }] }'

# 3. Query with DuckDB (needs the httpfs extension and S3-compatible
#    credentials for your R2 bucket)
duckdb -c "SELECT * FROM read_parquet('s3://enrich-you/events/2026/**/*.parquet')"

Full quickstart guide →

Serverless data ingestion for developers.