Data Engineering
17 items across the graph are tagged with Data Engineering.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Workflow Engine for Kubernetes
🧙 Build, run, and manage data pipelines for integrating and transforming data.
The Open Source Feature Store for AI/ML
LLM-Driven Extraction of Unstructured Data — Built for API Deployments & ETL Pipeline Workflows
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
Maestro: Netflix’s Workflow Orchestrator
Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and sca…
Python Streaming DataFrames for Kafka
Semantica 🧠 • Build AI systems that can explain, trace, and justify every decision. Knowledge graphs, context graphs, reasoning engines, provenance, and govern…
🧠Mindmap of 🗺️Software Architecture, Software engineering: An Overview of Software Terminologies and Concepts.
Zero-config entity resolution & record linkage. The zero-tuning Fellegi-Sunter path beats hand-tuned Splink head-to-head and scales from a CSV to a verified 100…
mloda.ai - Open Data Access for AI and ML. Plugin-based. Traceable. Framework-agnostic.
High-performance open-source synthetic data engine. Uses LLMs for schema design and vectorized NumPy for deterministic, scalable generation.
Workbench: An easy-to-use Python API for creating and deploying AWS SageMaker Models
Cloud + Data + AI + Security from zero to hero. 122+ certs across 22 providers, 37 plain-English concepts, 15 hands-on builds, cross-cloud + AI service comparis…
