Linux Foundation AI & Data

Build an end-to-end real-time lakehouse with transactional concurrent upserts, incremental pipelines, and SQL for your BI & AI applications


Major Features of LakeSoul

Centralized Metadata Service

LakeSoul uses PostgreSQL to store metadata, improving metadata scalability and allowing higher concurrency with guaranteed consistency.

Concurrent Writes and ACID

Concurrency control through PostgreSQL, together with automatic conflict resolution, provides high write concurrency.
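
As a rough illustration of how a transactional metadata store can let several writers commit at once, the sketch below uses a generic optimistic commit-and-retry pattern. It is an assumption made for illustration, not LakeSoul's actual protocol or schema: `ToyMetadataStore`, `try_commit`, `commit_with_retry`, and the file names are all hypothetical, and a `threading.Lock` stands in for the database transaction.

```python
import threading


class ToyMetadataStore:
    """Stands in for a transactional metadata database (hypothetical, not LakeSoul's schema)."""

    def __init__(self):
        self._lock = threading.Lock()  # emulates the database's transactional guarantee
        self.version = 0
        self.files = []

    def try_commit(self, expected_version, new_files):
        """Compare-and-swap: succeed only if nobody committed since expected_version."""
        with self._lock:
            if self.version != expected_version:
                return False
            self.files.extend(new_files)
            self.version += 1
            return True


def commit_with_retry(store, new_files, max_retries=100):
    """On conflict, re-read the latest version and retry instead of failing the writer."""
    for _ in range(max_retries):
        snapshot = store.version
        if store.try_commit(snapshot, new_files):
            return True
    return False


store = ToyMetadataStore()
writers = [
    threading.Thread(target=commit_with_retry, args=(store, [f"part-{i}.parquet"]))
    for i in range(8)
]
for w in writers:
    w.start()
for w in writers:
    w.join()
```

Each writer's commit either lands atomically or is retried against the newer version, so all eight appends end up in the table metadata without any writer having to fail.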

Upsert and Incremental Read

LakeSoul supports concurrent upserts and incremental reads from tables in changelog format.
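
To make the idea concrete, here is a minimal in-memory sketch of upsert by primary key plus an incremental, changelog-style read. It is a conceptual toy, not the LakeSoul API: `ToyTable`, `upsert`, `read_incremental`, and the `id` key column are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """In-memory stand-in for a primary-key table (not the LakeSoul API)."""
    rows: dict = field(default_factory=dict)       # primary key -> current row
    changelog: list = field(default_factory=list)  # (version, op, row) entries
    version: int = 0

    def upsert(self, batch, key="id"):
        """Apply one batch as a single commit: insert new keys, update existing ones."""
        self.version += 1
        for row in batch:
            op = "update" if row[key] in self.rows else "insert"
            merged = {**self.rows.get(row[key], {}), **row}
            self.rows[row[key]] = merged
            self.changelog.append((self.version, op, merged))
        return self.version

    def read_incremental(self, after_version):
        """Return only the changes committed after the given version."""
        return [(v, op, row) for v, op, row in self.changelog if v > after_version]


table = ToyTable()
v1 = table.upsert([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
v2 = table.upsert([{"id": 2, "name": "b2"}, {"id": 3, "name": "c"}])
changes = table.read_incremental(after_version=v1)
```

A downstream job that has already processed commit `v1` only needs `changes`, which holds the update to key 2 and the insert of key 3, rather than rescanning the whole table.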

Real-time Lakehouse

LakeSoul helps you build a large-scale real-time lakehouse in both SQL and Python with incremental compute pipelines, on your existing Hadoop or K8s cluster.

BI & AI Enabler

Use SQL to analyze your data at scale with ease. A native Python reader lets you access tables from any data science or AI tool.

Architectural Design to Unify Stream and Batch for Both Storage and Computing

Designed for Both BI and AI to Maximize the Value of Your Data

Real-time Data Ingestion

Data from various sources, including Kafka, Debezium, and Flink CDC, can be easily ingested into LakeSoul in real time with high concurrency and throughput.

Real-time Data Analytics

LakeSoul allows reading its tables as incremental streams in both Spark and Flink to build real-time data transformation pipelines and run analytics in SQL.

AI Applications

LakeSoul natively supports writing multiple streams into one table with a primary key, which enables building real-time tabular datasets for AI applications.
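
The sketch below illustrates what writing multiple streams into one primary-key table means for a tabular dataset: each stream contributes its own columns, and rows are joined on the key as records arrive. It is plain Python for illustration only, not LakeSoul code; `merge_streams`, the stream names, and the column names are hypothetical.

```python
def merge_streams(*streams, key="user_id"):
    """Merge records from several streams into one row per primary key.

    Later records overwrite earlier values for the same column; columns a
    stream does not carry are left untouched.
    """
    table = {}
    for stream in streams:
        for record in stream:
            table.setdefault(record[key], {}).update(record)
    return table


# Hypothetical streams, each carrying a different subset of columns.
profile_stream = [{"user_id": 1, "age": 30}, {"user_id": 2, "age": 41}]
click_stream = [{"user_id": 1, "clicks_7d": 12}]
order_stream = [{"user_id": 2, "orders_30d": 3}, {"user_id": 1, "orders_30d": 1}]

features = merge_streams(profile_stream, click_stream, order_stream)
```

Here `features[1]` ends up with `age`, `clicks_7d`, and `orders_30d` in a single row, which is the shape a training or inference job typically wants, without a separate join step across the source streams.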