Linux Foundation AI & Data
Build an end-to-end real-time lakehouse with transactional concurrent upserts, incremental pipelines, and SQL for your BI & AI applications
Major Features of LakeSoul
Centralized Metadata Service
LakeSoul stores its metadata in PostgreSQL, which improves metadata scalability and allows higher concurrency with guaranteed consistency.
Concurrent Writes and ACID
Concurrency control is handled through PostgreSQL (PG) and, together with automatic conflict resolution, provides high write concurrency.
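One common way to combine database-backed concurrency control with automatic conflict resolution is an optimistic commit: each writer records the table version it read, and the commit succeeds only if that version is still current. The sketch below is a toy model in plain Python, not LakeSoul's actual protocol, which this page does not describe. `ToyMetadataStore`, `try_commit`, and `append_with_retry` are hypothetical names, and a thread lock stands in for a PostgreSQL transaction.

```python
import threading


class ToyMetadataStore:
    """Hypothetical in-memory stand-in for a transactional metadata database."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of a database transaction
        self.version = 0
        self.files = []

    def try_commit(self, expected_version, new_files):
        """Compare-and-set: commit only if no one else committed since we read."""
        with self._lock:
            if self.version != expected_version:
                return False  # conflict detected, caller must retry
            self.files.extend(new_files)
            self.version += 1
            return True


def append_with_retry(store, new_files):
    """Resolve append conflicts automatically by re-reading the version and retrying."""
    while True:
        seen = store.version
        if store.try_commit(seen, new_files):
            return seen + 1  # the version this commit produced


# Eight concurrent writers, each appending one (made-up) file name.
store = ToyMetadataStore()
writers = [
    threading.Thread(target=append_with_retry, args=(store, [f"part-{i}.parquet"]))
    for i in range(8)
]
for w in writers:
    w.start()
for w in writers:
    w.join()

print(store.version, len(store.files))
```

Appends never invalidate each other, so a conflicting writer can simply retry against the newer version. Every append lands exactly once and the version advances once per commit.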
Upsert and Incremental Read
LakeSoul supports concurrent upserts and incremental reads from tables in changelog format.
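To make the two ideas concrete, here is a toy model in plain Python. It is an illustrative sketch, not LakeSoul's API or storage format: `ToyTable`, `upsert`, and `read_incremental` are hypothetical names, and the changelog is a simple in-memory list. An upsert inserts a row when its primary key is new and otherwise merges it into the existing row. An incremental read returns only the changes committed after a given version.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical primary-key table that also keeps a changelog."""
    rows: dict = field(default_factory=dict)  # primary key -> current row
    log: list = field(default_factory=list)   # (version, op, row image)
    version: int = 0

    def upsert(self, batch, key="id"):
        """Apply one batch as a single commit and return the new version."""
        self.version += 1
        for row in batch:
            pk = row[key]
            if pk in self.rows:
                merged = {**self.rows[pk], **row}  # new values win
                op = "update"
            else:
                merged = dict(row)
                op = "insert"
            self.rows[pk] = merged
            self.log.append((self.version, op, merged))
        return self.version

    def read_incremental(self, after_version):
        """Return only the changes committed after `after_version`."""
        return [(op, row) for v, op, row in self.log if v > after_version]


table = ToyTable()
v1 = table.upsert([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
v2 = table.upsert([{"id": 2, "name": "B"}, {"id": 3, "name": "c"}])

# A downstream job that already processed v1 sees just the second commit.
changes = table.read_incremental(v1)
print(changes)
# [('update', {'id': 2, 'name': 'B'}), ('insert', {'id': 3, 'name': 'c'})]
```

A consumer that remembers the last version it processed can keep calling `read_incremental` to pick up new changes without rescanning the whole table.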
Real-time Lakehouse
LakeSoul helps build a large-scale real-time lakehouse in both SQL and Python, with incremental compute pipelines, on your existing Hadoop or K8s cluster.
BI & AI Enabler
Use SQL to analyze your data at scale with ease. A native Python reader allows access to tables from any data science or AI tool.
Architectural Design to Unify Stream and Batch for Both Storage and Computing
Designed for Both BI and AI to Maximize the Value of Your Data
Real-time Data Ingestion
Data from various sources, including Kafka, Debezium, and Flink CDC, can be easily ingested into LakeSoul in real time with high concurrency and throughput.
Real-time Data Analytics
LakeSoul allows its tables to be read as incremental streams in both Spark and Flink, so you can build real-time data transformation pipelines and run analytics in SQL.
AI Applications
LakeSoul natively supports writing multiple streams into one table with a primary key, which enables building real-time tabular datasets for AI applications.
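The effect of several streams writing into one primary-key table can be sketched with a small plain-Python model. This is an illustration under stated assumptions, not LakeSoul's API: `merge_streams` and the `user_id`, `clicks`, and `age` fields are made up for the example. Each stream carries a different subset of columns, and rows sharing a primary key are merged into one wide row without an explicit join.

```python
def merge_streams(*streams, key="user_id"):
    """Merge records from several streams into one table keyed by `key`.

    Hypothetical helper: each record contributes its columns to the row
    with the same primary key, and later values overwrite earlier ones.
    """
    table = {}
    for stream in streams:
        for record in stream:
            table.setdefault(record[key], {}).update(record)
    return table


# Two made-up streams that share a primary key but carry different columns.
click_stream = [{"user_id": 1, "clicks": 3}, {"user_id": 2, "clicks": 7}]
profile_stream = [{"user_id": 1, "age": 30}]

features = merge_streams(click_stream, profile_stream)
print(features[1])  # {'user_id': 1, 'clicks': 3, 'age': 30}
print(features[2])  # {'user_id': 2, 'clicks': 7}
```

The merged rows form a tabular dataset with one row per key, which is the shape that feature pipelines for model training usually expect.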