We couldn’t be more excited to announce that the schedule for the Data Engineering Summit, co-located with ODSC East this April 23-24, is now live! We’ve got a formidable line-up of experts, thought leaders, and practitioners. Here’s just a taste of what’s in store for you.
Experimentation Platform at DoorDash
Yixin Tang│Engineering Manager │DoorDash
The experimentation platform at DoorDash is an integral part of the company, using big data tools to support thousands of decisions every day. Explore how DoorDash leverages the platform to make decisions about business strategies, machine learning models, optimization algorithms, and infrastructure changes.
Data Infrastructure through the Lens of Scale, Performance, and Usability
Elliott Cordo │Founder, Architect, Builder │Datafutures
Despite their apparent advantages (time savings, higher productivity), monoliths pose significant challenges, especially as complexity increases and teams grow larger. This session will review strategies and technologies for avoiding monoliths and their pitfalls.
From Research to the Enterprise: Leveraging Foundation Models for Enhanced ETL, Analytics, and Deployment
Ines Chami │Co-founder and Chief Scientist │NUMBERS STATION AI
Join this session to explore recent research from Stanford University and Numbers Station AI on applying foundation models to structured data, and their applications in the modern data stack.
Building Data Contracts with Open-Source Tools
Jean-Georges Perrin │CIO │AbeaData
In this session, you’ll discuss data contracts, starting with an introduction that covers:
- What is a data contract?
- What is its purpose?
- Why does it simplify data engineers’ lives?
Then you’ll get hands-on, using open-source tools to generate a skeleton of a data contract and learning more about the data contract life cycle.
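The workshop’s exact contract format isn’t specified in this abstract. Purely as an illustrative sketch of what a data contract skeleton captures, a minimal version might look like this (all keys and values here are hypothetical, not a real contract standard):

```python
# Illustrative only: a minimal data-contract skeleton as a plain dict.
# Real open-source tooling defines its own schema; these keys are hypothetical.
contract = {
    "name": "orders",
    "version": "1.0.0",
    "owner": "data-platform-team",
    "schema": [
        {"column": "order_id", "type": "string", "nullable": False},
        {"column": "amount", "type": "decimal(10,2)", "nullable": False},
    ],
    "sla": {"freshness_hours": 24},
}

def validate_contract(c):
    """Check that the skeleton has the minimum required sections."""
    required = {"name", "version", "owner", "schema"}
    return required.issubset(c)

print(validate_contract(contract))  # → True
```

The point of the skeleton is that producers and consumers agree on schema, ownership, and service levels up front, which is what makes the contract’s life cycle manageable.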
Why the Hype Around dbt is Justified
Dustin Dorsey │Sr. Cloud Data Architect │Onix
In just half an hour, you’ll learn what dbt really is, what makes it unique, and why it’s so much more than just SQL. You’ll discuss what makes it so popular (and unpopular) as a data transformation tool and the driving factors behind those opinions, dispelling some myths along the way.
Clean as You Go: Basic Hygiene in the Modern Data Stack
Eric Callahan │Principal, Data Solutions │Pickaxe Foundry
Join this session for an overview of the challenges that arise from the “I’ll clean it up later” mindset, in particular:
- Piles of small cleanup tasks deferred until later
- Confusion among peers who try to use incomplete data assets
- Lack of metadata to activate throughout the Modern Data Stack
You’ll also explore solutions that can provide long-term benefits.
Unlocking the Unstructured with Generative AI: Trends, Models, and Future Directions
Jay Mishra │Chief Operating Officer │Astera
Join this session to delve into the innovative applications of generative AI in natural language processing and computer vision, highlighting the technologies driving this evolution, including transformer architectures, attention mechanisms, and the integration of OCR for processing scanned documents.
Deciphering Data Architectures
James Serra │Data & AI Architect │Microsoft
Join us for a guided tour of data fabric, data lakehouse, and data mesh that will cover their respective pros and cons. You’ll also examine common data architecture concepts, including data warehouses and data lakes, helping you determine the most appropriate data architecture for your needs.
Designing ETL Pipelines with Delta Lake and Structured Streaming — How to Architect Things Right
Tathagata Das │Staff Software Engineer │Databricks
Structured Streaming has proven to be the best framework for building distributed stream processing applications. Its unified SQL/Dataset/DataFrame APIs and Spark’s built-in functions make it easy for developers to express complex computations. Delta Lake, on the other hand, is the best way to store structured data: an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads. Together, they make it easy to build pipelines for many common scenarios.
In a complex ecosystem of storage systems and workloads, it’s essential for a developer to understand the problem that needs to be solved. Understanding the requirements of the problem allows you to architect your pipeline to be as resource-efficient as possible. Join this session to examine a variety of common streaming design patterns that can be applied.
Data Engineering in the Era of Gen AI
Ryan Boyd │Co-founder │MotherDuck
This talk explores the changes in hardware and mindsets enabling a new breed of software optimized for the 95% of us who don’t have petabytes to process every day. Instead of focusing on consensus algorithms for large-scale distributed compute, can our engineers instead focus on making data more accessible and more usable, and on reducing the time between “problem statement” and “answer”?
The Value of A Semantic Layer for GenAI
Jeff Curran│Senior Data Scientist │AtScale
Krishna Srihasam│Senior Data Scientist │AtScale
In this session, you’ll learn how to incorporate business terminology and logic into an LLM, enabling queries against the database in natural language (instead of SQL). You’ll then explore the results of coupling this LLM with AtScale’s query engine through an LLM- and Semantic-Layer-backed chatbot.
Unlock Safety & Savings: Mastering a Secure, Cost-Effective Cloud Data Lake
Ori Nakar│ Principal Engineer, Threat Research │Imperva
Johnathan Azaria │ Data Science TechLead │Imperva
Explore two novel techniques for data lake monitoring, leveraging both object store logs and query engine logs. Dive deep into our aggregation strategies and discover how anomaly detection can be applied to this consolidated data. You’ll see how enhanced access control mechanisms can fortify your data lake’s security, mitigating the risk of data leaks and data corruption. Additionally, we’ll show how to harness these insights to reduce the attack surface and to identify and fix cost anomalies and system glitches.
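The speakers’ specific aggregation and detection methods aren’t detailed in this abstract. As a flavor of what log-based anomaly detection can mean in practice, a minimal z-score check over aggregated daily query counts might look like this (the function name, data, and threshold are illustrative, not Imperva’s method):

```python
import statistics

def flag_anomalies(counts, threshold=2.5):
    """Return indices whose value deviates from the mean by more than
    `threshold` population standard deviations (a simple z-score test)."""
    mean = statistics.mean(counts)
    stdev = statistics.pstdev(counts)
    if stdev == 0:  # all values identical: nothing to flag
        return []
    return [i for i, c in enumerate(counts) if abs(c - mean) / stdev > threshold]

# Daily query counts for one data-lake table; the last day spikes suspiciously.
daily_queries = [102, 98, 105, 99, 101, 97, 100, 950]
print(flag_anomalies(daily_queries))  # → [7]
```

Production systems consolidate many such signals (per table, per principal, per query engine) before alerting, which is where the aggregation strategies discussed in the session come in.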
Sign me up!
Get your pass to attend these sessions and more at the Data Engineering Summit this April. But you’d better act fast. Prices go up soon.
This article was originally published at summit.ai