We couldn’t be more excited to announce that the schedule for the Data Engineering Summit, co-located with ODSC East this April 23-24, is now live! We’ve got a formidable line-up of experts, thought leaders, and practitioners. Below is just a taste of what’s in store for you.

Experimentation Platform at DoorDash

Yixin Tang│Engineering Manager │DoorDash

The experimentation platform is an integral part of DoorDash, using big data tools to support thousands of decisions every day. Explore how DoorDash leverages the platform to make decisions about business strategy, machine learning models, optimization algorithms, and infrastructure changes.

Data Infrastructure through the Lens of Scale, Performance, and Usability

Elliott Cordo │Founder, Architect, Builder │Datafutures 

Despite their apparent advantages (saved time, greater productivity), monoliths pose serious challenges, especially as complexity increases and teams grow larger. This session will review strategies and technologies for avoiding monoliths and their pitfalls.

From Research to the Enterprise: Leveraging Foundation Models for Enhanced ETL, Analytics, and Deployment

Ines Chami │Co-founder and Chief Scientist │NUMBERS STATION AI

Join this session to explore recent research from Stanford University and Numbers Station AI on applying foundation models to structured data, and their applications in the modern data stack.

Building Data Contracts with Open-Source Tools

Jean-Georges Perrin │CIO │AbeaData

In this session, you’ll discuss data contracts, starting with an introduction that covers:

  • What is a data contract?
  • What is its purpose?
  • Why does it simplify data engineers’ lives?

Then you’ll get hands-on and use open-source tools to generate a skeleton of a data contract, learning more about the data contract life cycle along the way.
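At its simplest, a data contract pairs a schema with expectations that producers and consumers agree on. The toy sketch below illustrates the idea in plain Python; the contract format and field names are hypothetical, not the open-source tools used in the session.

```python
# A minimal data contract: field types plus expectations, checked
# against incoming records. Purely illustrative.
contract = {
    "dataset": "orders",
    "fields": {
        "order_id": {"type": str, "required": True},
        "amount_usd": {"type": float, "required": True, "min": 0.0},
    },
}

def validate(record: dict, contract: dict) -> list:
    """Return a list of contract violations for one record."""
    errors = []
    for name, rules in contract["fields"].items():
        if name not in record:
            if rules.get("required"):
                errors.append(f"missing required field: {name}")
            continue
        value = record[name]
        if not isinstance(value, rules["type"]):
            errors.append(f"{name}: expected {rules['type'].__name__}")
        elif "min" in rules and value < rules["min"]:
            errors.append(f"{name}: below minimum {rules['min']}")
    return errors

print(validate({"order_id": "A1", "amount_usd": 9.99}, contract))  # []
```

In practice a contract also covers ownership, SLAs, and versioning, which is where the life-cycle discussion in the session comes in.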

Why the Hype Around dbt is Justified

Dustin Dorsey │Sr. Cloud Data Architect │Onix

In just half an hour, you’ll learn what dbt really is, what makes it unique, and why it’s so much more than simply SQL. You’ll discuss what makes it so popular (and unpopular) as a data transformation tool and the driving factors behind those opinions, dispelling some myths along the way.

Clean as You Go: Basic Hygiene in the Modern Data Stack

Eric Callahan │Principal, Data Solutions │Pickaxe Foundry 

Join this session for an overview of the challenges that arise from the “I’ll clean it up later” mindset. In particular:

  • Piles of small cleanup tasks for later
  • Confusion among peers who try to use incomplete data assets
  • Lack of metadata to activate throughout the Modern Data Stack

You’ll also explore solutions that can provide long-term benefits.

Unlocking the Unstructured with Generative AI: Trends, Models, and Future Directions

Jay Mishra │Chief Operating Officer │Astera

Join this session to delve into the innovative applications of generative AI in natural language processing and computer vision, highlighting the technologies driving this evolution, including transformer architectures, attention mechanisms, and the integration of OCR for processing scanned documents.

Deciphering Data Architectures 

James Serra │Data & AI Architect │Microsoft

Join us for a guided tour of data fabric, data lakehouse, and data mesh that will cover their respective pros and cons. You’ll also examine common data architecture concepts, including data warehouses and data lakes, helping you determine the most appropriate data architecture for your needs.

Designing ETL Pipelines with Delta Lake and Structured Streaming — How to Architect Things Right

Tathagata Das │Staff Software Engineer │Databricks

Structured Streaming has proven to be the best framework for building distributed stream processing applications. Its unified SQL/Dataset/DataFrame APIs and Spark’s built-in functions make it easy for developers to express complex computations. Delta Lake, on the other hand, is the best way to store structured data, as it is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads. Together, they make it very easy to build pipelines in many common scenarios.

In a complex ecosystem of storage systems and workloads, it’s essential for a developer to understand the problem that needs to be solved. Understanding the requirements of the problem allows you to architect your pipeline to be as resource-efficient as possible. Join this session to examine a variety of common streaming design patterns that can be applied.

Data Engineering in the Era of Gen AI

Ryan Boyd │Co-founder │MotherDuck

This talk explores the changes in hardware and mindsets enabling a new breed of software optimized for the 95% of us who don’t have petabytes to process each day. Instead of focusing on consensus algorithms for large-scale distributed compute, can our engineers instead focus on making data more accessible and more usable, reducing the time between “problem statement” and “answer”?

The Value of A Semantic Layer for GenAI

Jeff Curran│Senior Data Scientist │AtScale

Krishna Srihasam│Senior Data Scientist │AtScale

In this session, you’ll learn how to incorporate business terminology and logic into an LLM, enabling natural-language queries to the database (instead of SQL). You’ll then explore the results of coupling this LLM with AtScale’s query engine through an LLM- and semantic-layer-backed chatbot.
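The core idea of a semantic layer is that business terms map to vetted SQL fragments, so a natural-language front end (such as an LLM) only has to name terms rather than emit raw SQL. Here is a toy sketch of that translation step; all table, metric, and dimension names are hypothetical, and this is not AtScale’s query engine.

```python
# Toy semantic layer: business terms resolve to governed SQL fragments.
# An LLM would supply the metric/dimension names; the layer produces
# deterministic SQL. All names are illustrative.
SEMANTIC_MODEL = {
    "metrics": {"revenue": "SUM(order_total)"},
    "dimensions": {"region": "customer_region"},
    "table": "fact_orders",
}

def compile_query(metric: str, group_by: str) -> str:
    """Translate validated business terms into a SQL query string."""
    if metric not in SEMANTIC_MODEL["metrics"]:
        raise ValueError(f"unknown metric: {metric}")
    if group_by not in SEMANTIC_MODEL["dimensions"]:
        raise ValueError(f"unknown dimension: {group_by}")
    m = SEMANTIC_MODEL["metrics"][metric]
    d = SEMANTIC_MODEL["dimensions"][group_by]
    return (f"SELECT {d}, {m} AS {metric} "
            f"FROM {SEMANTIC_MODEL['table']} GROUP BY {d}")

# "revenue by region" -> governed SQL, never model-invented column names
print(compile_query("revenue", "region"))
```

Because the model can only reference terms defined in the semantic model, unknown or hallucinated fields fail fast instead of producing silently wrong queries.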

Unlock Safety & Savings: Mastering a Secure, Cost-Effective Cloud Data Lake

Ori Nakar│ Principal Engineer, Threat Research │Imperva

Johnathan Azaria │ Data Science TechLead │Imperva

Explore two novel techniques for data lake monitoring, leveraging both object store logs and query engine logs. Dive deep into our aggregation strategies and discover how anomaly detection can be applied to this consolidated data. You’ll see how enhanced access control mechanisms can fortify your data lake’s security, mitigating the risk of data leaks and data corruption. Additionally, we’ll explain how to harness these insights to reduce the attack surface and to identify and fix cost anomalies and system glitches.
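As a rough illustration of anomaly detection over aggregated logs, a simple z-score check can flag days whose query volume deviates sharply from the norm. This is a generic stdlib sketch with made-up numbers, not the techniques Imperva will present.

```python
# Flag anomalous daily query counts using a z-score threshold.
# Data and threshold are illustrative only.
from statistics import mean, stdev

def find_anomalies(daily_counts: list, threshold: float = 2.0) -> list:
    """Return indices of days whose count deviates > threshold sigmas."""
    mu, sigma = mean(daily_counts), stdev(daily_counts)
    return [i for i, c in enumerate(daily_counts)
            if sigma > 0 and abs(c - mu) / sigma > threshold]

counts = [100, 98, 103, 101, 99, 500, 102]  # day 5 is a spike
print(find_anomalies(counts))  # [5]
```

Real log volumes need more robust methods (seasonality, robust statistics), which is exactly the kind of depth the session promises.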

Sign me up!

Get your pass to attend these sessions and more at the Data Engineering Summit this April. But you’d better act fast: prices go up soon.

This article was originally published at summit.ai