Conferences aren’t just talking heads in front of podiums; they reflect the trends, topics, and problems that matter in day-to-day work. At the Data Engineering Summit, co-located with ODSC East on April 23rd and 24th, we’ll be examining several essential topics that can help guide your data engineering team to success. Here are ten of those essential topics that will be covered at the Data Engineering Summit this April.

Is Gen AI A Data Engineering or Software Engineering Problem?

Generative AI isn’t solely a data engineering or a software engineering problem, but rather a collaborative effort requiring both. Data engineers prepare the training data, while software engineers design and build the models, making generative AI a two-pronged approach. Teams will need to decide which aspects of the generative AI pipeline they tackle so it doesn’t become an everybody problem!

Related Session: Is Gen AI A Data Engineering or Software Engineering Problem?: Barr Moses, Co-Founder & CEO at Monte Carlo

Data Infrastructure

Data engineering teams face headaches like wrangling data from various sources into a usable format, scaling systems to handle growing data volumes, and ensuring data security and compliance. They also battle technical debt from past shortcuts and fight to maintain data quality to avoid unreliable results.

Related Session: Data Infrastructure through the Lens of Scale, Performance and Usability: Ryan Boyd, Co-founder of MotherDuck

Foundation Models

Foundation models are game-changers in AI. Trained on massive, diverse datasets, they are powerful, adaptable AI building blocks. Unlike single-use models, they can be fine-tuned for many tasks, from language understanding to image generation, and their power is pushing the boundaries of what AI can do.
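
To make that concrete, here is a minimal sketch assuming the Hugging Face transformers library and the public t5-small checkpoint (both our choices for illustration, not from the session): one pretrained model reused for two different tasks.

```python
# A minimal sketch: one pretrained foundation model (t5-small) adapted to
# two tasks via Hugging Face's pipeline API, with no task-specific training.
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-small")
translator = pipeline("translation_en_to_fr", model="t5-small")

text = "Foundation models are trained once on massive data and reused across many tasks."
print(summarizer(text, max_length=16)[0]["summary_text"])
print(translator("Foundation models are adaptable.")[0]["translation_text"])
```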

Related Session: From Research to the Enterprise: Leveraging Foundation Models for Enhanced ETL, Analytics, and Deployment: Ines Chami, Co-founder and Chief Scientist at NUMBERS STATION AI

Data Contracts

A data contract is like a handshake for data exchange. It clarifies, between provider and consumer, what the data looks like (format), what it means (definitions), how good it is (quality), and how it is delivered (frequency, access). It ensures everyone speaks the same data language.
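
As an illustrative sketch (not from the session), such a contract can be enforced in code. Here it is expressed with the pydantic library; the field names and rules are hypothetical.

```python
# A minimal sketch of a data contract enforced with pydantic.
# Field names and constraints are illustrative, not a real contract.
from datetime import date
from pydantic import BaseModel, Field, ValidationError

class OrderRecord(BaseModel):
    """Agreed shape of every record the provider delivers daily."""
    order_id: str = Field(min_length=1)   # format: non-empty identifier
    amount_usd: float = Field(ge=0)       # quality: no negative amounts
    order_date: date                      # definition: ISO-8601 date

try:
    OrderRecord(order_id="A-1", amount_usd=-5, order_date="2024-04-23")
except ValidationError as e:
    print(e)  # the consumer rejects data that violates the contract
```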

Related Session: Building Data Contracts with Open Source Tools: Jean-Georges Perrin, CIO at AbeaData

Semantic Layers

A semantic layer simplifies data analysis by translating complex data structures into business terms and presenting a unified view across sources. This empowers business users and fosters data-driven decisions.
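
A tiny, hypothetical sketch of the idea: business terms defined once and mapped to queries over the underlying tables, so analysts ask for "revenue" rather than writing SQL against raw schemas (all names and queries here are made up).

```python
# A minimal sketch of a semantic layer: business terms mapped once to the
# SQL that computes them, hiding the physical tables from end users.
SEMANTIC_LAYER = {
    "revenue":      "SELECT SUM(amount_usd) FROM orders",
    "active_users": "SELECT COUNT(DISTINCT user_id) FROM sessions",
}

def metric(name: str) -> str:
    """Analysts ask for a business term; the layer supplies the SQL."""
    return SEMANTIC_LAYER[name]

print(metric("revenue"))
```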

Related Session: The Value of A Semantic Layer for GenAI: Jeff Curran, Senior Data Scientist at AtScale

Unstructured Data

Unstructured data is information that doesn’t fit neatly into a pre-defined format like a spreadsheet. Imagine a giant pile of documents, emails, videos, and social media posts. While valuable, this data can be messy and difficult for computers to analyze directly.
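
As a small illustration of the challenge, even pulling one structured field out of free text takes explicit extraction logic. This sketch uses Python's standard library on made-up text.

```python
# A minimal sketch: extracting one structured field (email addresses)
# from messy, unstructured text with the standard library.
import re

raw = """Meeting notes: ping ana@example.com about the Q2 video upload.
Support ticket forwarded from bob@example.org, no schema attached."""

emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", raw)
print(emails)  # ['ana@example.com', 'bob@example.org']
```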

Related Session: Unlocking the Unstructured with Generative AI: Trends, Models, and Future Directions: Jay Mishra, Chief Operating Officer at Astera

Monolithic Architecture

In software development, a monolithic architecture is a traditional approach where the entire application is built as a single, self-contained unit. Imagine an enormous, monolithic rock: everything is tightly coupled and inseparable. This includes the user interface (what you see and interact with), the business logic (the core functionality), and the data storage (where information is kept).
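
A toy sketch of the idea (not the session's architecture): storage, business logic, and a stand-in for the interface all living in one process and one codebase.

```python
# A minimal sketch of a monolith: the data store, the business logic, and
# the user-facing layer are tightly coupled inside a single program.
import sqlite3

conn = sqlite3.connect(":memory:")                         # data storage
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")

def place_order(order_id: int, total: float) -> str:      # business logic
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
    return f"Order {order_id} placed: ${total:.2f}"

print(place_order(1, 9.99))                                # "UI" layer
```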

Related Session: Data Pipeline Architecture – Stop Building Monoliths: Elliott Cordo, Founder, Architect, and Builder at Datafutures

Experimentation Platforms

An experimentation platform is a tool for running A/B tests on websites, apps, or marketing campaigns. You create variations of whatever you want to test (e.g., a new layout or pricing), and the platform shows them to different users, analyzes the results, and tells you which variation works best. It helps teams make data-driven decisions and improve product performance.
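
The analysis step often boils down to a significance test. Here is a minimal sketch using a two-proportion z-test from the statsmodels library, with made-up conversion counts; it is an illustration, not DoorDash's method.

```python
# A minimal sketch of A/B test analysis: a two-proportion z-test comparing
# the conversion rates of variants A and B (all counts are invented).
from statsmodels.stats.proportion import proportions_ztest

conversions = [120, 150]    # conversions observed under A, B
visitors = [2400, 2500]     # users shown A, B

z, p = proportions_ztest(conversions, visitors)
print(f"z = {z:.2f}, p = {p:.3f}")  # a small p suggests B's lift is real
```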

Related Session: Experimentation Platform at DoorDash: Yixin Tang, Engineering Manager at DoorDash

Open Data Lakes

An open data lake is a data lake that prioritizes openness and flexibility. It stores data in vendor-neutral formats and uses open standards for easier access and collaboration, avoiding lock-in to specific vendors. Think of it as a public park for your data, instead of a private walled garden.
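
As a small sketch of the vendor-neutral idea, here data lands in Parquet, an open format that many engines can read back; pyarrow is just one choice of writer, and the table contents are made up.

```python
# A minimal sketch: writing data to Parquet, a vendor-neutral open format,
# so any compatible engine can read it back later.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"user_id": [1, 2, 3], "plan": ["free", "pro", "free"]})
pq.write_table(table, "users.parquet")   # open format, no vendor lock-in

print(pq.read_table("users.parquet").to_pydict())
```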

Related Session: Dive into Data: The Future of the Single Source of Truth is an Open Data Lake: Christina Taylor, Senior Staff Engineer at Catalyst Software

Data-Centric AI

Data-centric AI flips the standard approach. Instead of prioritizing models, it focuses on high-quality data (labeling, cleaning, augmentation) to train them. This iterative cycle continuously improves the data to get better AI results. Imagine building a house: using the best tools with bad materials won’t work. Data-centric AI ensures strong data is the foundation of reliable AI.
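
One concrete data-centric step is hunting for label errors. This minimal sketch uses the open-source cleanlab library (the speaker's project, though this exact usage is our illustration) with toy labels and predicted probabilities.

```python
# A minimal sketch: flagging likely label errors with cleanlab.
# The labels and probabilities below are toy values for illustration.
import numpy as np
from cleanlab.filter import find_label_issues

labels = np.array([0, 0, 1, 1])        # given (possibly noisy) labels
pred_probs = np.array([[0.9, 0.1],     # model's out-of-sample
                       [0.2, 0.8],     # predicted class probabilities
                       [0.1, 0.9],
                       [0.8, 0.2]])

issues = find_label_issues(labels=labels, pred_probs=pred_probs,
                           return_indices_ranked_by="self_confidence")
print(issues)  # indices of examples whose labels look wrong (here: 1 and 3)
```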

Related Session: How to Practice Data-Centric AI and Have AI Improve its Own Dataset: Jonas Mueller, Chief Scientist and Co-Founder at Cleanlab

Sign me up!

As any data engineering professional knows, the best way to stay ahead of the curve is by keeping up with the latest in all things data and data engineering. The best way to do that is by joining us at ODSC’s Data Engineering Summit and ODSC East.

At the Data Engineering Summit this April 23rd and 24th, co-located with ODSC East 2024, you’ll be at the forefront of all the major changes before they hit. So get your pass today, and keep yourself ahead of the curve.

This article was originally published at summit.ai