AWS Big Data Blog

Call for Papers! DEEM: 1st Workshop on Data Management for End-to-End Machine Learning

Amazon and Matroid will hold the first workshop on Data Management for End-to-End Machine Learning (DEEM) on May 14th, 2017 in conjunction with the premier systems conference SIGMOD/PODS 2017 in Raleigh, North Carolina. For more details about the workshop focus, see Challenges and opportunities in machine learning below.

DEEM brings together researchers and practitioners at the intersection of applied machine learning, data management, and systems research to discuss data management issues in ML application scenarios.

We’re soliciting research papers that describe preliminary and ongoing research results. We’re also looking for reports from industry describing end-to-end ML deployments. Submissions can either be short papers (4 pages) or long papers (up to 10 pages) following the ACM proceedings format.

Register and submit: https://cmt3.research.microsoft.com/DEEM2017/ (account needed)

Submission Deadline: February 1, 2017

Notification of Acceptance: March 1, 2017

Final papers due: March 20, 2017

Workshop: May 14th, 2017

Follow us on Twitter @deem_workshop.

Challenges and opportunities in machine learning

Applying machine learning (ML) in real-world scenarios is challenging. In recent years, the database community has focused on creating systems and abstractions for efficiently training ML models on large datasets. But model training is only one of many steps in an end-to-end ML application. Many orthogonal data management problems arise from the large-scale use of ML. The data management community needs to focus on these problems.

For example, data preprocessing and feature extraction workloads result in complex pipelines that often require the simultaneous execution of relational and linear algebraic operations. Next, the class of ML model to use needs to be chosen. To make that choice, a set of popular approaches such as linear models, decision trees, and deep neural networks often must be analyzed, evaluated, and interpreted.
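To make the mix of relational and linear algebraic operations concrete, here is a minimal sketch in plain Python. The tables, column names, and model weights are all hypothetical and chosen only for illustration; a real pipeline would typically run on a database or dataflow engine together with a numerical library.

```python
# Hypothetical toy tables; every name and value here is an illustrative assumption.
users = [
    {"user_id": 1, "age": 30},
    {"user_id": 2, "age": 45},
]
orders = [
    {"user_id": 1, "amount": 20.0},
    {"user_id": 1, "amount": 40.0},
    {"user_id": 2, "amount": 10.0},
]


def total_spend_by_user(order_rows):
    """Relational step: GROUP BY user_id, SUM(amount)."""
    totals = {}
    for row in order_rows:
        totals[row["user_id"]] = totals.get(row["user_id"], 0.0) + row["amount"]
    return totals


def build_feature_matrix(user_rows, order_rows):
    """Relational step: join users with aggregated orders, one feature row per user.

    Each row is [bias, age, total_spend].
    """
    totals = total_spend_by_user(order_rows)
    return [
        [1.0, float(u["age"]), totals.get(u["user_id"], 0.0)]
        for u in user_rows
    ]


def matvec(matrix, vector):
    """Linear algebra step: matrix-vector product X @ w."""
    return [sum(x * w for x, w in zip(row, vector)) for row in matrix]


X = build_feature_matrix(users, orders)
weights = [0.5, 0.01, 0.02]  # assumed weights of an already trained linear model
scores = matvec(X, weights)
print(scores)
```

Even in this toy form, the join and aggregation have to finish before the matrix product can run, which hints at why systems that optimize the two kinds of operations separately can miss opportunities.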

The prediction quality of such ML models depends on the choice of features and hyperparameters, which are typically selected in a costly offline evaluation process. Afterwards, the resulting models must be deployed and integrated into existing business workflows in a way that enables fast and efficient predictions, while still allowing the lifecycle of models, which become stale over time, to be managed.
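As a rough illustration of offline hyperparameter selection, the sketch below runs a small grid search over the regularization strength of a one-dimensional ridge regression and keeps the value with the lowest error on held-out data. The data, the candidate grid, and the function names are assumptions made for this example; it is not a description of any particular system.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge regression without intercept: w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the model y = w * x on the given data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train, valid, lams):
    """Train one model per candidate and return (validation_error, lam, w) for the best."""
    results = []
    for lam in lams:
        w = fit_ridge_1d(train[0], train[1], lam)
        results.append((mse(w, valid[0], valid[1]), lam, w))
    return min(results)  # tuples compare by validation error first


# Hypothetical training and validation splits, given as (xs, ys).
train = ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
valid = ([4.0, 5.0], [8.0, 10.1])
lams = [0.0, 0.1, 1.0, 10.0]

best_error, best_lam, best_w = grid_search(train, valid, lams)
print(best_lam, best_w, best_error)
```

Every candidate requires a full training run, so the cost grows with the size of the grid and with the number of hyperparameters, which is what makes this evaluation process expensive at scale.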

As a further complication, the resulting systems need to take the target audience of ML applications into account. This audience is heterogeneous, ranging from analysts without programming skills who may prefer an easy-to-use, cloud-based solution, to teams of data processing experts and statisticians who develop and deploy custom-tailored algorithms.

DEEM aims to bring together researchers and practitioners at the intersection of applied machine learning, data management, and systems research to discuss data management issues in ML application scenarios. The workshop solicits regular research papers describing preliminary and ongoing research results. In addition, it encourages the submission of industrial experience reports on end-to-end ML deployments.

Questions? Please send them to info@deem-workshop.org
