Saturday 4 November 2017 9:00 AM - 5:00 PM

THANK YOU!

Thank you for making Phoenix Data Conference a massive success.
See you next November!

Did you miss it? No worries, the presentations are available below.

View Presentations

SPONSORS

Thank you to all of our Sponsors! Phoenix Data Conference couldn’t happen without you!

Gold Sponsors

Silver Sponsors

Bronze Sponsors

SCHEDULE

Please find the October 27th schedule below. You can also download the PDF version here.

  • 8:00 AM

  • 9:00 AM

  • 9:50 AM

  • 10:40 AM

  • 11:30 AM

  • LUNCH

  • SPONSOR

  • 1:30 PM

  • 2:20 PM

  • 3:10 PM

  • 3:50 PM

08:00 - 9:00 AM

Registration

Registration / Breakfast / Swag Bags

Get your name tag and swag bags at the registration desk.

09:00 - 9:40 AM

Lenin Aboagye

Kogni

Autonomous Security: How to use Big Data, Machine Learning and AI to build the Security of the future

The evolution of the general technology landscape continues to shape the security and risk landscape across enterprises. IoT, cloud, edge computing, containerization, automation, big data, and AI present a whole new breadth of security challenges and opportunities for organizations. Just as security teams must address this emerging technology and risk stack, these same technologies should drive the security technology landscape, shaping the intelligence, velocity, and scale at which security-related decisions can be reached more accurately, faster, and more efficiently. This talk gives a sneak peek at the technologies that will shape the security capabilities of the future, including a first introduction to the concept of Autonomous Security and why it is the future of security.

09:00 - 9:40 AM

Charles Zhan

Infusionsoft

Building Data Lake and Deploying Data Products on Google Cloud Platform

We will talk about how Infusionsoft builds a data lake on Google Cloud Platform: the challenges we faced and how we solved them using tools like Airflow and Spark, along with GCP offerings such as Google Cloud Storage and BigQuery. We will also talk about how we deploy data products, from problem formulation to production, in a very cost-effective way using BigQuery, Dataflow, App Engine, and GCS.

9:00 - 9:40 AM

Brian Jones

Attunity

Commit to Conform for Next Generation Data Lakes

As the demand for large-scale Data Lakes grows exponentially, the need for agility is driving the modernization of end-to-end Data Pipelines. Easily sourcing data from heterogeneous sources in real time (Commit data) and automation of the processes required to land, merge, and conform the data within the Big Data platforms becomes a necessity to drive adoption within the modern enterprise. This session will review market demands and Attunity’s approach for automating the integration and conforming of data in a Big Data environment.

9:00 - 9:40 AM

Gopal Swaminathan

Saama

ML/AI Influence on Insurance Underwriting

Underwriting is one of the key functions for generating and growing revenue at an insurance company. Machine learning and artificial intelligence will play a major role in transforming this function, changing how risk assessment, coverage recommendations, and more are done. In this session, we will dive deep into the use cases and models that will most influence this business function.

9:50 - 10:30 AM

Scott Sumner

Unravel Data

Data Application Performance Management

Enterprises are running more and more big data applications in production (ETL, analytics, machine learning, and so on). These applications have to meet performance-related business needs such as deadlines (e.g., an ETL job that must finish by market close), fast response time (e.g., Hive queries), and reliability (e.g., a fraud app). The data platform team (architects, operations, and developers) needs to ensure application reliability, optimize storage and compute while minimizing infrastructure costs, and improve DevOps productivity. Unravel provides a single pane to optimize, troubleshoot, and analyze the performance of big data applications. Unravel correlates performance metrics across every layer of the big data stack, from the infrastructure to the services and applications, as well as within each layer, to give a true full-stack view of application performance. Unravel then applies artificial intelligence, machine learning, and predictive analytics to empower operations teams from a single tool.

9:50 - 10:30 AM

Matteo Rebeschini

Elastic

Bots, outliers and outages… Do you know what's lurking in your data?

With the massive amounts of data being ingested daily, it is nearly impossible by traditional means to understand what is hidden in your data. How do you separate the ordinary from the unusual in a timely fashion? Unsupervised machine learning on time series data enables real-time discovery of those interesting and possibly costly data anomalies. Matteo will describe, build, and run several types of machine learning jobs in Elasticsearch that can detect and alert on these anomalies and outliers in real time.
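For readers new to the idea, the anomaly-detection concept can be sketched with a deliberately naive rolling z-score detector. This is an illustrative toy only; Elastic's machine learning jobs use far more sophisticated probabilistic models, and the data below is made up:

```python
# Toy illustration of unsupervised anomaly detection on a time series:
# flag points that deviate strongly from the mean of a trailing window.
from statistics import mean, stdev

def rolling_anomalies(series, window=10, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# A noisy-but-stable signal with one outage-like spike at index 20.
data = [10.0, 11.0, 9.0, 10.0, 12.0, 8.0, 10.0, 11.0, 9.0, 10.0] * 2 \
       + [50.0] + [10.0] * 5
print(rolling_anomalies(data))  # prints [20]
```

Real systems replace the fixed window and threshold with learned, continuously updated models, but the shape of the problem, separating the ordinary from the unusual as data streams in, is the same.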

9:50 - 10:30 AM

Nick Dearden

Confluent

Streaming ETL with Apache Kafka and KSQL

Companies new and old are all recognizing the importance of a low-latency, scalable, fault-tolerant data backbone in the form of the Apache Kafka streaming platform. With Kafka, developers can integrate multiple systems and data sources, enabling low-latency analytics, event-driven architectures, and the population of multiple downstream systems. What’s more, these data pipelines can be built using configuration alone.
In this talk, we’ll see how easy it is to capture a stream of data changes in real-time from a database such as MySQL into Kafka using the Kafka Connect framework and then use KSQL to filter, aggregate and join it to other data, and finally stream the results from Kafka out into multiple targets such as Elasticsearch and MySQL. All of this can be accomplished without a single line of Java code!
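The filter-and-aggregate dataflow described above can be sketched in plain Python. This is a conceptual illustration of the streaming-ETL pattern only, not KSQL syntax, and the event fields are invented:

```python
# Conceptual sketch of streaming ETL: consume a stream of change events,
# filter them, and maintain a continuously updated aggregate. KSQL
# expresses this as SQL over Kafka topics; here we mimic the dataflow.
from collections import defaultdict

def streaming_etl(events):
    """Keep a running purchase count per user, emitting after each update."""
    counts = defaultdict(int)
    for event in events:                  # events arrive one at a time
        if event["type"] != "purchase":   # like: WHERE type = 'purchase'
            continue
        counts[event["user"]] += 1        # like: GROUP BY user, COUNT(*)
        yield dict(counts)                # emit updated result downstream

changes = [
    {"user": "alice", "type": "purchase"},
    {"user": "bob",   "type": "page_view"},
    {"user": "alice", "type": "purchase"},
]
print(list(streaming_etl(changes))[-1])  # prints {'alice': 2}
```

In KSQL the equivalent logic is declared once as a continuous query, and Kafka handles delivery, scaling, and fault tolerance.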

9:50 - 10:30 AM

Raghav Jandhyala

SAP

The Intelligent Enterprise - How SAP is Driving this next frontier of Business Transformation

In this session, we talk about the transformation of business enterprises from industrial automation to business process automation to digital transformation to the Intelligent Enterprise. We take a closer look at machine learning use cases across industries, how customers are adopting them, and how SAP is playing a leading role in the digital transformation journey. We will further cover the SAP Intelligent Machine Learning Foundation with SAP Leonardo and SAP Enterprise to start your Intelligent Enterprise transformation journey.

10:40 - 11:20 AM

Jaipal Kothakapu

Artha Solutions

Deploying Agile Data Governance Labs in 100 days from Strategy to Execution on Big data Lake

Developing insights from data assets that are growing in size, scale, variety, and velocity now requires a re-tooling and rethinking of how information and data are managed in this new world. Hence the need for an agile governance model that is Fit for purpose, Accurate, Complete, Timely & Secure (FACTS), one that complies with the ever-growing privacy, security, and regulatory requirements of managing customer data while allowing you to respond and react at the speed and scale of the changing needs of the market and the competition.
Learn how to:
  • Kick-start your agile data governance initiatives and drive value realization within 100 days of deployment
  • Apply a consistent governance strategy, operating model, and approach to manage the changing demands of the business
  • Develop a data trust index and funding model to scale governance initiatives across the organization
  • Manage the complexities of data governance issues such as data quality, lineage, metadata, and master data in a way that aligns with business value

10:40 - 11:20 AM

Shekhar Vemuri

Clairvoyant

Operationalizing AI for Scale and Velocity

While much of the focus of AI and machine learning solutions over the last couple of years has been on experiment design, algorithms, techniques, and frameworks, not enough attention has gone to how to operationalize AI in the enterprise. How do you set up pipelines and systems that support repeatability, visibility and governance, quality control, deployment, and tracking? In this talk we will discuss building machine learning platforms that support repeatability, automating experimentation, and best practices for establishing AI as an enterprise capability in production, as opposed to a one-off implementation.

We will explore best practices associated with building machine learning platforms that move AI from the experimentation stage to robust and production-ready systems.

10:40 - 11:20 AM

Gaurav Sethi

PayPal

Priyanka Mishra

PayPal

Connected Intelligence

How the data lake at PayPal connects customers across all the PayPal subsidiaries and assists merchants in mitigating their risks.

10:40 - 11:20 AM

Jörg Schad

Mesosphere

Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service

The overall evolution towards microservices has caused a lot of IT leaders to radically rethink architectures and platforms. One can hardly keep up with the rapid onslaught of new distributed technologies. The same people who just yesterday asked "how can we deploy Docker containers?" are now asking "how can we operate Kubernetes-as-a-Service on-premise?", and are about to start asking "how can we operate the open source frameworks of our choice, such as Spark, TensorFlow, HDFS, and more, as a service across hybrid clouds?"
This session will discuss:
  • Challenges of orchestrating and operating distributed and stateful services such as Kubernetes, Spark, Kafka, HDFS, and TensorFlow
  • Challenges of operating these technologies across hybrid clouds
For example, using Kubernetes for container lifecycle management is a great step to take, but how can we manage the entire lifecycle (deploy, failure, resize, upgrades, and more) for multiple Kubernetes clusters?

11:30 - 12:10 PM

Thejas Nair

Hortonworks

Apache Hive 3 - A new horizon

7,000 analysts, 80 ms average latency, 1 PB of data, 250k BI queries per hour, on-demand deep reporting in the cloud over 100 TB in minutes, cloud and on-prem, fast, scalable, secure, open source. These are the kinds of things Hive 3 is delivering for businesses today. If you want to find out how, come join us for this talk.

We will present and demo the new Kafka and Spark integrations, query federation to Druid and RDBMSs, the new BI cache, advances in the transactional system, materialized views, workload management & guardrails, SQL enhancements like constraints, default clauses and more. We will also take a look at the data analytics studio, a tightly integrated visual tool to optimize and enhance your data warehouse.

Apache Hive 3 is a tremendous step forward, come find out how it can change analytics for you!

11:30 - 12:10 PM

Michael Arnold

Clairvoyant

Nithya Koka

Clairvoyant

Tuning the Beloved DB-Engines

Impala leads in performance against the traditional analytic databases for high-concurrency workloads. HBase provides real-time read/write access to your wide-column store data. These services are at the core of the majority of Insight's client projects. In this talk, Nithya and Michael will present case studies of two different client environments and how we tuned the underlying database infrastructure to scale for the needs of each. We will go over what to do to make Impala scale to meet high customer demand, then present the tuning steps to get an HBase infrastructure to utilize large amounts of available RAM. A raffle will be held at the end of the talk.

11:30 - 12:10 PM

Josh Sivey

Microsoft

Developing MPP solutions in Azure

Massively Parallel Processing (MPP) architectures allow the use of large numbers of processors (or computers) to perform a set of coordinated computations simultaneously. MPP architectures can help with the processing and aggregation of large volumes of data with varying shapes, but they can be complicated and costly to deploy and maintain. Join Microsoft Technical Specialist Josh Sivey as he explores using Microsoft Azure as a platform for quickly creating cost-effective MPP solutions. He will use Microsoft Azure to deploy and process data with a Hadoop-based MPP architecture (using HDInsight) and a SQL-based MPP architecture (using Azure SQL Data Warehouse). The pros and cons of Hadoop-based MPP versus SQL-based MPP will be discussed, as well as when a company might choose a Platform-as-a-Service (PaaS) offering over an Infrastructure-as-a-Service (IaaS) solution. This talk will be packed full of demos and real-world learnings.

11:30 - 12:10 PM

Chirag Patel

Pitney Bowes

Big Data to Big Profits

Learn more about the next generation of Big Data use cases that are driving F30 companies to find net-new revenue-generating businesses at 1/20th the cost. Discover new ways of applying AI to digest non-traditional data sources in Big Data within non-traditional industries like life and annuity insurance, investment management, retail banking, telecom, and investment banking.

12:10 - 1:10 PM

Lunch


1:10pm - 1:20 PM

Sponsored by Intel®: How the blurring of memory and storage is revolutionizing the data era


1:30 - 2:10 PM

Ginger Gilsdorf

Intel

Revolutionizing the Memory-Storage Hierarchy with Intel® Optane™ DC Persistent Memory

What could you do with affordable, large capacity memory that is also persistent? The possibilities are endless, especially in the field of data science. Persistent memory accelerates analytics, big data, and storage workloads across a variety of use cases, bringing new levels of speed and efficiency to the data center and to in-memory computing. Ginger Gilsdorf offers an overview of the recently announced Intel® Optane™ DC Persistent Memory, and shares the exciting potential of this technology in analytics solutions.

1:30 - 2:10 PM

Nitin Mulimani


Amrinder Singh

A Platform for Developing and Managing Data Applications on Hadoop

The Lucy Data Application Platform is a platform for developing and managing data applications on Hadoop (Cornerstone), Netezza, Postgres, Teradata, and real-time or stream processing systems. Lucy provides integrated capabilities and life-cycle support for data applications, including an out-of-the-box distributed rules engine and capabilities to groom data sets across multiple execution environments.

1:30 - 2:10 PM

Rodrigo Gomes

MemSQL

Building a Fault Tolerant Distributed Architecture

In this talk I will present how we re-architected our distributed system management in MemSQL. Using local control loops in independent nodes, driven by replicated metadata, and feedback systems to detect failures, we improved how MemSQL detects and handles distributed failures.

1:30 - 2:10 PM

Bob O'Brien

Tibco

Better Business Starts With A Data-First Mentality!

Everyone wants analytics, data science, and deeper business insights. However, there is a fundamental flaw in the analytics-first business mentality: poor-quality data can plague even the smartest data scientists and the most robust analysis tools. What if the business focused first on data: quality, accessibility, connectivity?

Let’s pivot to a data-first mentality: the concept that we start with high-quality, highly accessible data sources and then connect to analytics, BI, data warehouses, and other consumers. This way, we focus on the fuel that runs the business to generate better output from analysis and insights. Join us to learn how you can embrace the data-first mentality in your business!

2:20 - 3:00 PM

Kendyl Douglas

Systems Imagination, Inc.

David Schneider

Systems Imagination, Inc.

GPU-accelerated machine learning analyses reveal new fusion proteins important in cancer progression

For complex human diseases, identifying the genes harboring susceptibility variants has taken on medical importance. Disease-associated genes provide clues for elucidating disease etiology, predicting disease risk, and highlighting therapeutic targets. Here, we develop a method to predict whether a given gene and disease are associated. To capture the multitude of biological entities underlying pathogenesis, we constructed a heterogeneous network containing multiple node and edge types. We built on a technique developed for social network analysis, which embraces disparate sources of data to make predictions from heterogeneous networks. Using the compendium of associations from genome-wide studies, we learned the influential mechanisms underlying pathogenesis. Our findings provide a novel perspective on the existence of pervasive pleiotropy across complex diseases. Furthermore, we suggest that transcriptional signatures of perturbations are an underutilized resource among prioritization approaches. For multiple sclerosis, we demonstrated our ability to prioritize future studies and discover novel susceptibility genes. Researchers can use these predictions to increase the statistical power of their studies, to suggest the causal genes from a set of candidates, or to generate evidence-based experimental hypotheses.

2:20 - 3:00 PM

Thomas Brown

Cloudera

Big Data Journey to the Cloud

The cloud is fast becoming a preferred destination for enterprise big data workloads. Today, organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern IT architecture powered by Cloudera Enterprise. Cloudera makes Hadoop enterprise-ready with a platform that’s fast for business, easy to manage, and secure without compromise. So, whatever the environment, you can deploy with confidence and start recognizing value from all of your data.

2:20 - 3:00 PM

Hari Gottipati

Marsh

Practical Big Data, AI/ML 2018 - trends, technologies and use cases

Data technologies (Big Data, AI, machine learning) continue to evolve, becoming ever more efficient and more widely adopted around the world. This talk covers the latest developments (since last year) in the Big Data ecosystem, including AI and machine learning, as well as the past, present, and future of Big Data tools, patterns, and use cases.

2:20 - 3:00 PM

Ravneet Ghuman


Madhu Sudhan Reddy Gudur

Bizzi – A Subject Matter Expert

Bizzi (Business Intel) plays the role of a subject-matter expert: it learns from content on a subject or topic of interest and then predicts whether or not new documents are relevant to the topics and entities of interest. Bizzi has two main parts: (1) curating the training data by identifying stripes of keywords and training a model that reads a document and classifies it based on the vocabulary learned from the training dataset (the predictive step); and (2) evaluating the document through a series of models (sentiment analysis, semantic role labeling (SRL), and named entity recognition (NER)) along with entity-frequency analysis to make a final prediction on whether or not the document discusses a topic of interest for the entity of interest.

3:10pm - 3:50 PM

Robert Sanders

Clairvoyant

Productionalizing Spark Streaming Applications

Spark Streaming has quickly established itself as one of the more popular streaming engines running on the Hadoop ecosystem. Not only does it integrate with many types of message brokers and stream sources, it can also leverage other major Spark modules such as Spark SQL and MLlib in conjunction. This allows businesses and developers to make use of data in ways they couldn't hope to in the past. However, when building a Spark Streaming pipeline, it's not sufficient to only know how to express your business logic. Operationalizing these pipelines and running the application with high uptime and continuous monitoring poses a lot of operational challenges. Fortunately, Spark Streaming makes all that easy as well. In this talk, I'll go over some of the main steps you'll need to take to get your Spark Streaming application ready for production, specifically in conjunction with Kafka. This includes steps to gracefully shut down your application, perform upgrades, and monitor it, along with various useful Spark configurations and more.
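One pattern in this space, graceful shutdown at a batch boundary, can be sketched in plain Python. This is illustrative pseudologic under assumed names (`StreamingApp`, `request_stop` are invented for this sketch), not the PySpark API itself:

```python
# Sketch of the graceful-shutdown pattern: instead of killing the process
# mid-batch, the driver checks an external stop signal (here a simple
# Event; in practice often a marker file or ZooKeeper node) between
# micro-batches and exits only at a batch boundary.
import threading

class StreamingApp:
    def __init__(self, batches):
        self.batches = batches          # stand-in for an incoming stream
        self.processed = []
        self._stop_requested = threading.Event()

    def request_stop(self):
        """External shutdown hook (e.g. triggered by SIGTERM)."""
        self._stop_requested.set()

    def run(self):
        for batch in self.batches:
            if self._stop_requested.is_set():
                break                   # stop cleanly at a batch boundary
            self.processed.append([record.upper() for record in batch])
        # ...commit offsets / checkpoint here before exiting...

app = StreamingApp([["a", "b"], ["c"], ["d", "e"]])
app.run()
print(app.processed)  # prints [['A', 'B'], ['C'], ['D', 'E']]
```

The key property is that no batch is ever half-processed: work already consumed from Kafka is fully handled and its offsets committed before the application exits.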

3:10 - 3:50 PM

Mike Sharkey

Data and Graphs

Building a Learner Analytics Data Hub in AWS

The Higher Education space depends on multiple sources of federated data. Student applications, student information, course schedules, course activity, financial aid: most of these data live in different source systems. These data can be used to help promote student success, but the sources must be pulled together in order to uncover valuable insights. This session will describe how a few universities are using an AWS/Redshift data pipeline to aggregate data from multiple sources. In addition to aggregation, the schools utilize additional AWS toolsets for machine learning insights, data health tracking, and trigger-event routing.

3:10 - 3:50 PM

Kiran Ramineni

Intel

0 - 60 with Transfer (Machine) Learning

There is an important element behind recent advances in AI and deep learning: transferring a previous model's knowledge to a new model has greatly reduced model-building time and time to market. With transfer learning, you get the best of both worlds: a state-of-the-art model to start from, and the freedom to focus on customizing the model for your data. I will explain the basics of this concept with two applications: image classification and text topic modeling.
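The core idea can be sketched with a toy text classifier. This is purely illustrative: the vocabulary, the extractor, and the nearest-centroid head below are trivial stand-ins for a real pretrained network and its fine-tuned final layer:

```python
# Toy sketch of transfer learning: keep a "pretrained" feature extractor
# frozen and train only a small task-specific head on new labeled data.

VOCAB = ["ball", "game", "score", "election", "vote", "senate"]

def pretrained_features(text):
    """Frozen feature extractor: bag-of-words counts over a fixed vocab."""
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

class NearestCentroidHead:
    """The only trainable part: one centroid per class in feature space."""
    def fit(self, texts, labels):
        sums, counts = {}, {}
        for text, label in zip(texts, labels):
            f = pretrained_features(text)
            prev = sums.get(label, [0] * len(f))
            sums[label] = [a + b for a, b in zip(prev, f)]
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {l: [v / counts[l] for v in s]
                          for l, s in sums.items()}

    def predict(self, text):
        f = pretrained_features(text)
        dist = lambda c: sum((a - b) ** 2 for a, b in zip(f, c))
        return min(self.centroids, key=lambda l: dist(self.centroids[l]))

head = NearestCentroidHead()
head.fit(["the game score", "ball game", "senate vote", "election vote"],
         ["sports", "sports", "politics", "politics"])
print(head.predict("a close score in the ball game"))   # prints sports
```

Only the head was trained, on four examples; the extractor was reused as-is. Swap in a real pretrained network for `pretrained_features` and you have the transfer-learning recipe the talk describes.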

3:10 - 3:50 PM

James Chien

Microfocus

Curtis Bennett

Microfocus

Big Data doesn't have to mean Hadoop: Using SQL to easily solve Big Data challenges


3:50pm - 4:10pm

Raffle

Submit your tickets to win great prizes

More details at the conference

PHX DATA CONFERENCE 2018


SPEAKERS

We invited digital experts from around the world.

Avinash Ramineni
Kogni

Stephen Viramontes
Assurevote

Gopal Swaminathan
Saama

Brian Jones
Attunity

Raghav Jandhyala
SAP

Matteo Rebeschini
Elastic (Presentation Slides)

Scott Sumner
Unravel Data (Presentation Slides)

Shekhar Vemuri
Clairvoyant

Robert Sanders
Clairvoyant (Presentation Slides)

Jörg Schad
Mesosphere

Gaurav Sethi
PayPal

Priyanka Mishra
PayPal

Jaipal Kothakapu
Artha Solutions

Josh Sivey
Microsoft

Ginger Gilsdorf
Intel

Hari K Gottipati
Marsh

Chirag Patel
Pitney Bowes

Thomas Brown
Cloudera

Ravneet Ghuman
(Watch Presentation)

Madhu Sudhan Reddy Gudur

Curtis Bennett
Microfocus (Presentation Slides)

James Chien
Microfocus (Presentation Slides)

Mike Sharkey
Data and Graphs

Kiran Ramineni
Intel (Presentation Slides)

Nick Dearden
Confluent

Thejas Nair
Hortonworks (Presentation Slides)

Charles Zhan
Infusionsoft

Michael Arnold
Clairvoyant (Presentation Slides)

Nithya Koka
Clairvoyant (Presentation Slides)

Rodrigo Gomes
MemSQL (Presentation Slides)

Bob O'Brien
TIBCO (Presentation Slides)

Nitin Mulimani

Amrinder Singh

Lenin Aboagye
Kogni

Kendyl Douglas
Systems Imagination, Inc.

David Schneider
Systems Imagination, Inc.

TESTIMONIALS

PDC is an incredible conference that brings together big data experts from many different industries and backgrounds. It's designed to deliver cutting-edge development and use of technologies in short, easy-to-understand presentations.

Vivaan Barsar, Freelancer

The emphasis on practical applications of big data technologies to real world business problems and decision-making is just right! The importance of defining a problem and designing the strategy to provide architecture that is implementable is emphasized at the Phoenix Data Conference.

Michael Lee, Intel

The Phoenix Data Conference is a great way to meet peers within the industry and to see what everybody else is doing to make sure you don’t fall behind.

Sarah Mesner, College Student

A great event, well-organized and lots of engaging presentations. I hope to go again next year!

John Bingham, American Express

WHAT IS PHX DATA CONFERENCE?

Technology leaders in the big data space shared innovative implementations and advances in the Hadoop space. Speakers discussed specific challenges around security, talent availability, technical deployments, managed services, and more.

EVENT INFORMATION