Saturday 4 November 2017, 9:00 AM - 5:00 PM
November 4th, 2017 - Grand Canyon University

The largest big data event in the Phoenix valley

RSVP
Registration and Attendance Are Free

About Phoenix Data Conference

Phoenix Data Conference is the largest data conference in the Phoenix area. PDC brings together local and national leaders in Big Data, Data Science, Data Mining, and other data spaces to discuss the latest trends, newest technologies, and most innovative implementations in Big Data. Come join this free, all-day conference aimed to promote shared knowledge and growth within the big data community.

Speakers

Hari Gottipati
American Express

Jeffrey Carpenter
DataStax

Mark Castrovinci
Cloud Navigation

Sunil Sabat
Protegrity

Thejas Nair
Hortonworks

Malo Denielou
Google Cloud

Abhishek Mehta
Tresata

Uma Kannikanti
Choice Hotels

Sunil Pedapudi
Google Cloud

Clark Richey Jr.
FactGem

Casey Fox
Swift Transportation

Jimmy Bates
MapR

Gopal Swaminathan
Saama

Mark Goldstein
International Research Center

Hershey Khan
Clairvoyant

Avinash Ramineni
Kogni

Thomas Lin
Qlik

James Baker
Microsoft

Venkatesh Sunderam
PayPal

Tom Brown
Cloudera

Hannah Smalltree
Cazena

Michael Sick
EY

Nitin Mulimani
American Express

Amrinder Singh
American Express

Mark Stouse
Proof

Michael Arnold
Clairvoyant

Raghav Jandhyala
SAP

Curtis Bennett
Micro Focus

James Chien
Micro Focus

Get Tickets

Don't miss your chance to get tickets to Phoenix Data Conference 2017

RSVP

Schedule

Please find the November 4th schedule below. Feel free to download the PDF version here.


8:00 - 9:00 AM

Registration

Registration / Breakfast / Swag Bags

Get your name tag and swag bags at the registration desk.

9:00 - 9:40 AM

Avinash Ramineni

Kogni

Data-centric Security - Discovering, Securing, and Monitoring Sensitive Data in Hadoop

Today’s organizations have rich datasets that power operations and analytics for their businesses. This, combined with the explosion in the amount and variety of data being captured and stored in the enterprise, raises concerns around data security and privacy. Securely managing and governing ever-expanding data is a key challenge enterprises face today, and security breaches have had significant business and economic impact across the world. As modern data lakes combine datasets from across the enterprise into a single place, making insights easier to achieve, security and governance are often an afterthought. With the increasing complexity of big data implementations on Hadoop, creating a secure data lake environment is key to successful analytics solutions.

Knowledge of where sensitive data is stored in enterprise data sources is tribal, not institutional. This creates challenges around monitoring data security within the data lake in a centralized manner, and the lack of a structured, standard focus on securing data and enabling governance and visibility leads to information security and compliance issues. Most enterprises are focused on securing data with perimeter security via firewalls, yet the data security community globally accepts that data breaches are inevitable: it is not a question of whether the perimeter will be breached, only when. In this talk we will cover how enterprises can move beyond perimeter security and focus on data-centric security by discovering, securing, and monitoring sensitive data in enterprise data sources. Topics:

  • Automatic discovery and tagging of sensitive data at rest and in transit
  • Sensitive data location as a key metric to track, monitor, and alert on
  • User activity monitoring by turning audit logs into an audit event firehose and alerting based on real-time anomaly detection
  • Transparently masking the data at ingest
  • Proactively masking the data before egress to cloud storage
  • Data governance policy enforcement
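The discovery-and-masking workflow outlined in the bullets above can be sketched in a few lines. The example below is a hypothetical, stdlib-only illustration (not Kogni's implementation): a regex tags one class of sensitive value and masks it at ingest, while non-sensitive fields stay in the clear.

```python
import re

# Toy pattern for one class of sensitive data (US SSN-like values); a real
# discovery engine would combine many patterns with ML-based classifiers.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def discover(record: str) -> bool:
    """Tag a record as sensitive if it contains an SSN-like value."""
    return bool(SSN_PATTERN.search(record))

def mask(record: str) -> str:
    """Mask sensitive values at ingest, keeping the rest in the clear."""
    return SSN_PATTERN.sub("XXX-XX-XXXX", record)

record = "name=Jane Doe ssn=123-45-6789 city=Phoenix"
if discover(record):
    record = mask(record)
print(record)  # name=Jane Doe ssn=XXX-XX-XXXX city=Phoenix
```

The same discover/mask hooks could run before egress to cloud storage, matching the proactive-masking bullet above.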
9:00 - 9:40 AM

Casey Fox

Swift Transportation

Moving from a Traditional Enterprise Warehouse

Making the move from a traditional enterprise warehouse to a big data-based architecture is a significant project. Few companies have resources with the specific skills needed to successfully complete the conversion and hit the ground running on the new architecture. It is important to spend the necessary time planning the architecture before you get started. How will the data be consumed? How many users will be querying the system? How often will your workflows need to run? All of those answers will go into planning capacity and identifying the components needed.
Swift has been able to transition from a legacy enterprise RDBMS to a Hadoop-based architecture on time and on budget. In addition to replacing the function of our legacy warehouse, the new platform has enabled our analysts to complete research that simply was not possible in our old environment.

9:00 - 9:40 AM

Malo Denielou

Google Cloud

Future proof, portable batch and streaming pipelines using Apache Beam

Apache Beam is a top-level Apache project that aims to provide a unified API for efficient, portable data processing pipelines. Beam handles both batch and streaming use cases and neatly separates properties of the data from runtime characteristics, allowing pipelines to be portable across multiple runtimes, both open source (e.g., Apache Flink, Apache Spark, Apache Apex) and proprietary (e.g., Google Cloud Dataflow). This talk covers the basics of Apache Beam, describes the main concepts of the programming model, and discusses the current state of the project.
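The core idea of the model described above, transforms over collections defined independently of the engine that runs them, can be illustrated with a stdlib-only toy. This is deliberately not Beam's API (real Beam code uses `apache_beam.Pipeline`, `ParDo`, and `CombinePerKey`); it just sketches the shape of a word-count pipeline.

```python
# Toy illustration of Beam's core idea: a pipeline is a graph of
# transforms over collections, defined independently of the runner
# (batch or streaming) that ultimately executes it. NOT Beam's real API.

def par_do(fn, pcollection):
    """Element-wise transform, analogous to Beam's ParDo/Map."""
    return [fn(x) for x in pcollection]

def group_sum(pcollection):
    """Keyed aggregation, analogous to Beam's CombinePerKey(sum)."""
    out = {}
    for key, value in pcollection:
        out[key] = out.get(key, 0) + value
    return out

# The "pipeline": word count over a tiny bounded collection.
lines = ["beam unifies batch", "beam unifies streaming"]
words = [w for line in lines for w in line.split()]
pairs = par_do(lambda w: (w, 1), words)
counts = group_sum(pairs)
print(counts["beam"])  # 2
```

In Beam proper, the same transform graph could be submitted unchanged to Flink, Spark, or Cloud Dataflow, which is the portability the abstract describes.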

9:00 - 9:40 AM

Mark Stouse

Proof

Major Trends in Business Analytics for 2018

NA

9:50 - 10:30 AM

Tom Brown

Cloudera

Cloudera Data Science Workbench

Learn how to drive better model accuracy and deliver more strategic analytic capabilities. Tom will introduce you to an enterprise data science platform that accelerates analytics projects from exploration to production.

9:50 - 10:30 AM

Abhishek Mehta

Tresata

What's the first thing to do now that you sold (bought) the Big Data dream?

You clearly bought into the Big Data dream (well, you are at this conference) and jumped, feet up and arms down, into it. The only remaining question: where to start? Amongst the buzzwords of cloud, algos, data science, AI, and internal priorities, there is always one thing that takes precedence and makes the journey fruitful, if taken. The question is: do you know what that step should be? If you do, come join the debate; if you do not, this may help lift the fog.

9:50 - 10:30 AM

James Baker

Microsoft

Succeeding with Big Data Analytics in the Cloud Using Remote Storage

The long-standing assumptions of bringing compute to the data as the best way to achieve big data analytics do not really apply in today’s hyper-scale cloud data centers. Instead, performant and cost-effective remote storage services not only succeed with running contemporary analytics workloads, they also open up a broader set of opportunities for extracting value from your data. In this talk, James Baker will share insights and data describing the benefits of disaggregated storage for the full range of analytics applications, especially in the cloud. He will show compelling performance results and cost analysis as well as best practices for designing and gaining maximum benefit from your cloud-based analytics pipelines.

9:50 - 10:30 AM

Thejas Nair

Hortonworks

Apache Hive 2.3.0 - Even more SQL & Speed improvements in 2017

The Apache Hive community has made two major releases in the last 12 months, bringing exciting improvements in both new functionality and performance. New performance improvements in Hive include dynamic semi-join reduction, shared scans, and efficient use of memory for hash joins in LLAP. Improvements in SQL support now enable all 99 TPC-DS decision support benchmark queries to be run with trivial modifications. Integration with the Druid time-series database enables very fast queries on top of data that is ingested in real time. Transactional (ACID) tables now also support merge queries that let you efficiently apply large updates to your existing data. I will also touch upon some of the other exciting features on the roadmap.

10:40 - 11:20 AM

Thomas Lin

Qlik

Uncovering the Big ‘V’ in Big Data: Seeing the Value with Qlik

When talking about Big Data, people often talk about the three V’s to describe it: Volume, Velocity, and Variety. Solutions thus far have been focused on addressing these challenges. There are actually a couple more V’s not as widely talked about that are equally important, Veracity and Value.
This session takes a look at the different aspects of Value and how to uncover them in the sea of Big Data. With an appropriate analytics platform like Qlik Sense, see how users of all skillsets can begin extracting insights and Value from their Big Data.

10:40 - 11:20 AM

Venkatesh Sunderam

PayPal

NA

NA

10:40 - 11:20 AM

Jimmy Bates

MapR

Making a wager on your business with data

Every day we take stock of our business and make wagers on what we need for the future. Has the explosion of data made those wagers more or less risky? That is one of the questions we will look at in this talk. What are the choices for these wagers? Are there ways to hedge our bets? We all know that data is increasing at an ever-increasing rate. We all read the blogs that tell us to just place your bets on “this” and your success is assured. We can read the large stack of amazing outcomes where the winnings were huge. But right alongside that is a stack of losers just as big.
We will take a look at successes and failures and try to find some commonality we can use in picking the projects that drive success with the resources we have, as we ante up for that next round of bets. Join us in looking at successful wagers that moved the needle. No matter if your interest is asset failure predictive modeling, competitive asset assessment, instantaneous time series analytics, or a host of other possibilities, the core factors of risk vs. reward are the same. Let’s look at the odds and find a wager that wins.

10:40 - 11:20 AM

Hannah Smalltree

Cazena

The BDaaS Way to Success: Big Data as a Service 101

Hadoop and Spark are challenging to run in the cloud, where security is critical and mistakes get expensive quickly. That’s why there’s a growing interest in BDaaS or Big Data as a Service. Several companies offer prebuilt "as a service" solutions delivered in the public cloud that help companies jumpstart big data, data science, analytics and digital projects. But the category is young, labels are fuzzy and it’s confusing to navigate. In this session, get an unbiased, business-level introduction to big data services in the cloud. Hear real stories of how companies are finding success and get tips for evaluations.

11:30 - 12:10 PM

Mark Castrovinci

Cloud Navigation

Building a Technology Assisted Review (TAR) system with NLP and Machine Learning

A previously time-consuming and costly human task is greatly simplified and accelerated by introducing NLP to extract text and machine learning to categorize it.
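The two stages above, text extraction followed by categorization, can be sketched with a minimal stdlib-only toy. This is a hypothetical illustration, not the talk's system: tokenization stands in for the NLP step, and overlap scoring against reviewer-labeled seed documents stands in for the trained model.

```python
from collections import Counter

def tokenize(text: str) -> Counter:
    """Crude 'NLP' step: lowercase bag-of-words extraction."""
    return Counter(text.lower().split())

def score(doc: Counter, seed: Counter) -> int:
    """Crude 'ML' step: token overlap with a reviewer-labeled seed."""
    return sum((doc & seed).values())

def categorize(text: str, seeds: dict) -> str:
    """Assign the category whose seed document the text best matches."""
    doc = tokenize(text)
    return max(seeds, key=lambda label: score(doc, seeds[label]))

# Hypothetical seed labels a reviewer might supply in a TAR workflow.
seeds = {
    "responsive": tokenize("contract breach payment dispute"),
    "not_responsive": tokenize("lunch schedule office party"),
}
print(categorize("email about the payment dispute", seeds))  # responsive
```

A production TAR system would replace both stages with real NLP pipelines and an iteratively trained classifier, but the review loop has this same shape.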

11:30 - 12:10 PM

Clark Richey Jr.

FactGem

The Failed Promise of the Enterprise Data Warehouse

This talk will explore the promise of the enterprise data warehouse with a focus on highlighting the vast difference between what IT promised us and what has been delivered. Included will be a discussion of how Hadoop also promised to solve this problem and why it failed. We will then examine new possibilities for data integration and demonstrate why businesses should no longer be beholden to IT organizations for data integration.

11:30 - 12:10 PM

Sunil Sabat

Protegrity

Securing enterprise big data at scale using standards for compliance

With the rise of compliance standards like GDPR, organizations are looking for solutions to secure enterprise big data. The Big Data ecosystem is open source, but the data residing in it cannot be open. Protegrity Big Data Protector secures all sensitive data in Hadoop using advanced tokenization and encryption: at rest in the Hadoop Distributed File System (HDFS); in use during MapReduce, Spark, Hive, and Pig processing; and in transit to and from other data systems. This continuous protection ensures the data is secure throughout its lifecycle, no matter where it is or how it’s used. The actual sensitive data is transparently protected with policy-based controls, while non-sensitive data can remain in the clear. This enables maximum usability for users and processes to continue to mine the data for transformative decision-making insights. We will review a few use cases that are in production, then discuss future real-time and cloud-based data protection use cases and how Protegrity is positioned to address the need.

11:30 - 12:10 PM

Nitin Mulimani

American Express

Amrinder Singh

American Express

Leveraging HBase to handle massive data in real time

Learn how to leverage HBase to manage different read, write, and mixed workloads, how to use different HBase components for an optimal application design, and how to troubleshoot common HBase performance challenges.

12:10 - 1:30 PM

Lunch

Lunch & Book Signing Event

Phoenix-area author and engineer Jeff Carpenter will be signing copies of his O’Reilly book, Cassandra: The Definitive Guide, 2nd Edition.

1:30 - 2:10 PM

Hari Gottipati

American Express

Practical Big Data 2017 - tools, technologies and use cases

This talk covers the latest developments (since last year) in the Big Data ecosystem, including AI and machine learning, as well as the past, present, and future of Big Data tools, technologies, and use cases.

1:30 - 2:10 PM

Gopal Swaminathan

Saama

Future of Analytics for Insurance Companies

How innovation in data and analytics can help the insurance industry, with a focus on P&C. How solution providers and enablers should gear up to handle this change and make a difference. Sample case studies and ROI will be presented.

1:30 - 2:10 PM

Jeffrey Carpenter

DataStax

Building a Real-Time Recommendation Engine with Apache Cassandra and DataStax Enterprise Graph

Although big data technology has led to great gains in our ability to analyze vast quantities of data, one of the main obstacles to getting more value from our big data initiatives is the difficulty of operationalizing the results of our analysis in real-time. Jeff Carpenter, Technical Evangelist at DataStax, will share how his team built a real-time recommendation engine using DataStax Enterprise for a video sharing reference application called KillrVideo. You’ll learn how technologies including Apache Cassandra, Apache Spark, and the Gremlin graph traversal language can be used to create recommendation engines and a variety of other real-time analytics applications.
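The kind of traversal a graph-backed recommendation engine runs can be sketched with a stdlib-only toy. This is an illustrative sketch, not the KillrVideo implementation: start at a user, walk to the videos they rated, then to other users who rated those videos, then to videos those users rated that the original user has not seen (the shape a Gremlin traversal would express).

```python
# Toy user -> liked-videos graph; names and titles are made up.
ratings = {
    "alice": {"cassandra-intro", "spark-basics"},
    "bob": {"cassandra-intro", "graph-theory"},
    "carol": {"spark-basics", "graph-theory", "stream-processing"},
}

def recommend(user: str) -> set:
    """Two-hop traversal: videos liked by users with overlapping taste."""
    seen = ratings[user]
    similar_users = {
        other for other, vids in ratings.items()
        if other != user and vids & seen
    }
    candidates = set()
    for other in similar_users:
        candidates |= ratings[other]
    return candidates - seen  # never recommend what the user already saw

print(sorted(recommend("alice")))  # ['graph-theory', 'stream-processing']
```

In DataStax Enterprise Graph the same walk would run as a Gremlin traversal over vertices and edges stored in Cassandra, with Spark handling the heavier offline analytics.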

1:30 - 2:10 PM

Michael Sick

EY

The IT Data Lake: Leveraging IT Data to Improve IT and Business Outcomes

NA

2:20 - 3:00 PM
Mark Goldstein

International Research Center

Big Data for IoT: Analytics from Descriptive to Predictive to Prescriptive

As the Internet of Things (IoT) floods data lakes and fills data oceans with sensor and real-world data, analytic tools and real-time responsiveness will require improved platforms and applications to deal with the data flow and move from descriptive to predictive to prescriptive analysis and outcomes.
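The descriptive-to-predictive-to-prescriptive progression named above can be made concrete with a tiny, hypothetical sensor example (stdlib only; the readings and the 85-degree threshold are invented for illustration):

```python
# Hypothetical hourly temperature readings from one IoT sensor.
temps = [70, 72, 75, 79, 84]

# Descriptive: what happened?
average = sum(temps) / len(temps)

# Predictive: what will happen next? (naive linear extrapolation)
trend = temps[-1] - temps[-2]
forecast = temps[-1] + trend

# Prescriptive: what should we do about it? (invented threshold rule)
action = "throttle equipment" if forecast > 85 else "no action"

print(average, forecast, action)  # 76.0 89 throttle equipment
```

Real IoT platforms swap the extrapolation for trained models and the threshold for an optimization step, but the three stages keep this shape.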

2:20 - 3:00 PM

Uma Kannikanti

Choice Hotels

BigData - A Simplified Solution for Batch and Real-time Data Processing - Spark Structured Streaming

An overview of choosing the right tools to build a data analytics platform.

  • How Choice Hotels uses Structured Streaming to develop an end-user self-service ETL framework
  • Explore Spark MLlib
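The Structured Streaming model behind the framework described above can be illustrated with a stdlib-only toy: the same aggregation logic runs over a bounded batch or over micro-batches of an unbounded stream, with the result updated incrementally. (Real Spark code would use `spark.readStream`, `groupBy`, and `writeStream`; the hotel codes below are invented.)

```python
from collections import defaultdict

# Running aggregate, updated as each micro-batch arrives.
running_counts = defaultdict(int)

def process_micro_batch(events):
    """Incrementally fold one micro-batch into the running aggregate."""
    for hotel_code in events:
        running_counts[hotel_code] += 1

# The "stream" arrives as micro-batches; batch mode is just one batch.
for micro_batch in [["AZ001", "AZ002"], ["AZ001"], ["AZ001", "AZ003"]]:
    process_micro_batch(micro_batch)

print(running_counts["AZ001"])  # 3
```

This unification, writing one query that Spark runs incrementally, is what lets a single ETL framework serve both batch and real-time paths.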
2:20 - 3:00 PM

Hershey Khan

Clairvoyant

Michael Arnold

Clairvoyant

Secured. Upgraded. Optimized. Managed. Taking your cluster to the next level.

Hadoop enables organizations to implement solutions and applications that leverage data to drive business value. While Hadoop brings all this power to enterprises, it is also very challenging to manage, optimize, upgrade, and, most importantly, secure these Hadoop clusters. In this talk, Hershey and Michael will share insights into some of these topics and describe Clairvoyant's Hadoop Managed Services offering, INSIGHT, which enables our clients to build upon our expertise.

2:20 - 3:00 PM

Sunil Pedapudi

Google Cloud

Achieving optimal performance automatically in Cloud Dataflow

Performance is great. And even better than finely-tuned, benchmark-optimized systems is performance that works for you. This talk discusses techniques used by Cloud Dataflow to dynamically readjust pipeline behavior for specific executions to achieve "no-knobs" performance optimizations.

3:10 - 3:50 PM

Raghav Jandhyala

SAP

Digital Supply Chain transformation using In-Memory Big Data, Machine Learning and IoT

Supply Chain is the backbone for everything we enjoy today, be it ordering an item online or buying a bottle of water. In this session we will cover how in-memory technologies combined with data science bring insights from big data for performing complex what-if scenario planning in supply chains, how machine learning and AI have led to better demand sensing, and how millions of connected IoT devices are helping to make real-time supply chain decisions (for example, when a truck breaks down). We will also cover how SAP HANA, Leonardo, and Integrated Business Planning are the game changers for this new era of Digital Supply Chain.

3:10 - 3:50 PM

James Chien

Micro Focus

Curtis Bennett

Micro Focus

Machine Learning and Analytics at Big Data Scale with the Vertica Analytics Database

Learn how Vertica can provide superior speed, scale, concurrency, and advanced in-database machine learning, all with a familiar SQL-based engine, so your organization can analyze data at a lower TCO than alternatives. Come find out what many of the most data-centric companies in the world, such as Intuit, Uber, and Bank of America, already know: Vertica is the secret sauce of Big Data analytics.

With machine learning and predictive analytics being the talk of the tech industry this year, many organizations are still struggling to get a handle on all of their data. Predictive analytics is complex, especially with the volume of Big Data available today. Data scientists are often required to build and tune models using a sample subset of data (which may not adequately represent the larger dataset) before deployment, extending the time needed to operationalize and get value out of the predictive analysis. The Vertica Analytics Database can streamline this process. Join us as we explore the emergence (or reemergence) of machine learning and how data-management practices are evolving to keep up with the new speed and scale of business through continuous, streaming applications and real-time predictive capabilities. What You Will Learn:

  • How Vertica allows for analytics to evolve from descriptive analytics to predictive and prescriptive analytics
  • How you can leverage SQL on large data sets for advanced analytics
  • How leading firms are overcoming data science talent shortages through adoption of new technology

4:00 - 4:30 PM

Raffle

Submit your tickets to win great prizes

More details at the conference

Testimonials

Sponsors

Sponsor this event and showcase your Big Data and Analytics initiatives. Sponsoring grants you benefits that others miss out on.
Contact us if interested at contact@phxdataconference.com.

Gold Sponsors

Silver Sponsors

Bronze Sponsors

Location Sponsors

Event Information

Everything you need to know.

Venue

This year's Phoenix Data Conference is at Grand Canyon University, voted as one of the 10 Best College Campuses Across America.

Transportation

The campus is located near the heart of the city, so it is easily reached by private or public transportation, available 24 hours a day.

Hotel

There are hotel and rental options available around the event location. Check out Google for nearby places.