Phoenix Data Conference is the largest data conference in the Phoenix area. PDC brings together local and national leaders in Big Data, Data Science, Data Mining, and other data spaces to discuss the latest trends, newest technologies, and most innovative implementations in Big Data. Come join this free, all-day conference aimed to promote shared knowledge and growth within the big data community.
Please find the November 4th schedule below. You can also download the PDF version here.
Get your name tag and swag bags at the registration desk.
Today’s organizations have rich datasets that power operations and analytics for their businesses. Combined with the explosion in the amount and variety of data being captured and stored in the enterprise, this opens up concerns around data security and privacy. Securely managing and governing ever-expanding data is a key challenge enterprises face today, and security breaches around the world have had significant business and economic impact. Modern data lakes combine datasets from across the enterprise into a single place, making insights easier to achieve, yet security and governance are often an afterthought. With the increasing complexity of big data implementations on Hadoop, creating a secure data lake environment is key to successful analytics solutions. Knowledge of where sensitive data is stored in enterprise data sources is tribal, not institutional, which makes it hard to monitor data security within the data lake in a centralized manner. The lack of a structured, standard focus on securing data, enabling governance, and providing visibility leads to information security and compliance issues. Most enterprises are focused on securing data at the perimeter via firewalls, yet the data security community globally accepts that data breaches are inevitable: it is not a question of whether the perimeter will be breached, but when. In this talk we will cover how enterprises can move beyond perimeter security and focus on data-centric security by discovering, securing, and monitoring sensitive data in enterprise data sources. Topics:
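The discovery step the abstract mentions can be illustrated with a minimal sketch: scanning records for values that look sensitive. This is a hypothetical toy classifier using two illustrative regex patterns, not any vendor's actual discovery engine; real tools use far richer detection logic.

```python
import re

# Illustrative patterns only: SSN-like and credit-card-like values.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}

def scan_record(record: dict) -> dict:
    """Return a mapping of field name -> sensitive-data labels found in it."""
    findings = {}
    for field, value in record.items():
        labels = [name for name, pat in PATTERNS.items() if pat.search(str(value))]
        if labels:
            findings[field] = labels
    return findings

record = {"name": "Jane Doe",
          "ssn": "123-45-6789",
          "note": "paid with 4111-1111-1111-1111"}
print(scan_record(record))  # flags the 'ssn' and 'note' fields
```

Running such a scan across enterprise data sources is what turns tribal knowledge of sensitive-data locations into an institutional, centrally monitorable inventory.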
Making the move from a traditional enterprise warehouse to a big data-based architecture is a significant project. Few companies have resources with the specific skills needed to successfully complete the conversion and hit the ground running on the new architecture. It is important to spend the necessary time planning the architecture before you get started. How will the data be consumed? How many users will be querying the system? How often will your workflows need to run? All of those answers feed into planning capacity and identifying the components needed. Swift has been able to transition from a legacy enterprise RDBMS to a Hadoop-based architecture on time and on budget. In addition to our new warehouse replacing the function of the legacy one, our analysts have already been able to complete research that simply was not possible in our old environment.
Apache Beam is a top-level Apache project which aims at providing a unified API for efficient and portable data processing pipelines. Beam handles both batch and streaming use cases and neatly separates properties of the data from runtime characteristics, allowing pipelines to be portable across multiple runtimes, both open-source (e.g., Apache Flink, Apache Spark, Apache Apex, …) and proprietary (e.g., Google Cloud Dataflow). This talk covers the basics of Apache Beam, describes the main concepts of the programming model, and talks about the current state of the project.
Learn how to drive better model accuracy and deliver more strategic analytic capabilities. Tom will introduce you to an enterprise data science platform that accelerates analytics projects from exploration to production.
You clearly bought into the Big Data dream (well, you are at this conference) and jumped into it feet up and arms down. The only remaining question: where to start? Amongst the buzzwords of cloud, algorithms, data science, AI, and internal priorities, there is always one step that takes precedence and, if taken, makes the journey fruitful. The question is: do you know what that step should be? If you do, come join the debate; if you do not, this may help lift the fog.
The long-standing assumptions of bringing compute to the data as the best way to achieve big data analytics do not really apply in today’s hyper-scale cloud data centers. Instead, performant and cost-effective remote storage services not only succeed with running contemporary analytics workloads, they also open up a broader set of opportunities for extracting value from your data. In this talk, James Baker will share insights and data describing the benefits of disaggregated storage for the full range of analytics applications, especially in the cloud. He will show compelling performance results and cost analysis as well as best practices for designing and gaining maximum benefit from your cloud-based analytics pipelines.
The Apache Hive community has delivered two new major releases in the last 12 months, bringing exciting improvements in both new functionality and performance. New performance improvements in Hive include dynamic semijoin reduction, shared scans, and more efficient use of memory for hash joins in LLAP. Improvements in SQL support now enable all 99 TPC-DS decision support benchmark queries to be run with trivial modifications. Integration with the Druid time series database now enables very fast queries on top of data that is ingested in real time. Transactional (ACID) tables now also support MERGE queries that let you efficiently apply large updates to your existing data. I will also touch on some of the other exciting features on the roadmap.
When talking about Big Data, people often talk about the three V’s to describe it: Volume, Velocity, and Variety. Solutions thus far have been focused on addressing these challenges. There are actually a couple more V’s not as widely talked about that are equally important, Veracity and Value. This session takes a look at the different aspects of Value and how to uncover them in the sea of Big Data. With an appropriate analytics platform like Qlik Sense, see how users of all skillsets can begin extracting insights and Value from their Big Data.
Every day we take stock of our business and make wagers on what we need for the future. Has the explosion of data made those wagers more or less risky? That is one of the questions we will look at in this talk. What are the choices for these wagers? Are there ways to hedge our bets? We all know that data is increasing at an ever-increasing rate. We all read the blogs that tell us to just place your bets on “this” and your success is assured. We can read the large stack of amazing outcomes where the winnings were huge, but right alongside it is a stack of losers just as big. We will take a look at successes and failures and try to find some commonality that we can use in picking the projects to drive success with the resources we have as we ante up for that next round of bets. Join us in looking at successful wagers that moved the needle. No matter if your interest is asset failure predictive modeling, competitive asset assessment, instantaneous time series analytics, or a host of other possibilities, the core factors of risk vs. reward are the same. Let’s look at the odds and find a wager that wins.
Hadoop and Spark are challenging to run in the cloud, where security is critical and mistakes get expensive quickly. That’s why there’s a growing interest in BDaaS, or Big Data as a Service. Several companies offer prebuilt "as a service" solutions delivered in the public cloud that help companies jumpstart big data, data science, analytics, and digital projects. But the category is young, labels are fuzzy, and it’s confusing to navigate. In this session, get an unbiased, business-level introduction to big data services in the cloud. Hear real stories of how companies are finding success and get tips for evaluations.
A previously time-consuming and costly human task is greatly simplified and accelerated by using NLP to extract text and machine learning to categorize it.
This talk will explore the promise of the enterprise data warehouse with a focus on highlighting the vast difference between what IT promised us and what has been delivered. Included will be a discussion of how Hadoop also promised to solve this problem and why it failed. We will then examine new possibilities for data integration and demonstrate why businesses should no longer be beholden to IT organizations for data integration.
With the rise of compliance standards like GDPR, organizations are looking for solutions to secure enterprise big data. The Big Data ecosystem is open source, but the data residing in that ecosystem cannot be open. Protegrity Big Data Protector secures all sensitive data in Hadoop utilizing advanced tokenization and encryption – at rest in the Hadoop Distributed File System (HDFS); in use during MapReduce, Spark, Hive, and Pig processing; and in transit to and from other data systems. This continuous protection ensures the data is secure throughout its lifecycle, no matter where it is or how it’s used. The actual sensitive data is transparently protected with policy-based controls, while non-sensitive data can remain in the clear. This enables maximum usability for users and processes to continue to mine the data for transformative decision-making insights. We will review a few use cases that are in production. We will discuss future real-time and cloud-based data protection use cases and how Protegrity is positioned to address the need.
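The core idea behind tokenization can be sketched in a few lines: sensitive values are swapped for random tokens, and only a protected vault can reverse the mapping. This is a hypothetical minimal illustration of vault-based tokenization in general, not Protegrity's actual implementation (which uses more sophisticated, policy-driven techniques).

```python
import secrets

class TokenVault:
    """Toy token vault: maps sensitive values to random, reversible tokens."""

    def __init__(self):
        self._to_token = {}
        self._to_value = {}

    def tokenize(self, value: str) -> str:
        if value in self._to_token:        # same value -> same token
            return self._to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._to_token[value] = token
        self._to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._to_value[token]

vault = TokenVault()
token = vault.tokenize("123-45-6789")
# Downstream jobs (MapReduce, Spark, Hive) see only the token;
# authorized users can reverse it through the vault.
assert vault.detokenize(token) == "123-45-6789"
```

Because tokens carry no exploitable information, analytics jobs can join, group, and count on them while the real values stay protected.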
Phoenix area author and engineer Jeff Carpenter will be signing copies of his O’Reilly book, Cassandra: The Definitive Guide, 2nd Edition.
This talk covers the latest developments (since last year) in the Big Data ecosystem, including AI and machine learning. It also covers the past, present, and future of Big Data tools, technologies, and use cases.
How innovation in data and analytics can help the insurance industry, with a focus on P&C. How solution providers and enablers should gear up to handle this change and make a difference. Sample case studies and ROI will be presented.
Although big data technology has led to great gains in our ability to analyze vast quantities of data, one of the main obstacles to getting more value from our big data initiatives is the difficulty of operationalizing the results of our analysis in real-time. Jeff Carpenter, Technical Evangelist at DataStax, will share how his team built a real-time recommendation engine using DataStax Enterprise for a video sharing reference application called KillrVideo. You’ll learn how technologies including Apache Cassandra, Apache Spark, and the Gremlin graph traversal language can be used to create recommendation engines and a variety of other real-time analytics applications.
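The recommendation idea behind an engine like the one described can be sketched with simple collaborative filtering: recommend videos that appear in the watch histories of users who overlap with yours. This is a hypothetical stand-in (in plain Python, with made-up video names) for the Cassandra/Spark/Gremlin pipeline the talk actually covers.

```python
from collections import Counter

def recommend(target_user, histories, top_n=3):
    """Score unseen videos by how often they co-occur with the target's history."""
    watched = set(histories[target_user])
    scores = Counter()
    for user, videos in histories.items():
        if user == target_user:
            continue
        if watched & set(videos):          # this user overlaps with the target
            for v in videos:
                if v not in watched:
                    scores[v] += 1
    return [video for video, _ in scores.most_common(top_n)]

histories = {
    "alice": ["intro-cassandra", "spark-101"],
    "bob":   ["spark-101", "graph-gremlin", "kafka-basics"],
    "carol": ["intro-cassandra", "graph-gremlin"],
}
print(recommend("alice", histories))  # 'graph-gremlin' ranks first (two co-occurrences)
```

A graph traversal expresses the same walk (user → watched video → other watchers → their videos) declaratively, and distributing the computation is what systems like Cassandra and Spark bring to the real-time version.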
As the Internet of Things (IoT) floods data lakes and fills data oceans with sensor and real-world data, analytic tools and real-time responsiveness will require improved platforms and applications to deal with the data flow and move from descriptive to predictive to prescriptive analysis and outcomes.
Overview of choosing the right tool to build a data analytics platform.
Performance is great. And even better than finely-tuned, benchmark-optimized systems is performance that works for you. This talk discusses techniques used by Cloud Dataflow to dynamically readjust pipeline behavior for specific executions to achieve "no-knobs" performance optimizations.
Supply Chain is the backbone for everything we enjoy today, be it ordering an item online or buying a bottle of water. In this session we will cover how in-memory technologies combined with data science help bring insights from big data for complex what-if scenario planning in supply chains, how machine learning and AI have led to better demand sensing, and how millions of connected IoT devices are helping to make real-time supply chain decisions (for example, when a truck breaks down). We will also cover how SAP HANA, Leonardo, and Integrated Business Planning are the game changers for this new era of Digital Supply Chain.
More details at the conference
Sponsor this event and showcase your Big Data and Analytics initiatives. Sponsoring grants you benefits that others miss out on. Contact us if interested at firstname.lastname@example.org.
Everything you need to know.
This year's Phoenix Data Conference is at Grand Canyon University, voted as one of the 10 Best College Campuses Across America.
The campus is located near the heart of the city, so it is easily reached by private or public transportation, available 24 hours a day.
There are hotel and rental options available around the event location. Check out Google for nearby places.