Please find the October 27th schedule below. Also feel free to download the PDF version here
Registration / Breakfast / Swag Bags
Get your name tag and swag bags at the registration desk.
Autonomous Security: How to Use Big Data, Machine Learning, and AI to Build the Security of the Future
The evolution of the general technology landscape continues to shape the security and risk landscape across enterprises. IoT, cloud, edge computing, containerization, automation, big data, and AI each introduce a different breadth of security challenges and opportunities for organizations. Just as security teams must address this emerging technology and risk stack, the same technologies should drive the security tooling landscape, shaping the intelligence, velocity, and scale at which security decisions can be reached in a more accurate, faster, and more efficient manner. This talk will give a sneak peek at the technologies that will shape the security capabilities of the future, including a first introduction to the concept of Autonomous Security and why it is the future of security.
Building Data Lake and Deploying Data Products on Google Cloud Platform
We will talk about how Infusionsoft builds a data lake on Google Cloud Platform: the challenges we faced and how we solved them using tools like Airflow and Spark alongside GCP offerings such as Google Cloud Storage and BigQuery. We will also talk about how we deploy data products, from problem formulation to production, in a very cost-effective way using BigQuery, Dataflow, App Engine, and GCS.
Commit to Conform for Next Generation Data Lakes
As the demand for large-scale Data Lakes grows exponentially, the need for agility is driving the modernization of end-to-end Data Pipelines. Easily sourcing data from heterogeneous sources in real time (Commit data) and automation of the processes required to land, merge, and conform the data within the Big Data platforms becomes a necessity to drive adoption within the modern enterprise. This session will review market demands and Attunity’s approach for automating the integration and conforming of data in a Big Data environment.
ML/AI Influence on Insurance Underwriting
Underwriting is one of the key functions for generating and growing revenue at an insurance company. Machine learning and artificial intelligence will play a major role in changing how this function approaches risk assessment, coverage recommendations, and more. In this session, we will take a deep dive into the use cases and models that could greatly influence this business function.
Scott Sumner, Unravel Data
Data Application Performance Management
Enterprises are running more and more big data applications in production (ETL, analytics, machine learning, etc.). These applications have to meet performance-related business needs such as deadlines (e.g., an ETL job finishing by market close), fast response time (e.g., Hive queries), and reliability (e.g., a fraud app). The data platform team (architects, operations, and developers) needs to ensure application reliability, optimize storage and compute while minimizing infrastructure costs, and optimize DevOps productivity. Unravel provides a single pane to optimize, troubleshoot, and analyze the performance of big data applications. Unravel correlates all performance metrics across every layer of the big data stack, from the infrastructure to the services and applications, as well as within each layer, to give a true full-stack view of application performance. Unravel then applies artificial intelligence, machine learning, and predictive analytics to empower operations to optimize, troubleshoot, and analyze from a single tool.
Bots, outliers and outages… Do you know what's lurking in your data?
With the massive amounts of data being ingested daily, it is nearly impossible to understand by traditional means what is hidden in your data. How do you separate the ordinary from the extraordinary in a timely fashion? Unsupervised machine learning on time-series data enables real-time discovery of interesting, and possibly costly, anomalies. Matteo will describe, build, and run several types of machine learning jobs in Elasticsearch that can detect and alert on these anomalies and outliers in real time.
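As an illustrative sketch of the underlying idea (not Elasticsearch's actual ML models, which are more sophisticated), a rolling z-score can flag points that deviate sharply from their recent history:

```python
# Flag anomalies in a time series using a trailing window's mean and
# standard deviation (z-score). Purely illustrative of the concept.
from statistics import mean, stdev

def rolling_zscore_anomalies(series, window=10, threshold=3.0):
    """Return indices whose value deviates strongly from the trailing window."""
    anomalies = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# A steady signal with one spike (e.g. a bot-driven traffic burst).
signal = [100, 102, 99, 101, 100, 98, 103, 101, 99, 100, 100, 500, 101]
print(rolling_zscore_anomalies(signal))  # [11] -- the spike
```

Real anomaly-detection jobs model seasonality and trend as well; the value of running them inside the data store is that detection happens as data arrives, not after the fact.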
Streaming ETL with Apache Kafka and KSQL
Companies new and old are recognizing the importance of a low-latency, scalable, fault-tolerant data backbone, in the form of the Apache Kafka streaming platform. With Kafka, developers can integrate multiple systems and data sources, enabling low-latency analytics, event-driven architectures, and the population of multiple downstream systems. What's more, these data pipelines can be built using configuration alone.
In this talk, we’ll see how easy it is to capture a stream of data changes in real time from a database such as MySQL into Kafka using the Kafka Connect framework, then use KSQL to filter, aggregate, and join it with other data, and finally stream the results from Kafka out into multiple targets such as Elasticsearch and MySQL. All of this can be accomplished without writing a single line of Java code!
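As a rough sketch of what the filter-and-aggregate step does (KSQL expresses this declaratively in SQL over Kafka topics; the field names below are hypothetical), the same logic looks like this in plain Python:

```python
# Mimic a KSQL-style pipeline step over a stream of change events:
# filter rows (WHERE status = 'COMPLETE'), then aggregate per key
# (GROUP BY customer, COUNT(*)). Illustrative only, not KSQL itself.
from collections import defaultdict

def filter_and_aggregate(change_events):
    """Keep only completed orders and count them per customer."""
    counts = defaultdict(int)
    for event in change_events:
        if event["status"] == "COMPLETE":
            counts[event["customer"]] += 1
    return dict(counts)

stream = [
    {"customer": "alice", "status": "COMPLETE"},
    {"customer": "bob",   "status": "PENDING"},
    {"customer": "alice", "status": "COMPLETE"},
]
print(filter_and_aggregate(stream))  # {'alice': 2}
```

In the real pipeline this runs continuously over the Kafka topic, and the aggregated results land in a new topic that connectors then push to targets like Elasticsearch.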
The Intelligent Enterprise - How SAP Is Driving the Next Frontier of Business Transformation
In this session, we talk about the transformation of business enterprises from industrial automation to business process automation to digital transformation to the Intelligent Enterprise. We take a closer look at machine learning use cases across industries, how customers are adopting them, and how SAP is playing a leading role in the digital transformation journey. We will further cover SAP Intelligent Machine Learning Foundation with SAP Leonardo and SAP Enterprise to start your Intelligent Enterprise transformation journey.
Deploying Agile Data Governance Labs in 100 Days: From Strategy to Execution on a Big Data Lake
Developing insights from data assets that are growing in size, scale, variety, and velocity now requires a re-tooling and rethinking of how information and data are managed in this new world. Hence the need for an agile governance model that is Fit for purpose, Accurate, Complete, Timely, and Secure (FACTS): one that complies with the ever-growing privacy, security, and regulatory requirements of managing customer data while allowing you to respond and react at the speed and scale of the changing needs of the market and the competition.
Learn how to:
- Kick-start your agile data governance initiatives and drive value realization within 100 days of deployment
- Apply a consistent governance strategy, operating model, and approach to manage the changing demands of the business
- Develop a data trust index and funding model to scale governance initiatives across the organization
- Manage the complexities of data governance issues such as data quality, lineage, metadata, and master data in a way that aligns with business value
Operationalizing AI for Scale and Velocity
While much of the focus of AI and machine learning solutions over the last couple of years has been on experiment design, algorithms, techniques, and frameworks, not enough attention has been paid to how to operationalize AI in the enterprise. How do you set up pipelines and systems that support repeatability, visibility and governance, quality control, deployment, and tracking? In this talk we will discuss building machine learning platforms that support repeatability, automating experimentation, and best practices for setting up AI as an enterprise capability in production, as opposed to a one-off implementation.
We will explore best practices associated with building machine learning platforms that move AI from the experimentation stage to robust and production-ready systems.
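As a minimal sketch of the repeatability theme (the names are hypothetical, not any specific platform's API): pin the random seed and derive a run identifier from a hash of the full configuration, so every result can be traced back to the exact settings that produced it.

```python
# Repeatability sketch: a seeded, config-hashed "experiment run".
# Same config in -> identical, auditable result out.
import hashlib
import json
import random

def run_experiment(config):
    random.seed(config["seed"])                    # repeatable randomness
    run_id = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()[:12]                             # traceable to exact config
    score = random.random()                        # stand-in for a training run
    return {"run_id": run_id, "score": score}

cfg = {"seed": 42, "model": "baseline", "lr": 0.01}
first, second = run_experiment(cfg), run_experiment(cfg)
assert first == second  # re-running the same config reproduces the result
```

Production platforms add versioned data snapshots, environment pinning, and lineage tracking on top of this basic idea.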
Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service
The overall evolution towards microservices has caused a lot of IT leaders to radically rethink architectures and platforms. One can hardly keep up with the rapid onslaught of new distributed technologies. The same people who just yesterday asked "how can we deploy Docker containers?" are now asking "how can we operate Kubernetes-as-a-Service on-premise?", and are about to start asking "how can we operate the open source frameworks of our choice, such as Spark, TensorFlow, HDFS, and more, as a service across hybrid clouds?"
This session will discuss:
- Challenges of orchestrating and operating distributed, stateful services such as Kubernetes, Spark, Kafka, HDFS, and TensorFlow
- Challenges of operating these technologies across hybrid clouds
For example, using Kubernetes for container lifecycle management is a great step to take, but how can we manage the entire lifecycle (deployment, failure, resizing, upgrades, and more) for multiple Kubernetes clusters?
Apache Hive 3 - A new horizon
7,000 analysts, 80 ms average latency, 1 PB of data, 250k BI queries per hour, on-demand deep reporting in the cloud over 100 TB in minutes, cloud and on-prem, fast, scalable, secure, open source. These are the kinds of things Hive 3 is delivering for businesses today. If you want to find out how, come join us for this talk.
We will present and demo the new Kafka and Spark integrations, query federation to Druid and RDBMSs, the new BI cache, advances in the transactional system, materialized views, workload management and guardrails, and SQL enhancements such as constraints and default clauses. We will also take a look at Data Analytics Studio, a tightly integrated visual tool to optimize and enhance your data warehouse.
Apache Hive 3 is a tremendous step forward, come find out how it can change analytics for you!
Tuning the Beloved DB-Engines
Impala leads the traditional analytic databases in performance for high-concurrency workloads. HBase provides real-time read/write access to your wide-column store data. These services are at the core of the majority of Insight's client projects. In this talk, Nithya and Michael will present case studies from two different client environments and show how we tuned the underlying database infrastructure to scale for the needs of each. We will cover what to do to make Impala scale to meet high customer demand, then present tuning steps for getting an HBase infrastructure to utilize large amounts of available RAM. A raffle will be held at the end of the talk.
Developing MPP solutions in Azure
Massively Parallel Processing (MPP) architectures allow the use of large numbers of processors (or computers) to perform a set of coordinated computations simultaneously. MPP architectures can help with the processing and aggregation of large volumes of data with varying shapes, but they can be complicated and costly to deploy and maintain. Join Microsoft Technical Specialist Josh Sivey as he explores using Microsoft Azure as a platform for quickly creating cost-effective MPP solutions. He will use Microsoft Azure to deploy and process data with a Hadoop-based MPP architecture (using HDInsight) and a SQL-based MPP architecture (using Azure SQL Data Warehouse). The pros and cons of Hadoop-based MPP versus SQL-based MPP will be discussed, as well as when a company might choose a Platform-as-a-Service (PaaS) offering over an Infrastructure-as-a-Service (IaaS) solution. This talk will be packed full of demos and real-world learnings.
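The core scatter-gather idea behind MPP can be sketched in a few lines. In this toy version, threads stand in for what would be separate nodes in a real deployment:

```python
# MPP sketch: partition the data, aggregate each partition independently
# (on separate nodes in a real system), then merge the partial results.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition):
    # Each "node" computes a local aggregate over its partition.
    return sum(partition)

def mpp_sum(data, partitions=4):
    # Stride-partition the data so every element lands in exactly one chunk.
    chunks = [data[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)  # final merge step

print(mpp_sum(list(range(1, 101))))  # 5050
```

Real MPP engines apply the same pattern to joins and group-bys, with the added complexity of shuffling rows between nodes so that related keys end up on the same partition.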
Big Data to Big Profits
Learn more about the next generation of big data use cases that are driving F30 companies to find net-new revenue-generating businesses at 1/20th the cost. Discover new ways of applying AI to digest non-traditional data sources in big data within non-traditional industries like life and annuity insurance, investment management, retail banking, telecom, and investment banking.
Sponsored by Intel®: How the Blurring of Memory and Storage Is Revolutionizing the Data Era
Revolutionizing the Memory-Storage Hierarchy with Intel® Optane™ DC Persistent Memory
What could you do with affordable, large capacity memory that is also persistent? The possibilities are endless, especially in the field of data science. Persistent memory accelerates analytics, big data, and storage workloads across a variety of use cases, bringing new levels of speed and efficiency to the data center and to in-memory computing. Ginger Gilsdorf offers an overview of the recently announced Intel® Optane™ DC Persistent Memory, and shares the exciting potential of this technology in analytics solutions.
A Platform for Developing and Managing Data Applications on Hadoop
Lucy-Data Application Platform is a platform for developing and managing data applications on Hadoop (Cornerstone), Netezza, Postgres, Teradata, and real-time or stream processing engines. Lucy provides integrated capabilities and life-cycle support for data applications. Out of the box, Lucy provides a distributed rules engine and capabilities to groom data sets, supporting multiple execution environments.
Building a Fault Tolerant Distributed Architecture
In this talk I will present how we re-architected our distributed system management in MemSQL. Using local control loops in independent nodes, driven by replicated metadata, and feedback systems to detect failures, we improved how MemSQL detects and handles distributed failures.
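A hedged sketch of the local control-loop idea (greatly simplified relative to MemSQL's actual mechanism; the state names are hypothetical): each node compares the replicated desired-state metadata against what it observes locally and derives the actions needed to converge.

```python
# Local control loop sketch: diff desired state (from replicated
# metadata) against observed local state to decide corrective actions.
def reconcile(desired, actual):
    """Return the (replica, target_state) actions a node must take."""
    actions = []
    for replica, state in desired.items():
        if actual.get(replica) != state:
            actions.append((replica, state))   # e.g. promote or recreate
    return actions

desired = {"partition-0": "master", "partition-1": "replica"}
actual  = {"partition-0": "master", "partition-1": "failed"}
print(reconcile(desired, actual))  # [('partition-1', 'replica')]
```

Running such a loop independently on every node, fed by replicated metadata and failure-detection feedback, is what lets the cluster heal itself without a central coordinator micromanaging each recovery.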
Better Business Starts With A Data-First Mentality!
Everyone wants analytics, data science, and deeper business insights. However, there is a fundamental flaw with the analytics-first business mentality: poor-quality data can plague even the smartest data scientists and most robust analysis tools. What if the business focused first on data: its quality, accessibility, and connectivity?
Let’s pivot to a data-first mentality — the concept that we start with high-quality, highly accessible data sources and then we connect to analytics, BI, data warehouses, and other consumers. This way, we focus on the fuel that runs the business to generate a better output from analysis and insights. Join us to learn how you can embrace the data-first mentality in your business!
GPU-accelerated machine learning analyses reveal new fusion proteins important in cancer progression
For complex human diseases, identifying the genes harboring susceptibility variants has taken on medical importance. Disease-associated genes provide clues for elucidating disease etiology, predicting disease risk, and highlighting therapeutic targets. Here, we develop a method to predict whether a given gene and disease are associated. To capture the multitude of biological entities underlying pathogenesis, we constructed a heterogeneous network containing multiple node and edge types. We built on a technique developed for social network analysis, which embraces disparate sources of data to make predictions from heterogeneous networks. Using the compendium of associations from genome-wide studies, we learned the influential mechanisms underlying pathogenesis. Our findings provide a novel perspective on the existence of pervasive pleiotropy across complex diseases. Furthermore, we suggest transcriptional signatures of perturbations are an underutilized resource among prioritization approaches. For multiple sclerosis, we demonstrated our ability to prioritize future studies and discover novel susceptibility genes. Researchers can use these predictions to increase the statistical power of their studies, to suggest causal genes from a set of candidates, or to generate evidence-based experimental hypotheses.
Big Data Journey to the Cloud
The cloud is fast becoming a preferred destination for enterprise big data workloads. Today, organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern IT architecture powered by Cloudera Enterprise. Cloudera makes Hadoop enterprise-ready with a platform that’s fast for business, easy to manage, and secure without compromise. So, whatever the environment, you can deploy with confidence and start recognizing value from all of your data.
Practical Big Data, AI/ML 2018 - trends, technologies and use cases
Data technologies (big data, AI, machine learning) continue to evolve, becoming ever more efficient and more widely adopted around the world. This talk covers the latest developments (since last year) in the big data ecosystem, including AI and machine learning, as well as the past, present, and future of big data tools, patterns, and use cases.
Madhu Sudhan Reddy Gudur
Bizzi – A Subject Matter Expert
Bizzi (Business Intel) plays the role of a subject-matter expert: it learns from content related to a subject or topic of interest and then predicts whether new documents are relevant to the topics and entities of interest. Bizzi includes two main parts: (1) curating the training data by identifying stripes of keywords and training a model that reads a document and classifies it based on the vocabulary learned from the training dataset (also referred to as the predictive step); and (2) evaluating the document through a series of models, namely Sentiment Analysis, Semantic Role Labeling (SRL), and Named Entity Recognition (NER), along with entity-frequency analysis, to make a final prediction on whether a document discusses the topic of interest for the entity of interest.
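A toy sketch of the predictive step, reduced to a bare keyword-vocabulary model (the real system trains a classifier and layers sentiment, SRL, and NER models on top; the function names, example texts, and threshold here are all hypothetical):

```python
# Learn a vocabulary from training documents on a topic, then score a
# new document's relevance by its overlap with that vocabulary.
def learn_vocabulary(training_docs):
    vocab = set()
    for doc in training_docs:
        vocab.update(doc.lower().split())
    return vocab

def is_relevant(document, vocab, threshold=0.3):
    # Fraction of the document's words that appear in the learned vocabulary.
    words = document.lower().split()
    hits = sum(1 for w in words if w in vocab)
    return hits / len(words) >= threshold

vocab = learn_vocabulary(["quarterly earnings beat forecast",
                          "merger raises earnings outlook"])
print(is_relevant("earnings forecast beat expectations", vocab))  # True
print(is_relevant("local weather remains sunny today", vocab))    # False
```

The classifier in the actual system generalizes beyond exact keyword matches; this only conveys the shape of the train-then-predict workflow.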
Productionalizing Spark Streaming Applications
Spark Streaming has quickly established itself as one of the more popular streaming engines running on the Hadoop ecosystem. Not only does it provide integration with many types of message brokers and stream sources, it also allows other major Spark modules, like Spark SQL and MLlib, to be leveraged in conjunction. This lets businesses and developers make use of data in ways they couldn't hope to in the past. However, while building a Spark Streaming pipeline, it's not sufficient to only know how to express your business logic. Operationalizing these pipelines and running the application with high uptime and continuous monitoring involves many operational challenges. Fortunately, Spark Streaming makes all that easy as well. In this talk, I'll go over the main steps you'll need to take to get your Spark Streaming application ready for production, specifically in conjunction with Kafka. This includes steps to gracefully shut down your application, perform upgrades, and set up monitoring, plus various useful Spark configurations and more.
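One of those steps, graceful shutdown, is commonly implemented with a marker-file pattern: the driver checks for a shutdown flag between batches and stops only after in-flight work completes. A pure-Python stand-in of the pattern (in an actual Spark job, the check would precede a graceful call to the streaming context's stop method rather than a plain `break`):

```python
# Marker-file graceful-shutdown sketch: finish the current batch, never
# kill work mid-flight; an operator requests shutdown by creating a file.
import os
import tempfile

def run_batches(batches, marker_path):
    processed = []
    for batch in batches:
        if os.path.exists(marker_path):   # operator requested shutdown
            break                         # exit cleanly between batches
        processed.append(batch)
        if batch == "b2":                 # simulate the operator's action
            open(marker_path, "w").close()
    return processed

marker = os.path.join(tempfile.gettempdir(), "stop_streaming_demo")
if os.path.exists(marker):
    os.remove(marker)
result = run_batches(["b1", "b2", "b3"], marker)
os.remove(marker)
print(result)  # ['b1', 'b2'] -- b3 is never started
```

The appeal of the pattern is that the shutdown signal is external to the application, so operations can stop a long-running job without deploying code or sending it a hard kill.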
Building a Learner Analytics Data Hub in AWS
The higher education space depends on multiple sources of federated data. Student applications, student information, course schedules, source activity, financial aid -- most of these data live in different source systems. These data can be used to help promote student success, but the sources must be pulled together in order to uncover valuable insights. This session will describe how a few universities are using an AWS/Redshift data pipeline to aggregate data from multiple sources. In addition to aggregation, the schools utilize additional AWS toolsets for machine learning insights, data health tracking, and trigger event routing.
0 - 60 with Transfer (Machine) Learning
An important element drove the recent advances in AI and deep learning: model building and time to market have been reduced greatly by transferring a previous model's knowledge to a new model. With transfer learning, you can get the best of both worlds: a state-of-the-art model to start with, and the freedom to focus on customizing that model for your data. I will explain the basics of this concept with two applications (image classification and text topic modeling).
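A conceptual sketch of the idea, shrunk to toy scale (real transfer learning reuses the layers of a large pretrained network, e.g. an ImageNet backbone; everything below is an illustrative stand-in):

```python
# Transfer-learning sketch: keep a "pretrained" feature extractor frozen
# and train only a small new head on your own data.
def pretrained_features(x):
    # Frozen feature extractor standing in for a pretrained network:
    # its behavior is never updated during training.
    return [x, x * x]

def train_head(samples, labels, lr=0.05, epochs=1000):
    # Only these head weights are learned on the new task.
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = pretrained_features(x)
            err = w[0] * f[0] + w[1] * f[1] - y
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
    return w

w = train_head([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
pred = sum(wi * fi for wi, fi in zip(w, pretrained_features(1.5)))
print(round(pred, 2))  # close to 1.5: the head fit the new task on frozen features
```

Because only the small head is trained, the new task needs far less data and compute than training the full model from scratch, which is exactly the "0 - 60" speedup the title refers to.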
Submit your tickets to win great prizes
More details at the conference
We invited digital experts from around the world