Los Angeles 12/17
Ambrish is Director of Product for the Data Platform and Personal Loans business unit at Credit Karma, where he has been leading efforts to transform financial underwriting and architecting a recommendation system that helps over 80M Americans make the right financial decisions at every step. Prior to Credit Karma, Ambrish co-founded a digital health startup that provided a smart digital health assistant for patients with pre-diabetes and obesity-related conditions. He also held product and engineering leadership roles, building a digital acquisition platform at Electronic Arts and one of the largest search & advertising platforms, Bing.com, at Microsoft. Ambrish holds a Master's in Computer Science from the University of Southern California and an MBA from the Haas School of Business at UC Berkeley.
People spend more time shopping for hotels where they stay for 2 nights than shopping for a loan that they carry for 2 years. By applying machine learning models to terabytes of financial data for over 80M Americans, Credit Karma helps people make better financial decisions at every step in their lives. The more specific you can get in your recommendations for users, the more trust and, ultimately, loyalty you're able to maintain. Credit Karma only surfaces the most appropriate products for each member to ensure they are matched with the product that makes the most financial sense for their profile. On a macro level, Credit Karma is using this data to address the mispriced financial products Americans live with, often helping them find alternatives that make financial progress possible for everyone. For example, with over $1 trillion in credit card debt, many Americans are paying more than 3x the interest they could get with a personal loan. If Americans refinanced their auto loans, they could save over $30 billion. Examples like these show how Credit Karma is changing the way consumers make their financial decisions.
Bio coming soon.
Abstract coming soon.
Dr. Arun Verma joined the Bloomberg Quantitative Research group in 2003. Prior to that, he earned his Ph.D. from Cornell University in the areas of computer science and applied mathematics. At Bloomberg, Arun's work initially focused on stochastic volatility models for pricing and hedging derivatives and exotic financial instruments. More recently, he has enjoyed working at the intersection of diverse areas such as data science, cross-asset quantitative finance models, and machine learning & AI methods to help reveal embedded signals in traditional & alternative data.
The high volume and time sensitivity of news and social media stories require automated processing to quickly extract actionable information. However, the unstructured nature of textual information presents challenges that are well addressed by machine learning techniques. This talk will cover the following topics:
• The application of machine learning in finance
• Extracting sentiment from news stories and social media content using machine learning algorithms
• Quantitative techniques for constructing aggregated sentiment scores and other derived metrics (e.g., sentiment dispersion)
• Sentiment-signal-based trading strategies that achieve high risk-adjusted returns
• Variation in the sensitivity of sentiment with respect to industry sector, market cap, trading volume, etc.
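The aggregated sentiment scores and dispersion metrics mentioned above can be sketched in a few lines. This is an illustrative toy, not Bloomberg's actual methodology; the field names and confidence weighting are assumptions.

```python
from statistics import pstdev

def aggregate_sentiment(story_scores):
    """Aggregate story-level sentiment scores (each in [-1, 1]) for one
    company over a time window into a single score plus a dispersion metric.
    Illustrative only: the weighting scheme is an assumption, not
    Bloomberg's actual methodology.
    """
    # Confidence-weighted average sentiment across stories.
    total_conf = sum(s["confidence"] for s in story_scores)
    agg = sum(s["sentiment"] * s["confidence"] for s in story_scores) / total_conf
    # Dispersion: how much the individual stories disagree with each other.
    dispersion = pstdev(s["sentiment"] for s in story_scores)
    return agg, dispersion

stories = [
    {"sentiment": 0.8, "confidence": 0.9},   # strongly positive, high confidence
    {"sentiment": 0.4, "confidence": 0.5},
    {"sentiment": -0.2, "confidence": 0.6},  # one dissenting story
]
score, disp = aggregate_sentiment(stories)
```

A high dispersion value flags disagreement among sources, which a trading strategy might treat differently from a uniformly positive signal.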
Abstract coming soon.
Madhav Khurana has a vast amount of experience in Data Science and has worked in India, the UK, Sweden and Germany. He is a Senior Data Scientist at Careem in Berlin, currently leading efforts to optimize the dispatching of drivers to customers. As a Data Scientist at King, he worked on a variety of projects to increase players' retention in Candy Crush games. He also created a dynamic level-difficulty model to improve the games' engagement. Prior to applying his Data Science expertise and skills in the mobile applications industry, Madhav helped numerous telecom operators in Asia, Africa, North America and Europe with their customer retention strategies. He did so while working as a Data Scientist at IMImobile, predicting customer churn, identifying churn triggers, calculating customer lifetime value and devising plans for effective customer relationship management.
The talk is about using machine learning to optimize the mobile application experience. Work done by the speaker on Candy Crush games and Careem's ride-hailing application is showcased in detail, including how a major failure established the need to predict user behavior. An elaborate use case from Careem is also presented to illustrate best practices in making predictions with mobile app data. A major part of the talk is on feature engineering, arguably the most important aspect of applied machine learning. Various alternative experiment designs are discussed that are used to deal with interference bias when testing new mobile application features. While the discussed cases are about mobile applications, all of these practices can be employed for prediction problems in any other industry.
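One common alternative design for dealing with interference bias, of the kind the talk discusses, is a switchback: whole (region, time-window) blocks are randomized instead of individual users, so within-market interference stays inside one arm. A minimal sketch follows; the region and window names are made up for illustration and this is not a description of Careem's actual setup.

```python
import random

def switchback_assignment(regions, windows, seed=0):
    """Assign whole (region, time-window) blocks to treatment or control.

    A common alternative to per-user randomization when units interfere
    (e.g. dispatching one driver to a rider affects nearby riders).
    Sketch under assumed names, not any specific company's design.
    """
    rng = random.Random(seed)
    assignment = {}
    for region in regions:
        for window in windows:
            # Everyone in this region during this window gets the same arm.
            assignment[(region, window)] = rng.choice(["treatment", "control"])
    return assignment

plan = switchback_assignment(["berlin", "dubai"], ["09:00", "10:00", "11:00"])
```

Analysis then compares metrics between treated and control blocks rather than between individual users.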
Colleen M. Farrelly is a data scientist at Graham Holdings (Kaplan Higher and Professional Education) whose research focuses on applications of topology and differential geometry in machine learning and data science. Her industry experience spans healthcare, genomics, education, and business. Lately, she has enjoyed writing articles for lay audiences, which can be found on KDnuggets and Quora.
Identifying trends within high-dimensional datasets can be difficult, and visualization can guide further exploration of the data or provide an easy-to-use check of analysis results. Manifold learning is a broad class of algorithms that map high-dimensional data to lower-dimensional spaces without making assumptions about linearity like principal component analysis does. R provides many good packages that wrangle high-dimensional data for easy visualization of trends and subgroups. This talk will demonstrate a couple of useful methods on an open-source multivariate time series dataset using a few lines of R code.
Anna Veronika graduated from the Faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University and Yandex School of Data Analysis. She used to work at ABBYY, Microsoft, Bing and Google, and has been working at Yandex since 2015, where she currently holds the position of the head of Machine Learning Systems group.
Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. For a number of years, it has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others.
CatBoost (http://catboost.yandex) is a new open-source gradient boosting library that outperforms existing publicly available implementations of gradient boosting in terms of quality. It has a set of additional advantages.
1. CatBoost is able to incorporate categorical features in your data (like music genre, URL, search query, etc.) into predictive models with no additional preprocessing. For more details on our approach, please refer to our NIPS 2017 ML Systems Workshop paper (http://learningsys.org/nips17/assets/papers/paper_11.pdf).
2. CatBoost inference is 20-60 times faster than in other open-source gradient boosting libraries, which makes it possible to use CatBoost for latency-critical tasks.
3. CatBoost has the fastest GPU and multi-GPU training implementations of all the openly available gradient boosting libraries.
4. CatBoost requires no hyperparameter tuning in order to get a model of good quality.
The talk will give a broad description of gradient boosting and its areas of usage, and cover the differences between CatBoost and other gradient boosting libraries. We will also briefly explain the details of the proprietary algorithm that leads to a boost in quality.
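To ground the description above, here is a minimal from-scratch sketch of gradient boosting with decision stumps under squared loss: each round fits the current residuals, which are the negative gradient of the loss. This is a generic illustration on a toy 1-D dataset, not CatBoost's ordered boosting algorithm; in practice you would use the catboost library itself.

```python
def fit_stump(xs, ys):
    """Find the single-feature threshold split minimizing squared error (1-D)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # degenerate split, skip
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]  # (threshold, left_value, right_value)

def boost(xs, ys, rounds=20, lr=0.3):
    """Gradient boosting with stumps and squared loss: each round fits
    the residuals of the current ensemble, then adds the new stump's
    predictions scaled by the learning rate."""
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        pred = [p + lr * (lv if x <= t else rv) for x, p in zip(xs, pred)]
    return stumps, pred

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 4.0, 4.2, 3.9]
stumps, pred = boost(xs, ys)  # pred converges toward ys round by round
```

Real libraries extend this core loop with regularization, deeper trees, GPU training, and, in CatBoost's case, native categorical-feature handling.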
An in-depth look at what differentiates deep learning from the rest of ML. In this session, we will look at similarities between the human brain and deep learning, including the necessary hardware and learning techniques. Deep learning is associated with various forefront ML initiatives such as autonomous vehicles and intelligent digital assistants. This session includes theory on key topics as well as a demo using Python, TensorFlow and Keras.
Talk abstract coming soon.
Barry Cassidy is a 15-year veteran of the FT, where he has held diverse leadership roles running campaign management and planning teams as well as data infrastructure projects. His role as Head of Campaign Planning from 2014 led him to confront the impact of legacy systems and outdated practices on his teams' ability to deliver timely reporting and impactful insight to customers, an effort that led to his current position as Head of Advertising Data Operations. Prior to the FT, Barry worked for Express Newspapers and News International (now News UK) as well as several creative and design agency startups. He is a graduate of the University of Sheffield.
“As the premier global financial media brand, the FT’s Advertising Operations team has a history of award-winning innovation. For years, managing the FT’s growing volume of multi-channel data has been a top priority requiring new tools and techniques. In 2016, FT advertising launched its most ambitious data initiative to date: deploying Data Operations to automatically normalize and unify high-value data across the FT portfolio. The FT’s goal with this project was to improve operational excellence in data management and free AdOps resources from manual data tasks so they could build more innovative ad products faster.”
Robert Parviainen is leading the data science team at Seriously Digital Entertainment, a gaming and entertainment company with a mission of marrying the creative with the data. Before his career in gaming, Robert held research positions at the University of Melbourne and Reykjavik University, and received a Ph.D. in Mathematical Statistics from Uppsala University in Sweden.
Clara Shin is a Business Insight Analyst on the Data Science team at Disney. Her work mainly focuses on building statistical and machine learning models for audience segmentation. She is also experienced in game analytics, media forecasting and data engineering. Clara earned a Master's degree in Statistics from the University of Minnesota. Outside of work, she enjoys hiking, biking and playing video games.
She is also the organizer of the Big Data LA Meetup.
Will is a Senior Data Scientist at Netflix in Los Angeles, where he builds machine learning models to predict demand for the movies and TV shows Netflix might want to stream to its subscribers globally. He also occasionally serves as a Data Ambassador for DataKind, bringing state-of-the-art data science practices to bear on problems facing non-profits in the health, education and water sectors since 2013. Previously he worked in the online digital advertising domain in New York. Will received a PhD from Harvard and a BA from Berkeley and has conducted research at Caltech and the University of Chicago, where he specialized in gravitational lensing studies of dark matter and dark energy in the fields of astrophysics and cosmology.
“As streaming products proliferate across the entertainment space, moviegoing has been in a constant state of flux. Moviegoers have evolved into highly aware and particular consumers, and the competition for their entertainment share of wallet has intensified for theatrical films. This talk will highlight how NRG utilizes AI to help shed light on what motivates theatrical attendance, leading to accurate opening box office forecasts as far as one year before a film’s release.”
Vipin is a Principal at Work-Bench, an enterprise-tech-focused VC fund and startup community in New York. Work-Bench helps corporate executives across various industry verticals find the solutions they need for their biggest technology pain points, investing in the startups whose solutions resonate across the board. Vipin covers the firm’s investments in enterprise infrastructure, DevOps and AI. Prior to joining Work-Bench, Vipin worked in the Office of the CIO at Bank of America, where his team vetted hundreds of startups a year and onboarded many as vendors. He has been involved with the majority of Work-Bench’s investments over the last 3 years, sourced investments in CoreOS and Semmle, and currently serves as a Board Observer at Algorithmia, an open marketplace for algorithms. Vipin was named to the 2018 Forbes 30 Under 30 list in the Venture Capital category.
Scott Breitenother is an investor and advisor who specializes in building data-driven organizations. He was employee #16 at direct-to-consumer mattress startup Casper and founded the company’s industry-leading Data & Analytics team. In a former life, Scott was a management consultant at L.E.K. Consulting (which is probably where he developed his love of frameworks and structure). He has a BS in Business Management from Babson College and an MSc in International Management from the London School of Economics. When he’s not blogging about analytics trends at LocallyOptimistic.com, you can find him walking around Brooklyn with his wife and daughter.
“The fastest way to doom an Analytics team (and any hope of building a data-driven organization) is to frequently present unreliable data and analyses. But how do you ensure data quality at scale? Analytics leaders from Casper and Harry’s talk about how to build the Data Quality Flywheel – a scalable approach to data quality that empowers everyone in the organization to promote data quality.”
Michael cut his teeth applying econometric research methods to a variety of fields including environmental economics, child welfare policy, healthcare outcomes, and medical treatment efficacy. Michael was the founding member of the analytics team at new-wave men’s grooming brand Harry’s, where he drove data science, data engineering, and anything else needed to empower the organization to “make better decisions faster.” In his spare time Michael drinks too much coffee, reads, and pets dogs around Bed-Stuy.
Jennifer Shin is the Founder of 8 Path Solutions, a data science, analytics, and technology company based in NYC. She is an experienced data scientist and management consultant who has successfully led complex, large scale, and high profile projects as a Product Director at NBCUniversal, Director of Data Science at Comcast, Senior Principal Data Scientist at The Nielsen Company and Management Consultant at GE Capital, the Carlyle Group, Fortress Investment Group, the City of New York, and Columbia University.
“Similarity measures are the driving force for nearly every machine learning algorithm and AI driven technology. From cleaning customer information, to ranking product recommendations, to identifying audiences, the application of machine learning and AI in media and entertainment are limitless.”
Christopher Whitely is Senior Director of Data Science and Research at Comcast / Freewheel, working on deep-dive viewership modeling efforts and data-driven advertising campaigns. Previously, as part of Comcast EBI, he led applied analysis projects and developed business intelligence tools to identify drivers of content performance and enhance networks’ programming strategies. Chris also led cross-platform research efforts at the Weather Channel and ad sales research at the national cable networks of NBCU. He holds an MBA from the University of Michigan Ross School of Business and a BA from Middlebury College.
“Marketers are increasingly using data to make more effective media buying decisions and assess the impact of their campaigns. While this is already producing significant benefits, data sets are often limited and have their own biases, making it hard to assess the true impact without a deep understanding of the data. How can data scientists follow a rigorous process for this, and ultimately deliver models and insights to their clients that drive the business? What additional ways can data science and machine learning be applied to the media/entertainment space in the future?”
Igor Uzilevskiy is a Sr. Data Scientist on the Data Integration team at Nielsen, where he works on combining Nielsen data from different sources using statistical methodologies. Igor holds an M.S. in Analytics from Northwestern University.
“Data Integration at Nielsen is used to combine data from different sources, such as various Nielsen panels and surveys, to create modeled single source data. The approach most widely used at Nielsen is referred to as “data fusion” and relies on nearest neighbor matching. For this project we wished to integrate the Nielsen NScore Survey of Consumer Attitudes to Celebrities with three other Nielsen data sources, measuring respectively Consumer Lifestyle Behavior, TV Viewing, and CPG Purchasing. We compared data fusion to logistic regression, and found that logistic regression performed better for this use case according to commonly used metrics.”
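The nearest-neighbor matching behind data fusion can be sketched simply: each recipient record is matched to the donor record that is closest on the shared linking variables, and the donated attribute is copied over. The variables below are illustrative stand-ins, not Nielsen's actual linking variables.

```python
def fuse_nearest_neighbor(recipients, donors, link_vars, donate_var):
    """Statistical matching ("data fusion") sketch: for each recipient,
    find the closest donor on the shared linking variables and copy over
    the donated attribute. Variable names are illustrative only."""
    def distance(a, b):
        # Squared Euclidean distance over the linking variables.
        return sum((a[v] - b[v]) ** 2 for v in link_vars)

    fused = []
    for rec in recipients:
        donor = min(donors, key=lambda d: distance(rec, d))
        merged = dict(rec)
        merged[donate_var] = donor[donate_var]
        fused.append(merged)
    return fused

donors = [
    {"age": 25, "income": 40, "watches_sports": 1},
    {"age": 60, "income": 80, "watches_sports": 0},
]
recipients = [{"age": 27, "income": 42}, {"age": 58, "income": 75}]
fused = fuse_nearest_neighbor(recipients, donors, ["age", "income"], "watches_sports")
```

A model-based alternative like the logistic regression mentioned in the abstract instead fits the donated attribute as a function of the linking variables and predicts it for each recipient.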
Eyal Pfeifel is the CTO and Co-Founder of imperson, a Disney Accelerator alum and developer of conversational AI technology that powers premium conversational bots via text, voice, and video.
Olivia manages data products within the Data Strategy team (data engineering, data science, BI, audience activation, marketing sciences). Her primary focus is building in-house tools for analytics, predictive modeling + enhanced audience segmentation to create unique, personalized experiences on-site across all CN brands. A subset of this focus is an offering called Spire, a highly valued advertising product which enables augmented insights into O+O audiences (marrying purchase data with on-site behavior) + campaign opportunities, as well as partners like Vox, NBCU, Advanced Digital, etc.
Ling is a Senior Research Scientist at Tumblr on the Data Science and Analytics team, focused on user interest and user behavior analytics with big data. Ling holds a Ph.D. in Statistics from Iowa State University. At Tumblr, she works on many R&D initiatives, connecting millions of diverse social media data points by leveraging the latest business trends and data science techniques, including machine learning, data mining and predictive models, to create a holistic user experience.
“Tumblr is one of the most popular and vibrant social networks, with over 400 million blogs and 160 billion posts, where your interests connect you with your people. In this talk Ling will provide a brief overview of how Tumblr’s data science and analytics team exploits data science techniques to drive product development and operations. Next, she will discuss recent projects including real-time anomaly detection, user retention and segmentation, and funnel analysis.”
Luis Capelo leads Forbes Media’s Data Products team, which is responsible for investigating how Forbes articles are distributed and read, identifying patterns that are then used to improve business metrics via new models and algorithms. The team’s solutions include an AI agent that collaborates with writers, a swarm of bots that help editors distribute content, and a new analysis tool that traces and predicts how content is shared.
“At Forbes, we believe that there is great potential in humans and machines working together. We think that machines ought to enhance human abilities, making human work better. That affects how we write stories. In this talk we will introduce Bertie, our new publishing platform. Bertie is an AI assistant that learns from writers at all times and works with them to suggest improvements to their stories. We will discuss Bertie’s features, architecture, and ultimate goals. We will give special attention to how we implement an ensemble of machine learning models that, together, make up the skill set and personality of an AI assistant.”
Sophia Tee is a Principal Data Scientist at Verizon, where she helps guide supply chain strategy in the Planning Analytics Group. She is a native of the tiny island nation of Singapore and a graduate of Northwestern University. After beginning her career in finance, Sophia obtained a Master’s degree in Statistics at Yale University purely so that she can tell people she “models professionally.”
A discussion of the pros and cons of old, current and new machine learning methodologies used for forecasting.
Friederike Schüür is a research engineer at Cloudera Fast Forward Labs, where she imagines what applied machine learning in industry will look like in two years’ time, a horizon that fosters ambition and yet provides grounding. She dives into new machine learning capabilities and builds fully functioning prototypes that showcase state-of-the-art technology applied to real use cases. She advises clients on how to make use of new machine learning capabilities, from strategy advising to hands-on collaboration with in-house technical teams. She earned a PhD in Cognitive Neuroscience from University College London and is a long-time data science for social good volunteer with DataKind.
“When we humans learn new tasks, we take advantage of knowledge we gained from learning, or having learned, related tasks. Machines tend to struggle to take advantage of such task relationships. Most machine learning algorithms are trained to master one and only one task. Multi-task learning is an approach to problem solving that allows supervised algorithms to master more than one objective (or task). It works by exposing algorithms to not just one but multiple sets of labels, one for each task. Multi-task trained algorithms learn task relationships in an unsupervised fashion for better performance, akin to human learning. Exposure to multiple sets of labels also nudges algorithms to learn more abstract representations of input data that tend to generalize better. In this talk, I introduce multi-task learning. I cover example applications of multi-task learning to image and text data to explain why it has become an exciting approach for practicing data scientists, and I go over the building blocks of a multi-task neural network for text classification (implemented in pytorch) to demonstrate how to build and train multi-task models.”
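The shared-representation idea described in the abstract can be shown in miniature without any framework: one shared linear layer feeds two task heads, and both tasks' squared errors are summed into a single loss. This plain-Python sketch is a stand-in for the pytorch model covered in the talk; the toy data (task 2's target is exactly double task 1's) is invented for illustration.

```python
def train_multitask(samples, lr=0.01, epochs=2000):
    """Tiny multi-task model trained by gradient descent: a shared linear
    feature h = w . x feeds two scalar heads a[0], a[1], and the loss sums
    the squared errors of both tasks, so gradients from both tasks shape
    the shared weights w."""
    w = [0.5, 0.5]          # shared representation weights
    a = [0.5, 0.5]          # one scalar head per task
    losses = []
    for _ in range(epochs):
        gw, ga = [0.0, 0.0], [0.0, 0.0]
        loss = 0.0
        for x, targets in samples:
            h = w[0] * x[0] + w[1] * x[1]              # shared feature
            errs = [a[k] * h - targets[k] for k in range(2)]
            loss += sum(e * e for e in errs)
            for k in range(2):
                ga[k] += 2 * errs[k] * h               # head gradients
            shared = 2 * (errs[0] * a[0] + errs[1] * a[1])
            gw[0] += shared * x[0]                     # both tasks' errors
            gw[1] += shared * x[1]                     # flow into shared w
        for j in range(2):
            w[j] -= lr * gw[j]
            a[j] -= lr * ga[j]
        losses.append(loss)
    return w, a, losses

# Two related tasks: task 2's target is exactly double task 1's.
data = [((1, 0), (1, 2)), ((0, 1), (1, 2)), ((1, 1), (2, 4)), ((2, 1), (3, 6))]
w, a, losses = train_multitask(data)
```

Because both tasks backpropagate into the same `w`, the shared layer is pushed toward a representation useful for both, which is the mechanism behind the improved generalization the abstract describes.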
Amy Yu is Senior Director of Product Strategy & Data Science in the Audience Science team at Viacom, where she leads the design and development of visual data science platforms that enable data-driven decisions across Viacom’s portfolio brands. Amy holds a M.S. in Media Arts and Sciences from MIT Media Lab and graduated from the Jerome Fisher Program in Management and Technology at the University of Pennsylvania.
“In today’s rapidly evolving media landscape, data is emerging as the differentiating factor driving critical innovations and decisions within the content industry. Data science is a critical source of new insights into audience behavior, and is a growing force in shaping how content is created and delivered. This talk will review the key challenges of data science at scale in media, and discuss how Viacom is innovating in the space of big data and advanced audience analytics.”
Bio coming soon.
Data Scientist and Mentor, Data Science for Social Good Fellowship at University of Chicago
Mollie wears many technical hats including that of a data scientist, a data visualization engineer, and an instructor of both fields. In her career, Mollie has worked on projects involving a wide variety of problems including but not limited to interactive data visualizations, exploratory data analysis, machine learning, corporate data tool creation, course development, instruction, and ideation. Mollie previously worked with Datascope Analytics as a data scientist / consultant and at Metis as a data visualization (D3.js) and data science instructor. In addition to freelance work, she is currently working as a technical mentor with the Data Science for Social Good Fellowship with the University of Chicago. When Mollie is not being a technical nerd, she swing dances as much as possible, listens to educational podcasts, and strives to be all-around fabulous.