This live classroom course is new for 2018! It focuses on the newest technologies of Microsoft Machine Learning Server and SQL Server 2017. This 2-day course introduces the most important concepts and tools, and should be followed by the 3-day course Intermediate Machine Learning in R on SQL Server and Microsoft ML Server. If you have attended a prior course on machine learning, such as Rafal's week-long Practical Data Science course offered in 2015-2017, and you are well versed in model validity, accuracy, and reliability, then you should consider attending the Intermediate course only. Ask yourself these questions: Can I explain the difference between cross-validation and hold-out testing? Do I know which business metrics correspond to precision and which to recall? Is model accuracy more important than reliability? And how does a boosted decision tree work? If in doubt, please attend both the Introduction course (2 days) and the Intermediate course (3 days).
To deliver the best possible training we follow the industry. The agenda and course content are subject to continuous improvement and revision without further notice.
Machine Learning Fundamentals
We begin with a thorough introduction to all of the key concepts, terminology, components, and tools. Topics include:
Machine learning vs. data mining vs. artificial intelligence
Tool landscape: open source R vs. Microsoft R, Python, SQL Server, ML Server, Azure ML
Teamwork
Algorithms
There are hundreds of machine learning algorithms, yet they belong to just a dozen groups, of which 5 are in very common use. We will introduce those algorithm classes and discuss some of the most frequently used examples in each class, explaining which tools (Azure ML, SQL, or R) provide the most convenient implementation. You will also learn how to find more algorithms on the Internet and how to judge whether they are suitable for real use. Topics include:
What do algorithms do?
Algorithm classes in R, Python, ML Server, Azure ML, and SSAS Data Mining
Supervised vs. unsupervised learning
Classifiers
Clustering
Regressions
Similarity Matching
Recommenders
Data
Machine learning requires you to prepare your data in a rather unique, flat, denormalised format. While features (inputs) are always necessary, and you may need to engineer thousands of them, labels (the outputs to be predicted) are not needed in all cases. Topics include:
Cases, observations, signatures
Inputs and outputs, features, labels, regressors, independent and dependent variables, factors
Data formats, discretization/quantizing vs. continuous
Indicator columns
Feature engineering
Azure ML data preparation and manipulation modules
Moving data around and its storage, SQL vs. NoSQL, files, data lakes, BLOBs, and Hadoop
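To make the flat format concrete, here is a minimal sketch in plain Python of two of the preparation steps listed above: discretization and indicator columns. The toy records, the column names, and the helper functions `discretize` and `indicator_columns` are hypothetical and chosen only for illustration; in practice you would use the equivalent facilities of R, SQL Server, or Azure ML.

```python
# Hypothetical toy records; the column names are illustrative only.
rows = [
    {"age": 23, "colour": "red"},
    {"age": 41, "colour": "blue"},
    {"age": 67, "colour": "red"},
]

def discretize(value, edges):
    """Return the index of the bin that value falls into, given sorted upper edges."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)

def indicator_columns(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    out = []
    for r in rows:
        flat = {k: v for k, v in r.items() if k != column}
        for c in categories:
            flat[f"{column}_{c}"] = 1 if r[column] == c else 0
        out.append(flat)
    return out

flat_rows = indicator_columns(rows, "colour")
for r in flat_rows:
    # Three bins: below 30, 30 to 59, 60 and above.
    r["age_bin"] = discretize(r["age"], edges=[30, 60])

for r in flat_rows:
    print(r)
```

Each output row is one case with only numeric columns, which is the shape most algorithms expect.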
Process of Data Science
The process consists of problem formulation, data preparation, modelling, validation, and deployment, carried out iteratively. You will briefly learn about the industry-standard CRISP-DM approach, but the main focus of this module is how to apply the scientific method of reasoning to solve real-world business problems with machine learning and statistics. Notably, you will learn how to start projects by expressing needs as hypotheses, and how to test them. Topics include:
CRISP-DM
Stating business questions in data science terms
Hypothesis testing and experiments
Student's t-test
Pearson chi-squared test
Iterative hypothesis refinement
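As an illustration of the two tests named above, the following sketch computes the test statistics using only the Python standard library. It assumes a two-sample Student's t-test with equal variances and a simple contingency table for Pearson's chi-squared test. The sample values are made up, and the function names `student_t` and `pearson_chi2` are hypothetical. The sketch stops at the statistic and degrees of freedom; you would still compare them against a critical value or compute a p-value with a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2  # statistic and degrees of freedom

def pearson_chi2(table):
    """Pearson's chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

# Made-up measurements for two groups (for example, basket size with and without an offer).
group_a = [12, 14, 15, 13, 16]
group_b = [10, 11, 12, 9, 13]
t_stat, t_dof = student_t(group_a, group_b)

# Made-up 2x2 counts: rows are customer segments, columns are bought / did not buy.
counts = [[30, 10], [20, 40]]
chi2_stat, chi2_dof = pearson_chi2(counts)

print(f"t = {t_stat:.3f} with {t_dof} degrees of freedom")
print(f"chi-squared = {chi2_stat:.3f} with {chi2_dof} degree(s) of freedom")
```

The null hypothesis in the first case is that both groups share the same mean, and in the second that the row and column categories are independent.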
Introduction to Model Building
At the heart of every project we build machine learning models! The process is simple and it follows a well-trodden path. In this module you will build your first decision tree and get it ready for validation in the next module. Topics include:
Connecting to data
Splitting data to create a holdout
Training a decision tree
Scoring the holdout
Plotting accuracy
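The steps above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the course's own material. The data is synthetic, and the "tree" is a one-split decision stump on a single numeric feature rather than a full decision tree. The function names `split_holdout`, `train_stump`, and `accuracy` are hypothetical, and accuracy is printed rather than plotted.

```python
import random

def split_holdout(rows, holdout_fraction=0.3, seed=42):
    """Shuffle a copy of rows and return (train, holdout)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    cut = len(shuffled) - n_holdout
    return shuffled[:cut], shuffled[cut:]

def train_stump(train):
    """Fit a one-split decision tree (a stump) on (x, label) pairs.

    Tries every observed x as a threshold and keeps the one with the fewest
    training errors for the rule: predict 1 when x > threshold.
    """
    best_threshold, best_errors = None, len(train) + 1
    for threshold, _ in train:
        errors = sum((x > threshold) != bool(y) for x, y in train)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def accuracy(threshold, rows):
    """Share of rows where the stump's prediction matches the label."""
    correct = sum((x > threshold) == bool(y) for x, y in rows)
    return correct / len(rows)

# Synthetic data (an assumption for illustration): the label is 1 when x > 50.
data = [(x, int(x > 50)) for x in range(100)]

train, holdout = split_holdout(data)
threshold = train_stump(train)
train_accuracy = accuracy(threshold, train)
holdout_accuracy = accuracy(threshold, holdout)

print(f"learned rule: predict 1 when x > {threshold}")
print(f"training accuracy: {train_accuracy:.2f}")
print(f"holdout accuracy:  {holdout_accuracy:.2f}")
```

Scoring on the holdout matters because the model never saw those rows during training, so the holdout accuracy is a fairer estimate than the training accuracy.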
Introduction to Model Validation
The most important aspect of any data science, artificial intelligence, and machine learning project is the iterative validation and improvement of the models. Without validation, your models cannot be reliably used. There are several tests of model validity, most importantly those that check accuracy and reliability. Topics include:
Testing accuracy
False positives vs. false negatives
Classification (confusion) matrix
Precision and recall
Balancing precision with recall vs. business goals and constraints
Introduction to lift charts and ROC curves
Testing reliability
Testing usefulness
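To show how the classification (confusion) matrix relates to precision and recall, here is a minimal sketch in plain Python. The labels and predictions are made up, and the helper `confusion_counts` is a hypothetical name. It assumes a binary problem where 1 marks the positive class.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # correctly flagged
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # missed positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # correctly ignored
    return tp, fp, fn, tn

# Made-up holdout labels and model predictions.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
precision = tp / (tp + fp)          # of the cases we flagged, how many were right
recall = tp / (tp + fn)             # of the true positives, how many we found
accuracy = (tp + tn) / len(actual)  # overall share of correct predictions

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

Here the model finds 3 of the 4 true positives (recall 0.75) but 2 of its 5 positive predictions are false alarms (precision 0.60). Which of the two matters more depends on the business cost of a false positive compared with a false negative.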
Format
The course format is 50% lectures, 30% demos, and 20% tutorials. You are encouraged to follow the demos on your own machine, and you will be challenged to find answers to 3 larger problems during the tutorials. The tutorials are the hands-on part of the course, but if you prefer not to practise, you are welcome to use that time for additional Q&A or to analyse your own data. We will provide you with all the necessary data sets, and we will explain which free or evaluation-edition software you need to install to follow the course on your own laptop. In some training centres we are able to provide pre-built machines which you can use instead of your own; please enquire. You will need an Azure account (a free one is sufficient) during the course. You can copy course experiments and data into your workspace for learning and for future reference after the course.