# Course Design and Learning Outcomes

Updated May 7, 2021 • Overview of the design philosophy of the course and the Focussed Learning Outcomes

The world of Data Science and Machine Learning is vast and there are too many topics to cover in a single course. So, to help us bring some order to the chaos, this course is organized hierarchically in terms of **Topics**, **Modules** and cross-cutting labels of **Core** and **Non-Core** topics.

## Organization of Topics

Each of the **Topics** in this course falls into one of the following categories:

- **Algorithms** for extracting patterns, estimating values, assigning categories or making decisions based on input datasets
- **Methodologies** for preprocessing, organizing and transforming data in a way that improves our ability to learn useful models and be confident in their correctness
- **Tasks** we want to perform on datasets such as prediction, classification, anomaly detection, interpretation, etc.

## CORE vs. NON-CORE Topics

To further focus our discussion, we have also designed this course so that some of these methodologies, algorithms, and tasks are labelled as **CORE** and others as **NON-CORE**. This doesn’t mean the non-core topics are less important; it may simply mean they are too complex to treat fully in one course where we need to start from the fundamental skills.

Our high-level goals are for you to leave this course with:

- A deep understanding of, and real experience with, the most important foundational methods
- A broad understanding of the landscape, so that you can find the right tool for the right situation. Assessing which methodologies and algorithms to use for which task, given your dataset, will therefore be an important learning outcome.

# Learning Outcomes

The learning outcomes concretely define what you should expect to learn in this course, and they inform how you will be assessed. The outcomes can be understood in four parts, which necessarily interact with each other and relate to the topics above.

## Theory

For *core* topics (methodologies, algorithms, tasks):

- **define** them at a *detailed* level (i.e. could include a mathematical definition)
- **distinguish** them from others when given theoretical cases or concrete examples

For *core* methodologies or algorithms:

- **design** a detailed solution for a given task utilizing the core methodology or algorithm
- **implement** them, in code, on real data to perform a given task

For *non-core* topics:

- **define** them at a *high* level
- **distinguish** them from others when given theoretical cases or concrete examples

## Analysis

Given a new **dataset**:

- Describe the properties of the dataset (size, dimensions, nominal, categorical, continuous, etc.).
- Summarize the data using simple statistical measures.
- Analyse the distribution patterns (e.g. mean, variance, skew, missing data, cross-correlation) of the data.
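
The three steps above can be sketched in a few lines of code. The example below is only an illustration, not the course's prescribed tooling: it uses the Python standard library alone, and the toy dataset, its column names (`height_cm`, `weight_kg`) and the helper functions are all hypothetical. In practice a data-frame library would likely do the same work more conveniently.

```python
import math
from statistics import mean, pvariance

# Hypothetical toy dataset: two continuous columns, None marks a missing value.
rows = [
    {"height_cm": 170.0, "weight_kg": 65.0},
    {"height_cm": 180.0, "weight_kg": 80.0},
    {"height_cm": 160.0, "weight_kg": None},
    {"height_cm": 175.0, "weight_kg": 72.0},
    {"height_cm": 165.0, "weight_kg": 58.0},
]


def column(rows, name):
    """Return the non-missing values of one column."""
    return [r[name] for r in rows if r[name] is not None]


def skewness(values):
    """Population skewness (third standardized moment)."""
    m = mean(values)
    sd = math.sqrt(pvariance(values))
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = math.sqrt(
        sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    )
    return cov / norm


def summarize(rows):
    """Describe dataset size, missing data, and simple per-column statistics."""
    names = list(rows[0].keys())
    summary = {"n_rows": len(rows), "n_columns": len(names), "columns": {}}
    for name in names:
        values = column(rows, name)
        summary["columns"][name] = {
            "missing": len(rows) - len(values),
            "mean": mean(values),
            "variance": pvariance(values),
            "skew": skewness(values),
        }
    return summary


summary = summarize(rows)

# Cross-correlation uses only the rows where both columns are present.
complete = [r for r in rows if None not in r.values()]
corr = pearson(
    [r["height_cm"] for r in complete],
    [r["weight_kg"] for r in complete],
)

print(summary)
print("height/weight correlation:", round(corr, 3))
```

Note that this sketch handles missing values by ignoring them per column; whether to drop, impute, or flag missing data is itself an analysis decision that depends on the dataset.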

## Design

Given a new dataset and a data analysis or machine learning task, be able to do the following:

- Write a concise **design plan** for performing the task, with specific details including:
    - data preparation pipeline
    - data separation, training and validation methodology
    - proposal for a specific algorithm, with sufficient parameter choices, to perform the task

- Justify your design choices in writing, including:
    - discussion of computational performance tradeoffs
    - data requirements of the proposed approach compared to alternatives
    - interpretability vs. accuracy tradeoff
    - comparison to the next best alternative approach that could be followed

## Implementation

Given a dataset and a common data analysis or machine learning task, demonstrate the ability to:

- implement a **full data processing pipeline** to clean, normalize, and otherwise prepare the data
- perform **feature, dimensionality and manifold processing** as needed to obtain a better dataset to perform the task
- concretely **implement in code** a solution for the task using the methods and algorithms from the course
- write a short **descriptive report** with numerical and visual analysis of the performance of your solution and interesting patterns found in the data
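
To make the shape of such a pipeline concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the toy records, the choice to drop rows with missing values, z-score normalization, a hold-out validation split, and a nearest-centroid classifier standing in for whichever algorithm a task actually calls for. It omits feature, dimensionality and manifold processing, and it reports a single number rather than a full descriptive report.

```python
import random
from statistics import mean, pstdev

# Hypothetical toy data: (feature_1, feature_2, label); None marks a missing value.
raw = [
    (1.0, 2.0, "a"), (1.2, 1.8, "a"), (0.8, 2.2, "a"), (1.1, None, "a"),
    (0.9, 2.1, "a"), (1.3, 1.9, "a"),
    (5.0, 8.0, "b"), (5.2, 7.9, "b"), (4.8, 8.3, "b"), (None, 8.1, "b"),
    (5.1, 7.8, "b"), (4.9, 8.2, "b"),
]


def clean(records):
    """Drop records that contain any missing value."""
    return [r for r in records if None not in r]


def split(records, validation_fraction=0.3, seed=0):
    """Shuffle reproducibly, then split into training and validation sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def fit_scaler(records):
    """Per-feature mean and standard deviation, computed on training data only."""
    columns = list(zip(*[r[:-1] for r in records]))
    return [(mean(c), pstdev(c) or 1.0) for c in columns]


def scale(record, scaler):
    """Apply z-score normalization to the features of one record."""
    return tuple((x - m) / s for x, (m, s) in zip(record[:-1], scaler))


def fit_centroids(records, scaler):
    """Nearest-centroid model: one mean feature vector per label."""
    by_label = {}
    for r in records:
        by_label.setdefault(r[-1], []).append(scale(r, scaler))
    return {
        label: tuple(mean(c) for c in zip(*points))
        for label, points in by_label.items()
    }


def predict(record, scaler, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    x = scale(record, scaler)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


data = clean(raw)
train, validation = split(data)
scaler = fit_scaler(train)
centroids = fit_centroids(train, scaler)

correct = sum(predict(r, scaler, centroids) == r[-1] for r in validation)
accuracy = correct / len(validation)
print(f"validation accuracy: {accuracy:.2f}")
```

One design choice worth noticing is that the scaler is fitted on the training set only and then applied to the validation set, so no information from the validation data leaks into the model.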