Course ECE657A - Data Analysis and Machine Learning
|First Class Cancelled!|
|See the news section for information|
|Hello Potential Winter 2023 Machine Learning Students!|
|Keep in mind that this year the cap on this course is lower, so there will be just one course section and no extra ones will be added. This means registration will be quite competitive. If you are a PhD or MASc student in ECE, there are reserved spots for you; please register right away so that you can use them and we can free up any unused spots for others.|
|This year will be harder than previous years: we will have higher expectations for technical understanding, there will be some simple proofs, and we are returning to the pre-Covid requirement that final exams are worth 50%. So be prepared, and if you are not sure this course is for you, let others who need it register for it.|
|Offered: Winter 2023||Instructor: Prof. Mark Crowley|
|Website||YouTube Channel / Lectures Schedule|
|Piazza||Hypoth.ECE657A: Collaborative Readings|
Engineers encounter data in many of their tasks, whether it comes from experiments, databases, computer files or the Internet. There is a dire need for effective methods to model and analyze this data, extract useful knowledge from it, and know how to act on it. In this course you will learn the fundamental tools for assessing, preparing and analyzing data. You will learn to design a data and analysis pipeline to move from raw data to task solution. You will learn to implement a variety of analytical and machine learning algorithms, including supervised, unsupervised and other learning approaches. Students will gain practical experience with coding and analysis through assignments. Research students will have the opportunity to connect course material to their research through a project that replaces some of the assignments.
- Data Structures and Algorithms, Basic Programming Skills (Python especially)
- Basic Knowledge of Probability and Statistics Theory
|Instructor: Prof. Mark Crowley||TA: Josh Sun||TA: Shayan Shirahmad Gale Bagi|
|Contact: email@example.com (use Piazza or come to office hours)||firstname.lastname@example.org||email@example.com|
Lecture Locations and Times
|Section||Room||Time and Day||Class Capacity||Notes|
|Lecture||E7 4053||11:30am - 12:50pm (Mondays and Tuesdays)||148|
The Course Design and Learning Outcomes describe the design philosophy of this course, which guides the topics and assessments below.
These topics are an outline. Each year some subset of non-core topics will be skipped due to time constraints and to benefit students through deeper focus. When tests are planned, the mandatory core topics will be highlighted so that students know which material will be tested.
Understanding and Preparing Data
- Data types, sources, nature, scales, representations and distributions
- Preparation of Data: missing data, smoothing, transformation and normalization
- Summarizing Data: mean, variance, skew, PCC, cross correlation
- Comparison Measures between datasets
- Experimental Methodology, statistical tests and validation metrics, model capacity, avoiding overfitting, ablation studies, ROC curve
Fundamentals of Estimation and Learning
- Background Review of Probability and Statistics: Random Variables, Conditional Probability, Bayes Rule, Entropy, KL-Divergence, Hypothesis Testing
- Parameter Estimation: statistical approaches, probabilistic approaches (MLE, MAP), other approaches (EM, density estimation)
- MAP Implementation as Classification: Naive Bayes Algorithm
Representation Learning I
- Feature Selection:
- Feature extraction: PCA, LDA/FDA, ((ISOMAP)), ((LLE)), t-SNE
- Dimensionality Reduction and
- Vector Embeddings : TF-IDF, Word2Vec, BERT
- Distance based - k-Nearest Neighbours (kNN) Algorithm
- Decision Tree based, Ensemble Methods including Random Forests, AdaBoost, XGBoost, Mondrian Forests
Midterm Test Description (planned)
Types of Data, Data Preprocessing and Analysis, Probabilities, Experimental Methodology (Train/Test/Validate, cross-validation, resampling), Distances and Error Measures, Ablation Experiments, Error Estimation, Confusion Matrix, ROC/PR Curves, Parameter Estimation (Bias, Logistic Regression, MLE, MAP, Naive Bayes Classifier), kNN classifier, Representation Learning (Feature Selection, Feature Extraction, PCA, LDA, t-SNE), Decision Trees, Batch Ensemble Methods (including Random Forests, Boosting, Bagging, Gradient Boosted Trees, AdaBoost).
((Classification II)) (skip?) Support Vector Machines (SVM) Kernel Methods and Latent Models
- Clustering: Partition, Hierarchical, Model and Density based.
- Clustering evaluation measures
- k-Means Algorithm, DBSCAN Algorithm
- Anomaly Detection: Classification, Outlier, Density, and Isolation based
- Fundamentals of Neural Networks
- Types of Deep Learning : CNN, RNN
- Classification III : Data, Image and Timeseries classification using Deep Learning
- Effective Deep Learning Training Methods: Attention, Regularization, Optimizers
- Reusing Information: ResNet, Inception
- (may skip) Representation Learning II: Autoencoders and Variational Autoencoders
- Transfer Learning (likely)
- Attention and Transformer Networks
Additional Learning Topics (if time allows)
- Active Learning
- Incremental/Online Learning
At the start of term I’ll send out some initial plans for the course schedule, covering assignments and topics. I will also talk about this in the first intro lecture on Jan 6. For more information, see this website: the core concepts, topics and grade breakdown will be similar to what is described here, with some small changes.
In terms of preparation, these topics have many resources online, including the ones listed on the website. There is no required textbook for the course, although some of the reading material is useful; all the content you need to know will be provided as tutorial notes and slides or assigned reading.
The goal of this course is to help students learn how to analyse and prepare data, describe and apply theoretical concepts in Data Science and Machine Learning, design data processing pipelines and implement important machine learning algorithms on a range of datasets and tasks. This course will be a success if students can use these skills in their future endeavours, research and employment.
No more than one Test or Assignment will be due in any given week.
Weighting of Assessments
35% Assignments (three assignments, done in pairs or alone):
- Collaboration: assignments can be done in pairs or alone
- Topics: Assignments will arise from the major component topics of the course; some will build on previous assignment outcomes.
- Possibility to have later assignments as Kaggle-style competitive submissions (note: vast majority of grade will be based on performance and correctness rather than based on competitive performance)
- Completing the assignments will require multiple skills:
- simple mathematical proofs and derivations (assignment 1)
- familiarity with statistical, probabilistic analysis of data and results
- logical design and clear description of an experimental methodology
- programming various algorithms for processing, training and analysis of data to achieve given tasks (programming will be in Python using libraries such as scikit-learn, TensorFlow, PyTorch and pandas)
- Late Assignments:
- Assignment due dates will be at midnight on the designated date.
- Late assignments will have the following penalties from the assigned grade:
- up to 6 hours (0%)
- 6-24 hours (5%)
- 24-48 hours (10%)
- >48 hours (100%)
- If you know ahead of time that you will not be able to make the deadline due to serious health or personal issues, contact the professor to ask for an exception.
- Research Project Option:
- For research-based graduate students only (i.e. MASc and PhD), you have the option of replacing Asg 2 and 3 with a single research project on a dataset or problem related to your research. At the time that Asg 1 is handed in, present a one-page proposal to the professor about your project, and he will discuss it with you to make it an appropriately scaled project.
- This option must be done alone and will be graded outside the Kritik peer review system.
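The late-penalty schedule listed above can be sketched in Python as follows. This is only an illustration, not official course policy: the function names are hypothetical, the tier boundaries are assumed to be inclusive at the upper end (exactly 6 hours late counts as 0%), and each penalty is assumed to be deducted as a fraction of the assigned grade.

```python
def late_penalty(hours_late: float) -> float:
    """Return the assumed fraction of the assigned grade deducted for lateness."""
    if hours_late <= 6:      # on time, or within the 6-hour grace tier
        return 0.0
    if hours_late <= 24:     # 6-24 hours late
        return 0.05
    if hours_late <= 48:     # 24-48 hours late
        return 0.10
    return 1.0               # more than 48 hours late: full penalty


def apply_late_penalty(assigned_grade: float, hours_late: float) -> float:
    """Apply the penalty multiplicatively (an assumption about the policy)."""
    return assigned_grade * (1.0 - late_penalty(hours_late))
```

Under these assumptions, an assignment graded 80 and submitted 12 hours late would be recorded as roughly 76. If in doubt about a boundary case, ask the course staff rather than relying on this sketch.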
There will be two graded tests to evaluate your learning in the course, a Midterm and a Final Exam.
These will be done alone, on paper (or on Crowdmark online if needed), in class, and will be time-limited.
For each test, questions will be on content up to that point on concepts, theory and design.
Assessment Weighting Overview
|Item||Weight towards Final Grade|
I know there has been app/feature/tool creep in courses as the pandemic has worn on. We’re trying to minimize that while still not holding ourselves back when a new tool does something better than an old one.
- Course Website : https://compthinking.github.io/DKMA/
- Course News, Outline, Learning Goals
- Schedule of lectures, assignments and tests
- Links to all resources
- Learn : Log in to learn.uwaterloo.ca
- Online Course management system for UWaterloo.
- Your grades will be managed here, up until the final grade submission phase of the course.
- Links, announcements and course materials will all be made available here as well.
- Only registered students can access learn.
- Piazza : ECE657A Discussions
- Online, threaded discussion forum with the ability for students to construct an answer in addition to the answer provided by course staff.
- Hypothesis : Hypoth.ECE657A
- For extra reading resources: textbook portions, published papers, unpublished tutorials on arxiv
- Allows collaborative annotation by the whole class and course staff.
- replaces: Zotero
- Crowdmark : (links will be made available as needed)
- A visual grading tool for PDF submissions of tests and assignments. It also allows limited online tests with Markdown text entry and multiple-choice questions.
- Used by the course staff for grading your tests.
- Some assignments and tests might be made available online for submission using this tool as well.
- Live Lectures/Review Sessions/Tutorials: The Tuesday and Thursday scheduled lecture timeslots will be used for a variety of live session types. Initially these will all be online via Microsoft Teams, but later in the term hopefully they can be held in person in the designated lecture halls. See the Planned Weekly Schedule section above for more details.
- Pre-recorded Video Lectures: These will be made available on the course YouTube channel, with links from within Learn and/or the website.
- Discussion board:
- Piazza will be the main place for detailed discussion and questions. Students can post anonymously (to other students only) and post collaborative answers; course staff can confirm these, post their own answers or run live Q&A events.
- Go there and sign up with your UWaterloo email now!
- LEARN Website: The main course content, announcements, grade tracking and materials will be made available on Learn. All registered students should see this in their LEARN courses.
- One-on-one “office hour” meetings: We will provide a contact method (piazza tag, or doodle booking page) to arrange one-on-one meetings with course staff for help. Initially these meetings will be online via Microsoft Teams, but hopefully later in the term they could be held in person.
- Asking questions via email :
- This doesn’t work very well as I have too many emails to manage; please use the other methods described here.
- In an emergency, if all else has failed, contact the Prof via a Teams message to arrange a meeting or point him to the question or email that has the main description of the issue.
- AccessAbility Services : http://uwaterloo.ca/accessability-services
- If you need any accommodation or assistance with exams, the learning environment or assignments, talk to this office and they can help you set it up as securely and anonymously as possible.
Discussion Group Protocols
- Posts on Piazza can be public or anonymous to your classmates, but they will never be anonymous to the TAs and Instructor.
- Be kind. Assume the best, not the worst. Think before you hit
- Posts which are considered offensive, abusive, bullying, discriminatory to any group or person, will be made private or deleted and followed up with private discussion.
- If you feel there is inappropriate, hurtful behaviour occurring on the discussion forum, please notify the professor, TAs or department staff as you feel appropriate.
- If you really can’t get in touch with anyone and it is an emergency, you can contact Prof. Crowley directly via Microsoft Teams messaging (please don’t abuse this though :-)
There is no required textbook, but most of the course is based on the following books, and it will be useful to take a look at them.
- Arxiv Tutorials - PDFs will be posted to hypothes.is, and other links will be provided to a number of detailed tutorials on some course topics.
- K. Murphy. “Machine Learning: A Probabilistic Perspective”. MIT Press, 2012.
- I. Goodfellow, Y. Bengio and A. Courville. “Deep Learning”. MIT Press, 2016.
- Online for free at http://www.deeplearningbook.org. The first half covers many of the basics of this course, while the second half focusses on Deep Learning only.
- R. O. Duda, P. E. Hart and D. G. Stork. Pattern Classification (2nd ed.), John Wiley and Sons, 2001.
Papers and electronic references will be made available on the course website which is on LEARN (go to http://learn.uwaterloo.ca to log in).
Recipe for success:
- Ask questions.
- Connect with your classmates.
- Do the assignments.
- Ask questions!
- Most of all, have fun! …yes really.
Health and Safety
- see these slides on university policy for COVID-19 safety once in-person classes begin again.
- Attendance: Students are instructed to attend only the section for which they are registered. If you wish to attend a different section (fewer people are registered for section 2) you should transfer to that section using official means.
- Absence: Students shall not attend class if they are experiencing influenza-like illness, have been in close contact with someone who is ill, or have travelled outside of Canada within the past 14 days. You will be able to engage with the course content online while reducing the risk of others becoming ill.
- Face coverings: Wearing of face-covering/mask is a requirement in all common areas on campus, including all indoor instructional spaces.
- Students who will not wear masks will be asked to leave the classroom. If the student has a medical reason why they cannot wear a mask they should contact the professor electronically and provide proof of this.
- As such, no food is allowed to be consumed in instructional space. Beverages are allowed if a straw is used or if the mask is lowered only for a brief period.
- When a student asks or answers a question it may be difficult for them to be heard while wearing a mask. A student may briefly lower their mask to ask/answer the question and then the mask must be replaced.
- Hand hygiene: Students are expected to practice frequent hand hygiene (handwashing with soap and water or use of hand sanitizer), including immediately before coming into an instructional space.
- Seating: Students are permitted to sit where they wish. Students are encouraged to sit with one seat left empty between them and other students when possible.
- Student illness: In the event of absence due to influenza-like illness or required self-isolation, students shall submit an Illness Self-declaration. Students can find the Illness Self-declaration form in the Personal Information section of Quest. A doctor’s note for accommodation is not required.
Fair Contingencies for Emergency Remote Teaching
We are facing unusual and challenging times. The course outline presents the instructor’s intentions for course assessments, their weights, and due dates in Winter 2023. As best as possible, we will keep to the specified assessments, weights, and dates. To provide contingency for unforeseen circumstances, the instructor reserves the right to modify course topics and/or assessments and/or weight and/or deadlines with due and fair notice to students. In the event of such challenges, the instructor will work with the Department/Faculty to find reasonable and fair solutions that respect rights and workloads of students, staff, and faculty.
Wellness Support and Contact Information.
University can be a challenging environment and it is normal to need support from time to time. Campus Wellness services are available to students through counselling and health services. If you are struggling or need someone to talk to, please reach out.
To book an appointment or learn more about the services, call 519-888-4567 x 32655 or explore www.uwaterloo.ca/campus-wellness.
If you’re experiencing a crisis and feel unable to cope and Campus Wellness is closed, contact any of these after-hours supports: EmpowerMe (1-833-628-5589), Good2Talk (1-866-925-5454) or Here 24/7 (1-844-437-3247). They are available at any time of the day or night to help.
General Course Policy and Rules
Online Academic Integrity
In order to maintain a culture of academic integrity, members of the University of Waterloo community are expected to promote honesty, trust, fairness, respect and responsibility. [Check www.uwaterloo.ca/academicintegrity/ for more information.]
All students are expected to work individually, or in pairs as described for assignments, and submit their own original work. Under Policy 71, the instructor may have follow-up conversations with individual students to ensure that the work submitted was completed on their own. Any follow up will be conducted remotely (e.g., MS Teams, Skype, phone), as the University of Waterloo has suspended all in-person meetings until further notice.
A student who believes that a decision affecting some aspect of his/her university life has been unfair or unreasonable may have grounds for initiating a grievance. Read Policy 70, Student Petitions and Grievances, Section 4, http://www.adm.uwaterloo.ca/infosec/Policies/policy70.htm. When in doubt please be certain to contact the department’s administrative assistant who will provide further assistance.
A student is expected to know what constitutes academic integrity to avoid committing academic offenses and to take responsibility for his/her actions. A student who is unsure whether an action constitutes an offense, or who needs help in learning how to avoid offenses (e.g., plagiarism, cheating) or about “rules” for group work/collaboration should seek guidance from the course professor, academic advisor, or the undergraduate associate dean. For information on categories of offenses and types of penalties, students should refer to Policy 71, Student Discipline, http://www.adm.uwaterloo.ca/infosec/Policies/policy71.htm. For typical penalties check Guidelines for the Assessment of Penalties, http://www.adm.uwaterloo.ca/infosec/guidelines/penaltyguidelines.htm.
Plagiarism-detection software may be used on any submitted work.
A decision made or penalty imposed under Policy 70, Student Petitions and Grievances (other than a petition) or Policy 71, Student Discipline may be appealed if there is a ground. A student who believes he/she has a ground for an appeal should refer to Policy 72, Student Appeals, http://www.adm.uwaterloo.ca/infosec/Policies/policy72.htm.
Note for students with disabilities:
The Office for Persons with Disabilities (OPD), located in Needles Hall, Room 1132, collaborates with all academic departments to arrange appropriate accommodations for students with disabilities without compromising the academic integrity of the curriculum. If you require academic accommodations to lessen the impact of your disability, please register with the OPD at the beginning of each academic term.