Machine Learning with Python: MITx Course Review
Overall impression
I have recently finished Machine Learning with Python: From Linear Models to Deep Learning by MITx. This is my first structured course on machine learning. Before this, I had only a vague and mostly conceptual idea of what machine learning is. My aim was to learn the details and inner workings of machine learning algorithms and to use these skills for my research interests.
This course is ambitiously designed. As an introductory course, it really tries to cover a wide range of machine learning algorithms:
linear classifiers, non-linear classifiers, linear regression, neural networks (recurrent and convolutional), clustering (e.g., K-means), generative models, the EM algorithm, mixture models, reinforcement learning, and natural language processing.
Throughout the course, you will learn the details of machine learning jargon like training, validation, testing, parameter tuning, and feature engineering. This is a computationally and mathematically intense course. I've learnt a few new mathematical tools, like how to construct cost/loss functions, the Bellman equation, and convolution. Since I have not taken other machine learning courses, I have nothing to compare it with. Overall, I achieved my goal of deepening my understanding of machine learning and got to practise on a few projects. It was a great learning experience.
What I like about it.
I love the breadth of the course. It gives me a big picture of what machine learning can achieve, its scope of application, the logic behind it, the kind of Python skills needed to operate those algorithms, and the amount and type of data required to make them successful.
The teaching assistants' (TA) notes are extremely helpful, and the recitation videos by the TAs are a great resource that compensates for the more general lectures. In fact, I have to say I learnt the most from the TAs' materials.
I also enjoyed the projects, because they test my understanding of the algorithms at a deeper level and force me to translate conceptual understanding into actual code. This process is where the real learning happens. I can see my coding improve quite a bit after several projects.
What I don't like about it.
The quality of the lectures fluctuates. Some are clear but miss critical details, while others are too general. The course team did mention there would be a gap between the lectures and the projects, and that we should bridge it through self-study. Even so, several key lectures neglect crucial details, which causes confusion and creates a knowledge gap larger than intended. I spent far more time searching for answers and asking questions on the discussion board and the internet than I would have liked.
The discussion board is not monitored as closely as in the probability course. Questions pile up, unanswered and ignored. The community TAs are very helpful indeed, but they are limited in their time and capacity.
My takeaways
My key takeaway is the logic behind machine learning: a way to automate tasks. Take junk email. Sure, as a human, you can spot one quite easily. But what about tens of thousands of emails? It is impossible to hire a team of humans just to deal with junk email. There needs to be an automated system to handle such tedious tasks, though probably with less accuracy. Depending on your task, the machine learning algorithms involved can look vastly different, yet the logic remains the same. Machine learning is an umbrella term for using algorithms to find patterns in data for your goals, be it filtering out junk email, predicting stock prices, or designing robot vacuums.
In general, machine learning is a data-driven method. You study past data, identify your problem (e.g., classification, prediction, modelling), validate your choice of algorithm on a validation data set, tune your parameters, balance your algorithm's generality and complexity, and eventually test it in the real world. This is the supervised learning route.
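To make that workflow concrete, here is a minimal sketch in Python. It assumes scikit-learn is installed, and the spam-filter data is entirely made up for illustration; the two features (exclamation-mark count and money mentions) are my own hypothetical choices, not something from the course.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data: each row is an email encoded as simple numeric features
# (number of exclamation marks, mentions of money), and each label
# marks whether a human flagged it as junk (1) or not (0).
X = np.array([[5, 1], [0, 0], [7, 2], [1, 0], [6, 1],
              [0, 1], [8, 3], [2, 0], [4, 2], [1, 1]])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# Hold out part of the data for validation: the model is judged on
# emails it never saw during training.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A linear classifier; the regularisation strength C is the kind of
# parameter you tune to balance generality and complexity.
model = LogisticRegression(C=1.0)
model.fit(X_train, y_train)

print("validation accuracy:", model.score(X_val, y_val))
```

The held-out validation set is what lets you judge whether a parameter choice like C generalises beyond the training emails, before you ever touch the real-world test data.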
Another route is called unsupervised learning. It is not literally unsupervised by humans; it means we don't provide a pre-determined label (answer) for the data, e.g., we don't give examples of what a junk email is. Instead, we present all the data (e.g., all emails) and let the algorithm work out what groups of them have in common (e.g., junk emails may tend to use a lot of exclamation marks or mention money prizes). This is clustering. Of course, there are various ways to cluster (e.g., K-means, or mixture models based on probability theory). This method is helpful when we simply don't know the categories in the data, due to its scale or our insufficient knowledge. One application of this kind of clustering is recommendation systems.
Think of how Netflix or Amazon recommends shows or products to you. You may be part of several clusters based on your viewing or purchase history.
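For a flavour of clustering in code, here is a minimal K-means sketch, again using scikit-learn and the same invented email features as above; no labels are provided, and the grouping is left entirely to the algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

# The same toy email features as before, but this time without labels:
# the algorithm has to discover the groups on its own.
X = np.array([[5, 1], [0, 0], [7, 2], [1, 0], [6, 1],
              [0, 1], [8, 3], [2, 0], [4, 2], [1, 1]])

# Ask K-means for two clusters; in practice the number of clusters is
# itself a choice you have to validate.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("cluster assignments:", labels)
print("cluster centres:\n", kmeans.cluster_centers_)
```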
Most of the time, we simply don't know enough about our world to categorise things or to act optimally. This brings us to reinforcement learning, which is quite different from supervised and unsupervised learning. I had a hard time when the lectures transitioned from supervised and unsupervised learning to reinforcement learning, as it seemed to be an entirely different field. Reinforcement learning is, in fact, more like human behaviour. It models how an agent (you can think of it as a human or a robot) figures out a way to act optimally in an unknown, dynamic environment so as to maximise its expected cumulative reward. Doesn't that sound like us living in this world?
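Since the course frames reinforcement learning around the Bellman equation, here is a minimal value-iteration sketch on a made-up three-state problem; the transition probabilities and rewards are invented for illustration and are not from the course.

```python
import numpy as np

# A made-up Markov decision process: 3 states, 2 actions.
# T[a][s, s'] is the probability of moving from state s to s' under
# action a, and R[s] is the reward collected in state s.
T = np.array([
    [[0.9, 0.1, 0.0],   # action 0
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]],
    [[0.5, 0.5, 0.0],   # action 1
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]],
])
R = np.array([0.0, 0.0, 1.0])
gamma = 0.9  # discount factor for future rewards

# Value iteration: repeatedly apply the Bellman update
#   V(s) <- R(s) + gamma * max_a sum_s' T(s, a, s') * V(s')
V = np.zeros(3)
for _ in range(100):
    V = R + gamma * np.max(T @ V, axis=0)

print("state values:", V)
```

Each pass of the loop applies the Bellman update, so the values converge towards what the agent can expect to collect from each state if it acts optimally.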
Reinforcement learning is hard for me. The math is brain-racking. The programming is even worse. I need more practice for sure. From the above, I guess my second takeaway is a much elevated understanding of machine learning and of where my knowledge and skills are lacking.
The third takeaway is probably more philosophical. Now that I see the big picture of what artificial intelligence is and can achieve, on the one hand, I feel like I have been missing the front row of the human intelligence roadshow; on the other hand, I realise how much technology has infiltrated our daily lives and how little we are aware of it. We just take everything for granted at an affordable price (I know this may sound privileged). It strikes me how little I knew, and every time I thought I knew something very well, I did not; the fact is that I only knew a fraction of it. Let me summarise my third takeaway after all this rambling. The takeaway is that the experience of learning something completely new and challenging is always a delight and an enlightenment. Learning always makes me humble and eager to know more.
Course details
Official prerequisites from the course site:
- 6.00.1x or proficiency in Python programming
6.00.1x is MITx: Introduction to Computer Science and Programming Using Python. This is an entry-level Python class. Great course; I audited it beforehand as a review. I would recommend Harvard's CS50 series taught by Prof. David J. Malan if you are a newbie to programming. Malan's teaching is top-notch: incredibly clear, fun, and with no assumptions about your prior knowledge. In most introductory courses, professors seem to unconsciously assume some prior knowledge of the subject, and thus skip details that later become obstacles. I know this is hard to avoid, especially once you are an expert in a field and things just come naturally. That is why I really like Malan's teaching and the CS50 courses. Excellent in terms of depth, breadth, and detail! Python skills are quite crucial for this course; I learnt more than a few new tricks throughout.
- 6.431x or equivalent probability theory course
6.431x is MITx: Probability — The Science of Uncertainty and Data. A great course, which I reviewed here. Worth taking. Probability knowledge is needed particularly for mixture models and reinforcement learning.
- College-level single and multi-variable calculus
From the title, my first impression was that this is essentially a programming course. In fact, there is a lot more math than I expected. Yes, calculus is desperately needed. I would say that if you are not proficient in calculus, you will probably have a harder time in this course, on top of all the programming hurdles.
- Vectors and matrices
Linear algebra is essential in machine learning. How much is enough? See my story here, where I introduce the basics of machine learning.
Workload
The course is composed of lectures, exercises, homework, and projects. You have two weeks for each unit. The lectures and exercises are not time-intensive; the homework, and particularly the projects, take time. As I mentioned before, there is a gap between the lectures and the projects. While doing the projects, I often had to re-watch parts of the lectures, watch the TAs' recitation videos, and learn from other sources on the internet. Each project took me days. On average, expect 15 hours of concentrated work a week.
Exams
There are two exams: a mid-term and a final. Neither requires programming, only conceptual understanding. The difficulty is similar to the homework, and it is not hard to achieve a passing grade.
I expect to continue delving into the vast and evolving field of machine learning, and to see how I can use it in my intellectual pursuits. After all, the joy of learning is the process itself.
Until next story.