Bayesian inference, the Central Limit Theorem and stochastic processes (Bernoulli, Poisson, Markov chains) are just a few of the takeaways from this amazing course. Probability - The Science of Uncertainty and Data (I will just call it the Probability Course) is part of the MITx MicroMasters program in Statistics and Data Science. It is a great course for the depth and breadth it offers. I took it to solidify my understanding of probability theory, and it offered more than I anticipated. In this post, I will share my overall impression, dissect the course and share what it takes to get the most out of it.
MIT has been a pioneer in massive open online courses (MOOCs), and the course materials and lectures are top-notch. My key takeaway from the Probability Course is not the theorems and their applications (though these are very important indeed), but a trained intuition. Initially, I did not quite get what the professors meant by intuition. One of the reasons I study probability theory is to fight my intuition, which often goes wrong; I want a structured way to understand our world and reality. As the course went on, I started to sense what the professors meant. This intuition is not the one we are born with; it is trained through logic until it becomes ingrained in you. It requires us to constantly reflect on what a theorem actually says, what its underlying assumptions are and how it relates to real-life phenomena.
Throughout the course, we are taught a range of tools for quantifying uncertainty and estimating unknown risks, yet the key tool I learned is a systematic way of reasoning to understand and eventually solve problems. That is not to say math is unimportant in the course. There are a number of mathematical proofs that confirm our intuition is correct, but the math here is used to elevate your understanding; you are not required to derive the theorems on your own. This is nonetheless a math-intensive course, intensive in the sense that you learn to judge which math skills to use in which scenarios. There is little tedious calculation but a lot of understanding and judgement, leading to a trained intuition.
Here is a summary of my takeaways. Some of these are insights from the live webinar with Prof. John Tsitsiklis, the MIT professor who taught 90% of the course.
1. Trained intuition: a systematic way of reasoning.
- define the problem: what are you trying to understand or find out? E.g. buying a good car.
- define your sample space: what are your options? E.g. different car brands.
- define your random variables and their probability distributions: which factors are relevant to your choice of car, and how important is each of them to you? E.g. size (10%), safety features (40%), fuel efficiency (30%) and looks (20%).
- choose a suitable probabilistic theorem or model: what is a reasonable tool for assessing the problem?
- translate all of this into mathematical equations and use your chosen theorem or model to calculate the answer, e.g. multiply each factor's score by its weight and add the products: size (5 × 10%) + safety features (8 × 40%) + … You will have to measure (score) the factors first.
This line of thinking helps you avoid impulsiveness and bias. It forces you to pause, step back and really evaluate the problem at hand.
2. Divide and conquer. Divide complex problems into small pieces and solve them one by one. It sounds straightforward but is often neglected.
3. Actively shift your mindset to look for alternative ways of interpreting problems. Don't be stubborn! I felt this while solving some of the problem sets. Instinctively, I tend to prefer mathematical derivation, but oftentimes there are smarter ways to solve a problem. All I needed was a fresh pair of eyes and an open mind.
4. Of course, I learned hard skills too, such as Bayesian inference, Least Mean Squares (LMS) estimation, Maximum a Posteriori probability (MAP) estimation, linear models, the Central Limit Theorem and stochastic processes (Bernoulli, Poisson, Markov chains).
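The last step of the car-buying reasoning in point 1, multiplying each factor's score by its weight and adding the products, is a weighted sum. Here is a minimal Python sketch of that step, using the weights from the car example; the car names and their 0-10 scores are hypothetical placeholders, and a real decision would need scores you actually measured. Because the weights are non-negative and sum to 1, the result has the same form as an expected value.

```python
import math

# Importance weights from the car example in point 1; they should sum to 1.
WEIGHTS = {"size": 0.10, "safety": 0.40, "fuel_efficiency": 0.30, "look": 0.20}

# HYPOTHETICAL 0-10 scores for two hypothetical cars; measure these yourself.
CARS = {
    "car_a": {"size": 5, "safety": 8, "fuel_efficiency": 6, "look": 7},
    "car_b": {"size": 7, "safety": 6, "fuel_efficiency": 8, "look": 5},
}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Multiply each factor's score by its weight and add up the products."""
    return sum(scores[factor] * weight for factor, weight in weights.items())


# Sanity check: the weights behave like a probability mass over the factors.
assert math.isclose(sum(WEIGHTS.values()), 1.0), "weights should sum to 1"

# Rank the cars from highest to lowest weighted score.
ranking = sorted(CARS, key=lambda name: weighted_score(CARS[name], WEIGHTS), reverse=True)
for name in ranking:
    print(f"{name}: {weighted_score(CARS[name], WEIGHTS):.2f}")
```

Changing a single weight (say, caring more about looks than safety) can flip the ranking, which is a quick way to see how much your conclusion depends on your own priorities.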
The course closely follows a well-planned syllabus and calendar. You will know exactly when lectures, exercises, problem sets and exams are released and due, and it is comforting to follow a plan. You can expect one problem set due and one new lecture released each week. The overall passing score is 60% (achievable), and details of the grade components are easy to find on edX.
What I would emphasize is that you really need to plan a study schedule and stick to it. This is not a crash course. The content takes time and mental energy to sink in, and the problem sets take time to complete (it could be hours for a single problem if, unfortunately, you get stuck). Having a schedule and sticking to it gives you a sense of flow and progress, which is important for keeping your momentum. I love to study in the morning, when my mind is clear and my mental energy is boosted by caffeine (yes, the modern drug that everyone can legally consume). Less ideally, after-work hours also do, but I exercise and eat beforehand to boost some dopamine. A light dinner without any carbohydrates works best if I need to study at night. Again, the key here is to have a schedule and stick to it.
Official prerequisites — where you start with:
College-level calculus (single-variable & multivariable). Comfort with mathematical reasoning; and familiarity with sequences, limits, infinite series, the chain rule, and ordinary or multiple integrals.
Above are MIT's official prerequisites for the course. The following are my own additions, to help you avoid frustration and quitting.
Lacking sufficient math skills could impede your understanding and confidence in this course. If the following terms look a bit rusty to you, it would be good to review them. If they look completely strange, you would benefit from a calculus course. For both review and first-time study, I recommend MIT's calculus course. I audited its material on edX as a review, but it seems the course is no longer available on edX; it is now hosted on MIT's own open course platform.
Calculus skills needed: the chain rule, product rule and power rule, general derivative calculation, exponential functions and their derivatives, logarithms and their derivatives, Leibniz notation, the geometric meaning of the derivative (the tangent line), basic integral formulas and basic algebra (factorials, multinomial equations, etc.). All of these are used frequently.
Nice to have:
The following are not necessary to know before the course. They are the "icing on the cake".
Integration by parts. This will help you work through hard problems more smoothly.
Basic understanding of probability. This will help you start off with confidence and interest.
Basic programming in R or Python. I saw some classmates use Monte Carlo simulation in R to build their understanding. Personally, I did not find it necessary. You will be fine without any programming skills.
Course materials — where you get your knowledge:
The first few lectures are about discrete sample spaces and counting, which I felt quite comfortable with. However, the course escalates quickly once you enter continuous space, followed by Bayesian inference in its different forms and the Central Limit Theorem, before entering the stochastic process domain: the Bernoulli process, the Poisson process (merging and splitting) and then the endless Markov chains (with a bit of queueing theory). Expect to spend twice, if not three times, as much study time from the Bayesian inference unit onward.
Handouts and videos are clearly illustrated and annotated. Prof. John Tsitsiklis is charming, even through recorded videos. His way of illustrating a problem is engaging, and he shows you how to think. As much as I love the liveliness and human interaction of in-person classes, I adore online classes because I can replay to the exact point where I got confused, which is practically impossible in class. I have often wished I could tell my professor: "Could you just go back about 6 seconds? Yes, right there." And sometimes, interrupting a class to ask about a math problem feels stupid and intimidating. I love that studying online lets me stop anytime to check a concept before continuing.
Exercises, problem sets and exams — where you get your grades:
It is hard to evaluate the difficulty of the course in general; it really depends on your prior math and probability knowledge. For me, it was challenging: not a course I could leisurely glide through, yet not one that worried me so much I could not sleep. It is a rigorous and mentally demanding course.
Expected workload: lectures and follow-up exercises (about 4–6 hours), solved-problem videos (about 2–3 hours; these are teaching assistants' videos showing step-by-step problem solving), additional material (about 1–2 hours; optional videos that offer more information) and problem sets (3–5 hours). So I estimate 10–16 hours per week. These are concentrated study hours, not counting the times you get distracted and have to recalibrate (been there).
Exams are open-book and timed (48 hours once you start). Scores are released in about a week. The difficulty level lies between the exercises (easy) and the problem sets (hard).
Discussion forum and course operations — where you get interaction:
I love the discussion forum. It is lively: every day there are many new threads, with fellow learners posting and answering questions. I answered a few and benefited a lot from other learners' input. Learning online can be lonely sometimes. When you have a question or get a wrong answer, you eagerly want to know why, but there is no one to discuss it with (family and friends do not give a shit about when the fifth ship arrives (Poisson), the expected time until a lightbulb burns out (exponential distribution) or whether a steady state has been reached (Markov chain)). You need people who share the same context, and the discussion forum is an amazing community. I could often get detailed answers from fellow learners or TAs within a few days. I am thankful for all the kind help from my fellow learners.
Final words: I recommend this course to anyone who wishes to think systematically, though admittedly it is a heavy course if you take it purely out of interest. Although this course would be especially useful for researchers and academics, many of its concepts are surprisingly general and very helpful for navigating your work and life and making decisions about all sorts of events. Be prepared to change your mind based on what the data can tell you about events in your life. As an example, here is Bayes' rule: P(A|B) = P(B|A) × P(A) / P(B).
This example is for illustrating the concept, not for actual calculation. Say event A is a flight crash and event B is a Friday.
P(A|B) = P(flight crash | Friday) is the probability of a flight crash, given that the day is a Friday. By Bayes' rule, it depends on the following quantities:
P(B|A) = P(Friday | flight crash) is the probability that the day is a Friday, given that there was a flight crash.
P(A) = P(flight crash) is the probability of a flight crash without any conditions.
P(B) = P(Friday) is the probability that a given day is a Friday: roughly 52 Fridays / 365 days ≈ 0.14.
Bayes' rule computes these probabilities over the entire sample space of flight crashes. However, when we infer such a probability in our heads, our sample space is narrowed down to what we know, i.e. how many flight crashes we saw in the news and how many of them happened on Fridays. If you read about 3 flight crashes and 1 of them happened on a Friday, then for you the probability that a crash falls on a Friday is 1/3, or 33.3%. In fact, there were 39 crashes in 2022 (source: IATA); if they were spread evenly over the year, on average about 39 × 52/365 ≈ 5.6 of them would have happened on a Friday, which is about 14%. What I am trying to say is that our mind is limited to the information we have. This course is a great way to train your mind to think more systematically and critically.
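To make the flight-crash arithmetic concrete, here is a minimal Python sketch that reproduces the numbers above (52/365, 1/3 and 39 × 52/365) and plugs them into Bayes' rule. The daily crash probability `P_CRASH` is a hypothetical placeholder, not real data, so only the comparison between the results is meaningful. One consequence worth noticing: if crashes are spread evenly over the week, P(B|A) equals P(B), and Bayes' rule then gives P(A|B) = P(A), meaning that knowing it is a Friday tells you nothing new about crashes.

```python
import math


def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# P(B): rough share of days in a year that are Fridays, as in the post.
P_FRIDAY = 52 / 365

# HYPOTHETICAL daily probability of a flight crash, P(A); not real data.
P_CRASH = 0.001

# P(B|A) as seen by a reader who read about 3 crashes, 1 of them on a Friday.
p_friday_given_crash_news = 1 / 3

# P(B|A) if the 39 crashes of 2022 (IATA) were spread evenly over the year.
CRASHES_2022 = 39
expected_friday_crashes = CRASHES_2022 * P_FRIDAY
p_friday_given_crash_even = expected_friday_crashes / CRASHES_2022

# Posterior P(crash | Friday) under each view of P(B|A).
posterior_news = bayes(p_friday_given_crash_news, P_CRASH, P_FRIDAY)
posterior_even = bayes(p_friday_given_crash_even, P_CRASH, P_FRIDAY)

print(f"Expected Friday crashes: {expected_friday_crashes:.1f} of {CRASHES_2022}")
print(f"P(Friday | crash): news-based {p_friday_given_crash_news:.1%}, "
      f"even spread {p_friday_given_crash_even:.1%}")
print(f"P(crash | Friday): news-based {posterior_news:.5f}, "
      f"even spread {posterior_even:.5f}, prior {P_CRASH:.5f}")

# With an even spread, P(B|A) == P(B), so the posterior equals the prior.
assert math.isclose(posterior_even, P_CRASH)
```

Inflating P(B|A) from about 14% to 33% more than doubles the posterior, which is exactly the distortion a small, news-driven sample introduces.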