I have always emphasized the importance of mathematics in machine learning.
Here is a compilation of resources (books, videos & papers) to get you going.
(Note: It's not an exhaustive list, but I have carefully curated it based on my experience and observations.)

by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong
https://mml-book.github.io/
Note: this is probably the place you want to start. Start slowly and work on some examples. Pay close attention to the notation and get comfortable with it.

by Christopher Bishop
Note: Prior to the book above, this is the book that I used to recommend to get familiar with math-related concepts used in machine learning. A very solid book in my view and it's heavily referenced in academia.

by Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie
Note: Machine learning deals with data and, in turn, uncertainty, which is what statistics teaches. Get comfortable with topics like estimators and statistical significance.

by E. T. Jaynes
Note: In machine learning, we are interested in building probabilistic models and thus you will come across concepts from probability theory like conditional probability and different probability distributions.

by Dr. Sam Cooper & Dr. David Dye
https://www.youtube.com/playlist?list=PLiiljHvN6z193BBzS0Ln8NnqQmzimTW23
Note: Backpropagation is a key algorithm for training deep neural nets, and it relies on calculus. Get familiar with concepts like the chain rule, the Jacobian, and gradient descent.
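
To make the chain rule and gradient descent a bit more concrete, here is a minimal sketch of my own (not taken from the lectures). It fits a single weight `w` to one made-up data point; the function names, the learning rate, and the data values are all illustrative assumptions.

```python
# Minimal sketch: gradient descent on the squared error of a one-weight model.
# All names and values here are hypothetical, chosen only for illustration.

def loss(w, x, y):
    # Squared error of the prediction w * x against the target y.
    return (w * x - y) ** 2


def grad(w, x, y):
    # Chain rule: with u = w * x - y, d(u^2)/dw = 2 * u * du/dw = 2 * (w * x - y) * x.
    return 2 * (w * x - y) * x


def gradient_descent(x, y, w=0.0, lr=0.1, steps=50):
    # Repeatedly step against the gradient to reduce the loss.
    for _ in range(steps):
        w -= lr * grad(w, x, y)
    return w


# One data point (x=2, y=6); the loss is minimized at w = 3.
w_fit = gradient_descent(x=2.0, y=6.0)
print(w_fit, loss(w_fit, 2.0, 6.0))
```

Backpropagation applies the same chain-rule idea layer by layer, with Jacobians taking the place of the single derivative used here.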

by Terence Parr & Jeremy Howard
https://arxiv.org/abs/1802.01528
Note: In deep learning, you need to understand a bunch of fundamental matrix operations. If you want to dive deep into the math of matrix calculus this is your guide.

by Dr. Sam Cooper & Dr. David Dye
https://www.youtube.com/playlist?list=PLiiljHvN6z1_o1ztXTKWPrShrMrBLo5P3
Note: a great companion to the previous video lectures. Neural networks perform transformations on data and you need linear algebra to get better intuitions.

by David J. C. MacKay
Note: When you apply machine learning, you are dealing with information processing, which in essence relies on ideas from information theory such as entropy and KL divergence.
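
As a quick illustration of entropy and KL divergence, here is a small sketch of my own (not from the book). It uses discrete distributions given as lists of probabilities and measures information in bits; the function names and the coin distributions are illustrative assumptions.

```python
import math

# Minimal sketch: entropy and KL divergence for discrete distributions, in bits.
# The function names and the example distributions are hypothetical.

def entropy(p):
    # H(p) = -sum_i p_i * log2(p_i); terms with p_i = 0 contribute nothing.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log2(p_i / q_i); assumes q_i > 0 wherever p_i > 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


fair = [0.5, 0.5]     # a fair coin
biased = [0.9, 0.1]   # a heavily biased coin

h_fair = entropy(fair)
h_biased = entropy(biased)
kl = kl_divergence(biased, fair)
print(h_fair, h_biased, kl)
```

The fair coin has the highest entropy a two-outcome distribution can have (one bit), and the KL divergence is zero only when the two distributions match.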