Welcome to DS-1. The objective of this module is to provide a fundamental understanding of data analysis. The module proceeds in three parts, following the Data Science Process:
Obtain and clean the data: we will teach you how to obtain, clean, and process data from different sources such as scraped web pages, spreadsheets, APIs, and documents.
Exploratory Data Analysis: we develop your skills in pre-modeling and post-modeling exploratory data analysis and visualization. This part is all about understanding your data.
Modeling: we cover a few specific models, chosen because the techniques they teach generalize to other models. These include classification and recommendation engines. We will also cover similarity and PCA as ways of understanding the structure in your data and in the models you run on it.
At the end of this module, you will be ready to run the entire data science process on your own: fetching data (from the internet or databases), cleaning it, setting up pipelines, and carrying out exploratory data analysis, visualization, and modeling.
This page introduces you to the team, the basic instructions, the schedule, and various elements of our class.
(Dr.) Ignacio Becker
- Astronomer currently pursuing a Ph.D. in Computer Science at Pontificia Universidad Católica in Chile.
- His main area of research is AI applied to astrophysical problems.
- He currently focuses on developing models to process real-time data from the next generation of telescopes.
The teaching assistants for the duration of this course are:
- Arya is working as a research and teaching fellow with Univ.AI.
- Previously, she was a data analyst at Schneider Electric.
- She is currently working as a research fellow at the StellarDNN lab.
- Harsh Vardhan completed the Master AI and ML program with Univ.AI and is currently a teaching assistant. He is passionate about AI-enabled sustainable development.
- He also enjoys climbing/bouldering and running.
- Anshika is a deep learning enthusiast and currently a final-year undergraduate at JECRC University, Jaipur.
- Previously, she was a Research Intern at a healthcare startup where she worked on developing algorithms for medical image analysis and segmentation.
- When not studying, she can be found writing prose and poetry while sipping a cup of coffee.
- Sakthisree is a Machine Learning Lead at a leading Germany-based wholesale company.
- Her current goal is to establish autonomous systems that can comprehend the world in its multi-modal richness and dimensionality through causal inference, which she is pursuing through independent research.
- She is also very active in the non-profit space where she leads courses, workshops and panels in the area of Tech Equity + Society, specifically catering to principles of Intersectionality and Social Justice in the Global South.
- Previously worked as a freelance full-stack developer.
- Fascinated by AI and its ability to solve complex problems.
We have carefully designed the coursework to give you, the student, a well-rounded learning experience. Each week will include:
- 2 Sessions
- 1 Lab
- Office hours
Session - What to expect
Before the session begins, students are expected to complete a pre-class reading assignment and attempt a quiz based on it.
Each session follows the layout below, which is repeated a few times:
- Approx. 10-15 minutes of live online instruction followed by a quiz
- Some sessions will have hands-on coding exercises or group activities
- Sessions will help students develop the intuition for the core concepts, provide the necessary mathematical background, and provide guidance on technical details.
- Sessions will be accompanied by relevant examples to clarify key concepts and techniques.
After the session, students are expected to complete a short post-class quiz based on the principal concepts covered in class. Optional post-class reading will also be provided.
Lab - What to expect
A lab is a TA-driven 1.5-hour session that is divided into three major parts.
- Each lab begins by solving parts of a complete problem. This problem is designed to help you with your homework and further elucidate concepts you learned in lecture.
- After discussing the exercises, we will have a semi-formal Q&A session. The first part of this session is limited to homework questions, while the second part is open to any questions left over from lecture.
You are expected to have programming experience and familiarity with basic machine learning concepts such as model fitting, validation and testing, regularization, etc.
- Programming Experience:
  - Pandas - Specific topics: Introducing Pandas Objects; Data Indexing and Selection
  - NumPy - Specific topics: Understanding Data Types in Python; The Basics of NumPy Arrays; Computation on Arrays: Broadcasting; Comparisons, Masks, and Boolean Logic
  - Matplotlib - Specific topics: Simple Line Plots; Simple Scatter Plots; Visualizing Errors; Density and Contour Plots; Histograms, Binnings, and Density
  - Sklearn API
- Machine Learning Experience:
  - Loss functions
  - Overfitting and regularization
  - Regression and classification
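As a rough self-check on a few of the prerequisites listed above, the sketch below touches NumPy broadcasting, boolean masks, and L2 regularization. The toy data, the helper name `ridge_fit`, and the `alpha` values are illustrative assumptions and not part of the course material. If every line reads naturally to you, you are likely comfortable with this part of the background.

```python
import numpy as np

# Broadcasting: the per-column mean has shape (3,) and is subtracted
# from every row of the (4, 3) array.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 6.0, 9.0],
              [4.0, 8.0, 12.0]])
X_centered = X - X.mean(axis=0)

# Comparisons and boolean masks: keep rows whose first column exceeds 2.
mask = X[:, 0] > 2
selected = X[mask]


def ridge_fit(X, y, alpha):
    """Closed-form L2-regularized least squares without an intercept.

    Hypothetical helper for illustration: solves
    (X^T X + alpha * I) w = X^T y.
    """
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


# Synthetic regression problem (assumed data, fixed seed for repeatability).
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
target = features @ true_w + 0.1 * rng.normal(size=50)

# A larger alpha penalizes large weights more strongly and shrinks them.
w_small = ridge_fit(features, target, alpha=1e-3)
w_large = ridge_fit(features, target, alpha=100.0)

print("selected rows:\n", selected)
print("weights with weak regularization:  ", w_small)
print("weights with strong regularization:", w_large)
```

The comparison between `w_small` and `w_large` shows the usual trade-off: stronger regularization pulls the weights toward zero, which reduces variance at the cost of some bias.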
Diversity & Inclusion
We actively seek and welcome people of diverse identities, from across the spectrum of disciplines and methods, since Artificial Intelligence (AI) increasingly mediates our social, cultural, economic, and political interactions.
We believe in creating and maintaining an inclusive learning environment where all members feel safe, respected, and capable of producing their best work.
We commit to an experience for all participants that is free from harassment, bullying, and discrimination, which includes but is not limited to:
- Offensive comments related to age, race, religion, creed, color, gender (including transgender/gender identity/gender expression), sexual orientation, medical condition, physical or intellectual disability, pregnancy, national origin, or ancestry.
- Intimidation, personal attacks, harassment, or unnecessary disruption of talks during any of the learning activities.
 K. Stathoulopoulos and J. C. Mateos-Garcia, “Gender Diversity in AI Research,” SSRN Electronic Journal, 2019 [Online]. Available: http://dx.doi.org/10.2139/ssrn.3428240.
Logistics - What you need to begin
Education software we use
- Our lectures and labs are carried out via Zoom (install instructions).
- Quizzes & exercises will be conducted on the digital learning platform Ed.
- Occasionally, we may conduct in-class contests on Kaggle. Please register on Kaggle and familiarize yourself with it, if you haven’t already done so. This is a short video that will help you learn how to use Kaggle.
All exercises and homework in this course will be done in Jupyter notebooks. This link will help you set up JupyterLab and get you acquainted with Jupyter notebooks.
Our module policies around collaboration and grading are listed here. Our expectations of you are also laid out in that document.
As you will learn in this course, data science is not just about writing efficient algorithms.
It requires proficiency in critical thinking, ideation & presentation, along with strong foundations in statistics, computer science & mathematics.
Keeping that in mind, you are advised to give your full, active attention to every session, homework & exercise.
We wish you the best of luck on your data science journey.