GSSP Courses
Check out the schedule of courses here.
Introductory Data Science with Python and Tableau
DSA1361
Introductory Data Analysis with Python and Tableau (2 units)
Instructor
Prof Chan Yiu Man
Course summary
This course gives participants a foundation in data science, with a focus on linking business questions to statistical techniques and analytical results to business value. By the end of the course, participants will know how to make sense of data using simple statistical techniques and how best to visualise data. Two software tools that are widely used in the data science industry will be introduced: Tableau for data visualisation and presentation, and Python for data analysis.
Syllabus
1. Ideas for data visualisation. In this topic we cover some general recommendations when making visualisations. For instance, we discuss the use of colours, types of plots, good graphics and bad graphics.
2. Methods for data visualisation. We introduce Tableau, an intuitive tool for creating multivariate interactive graphics.
3. Exploratory data analysis. We introduce data summaries, transformations, outlier inspection and other tools for understanding the data before proceeding to a deeper analysis.
4. Hypothesis testing. Tests based on linear models (t-tests, ANOVA) will be used to introduce the concepts of hypothesis testing and statistical logic.
5. Linear regression. We introduce the assumptions behind the linear regression model. We then demonstrate model fitting and residual analysis as ways to understand the fitted model.
6. Topics 3 to 6 will be covered in Python, using Jupyter notebooks, which are widely used in the data science industry. This will enable students to easily pick up and use source code from repositories such as GitHub and Bitbucket. The course will therefore also introduce Git, the version control software used by almost all data scientists.
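As a rough illustration of the kind of analysis described in topics 3 to 5, the sketch below computes a data summary, a two-sample t statistic and a simple linear regression with residuals. It is not course material: the numbers are invented, the function names are hypothetical, and it uses only the Python standard library so that it runs anywhere. A course notebook would more likely rely on dedicated data-analysis libraries, which is an assumption on our part.

```python
import math
import statistics

# Hypothetical measurements, invented purely for illustration.
group_a = [12.1, 11.8, 12.6, 13.0, 12.4, 11.9]
group_b = [13.2, 13.9, 12.8, 14.1, 13.5, 13.7]


def summarise(values):
    """Exploratory summary: size, mean, median and sample standard deviation."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }


def welch_t(sample_1, sample_2):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    m1, m2 = statistics.mean(sample_1), statistics.mean(sample_2)
    v1, v2 = statistics.variance(sample_1), statistics.variance(sample_2)
    standard_error = math.sqrt(v1 / len(sample_1) + v2 / len(sample_2))
    return (m1 - m2) / standard_error


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residuals are the observed values minus the fitted values.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return slope, intercept, residuals


print(summarise(group_a))
print("Welch t statistic:", round(welch_t(group_a, group_b), 3))

hours = [1, 2, 3, 4, 5, 6]  # hypothetical predictor
slope, intercept, residuals = fit_line(hours, group_b)
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("largest absolute residual:", round(max(abs(r) for r in residuals), 3))
```

Turning the t statistic into a p-value requires the t distribution, which the standard library does not provide; that step is left out here rather than approximated.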
Preferred basic knowledge
NIL
Assessments
Quizzes/Tests: 60% (3 quizzes, 20% each)
Project/Group Project: 30%
Class Participation: 10%
Prerequisites and preclusions (for NUS students)
NIL
(Can be read together with DSA2362)
Decision Trees for Machine Learning and Data Analysis
DSA2362
Decision Trees for Machine Learning and Data Analysis (2 units)
Instructor
Prof Loh Wei-Yin
Course summary
Decision tree methods predict the value of a target variable by learning simple decision rules from the data. In this course, participants will learn decision tree methods and how to use software to build predictive models and score variables by importance. They will use real data to compare the strengths and weaknesses of decision tree models with those of models obtained by linear regression, logistic regression and discriminant analysis. Participants will also learn how to handle data with missing values without requiring prior imputation. Possible applications include economic surveys, credit card data, vehicle crash test data and precision medicine.
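To make "learning simple decision rules from the data" concrete, here is a minimal sketch of a one-split tree (a decision stump) that picks a threshold on a single numeric variable by minimising Gini impurity. It is an illustration only: it is written in Python rather than the R used in the course, it is not the GUIDE algorithm or any method confirmed to be taught, and the data and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return the threshold on one numeric variable with the lowest weighted Gini impurity.

    Assumes the variable takes at least two distinct values.
    """
    best_threshold, best_score = None, float("inf")
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


def fit_stump(values, labels):
    """Fit a one-split tree: a threshold plus a majority label on each side."""
    threshold, _ = best_split(values, labels)
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    return {
        "threshold": threshold,
        "left": Counter(left).most_common(1)[0][0],
        "right": Counter(right).most_common(1)[0][0],
    }


def predict(stump, value):
    """Apply the learned rule: 'if value <= threshold then left label else right label'."""
    return stump["left"] if value <= stump["threshold"] else stump["right"]


# Hypothetical credit data, invented for illustration: income (thousands) and default status.
income = [18, 22, 25, 31, 40, 47, 52, 60]
default = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

stump = fit_stump(income, default)
print(stump)
print(predict(stump, 28), predict(stump, 55))
```

A full decision tree applies the same search recursively to each side of the split and across all candidate variables; handling missing values and scoring variable importance need further machinery that this sketch does not attempt.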
Preferred basic knowledge
- Enrolled students should bring a laptop to use during class.
- Those without prior experience using R (writing functions, installing and using packages) will be asked to take a few online DataCamp R courses to pick up these skills.
Assessments
Class Participation: 10%
Essay: 15%
Quizzes/Tests: 75% (3 tests, 25% each)
Prerequisites and preclusions (for NUS students)
Prerequisite: DSA1361 or department approval (Can be read together with DSA1361)
Instructor’s Profile
Prof Loh Wei-Yin has BSc. (Hons.) and MSc. degrees in mathematics from the University of Singapore and a PhD in statistics from the University of California, Berkeley, and is currently Professor of Statistics at the University of Wisconsin, Madison. He has been developing algorithms for classification and regression trees for thirty-five years and is the author of the GUIDE algorithm (www.stat.wisc.edu/~loh/guide.html). He has taught short and semester-long courses on the subject in the U.S., Hong Kong, Taiwan, South Korea, Malaysia, and Singapore. Professor Loh is a fellow of the American Statistical Association and the Institute of Mathematical Statistics and a consultant to government and industry. He is a recipient of the Benjamin Reynolds Award for teaching, the U.S. Army Wilks Award for statistics research and application, and an Outstanding Science Alumni Award from the National University of Singapore.
Note:
- Courses are taught on a graded basis and grade(s) will be reflected on your transcript.
- Courses listed above are correct at the time of update, but may be subject to changes.
- There may be further updates to the curriculum of the courses listed.
- The Faculty of Science reserves the right to cancel any course if there is insufficient enrolment to start a class.