Data Science
A Structured Individual Major
Directors: Eni Mustafaraj (CS), Casey Pattanayak (MATH/QR), Wendy Wang (MATH)
The Data Science major is a structured individual major, consisting of twelve (12) courses that include a concentration area, plus a capstone experience. Students are expected to design their major and concentration in consultation with one of the directors listed above and a second advisor from a department related to the concentration. At least two (2) courses must be at the 300 level, and at least one of these must be from STAT or CS as opposed to the concentration. A student can begin the major requirements in the first or second year. She may take MATH 115 and/or MATH 116 in her first year as prerequisites for MATH 205, if needed. Ordinarily, at least statistical modeling, data structures, and two 300-level courses must be taken at Wellesley.
The structured individual major in Data Science is large and comprehensive. Students interested in pursuing this major along with another major or minor should consult closely with both the Data Science advisors and the other department. In particular, students should not major in Data Science and minor in statistics or computer science.
Goals of the major: Data Science lies at the intersection of computer science, mathematics, and statistics. A student pursuing a structured individual major in Data Science will develop a strong foundation in all three areas and complete coursework that emphasizes the integration of the three. By completing a concentration in an applied or theoretical field connected to data analysis, students will learn how data-driven knowledge is produced in that field, gain exposure to its foundations and language, and build the perspective needed to work on field-specific data problems. The capstone will ensure that students experience the challenges of Data Science research. Students will graduate with the critical thinking needed to pose and refine questions that can be answered with data in an ethical way, the statistical skills needed to draw meaning from data appropriately, the computational skills needed to tackle practical data challenges, and the ability to collaborate, communicate, and critique in the context of modern data.
Major requirements:

Six (6) foundational courses:

Introductory Statistics: Any one of STAT 101, STAT 218, BISC 198, ECON 103, POL 299, PSYC 205, QR 180, or SOC 190

Statistical Modeling: Either QR/STAT 260 or STAT 318 (Students may take both modeling courses and count the second as an elective.)

Introduction to Programming: CS 111

Data Structures: CS 230 (requires CS 111)

Multivariable Calculus: MATH 205 (requires MATH 116)

Linear Algebra: MATH 206 (requires MATH 205)
If a student places out of any foundational course, substitutes the Quantitative Analysis Institute (QAI) Summer Course for the modeling requirement, or enrolls in STAT 260 or the QAI Summer Course without first taking introductory statistics, she must choose an additional elective from the list below so that the major still totals twelve (12) courses.

Three (3) electives, including at least one from statistics and at least one from computer science, usually chosen from the following list:

CS 232: Artificial Intelligence

CS 234: Data, Analytics, and Visualization

CS 304: Databases with Web Interfaces

CS 305: Machine Learning

CS 315: Data and Text Mining for the Web

CS 313: Computational Biology

CS 342: Computer Security and Privacy

CS 343: Distributed Computing

STAT 220: Probability

STAT 221: Statistical Inference

STAT 228: Multivariate Data Analysis

STAT/QR 260: Applied Data Analysis

STAT/QR 309: Causal Inference

STAT 318: Regression Analysis and Statistical Models

ECON 203: Econometrics (for students with concentrations related to economics)

This list of electives is not exhaustive, and many other courses in the CS and MATH/STAT curricula or potentially other departments can be appropriate substitutes. We strongly encourage students to talk to the program directors about their interests and learning goals in order to select the most relevant courses for them.

Three (3) electives in an area of concentration, including at least one at the 200 or 300 level. Possible concentrations include but are not limited to digital humanities, social justice, data journalism, economics, education, global ecology, molecular bioinformatics, psychology, mathematical/statistical theory, and computer science/data engineering.

Students are expected to complete an experiential capstone as part of the Data Science major. The capstone must be approved by the program directors and may include: a thesis or other independent project; a Quantitative Analysis Institute internship; a research assistantship; or another internship or data consulting experience on or off campus, during the semester, Wintersession, or summer. Students are encouraged to present their work at a conference or poster session.
Honors: A student may achieve honors by writing a thesis if her GPA in major courses above the 100 level meets the college's requirements. See Academic Distinctions.
Example Concentrations and Course Sequences
We have mapped out six possible sequences. Note that concentrations are not limited to these examples, and you are not restricted to the courses listed below if you choose one of these example concentrations.
Data Science with a Concentration in Social Justice
The concentration in social justice might also include courses from departments other than PEAC, such as economics or sociology, for students with appropriate preparation.
(MATH 115 Calculus I) 
(MATH 116 Calculus II) 
CS 111 Intro to Programming MATH 205 Multivariable Calculus 
STAT 218 Intro Statistics PEAC 104 Intro to Study of Conflict, Justice, and Peace 
STAT 260 Applied Data Analysis CS 230 Data Structures 
MATH 206 Linear Algebra PEAC 204 Conflict Transformation 
CS 234 Data, Analytics, and Visualization STAT 318 Regression Analysis 
PEAC 358 Palestinian-Israeli Peace Prospects STAT 309 Causal Inference 
Example Experiential Capstone: Wintersession internship at an NGO, helping to run a study to assess a program’s effectiveness. 
Data Science with a Concentration in Digital Humanities
Instead of ANTH/CLCV 215 or ANTH 246, students might choose to spend a summer taking CLCV/MAS 220 Digital Archaeology in Greece. Students with strong French or Spanish backgrounds might propose sequences that include digital humanities courses taught in those languages.
(MATH 115 Calculus I) 
(MATH 116 Calculus II) 
MATH 205 Multivariable Calculus STAT 101 Reasoning with Data 
CS 111 Intro to Programming MATH 206 Linear Algebra 
STAT 260 Applied Data Analysis ANTH/CLCV 103 Intro to Archaeology 
CS 230 Data Structures ANTH/CLCV 215 Bronze Age Greece: Archaeology and the Digital Humanities 
CS 305 Machine Learning ANTH 246 From Glyphs to Bytes: Ancient Egypt and the Future of Digital Humanities 
CS 315 Data and Text Mining for the Web STAT 228 Multivariate Data Analysis 
Example Experiential Capstone: One-semester QAI internship focused on the analysis of ancient texts, examining patterns in words and symbols to determine how similar texts from one time period are to those from another. 
Data Science with a Concentration in Computer Science / Data Engineering
(MATH 115 Calculus I) 
(MATH 116 Calculus II) 
CS 111 Intro to Programming MATH 205 Multivariable Calculus 
CS 230 Data Structures STAT 218 Intro to Statistics 
CS 234 Data, Analytics, and Visualization MATH 206 Linear Algebra 
CS 304 Databases MATH/STAT 220 Probability 
CS 240 Computer Systems STAT 318 Regression Analysis 
CS 231 Algorithms CS 305 Machine Learning 
Example Experiential Capstone: QAI consulting internship, one semester, providing advice to faculty, staff, and students on projects from all across the college. We are providing this example to point out that the experiential capstone does not necessarily have to be tied to the concentration area. 
How is this different from the CS major?

It requires 3 to 7 (typically 5 or 6) MATH/STAT courses (the CS major requires only two).

It specifies 200- and 300-level electives from a subset of courses focused on data science. CS majors can choose from a larger set of electives (HCI, graphics, systems, etc.).

It drops two of the CS core components (CS 251 and CS 235), so that they are among the pool of possible concentration courses, but not required.
Data Science with a Concentration in Cognitive and Behavioral Science
The Cognitive and Behavioral Science concentration offers substantial flexibility. As an illustration of the breadth of options, two example pathways are highlighted below: a clinical psychology concentration and a cognitive neuroscience concentration. PSYC 205 would count as the introductory statistics course rather than as part of the concentration.
(MATH 115 Calculus I) 
(MATH 116 Calculus II) 
PSYC 101 Intro to Psychology OR NEUR 100 Intro to Neuroscience MATH 205 Multivariable Calculus 
PSYC 205 Statistics CS 111 Intro to Programming 
PSYC 213 Abnormal Psychology OR PSYC 218 Sensation & Perception STAT 260 Applied Data Analysis 
MATH 206 Linear Algebra STAT 309 Causal Inference 
CS 230 Data Structures STAT 228 Multivariate Data Analysis 
PSYC 333 Clinical and Educational Assessment OR PSYC 314R Research Methods in Cognitive Psychology CS 304 Databases 
Example Experiential Capstone: Science Summer Research Program project focused on visualization tools in psychology. 
Data Science with a Concentration in the Life Sciences (for example, Global Ecology or Molecular Bioinformatics)
The Life Sciences concentration should consist of either BISC 110/112 (Introductory Molecular & Cellular Biology) or BISC 111/113 (Introductory Organismal Biology), together with a 200-level lab course and a 300-level lab course in the same area. BISC 198 would count as the introductory statistics course rather than as part of the concentration. Two potential pathways are highlighted below: a global ecology concentration and a molecular bioinformatics concentration.
(MATH 115 Calculus I) BISC 111/113: Introductory Organismal Biology with Lab OR BISC 110/112: Introductory Molecular & Cellular Biology with Lab 
(MATH 116 Calculus II) 
MATH 205 Multivariable Calculus BISC 201 Ecology with Lab OR BISC 209 Microbiology with Lab 
CS 111 Intro to Programming BISC 198 Statistics in the Biosciences 
STAT 260 Applied Data Analysis CS 230 Data Structures 
MATH 206 Linear Algebra CS 234 Data, Analytics, and Visualization 
CS 313 Computational Biology BISC 307 Ecosystem Ecology with Lab OR BISC 333 Genomics and Bioinformatics with Lab 
STAT 228 Multivariate Data Analysis 
Example Experiential Capstone: Senior thesis that involves analyzing large biological data sets. 
Data Science with a Concentration in Economics
A concentration in Economics should include ECON 101, ECON 102, and a 200-level elective that includes discussion of empirical work. ECON 103 would count as the introductory statistics course rather than as part of the concentration. For students pursuing concentrations related to economics, ECON 203 could be one of the three electives rather than part of the concentration.
(MATH 115 Calculus I) 
(MATH 116 Calculus II) 
MATH 205 Multivariable Calculus ECON 101 Principles of Microeconomics 
ECON 102 Principles of Macroeconomics CS 111 Intro to Programming 
ECON 103 Introduction to Probability and Statistics Methods CS 230 Data Structures 
MATH 206 Linear Algebra ECON 203 Econometrics 
STAT 260 Applied Data Analysis ECON 229 Women in the Economy 
STAT 309 Causal Inference CS 315 Data and Text Mining for the Web 
Example Experiential Capstone: Summer internship at the Federal Reserve, Division of Research and Statistics. 