Major

Data Science

A Structured Individual Major

Directors: Eni Mustafaraj (CS), Casey Pattanayak (MATH/QR), Wendy Wang (MATH)

The Data Science major is a structured individual major, consisting of twelve (12) courses that include a concentration area, plus a capstone experience. Students are expected to design their major and concentration in consultation with one of the directors listed above and a second advisor from a department related to the concentration. At least two (2) courses must be at the 300-level, and at least one of these must be from STAT or CS as opposed to the concentration. A student can begin the major requirements in the first or second year. She can take MATH 115 and/or MATH 116 in her first year as prerequisites for MATH 205, if needed. Ordinarily, at least statistical modeling, data structures, and two 300-level courses must be taken at Wellesley.

 

The structured individual major in Data Science is large and comprehensive. Students interested in pursuing this major along with another major or minor should consult closely with both the Data Science advisors and the other department. In particular, students should not major in Data Science and minor in statistics or computer science.

 

Goals of the major: Data Science lies at the intersection of computer science, mathematics, and statistics. A student pursuing a structured individual major in Data Science will develop a strong foundation in all three areas and complete coursework that emphasizes the integration of the three. By completing a concentration in an applied or theoretical field connected to data analysis, students will learn how data-driven knowledge is produced in that field, gain exposure to its foundations and language, and build the perspective needed to work on field-specific data problems. The capstone will ensure that students experience the challenges of Data Science research. Students will graduate with the critical thinking needed to pose and refine questions that can be answered with data in an ethical way, the statistical skills needed to draw meaning from data appropriately, the computational skills needed to tackle practical data challenges, and the ability to collaborate, communicate, and critique in the context of modern data.

 

Major requirements:

 

  1. Six (6) foundational courses:

 

  1. Introductory Statistics: Any one of STAT 101, STAT 218, BISC 198, ECON 103, POL 299, PSYC 205, QR 180, or SOC 190

  2. Statistical Modeling: Either QR/STAT 260 or STAT 318 (Students may take both modeling courses and count the second as an elective.)

  3. Introduction to Programming: CS 111

  4. Data Structures: CS 230 (requires CS 111)

  5. Multivariable Calculus: MATH 205 (requires MATH 116)

  6. Linear Algebra: MATH 206 (requires MATH 205)

    If a student places out of any foundational course, or substitutes the Quantitative Analysis Institute Summer Course for the modeling requirement, or enrolls in STAT 260 or the QAI Summer Course without first taking introductory statistics, she must choose an additional elective, as listed in (2), so that the total number of courses for the major is twelve (12).

 

  2. Three (3) electives, including at least one from statistics and at least one from computer science, usually chosen from the following list:

    1. CS 232: Artificial Intelligence

    2. CS 234: Data, Analytics, and Visualization

    3. CS 304: Databases with Web Interfaces

    4. CS 305: Machine Learning

    5. CS 315: Data and Text Mining for the Web

    6. CS 313: Computational Biology

    7. CS 342: Computer Security and Privacy

    8. CS 343: Distributed Computing

    9. STAT 220: Probability

    10. STAT 221: Statistical Inference

    11. STAT 228: Multivariate Data Analysis

    12. STAT/QR 260: Applied Data Analysis

    13. STAT/QR 309: Causal Inference

    14. STAT 318: Regression Analysis and Statistical Models

    15. ECON 203: Econometrics (for students with concentrations related to economics)

 

    This list of electives is not exhaustive, and many other courses in the CS and MATH/STAT curricula or potentially other departments can be appropriate substitutes. We strongly encourage students to talk to the program directors about their interests and learning goals in order to select the most relevant courses for them.

 

  3. Three (3) electives in an area of concentration, including at least one at the 200- or 300-level. Possible concentrations include but are not limited to digital humanities, social justice, data journalism, economics, education, global ecology, molecular bioinformatics, psychology, mathematical/statistical theory, and computer science/data engineering.

 

  4. Students are expected to complete an experiential capstone as part of the Data Science major. The capstone must be approved by the program directors and may include: a thesis or other independent project; a Quantitative Analysis Institute internship; a research assistantship; or another internship or data consulting experience on or off campus, during the semester, wintersession, or summer. Students are encouraged to present their work at a conference or poster session.

 

Honors: A student may achieve honors by writing a thesis, if her GPA in major courses above the 100 level meets the college’s requirements. See Academic Distinctions.

 

Example Concentrations and Course Sequences

 

We have mapped out six possible sequences. Note that concentrations are not limited to these examples, and you are not restricted to the courses listed below if you choose one of these example concentrations.

 

Data Science with a Concentration in Social Justice

 

The concentration in social justice might also include courses from departments other than PEAC, such as economics or sociology, for students with appropriate preparation.

 

(MATH 115 Calculus I)

(MATH 116 Calculus II)

CS 111 Intro to Programming

MATH 205 Multivariable Calculus

STAT 218 Intro Statistics

PEAC 104 Intro to Study of Conflict, Justice, and Peace

STAT 260 Applied Data Analysis

CS 230 Data Structures

MATH 206 Linear Algebra

PEAC 204 Conflict Transformation

CS 234 Data, Analytics, and Visualization

STAT 318 Regression Analysis

PEAC 358 Palestinian Israeli Peace Prospects

STAT 309 Causal Inference

Example Experiential Capstone: Wintersession internship at an NGO, helping to run a study to assess a program’s effectiveness.

 

 

Data Science with a Concentration in Digital Humanities

 

Instead of ANTH/CLCV 215 or ANTH 246, students might choose to spend a summer taking CLCV/MAS 220 Digital Archaeology in Greece. Students with strong French or Spanish backgrounds might propose sequences that include digital humanities courses taught in those languages.

 

(MATH 115 Calculus I)

(MATH 116 Calculus II)

MATH 205 Multivariable Calculus

STAT 101 Reasoning with Data

CS 111 Intro to Programming

MATH 206 Linear Algebra

STAT 260 Applied Data Analysis

ANTH/CLCV 103 Intro to Archaeology

CS 230 Data Structures

ANTH/CLCV 215 Bronze Age Greece: Archaeology and the Digital Humanities

CS 305 Machine Learning

ANTH 246 From Glyphs to Bytes: Ancient Egypt and the Future of Digital Humanities

CS 315 Data and Text Mining for the Web

STAT 228 Multivariate Data Analysis

Example Experiential Capstone: One-semester QAI internship focused on analysis of ancient texts, examining patterns in words and symbols to determine the similarity of texts from one time period versus another.

 

 

Data Science with a Concentration in Computer Science / Data Engineering

 

(MATH 115 Calculus I)

(MATH 116 Calculus II)

CS 111 Intro to Programming

MATH 205 Multivariable Calculus

CS 230 Data Structures

STAT 218 Intro to Statistics

CS 234 Data, Analytics, and Visualization

MATH 206 Linear Algebra

CS 304 Databases

MATH/STAT 220 Probability

CS 240 Computer Systems

STAT 318 Regression Analysis

CS 231 Algorithms

CS 305 Machine Learning

Example Experiential Capstone: One-semester QAI consulting internship, providing advice to faculty, staff, and students on projects from across the college. This example shows that the experiential capstone does not have to be tied to the concentration area.

 

 

How is this different from the CS major?

  • It requires 3 to 7 (typically 5 or 6) MATH/STAT courses (the CS major requires only two).

  • It specifies 200- and 300-level electives from a subset of courses with a focus on data science. CS majors can choose from a larger set of electives (HCI, graphics, systems, etc.).

  • It does not require two of the CS core courses (CS 251 and CS 235); they remain in the pool of possible concentration courses.

 

Data Science with a Concentration in Cognitive and Behavioral Science

 

The Cognitive and Behavioral Science concentration offers substantial flexibility. As an illustration of the breadth of options, two example pathways are highlighted below: a clinical psychology concentration and a cognitive neuroscience concentration. PSYC 205 would count as the introductory statistics course rather than as part of the concentration.

 

(MATH 115 Calculus I)

(MATH 116 Calculus II)

PSYC 101 Intro to Psychology OR

NEUR 100 Intro to Neuroscience

MATH 205 Multivariable Calculus

PSYC 205 Statistics

CS 111 Intro to Programming

PSYC 213 Abnormal Psychology OR

PSYC 218 Sensation & Perception

STAT 260 Applied Data Analysis

MATH 206 Linear Algebra

STAT 309 Causal Inference

CS 230 Data Structures

STAT 228 Multivariate Data Analysis

PSYC 333 Clinical and Educational Assessment OR

PSYC 314R Research Methods in Cognitive Psychology

CS 304 Databases

Example Experiential Capstone: Science Summer Research Program project focused on visualization tools in psychology.

 

 

Data Science with a Concentration in the Life Sciences (for example, Global Ecology or Molecular Bioinformatics)

 

The Life Sciences concentration should consist of either BISC 110/112 (Introductory Molecular & Cellular Biology) or BISC 111/113 (Introductory Organismal Biology) with a 200-level lab course and a 300-level lab course in the same area. BISC 198 would count as the introductory statistics course rather than as part of the concentration. Two potential pathways are highlighted below: a global ecology concentration and a molecular bioinformatics concentration.

 

(MATH 115 Calculus I)

BISC 111/113: Introductory Organismal Biology with Lab OR

BISC 110/112: Introductory Molecular & Cellular Biology with Lab

(MATH 116 Calculus II)

MATH 205 Multivariable Calculus

BISC 201 Ecology with Lab OR

BISC 209 Microbiology with Lab

CS 111 Intro to Programming

BISC 198 Statistics in the Biosciences

STAT 260 Applied Data Analysis

CS 230 Data Structures

MATH 206 Linear Algebra

CS 234 Data, Analytics, and Visualization

CS 313 Computational Biology

BISC 307 Ecosystem Ecology with Lab OR

BISC 333 Genomics and Bioinformatics with Lab

STAT 228 Multivariate Data Analysis

Example Experiential Capstone: Senior thesis that involves analyzing large biological data sets.

 

 

Data Science with a Concentration in Economics

 

A concentration in Economics should include ECON 101, ECON 102, and a 200-level elective that includes discussion of empirical work. ECON 103 would count as the introductory statistics course, rather than part of the concentration. For students pursuing concentrations related to economics, ECON 203 could be one of the three electives, rather than part of the concentration.

 

(MATH 115 Calculus I)

(MATH 116 Calculus II)

MATH 205 Multivariable Calculus

ECON 101 Principles of Microeconomics

ECON 102 Principles of Macroeconomics

CS 111 Intro to Programming

ECON 103 Introduction to Probability and Statistical Methods

CS 230 Data Structures

MATH 206 Linear Algebra

ECON 203 Econometrics

STAT 260 Applied Data Analysis

ECON 229 Women in the Economy

STAT 309 Causal Inference

CS 315 Data and Text Mining for the Web

Example Experiential Capstone: Summer internship at the Federal Reserve, Division of Research and Statistics.