By its nature, data science uses ideas and methodologies from computer science and statistics, along with field-specific knowledge, to describe, learn and predict. Recently, storytelling has been highlighted as an important extension of more traditional data science skills such as coding and modeling. Three courses in our new Master in Data Science and Analytic Storytelling program were designed to include interdisciplinary modules, mainly taught by faculty in storytelling-related disciplines, such as Communication and Art & Design. These courses were PDAT 622: Narrative, Argument, and Persuasion in Data Science; PDAT 624: Principles of Design in Data Visualization; and PDAT 625: Big Data Ethics and Security.
Our first cohort serves as a natural case study: we reflect on our course materials and analyze an informal student survey to explore the effects of interdisciplinarity in these novel courses. The survey results show that students generally found value in the interdisciplinary course components, especially in the course “signature assignments,” which allow students to actively engage with course content while reinforcing technical skills from previous courses. Examples of these signature assignments are presented in this paper’s supplementary materials.
This paper proposes a nonuniform subsampling method for finite mixtures of regression models to reduce the computational burden of fitting them to large data sets. A general estimator based on a subsample is investigated, and its asymptotic normality is established. We assign to the data points the optimal subsampling probabilities, that is, those that minimize the asymptotic mean squared errors of the general estimator and of linearly transformed estimators. Since these probabilities depend on unknown parameters, an implementable algorithm is developed. We first approximate the optimal subsampling probabilities using a pilot sample. We then select a subsample using the approximated probabilities and compute estimates from that subsample. We evaluate the proposed method in a simulation study and present a real data example using appliance energy data.
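The pilot-then-subsample procedure described above can be illustrated with a minimal sketch. It is not the paper's method. It swaps in ordinary least squares for the finite mixture of regressions, and it uses an assumed score (absolute residual times the norm of the covariate vector) in place of the paper's optimal probabilities, which are not given in the abstract. The function name `two_step_subsample_ols`, the `mix` parameter (a blend with uniform probabilities to keep the inverse-probability weights bounded), and the simulated data are all hypothetical.

```python
import numpy as np


def two_step_subsample_ols(X, y, pilot_size, sub_size, rng, mix=0.1):
    """Pilot-based nonuniform subsampling for OLS (illustrative stand-in only)."""
    n = X.shape[0]

    # Step 1: draw a uniform pilot sample and compute a pilot estimate.
    pilot_idx = rng.choice(n, size=pilot_size, replace=True)
    beta_pilot, *_ = np.linalg.lstsq(X[pilot_idx], y[pilot_idx], rcond=None)

    # Step 2: approximate nonuniform probabilities from the pilot estimate.
    # Assumed score: |residual| * ||x||, the per-point gradient norm for OLS.
    scores = np.abs(y - X @ beta_pilot) * np.linalg.norm(X, axis=1)
    probs = scores / scores.sum()
    # Assumption: blend with uniform probabilities so no weight explodes.
    probs = (1.0 - mix) * probs + mix / n

    # Step 3: draw the subsample with replacement and fit a least-squares
    # model weighted by the inverse sampling probabilities.
    idx = rng.choice(n, size=sub_size, replace=True, p=probs)
    sqrt_w = np.sqrt(1.0 / probs[idx])
    beta_hat, *_ = np.linalg.lstsq(
        X[idx] * sqrt_w[:, None], y[idx] * sqrt_w, rcond=None
    )
    return beta_hat, probs


# Simulated data (hypothetical): intercept plus two covariates.
rng = np.random.default_rng(0)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

beta_hat, probs = two_step_subsample_ols(
    X, y, pilot_size=500, sub_size=2_000, rng=rng
)
print("subsample estimate:", beta_hat)
```

Weighting each selected point by the inverse of its sampling probability offsets the bias that nonuniform selection would otherwise introduce. In the mixture setting, the pilot and final fits would presumably be replaced by a mixture-model estimator, and the score by the probabilities derived in the paper.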