Students will demonstrate competency in research:
- Critically evaluate the published scholarly record.
- Critically apply the theories and methodologies of data science to new research in their primary area of study.
- Apply appropriate principles, frameworks, and models to evaluate and interpret the frontiers of knowledge in their primary area of study.
- Demonstrate expository and oral communication skills appropriate to the Ph.D. level by publishing and presenting work in their field.
- Critique data practices for ethical issues, including discriminatory practices, power imbalances, and invasions of privacy.
- Demonstrate advanced competency in data science tools and techniques, applied statistical analysis, and a domain relevant to their area of specialization.
- Develop a record of relevant scholarship.
- Demonstrate an ability to conduct independent, original research with a depth of knowledge in the chosen area of specialization.
Students will demonstrate competency in data analytics:
- Design and execute ethical research using quantitative and experimental methods.
- Organize, visualize, and analyze large, complex datasets using descriptive statistics and graphs to make decisions.
- Apply inferential statistics, predictive analytics, and data mining to informatics-related fields.
- Analyze datasets with supervised learning methods for function approximation, classification, and forecasting, and with unsupervised learning methods for dimensionality reduction and clustering.
- Identify, assess, and select appropriately among data analytics methods and models for solving a particular real-world problem, weighing their advantages and disadvantages.
- Write programs to perform data analytics on large, complex datasets.
Students will demonstrate competency in data management and infrastructure:
- Design and implement relational databases using commercial database management systems according to database concepts and theory.
- Diagram a relational database design based on an identified scenario.
- Produce database queries using SQL.
- Perform database administration tasks.
- Describe the data management activities associated with the data lifecycle.
- Overcome difficulties in managing very large datasets, both structured and unstructured, using nonrelational data storage and retrieval (NoSQL), parallel algorithms, and cloud computing.
- Apply the MapReduce programming model to data-driven discovery and scalable data processing for scientific applications.
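
As an illustration of the MapReduce programming model named in the final outcome, the following is a minimal single-process sketch in Python. It is not part of the program's curriculum materials; the function names (`map_words`, `shuffle`, `reduce_counts`, `map_reduce`) and the sample records are hypothetical, and a real deployment would run the map and reduce steps in parallel on a distributed framework rather than in one process.

```python
from collections import defaultdict


def map_words(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce sequentially over an iterable of records."""
    mapped = [pair for record in records for pair in map_words(record)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Hypothetical sample input: each string stands in for one record of a dataset.
records = ["big data big ideas", "data drives discovery"]
counts = map_reduce(records)
print(counts)
```

Because each record is mapped independently and each key is reduced independently, both steps can in principle be distributed across many machines, which is what makes the model suitable for scalable data processing.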