Introduction to Data Science

Catalog of Institut Mines-Télécom Business School courses

Code

MUFF MIS 3002

Level

L3

Field

Information Systems

Language

French

ECTS Credits

3

Class hours

18

Total student load

60

Program Manager(s)

Department

  • Data analytics, Économie et Finances

Educational team

Introduction to the module

This course aims to provide students with the essential foundations of data science, or data analysis, through the use of RStudio (R programming language). It seeks to develop their ability to manipulate, structure, analyze, and interpret datasets in order to transform raw data into meaningful information for decision-making. By the end of the course, students will be able to conduct data analysis independently, produce rigorous statistical results, and present them in a clear and structured manner. Particular attention will be paid to understanding the underlying methods, interpreting results accurately, and developing a critical perspective on the analyses performed.

Learning objectives/Intended learning outcomes

  • 1 - Master advanced and specialised uses of digital intelligence tools, ensuring their sustainable and responsible impact
  • 1.1 - Audit advanced and specialised uses of digital intelligence tools in order to deploy them appropriately, taking into account the strategic context of organisations.
  • 1.2 - Use digital intelligence tools efficiently to support the societal, digital, energy and environmental transformations of organisations, ensuring their sustainable and responsible impact.

Rubrics

- Data analysis
- Data visualization
- Data-driven decision-making

Content: structure and schedule

The course addresses six main topics:

1. Installation and introduction to RStudio and the R language: installation of R and RStudio, importing and exporting datasets, creating and modifying variables, managing and saving datasets, and merging datasets.

2. Descriptive statistics and data visualization: simple and cross-tabulated frequency tables, descriptive statistics for quantitative variables (mean, standard deviation, minimum, maximum, skewness, kurtosis, etc.), and the creation of graphs (scatter plots, histograms, box plots, etc.).

3. Introduction to linear regression: basic principles, interpretation of coefficients and results, and verification of the assumptions required to ensure the validity of the model.

4. Data exploration and preparation: practical application of previously covered concepts to conduct a complete linear regression analysis (data exploration, cleaning, preparation, estimation, and interpretation of results).

5. Detection and treatment of common issues in linear regression: identification, using statistical and graphical methods, of potential problems such as non-linearity, non-normality, multicollinearity, or the presence of outliers. Presentation of possible solutions to address these issues.

6. Discrete dependent variable models: presentation of Probit and Logit models (principles and interpretation), identification of the main potential issues (collinearity, model specification, etc.), and estimation of a Probit model.
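For orientation, the models named in topics 3 and 6 are commonly written as follows. This is standard textbook notation, not taken from the course materials; the symbols y_i, x_{ij}, \beta_j, u_i, \Phi and the number of regressors k are assumptions introduced here for illustration.

```latex
% Linear regression (topic 3): outcome y_i, k regressors, error term u_i
y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i,
\qquad \mathbb{E}[u_i \mid x_{i1}, \dots, x_{ik}] = 0

% Binary outcome models (topic 6): y_i \in \{0, 1\}
% Probit: the link is the standard normal CDF \Phi
\Pr(y_i = 1 \mid x_i) = \Phi(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})

% Logit: the link is the logistic CDF
\Pr(y_i = 1 \mid x_i) =
  \frac{\exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})}
       {1 + \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})}
```

In the linear model, each \beta_j is the change in the expected outcome for a one-unit change in x_{ij}, holding the other regressors fixed. In Probit and Logit models, the sign of a coefficient gives the direction of the effect, but the size of the effect on the probability depends on the values of the regressors.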

Sustainable Development Goals

SDG 8 "Decent work and economic growth": data analysis, or data science, is becoming an indispensable decision-support tool in a world that is digitalizing and generating ever more data. Mastery, or at least an understanding, of these tools is increasingly in demand on the labor market for carrying out the studies and analyses on which both companies and public bodies base their decisions.

Number of SDGs addressed among the 17

1

Learning delivery

Blended

Pedagogical methods

The course is delivered as practical sessions (labs). Students install RStudio on their computer or tablet and run the commands needed to analyze and process the data.

Evaluation and grading system and catch-up exams

Two documents are used for assessment: a written in-class exam (1h, 60% of the final grade) and a data analysis report (40% of the final grade).

The written exam requires students to write the code necessary to execute specific commands.

The data analysis report is completed in four stages:

- 1) Identification of a research question, dataset, and model to be estimated (5% of the final grade).

- 2) In addition to the elements in stage (1), submission of a first draft of the literature review, descriptive statistics, and model results (5% of the final grade).

- 3) In addition to the elements in stages (1) and (2), identification and treatment of common issues (linearity, normality, multicollinearity, etc.) and incorporation of these corrections into the final model (10% of the final grade).

- 4) Final version of the report including all elements indicated in stages (1), (2), and (3) in their completed and revised form (20% of the final grade).

For the catch-up exam, students sit a written exam (1h) in which they are required to write the code needed to execute specific commands.

Module Policies

Instructor–Student Communication

The instructor will contact students via their institutional email address (IMT-BS/TSP) and through the Moodle platform. No communication will take place via personal email addresses. It is the student’s responsibility to check their IMT-BS/TSP email account regularly.

Students may contact the instructor by sending an email to his institutional address. If necessary, they may meet him during office hours or by appointment.

Students Requiring Accommodations

If a student has a disability that prevents them from completing the work described or requires any form of accommodation, it is their responsibility to inform the Director of Studies (with supporting documentation) as soon as possible. Students are also encouraged to discuss the matter with their instructor.

Classroom Conduct

As a courtesy to the instructor and other students, all mobile phones, electronic games, or other sound-generating devices must be turned off during class.

Students must avoid any disruptive or disrespectful behavior, such as arriving late, leaving early, or engaging in inappropriate conduct (e.g., sleeping, reading unrelated material, using inappropriate language, excessive talking, eating, or drinking). A warning may be given for a first violation of these rules. Offenders will be penalized and may be asked to leave the classroom and/or face additional disciplinary action.

A delay of up to 5 minutes is tolerated. Attendance will be recorded on Moodle during these first 5 minutes via a QR code provided by the instructor at the beginning of each class.

Students must arrive on time for exams and other assessments. No one will be allowed to enter the classroom once the first student has finished the exam and left the room. There are absolutely no exceptions to this rule. No student may continue working once the allotted time has expired. No student may leave the room during an exam unless they have completed their work and submitted all documents.

In online classes, students must keep their camera on unless otherwise instructed by the instructor.

Code of Ethics

IMT-BS is committed to a policy of academic integrity. Any conduct that compromises this policy may result in academic and/or disciplinary sanctions. Students must refrain from cheating, lying, plagiarism, and theft. This means submitting original work and properly acknowledging any individuals whose ideas or printed materials (including those from the Internet) are paraphrased or directly quoted. Any student who violates or assists another student in violating the standards of academic conduct will be sanctioned in accordance with IMT-BS regulations.

Required Textbook and Suggested Readings

- Jeffrey Wooldridge, Introductory Econometrics: A Modern Approach, 3rd Edition, 2006
- William Greene, Econometric Analysis, 6th Edition, Prentice Hall, 2008
- Florian Heiss, Using R for Introductory Econometrics, 2016

Keywords

Data science, R language, descriptive statistics, visualization, linear regression, discrete dependent variable model

Prerequisites

Basic knowledge of statistics (variable, distribution of a variable, mean, standard deviation, etc.)