# 1 The Parisian housing market and data.gouv.fr

1. Using the open data from data.gouv.fr, plot the evolution of the average Parisian market price per transaction as a function of the year. Add confidence intervals. Note all the assumptions you make.
• The data can be found here: https://files.data.gouv.fr/geo-dvf/2022-06/csv/

• 💡 You may need functions such as rbind and group_by, as well as the lubridate package (for date and time management).

• 🧹 As you work, aim for clean code: for example, a header stating your assumptions, a section for raw data treatment, and a section for plots.

```r
# INTRODUCTION

## Assumptions and parameters for raw data management
VALEUR_FONCIERE_MAX <- 2000000 # upper bound on the transaction price
VALEUR_FONCIERE_MIN <- 50000   # lower bound on the transaction price

## Libraries
library(ggplot2)
library(dplyr)     # group_by
library(lubridate) # year() extracts the year from a date

# PLOTS
```
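A minimal sketch of the yearly average with confidence intervals, run on made-up data standing in for the downloaded files. The column names `date_mutation` and `valeur_fonciere` are assumed to match the DVF CSV headers, and the two toy data frames are purely illustrative. The interval uses a normal approximation (mean ± 1.96 standard errors), which is itself an assumption to note and is only reasonable with many transactions per year.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Assumption: price bounds used to discard implausible transactions.
VALEUR_FONCIERE_MAX <- 2000000
VALEUR_FONCIERE_MIN <- 50000

# Toy stand-ins for the yearly DVF files (made-up values).
dvf_2019 <- data.frame(
  date_mutation   = c("2019-02-11", "2019-06-03", "2019-10-21"),
  valeur_fonciere = c(320000, 450000, 2500000)
)
dvf_2020 <- data.frame(
  date_mutation   = c("2020-01-15", "2020-07-09", "2020-11-30"),
  valeur_fonciere = c(30000, 510000, 390000)
)

# Raw data treatment: stack the years, filter on price, extract the year.
dvf <- rbind(dvf_2019, dvf_2020) %>%
  filter(valeur_fonciere >= VALEUR_FONCIERE_MIN,
         valeur_fonciere <= VALEUR_FONCIERE_MAX) %>%
  mutate(year = year(ymd(date_mutation)))

# Mean price per year with a normal-approximation 95% confidence interval.
yearly <- dvf %>%
  group_by(year) %>%
  summarise(
    mean_price = mean(valeur_fonciere),
    se         = sd(valeur_fonciere) / sqrt(n()),
    n          = n(),
    .groups    = "drop"
  ) %>%
  mutate(ci_low  = mean_price - 1.96 * se,
         ci_high = mean_price + 1.96 * se)

# Plot: yearly mean with error bars for the interval.
p_yearly <- ggplot(yearly, aes(x = year, y = mean_price)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = ci_low, ymax = ci_high), width = 0.2) +
  labs(x = "Year", y = "Mean price per transaction")
```

With the real files, the two toy data frames would be replaced by one data frame per downloaded year, still combined with rbind.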
2. Plot a linear model of flat prices as a function of the apartment’s surface area. Repeat this, taking the effect of the year into account. Do you see a difference?
• For this question, there are several solutions. One is to use geom_smooth.
3. Now, can you count the number of transactions per month in 2020? Do you notice anything?
• For this question, you may need the functions group_by and n() to count the number of rows per group.
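For the linear-model question, one possible approach with geom_smooth is sketched below on made-up data. The column name `surface_reelle_bati` is assumed to match the DVF CSV header, and the `flats` data frame is hypothetical. Mapping the year to colour makes geom_smooth fit one line per year, which corresponds to the interaction model fitted with lm underneath.

```r
library(ggplot2)

# Toy data (made-up values): surface in square metres, price, and year.
flats <- data.frame(
  surface_reelle_bati = c(20, 35, 50, 65, 80, 25, 40, 55, 70, 90),
  valeur_fonciere     = c(200000, 340000, 480000, 640000, 790000,
                          270000, 420000, 590000, 750000, 960000),
  year                = factor(rep(c(2019, 2020), each = 5))
)

# One regression line for all years pooled together.
p_pooled <- ggplot(flats, aes(x = surface_reelle_bati, y = valeur_fonciere)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# One regression line per year, distinguished by colour.
p_by_year <- ggplot(flats, aes(x = surface_reelle_bati, y = valeur_fonciere,
                               colour = year)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# The same two models fitted explicitly, to inspect the coefficients.
fit_pooled  <- lm(valeur_fonciere ~ surface_reelle_bati, data = flats)
fit_by_year <- lm(valeur_fonciere ~ surface_reelle_bati * year, data = flats)
```

Comparing `coef(fit_pooled)` with `coef(fit_by_year)` shows whether the slope or intercept shifts between years.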

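For the monthly count in 2020, a minimal sketch with group_by and n() on made-up dates follows. The column name `date_mutation` is assumed to match the DVF CSV header, and each row is treated as one transaction, which is an assumption that may need checking against the real files.

```r
library(dplyr)
library(lubridate)

# Toy data (made-up dates), including one row outside 2020.
sales <- data.frame(
  date_mutation = as.Date(c("2019-12-30", "2020-01-10", "2020-01-25",
                            "2020-03-02", "2020-04-20", "2020-06-05",
                            "2020-06-18", "2020-06-29"))
)

# Keep 2020 only, then count the rows in each month.
monthly_2020 <- sales %>%
  filter(year(date_mutation) == 2020) %>%
  mutate(month = month(date_mutation)) %>%
  group_by(month) %>%
  summarise(n_transactions = n(), .groups = "drop")
```

Months with no rows do not appear in the result, so a gap in the table is itself informative.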
# 2 Building your own dashboard

Now, your boss asks you to propose an interface (or dashboard) for Parisian housing market prices. The goal is to let a user filter the previous plots by “arrondissement”.
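One possible starting point is sketched below with the shiny package, which is an assumption here since the exercise does not name a tool. The data are made up, `code_postal` is assumed to be the DVF column identifying the arrondissement, and `yearly_mean` is a hypothetical helper introduced only for this illustration.

```r
library(shiny)
library(dplyr)
library(ggplot2)

# Toy data (made-up values); code_postal stands in for the arrondissement.
dvf <- data.frame(
  code_postal     = c(75001, 75001, 75011, 75011, 75020, 75020),
  year            = c(2019, 2020, 2019, 2020, 2019, 2020),
  valeur_fonciere = c(600000, 640000, 420000, 450000, 350000, 380000)
)

# Hypothetical helper: mean price per year for one postal code.
yearly_mean <- function(data, postal_code) {
  data %>%
    filter(code_postal == postal_code) %>%
    group_by(year) %>%
    summarise(mean_price = mean(valeur_fonciere), .groups = "drop")
}

# User interface: a drop-down to pick the arrondissement, and a plot.
ui <- fluidPage(
  titlePanel("Parisian housing prices"),
  selectInput("arrondissement", "Arrondissement (postal code)",
              choices = sort(unique(dvf$code_postal))),
  plotOutput("price_plot")
)

# Server: redraw the plot whenever the selection changes.
server <- function(input, output, session) {
  output$price_plot <- renderPlot({
    # selectInput returns a character string, hence as.numeric().
    yearly_mean(dvf, as.numeric(input$arrondissement)) %>%
      ggplot(aes(x = year, y = mean_price)) +
      geom_line() +
      geom_point()
  })
}

# Build the app object; call runApp(app) in an interactive session to launch it.
app <- shinyApp(ui, server)
```

Keeping the filtering logic in a plain function makes it possible to check the numbers without starting the app.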

Some resources to build your own dashboard:

Some of you indicated in the form that you are interested in Twitter data. One of the first steps is to gather data. You can try to do so with rtweet, following this tutorial.