Commit e3e6d397 authored by François LAURENT

scikit-learn

parent 31e82fb1
%% Cell type:markdown id:b955dcc3 tags:
<img alt="https://allisonhorst.github.io/palmerpenguins/" src="https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/man/figures/lter_penguins.png" width=60% />
%% Cell type:code id:416c5291 tags:
``` python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
```
%% Cell type:code id:9d9ba356 tags:
``` python
penguins = pd.read_csv("https://github.com/allisonhorst/palmerpenguins/raw/5b5891f01b52ae26ad8cb9755ec93672f49328a8/data/penguins_size.csv")
penguins.head()
```
%%%% Output: execute_result
  species_short     island  culmen_length_mm  culmen_depth_mm  \
0        Adelie  Torgersen              39.1             18.7
1        Adelie  Torgersen              39.5             17.4
2        Adelie  Torgersen              40.3             18.0
3        Adelie  Torgersen               NaN              NaN
4        Adelie  Torgersen              36.7             19.3

   flipper_length_mm  body_mass_g     sex
0              181.0       3750.0    MALE
1              186.0       3800.0  FEMALE
2              195.0       3250.0  FEMALE
3                NaN          NaN     NaN
4              193.0       3450.0  FEMALE
%% Cell type:markdown id:e1a82ffe tags:
# PCA
## Q
Remove the rows with missing (`NaN`) values and get the sample size for each species.
%% Cell type:markdown id:623d14fd tags:
## A
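One possible approach, sketched on a small made-up frame (`toy`, with the same column names as the data set but illustrative values, not the real measurements): `dropna()` discards every row that contains a missing value, and `value_counts()` on `species_short` gives the sample size per species.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the `penguins` DataFrame (made-up values, same column names).
toy = pd.DataFrame({
    "species_short": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "island": ["Torgersen", "Torgersen", "Torgersen", "Biscoe", "Biscoe", "Dream"],
    "culmen_length_mm": [39.1, 39.5, np.nan, 46.1, 50.0, 46.5],
    "culmen_depth_mm": [18.7, 17.4, np.nan, 13.2, 16.3, 17.9],
    "flipper_length_mm": [181.0, 186.0, np.nan, 211.0, 230.0, 192.0],
    "body_mass_g": [3750.0, 3800.0, np.nan, 4500.0, 5700.0, 3500.0],
    "sex": ["MALE", "FEMALE", np.nan, "FEMALE", "MALE", "FEMALE"],
})

# Drop every row that has at least one missing value.
toy_clean = toy.dropna()

# Number of remaining individuals per species.
sample_sizes = toy_clean["species_short"].value_counts()
print(sample_sizes)
```

On the real data, the same two calls would apply to `penguins` directly.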
%% Cell type:code id:b2e86638 tags:
``` python
```
%% Cell type:markdown id:fc6e86fd tags:
## Q
Draw a seaborn `pairplot` in which each species is shown in a different color.
%% Cell type:markdown id:b0b71cf0 tags:
## A
%% Cell type:code id:82fef377 tags:
``` python
```
%% Cell type:markdown id:3ac3648c tags:
## Q
We will focus on the continuous variables only, and scale them.
%% Cell type:code id:1722dc49 tags:
``` python
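from sklearn.preprocessing import StandardScaler

# Keep the four continuous measurements only
# (assumes the rows with missing values were already removed from `penguins`).
penguin_data = penguins[[
    "culmen_length_mm",
    "culmen_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
]].values

# Standardize each variable to zero mean and unit variance.
scaled_penguin_data = StandardScaler().fit_transform(penguin_data)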
```
%% Cell type:markdown id:59d834f0 tags:
Perform a PCA on the scaled data, with all 4 components, and draw a scree plot to choose a number of principal components.
%% Cell type:markdown id:6aa45291 tags:
## A
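A hedged sketch of a scree plot, run on synthetic data (`toy_scaled`, a made-up stand-in for `scaled_penguin_data`): fit a 4-component PCA, then plot `explained_variance_ratio_` per component together with its cumulative sum. The plotting style is just one option.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Made-up, correlated 4-variable data standing in for the scaled measurements.
latent = rng.normal(size=(100, 2))
toy_data = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(100, 4))
toy_scaled = StandardScaler().fit_transform(toy_data)

# PCA with as many components as variables.
pca = PCA(n_components=4).fit(toy_scaled)
ratios = pca.explained_variance_ratio_

# Scree plot: variance explained by each component, plus the running total.
fig, ax = plt.subplots()
components = np.arange(1, len(ratios) + 1)
ax.plot(components, ratios, "o-", label="per component")
ax.plot(components, np.cumsum(ratios), "s--", label="cumulative")
ax.set_xticks(components)
ax.set_xlabel("principal component")
ax.set_ylabel("explained variance ratio")
ax.legend()
```

A common heuristic is to keep the components before the curve flattens (the "elbow"), or enough of them to reach a chosen cumulative share of the variance.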
%% Cell type:code id:6c71cd2f tags:
``` python
```
%% Cell type:markdown id:6fbc37aa tags:
## Q
Perform a new PCA with the number of principal components you chose, project the data onto the principal axes, and plot the projected data, showing each species in a different color (hint: look at [this plotting function](https://plotly.github.io/plotly.py-docs/generated/plotly.express.scatter_3d.html)).
%% Cell type:markdown id:4fe3999e tags:
## A
%% Cell type:code id:3782d970 tags:
``` python
```
%% Cell type:markdown id:10f75594 tags:
# UMAP
%% Cell type:markdown id:398dd715 tags:
## Q
Play around with UMAP on the scaled penguin data: embed it in 2D and see how the result changes with parameters such as `n_neighbors` and `min_dist`.
%% Cell type:markdown id:9f71db23 tags:
## A
%% Cell type:code id:fd18eba0 tags:
``` python
```