{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a8c0f52e",
   "metadata": {
    "tags": []
   },
   "source": [
    "# modifying and *slicing* dataframes\n",
    "\n",
    "In which we learn how to slice and modify parts of a dataframe"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f011bf85",
   "metadata": {},
   "source": [
    "In this notebook we look at how to slice `pandas` objects such as series and dataframes, and how to manipulate them. This is something you will do often with your tables: apply a function to a subset of your data."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a39d85b8",
   "metadata": {},
   "source": [
    "Let's import our libraries; we will then read a table of Titanic passengers to use as an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "17fb9a6a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09d90019",
   "metadata": {},
   "source": [
    "Let's read our Titanic dataframe, using the `PassengerId` column as the row index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "7733d3f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "file = 'titanic.csv'\n",
    "df = pd.read_csv(file, index_col='PassengerId')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2912a031",
   "metadata": {},
   "source": [
    "and, as in the previous notebook, we sort it by age so that the difference between the index (labels) and the indices (integer positions) is clearly visible"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "7d6df860",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Thomas, Master. Assad Alexander</td>\n",
       "      <td>male</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2625</td>\n",
       "      <td>8.5167</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>756</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Hamalainen, Master. Viljo</td>\n",
       "      <td>male</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>250649</td>\n",
       "      <td>14.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Baclini, Miss. Eugenie</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                             Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "804                 1       3  Thomas, Master. Assad Alexander    male  0.42   \n",
       "756                 1       2        Hamalainen, Master. Viljo    male  0.67   \n",
       "645                 1       3           Baclini, Miss. Eugenie  female  0.75   \n",
       "\n",
       "             SibSp  Parch  Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                \n",
       "804              0      1    2625   8.5167   NaN        C  \n",
       "756              1      1  250649  14.5000   NaN        S  \n",
       "645              2      1    2666  19.2583   NaN        C  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_values(by='Age', inplace=True)\n",
    "df.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5600e7c1",
   "metadata": {},
   "source": [
    "## copying a dataframe"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14f61801",
   "metadata": {},
   "source": [
    "One useful thing to know is how to copy a dataframe. To do so, use the `copy` method of `pandas.DataFrame`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "c1642aff",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Thomas, Master. Assad Alexander</td>\n",
       "      <td>male</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2625</td>\n",
       "      <td>8.5167</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>756</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Hamalainen, Master. Viljo</td>\n",
       "      <td>male</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>250649</td>\n",
       "      <td>14.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Baclini, Miss. Eugenie</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                             Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "804                 1       3  Thomas, Master. Assad Alexander    male  0.42   \n",
       "756                 1       2        Hamalainen, Master. Viljo    male  0.67   \n",
       "645                 1       3           Baclini, Miss. Eugenie  female  0.75   \n",
       "\n",
       "             SibSp  Parch  Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                \n",
       "804              0      1    2625   8.5167   NaN        C  \n",
       "756              1      1  250649  14.5000   NaN        S  \n",
       "645              2      1    2666  19.2583   NaN        C  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_copy = df.copy()\n",
    "df_copy.head(3)      # df_copy is a new dataframe, a twin of the original"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca7aea3f",
   "metadata": {},
   "source": [
    "`df_copy` is now a new dataframe with the same values as the original, but completely independent of it."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b6a182b",
   "metadata": {},
   "source": [
    "## creating a new column"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87bdd08d",
   "metadata": {},
   "source": [
    "It is often convenient to create a new column computed from existing columns.  \n",
    "In practice, operations on columns are the only ones that use the form `df[column_name]`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "cdf9cb65",
   "metadata": {},
   "outputs": [],
   "source": [
    "# pour créer une nouvelle colonne\n",
    "# par exemple ici je vais ajouter une colonne 'Deceased'\n",
    "# qui est simplement l'opposé de 'Survived'\n",
    "\n",
    "df['Deceased'] = 1 - df['Survived']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "5bfce3ba",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Deceased</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Thomas, Master. Assad Alexander</td>\n",
       "      <td>male</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2625</td>\n",
       "      <td>8.5167</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>756</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Hamalainen, Master. Viljo</td>\n",
       "      <td>male</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>250649</td>\n",
       "      <td>14.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Baclini, Miss. Eugenie</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                             Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "804                 1       3  Thomas, Master. Assad Alexander    male  0.42   \n",
       "756                 1       2        Hamalainen, Master. Viljo    male  0.67   \n",
       "645                 1       3           Baclini, Miss. Eugenie  female  0.75   \n",
       "\n",
       "             SibSp  Parch  Ticket     Fare Cabin Embarked  Deceased  \n",
       "PassengerId                                                          \n",
       "804              0      1    2625   8.5167   NaN        C         0  \n",
       "756              1      1  250649  14.5000   NaN        S         0  \n",
       "645              2      1    2666  19.2583   NaN        C         0  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf6e7b5d",
   "metadata": {},
   "source": [
    "## putting access to and modification of dataframe parts in context"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd738f24",
   "metadata": {},
   "source": [
    "To access or modify parts of a dataframe, you might be tempted to use the usual syntax for accessing the elements of an array by their indices, as you would in Python.\n",
    "\n",
    "For example, in Python:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "7cefdf70",
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Hello !', 56, 34]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "L = [-12, 56, 34]\n",
    "L[0] = \"Hello !\"\n",
    "L"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "c800a0e4",
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Hello !', 100, 200, 300, 34]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "L[1:2] = [100, 200, 300]\n",
    "L"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "49c9edfc",
   "metadata": {},
   "source": [
    "Or you might want to access an array with a pair of **indices**, as you would in `numpy`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "0e61df7d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[10000     1     2     3 10000]\n",
      " [    5     6     7     8     9]\n",
      " [   10    11   100    13    14]\n",
      " [   15    16    17    18    19]\n",
      " [10000    21    22    23 10000]]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array([10000,     1,     2,     3, 10000])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mat = np.arange(25).reshape((5, 5))   # create the 5x5 matrix of elements 0 to 24\n",
    "mat[2, 2] = 100                       # modify the element in the middle\n",
    "mat[::4, ::4] = 10000                 # modify the 4 corners (::4 = from start to end with a step of 4)\n",
    "print(mat)                            # display the matrix\n",
    "mat[0]                                # access its first row"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b0f676b",
   "metadata": {},
   "source": [
    "In `pandas`, however, things are quite different: as we have already seen, the library puts its effort into managing an index on the rows and on the columns.\n",
    "\n",
    "It favours locating the elements of a dataframe **by index** (column **names** and row **labels**), and **not** by **indices** as in Python and `numpy`.\n",
    "\n",
    "Why? Because if you are using `pandas`, it is because you need to view your data as a table with labels that index its rows and columns. If you do not need any particular index, then you are comfortable handling your data with indices (integers) alone, and in that case you may as well use a plain `numpy` array: you are not going to store a matrix in a dataframe! `numpy` and its row, column indices are enough."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d37fcfb9",
   "metadata": {},
   "source": [
    "Nevertheless, `pandas` offers fairly similar, and equally powerful, techniques, which we study in this notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fb3f6ec",
   "metadata": {},
   "source": [
    "## reminder: `loc` for atomic accesses"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91f33f01",
   "metadata": {},
   "source": [
    "as seen in the previous notebook, accesses to a pandas dataframe are made\n",
    "\n",
    "* most often by index rather than by indices\n",
    "* and in that case we use `df.loc` to access rows and cells"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "65a3ac06",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Deceased</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Thomas, Master. Assad Alexander</td>\n",
       "      <td>male</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2625</td>\n",
       "      <td>8.5167</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>756</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Hamalainen, Master. Viljo</td>\n",
       "      <td>male</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>250649</td>\n",
       "      <td>14.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Baclini, Miss. Eugenie</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                             Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "804                 1       3  Thomas, Master. Assad Alexander    male  0.42   \n",
       "756                 1       2        Hamalainen, Master. Viljo    male  0.67   \n",
       "645                 1       3           Baclini, Miss. Eugenie  female  0.75   \n",
       "\n",
       "             SibSp  Parch  Ticket     Fare Cabin Embarked  Deceased  \n",
       "PassengerId                                                          \n",
       "804              0      1    2625   8.5167   NaN        C         0  \n",
       "756              1      1  250649  14.5000   NaN        S         0  \n",
       "645              2      1    2666  19.2583   NaN        C         0  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "3bb2c191",
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Hamalainen, Master. Viljo'"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# with loc, the order is row, column\n",
    "# and we use index labels (not indices)\n",
    "df.loc[756, 'Name']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "5c199168",
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "# to upgrade a passenger\n",
    "df.loc[645, 'Pclass'] -= 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "4d252626",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Deceased</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Thomas, Master. Assad Alexander</td>\n",
       "      <td>male</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2625</td>\n",
       "      <td>8.5167</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>756</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Hamalainen, Master. Viljo</td>\n",
       "      <td>male</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>250649</td>\n",
       "      <td>14.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Baclini, Miss. Eugenie</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                             Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "804                 1       3  Thomas, Master. Assad Alexander    male  0.42   \n",
       "756                 1       2        Hamalainen, Master. Viljo    male  0.67   \n",
       "645                 1       2           Baclini, Miss. Eugenie  female  0.75   \n",
       "\n",
       "             SibSp  Parch  Ticket     Fare Cabin Embarked  Deceased  \n",
       "PassengerId                                                          \n",
       "804              0      1    2625   8.5167   NaN        C         0  \n",
       "756              1      1  250649  14.5000   NaN        S         0  \n",
       "645              2      1    2666  19.2583   NaN        C         0  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ece36893",
   "metadata": {},
   "source": [
    "## slicing"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6cf0f41",
   "metadata": {},
   "source": [
    "### `df.loc` and **inclusive bounds**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "892fd162",
   "metadata": {},
   "source": [
    "So the first thing we may want to do is access the dataframe with *slices*; this should be familiar by now, since whenever a data structure is used with `[]`, the operation ends up being extended to slices.\n",
    "\n",
    "Remember that in Python a slice has the form `start:stop:step`, and that any of the parts can be omitted: for example `:` is a slice that covers the whole range, and `::-1` reverses the order. See the relevant chapters if this is no longer clear.\n",
    "\n",
    "**However**, one **difference** must be stressed right away: **when slicing by index**, dataframe slices **include both bounds**, which, as you will recall, has never been the case so far with slices in Python or numpy, where the upper bound is always excluded; let's see this"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "669bf021",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Deceased</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Thomas, Master. Assad Alexander</td>\n",
       "      <td>male</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2625</td>\n",
       "      <td>8.5167</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>756</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Hamalainen, Master. Viljo</td>\n",
       "      <td>male</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>250649</td>\n",
       "      <td>14.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Baclini, Miss. Eugenie</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>470</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Baclini, Miss. Helene Barbara</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Caldwell, Master. Alden Gates</td>\n",
       "      <td>male</td>\n",
       "      <td>0.83</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>248738</td>\n",
       "      <td>29.0000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                             Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "804                 1       3  Thomas, Master. Assad Alexander    male  0.42   \n",
       "756                 1       2        Hamalainen, Master. Viljo    male  0.67   \n",
       "645                 1       2           Baclini, Miss. Eugenie  female  0.75   \n",
       "470                 1       3    Baclini, Miss. Helene Barbara  female  0.75   \n",
       "79                  1       2    Caldwell, Master. Alden Gates    male  0.83   \n",
       "\n",
       "             SibSp  Parch  Ticket     Fare Cabin Embarked  Deceased  \n",
       "PassengerId                                                          \n",
       "804              0      1    2625   8.5167   NaN        C         0  \n",
       "756              1      1  250649  14.5000   NaN        S         0  \n",
       "645              2      1    2666  19.2583   NaN        C         0  \n",
       "470              2      1    2666  19.2583   NaN        C         0  \n",
       "79               0      2  248738  29.0000   NaN        S         0  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "997a134c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Deceased</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>756</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Hamalainen, Master. Viljo</td>\n",
       "      <td>male</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>250649</td>\n",
       "      <td>14.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Baclini, Miss. Eugenie</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>470</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Baclini, Miss. Helene Barbara</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                           Name     Sex   Age  \\\n",
       "PassengerId                                                                  \n",
       "756                 1       2      Hamalainen, Master. Viljo    male  0.67   \n",
       "645                 1       2         Baclini, Miss. Eugenie  female  0.75   \n",
       "470                 1       3  Baclini, Miss. Helene Barbara  female  0.75   \n",
       "\n",
       "             SibSp  Parch  Ticket     Fare Cabin Embarked  Deceased  \n",
       "PassengerId                                                          \n",
       "756              1      1  250649  14.5000   NaN        S         0  \n",
       "645              2      1    2666  19.2583   NaN        C         0  \n",
       "470              2      1    2666  19.2583   NaN        C         0  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# select the rows between\n",
    "# index label 756 and index label 470, INCLUSIVE\n",
    "\n",
    "df.loc[756:470]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "161f9f7b",
   "metadata": {},
   "source": [
    "There is a certain logic to this: index labels come in no particular order (and may be names rather than integers), so the notion of \"the label just before the upper bound\" is not well defined. It is still unsettling at first. This will not be the case with `iloc`, which works on positional indices."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8cbadf0a",
   "metadata": {},
   "source": [
    "### `df.loc` with slicing on columns"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c12cbfe",
   "metadata": {},
   "source": [
    "Let's see how to slice in the other direction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "996ae998",
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "804    3\n",
       "756    2\n",
       "645    2\n",
       "470    3\n",
       "79     2\n",
       "      ..\n",
       "860    3\n",
       "864    3\n",
       "869    3\n",
       "879    3\n",
       "889    3\n",
       "Name: Pclass, Length: 891, dtype: int64"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# if I write this, I select\n",
    "# all the rows of the column,\n",
    "# i.e. the whole Pclass column\n",
    "\n",
    "df.loc[:, 'Pclass']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "8d1737ff",
   "metadata": {
    "cell_style": "split",
    "tags": [
     "level_advanced"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# indeed, in this run both expressions\n",
    "# return the very same object in memory!\n",
    "# (an implementation detail that may vary across pandas versions)\n",
    "\n",
    "df.loc[:, 'Pclass'] is df['Pclass']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4cad311a",
   "metadata": {},
   "source": [
    "So logically, to select a range of columns, I use two slices:\n",
    "\n",
    "* along the rows, we take everything with a plain `:` slice\n",
    "* along the columns, slicing also works **inclusively**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "fafac5a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>male</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>756</th>\n",
       "      <td>male</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                Sex   Age  SibSp  Parch\n",
       "PassengerId                            \n",
       "804            male  0.42      0      1\n",
       "756            male  0.67      1      1\n",
       "645          female  0.75      2      1"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# here, as for the rows, we are working with index labels\n",
    "# and not positional indices, so the slice bounds are INCLUSIVE\n",
    "\n",
    "df.loc[:, 'Sex':'Parch'].head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "295952e1",
   "metadata": {},
   "source": [
    "### `df.loc` for writing: **inclusive bounds**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2202658",
   "metadata": {},
   "source": [
    "We can perfectly well modify a dataframe through slices, still using `df.loc`, and still with inclusive bounds of course:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "6f2c92b9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Deceased</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Thomas, Master. Assad Alexander</td>\n",
       "      <td>male</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2625</td>\n",
       "      <td>8.5167</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>756</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Hamalainen, Master. Viljo</td>\n",
       "      <td>male</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>250649</td>\n",
       "      <td>14.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Baclini, Miss. Eugenie</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>470</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Baclini, Miss. Helene Barbara</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Caldwell, Master. Alden Gates</td>\n",
       "      <td>male</td>\n",
       "      <td>0.83</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>248738</td>\n",
       "      <td>29.0000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                             Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "804                 1       3  Thomas, Master. Assad Alexander    male  0.42   \n",
       "756                 1       2        Hamalainen, Master. Viljo    male  0.67   \n",
       "645                 1       2           Baclini, Miss. Eugenie  female  0.75   \n",
       "470                 1       3    Baclini, Miss. Helene Barbara  female  0.75   \n",
       "79                  1       2    Caldwell, Master. Alden Gates    male  0.83   \n",
       "\n",
       "             SibSp  Parch  Ticket     Fare Cabin Embarked  Deceased  \n",
       "PassengerId                                                          \n",
       "804              0      1    2625   8.5167   NaN        C         0  \n",
       "756              1      1  250649  14.5000   NaN        S         0  \n",
       "645              2      1    2666  19.2583   NaN        C         0  \n",
       "470              2      1    2666  19.2583   NaN        C         0  \n",
       "79               0      2  248738  29.0000   NaN        S         0  "
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "e063d49e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Deceased</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Thomas, Master. Assad Alexander</td>\n",
       "      <td>male</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2625</td>\n",
       "      <td>8.5167</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>756</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Hamalainen, Master. Viljo</td>\n",
       "      <td>male</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1000</td>\n",
       "      <td>1000</td>\n",
       "      <td>250649</td>\n",
       "      <td>14.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Baclini, Miss. Eugenie</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2000</td>\n",
       "      <td>1000</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>470</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Baclini, Miss. Helene Barbara</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2000</td>\n",
       "      <td>1000</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Caldwell, Master. Alden Gates</td>\n",
       "      <td>male</td>\n",
       "      <td>0.83</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>248738</td>\n",
       "      <td>29.0000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                             Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "804                 1       3  Thomas, Master. Assad Alexander    male  0.42   \n",
       "756                 1       2        Hamalainen, Master. Viljo    male  0.67   \n",
       "645                 1       2           Baclini, Miss. Eugenie  female  0.75   \n",
       "470                 1       3    Baclini, Miss. Helene Barbara  female  0.75   \n",
       "79                  1       2    Caldwell, Master. Alden Gates    male  0.83   \n",
       "\n",
       "             SibSp  Parch  Ticket     Fare Cabin Embarked  Deceased  \n",
       "PassengerId                                                          \n",
       "804              0      1    2625   8.5167   NaN        C         0  \n",
       "756           1000   1000  250649  14.5000   NaN        S         0  \n",
       "645           2000   1000    2666  19.2583   NaN        C         0  \n",
       "470           2000   1000    2666  19.2583   NaN        C         0  \n",
       "79               0      2  248738  29.0000   NaN        S         0  "
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# without looking for a particularly useful \"use case\",\n",
    "# let's multiply a portion of the dataframe by 1000\n",
    "\n",
    "# the rows between 756 and 470, inclusive\n",
    "# the columns between SibSp and Parch, inclusive\n",
    "\n",
    "# writing x *= 1000\n",
    "# means x = x * 1000\n",
    "\n",
    "df.loc[756:470, 'SibSp':'Parch'] *= 1000\n",
    "\n",
    "# let's check\n",
    "df.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "260988b2",
   "metadata": {},
   "source": [
    "### generalized slicing"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a4ac126",
   "metadata": {},
   "source": [
    "Of course, we can combine all the features we already know and write arbitrarily complicated selections. They are rarely useful, but they show that the whole logic is preserved."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "3bfa130b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Deceased</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Thomas, Master. Assad Alexander</td>\n",
       "      <td>male</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2625</td>\n",
       "      <td>8.5167</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>756</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Hamalainen, Master. Viljo</td>\n",
       "      <td>male</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1000</td>\n",
       "      <td>1000</td>\n",
       "      <td>250649</td>\n",
       "      <td>14.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Baclini, Miss. Eugenie</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2000</td>\n",
       "      <td>1000</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>470</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Baclini, Miss. Helene Barbara</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2000</td>\n",
       "      <td>1000</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Caldwell, Master. Alden Gates</td>\n",
       "      <td>male</td>\n",
       "      <td>0.83</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>248738</td>\n",
       "      <td>29.0000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>832</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Richards, Master. George Sibley</td>\n",
       "      <td>male</td>\n",
       "      <td>0.83</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>29106</td>\n",
       "      <td>18.7500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>306</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Allison, Master. Hudson Trevor</td>\n",
       "      <td>male</td>\n",
       "      <td>0.92</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>113781</td>\n",
       "      <td>151.5500</td>\n",
       "      <td>C22 C26</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>828</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Mallet, Master. Andre</td>\n",
       "      <td>male</td>\n",
       "      <td>1.00</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>S.C./PARIS 2079</td>\n",
       "      <td>37.0042</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                             Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "804                 1       3  Thomas, Master. Assad Alexander    male  0.42   \n",
       "756                 1       2        Hamalainen, Master. Viljo    male  0.67   \n",
       "645                 1       2           Baclini, Miss. Eugenie  female  0.75   \n",
       "470                 1       3    Baclini, Miss. Helene Barbara  female  0.75   \n",
       "79                  1       2    Caldwell, Master. Alden Gates    male  0.83   \n",
       "832                 1       2  Richards, Master. George Sibley    male  0.83   \n",
       "306                 1       1   Allison, Master. Hudson Trevor    male  0.92   \n",
       "828                 1       2            Mallet, Master. Andre    male  1.00   \n",
       "\n",
       "             SibSp  Parch           Ticket      Fare    Cabin Embarked  \\\n",
       "PassengerId                                                              \n",
       "804              0      1             2625    8.5167      NaN        C   \n",
       "756           1000   1000           250649   14.5000      NaN        S   \n",
       "645           2000   1000             2666   19.2583      NaN        C   \n",
       "470           2000   1000             2666   19.2583      NaN        C   \n",
       "79               0      2           248738   29.0000      NaN        S   \n",
       "832              1      1            29106   18.7500      NaN        S   \n",
       "306              1      2           113781  151.5500  C22 C26        S   \n",
       "828              0      2  S.C./PARIS 2079   37.0042      NaN        C   \n",
       "\n",
       "             Deceased  \n",
       "PassengerId            \n",
       "804                 0  \n",
       "756                 0  \n",
       "645                 0  \n",
       "470                 0  \n",
       "79                  0  \n",
       "832                 0  \n",
       "306                 0  \n",
       "828                 0  "
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(8)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "1428fa5e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sex</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Ticket</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>male</td>\n",
       "      <td>0</td>\n",
       "      <td>2625</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>female</td>\n",
       "      <td>2000</td>\n",
       "      <td>2666</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>male</td>\n",
       "      <td>0</td>\n",
       "      <td>248738</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>306</th>\n",
       "      <td>male</td>\n",
       "      <td>1</td>\n",
       "      <td>113781</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                Sex  SibSp  Ticket\n",
       "PassengerId                       \n",
       "804            male      0    2625\n",
       "645          female   2000    2666\n",
       "79             male      0  248738\n",
       "306            male      1  113781"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# everything we have learned so far about slices\n",
    "# works as expected, except that with index labels\n",
    "# the upper bound is inclusive\n",
    "\n",
    "df.loc[804:828:2, 'Sex':'Ticket':2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c5914e7",
   "metadata": {},
   "source": [
    "### *copied or not copied, that is the question*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5757c29f",
   "metadata": {},
   "source": [
    "To wrap up this section, for the curious: a sometimes thorny question comes up when selecting parts of a dataframe.\n",
    "\n",
    "When an operation on a `pandas` dataframe returns a sub-part of it, whether that selection is **a shared reference** to the original dataframe or **a copy** of it depends on the context!\n",
    "\n",
    "Fine, you may say, but why should I care? Let `pandas` manage its sub-tables however it likes.\n",
    "\n",
    "That holds... until you start modifying sub-parts of a dataframe:\n",
    "\n",
    "   - if the sub-part is a **copy**, your modification will **not be applied** to the original dataframe;\n",
    "   \n",
    "   - and if it is a shared reference to part of the original dataframe, then your modifications to the selection do propagate to the original data.\n",
    "   \n",
    "So knowing whether an operation returns a copy or a reference matters, and the answer depends on the context.\n",
    "\n",
    "What to remember:\n",
    "\n",
    "* with the form `df.loc[line, column]`, an assignment acts directly on the dataframe, with no intermediate copy: this is the right way to use `loc`\n",
    "* on the other hand, with forms that rely on *chained indexing*, whether `df[l][c]` or `df.loc[l][c]`, the outcome is no longer guaranteed: never use them to modify anything!"
   ]
  },
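  {
   "cell_type": "markdown",
   "id": "a1c0de01",
   "metadata": {},
   "source": [
    "The rule above can be illustrated with a minimal sketch. It assumes a small made-up dataframe named `toy` (hypothetical data, unrelated to `titanic.csv`), so that `df` is left untouched. The first assignment goes through a single `loc` call; the second uses chained indexing on a boolean selection, which `pandas` builds as a copy. The warning emitted by the chained form differs between `pandas` versions, so it is silenced here. After running the cell, only the `loc` assignment should be visible in `toy`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1c0de02",
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "import pandas as pd\n",
    "\n",
    "# hypothetical toy data, unrelated to titanic.csv\n",
    "toy = pd.DataFrame(\n",
    "    {'Age': [22.0, 38.0, 26.0], 'Pclass': [3, 1, 3]},\n",
    "    index=pd.Index([10, 20, 30], name='PassengerId'))\n",
    "\n",
    "# a single .loc call: the assignment reaches toy itself\n",
    "toy.loc[20, 'Age'] = 40.0\n",
    "\n",
    "# chained indexing: the boolean selection first builds a copy,\n",
    "# and it is that temporary copy which receives the assignment\n",
    "with warnings.catch_warnings():\n",
    "    warnings.simplefilter('ignore')\n",
    "    toy[toy['Pclass'] == 3]['Age'] = 0.0\n",
    "\n",
    "toy"
   ]
  },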
  {
   "cell_type": "markdown",
   "id": "dc7e3fdd",
   "metadata": {},
   "source": [
    "## other indexing mechanisms"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e14ad85",
   "metadata": {},
   "source": [
    "### accessing an explicit list of rows or columns"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "011ee151",
   "metadata": {},
   "source": [
    "We now want to select a sub-part of a dataframe that **cannot be expressed as a slice**; instead, we have the list of (index labels of) the rows and columns we want to keep in that sub-part.\n",
    "\n",
    "`pandas` handles this perfectly well:\n",
    "\n",
    "* we use `df.loc[]` since we are designating index labels,\n",
    "* and inside the `[]` we pass plain lists instead of slices (the result follows the order in which you list the labels):"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c388926f",
   "metadata": {},
   "source": [
    "As an example, let us take\n",
    "\n",
    "* the rows with index 450, 3, 67, 800 and 678\n",
    "* and the columns `Age`, `Pclass` and `Survived`\n",
    "\n",
    "And since these are index labels, we use `loc`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "1fa46021",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Age</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Survived</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>450</th>\n",
       "      <td>52.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>26.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>67</th>\n",
       "      <td>29.0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>800</th>\n",
       "      <td>30.0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>678</th>\n",
       "      <td>18.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Age  Pclass  Survived\n",
       "PassengerId                        \n",
       "450          52.0       1         1\n",
       "3            26.0       3         1\n",
       "67           29.0       2         1\n",
       "800          30.0       3         0\n",
       "678          18.0       3         1"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# selecting a set of rows and columns is easy\n",
    "df.loc[[450, 3, 67, 800, 678], ['Age', 'Pclass', 'Survived']]"
   ]
  },
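  {
   "cell_type": "markdown",
   "id": "a1c0de03",
   "metadata": {},
   "source": [
    "As a small complement, the sketch below checks on a made-up dataframe (again named `toy`, a hypothetical name with invented values) that the result follows the order of the lists passed to `loc`. It also shows that, in recent `pandas` versions, a label that is absent from the index is rejected with a `KeyError` rather than silently ignored."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1c0de04",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# hypothetical toy data, unrelated to titanic.csv\n",
    "toy = pd.DataFrame(\n",
    "    {'Age': [22.0, 38.0, 26.0], 'Pclass': [3, 1, 3]},\n",
    "    index=pd.Index([10, 20, 30], name='PassengerId'))\n",
    "\n",
    "# rows and columns come back in the order of the lists\n",
    "picked = toy.loc[[30, 10], ['Pclass', 'Age']]\n",
    "print(picked)\n",
    "\n",
    "# 999 is not in the index, so the whole selection is refused\n",
    "try:\n",
    "    toy.loc[[10, 999], ['Age']]\n",
    "except KeyError as exc:\n",
    "    print('KeyError:', exc)"
   ]
  },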
  {
   "cell_type": "markdown",
   "id": "4880bc46",
   "metadata": {},
   "source": [
    "### selecting with a boolean expression"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d52e7d9d",
   "metadata": {},
   "source": [
    "We saw in the previous notebook that we can test all the values of a column at once, and that this returns an array of booleans."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "944cbe7f",
   "metadata": {
    "cell_style": "center"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# this expression returns a Series\n",
    "mask = df['Pclass'] >= 3\n",
    "type(mask)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "cab3d2aa",
   "metadata": {
    "cell_style": "center"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "804     True\n",
       "756    False\n",
       "645    False\n",
       "470     True\n",
       "79     False\n",
       "Name: Pclass, dtype: bool"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# let's see what it contains\n",
    "mask.head() # a boolean mask aligned on the index, i.e. the PassengerId column!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46d22ac4",
   "metadata": {},
   "source": [
    "The last way to access sub-parts of a dataframe is to **index** it with a **boolean mask** aligned on its `index`, i.e. to keep only the rows for which the boolean is true.\n",
    "\n",
    "For example, to extract the rows for third-class passengers, we use `mask` (a `Series` of booleans) to index the dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "ca3df2e1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Deceased</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Thomas, Master. Assad Alexander</td>\n",
       "      <td>male</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2625</td>\n",
       "      <td>8.5167</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>470</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Baclini, Miss. Helene Barbara</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2000</td>\n",
       "      <td>1000</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>382</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Nakid, Miss. Maria (\"Mary\")</td>\n",
       "      <td>female</td>\n",
       "      <td>1.00</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>2653</td>\n",
       "      <td>15.7417</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>165</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Panula, Master. Eino Viljami</td>\n",
       "      <td>male</td>\n",
       "      <td>1.00</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>3101295</td>\n",
       "      <td>39.6875</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>387</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Goodwin, Master. Sidney Leonard</td>\n",
       "      <td>male</td>\n",
       "      <td>1.00</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "      <td>CA 2144</td>\n",
       "      <td>46.9000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>860</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Razi, Mr. Raihed</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2629</td>\n",
       "      <td>7.2292</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>864</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Sage, Miss. Dorothy Edith \"Dolly\"</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>2</td>\n",
       "      <td>CA. 2343</td>\n",
       "      <td>69.5500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>869</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>van Melkebeke, Mr. Philemon</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>345777</td>\n",
       "      <td>9.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>879</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Laleff, Mr. Kristo</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>349217</td>\n",
       "      <td>7.8958</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>889</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Johnston, Miss. Catherine Helen \"Carrie\"</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>W./C. 6607</td>\n",
       "      <td>23.4500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>490 rows × 12 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                                      Name  \\\n",
       "PassengerId                                                               \n",
       "804                 1       3           Thomas, Master. Assad Alexander   \n",
       "470                 1       3             Baclini, Miss. Helene Barbara   \n",
       "382                 1       3               Nakid, Miss. Maria (\"Mary\")   \n",
       "165                 0       3              Panula, Master. Eino Viljami   \n",
       "387                 0       3           Goodwin, Master. Sidney Leonard   \n",
       "...               ...     ...                                       ...   \n",
       "860                 0       3                          Razi, Mr. Raihed   \n",
       "864                 0       3         Sage, Miss. Dorothy Edith \"Dolly\"   \n",
       "869                 0       3               van Melkebeke, Mr. Philemon   \n",
       "879                 0       3                        Laleff, Mr. Kristo   \n",
       "889                 0       3  Johnston, Miss. Catherine Helen \"Carrie\"   \n",
       "\n",
       "                Sex   Age  SibSp  Parch      Ticket     Fare Cabin Embarked  \\\n",
       "PassengerId                                                                   \n",
       "804            male  0.42      0      1        2625   8.5167   NaN        C   \n",
       "470          female  0.75   2000   1000        2666  19.2583   NaN        C   \n",
       "382          female  1.00      0      2        2653  15.7417   NaN        C   \n",
       "165            male  1.00      4      1     3101295  39.6875   NaN        S   \n",
       "387            male  1.00      5      2     CA 2144  46.9000   NaN        S   \n",
       "...             ...   ...    ...    ...         ...      ...   ...      ...   \n",
       "860            male   NaN      0      0        2629   7.2292   NaN        C   \n",
       "864          female   NaN      8      2    CA. 2343  69.5500   NaN        S   \n",
       "869            male   NaN      0      0      345777   9.5000   NaN        S   \n",
       "879            male   NaN      0      0      349217   7.8958   NaN        S   \n",
       "889          female   NaN      1      2  W./C. 6607  23.4500   NaN        S   \n",
       "\n",
       "             Deceased  \n",
       "PassengerId            \n",
       "804                 0  \n",
       "470                 0  \n",
       "382                 0  \n",
       "165                 1  \n",
       "387                 1  \n",
       "...               ...  \n",
       "860                 1  \n",
       "864                 1  \n",
       "869                 1  \n",
       "879                 1  \n",
       "889                 1  \n",
       "\n",
       "[490 rows x 12 columns]"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# note that inside the brackets we no longer have\n",
    "# a slice, nor a list,\n",
    "# but a column (a Series) of booleans,\n",
    "# which is called a mask\n",
    "\n",
    "df.loc[ mask ]"
   ]
  },
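  {
   "cell_type": "markdown",
   "id": "a1c0de05",
   "metadata": {},
   "source": [
    "A mask can also be combined with a column selection in the same `loc` call, and that same form can be used on the left-hand side of an assignment. Here is a minimal sketch on a made-up dataframe; the names `toy`, `toy_mask` and `third_class_ages` and the values are illustrative assumptions, chosen so that `df` and `mask` are left untouched."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1c0de06",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# hypothetical toy data, unrelated to titanic.csv\n",
    "toy = pd.DataFrame(\n",
    "    {'Age': [22.0, 38.0, 26.0], 'Pclass': [3, 1, 3]},\n",
    "    index=pd.Index([10, 20, 30], name='PassengerId'))\n",
    "\n",
    "toy_mask = toy['Pclass'] >= 3\n",
    "\n",
    "# the mask picks the rows, the list picks the columns\n",
    "third_class_ages = toy.loc[toy_mask, ['Age']]\n",
    "\n",
    "# the same form on the left of an assignment modifies toy itself\n",
    "toy.loc[toy_mask, 'Age'] = toy.loc[toy_mask, 'Age'] + 1\n",
    "\n",
    "toy"
   ]
  },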
  {
   "cell_type": "markdown",
   "id": "0cb6b247",
   "metadata": {},
   "source": [
    "Note that quite often we do not bother spelling it out like this, and write directly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "6d2c9a3b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Deceased</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Thomas, Master. Assad Alexander</td>\n",
       "      <td>male</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2625</td>\n",
       "      <td>8.5167</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>470</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Baclini, Miss. Helene Barbara</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2000</td>\n",
       "      <td>1000</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>382</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Nakid, Miss. Maria (\"Mary\")</td>\n",
       "      <td>female</td>\n",
       "      <td>1.00</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>2653</td>\n",
       "      <td>15.7417</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                             Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "804                 1       3  Thomas, Master. Assad Alexander    male  0.42   \n",
       "470                 1       3    Baclini, Miss. Helene Barbara  female  0.75   \n",
       "382                 1       3      Nakid, Miss. Maria (\"Mary\")  female  1.00   \n",
       "\n",
       "             SibSp  Parch Ticket     Fare Cabin Embarked  Deceased  \n",
       "PassengerId                                                         \n",
       "804              0      1   2625   8.5167   NaN        C         0  \n",
       "470           2000   1000   2666  19.2583   NaN        C         0  \n",
       "382              0      2   2653  15.7417   NaN        C         0  "
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# on a single line it is a bit less readable\n",
    "# but it is a common idiom\n",
    "\n",
    "# adding .head(3) to keep the output short\n",
    "\n",
    "df[df['Pclass'] >= 3].head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "481b599c",
   "metadata": {},
   "source": [
    "### combining boolean expressions"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f728dde3",
   "metadata": {},
   "source": [
    "Slightly more sophisticated: we can combine **several conditions**, for example passengers who are not in first class and who are at least 70 years old.\n",
    "\n",
    "But how do we write these conditions?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "bd0eec4b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "That does not work, it says 'The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'\n"
     ]
    }
   ],
   "source": [
    "# one might be tempted to write something like this\n",
    "\n",
    "try:\n",
    "    df['Age'] >= 70 and not(df['Pclass'] == 1)\n",
    "except ValueError as e:\n",
    "    print(f\"That does not work, it says '{e}'\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fbc7dda",
   "metadata": {},
   "source": [
    "Does this ring a bell?  \n",
    "We already ran into the same behavior when writing conditions on `numpy` arrays;\n",
    "logical expressions on boolean arrays are indeed among the things that can feel counter-intuitive with `numpy` and `pandas`.\n",
    "\n",
    "You **cannot** use `and`, `or` and `not`!\n",
    "\n",
    "   - either you use `np.logical_and`, `np.logical_or` and `np.logical_not`, which is not very readable;\n",
    "   \n",
    "   - or you use `&`, `|` and `~` (the so-called *bitwise* logical operators, which work bit by bit), taking care to parenthesize each condition, because these operators bind more tightly than comparisons!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "49cf56a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "804    False\n",
       "756    False\n",
       "645    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mask_age = (df['Age'] >= 70) & (~ (df['Pclass'] == 1)) # a pandas.Series aligned on the index\n",
    "mask_age.head(3)"
   ]
  },
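  {
   "cell_type": "markdown",
   "id": "a1c0de07",
   "metadata": {},
   "source": [
    "To check that the two spellings mentioned above are interchangeable, here is a minimal sketch on a made-up dataframe. The names `toy`, `with_operators` and `with_functions` and the values are illustrative assumptions. Both masks are built from the same two conditions, once with `&` and `~`, once with the `numpy` functions, and then compared."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1c0de08",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# hypothetical toy data, unrelated to titanic.csv\n",
    "toy = pd.DataFrame(\n",
    "    {'Age': [71.0, 80.0, 30.0, 74.0], 'Pclass': [1, 3, 2, 2]},\n",
    "    index=pd.Index([10, 20, 30, 40], name='PassengerId'))\n",
    "\n",
    "# at least 70 years old and not in first class, with the operators\n",
    "with_operators = (toy['Age'] >= 70) & ~(toy['Pclass'] == 1)\n",
    "\n",
    "# the same condition spelled with the numpy functions\n",
    "with_functions = np.logical_and(\n",
    "    toy['Age'] >= 70,\n",
    "    np.logical_not(toy['Pclass'] == 1))\n",
    "\n",
    "# both spellings select the same rows\n",
    "print(with_operators.equals(with_functions))\n",
    "toy[with_operators]"
   ]
  },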
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "e7f86089",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Deceased</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>673</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>Mitchell, Mr. Henry Michael</td>\n",
       "      <td>male</td>\n",
       "      <td>70.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>C.A. 24580</td>\n",
       "      <td>10.500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>117</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Connors, Mr. Patrick</td>\n",
       "      <td>male</td>\n",
       "      <td>70.5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>370369</td>\n",
       "      <td>7.750</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Q</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>852</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Svensson, Mr. Johan</td>\n",
       "      <td>male</td>\n",
       "      <td>74.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>347060</td>\n",
       "      <td>7.775</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                         Name   Sex   Age  SibSp  \\\n",
       "PassengerId                                                                     \n",
       "673                 0       2  Mitchell, Mr. Henry Michael  male  70.0      0   \n",
       "117                 0       3         Connors, Mr. Patrick  male  70.5      0   \n",
       "852                 0       3          Svensson, Mr. Johan  male  74.0      0   \n",
       "\n",
       "             Parch      Ticket    Fare Cabin Embarked  Deceased  \n",
       "PassengerId                                                      \n",
       "673              0  C.A. 24580  10.500   NaN        S         1  \n",
       "117              0      370369   7.750   NaN        Q         1  \n",
       "852              0      347060   7.775   NaN        S         1  "
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[mask_age]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "04e02720",
   "metadata": {},
   "source": [
    "Or, in the concise form that is commonly used:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "78476331",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Deceased</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>673</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>Mitchell, Mr. Henry Michael</td>\n",
       "      <td>male</td>\n",
       "      <td>70.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>C.A. 24580</td>\n",
       "      <td>10.500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>117</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Connors, Mr. Patrick</td>\n",
       "      <td>male</td>\n",
       "      <td>70.5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>370369</td>\n",
       "      <td>7.750</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Q</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>852</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Svensson, Mr. Johan</td>\n",
       "      <td>male</td>\n",
       "      <td>74.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>347060</td>\n",
       "      <td>7.775</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                         Name   Sex   Age  SibSp  \\\n",
       "PassengerId                                                                     \n",
       "673                 0       2  Mitchell, Mr. Henry Michael  male  70.0      0   \n",
       "117                 0       3         Connors, Mr. Patrick  male  70.5      0   \n",
       "852                 0       3          Svensson, Mr. Johan  male  74.0      0   \n",
       "\n",
       "             Parch      Ticket    Fare Cabin Embarked  Deceased  \n",
       "PassengerId                                                      \n",
       "673              0  C.A. 24580  10.500   NaN        S         1  \n",
       "117              0      370369   7.750   NaN        Q         1  \n",
       "852              0      347060   7.775   NaN        S         1  "
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# plus de 70 ans, et pas en première classe\n",
    "# remarquez que ça se bouscule pas dans cette catégorie...\n",
    "\n",
    "df.loc [ (df['Age'] >= 70) & (~ (df['Pclass'] == 1)) ] "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "dc1bfc57",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Deceased</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>673</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>Mitchell, Mr. Henry Michael</td>\n",
       "      <td>male</td>\n",
       "      <td>70.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>C.A. 24580</td>\n",
       "      <td>10.500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>117</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Connors, Mr. Patrick</td>\n",
       "      <td>male</td>\n",
       "      <td>70.5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>370369</td>\n",
       "      <td>7.750</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Q</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>852</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Svensson, Mr. Johan</td>\n",
       "      <td>male</td>\n",
       "      <td>74.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>347060</td>\n",
       "      <td>7.775</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                         Name   Sex   Age  SibSp  \\\n",
       "PassengerId                                                                     \n",
       "673                 0       2  Mitchell, Mr. Henry Michael  male  70.0      0   \n",
       "117                 0       3         Connors, Mr. Patrick  male  70.5      0   \n",
       "852                 0       3          Svensson, Mr. Johan  male  74.0      0   \n",
       "\n",
       "             Parch      Ticket    Fare Cabin Embarked  Deceased  \n",
       "PassengerId                                                      \n",
       "673              0  C.A. 24580  10.500   NaN        S         1  \n",
       "117              0      370369   7.750   NaN        Q         1  \n",
       "852              0      347060   7.775   NaN        S         1  "
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# pareil avec les opérateurs numpy\n",
    "# personnellement je préfère la version précédente mais bon\n",
    "\n",
    "df.loc [ np.logical_and(df['Age'] >= 70, np.logical_not(df['Pclass'] == 1)) ] # bof ..."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9c39986",
   "metadata": {
    "tags": []
   },
   "source": [
    "### résumé des méthodes d'indexation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "149dba25",
   "metadata": {},
   "source": [
    "Pour résumer cette partie, nous avons vu trois méthodes d'indexation utilisables avec `loc` :\n",
    "\n",
    "* on peut utiliser une slice, et parce qu'on manipule des index et pas des indices dans ce cas **les bornes sont inclusives** (on va voir tout de suite qu'avec les indices par contre les bornes sont les bornes habituelles, avec la fin exclue)\n",
    "* on peut utiliser une liste explicite, pour choisir exactement et dans le bon ordre les index qui nous intéressent\n",
    "* on peut utiliser un masque, c'est-à-dire une colonne obtenue en appliquant une expression booléenne à la dataframe de départ - cette méthode s'applique sans doute plus volontiers à la sélection de lignes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0171ff13",
   "metadata": {
    "tags": [
     "level_advanced"
    ]
   },
   "source": [
    "Remarquez d'ailleurs, pour les geeks, que si on veut on peut même mélanger ces trois méthodes d'indexation; c'est-à-dire par exemple utiliser une liste pour les lignes et une slice pour les colonnes :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "307c3ecb",
   "metadata": {
    "tags": [
     "level_advanced"
    ]
   },
   "outputs": [],
   "source": [
    "# on peut indexer par exemple\n",
    "# les lignes avec une liste\n",
    "# les colonnes avec une slice"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "3f32966c",
   "metadata": {
    "tags": [
     "level_advanced"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sex</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Cabin</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>450</th>\n",
       "      <td>male</td>\n",
       "      <td>0</td>\n",
       "      <td>113786</td>\n",
       "      <td>C104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>female</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>67</th>\n",
       "      <td>female</td>\n",
       "      <td>0</td>\n",
       "      <td>C.A. 29395</td>\n",
       "      <td>F33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>800</th>\n",
       "      <td>female</td>\n",
       "      <td>1</td>\n",
       "      <td>345773</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>678</th>\n",
       "      <td>female</td>\n",
       "      <td>0</td>\n",
       "      <td>4138</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                Sex  SibSp            Ticket Cabin\n",
       "PassengerId                                       \n",
       "450            male      0            113786  C104\n",
       "3            female      0  STON/O2. 3101282   NaN\n",
       "67           female      0        C.A. 29395   F33\n",
       "800          female      1            345773   NaN\n",
       "678          female      0              4138   NaN"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[\n",
    "    # dans la dimension des lignes: une liste\n",
    "    [450, 3, 67, 800, 678], \n",
    "    # dans la dimension des colonnes: une slice\n",
    "    'Sex':'Cabin':2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b966a1a",
   "metadata": {},
   "source": [
    "## travailler avec les indices : **bornes habituelles**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "961c97db",
   "metadata": {},
   "source": [
    "Dans les - rares - cas où on veut travailler avec les indices plutôt qu'avec les index, tout fonctionne presque exactement pareil qu'avec les index, sauf que\n",
    "\n",
    "* on doit utiliser `iloc` au lieu de `loc`, bien entendu\n",
    "* qui supportent les mêmes mécanismes de *slicing* et d'indexation que l'on vient de voir,\n",
    "* et dans ce cas comme on est dans l'espace des indices, **les bornes des slices** se comportent comme les **bornes habituelles (début inclus, fin exclue)**\n",
    "\n",
    "Je vous invite à vérifier ce point par vous même, en remettant à leur valeur originelle la portion de la dataframe que l'on avait un peu arbitrairement multipliée par 1000 tout à l'heure, tout ça en utilisant `iloc`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "ac59cace",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Deceased</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Thomas, Master. Assad Alexander</td>\n",
       "      <td>male</td>\n",
       "      <td>0.42</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2625</td>\n",
       "      <td>8.5167</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>756</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Hamalainen, Master. Viljo</td>\n",
       "      <td>male</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1000</td>\n",
       "      <td>1000</td>\n",
       "      <td>250649</td>\n",
       "      <td>14.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Baclini, Miss. Eugenie</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2000</td>\n",
       "      <td>1000</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>470</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Baclini, Miss. Helene Barbara</td>\n",
       "      <td>female</td>\n",
       "      <td>0.75</td>\n",
       "      <td>2000</td>\n",
       "      <td>1000</td>\n",
       "      <td>2666</td>\n",
       "      <td>19.2583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Caldwell, Master. Alden Gates</td>\n",
       "      <td>male</td>\n",
       "      <td>0.83</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>248738</td>\n",
       "      <td>29.0000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                             Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "804                 1       3  Thomas, Master. Assad Alexander    male  0.42   \n",
       "756                 1       2        Hamalainen, Master. Viljo    male  0.67   \n",
       "645                 1       2           Baclini, Miss. Eugenie  female  0.75   \n",
       "470                 1       3    Baclini, Miss. Helene Barbara  female  0.75   \n",
       "79                  1       2    Caldwell, Master. Alden Gates    male  0.83   \n",
       "\n",
       "             SibSp  Parch  Ticket     Fare Cabin Embarked  Deceased  \n",
       "PassengerId                                                          \n",
       "804              0      1    2625   8.5167   NaN        C         0  \n",
       "756           1000   1000  250649  14.5000   NaN        S         0  \n",
       "645           2000   1000    2666  19.2583   NaN        C         0  \n",
       "470           2000   1000    2666  19.2583   NaN        C         0  \n",
       "79               0      2  248738  29.0000   NaN        S         0  "
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# je vous rappelle où on en est\n",
    "df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "11904976",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ellipsis"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# votre mission si vous l'acceptez\n",
    "# rediviser par 1000 les 6 cases, mais à bases d'indices cette fois-ci\n",
    "# donc en utilisant iloc\n",
    "..."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9b63ab9",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "## problème de modification de copies (pour les avancés)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "afb29416",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "En première lecture de ce notebook, cette section ne sera compréhensible que par des élèves avancés, les autres pourront y revenir plus tard."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2180b772",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "On va voir rapidement le problème de *tentative* de modification d'une copie d'une dataframe."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d927a88",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "### modification par chaînage d'indexations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd2c6163",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "Supposez qu'on accède à une colonne, par exemple celle de la survie qui s'appelle `Survived`, en utilisant la syntaxe classique d'accès à une clé d'un dictionnaire."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "7c987ccb",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "804    1\n",
       "756    1\n",
       "645    1\n",
       "470    1\n",
       "79     1\n",
       "      ..\n",
       "860    0\n",
       "864    0\n",
       "869    0\n",
       "879    0\n",
       "889    0\n",
       "Name: Survived, Length: 891, dtype: int64"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Survived']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c38f8cc0",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "On obtient une seule colonne, elle est de type `pandas.Series`, on le savait déjà."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29c132fd",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "Maintenant que j'ai une colonne, rien ne m'empêche d'accéder à un élément de la colonne, avec la simple notation d'accès à un élément d'un tableau comme dans Python, prenons l'élément d'index 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "72407296",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# so far, so good\n",
    "df['Survived'][1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0cde150b",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "Maintenant LA question. Je viens d'accéder à un élément de la colonne `Survived`, puis-je utiliser cette manière d'accéder pour modifier l'élément ?\n",
    "\n",
    "Dit autrement, puis-je ressusciter le pauvre passager d'index 1 en faisant passer son état de survie à 1 par l'affectation `df['Survived'][1] = 1`\n",
    "\n",
    "La réponse est non ! Pourquoi ? parce que `df['Survived'][1]` est une copie ! pas une référence vers une partie de la dataframe `df` !\n",
    "\n",
    "On appelle cela une *indexation par chaînage* (on chaîne `['Survived']`et `[1]`) et bien: *toutes les indexations par chaînage  sont des copies* et ne peuvent pas donner lieu à des modifications ...\n",
    "\n",
    "Vous avez l'obligation d'utiliser `loc` ou `iloc` !"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a848b8a",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "Pour les avancés ce *problème* s'appelle le *chained indexing* et pour plus d'explications regardez là https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy (quand vous en aurez le temps ...)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c188340d",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "### indexation par une liste et modification"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "757d6149",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "On va indexer une dataframe par une liste d'index de colonnes sans utiliser `loc` ni `iloc`. Dans cet exemple on isole les trois colonnes `Survived`, `Pclass` et `Sex`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "d4e4435c",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Sex</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>756</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>470</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>male</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass     Sex\n",
       "PassengerId                          \n",
       "804                 1       3    male\n",
       "756                 1       2    male\n",
       "645                 1       2  female\n",
       "470                 1       3  female\n",
       "79                  1       2    male"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 = df[ ['Survived', 'Pclass', 'Sex'] ]\n",
    "df1.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "122e6977",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "On obtient une dataframe que nous appelons `df1`. Donc vous vous rappelez que nous avons deux possibilité pour la sous-partie d'une dataframe, obtenue par découpage de la dataframe d'origine:\n",
    "   - c'est une copie de la dataframe (vous ne devez pas la modifier)\n",
    "   - c'est une référence sur la dataframe (vous pouvez la modifier)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ccec6d38",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "LA question est donc de savoir si `df1` est une copie ou une référence sur votre dataframe ?\n",
    "\n",
    "C'est une copie donc vous ne devez pas tenter de la modifier mais on va le faire.\n",
    "\n",
    "On tente de ressusciter notre pauvre passager d'index 1 en utilisant `loc` sur la sous-dataframe `df1` (on a oublié que `df1` était une copie)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ab2723b",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "On regarde ce que vaut l'élément qu'on veut modifier:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "a9bb4555",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.loc[1, 'Survived']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff18f460",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "ok 0. On tente de le modifier:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "84b58ef9",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/share/miniconda/envs/python-numérique/lib/python3.9/site-packages/pandas/core/indexing.py:1817: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  self._setitem_single_column(loc, value, pi)\n"
     ]
    }
   ],
   "source": [
    "df1.loc[1, 'Survived'] = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba669671",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "Je recois un warning de `pandas` me disant que j'ai potentiellement un problème. Comme il n'est pas sûr que pour moi ca en soit un, il me donne un simple avertissement et non une erreur.\n",
    "\n",
    "En fait, là il m'indique que: si je pensais modifier `df` en passant par `df1` alors je me trompe puisque `df1` est une copie de ma dataframe `df`, donc `df` ne sera pas modifié.\n",
    "\n",
    "Il se peut que ce soit ce que vous voulez (que `df1` soit une copie) ! mais alors pourquoi ne l'avez vous pas clairement indiqué en faisant une copie explicite !"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "500aff95",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "Si mon idée était bien de ne modifier que `df1` parce que je veux une copie de `df`: alors je le code **proprement**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "dab6ea7c",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Sex</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>804</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>756</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>645</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>470</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>male</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass     Sex\n",
       "PassengerId                          \n",
       "804                 1       3    male\n",
       "756                 1       2    male\n",
       "645                 1       2  female\n",
       "470                 1       3  female\n",
       "79                  1       2    male"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# explicit copy: df2 is a new dataframe, independent of df\n",
    "df2 = df[['Survived', 'Pclass', 'Sex']].copy()\n",
    "df2.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "ca06e724",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "outputs": [],
   "source": [
    "# modify the copy only: no warning is raised, and df is left untouched\n",
    "df2.loc[1, 'Survived'] = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a385bf30",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "Ah, that's better: no warning this time, since `df2` is an explicit copy."
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "all,-hidden,-heading_collapsed",
   "notebook_metadata_filter": "all,-language_info,-toc,-jupytext.text_representation.jupytext_version,-jupytext.text_representation.format_version",
   "text_representation": {
    "extension": ".md",
    "format_name": "myst"
   }
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.5"
  },
  "notebookname": "slicing et dataframe",
  "source_map": [
   16,
   22,
   26,
   30,
   35,
   39,
   42,
   46,
   49,
   53,
   57,
   60,
   64,
   68,
   73,
   81,
   83,
   87,
   93,
   101,
   106,
   110,
   116,
   124,
   128,
   132,
   139,
   143,
   151,
   158,
   160,
   164,
   168,
   176,
   180,
   185,
   189,
   193,
   197,
   207,
   215,
   222,
   227,
   231,
   235,
   239,
   253,
   257,
   261,
   265,
   271,
   275,
   296,
   300,
   304,
   313,
   322,
   325,
   329,
   333,
   341,
   346,
   352,
   359,
   363,
   370,
   374,
   380,
   387,
   399,
   404,
   406,
   410,
   417,
   424,
   428,
   436,
   440,
   448,
   456,
   460,
   470,
   475,
   482,
   486,
   490,
   494,
   498,
   502,
   508,
   512,
   516,
   523,
   535,
   539,
   543,
   547,
   554,
   560,
   568,
   572,
   578,
   582,
   588,
   596,
   600,
   607,
   613
  ],
  "version": "1.0"
 },
 "nbformat": 4,
 "nbformat_minor": 5
}