{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "634a6ccd",
   "metadata": {},
   "source": [
    "<div class=\"licence\">\n",
    "<span>Licence CC BY-NC-ND</span>\n",
    "<span>Valérie Roy</span>\n",
    "</div>\n",
    "\n",
    "<img src=\"media/ensmp-25-alpha.png\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52dbc43f",
   "metadata": {},
   "source": [
    "# regroupement de données  `pandas.DataFrame.groupby`\n",
    "\n",
    "Une table de données `pandas` est en 2 dimensions mais elle peut indiquer des sous-divisions de vos données. Par exemple, les passagers du Titanic sont divisés en femmes et hommes, en passagers de première, deuxième et troisiéme classe. On pourrait même les diviser en classe d'âge, enfants, jeunes, adultes...\n",
    "\n",
    "Des analyses mettant en exergue ces différents groupes de personnes peuvent être intéressantes. Lors du naufrage du Titanic, valait-il mieux être une femme en première classe ou un enfant en troisième ?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3850d4be",
   "metadata": {},
   "source": [
    "On reprend nos données"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e724fa34",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "50c24850",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "1                   0       3   \n",
       "2                   1       1   \n",
       "3                   1       3   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "1                                      Braund, Mr. Owen Harris    male  22.0   \n",
       "2            Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0   \n",
       "3                                       Heikkinen, Miss. Laina  female  26.0   \n",
       "\n",
       "             SibSp  Parch            Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                          \n",
       "1                1      0         A/5 21171   7.2500   NaN        S  \n",
       "2                1      0          PC 17599  71.2833   C85        C  \n",
       "3                0      0  STON/O2. 3101282   7.9250   NaN        S  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "file = 'titanic.csv'\n",
    "df = pd.read_csv(file, index_col=0)\n",
    "df.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cca43755",
   "metadata": {},
   "source": [
    "On va calculer des regroupements en utilisant la fonction `pandas.DataFrame.groupby`, à laquelle on indique un ou plusieurs critères."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02f04a33",
   "metadata": {},
   "source": [
    "## groupement par critère unique"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b61158d",
   "metadata": {},
   "source": [
    "Par exemple, prenons le seul critère de genre des passagers (`Sex`). Cette colonne a deux valeurs: `female` et `male`, nous allons donc obtenir une partition de notre dataframe en deux dataframes : celle des hommes et celle des femmes, sur lesquelles nous allons pouvoir faire des analyses (moyenne...) différencées par genre.\n",
    "\n",
    "Allons-y:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "4368eba3",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_by_sex = df.groupby('Sex')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc737d1a",
   "metadata": {},
   "source": [
    "`pandas` calcule les différentes valeurs de la colonne en question (ici `Sex`), et partitionne la dataframe en autant de dataframes que de valeurs différentes."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01596a65",
   "metadata": {},
   "source": [
    "`pandas` met les regroupements dans un objet de type `DataFrameGroupBy` (ici de nom `df_by_sex`) qui vous donne accès à de nombreuses fonctionnalités (regardez le help pour plus de détails), nous allons voir ici très peu de choses ici."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f89ff61",
   "metadata": {},
   "source": [
    "### les tailles des groupes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dad111f5",
   "metadata": {},
   "source": [
    "Les objets de type `DataFrameGroupBy` contiennent une fonction très pratique pour récapituler les groupes : `size`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "456a9d8a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Sex\n",
       "female    314\n",
       "male      577\n",
       "dtype: int64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_by_sex.size()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f764380",
   "metadata": {},
   "source": [
    "### accéder aux sous-dataframes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22059550",
   "metadata": {},
   "source": [
    "On peut aussi itérer sur un objet de type `DataFrameGroupBy` afin de voir les différentes dataframes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "fcaef3d5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "La dataframe de clé 'female' a 314 éléments sur les 891\n",
      "La dataframe de clé 'male' a 577 éléments sur les 891\n"
     ]
    }
   ],
   "source": [
    "for name, subdf in df_by_sex:\n",
    "    print(f\"La dataframe de clé '{name}' a {subdf.shape[0]} éléments sur les {df.shape[0]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c32fb2e",
   "metadata": {},
   "source": [
    "Voilà, la fonction a bien partitionné votre dataframe en autant de dataframes que de genre des personnes."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e8bf168",
   "metadata": {},
   "source": [
    "### les groupes (dictionnaire d'index par clés)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b2095dc",
   "metadata": {},
   "source": [
    "Vous pouvez accéder, au travers du champ `groups` des objets de type `DataFrameGroupBy`, au dictionnaire vous donnant les groupes d'`Index` (ici deux parce `male` et `female`).\n",
    "\n",
    "Chaque clé est une des valeurs possibles (donc à nouveau `male` et `female`), et sa valeur est la liste des index des lignes ayant cette valeur:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "91ff23cb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "female\t→ Int64Index([  2,   3,   4,   9,  10,  11,  12,  15,  16,  19,\n",
      "            ...\n",
      "            867, 872, 875, 876, 880, 881, 883, 886, 888, 889],\n",
      "           dtype='int64', name='PassengerId', length=314)\n",
      "male\t→ Int64Index([  1,   5,   6,   7,   8,  13,  14,  17,  18,  21,\n",
      "            ...\n",
      "            874, 877, 878, 879, 882, 884, 885, 887, 890, 891],\n",
      "           dtype='int64', name='PassengerId', length=577)\n"
     ]
    }
   ],
   "source": [
    "# df_by_sex.groups est un dictionnaire\n",
    "# et voici ses clés et valeurs \n",
    "for k, v in df_by_sex.groups.items():\n",
    "    print(f\"{k}\\t→ {v}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9ad056d",
   "metadata": {},
   "source": [
    "## groupement multi-critères"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5b34a50",
   "metadata": {},
   "source": [
    "Si maintenant on s'intéresse à plusieurs colonnes ? Comment est-ce que ça pourrait se présenter à votre avis ?\n",
    "\n",
    "La solution adoptée, c'est de passer à `groupby`, non plus une seule colonne mais .. une liste de colonnes. \n",
    "\n",
    "Le fonctionnement de `groupby` dans ce cas consiste à \n",
    "\n",
    "* calculer pour chaque colonne les valeurs distinctes (comme dans le cas simple)\n",
    "* et en faire le **produit cartésien** pour obtenir les clés du groupement (incidemment, sous la forme de tuples)\n",
    "\n",
    "Ainsi dans notre exemple si nous prenons les critères :  `Pclass` et`Sex`:\n",
    "\n",
    "* le premier critère donne trois valeurs `1`, `2` et `3` pour les trois classes de navigation\n",
    "* le second donne 2 valeurs `female` et `male`\n",
    "\n",
    "et donc on va avoir 6 tuples qui serviront de clés pour le groupement : (1, 'female'), (1, 'male'), (2, 'female')..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "70b588be",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_by_sex_class = df.groupby(['Pclass', 'Sex'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88078028",
   "metadata": {},
   "source": [
    "### les tailles des groupes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3318f9f8",
   "metadata": {},
   "source": [
    "Pour faire une synthèse on peut utiliser `size()` pour récapituler les groupes; la présentation nous montre bien le produit cartésien qui a été fait"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "28a5d990",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Pclass  Sex   \n",
       "1       female     94\n",
       "        male      122\n",
       "2       female     76\n",
       "        male      108\n",
       "3       female    144\n",
       "        male      347\n",
       "dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_by_sex_class.size()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73172a19",
   "metadata": {},
   "source": [
    "En une seule ligne:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "b8e3fae4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Pclass  Sex   \n",
       "1       female     94\n",
       "        male      122\n",
       "2       female     76\n",
       "        male      108\n",
       "3       female    144\n",
       "        male      347\n",
       "dtype: int64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(['Pclass', 'Sex']).size()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e92466bd",
   "metadata": {},
   "source": [
    "### accéder aux sous-dataframes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26c767fa",
   "metadata": {},
   "source": [
    "De même nous pouvons itérer sur les sous-dataframes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "0981965e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "94 passagers en classe '1' de genre 'female'\n",
      "122 passagers en classe '1' de genre 'male'\n",
      "76 passagers en classe '2' de genre 'female'\n",
      "108 passagers en classe '2' de genre 'male'\n",
      "144 passagers en classe '3' de genre 'female'\n",
      "347 passagers en classe '3' de genre 'male'\n"
     ]
    }
   ],
   "source": [
    "# pour itérer sur les 6 catégories\n",
    "for name, dataframe in df_by_sex_class:\n",
    "    print(f\"{len(dataframe)} passagers en classe '{name[0]}' de genre '{name[1]}'\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b08ec883",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "Pour les curieux, une petite astuce, utile à ce stade; on pourrait avoir envie d'utiliser la méthode `.head()` pour afficher chacune des sous-dataframes, en écrivant ceci"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "2dbc4ca3",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "outputs": [],
   "source": [
    "# malheureusement ceci ne marche pas !!\n",
    "for name, dataframe in df_by_sex_class:\n",
    "    dataframe.head(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29648c38",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "source": [
    "En fait ce qui se passe ici, c'est que la méthode `.head()` renvoie un objet que le notebook sait afficher; donc quand on écrit une cellule qui ne contient que (ou dont la dernière instruction est) `df.head()`, cet objet se fait afficher parce que c'est le résultat de la cellule (comme quand `40` se fait affichez quand vous évaluez une cellule avec juste `10+30`)\n",
    "\n",
    "Mais dans le cas du `for` qu'on est en train de vouloir écrire, le résultat du for est `None`, et rien ne se fait afficher; il nous faut donc provoquer l'affichage (un peu comme quand on est obligé d'insérer un `print()` au milieu d'une cellule); voici l'astuce pour arriver à provoquer l'affichage souhaité :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "52cf3af3",
   "metadata": {
    "tags": [
     "level_intermediate"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "2                   1       1   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "2            Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0   \n",
       "\n",
       "             SibSp  Parch    Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                  \n",
       "2                1      0  PC 17599  71.2833   C85        C  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>McCarthy, Mr. Timothy J</td>\n",
       "      <td>male</td>\n",
       "      <td>54.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>17463</td>\n",
       "      <td>51.8625</td>\n",
       "      <td>E46</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                     Name   Sex   Age  SibSp  \\\n",
       "PassengerId                                                                 \n",
       "7                   0       1  McCarthy, Mr. Timothy J  male  54.0      0   \n",
       "\n",
       "             Parch Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                        \n",
       "7                0  17463  51.8625   E46        S  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Nasser, Mrs. Nicholas (Adele Achem)</td>\n",
       "      <td>female</td>\n",
       "      <td>14.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>237736</td>\n",
       "      <td>30.0708</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                                 Name     Sex  \\\n",
       "PassengerId                                                                  \n",
       "10                  1       2  Nasser, Mrs. Nicholas (Adele Achem)  female   \n",
       "\n",
       "              Age  SibSp  Parch  Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                      \n",
       "10           14.0      1      0  237736  30.0708   NaN        C  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Williams, Mr. Charles Eugene</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>244373</td>\n",
       "      <td>13.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                          Name   Sex  Age  SibSp  \\\n",
       "PassengerId                                                                     \n",
       "18                  1       2  Williams, Mr. Charles Eugene  male  NaN      0   \n",
       "\n",
       "             Parch  Ticket  Fare Cabin Embarked  \n",
       "PassengerId                                      \n",
       "18               0  244373  13.0   NaN        S  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.925</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                    Name     Sex   Age  SibSp  \\\n",
       "PassengerId                                                                  \n",
       "3                   1       3  Heikkinen, Miss. Laina  female  26.0      0   \n",
       "\n",
       "             Parch            Ticket   Fare Cabin Embarked  \n",
       "PassengerId                                                 \n",
       "3                0  STON/O2. 3101282  7.925   NaN        S  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.25</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass                     Name   Sex   Age  SibSp  \\\n",
       "PassengerId                                                                 \n",
       "1                   0       3  Braund, Mr. Owen Harris  male  22.0      1   \n",
       "\n",
       "             Parch     Ticket  Fare Cabin Embarked  \n",
       "PassengerId                                         \n",
       "1                0  A/5 21171  7.25   NaN        S  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# pour que ça fonctionne il faut forcer l'affichage \n",
    "# en utilisant display() qui se trouve dans le module IPython\n",
    "import IPython \n",
    "for name, dataframe in df_by_sex_class:\n",
    "    IPython.display.display(dataframe.head(1))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76154ce6",
   "metadata": {},
   "source": [
    "À vous de jouer : calculer le `groupby` avec le genre, la classe et si la personne a survécu ou non. Dans quel groupe de personnes reste-il le plus de survivants ? Et le moins ?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "2140ad63",
   "metadata": {},
   "outputs": [],
   "source": [
    "# votre code ici (une petite idée de correction ci-dessous)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "6efd3898",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Pclass  Sex     Survived\n",
       "1       female  0             3\n",
       "                1            91\n",
       "        male    0            77\n",
       "                1            45\n",
       "2       female  0             6\n",
       "                1            70\n",
       "        male    0            91\n",
       "                1            17\n",
       "3       female  0            72\n",
       "                1            72\n",
       "        male    0           300\n",
       "                1            47\n",
       "dtype: int64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_by_sex_class_survived = df.groupby(['Pclass', 'Sex', 'Survived'])\n",
    "df_by_sex_class_survived.size()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d8822ae",
   "metadata": {},
   "source": [
    "### les groupes (dictionnaire d'index par clés)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa3c1ca8",
   "metadata": {},
   "source": [
    "Lister les clés\n",
    "\n",
    "Les clés sont des tuples de valeurs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "03da8298",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys([(1, 'female', 0), (1, 'female', 1), (1, 'male', 0), (1, 'male', 1), (2, 'female', 0), (2, 'female', 1), (2, 'male', 0), (2, 'male', 1), (3, 'female', 0), (3, 'female', 1), (3, 'male', 0), (3, 'male', 1)])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_by_sex_class_survived.groups.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22b80c18",
   "metadata": {},
   "source": [
    "Les valeurs sont des listes d'index, ça me permet de retrouver les entrées dans la dataframe d'origine."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1af449c6",
   "metadata": {},
   "source": [
    "Par exemple, si nous voulons accéder aux trois seules femmes de première classe qui n'ont pas survécu. La clé est `(1, 'female', 0)`.\n",
    "\n",
    "Nous allons, cette fois, les rechercher dans la grande dataframe. Remarquez que bien sûr ici on utilise `loc` puisque nous sommes uniquement dans l'espace des index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "4f53aea3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>178</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>Isham, Miss. Ann Elizabeth</td>\n",
       "      <td>female</td>\n",
       "      <td>50.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17595</td>\n",
       "      <td>28.7125</td>\n",
       "      <td>C49</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>298</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>Allison, Miss. Helen Loraine</td>\n",
       "      <td>female</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>113781</td>\n",
       "      <td>151.5500</td>\n",
       "      <td>C22 C26</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>499</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>Allison, Mrs. Hudson J C (Bessie Waldo Daniels)</td>\n",
       "      <td>female</td>\n",
       "      <td>25.0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>113781</td>\n",
       "      <td>151.5500</td>\n",
       "      <td>C22 C26</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "178                 0       1   \n",
       "298                 0       1   \n",
       "499                 0       1   \n",
       "\n",
       "                                                        Name     Sex   Age  \\\n",
       "PassengerId                                                                  \n",
       "178                               Isham, Miss. Ann Elizabeth  female  50.0   \n",
       "298                             Allison, Miss. Helen Loraine  female   2.0   \n",
       "499          Allison, Mrs. Hudson J C (Bessie Waldo Daniels)  female  25.0   \n",
       "\n",
       "             SibSp  Parch    Ticket      Fare    Cabin Embarked  \n",
       "PassengerId                                                      \n",
       "178              0      0  PC 17595   28.7125      C49        C  \n",
       "298              1      2    113781  151.5500  C22 C26        S  \n",
       "499              1      2    113781  151.5500  C22 C26        S  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[df_by_sex_class_survived.groups[(1, 'female', 0)]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54e163f9",
   "metadata": {},
   "source": [
    "## découper des intervalles de valeurs dans une colonne"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac6475f1",
   "metadata": {},
   "source": [
    "Parfois, nous sommes intéressés, non pas par les différentes valeurs d'une colonne (qui seraient trop nombreuses) mais par des intervalles de ces valeurs.\n",
    "\n",
    "Prenons par exemple la colonne des âges. Si nous faisons un groupement brutalement sur la colonne `Age`, nous obtenons 88 âges différents, ce qui ne nous apporte pas une information intéressante.\n",
    "\n",
    "Par contre ça devient plus intéressant si on raisonne par **classes** d'âges, par exemple les enfants, jeunes, adultes et les plus de 55 ans.\n",
    "   - *'enfant'* disons entre 0 et 12 ans\n",
    "   - *'jeune'* disons entre 12 et 19 ans\n",
    "   - *'adulte'* disons entre 19 et 55 ans\n",
    "   - *'+55'*  et les personnes de plus de 55 ans\n",
    "   \n",
    "Nous aimerions donc avoir une colonne dans notre dataframe avec ces labels pour compléter les âges.\n",
    "   \n",
    "La fonction `pandas.cut`, appliquée à une colonne de votre dataframe, va vous générer une telle colonne, et vous pouvez donner des labels aux intervalles:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "890254bb",
   "metadata": {},
   "source": [
    "Nous allons rajouter la colonne à la dataframe. Sachant que les colonnes d'une dataframe sont les clés d'un dictionnaire, pour ajouter une colonne à votre dataframe, il faut faire comme pour les `dict` en Python."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "513e004f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 'bin' en anglais signifie corbeille ou panier\n",
    "# c'est comme si on mettait les gens dans 4 paniers\n",
    "\n",
    "df['age-periode'] = pd.cut(df['Age'], bins=[0, 12, 19, 55, 100])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c0d2f2e",
   "metadata": {},
   "source": [
    "Je montre les 6 premières lignes des 3 dernières colonnes de la dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "34dc43a1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>age-periode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>(19.0, 55.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "      <td>(19.0, 55.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>(19.0, 55.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "      <td>(19.0, 55.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>(19.0, 55.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Q</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Cabin Embarked   age-periode\n",
       "PassengerId                             \n",
       "1             NaN        S  (19.0, 55.0]\n",
       "2             C85        C  (19.0, 55.0]\n",
       "3             NaN        S  (19.0, 55.0]\n",
       "4            C123        S  (19.0, 55.0]\n",
       "5             NaN        S  (19.0, 55.0]\n",
       "6             NaN        Q           NaN"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# on a rajouté une colonne \n",
    "df[df.columns[-3:]].head(6)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7aa0569e",
   "metadata": {},
   "source": [
    "Je donne des noms aux périodes d'âge (ici on va rajouter encore une colonne)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "430fe25f",
   "metadata": {},
   "outputs": [],
   "source": [
    "df['name-age-periode'] = pd.cut(df['Age'], bins=[0, 12, 19, 55, 100], \n",
    "                                labels=['children', ' young', 'adult', 'old'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "b14c8b18",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Embarked</th>\n",
       "      <th>age-periode</th>\n",
       "      <th>name-age-periode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>S</td>\n",
       "      <td>(19.0, 55.0]</td>\n",
       "      <td>adult</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>C</td>\n",
       "      <td>(19.0, 55.0]</td>\n",
       "      <td>adult</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>S</td>\n",
       "      <td>(19.0, 55.0]</td>\n",
       "      <td>adult</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>S</td>\n",
       "      <td>(19.0, 55.0]</td>\n",
       "      <td>adult</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>S</td>\n",
       "      <td>(19.0, 55.0]</td>\n",
       "      <td>adult</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Q</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Embarked   age-periode name-age-periode\n",
       "PassengerId                                        \n",
       "1                  S  (19.0, 55.0]            adult\n",
       "2                  C  (19.0, 55.0]            adult\n",
       "3                  S  (19.0, 55.0]            adult\n",
       "4                  S  (19.0, 55.0]            adult\n",
       "5                  S  (19.0, 55.0]            adult\n",
       "6                  Q           NaN              NaN"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df.columns[-3:]].head(6)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b286fd2",
   "metadata": {},
   "source": [
    "Et maintenant nous pouvons utiliser cette colonne dans des `groupby`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "477aa2e7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name-age-periode\n",
       "children     69\n",
       " young       95\n",
       "adult       510\n",
       "old          40\n",
       "dtype: int64"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(['name-age-periode']).size()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "caf8f0e3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name-age-periode  Survived\n",
       "children          0            29\n",
       "                  1            40\n",
       " young            0            56\n",
       "                  1            39\n",
       "adult             0           311\n",
       "                  1           199\n",
       "old               0            28\n",
       "                  1            12\n",
       "dtype: int64"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# etc...\n",
    "df.groupby(['name-age-periode', 'Survived']).size()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "51fd2126",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Pclass  Sex     Survived\n",
       "1       female  0             3\n",
       "                1            91\n",
       "        male    0            77\n",
       "                1            45\n",
       "2       female  0             6\n",
       "                1            70\n",
       "        male    0            91\n",
       "                1            17\n",
       "3       female  0            72\n",
       "                1            72\n",
       "        male    0           300\n",
       "                1            47\n",
       "dtype: int64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# etc..\n",
    "df.groupby(['Pclass', 'Sex', 'Survived', ]).size()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d584cff",
   "metadata": {},
   "source": [
    "Vous avez désormais une petite idée de l'utilisation de la fonction `groupby` pour des recherches multi-critères sur une table de données."
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "all,-hidden,-heading_collapsed",
   "cell_metadata_json": true,
   "notebook_metadata_filter": "all,-language_info,-toc,-jupytext.text_representation.jupytext_version,-jupytext.text_representation.format_version",
   "text_representation": {
    "extension": ".md",
    "format_name": "myst"
   }
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.5"
  },
  "notebookname": "regroupements",
  "source_map": [
   15,
   24,
   32,
   36,
   41,
   45,
   49,
   53,
   59,
   61,
   65,
   69,
   73,
   77,
   79,
   83,
   87,
   90,
   94,
   98,
   104,
   109,
   113,
   131,
   133,
   137,
   141,
   143,
   147,
   149,
   153,
   157,
   163,
   167,
   175,
   181,
   189,
   193,
   197,
   200,
   204,
   210,
   212,
   216,
   222,
   224,
   228,
   244,
   248,
   253,
   257,
   260,
   264,
   269,
   271,
   275,
   279,
   284,
   287
  ],
  "version": "1.0"
 },
 "nbformat": 4,
 "nbformat_minor": 5
}