{ "cells": [ { "cell_type": "markdown", "id": "a447972b", "metadata": {}, "source": [ "
\n", "Licence CC BY-NC-ND\n", "Valérie Roy\n", "
\n", "\n", "" ] }, { "cell_type": "markdown", "id": "f68663f0", "metadata": {}, "source": [ "Notebook très rapide !" ] }, { "cell_type": "code", "execution_count": 1, "id": "c73f77bd", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "id": "966a2844", "metadata": {}, "source": [ "# trier une dataframe" ] }, { "cell_type": "markdown", "id": "ac5026c1", "metadata": {}, "source": [ "il peut être utile de trier une dataframe selon l'ordre d'une colonne (resp. ligne) de la dataframe et le résultat sera une dataframe dont les lignes (resp. colonnes) auront été réordonnées, les index sont conservés (ils bougent avec les lignes qu'ils indexent ...). attention par contre, bien sûr les indices sont modifiés ...\n", "\n", "c'est l'axe qui indiquera si on trie dans l'axe des lignes (auquel cas on trie selon une colonne) ou si on trie dans l'axe des colonnes (auquel cas on trie selon une ligne).\n", "\n", "la fonction est `pandas.dataframe.sort_values`" ] }, { "cell_type": "markdown", "id": "fe6d00ca", "metadata": {}, "source": [ "Vous pouvez décider de trier en place (`inplace=True`) auquel cas la dataframe sur laquelle vous appliquez la fonction est modifiée sinon la fonction renverra une dataframe ... (si vous voulez une copie, comme toujours précisez le, votre code sera plus clair)." ] }, { "cell_type": "markdown", "id": "6823edd5", "metadata": {}, "source": [ "## tri d'une dataframe selon une colonne" ] }, { "cell_type": "markdown", "id": "4f7d0bd8", "metadata": {}, "source": [ "Reprenons notre exemple du titanic." ] }, { "cell_type": "code", "execution_count": 2, "id": "566da332", "metadata": {}, "outputs": [], "source": [ "file = 'titanic.csv'" ] }, { "cell_type": "code", "execution_count": 3, "id": "87530ce2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
PassengerId
103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
503Allen, Mr. William Henrymale35.0003734508.0500NaNS
\n", "
" ], "text/plain": [ " Survived Pclass \\\n", "PassengerId \n", "1 0 3 \n", "2 1 1 \n", "3 1 3 \n", "4 1 1 \n", "5 0 3 \n", "\n", " Name Sex Age \\\n", "PassengerId \n", "1 Braund, Mr. Owen Harris male 22.0 \n", "2 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 \n", "3 Heikkinen, Miss. Laina female 26.0 \n", "4 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 \n", "5 Allen, Mr. William Henry male 35.0 \n", "\n", " SibSp Parch Ticket Fare Cabin Embarked \n", "PassengerId \n", "1 1 0 A/5 21171 7.2500 NaN S \n", "2 1 0 PC 17599 71.2833 C85 C \n", "3 0 0 STON/O2. 3101282 7.9250 NaN S \n", "4 1 0 113803 53.1000 C123 S \n", "5 0 0 373450 8.0500 NaN S " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(file, index_col=0)\n", "df.head()" ] }, { "cell_type": "markdown", "id": "f5a2f05d", "metadata": {}, "source": [ "Nous voulons trier la dataframe suivant l'ordre de la colonne de `Age`. Dans quel axe (`axis`) devons-nous trier? Vous vous rappelez `axis=0` et `axis=1`?\n", " - si `axis=0` je trie les lignes\n", " - si `axis=1` je trie les colonnes" ] }, { "cell_type": "code", "execution_count": 4, "id": "68d4042a", "metadata": {}, "outputs": [], "source": [ "df_sorted1 = df.sort_values(by='Age', ascending=False, axis=0)" ] }, { "cell_type": "code", "execution_count": 5, "id": "9b0b8373", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
PassengerId
63111Barkworth, Mr. Algernon Henry Wilsonmale80.0002704230.0000A23S
85203Svensson, Mr. Johanmale74.0003470607.7750NaNS
49401Artagaveytia, Mr. Ramonmale71.000PC 1760949.5042NaNC
9701Goldschmidt, Mr. George Bmale71.000PC 1775434.6542A5C
11703Connors, Mr. Patrickmale70.5003703697.7500NaNQ
67302Mitchell, Mr. Henry Michaelmale70.000C.A. 2458010.5000NaNS
74601Crosby, Capt. Edward Giffordmale70.011WE/P 573571.0000B22S
3402Wheadon, Mr. Edward Hmale66.000C.A. 2457910.5000NaNS
\n", "
" ], "text/plain": [ " Survived Pclass Name Sex \\\n", "PassengerId \n", "631 1 1 Barkworth, Mr. Algernon Henry Wilson male \n", "852 0 3 Svensson, Mr. Johan male \n", "494 0 1 Artagaveytia, Mr. Ramon male \n", "97 0 1 Goldschmidt, Mr. George B male \n", "117 0 3 Connors, Mr. Patrick male \n", "673 0 2 Mitchell, Mr. Henry Michael male \n", "746 0 1 Crosby, Capt. Edward Gifford male \n", "34 0 2 Wheadon, Mr. Edward H male \n", "\n", " Age SibSp Parch Ticket Fare Cabin Embarked \n", "PassengerId \n", "631 80.0 0 0 27042 30.0000 A23 S \n", "852 74.0 0 0 347060 7.7750 NaN S \n", "494 71.0 0 0 PC 17609 49.5042 NaN C \n", "97 71.0 0 0 PC 17754 34.6542 A5 C \n", "117 70.5 0 0 370369 7.7500 NaN Q \n", "673 70.0 0 0 C.A. 24580 10.5000 NaN S \n", "746 70.0 1 1 WE/P 5735 71.0000 B22 S \n", "34 66.0 0 0 C.A. 24579 10.5000 NaN S " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted1.head(8)" ] }, { "cell_type": "markdown", "id": "a9a4cb15", "metadata": {}, "source": [ "Vous voyez que la colonne des `Age` est triée, les lignes ont été changées de place dans la table et leur indexation conservée." ] }, { "cell_type": "markdown", "id": "5b7f0c25", "metadata": { "tags": [] }, "source": [ "Regardons les lignes d'index 673 et 746. Les ages sont identiques, nous aurions pu indiquer une seconde colonne avec `by=[col1, col2]` pour trier les valeurs trouvées identiques par la première colonne.\n", "\n", "Par exemple trions par `Age`, et si les âges sont égaux, trions par `Fare`." ] }, { "cell_type": "code", "execution_count": 6, "id": "c64aa382", "metadata": {}, "outputs": [], "source": [ "df_sorted2 = df.sort_values(by=['Age', 'Fare'], ascending=False, axis=0)" ] }, { "cell_type": "code", "execution_count": 7, "id": "438be334", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
PassengerId
63111Barkworth, Mr. Algernon Henry Wilsonmale80.0002704230.0000A23S
85203Svensson, Mr. Johanmale74.0003470607.7750NaNS
49401Artagaveytia, Mr. Ramonmale71.000PC 1760949.5042NaNC
9701Goldschmidt, Mr. George Bmale71.000PC 1775434.6542A5C
11703Connors, Mr. Patrickmale70.5003703697.7500NaNQ
74601Crosby, Capt. Edward Giffordmale70.011WE/P 573571.0000B22S
67302Mitchell, Mr. Henry Michaelmale70.000C.A. 2458010.5000NaNS
3402Wheadon, Mr. Edward Hmale66.000C.A. 2457910.5000NaNS
\n", "
" ], "text/plain": [ " Survived Pclass Name Sex \\\n", "PassengerId \n", "631 1 1 Barkworth, Mr. Algernon Henry Wilson male \n", "852 0 3 Svensson, Mr. Johan male \n", "494 0 1 Artagaveytia, Mr. Ramon male \n", "97 0 1 Goldschmidt, Mr. George B male \n", "117 0 3 Connors, Mr. Patrick male \n", "746 0 1 Crosby, Capt. Edward Gifford male \n", "673 0 2 Mitchell, Mr. Henry Michael male \n", "34 0 2 Wheadon, Mr. Edward H male \n", "\n", " Age SibSp Parch Ticket Fare Cabin Embarked \n", "PassengerId \n", "631 80.0 0 0 27042 30.0000 A23 S \n", "852 74.0 0 0 347060 7.7750 NaN S \n", "494 71.0 0 0 PC 17609 49.5042 NaN C \n", "97 71.0 0 0 PC 17754 34.6542 A5 C \n", "117 70.5 0 0 370369 7.7500 NaN Q \n", "746 70.0 1 1 WE/P 5735 71.0000 B22 S \n", "673 70.0 0 0 C.A. 24580 10.5000 NaN S \n", "34 66.0 0 0 C.A. 24579 10.5000 NaN S " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted2.head(8)" ] }, { "cell_type": "markdown", "id": "ef148a5a", "metadata": {}, "source": [ "Vous voyez que les lignes d'index 746 et 673, dont les âges sont égaux, ont été triées par `Fare` et ont changé d'ordre par rapport à la première dataframe `df_sorted1`." ] }, { "cell_type": "markdown", "id": "87b81bbd", "metadata": {}, "source": [ "Si vous voulez modifier la dataframe initiale faites le tri en place en passant le paramètre `inplace=True`." ] }, { "cell_type": "markdown", "id": "c85b779c", "metadata": { "tags": [ "level_intermediate" ] }, "source": [ "Une dernière question: Où sont les lignes dont les valeurs concernées par le tri ne sont pas connues (i.e. `np.nan`)?\n", "\n", "Listons les 5 dernières lignes de la dataframe avec la méthode `pandas.DataFrame.tail`" ] }, { "cell_type": "code", "execution_count": 8, "id": "f74402bd", "metadata": { "tags": [ "level_intermediate" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
PassengerId
86003Razi, Mr. RaihedmaleNaN0026297.2292NaNC
86403Sage, Miss. Dorothy Edith \"Dolly\"femaleNaN82CA. 234369.5500NaNS
86903van Melkebeke, Mr. PhilemonmaleNaN003457779.5000NaNS
87903Laleff, Mr. KristomaleNaN003492177.8958NaNS
88903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.4500NaNS
\n", "
" ], "text/plain": [ " Survived Pclass Name \\\n", "PassengerId \n", "860 0 3 Razi, Mr. Raihed \n", "864 0 3 Sage, Miss. Dorothy Edith \"Dolly\" \n", "869 0 3 van Melkebeke, Mr. Philemon \n", "879 0 3 Laleff, Mr. Kristo \n", "889 0 3 Johnston, Miss. Catherine Helen \"Carrie\" \n", "\n", " Sex Age SibSp Parch Ticket Fare Cabin Embarked \n", "PassengerId \n", "860 male NaN 0 0 2629 7.2292 NaN C \n", "864 female NaN 8 2 CA. 2343 69.5500 NaN S \n", "869 male NaN 0 0 345777 9.5000 NaN S \n", "879 male NaN 0 0 349217 7.8958 NaN S \n", "889 female NaN 1 2 W./C. 6607 23.4500 NaN S " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values(by='Age', ascending=False, axis=0).tail() # oui à la fin" ] }, { "cell_type": "markdown", "id": "eb2c52ae", "metadata": { "tags": [ "level_intermediate" ] }, "source": [ "Il est possible de les mettre en début de la dataframe:" ] }, { "cell_type": "code", "execution_count": 9, "id": "46a01e69", "metadata": { "tags": [ "level_intermediate" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
PassengerId
603Moran, Mr. JamesmaleNaN003308778.4583NaNQ
1812Williams, Mr. Charles EugenemaleNaN0024437313.0000NaNS
2013Masselmani, Mrs. FatimafemaleNaN0026497.2250NaNC
2703Emir, Mr. Farred ChehabmaleNaN0026317.2250NaNC
2913O'Dwyer, Miss. Ellen \"Nellie\"femaleNaN003309597.8792NaNQ
\n", "
" ], "text/plain": [ " Survived Pclass Name Sex Age \\\n", "PassengerId \n", "6 0 3 Moran, Mr. James male NaN \n", "18 1 2 Williams, Mr. Charles Eugene male NaN \n", "20 1 3 Masselmani, Mrs. Fatima female NaN \n", "27 0 3 Emir, Mr. Farred Chehab male NaN \n", "29 1 3 O'Dwyer, Miss. Ellen \"Nellie\" female NaN \n", "\n", " SibSp Parch Ticket Fare Cabin Embarked \n", "PassengerId \n", "6 0 0 330877 8.4583 NaN Q \n", "18 0 0 244373 13.0000 NaN S \n", "20 0 0 2649 7.2250 NaN C \n", "27 0 0 2631 7.2250 NaN C \n", "29 0 0 330959 7.8792 NaN Q " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values(by='Age', ascending=False, axis=0, na_position='first').head()" ] }, { "cell_type": "markdown", "id": "66a04200", "metadata": {}, "source": [ "## tri d'une dataframe selon une ligne" ] }, { "cell_type": "markdown", "id": "2818d360", "metadata": {}, "source": [ "L'exemple du Titanic se prête mal à cet exemple. Créons une dataframe bidon à partir d'un `np.ndarray`. Donnons lui un index de colonnes et un de lignes." ] }, { "cell_type": "code", "execution_count": 10, "id": "d000e597", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
deacb
un-2-8-18-7
deux79153
trois-16-1-3-5
quatre-1-6-479
\n", "
" ], "text/plain": [ " d e a c b\n", "un -2 -8 -1 8 -7\n", "deux 7 9 1 5 3\n", "trois -1 6 -1 -3 -5\n", "quatre -1 -6 -4 7 9" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "petite_df = pd.DataFrame(np.random.randint(-10, 10, 20).reshape(4, 5),\n", " columns=['d', 'e', 'a', 'c', 'b'])\n", "petite_df.index = ['un', 'deux', 'trois', 'quatre']\n", "petite_df" ] }, { "cell_type": "markdown", "id": "566e48be", "metadata": {}, "source": [ "Trions la dataframe par la ligne d'index `un` donc `axis=1`" ] }, { "cell_type": "code", "execution_count": 11, "id": "9ca44468", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ebdac
un-8-7-2-18
deux93715
trois6-5-1-1-3
quatre-69-1-47
\n", "
" ], "text/plain": [ " e b d a c\n", "un -8 -7 -2 -1 8\n", "deux 9 3 7 1 5\n", "trois 6 -5 -1 -1 -3\n", "quatre -6 9 -1 -4 7" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "petite_df.sort_values(by='un', axis=1)" ] }, { "cell_type": "markdown", "id": "da0269af", "metadata": {}, "source": [ "Vous voyez que la ligne *'un'* a été triée et la dataframe réorganisée autour de cette ligne en conservant les index." ] }, { "cell_type": "markdown", "id": "9f937df1", "metadata": {}, "source": [ "## tri des index" ] }, { "cell_type": "markdown", "id": "0d278142", "metadata": {}, "source": [ "Comme son nom d'indique, nous allons pouvoir trier les index d'une dataframe, et comme toujours, suivant l'axe 0 pour les lignes, et suivant l'axe 1 pour les colonnes." ] }, { "cell_type": "markdown", "id": "416b9056", "metadata": {}, "source": [ "Par exemple, nous pouvons trier l'index des colonnes (`axis=1`) de `df`." ] }, { "cell_type": "markdown", "id": "1bac7a89", "metadata": {}, "source": [ "Regardons la dataframe." ] }, { "cell_type": "code", "execution_count": 12, "id": "540a168f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
PassengerId
103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
503Allen, Mr. William Henrymale35.0003734508.0500NaNS
603Moran, Mr. JamesmaleNaN003308778.4583NaNQ
\n", "
" ], "text/plain": [ " Survived Pclass \\\n", "PassengerId \n", "1 0 3 \n", "2 1 1 \n", "3 1 3 \n", "4 1 1 \n", "5 0 3 \n", "6 0 3 \n", "\n", " Name Sex Age \\\n", "PassengerId \n", "1 Braund, Mr. Owen Harris male 22.0 \n", "2 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 \n", "3 Heikkinen, Miss. Laina female 26.0 \n", "4 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 \n", "5 Allen, Mr. William Henry male 35.0 \n", "6 Moran, Mr. James male NaN \n", "\n", " SibSp Parch Ticket Fare Cabin Embarked \n", "PassengerId \n", "1 1 0 A/5 21171 7.2500 NaN S \n", "2 1 0 PC 17599 71.2833 C85 C \n", "3 0 0 STON/O2. 3101282 7.9250 NaN S \n", "4 1 0 113803 53.1000 C123 S \n", "5 0 0 373450 8.0500 NaN S \n", "6 0 0 330877 8.4583 NaN Q " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(6)" ] }, { "cell_type": "markdown", "id": "9afe443d", "metadata": {}, "source": [ "Trions ses noms de colonnes et affichons le header de la dataframe." ] }, { "cell_type": "code", "execution_count": 13, "id": "c6fe42b4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AgeCabinEmbarkedFareNameParchPclassSexSibSpSurvivedTicket
PassengerId
122.0NaNS7.2500Braund, Mr. Owen Harris03male10A/5 21171
238.0C85C71.2833Cumings, Mrs. John Bradley (Florence Briggs Th...01female11PC 17599
326.0NaNS7.9250Heikkinen, Miss. Laina03female01STON/O2. 3101282
435.0C123S53.1000Futrelle, Mrs. Jacques Heath (Lily May Peel)01female11113803
535.0NaNS8.0500Allen, Mr. William Henry03male00373450
\n", "
" ], "text/plain": [ " Age Cabin Embarked Fare \\\n", "PassengerId \n", "1 22.0 NaN S 7.2500 \n", "2 38.0 C85 C 71.2833 \n", "3 26.0 NaN S 7.9250 \n", "4 35.0 C123 S 53.1000 \n", "5 35.0 NaN S 8.0500 \n", "\n", " Name Parch Pclass \\\n", "PassengerId \n", "1 Braund, Mr. Owen Harris 0 3 \n", "2 Cumings, Mrs. John Bradley (Florence Briggs Th... 0 1 \n", "3 Heikkinen, Miss. Laina 0 3 \n", "4 Futrelle, Mrs. Jacques Heath (Lily May Peel) 0 1 \n", "5 Allen, Mr. William Henry 0 3 \n", "\n", " Sex SibSp Survived Ticket \n", "PassengerId \n", "1 male 1 0 A/5 21171 \n", "2 female 1 1 PC 17599 \n", "3 female 0 1 STON/O2. 3101282 \n", "4 female 1 1 113803 \n", "5 male 0 0 373450 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_index(axis=1).head()" ] }, { "cell_type": "markdown", "id": "962eabff", "metadata": {}, "source": [ "Les colonnes ont bien été réordonnées par ordre lexicographique croissant." ] }, { "cell_type": "markdown", "id": "91c1fd2e", "metadata": {}, "source": [ "Maintenant, trions l'index des lignes (*axis=0*) de `df_sorted1` (parce que celui de `df` est déjà trié)." ] }, { "cell_type": "markdown", "id": "379b8a10", "metadata": {}, "source": [ "Regardons avant le tri:" ] }, { "cell_type": "code", "execution_count": 14, "id": "25907f70", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
PassengerId
63111Barkworth, Mr. Algernon Henry Wilsonmale80.0002704230.0000A23S
85203Svensson, Mr. Johanmale74.0003470607.7750NaNS
49401Artagaveytia, Mr. Ramonmale71.000PC 1760949.5042NaNC
9701Goldschmidt, Mr. George Bmale71.000PC 1775434.6542A5C
11703Connors, Mr. Patrickmale70.5003703697.7500NaNQ
67302Mitchell, Mr. Henry Michaelmale70.000C.A. 2458010.5000NaNS
\n", "
" ], "text/plain": [ " Survived Pclass Name Sex \\\n", "PassengerId \n", "631 1 1 Barkworth, Mr. Algernon Henry Wilson male \n", "852 0 3 Svensson, Mr. Johan male \n", "494 0 1 Artagaveytia, Mr. Ramon male \n", "97 0 1 Goldschmidt, Mr. George B male \n", "117 0 3 Connors, Mr. Patrick male \n", "673 0 2 Mitchell, Mr. Henry Michael male \n", "\n", " Age SibSp Parch Ticket Fare Cabin Embarked \n", "PassengerId \n", "631 80.0 0 0 27042 30.0000 A23 S \n", "852 74.0 0 0 347060 7.7750 NaN S \n", "494 71.0 0 0 PC 17609 49.5042 NaN C \n", "97 71.0 0 0 PC 17754 34.6542 A5 C \n", "117 70.5 0 0 370369 7.7500 NaN Q \n", "673 70.0 0 0 C.A. 24580 10.5000 NaN S " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted1.head(6)" ] }, { "cell_type": "markdown", "id": "8d051729", "metadata": {}, "source": [ "Trions et regardons après le tri." ] }, { "cell_type": "code", "execution_count": 15, "id": "c27814a9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
PassengerId
103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
503Allen, Mr. William Henrymale35.0003734508.0500NaNS
\n", "
" ], "text/plain": [ " Survived Pclass \\\n", "PassengerId \n", "1 0 3 \n", "2 1 1 \n", "3 1 3 \n", "4 1 1 \n", "5 0 3 \n", "\n", " Name Sex Age \\\n", "PassengerId \n", "1 Braund, Mr. Owen Harris male 22.0 \n", "2 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 \n", "3 Heikkinen, Miss. Laina female 26.0 \n", "4 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 \n", "5 Allen, Mr. William Henry male 35.0 \n", "\n", " SibSp Parch Ticket Fare Cabin Embarked \n", "PassengerId \n", "1 1 0 A/5 21171 7.2500 NaN S \n", "2 1 0 PC 17599 71.2833 C85 C \n", "3 0 0 STON/O2. 3101282 7.9250 NaN S \n", "4 1 0 113803 53.1000 C123 S \n", "5 0 0 373450 8.0500 NaN S " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted1.sort_index(axis=0).head()" ] }, { "cell_type": "markdown", "id": "d22ff67a", "metadata": {}, "source": [ "Voilà l'index `PassengerId` tout bien réordonné !" ] } ], "metadata": { "jupytext": { "cell_metadata_filter": "all,-hidden,-heading_collapsed", "cell_metadata_json": true, "notebook_metadata_filter": "all,-language_info,-toc,-jupytext.text_representation.jupytext_version,-jupytext.text_representation.format_version", "text_representation": { "extension": ".md", "format_name": "myst" } }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" }, "notebookname": "regroupements dans une dataframe", "source_map": [ 15, 24, 28, 31, 35, 43, 47, 51, 55, 59, 62, 68, 72, 74, 78, 84, 88, 90, 94, 98, 104, 110, 114, 118, 122, 126, 131, 135, 137, 141, 145, 149, 153, 157, 159, 163, 165, 169, 173, 177, 179, 183, 185 ], "version": "1.0" }, "nbformat": 4, "nbformat_minor": 5 }