{
"cells": [
{
"cell_type": "markdown",
"id": "a447972b",
"metadata": {},
"source": [
"
\n",
"Licence CC BY-NC-ND\n",
"Valérie Roy\n",
"
\n",
"\n",
"
"
]
},
{
"cell_type": "markdown",
"id": "f68663f0",
"metadata": {},
"source": [
"Notebook très rapide !"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c73f77bd",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"id": "966a2844",
"metadata": {},
"source": [
"# trier une dataframe"
]
},
{
"cell_type": "markdown",
"id": "ac5026c1",
"metadata": {},
"source": [
"il peut être utile de trier une dataframe selon l'ordre d'une colonne (resp. ligne) de la dataframe et le résultat sera une dataframe dont les lignes (resp. colonnes) auront été réordonnées, les index sont conservés (ils bougent avec les lignes qu'ils indexent ...). attention par contre, bien sûr les indices sont modifiés ...\n",
"\n",
"c'est l'axe qui indiquera si on trie dans l'axe des lignes (auquel cas on trie selon une colonne) ou si on trie dans l'axe des colonnes (auquel cas on trie selon une ligne).\n",
"\n",
"la fonction est `pandas.dataframe.sort_values`"
]
},
{
"cell_type": "markdown",
"id": "fe6d00ca",
"metadata": {},
"source": [
"Vous pouvez décider de trier en place (`inplace=True`) auquel cas la dataframe sur laquelle vous appliquez la fonction est modifiée sinon la fonction renverra une dataframe ... (si vous voulez une copie, comme toujours précisez le, votre code sera plus clair)."
]
},
{
"cell_type": "markdown",
"id": "6823edd5",
"metadata": {},
"source": [
"## tri d'une dataframe selon une colonne"
]
},
{
"cell_type": "markdown",
"id": "4f7d0bd8",
"metadata": {},
"source": [
"Reprenons notre exemple du titanic."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "566da332",
"metadata": {},
"outputs": [],
"source": [
"file = 'titanic.csv'"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "87530ce2",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Survived | \n",
" Pclass | \n",
" Name | \n",
" Sex | \n",
" Age | \n",
" SibSp | \n",
" Parch | \n",
" Ticket | \n",
" Fare | \n",
" Cabin | \n",
" Embarked | \n",
"
\n",
" \n",
" PassengerId | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 0 | \n",
" 3 | \n",
" Braund, Mr. Owen Harris | \n",
" male | \n",
" 22.0 | \n",
" 1 | \n",
" 0 | \n",
" A/5 21171 | \n",
" 7.2500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
" Cumings, Mrs. John Bradley (Florence Briggs Th... | \n",
" female | \n",
" 38.0 | \n",
" 1 | \n",
" 0 | \n",
" PC 17599 | \n",
" 71.2833 | \n",
" C85 | \n",
" C | \n",
"
\n",
" \n",
" 3 | \n",
" 1 | \n",
" 3 | \n",
" Heikkinen, Miss. Laina | \n",
" female | \n",
" 26.0 | \n",
" 0 | \n",
" 0 | \n",
" STON/O2. 3101282 | \n",
" 7.9250 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 4 | \n",
" 1 | \n",
" 1 | \n",
" Futrelle, Mrs. Jacques Heath (Lily May Peel) | \n",
" female | \n",
" 35.0 | \n",
" 1 | \n",
" 0 | \n",
" 113803 | \n",
" 53.1000 | \n",
" C123 | \n",
" S | \n",
"
\n",
" \n",
" 5 | \n",
" 0 | \n",
" 3 | \n",
" Allen, Mr. William Henry | \n",
" male | \n",
" 35.0 | \n",
" 0 | \n",
" 0 | \n",
" 373450 | \n",
" 8.0500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Survived Pclass \\\n",
"PassengerId \n",
"1 0 3 \n",
"2 1 1 \n",
"3 1 3 \n",
"4 1 1 \n",
"5 0 3 \n",
"\n",
" Name Sex Age \\\n",
"PassengerId \n",
"1 Braund, Mr. Owen Harris male 22.0 \n",
"2 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 \n",
"3 Heikkinen, Miss. Laina female 26.0 \n",
"4 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
"5 Allen, Mr. William Henry male 35.0 \n",
"\n",
" SibSp Parch Ticket Fare Cabin Embarked \n",
"PassengerId \n",
"1 1 0 A/5 21171 7.2500 NaN S \n",
"2 1 0 PC 17599 71.2833 C85 C \n",
"3 0 0 STON/O2. 3101282 7.9250 NaN S \n",
"4 1 0 113803 53.1000 C123 S \n",
"5 0 0 373450 8.0500 NaN S "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv(file, index_col=0)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "f5a2f05d",
"metadata": {},
"source": [
"Nous voulons trier la dataframe suivant l'ordre de la colonne de `Age`. Dans quel axe (`axis`) devons-nous trier? Vous vous rappelez `axis=0` et `axis=1`?\n",
" - si `axis=0` je trie les lignes\n",
" - si `axis=1` je trie les colonnes"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "68d4042a",
"metadata": {},
"outputs": [],
"source": [
"df_sorted1 = df.sort_values(by='Age', ascending=False, axis=0)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9b0b8373",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Survived | \n",
" Pclass | \n",
" Name | \n",
" Sex | \n",
" Age | \n",
" SibSp | \n",
" Parch | \n",
" Ticket | \n",
" Fare | \n",
" Cabin | \n",
" Embarked | \n",
"
\n",
" \n",
" PassengerId | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 631 | \n",
" 1 | \n",
" 1 | \n",
" Barkworth, Mr. Algernon Henry Wilson | \n",
" male | \n",
" 80.0 | \n",
" 0 | \n",
" 0 | \n",
" 27042 | \n",
" 30.0000 | \n",
" A23 | \n",
" S | \n",
"
\n",
" \n",
" 852 | \n",
" 0 | \n",
" 3 | \n",
" Svensson, Mr. Johan | \n",
" male | \n",
" 74.0 | \n",
" 0 | \n",
" 0 | \n",
" 347060 | \n",
" 7.7750 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 494 | \n",
" 0 | \n",
" 1 | \n",
" Artagaveytia, Mr. Ramon | \n",
" male | \n",
" 71.0 | \n",
" 0 | \n",
" 0 | \n",
" PC 17609 | \n",
" 49.5042 | \n",
" NaN | \n",
" C | \n",
"
\n",
" \n",
" 97 | \n",
" 0 | \n",
" 1 | \n",
" Goldschmidt, Mr. George B | \n",
" male | \n",
" 71.0 | \n",
" 0 | \n",
" 0 | \n",
" PC 17754 | \n",
" 34.6542 | \n",
" A5 | \n",
" C | \n",
"
\n",
" \n",
" 117 | \n",
" 0 | \n",
" 3 | \n",
" Connors, Mr. Patrick | \n",
" male | \n",
" 70.5 | \n",
" 0 | \n",
" 0 | \n",
" 370369 | \n",
" 7.7500 | \n",
" NaN | \n",
" Q | \n",
"
\n",
" \n",
" 673 | \n",
" 0 | \n",
" 2 | \n",
" Mitchell, Mr. Henry Michael | \n",
" male | \n",
" 70.0 | \n",
" 0 | \n",
" 0 | \n",
" C.A. 24580 | \n",
" 10.5000 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 746 | \n",
" 0 | \n",
" 1 | \n",
" Crosby, Capt. Edward Gifford | \n",
" male | \n",
" 70.0 | \n",
" 1 | \n",
" 1 | \n",
" WE/P 5735 | \n",
" 71.0000 | \n",
" B22 | \n",
" S | \n",
"
\n",
" \n",
" 34 | \n",
" 0 | \n",
" 2 | \n",
" Wheadon, Mr. Edward H | \n",
" male | \n",
" 66.0 | \n",
" 0 | \n",
" 0 | \n",
" C.A. 24579 | \n",
" 10.5000 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Survived Pclass Name Sex \\\n",
"PassengerId \n",
"631 1 1 Barkworth, Mr. Algernon Henry Wilson male \n",
"852 0 3 Svensson, Mr. Johan male \n",
"494 0 1 Artagaveytia, Mr. Ramon male \n",
"97 0 1 Goldschmidt, Mr. George B male \n",
"117 0 3 Connors, Mr. Patrick male \n",
"673 0 2 Mitchell, Mr. Henry Michael male \n",
"746 0 1 Crosby, Capt. Edward Gifford male \n",
"34 0 2 Wheadon, Mr. Edward H male \n",
"\n",
" Age SibSp Parch Ticket Fare Cabin Embarked \n",
"PassengerId \n",
"631 80.0 0 0 27042 30.0000 A23 S \n",
"852 74.0 0 0 347060 7.7750 NaN S \n",
"494 71.0 0 0 PC 17609 49.5042 NaN C \n",
"97 71.0 0 0 PC 17754 34.6542 A5 C \n",
"117 70.5 0 0 370369 7.7500 NaN Q \n",
"673 70.0 0 0 C.A. 24580 10.5000 NaN S \n",
"746 70.0 1 1 WE/P 5735 71.0000 B22 S \n",
"34 66.0 0 0 C.A. 24579 10.5000 NaN S "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_sorted1.head(8)"
]
},
{
"cell_type": "markdown",
"id": "a9a4cb15",
"metadata": {},
"source": [
"Vous voyez que la colonne des `Age` est triée, les lignes ont été changées de place dans la table et leur indexation conservée."
]
},
{
"cell_type": "markdown",
"id": "5b7f0c25",
"metadata": {
"tags": []
},
"source": [
"Regardons les lignes d'index 673 et 746. Les ages sont identiques, nous aurions pu indiquer une seconde colonne avec `by=[col1, col2]` pour trier les valeurs trouvées identiques par la première colonne.\n",
"\n",
"Par exemple trions par `Age`, et si les âges sont égaux, trions par `Fare`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c64aa382",
"metadata": {},
"outputs": [],
"source": [
"df_sorted2 = df.sort_values(by=['Age', 'Fare'], ascending=False, axis=0)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "438be334",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Survived | \n",
" Pclass | \n",
" Name | \n",
" Sex | \n",
" Age | \n",
" SibSp | \n",
" Parch | \n",
" Ticket | \n",
" Fare | \n",
" Cabin | \n",
" Embarked | \n",
"
\n",
" \n",
" PassengerId | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 631 | \n",
" 1 | \n",
" 1 | \n",
" Barkworth, Mr. Algernon Henry Wilson | \n",
" male | \n",
" 80.0 | \n",
" 0 | \n",
" 0 | \n",
" 27042 | \n",
" 30.0000 | \n",
" A23 | \n",
" S | \n",
"
\n",
" \n",
" 852 | \n",
" 0 | \n",
" 3 | \n",
" Svensson, Mr. Johan | \n",
" male | \n",
" 74.0 | \n",
" 0 | \n",
" 0 | \n",
" 347060 | \n",
" 7.7750 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 494 | \n",
" 0 | \n",
" 1 | \n",
" Artagaveytia, Mr. Ramon | \n",
" male | \n",
" 71.0 | \n",
" 0 | \n",
" 0 | \n",
" PC 17609 | \n",
" 49.5042 | \n",
" NaN | \n",
" C | \n",
"
\n",
" \n",
" 97 | \n",
" 0 | \n",
" 1 | \n",
" Goldschmidt, Mr. George B | \n",
" male | \n",
" 71.0 | \n",
" 0 | \n",
" 0 | \n",
" PC 17754 | \n",
" 34.6542 | \n",
" A5 | \n",
" C | \n",
"
\n",
" \n",
" 117 | \n",
" 0 | \n",
" 3 | \n",
" Connors, Mr. Patrick | \n",
" male | \n",
" 70.5 | \n",
" 0 | \n",
" 0 | \n",
" 370369 | \n",
" 7.7500 | \n",
" NaN | \n",
" Q | \n",
"
\n",
" \n",
" 746 | \n",
" 0 | \n",
" 1 | \n",
" Crosby, Capt. Edward Gifford | \n",
" male | \n",
" 70.0 | \n",
" 1 | \n",
" 1 | \n",
" WE/P 5735 | \n",
" 71.0000 | \n",
" B22 | \n",
" S | \n",
"
\n",
" \n",
" 673 | \n",
" 0 | \n",
" 2 | \n",
" Mitchell, Mr. Henry Michael | \n",
" male | \n",
" 70.0 | \n",
" 0 | \n",
" 0 | \n",
" C.A. 24580 | \n",
" 10.5000 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 34 | \n",
" 0 | \n",
" 2 | \n",
" Wheadon, Mr. Edward H | \n",
" male | \n",
" 66.0 | \n",
" 0 | \n",
" 0 | \n",
" C.A. 24579 | \n",
" 10.5000 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Survived Pclass Name Sex \\\n",
"PassengerId \n",
"631 1 1 Barkworth, Mr. Algernon Henry Wilson male \n",
"852 0 3 Svensson, Mr. Johan male \n",
"494 0 1 Artagaveytia, Mr. Ramon male \n",
"97 0 1 Goldschmidt, Mr. George B male \n",
"117 0 3 Connors, Mr. Patrick male \n",
"746 0 1 Crosby, Capt. Edward Gifford male \n",
"673 0 2 Mitchell, Mr. Henry Michael male \n",
"34 0 2 Wheadon, Mr. Edward H male \n",
"\n",
" Age SibSp Parch Ticket Fare Cabin Embarked \n",
"PassengerId \n",
"631 80.0 0 0 27042 30.0000 A23 S \n",
"852 74.0 0 0 347060 7.7750 NaN S \n",
"494 71.0 0 0 PC 17609 49.5042 NaN C \n",
"97 71.0 0 0 PC 17754 34.6542 A5 C \n",
"117 70.5 0 0 370369 7.7500 NaN Q \n",
"746 70.0 1 1 WE/P 5735 71.0000 B22 S \n",
"673 70.0 0 0 C.A. 24580 10.5000 NaN S \n",
"34 66.0 0 0 C.A. 24579 10.5000 NaN S "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_sorted2.head(8)"
]
},
{
"cell_type": "markdown",
"id": "ef148a5a",
"metadata": {},
"source": [
"Vous voyez que les lignes d'index 746 et 673, dont les âges sont égaux, ont été triées par `Fare` et ont changé d'ordre par rapport à la première dataframe `df_sorted1`."
]
},
{
"cell_type": "markdown",
"id": "87b81bbd",
"metadata": {},
"source": [
"Si vous voulez modifier la dataframe initiale faites le tri en place en passant le paramètre `inplace=True`."
]
},
{
"cell_type": "markdown",
"id": "c85b779c",
"metadata": {
"tags": [
"level_intermediate"
]
},
"source": [
"Une dernière question: Où sont les lignes dont les valeurs concernées par le tri ne sont pas connues (i.e. `np.nan`)?\n",
"\n",
"Listons les 5 dernières lignes de la dataframe avec la méthode `pandas.DataFrame.tail`"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "f74402bd",
"metadata": {
"tags": [
"level_intermediate"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Survived | \n",
" Pclass | \n",
" Name | \n",
" Sex | \n",
" Age | \n",
" SibSp | \n",
" Parch | \n",
" Ticket | \n",
" Fare | \n",
" Cabin | \n",
" Embarked | \n",
"
\n",
" \n",
" PassengerId | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 860 | \n",
" 0 | \n",
" 3 | \n",
" Razi, Mr. Raihed | \n",
" male | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 2629 | \n",
" 7.2292 | \n",
" NaN | \n",
" C | \n",
"
\n",
" \n",
" 864 | \n",
" 0 | \n",
" 3 | \n",
" Sage, Miss. Dorothy Edith \"Dolly\" | \n",
" female | \n",
" NaN | \n",
" 8 | \n",
" 2 | \n",
" CA. 2343 | \n",
" 69.5500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 869 | \n",
" 0 | \n",
" 3 | \n",
" van Melkebeke, Mr. Philemon | \n",
" male | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 345777 | \n",
" 9.5000 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 879 | \n",
" 0 | \n",
" 3 | \n",
" Laleff, Mr. Kristo | \n",
" male | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 349217 | \n",
" 7.8958 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 889 | \n",
" 0 | \n",
" 3 | \n",
" Johnston, Miss. Catherine Helen \"Carrie\" | \n",
" female | \n",
" NaN | \n",
" 1 | \n",
" 2 | \n",
" W./C. 6607 | \n",
" 23.4500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Survived Pclass Name \\\n",
"PassengerId \n",
"860 0 3 Razi, Mr. Raihed \n",
"864 0 3 Sage, Miss. Dorothy Edith \"Dolly\" \n",
"869 0 3 van Melkebeke, Mr. Philemon \n",
"879 0 3 Laleff, Mr. Kristo \n",
"889 0 3 Johnston, Miss. Catherine Helen \"Carrie\" \n",
"\n",
" Sex Age SibSp Parch Ticket Fare Cabin Embarked \n",
"PassengerId \n",
"860 male NaN 0 0 2629 7.2292 NaN C \n",
"864 female NaN 8 2 CA. 2343 69.5500 NaN S \n",
"869 male NaN 0 0 345777 9.5000 NaN S \n",
"879 male NaN 0 0 349217 7.8958 NaN S \n",
"889 female NaN 1 2 W./C. 6607 23.4500 NaN S "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sort_values(by='Age', ascending=False, axis=0).tail() # oui à la fin"
]
},
{
"cell_type": "markdown",
"id": "eb2c52ae",
"metadata": {
"tags": [
"level_intermediate"
]
},
"source": [
"Il est possible de les mettre en début de la dataframe:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "46a01e69",
"metadata": {
"tags": [
"level_intermediate"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Survived | \n",
" Pclass | \n",
" Name | \n",
" Sex | \n",
" Age | \n",
" SibSp | \n",
" Parch | \n",
" Ticket | \n",
" Fare | \n",
" Cabin | \n",
" Embarked | \n",
"
\n",
" \n",
" PassengerId | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 6 | \n",
" 0 | \n",
" 3 | \n",
" Moran, Mr. James | \n",
" male | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 330877 | \n",
" 8.4583 | \n",
" NaN | \n",
" Q | \n",
"
\n",
" \n",
" 18 | \n",
" 1 | \n",
" 2 | \n",
" Williams, Mr. Charles Eugene | \n",
" male | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 244373 | \n",
" 13.0000 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 20 | \n",
" 1 | \n",
" 3 | \n",
" Masselmani, Mrs. Fatima | \n",
" female | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 2649 | \n",
" 7.2250 | \n",
" NaN | \n",
" C | \n",
"
\n",
" \n",
" 27 | \n",
" 0 | \n",
" 3 | \n",
" Emir, Mr. Farred Chehab | \n",
" male | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 2631 | \n",
" 7.2250 | \n",
" NaN | \n",
" C | \n",
"
\n",
" \n",
" 29 | \n",
" 1 | \n",
" 3 | \n",
" O'Dwyer, Miss. Ellen \"Nellie\" | \n",
" female | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 330959 | \n",
" 7.8792 | \n",
" NaN | \n",
" Q | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Survived Pclass Name Sex Age \\\n",
"PassengerId \n",
"6 0 3 Moran, Mr. James male NaN \n",
"18 1 2 Williams, Mr. Charles Eugene male NaN \n",
"20 1 3 Masselmani, Mrs. Fatima female NaN \n",
"27 0 3 Emir, Mr. Farred Chehab male NaN \n",
"29 1 3 O'Dwyer, Miss. Ellen \"Nellie\" female NaN \n",
"\n",
" SibSp Parch Ticket Fare Cabin Embarked \n",
"PassengerId \n",
"6 0 0 330877 8.4583 NaN Q \n",
"18 0 0 244373 13.0000 NaN S \n",
"20 0 0 2649 7.2250 NaN C \n",
"27 0 0 2631 7.2250 NaN C \n",
"29 0 0 330959 7.8792 NaN Q "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sort_values(by='Age', ascending=False, axis=0, na_position='first').head()"
]
},
{
"cell_type": "markdown",
"id": "66a04200",
"metadata": {},
"source": [
"## tri d'une dataframe selon une ligne"
]
},
{
"cell_type": "markdown",
"id": "2818d360",
"metadata": {},
"source": [
"L'exemple du Titanic se prête mal à cet exemple. Créons une dataframe bidon à partir d'un `np.ndarray`. Donnons lui un index de colonnes et un de lignes."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d000e597",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" d | \n",
" e | \n",
" a | \n",
" c | \n",
" b | \n",
"
\n",
" \n",
" \n",
" \n",
" un | \n",
" -2 | \n",
" -8 | \n",
" -1 | \n",
" 8 | \n",
" -7 | \n",
"
\n",
" \n",
" deux | \n",
" 7 | \n",
" 9 | \n",
" 1 | \n",
" 5 | \n",
" 3 | \n",
"
\n",
" \n",
" trois | \n",
" -1 | \n",
" 6 | \n",
" -1 | \n",
" -3 | \n",
" -5 | \n",
"
\n",
" \n",
" quatre | \n",
" -1 | \n",
" -6 | \n",
" -4 | \n",
" 7 | \n",
" 9 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" d e a c b\n",
"un -2 -8 -1 8 -7\n",
"deux 7 9 1 5 3\n",
"trois -1 6 -1 -3 -5\n",
"quatre -1 -6 -4 7 9"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"petite_df = pd.DataFrame(np.random.randint(-10, 10, 20).reshape(4, 5),\n",
" columns=['d', 'e', 'a', 'c', 'b'])\n",
"petite_df.index = ['un', 'deux', 'trois', 'quatre']\n",
"petite_df"
]
},
{
"cell_type": "markdown",
"id": "566e48be",
"metadata": {},
"source": [
"Trions la dataframe par la ligne d'index `un` donc `axis=1`"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "9ca44468",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" e | \n",
" b | \n",
" d | \n",
" a | \n",
" c | \n",
"
\n",
" \n",
" \n",
" \n",
" un | \n",
" -8 | \n",
" -7 | \n",
" -2 | \n",
" -1 | \n",
" 8 | \n",
"
\n",
" \n",
" deux | \n",
" 9 | \n",
" 3 | \n",
" 7 | \n",
" 1 | \n",
" 5 | \n",
"
\n",
" \n",
" trois | \n",
" 6 | \n",
" -5 | \n",
" -1 | \n",
" -1 | \n",
" -3 | \n",
"
\n",
" \n",
" quatre | \n",
" -6 | \n",
" 9 | \n",
" -1 | \n",
" -4 | \n",
" 7 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" e b d a c\n",
"un -8 -7 -2 -1 8\n",
"deux 9 3 7 1 5\n",
"trois 6 -5 -1 -1 -3\n",
"quatre -6 9 -1 -4 7"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"petite_df.sort_values(by='un', axis=1)"
]
},
{
"cell_type": "markdown",
"id": "da0269af",
"metadata": {},
"source": [
"Vous voyez que la ligne *'un'* a été triée et la dataframe réorganisée autour de cette ligne en conservant les index."
]
},
{
"cell_type": "markdown",
"id": "9f937df1",
"metadata": {},
"source": [
"## tri des index"
]
},
{
"cell_type": "markdown",
"id": "0d278142",
"metadata": {},
"source": [
"Comme son nom d'indique, nous allons pouvoir trier les index d'une dataframe, et comme toujours, suivant l'axe 0 pour les lignes, et suivant l'axe 1 pour les colonnes."
]
},
{
"cell_type": "markdown",
"id": "416b9056",
"metadata": {},
"source": [
"Par exemple, nous pouvons trier l'index des colonnes (`axis=1`) de `df`."
]
},
{
"cell_type": "markdown",
"id": "1bac7a89",
"metadata": {},
"source": [
"Regardons la dataframe."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "540a168f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Survived | \n",
" Pclass | \n",
" Name | \n",
" Sex | \n",
" Age | \n",
" SibSp | \n",
" Parch | \n",
" Ticket | \n",
" Fare | \n",
" Cabin | \n",
" Embarked | \n",
"
\n",
" \n",
" PassengerId | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 0 | \n",
" 3 | \n",
" Braund, Mr. Owen Harris | \n",
" male | \n",
" 22.0 | \n",
" 1 | \n",
" 0 | \n",
" A/5 21171 | \n",
" 7.2500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
" Cumings, Mrs. John Bradley (Florence Briggs Th... | \n",
" female | \n",
" 38.0 | \n",
" 1 | \n",
" 0 | \n",
" PC 17599 | \n",
" 71.2833 | \n",
" C85 | \n",
" C | \n",
"
\n",
" \n",
" 3 | \n",
" 1 | \n",
" 3 | \n",
" Heikkinen, Miss. Laina | \n",
" female | \n",
" 26.0 | \n",
" 0 | \n",
" 0 | \n",
" STON/O2. 3101282 | \n",
" 7.9250 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 4 | \n",
" 1 | \n",
" 1 | \n",
" Futrelle, Mrs. Jacques Heath (Lily May Peel) | \n",
" female | \n",
" 35.0 | \n",
" 1 | \n",
" 0 | \n",
" 113803 | \n",
" 53.1000 | \n",
" C123 | \n",
" S | \n",
"
\n",
" \n",
" 5 | \n",
" 0 | \n",
" 3 | \n",
" Allen, Mr. William Henry | \n",
" male | \n",
" 35.0 | \n",
" 0 | \n",
" 0 | \n",
" 373450 | \n",
" 8.0500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 6 | \n",
" 0 | \n",
" 3 | \n",
" Moran, Mr. James | \n",
" male | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 330877 | \n",
" 8.4583 | \n",
" NaN | \n",
" Q | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Survived Pclass \\\n",
"PassengerId \n",
"1 0 3 \n",
"2 1 1 \n",
"3 1 3 \n",
"4 1 1 \n",
"5 0 3 \n",
"6 0 3 \n",
"\n",
" Name Sex Age \\\n",
"PassengerId \n",
"1 Braund, Mr. Owen Harris male 22.0 \n",
"2 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 \n",
"3 Heikkinen, Miss. Laina female 26.0 \n",
"4 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
"5 Allen, Mr. William Henry male 35.0 \n",
"6 Moran, Mr. James male NaN \n",
"\n",
" SibSp Parch Ticket Fare Cabin Embarked \n",
"PassengerId \n",
"1 1 0 A/5 21171 7.2500 NaN S \n",
"2 1 0 PC 17599 71.2833 C85 C \n",
"3 0 0 STON/O2. 3101282 7.9250 NaN S \n",
"4 1 0 113803 53.1000 C123 S \n",
"5 0 0 373450 8.0500 NaN S \n",
"6 0 0 330877 8.4583 NaN Q "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head(6)"
]
},
{
"cell_type": "markdown",
"id": "9afe443d",
"metadata": {},
"source": [
"Trions ses noms de colonnes et affichons le header de la dataframe."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "c6fe42b4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Age | \n",
" Cabin | \n",
" Embarked | \n",
" Fare | \n",
" Name | \n",
" Parch | \n",
" Pclass | \n",
" Sex | \n",
" SibSp | \n",
" Survived | \n",
" Ticket | \n",
"
\n",
" \n",
" PassengerId | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 22.0 | \n",
" NaN | \n",
" S | \n",
" 7.2500 | \n",
" Braund, Mr. Owen Harris | \n",
" 0 | \n",
" 3 | \n",
" male | \n",
" 1 | \n",
" 0 | \n",
" A/5 21171 | \n",
"
\n",
" \n",
" 2 | \n",
" 38.0 | \n",
" C85 | \n",
" C | \n",
" 71.2833 | \n",
" Cumings, Mrs. John Bradley (Florence Briggs Th... | \n",
" 0 | \n",
" 1 | \n",
" female | \n",
" 1 | \n",
" 1 | \n",
" PC 17599 | \n",
"
\n",
" \n",
" 3 | \n",
" 26.0 | \n",
" NaN | \n",
" S | \n",
" 7.9250 | \n",
" Heikkinen, Miss. Laina | \n",
" 0 | \n",
" 3 | \n",
" female | \n",
" 0 | \n",
" 1 | \n",
" STON/O2. 3101282 | \n",
"
\n",
" \n",
" 4 | \n",
" 35.0 | \n",
" C123 | \n",
" S | \n",
" 53.1000 | \n",
" Futrelle, Mrs. Jacques Heath (Lily May Peel) | \n",
" 0 | \n",
" 1 | \n",
" female | \n",
" 1 | \n",
" 1 | \n",
" 113803 | \n",
"
\n",
" \n",
" 5 | \n",
" 35.0 | \n",
" NaN | \n",
" S | \n",
" 8.0500 | \n",
" Allen, Mr. William Henry | \n",
" 0 | \n",
" 3 | \n",
" male | \n",
" 0 | \n",
" 0 | \n",
" 373450 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Age Cabin Embarked Fare \\\n",
"PassengerId \n",
"1 22.0 NaN S 7.2500 \n",
"2 38.0 C85 C 71.2833 \n",
"3 26.0 NaN S 7.9250 \n",
"4 35.0 C123 S 53.1000 \n",
"5 35.0 NaN S 8.0500 \n",
"\n",
" Name Parch Pclass \\\n",
"PassengerId \n",
"1 Braund, Mr. Owen Harris 0 3 \n",
"2 Cumings, Mrs. John Bradley (Florence Briggs Th... 0 1 \n",
"3 Heikkinen, Miss. Laina 0 3 \n",
"4 Futrelle, Mrs. Jacques Heath (Lily May Peel) 0 1 \n",
"5 Allen, Mr. William Henry 0 3 \n",
"\n",
" Sex SibSp Survived Ticket \n",
"PassengerId \n",
"1 male 1 0 A/5 21171 \n",
"2 female 1 1 PC 17599 \n",
"3 female 0 1 STON/O2. 3101282 \n",
"4 female 1 1 113803 \n",
"5 male 0 0 373450 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sort_index(axis=1).head()"
]
},
{
"cell_type": "markdown",
"id": "962eabff",
"metadata": {},
"source": [
"Les colonnes ont bien été réordonnées par ordre lexicographique croissant."
]
},
{
"cell_type": "markdown",
"id": "91c1fd2e",
"metadata": {},
"source": [
"Maintenant, trions l'index des lignes (*axis=0*) de `df_sorted1` (parce que celui de `df` est déjà trié)."
]
},
{
"cell_type": "markdown",
"id": "379b8a10",
"metadata": {},
"source": [
"Regardons avant le tri:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "25907f70",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Survived | \n",
" Pclass | \n",
" Name | \n",
" Sex | \n",
" Age | \n",
" SibSp | \n",
" Parch | \n",
" Ticket | \n",
" Fare | \n",
" Cabin | \n",
" Embarked | \n",
"
\n",
" \n",
" PassengerId | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 631 | \n",
" 1 | \n",
" 1 | \n",
" Barkworth, Mr. Algernon Henry Wilson | \n",
" male | \n",
" 80.0 | \n",
" 0 | \n",
" 0 | \n",
" 27042 | \n",
" 30.0000 | \n",
" A23 | \n",
" S | \n",
"
\n",
" \n",
" 852 | \n",
" 0 | \n",
" 3 | \n",
" Svensson, Mr. Johan | \n",
" male | \n",
" 74.0 | \n",
" 0 | \n",
" 0 | \n",
" 347060 | \n",
" 7.7750 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 494 | \n",
" 0 | \n",
" 1 | \n",
" Artagaveytia, Mr. Ramon | \n",
" male | \n",
" 71.0 | \n",
" 0 | \n",
" 0 | \n",
" PC 17609 | \n",
" 49.5042 | \n",
" NaN | \n",
" C | \n",
"
\n",
" \n",
" 97 | \n",
" 0 | \n",
" 1 | \n",
" Goldschmidt, Mr. George B | \n",
" male | \n",
" 71.0 | \n",
" 0 | \n",
" 0 | \n",
" PC 17754 | \n",
" 34.6542 | \n",
" A5 | \n",
" C | \n",
"
\n",
" \n",
" 117 | \n",
" 0 | \n",
" 3 | \n",
" Connors, Mr. Patrick | \n",
" male | \n",
" 70.5 | \n",
" 0 | \n",
" 0 | \n",
" 370369 | \n",
" 7.7500 | \n",
" NaN | \n",
" Q | \n",
"
\n",
" \n",
" 673 | \n",
" 0 | \n",
" 2 | \n",
" Mitchell, Mr. Henry Michael | \n",
" male | \n",
" 70.0 | \n",
" 0 | \n",
" 0 | \n",
" C.A. 24580 | \n",
" 10.5000 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Survived Pclass Name Sex \\\n",
"PassengerId \n",
"631 1 1 Barkworth, Mr. Algernon Henry Wilson male \n",
"852 0 3 Svensson, Mr. Johan male \n",
"494 0 1 Artagaveytia, Mr. Ramon male \n",
"97 0 1 Goldschmidt, Mr. George B male \n",
"117 0 3 Connors, Mr. Patrick male \n",
"673 0 2 Mitchell, Mr. Henry Michael male \n",
"\n",
" Age SibSp Parch Ticket Fare Cabin Embarked \n",
"PassengerId \n",
"631 80.0 0 0 27042 30.0000 A23 S \n",
"852 74.0 0 0 347060 7.7750 NaN S \n",
"494 71.0 0 0 PC 17609 49.5042 NaN C \n",
"97 71.0 0 0 PC 17754 34.6542 A5 C \n",
"117 70.5 0 0 370369 7.7500 NaN Q \n",
"673 70.0 0 0 C.A. 24580 10.5000 NaN S "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_sorted1.head(6)"
]
},
{
"cell_type": "markdown",
"id": "8d051729",
"metadata": {},
"source": [
"Trions et regardons après le tri."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "c27814a9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Survived | \n",
" Pclass | \n",
" Name | \n",
" Sex | \n",
" Age | \n",
" SibSp | \n",
" Parch | \n",
" Ticket | \n",
" Fare | \n",
" Cabin | \n",
" Embarked | \n",
"
\n",
" \n",
" PassengerId | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 0 | \n",
" 3 | \n",
" Braund, Mr. Owen Harris | \n",
" male | \n",
" 22.0 | \n",
" 1 | \n",
" 0 | \n",
" A/5 21171 | \n",
" 7.2500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
" Cumings, Mrs. John Bradley (Florence Briggs Th... | \n",
" female | \n",
" 38.0 | \n",
" 1 | \n",
" 0 | \n",
" PC 17599 | \n",
" 71.2833 | \n",
" C85 | \n",
" C | \n",
"
\n",
" \n",
" 3 | \n",
" 1 | \n",
" 3 | \n",
" Heikkinen, Miss. Laina | \n",
" female | \n",
" 26.0 | \n",
" 0 | \n",
" 0 | \n",
" STON/O2. 3101282 | \n",
" 7.9250 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 4 | \n",
" 1 | \n",
" 1 | \n",
" Futrelle, Mrs. Jacques Heath (Lily May Peel) | \n",
" female | \n",
" 35.0 | \n",
" 1 | \n",
" 0 | \n",
" 113803 | \n",
" 53.1000 | \n",
" C123 | \n",
" S | \n",
"
\n",
" \n",
" 5 | \n",
" 0 | \n",
" 3 | \n",
" Allen, Mr. William Henry | \n",
" male | \n",
" 35.0 | \n",
" 0 | \n",
" 0 | \n",
" 373450 | \n",
" 8.0500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Survived Pclass \\\n",
"PassengerId \n",
"1 0 3 \n",
"2 1 1 \n",
"3 1 3 \n",
"4 1 1 \n",
"5 0 3 \n",
"\n",
" Name Sex Age \\\n",
"PassengerId \n",
"1 Braund, Mr. Owen Harris male 22.0 \n",
"2 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 \n",
"3 Heikkinen, Miss. Laina female 26.0 \n",
"4 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
"5 Allen, Mr. William Henry male 35.0 \n",
"\n",
" SibSp Parch Ticket Fare Cabin Embarked \n",
"PassengerId \n",
"1 1 0 A/5 21171 7.2500 NaN S \n",
"2 1 0 PC 17599 71.2833 C85 C \n",
"3 0 0 STON/O2. 3101282 7.9250 NaN S \n",
"4 1 0 113803 53.1000 C123 S \n",
"5 0 0 373450 8.0500 NaN S "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_sorted1.sort_index(axis=0).head()"
]
},
{
"cell_type": "markdown",
"id": "d22ff67a",
"metadata": {},
"source": [
"Voilà l'index `PassengerId` tout bien réordonné !"
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "all,-hidden,-heading_collapsed",
"cell_metadata_json": true,
"notebook_metadata_filter": "all,-language_info,-toc,-jupytext.text_representation.jupytext_version,-jupytext.text_representation.format_version",
"text_representation": {
"extension": ".md",
"format_name": "myst"
}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
},
"notebookname": "regroupements dans une dataframe",
"source_map": [
15,
24,
28,
31,
35,
43,
47,
51,
55,
59,
62,
68,
72,
74,
78,
84,
88,
90,
94,
98,
104,
110,
114,
118,
122,
126,
131,
135,
137,
141,
145,
149,
153,
157,
159,
163,
165,
169,
173,
177,
179,
183,
185
],
"version": "1.0"
},
"nbformat": 4,
"nbformat_minor": 5
}