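4 Sorting with dplyr

Within the tidyverse, dplyr is the most important package for cleaning data. Below, I start with the simplest functions: those for sorting.

Variables (columns) usually have to be reordered by hand. Cases (rows) are different: because the values of a variable can be compared, we can quickly sort the cases by the values of one or more variables.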

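Sorting columns (variables)
- relocate() moves variables to the front by hand
  - .before and .after place a variable at an exact position
Sorting rows (cases)
- arrange() sorts cases by the values of a variable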

4.1 `relocate()`

Let us return to the example data introduced earlier.

library(tidyverse)

diamonds # example data

# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
# ℹ 53,930 more rows

Next, I run a few relocate() examples. Pay particular attention to how the column order changes.

relocate(diamonds, price) # move price to the front; the other columns keep their order

# A tibble: 53,940 × 10
   price carat cut       color clarity depth table     x     y     z
   <int> <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <dbl> <dbl> <dbl>
 1   326  0.23 Ideal     E     SI2      61.5    55  3.95  3.98  2.43
 2   326  0.21 Premium   E     SI1      59.8    61  3.89  3.84  2.31
 3   327  0.23 Good      E     VS1      56.9    65  4.05  4.07  2.31
 4   334  0.29 Premium   I     VS2      62.4    58  4.2   4.23  2.63
 5   335  0.31 Good      J     SI2      63.3    58  4.34  4.35  2.75
 6   336  0.24 Very Good J     VVS2     62.8    57  3.94  3.96  2.48
 7   336  0.24 Very Good I     VVS1     62.3    57  3.95  3.98  2.47
 8   337  0.26 Very Good H     SI1      61.9    55  4.07  4.11  2.53
 9   337  0.22 Fair      E     VS2      65.1    61  3.87  3.78  2.49
10   338  0.23 Very Good H     VS1      59.4    61  4     4.05  2.39
# ℹ 53,930 more rows
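
relocate() is not limited to one column at a time. As a minimal sketch, assuming the same diamonds data, the first call below moves two named columns to the front in the order given, and the second uses the tidyselect helper where() to move every numeric column to the front. The object names front and numeric_first are only illustrative.

```r
library(tidyverse)

# Move several columns to the front, in the order they are listed
front <- relocate(diamonds, price, carat)
names(front)

# Select columns by type instead of by name: all numeric columns go first,
# keeping their original relative order
numeric_first <- relocate(diamonds, where(is.numeric))
names(numeric_first)
```

Selecting by type can be handy on wide data sets where typing every column name would be tedious.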

4.2 `.before` and `.after`

relocate(diamonds, price, .before = cut) # move price to just before cut; the other columns keep their order

# A tibble: 53,940 × 10
   carat price cut       color clarity depth table     x     y     z
   <dbl> <int> <ord>     <ord> <ord>   <dbl> <dbl> <dbl> <dbl> <dbl>
 1  0.23   326 Ideal     E     SI2      61.5    55  3.95  3.98  2.43
 2  0.21   326 Premium   E     SI1      59.8    61  3.89  3.84  2.31
 3  0.23   327 Good      E     VS1      56.9    65  4.05  4.07  2.31
 4  0.29   334 Premium   I     VS2      62.4    58  4.2   4.23  2.63
 5  0.31   335 Good      J     SI2      63.3    58  4.34  4.35  2.75
 6  0.24   336 Very Good J     VVS2     62.8    57  3.94  3.96  2.48
 7  0.24   336 Very Good I     VVS1     62.3    57  3.95  3.98  2.47
 8  0.26   337 Very Good H     SI1      61.9    55  4.07  4.11  2.53
 9  0.22   337 Fair      E     VS2      65.1    61  3.87  3.78  2.49
10  0.23   338 Very Good H     VS1      59.4    61  4     4.05  2.39
# ℹ 53,930 more rows

relocate(diamonds, price, .after = cut) # move price to just after cut; the other columns keep their order

# A tibble: 53,940 × 10
   carat cut       price color clarity depth table     x     y     z
   <dbl> <ord>     <int> <ord> <ord>   <dbl> <dbl> <dbl> <dbl> <dbl>
 1  0.23 Ideal       326 E     SI2      61.5    55  3.95  3.98  2.43
 2  0.21 Premium     326 E     SI1      59.8    61  3.89  3.84  2.31
 3  0.23 Good        327 E     VS1      56.9    65  4.05  4.07  2.31
 4  0.29 Premium     334 I     VS2      62.4    58  4.2   4.23  2.63
 5  0.31 Good        335 J     SI2      63.3    58  4.34  4.35  2.75
 6  0.24 Very Good   336 J     VVS2     62.8    57  3.94  3.96  2.48
 7  0.24 Very Good   336 I     VVS1     62.3    57  3.95  3.98  2.47
 8  0.26 Very Good   337 H     SI1      61.9    55  4.07  4.11  2.53
 9  0.22 Fair        337 E     VS2      65.1    61  3.87  3.78  2.49
10  0.23 Very Good   338 H     VS1      59.4    61  4     4.05  2.39
# ℹ 53,930 more rows
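
The target of .before and .after does not have to be a column name. The sketch below, again assuming the diamonds data, uses the tidyselect helper last_col() to push price to the very end, and then moves three columns at once to sit right after carat. The object names price_last and dims_after_carat are only illustrative.

```r
library(tidyverse)

# Send price to the end: place it after whatever the last column is
price_last <- relocate(diamonds, price, .after = last_col())
names(price_last)

# Several columns can be moved together; they keep the order given here
dims_after_carat <- relocate(diamonds, x, y, z, .after = carat)
names(dims_after_carat)
```

Only one of .before and .after should be supplied in a single call; dplyr treats supplying both as an error.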

4.3 `arrange()`

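In tidy data, sorting cases relies on variables.

At primary school, sorting by student number (a variable) changes the order of the pupils (the cases).
When shopping online, sorting by product price (a variable) changes the order of the products (the cases).
When buying tickets on 12306, sorting by departure time (a variable) changes the order of the trains (the cases).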

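Important

Each value of a variable belongs to exactly one case, so sorting by a single variable really means reordering the cases!

Here we rely on the statistical (arithmetic) nature of a variable: its values can be compared and ranked. Cases have no such property, so variables cannot be sorted in the same way, and we can only adjust their positions by hand.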
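
To make the point concrete, here is a minimal sketch on a made-up table of three pupils (the data and the names pupils and sorted are hypothetical, not part of the example data). Sorting by id moves whole rows, so every name stays attached to its own id.

```r
library(tidyverse)

# Hypothetical toy data: three pupils (cases) described by two variables
pupils <- tibble(
  id   = c(3, 1, 2),
  name = c("Chen", "Li", "Wang")
)

# Sort the cases by the values of id; the name column travels with each row
sorted <- arrange(pupils, id)
sorted
```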

arrange(diamonds, price) # sort by price in ascending order (low to high)

# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
# ℹ 53,930 more rows

arrange(diamonds, -price) # sort by price in descending order (high to low)

# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  2.29 Premium   I     VS2      60.8    60 18823  8.5   8.47  5.16
 2  2    Very Good G     SI1      63.5    56 18818  7.9   7.97  5.04
 3  1.51 Ideal     G     IF       61.7    55 18806  7.37  7.41  4.56
 4  2.07 Ideal     G     SI2      62.5    55 18804  8.2   8.13  5.11
 5  2    Very Good H     SI1      62.8    57 18803  7.95  8     5.01
 6  2.29 Premium   I     SI1      61.8    59 18797  8.52  8.45  5.24
 7  2.04 Premium   H     SI1      58.1    60 18795  8.37  8.28  4.84
 8  2    Premium   I     VS1      60.8    59 18795  8.13  8.02  4.91
 9  1.71 Premium   F     VS2      62.3    59 18791  7.57  7.53  4.7 
10  2.15 Ideal     G     SI2      62.6    54 18791  8.29  8.35  5.21
# ℹ 53,930 more rows
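
The minus sign works here because price is numeric. A more general option is dplyr's desc() helper, which also handles non-numeric variables such as factors and character strings. The sketch below, assuming the diamonds data, also shows sorting by more than one variable: later variables only break ties in the earlier ones. The object names by_desc, by_minus, and by_cut_price are only illustrative.

```r
library(tidyverse)

# desc() gives the same descending order of prices as the minus sign
by_desc  <- arrange(diamonds, desc(price))
by_minus <- arrange(diamonds, -price)
identical(by_desc$price, by_minus$price)

# Sort by cut first (in the order of its factor levels, starting with Fair),
# then by price from high to low within each cut
by_cut_price <- arrange(diamonds, cut, desc(price))
by_cut_price
```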
