3  Browsing data with dplyr

3.1 print()

When you type an object and do nothing else with it, R automatically wraps it in a call to print(). Consider the following example:

1 + 1
[1] 2
print(1 + 1)
[1] 2

Likewise, when we read diamonds, what we actually see is its printed form:

library(tidyverse)

diamonds
# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
# ℹ 53,930 more rows
print(diamonds) # equivalent
# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
# ℹ 53,930 more rows

So at the console we usually do not need to call print() ourselves.
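Auto-printing only applies to expressions evaluated at the top level. Inside a for loop, for example, nothing is printed unless print() is called explicitly. The sketch below uses capture.output() from base R purely to make that difference visible; the names silent and loud are illustrative.

```r
# At top level, a bare expression is auto-printed.
1 + 1

# Inside a for loop the value is computed and then discarded,
# so no output is produced.
silent <- capture.output(for (i in 1:3) i * 2)
length(silent)

# With an explicit print() each value is shown.
loud <- capture.output(for (i in 1:3) print(i * 2))
loud
```

Calling print() explicitly is also how extra arguments reach the print method; for a tibble, print(diamonds, n = 20) shows 20 rows instead of the default 10.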

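We can also use the print_headtail() and print_interval() functions from the statart package. The former prints the first few and last few rows of a dataset; the latter prints rows drawn at equal intervals (somewhat like systematic sampling):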

library(statart)

print_headtail(diamonds)
# A tibble: 53,940 × 10
      carat cut       color clarity depth table price     x     y     z
      <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
    1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
    2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
    3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
    4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
    5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
53936  0.72 Ideal     D     SI1      60.8    57  2757  5.75  5.76  3.5 
53937  0.72 Good      D     SI1      63.1    55  2757  5.69  5.75  3.61
53938  0.7  Very Good D     SI1      62.8    60  2757  5.66  5.68  3.56
53939  0.86 Premium   H     SI2      61      58  2757  6.15  6.12  3.74
53940  0.75 Ideal     D     SI2      62.2    55  2757  5.83  5.87  3.64
# ℹ 53,930 more rows in the middle
# ℹ Use `print_headtail(n = ...)` to see more rows
print_headtail(diamonds, n = 20)
# A tibble: 53,940 × 10
      carat cut       color clarity depth table price     x     y     z
      <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
    1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
    2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
    3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
    4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
    5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
    6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
    7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
    8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
    9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
   10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
53931  0.71 Premium   E     SI1      60.5    55  2756  5.79  5.74  3.49
53932  0.71 Premium   F     SI1      59.8    62  2756  5.74  5.73  3.43
53933  0.7  Very Good E     VS2      60.5    59  2757  5.71  5.76  3.47
53934  0.7  Very Good E     VS2      61.2    59  2757  5.69  5.72  3.49
53935  0.72 Premium   D     SI1      62.7    59  2757  5.69  5.73  3.58
53936  0.72 Ideal     D     SI1      60.8    57  2757  5.75  5.76  3.5 
53937  0.72 Good      D     SI1      63.1    55  2757  5.69  5.75  3.61
53938  0.7  Very Good D     SI1      62.8    60  2757  5.66  5.68  3.56
53939  0.86 Premium   H     SI2      61      58  2757  6.15  6.12  3.74
53940  0.75 Ideal     D     SI2      62.2    55  2757  5.83  5.87  3.64
# ℹ 53,920 more rows in the middle
# ℹ Use `print_headtail(n = ...)` to see more rows
print_interval(diamonds)
# A tibble: 53,940 × 10
      carat cut       color clarity depth table price     x     y     z
      <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
    1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 5994  0.71 Ideal     G     VVS1     61.1    57  3955  5.76  5.8   3.53
11987  1.09 Ideal     F     SI2      61.6    55  5143  6.59  6.65  4.08
17981  1.7  Ideal     H     SI2      62.1    57  7273  7.68  7.63  4.75
23974  1.53 Ideal     F     SI1      61.6    56 12109  7.39  7.34  4.54
29967  0.31 Good      H     SI1      63.6    57   446  4.32  4.33  2.75
35960  0.33 Very Good H     SI1      63      57   475  4.39  4.41  2.77
41954  0.23 Very Good E     VVS2     61.6    61   505  3.92  3.97  2.43
47947  0.71 Very Good J     SI1      63.5    58  1917  5.63  5.67  3.59
53940  0.75 Ideal     D     SI2      62.2    55  2757  5.83  5.87  3.64
# ℹ 53,930 more rows between the intervals
# ℹ Use `print_interval(n = ...)` to see more rows
print_interval(diamonds, n = 20)
# A tibble: 53,940 × 10
      carat cut       color clarity depth table price     x     y     z
      <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
    1  0.23 Ideal     E     SI2      61.5  55     326  3.95  3.98  2.43
 2840  0.9  Good      G     SI2      58.4  55    3269  6.34  6.39  3.72
 5679  1.01 Very Good I     SI2      61.8  60    3885  6.34  6.37  3.93
 8518  1.04 Ideal     F     SI1      62.1  56    4426  6.54  6.47  4.04
11357  0.99 Very Good D     SI2      62.5  57    4993  6.3   6.34  3.95
14195  1.06 Ideal     F     SI1      62.1  57    5758  6.53  6.51  4.05
17034  1    Ideal     F     VS2      62.6  57    6804  6.4   6.35  3.99
19873  1.52 Premium   J     VS1      62.2  59    8426  7.32  7.38  4.57
22712  1.53 Premium   I     VS1      62.4  59   10729  7.3   7.34  4.57
25551  1.6  Very Good G     VS2      61    57   14383  7.55  7.59  4.62
28390  0.33 Ideal     H     VS2      60.7  57     668  4.48  4.45  2.71
31229  0.32 Good      D     SI1      63.6  56     756  4.37  4.34  2.77
34068  0.3  Premium   D     VS2      62    62     851  4.27  4.24  2.64
36907  0.42 Ideal     G     VVS2     62.3  53     961  4.83  4.86  3.02
39746  0.53 Ideal     G     SI2      62.4  56    1093  5.18  5.14  3.22
42584  0.52 Ideal     E     SI1      61.4  57    1330  5.15  5.18  3.17
45423  0.55 Ideal     E     SI1      61.4  55    1668  5.29  5.26  3.24
48262  0.53 Ideal     G     VVS2     60.8  57    1955  5.24  5.28  3.2 
51101  0.56 Very Good G     VVS2     62.1  55.6  2336  5.29  5.31  3.29
53940  0.75 Ideal     D     SI2      62.2  55    2757  5.83  5.87  3.64
# ℹ 53,920 more rows between the intervals
# ℹ Use `print_interval(n = ...)` to see more rows
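The row numbers in the output above can be reproduced with a few lines of base R. This is only a sketch: headtail_rows() and interval_rows() are hypothetical helpers, not part of statart, and whether statart computes the positions this way is an assumption. It also assumes the data has many more rows than n.

```r
# Row positions for a head-and-tail view: the first and last n/2 rows.
headtail_rows <- function(n_rows, n = 10) {
  half <- n %/% 2
  c(seq_len(half), seq(n_rows - half + 1, n_rows))
}

# Row positions for an interval view: n evenly spaced rows,
# always including the first and the last row.
interval_rows <- function(n_rows, n = 10) {
  round(seq(1, n_rows, length.out = n))
}

headtail_rows(53940)
interval_rows(53940)
```

For the 53,940 rows of diamonds, interval_rows() gives 1, 5994, 11987, 17981, 23974, 29967, 35960, 41954, 47947 and 53940, the same rows that print_interval(diamonds) shows. Passing such a vector to dplyr's slice() would extract those rows.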

3.3 names() and ds()

library(tidyverse)

# List the variable names
names(diamonds)
 [1] "carat"   "cut"     "color"   "clarity" "depth"   "table"   "price"  
 [8] "x"       "y"       "z"      
names_as_column(diamonds)
# A tibble: 10 × 1
   name   
   <chr>  
 1 carat  
 2 cut    
 3 color  
 4 clarity
 5 depth  
 6 table  
 7 price  
 8 x      
 9 y      
10 z      
ds(diamonds, 1:5)
[1] "carat"   "cut"     "color"   "clarity" "depth"  
ds_as_column(diamonds, 1:5)
# A tibble: 5 × 1
  name   
  <chr>  
1 carat  
2 cut    
3 color  
4 clarity
5 depth  
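For comparison, similar results can be obtained with tidyverse functions alone. The sketch below assumes only that the goal is a vector (or a one-column tibble) of column names; it does not show how ds() or names_as_column() are implemented, and the object names first_five, c_vars and name_tbl are illustrative.

```r
library(tidyverse)

# Names of a subset of columns, chosen by position ...
first_five <- diamonds |> select(1:5) |> names()
first_five

# ... or with a tidyselect helper such as starts_with().
c_vars <- diamonds |> select(starts_with("c")) |> names()
c_vars

# Names as a one-column tibble, convenient for filtering or joining.
name_tbl <- tibble(name = names(diamonds))
name_tbl
```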

3.4 glimpse()

# Browse the list of variables together with the first few cases
glimpse(diamonds)
Rows: 53,940
Columns: 10
$ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.…
$ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Ver…
$ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I,…
$ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1, …
$ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64…
$ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58…
$ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 34…
$ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.…
$ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.…
$ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.…
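Because glimpse() returns its input invisibly, it can also be dropped into the middle of a pipeline to check an intermediate result. The following is a minimal sketch; the price threshold and the name top_prices are illustrative only.

```r
library(tidyverse)

top_prices <- diamonds |>
  filter(price > 18000) |>
  glimpse() |>               # prints the overview, then passes the data on
  select(carat, cut, price)

top_prices
```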

3.5 view()

# Open an Excel-style data table
view(diamonds)

# browse() is more powerful: it can select specific variables
browse(diamonds, 1:3)

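The viewer cannot be demonstrated on this page, so a few screenshots are shown below instead. You can run the code in your own RStudio to try it out.

Figure 3.1: Opening the viewer

Figure 3.2: Searching for “1000”

Figure 3.3: Sorting by price in descending order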
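Sorting in the viewer has a direct counterpart in code, which is useful when the result should be kept or reproduced later. The sketch below sorts by price in descending order with arrange(); the name by_price is illustrative, and the viewer is opened only when the session is interactive.

```r
library(tidyverse)

# Counterpart of sorting the viewer by price, descending.
by_price <- diamonds |> arrange(desc(price))
by_price

# Open the viewer only in an interactive session such as RStudio.
if (interactive()) {
  view(by_price)
}
```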

3.6 codebook()

library(statart)

# View basic information about the variables
codebook(diamonds)
# A tibble: 10 × 4
   variable type        n unique
   <chr>    <chr>   <int>  <int>
 1 carat    double  53940    273
 2 cut      ordered 53940      5
 3 color    ordered 53940      7
 4 clarity  ordered 53940      8
 5 depth    double  53940    184
 6 table    double  53940    127
 7 price    integer 53940  11602
 8 x        double  53940    554
 9 y        double  53940    552
10 z        double  53940    375
# View detailed information about the variables
codebook_detail(diamonds)

id         name       type       missings   values        n          prop       row_id
character  character  character  character  character     character  character  integer
1          carat      numeric    0 (0.0%)   [0.2, 5.01]   53940                 1
                                                                                1
2          cut        ordinal    0 (0.0%)   Fair          1610       3.0%       2
                                            Good          4906       9.1%       2
                                            Very Good     12082      22.4%      2
                                            Premium       13791      25.6%      2
                                            Ideal         21551      40.0%      2
                                                                                2
3          color      ordinal    0 (0.0%)   D             6775       12.6%      3
                                            E             9797       18.2%      3
                                            F             9542       17.7%      3
                                            G             11292      20.9%      3
                                            H             8304       15.4%      3
                                            I             5422       10.1%      3
                                            J             2808       5.2%       3
                                                                                3
4          clarity    ordinal    0 (0.0%)   I1            741        1.4%       4
                                            SI2           9194       17.0%      4
                                            SI1           13065      24.2%      4
                                            VS2           12258      22.7%      4
                                            VS1           8171       15.1%      4
                                            VVS2          5066       9.4%       4
                                            VVS1          3655       6.8%       4
                                            IF            1790       3.3%       4
                                                                                4
5          depth      numeric    0 (0.0%)   [43, 79]      53940                 5
                                                                                5
6          table      numeric    0 (0.0%)   [43, 95]      53940                 6
                                                                                6
7          price      integer    0 (0.0%)   [326, 18823]  53940                 7
                                                                                7
8          x          numeric    0 (0.0%)   [0, 10.74]    53940                 8
                                                                                8
9          y          numeric    0 (0.0%)   [0, 58.9]     53940                 9
                                                                                9
10         z          numeric    0 (0.0%)   [0, 31.8]     53940                 10
                                                                                10

n: 37
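To make the columns of the basic codebook() table concrete, here is a minimal sketch that computes comparable columns with tidyverse functions. It is not statart's implementation: codebook_sketch is a hypothetical name, n is assumed to count non-missing values, and the type labels come from class(), so double columns appear as "numeric" rather than "double".

```r
library(tidyverse)

codebook_sketch <- tibble(
  variable = names(diamonds),
  # first class of each column, e.g. "ordered" for an ordered factor
  type     = unname(map_chr(diamonds, \(col) class(col)[1])),
  # number of non-missing values (assumed meaning of n)
  n        = unname(map_int(diamonds, \(col) sum(!is.na(col)))),
  # number of distinct values
  unique   = unname(map_int(diamonds, \(col) n_distinct(col)))
)

codebook_sketch
```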