# 2.4 Data structures in R

R has several data structures. Knowing the common ones and how to use them is essential.

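### 2.4.1 Vectors

Vectors are one of the core data structures in R. A vector can be thought of as a list of elements of the same type (numeric, character, or logical). Later you will see that every column of a table is represented as a vector. R handles vectors easily and intuitively. You can create vectors with the `c()` function, although that is not the only way. Operations on a vector are applied to all of its elements.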

```r
x <- c(1, 3, 2, 10, 5)    # create a vector with 5 elements
x = c(1, 3, 2, 10, 5)     # `=` also works for assignment
x
```

```
## [1]  1  3  2 10  5
```

```r
y <- 1:5            # create a vector of 5 consecutive integers
y + 2               # addition, applied to every element
```

```
## [1] 3 4 5 6 7
```

```r
2 * y               # multiplication, applied to every element
```

```
## [1]  2  4  6  8 10
```

```r
y^2                 # square each element
```

```
## [1]  1  4  9 16 25
```

```r
2^y                 # raise 2 to the power of each element
```

```
## [1]  2  4  8 16 32
```

```r
y                   # y itself has not been modified
```

```
## [1] 1 2 3 4 5
```

```r
y <- y * 2
y                   # now y has changed
```

```
## [1]  2  4  6  8 10
```

```r
r1 <- rep(1, 3)     # create a vector of length 3 by repeating 1
length(r1)          # length of the vector
```

```
## [1] 3
```

```r
class(r1)           # type (class) of the vector
```

```
## [1] "numeric"
```

```r
a <- 1              # this is actually a vector of length 1
```
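
Since `c()` is not the only way to build a vector, here is a minimal sketch of a few other base R constructors. The variable names `s`, `v`, and `mixed` are made up for illustration. The last line also shows what "same type" means in practice: when `c()` receives mixed inputs, it coerces them to one common type.

```r
# seq() builds a regular sequence, here from 0 to 1 in steps of 0.25
s <- seq(0, 1, by = 0.25)
s

# vector() pre-allocates a vector of a given type and length
v <- vector("numeric", length = 3)
v

# c() coerces mixed inputs to a single common type (character here)
mixed <- c(1, "a", TRUE)
class(mixed)
```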

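### 2.4.2 Matrices

A matrix is a numeric array of rows and columns. You can think of it as a stacked version of vectors in which each row or column is a vector. One of the easiest ways to create a matrix is to combine vectors of equal length using `cbind()`, meaning "column bind".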

```r
x<-c(1,2,3,4)
y<-c(4,5,6,7)
m1<-cbind(x,y);m1
```

```
##      x y
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
## [4,] 4 7
```

```r
t(m1)                # transpose
```

```
##   [,1] [,2] [,3] [,4]
## x    1    2    3    4
## y    4    5    6    7
```

```r
dim(m1)              # dimensions of the object (rows, columns)
```

```
## [1] 4 2
```

You can also list the elements directly and specify the shape of the matrix:

```r
m2<-matrix(c(1,3,2,5,-1,2,2,3,9),nrow=3)
m2
```

```
##      [,1] [,2] [,3]
## [1,]    1    5    2
## [2,]    3   -1    3
## [3,]    2    2    9
```
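
The output above shows that `matrix()` fills the values column by column. As a small sketch of the alternative, the `byrow = TRUE` argument fills row by row instead; the names `vals`, `by_col`, and `by_row` are illustrative only.

```r
vals <- c(1, 3, 2, 5, -1, 2, 2, 3, 9)

by_col <- matrix(vals, nrow = 3)                # default: fill column by column
by_row <- matrix(vals, nrow = 3, byrow = TRUE)  # fill row by row instead

by_row

# filling by row gives the transpose of filling by column
all(by_row == t(by_col))
```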

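Matrices and the next data structure, data frames, are both tabular data structures. You can subset them by extracting the rows and columns you need. Figure 2.1 shows how this works.

![](https://kaopubear-1254299507.cos.ap-shanghai.myqcloud.com/picgo/20200718190453.png) Figure 2.1: Subsetting a matrix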
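
The subsetting idea can be sketched in code as well. The example below rebuilds the 3 x 3 matrix from earlier under the hypothetical name `m` and uses the `[row, column]` convention; leaving one side of the comma empty selects everything along that dimension.

```r
m <- matrix(c(1, 3, 2, 5, -1, 2, 2, 3, 9), nrow = 3)

elem   <- m[2, 3]                # single element: row 2, column 3
row1   <- m[1, ]                 # entire first row, returned as a vector
col2   <- m[, 2]                 # entire second column, returned as a vector
block  <- m[1:2, c(1, 3)]        # rows 1-2 of columns 1 and 3, still a matrix
col2_m <- m[, 2, drop = FALSE]   # keep the result as a one-column matrix

block
dim(col2_m)
```

By default, R drops a single row or column down to a plain vector; `drop = FALSE` is one way to keep the matrix shape when later code expects two dimensions.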

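### 2.4.3 Data frames

Data frames are more general than matrices because different columns can hold different data types (numeric, character, factor, etc.). A data frame can be constructed with the `data.frame()` function. The example below shows how to build a data frame from genomic intervals or coordinates.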

```r
chr <- c("chr1", "chr1", "chr2", "chr2")
strand <- c("-","-","+","+")
start<- c(200,4000,100,400)
end<-c(250,410,200,450)
mydata <- data.frame(chr,start,end,strand)
# set the column names
names(mydata) <- c("chr","start","end","strand")
mydata
```

```
##    chr start end strand
## 1 chr1   200 250      -
## 2 chr1  4000 410      -
## 3 chr2   100 200      +
## 4 chr2   400 450      +
```

```r
# another way: name the columns while constructing the data frame
mydata <- data.frame(chr=chr,start=start,end=end,strand=strand)
mydata
```

```
##    chr start end strand
## 1 chr1   200 250      -
## 2 chr1  4000 410      -
## 3 chr2   100 200      +
## 4 chr2   400 450      +
```

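There are several ways to extract elements of a data frame. You can select columns by column number or column name, select rows by row number, or subset with a logical condition, for example extracting all rows where the value in a column exceeds some threshold.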

```r
mydata[,2:4] # columns 2 through 4
```

```
##   start end strand
## 1   200 250      -
## 2  4000 410      -
## 3   100 200      +
## 4   400 450      +
```

```r
mydata[,c("chr","start")] # the chr and start columns
```

```
##    chr start
## 1 chr1   200
## 2 chr1  4000
## 3 chr2   100
## 4 chr2   400
```

```r
mydata$start # the start variable of the data frame
```

```
## [1]  200 4000  100  400
```

```r
mydata[c(1,3),] # rows 1 and 3
```

```
##    chr start end strand
## 1 chr1   200 250      -
## 3 chr2   100 200      +
```

```r
mydata[mydata$start>400,] # all rows where start is greater than 400
```

```
##    chr start end strand
## 2 chr1  4000 410      -
```
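
Logical conditions can also be combined. The following is a minimal sketch that rebuilds the same table under the hypothetical name `regions` and keeps only plus-strand rows whose start is at least 400. It also shows base R's `subset()` as an alternative way to write the same filter.

```r
regions <- data.frame(chr = c("chr1", "chr1", "chr2", "chr2"),
                      start = c(200, 4000, 100, 400),
                      end = c(250, 410, 200, 450),
                      strand = c("-", "-", "+", "+"))

# combine two conditions with & (element-wise AND)
plus_far <- regions[regions$strand == "+" & regions$start >= 400, ]
plus_far

# subset() expresses the same filter without repeating the data frame name
plus_far2 <- subset(regions, strand == "+" & start >= 400)
plus_far2
```

`subset()` is convenient interactively; the bracket form is usually preferred inside functions because it does not rely on non-standard evaluation.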

### 2.4.4 Lists

A list is an ordered collection of objects (components). A list lets you gather a variety of (possibly unrelated) objects.

```r
# example of a list with 4 components:
# a string, a numeric vector, a matrix, and a scalar
w <- list(name="Fred",
       mynumbers=c(1,2,3),
       mymatrix=matrix(1:4,ncol=2),
       age=5.3)
w
```

```
## $name
## [1] "Fred"
##
## $mynumbers
## [1] 1 2 3
##
## $mymatrix
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
##
## $age
## [1] 5.3
```

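You can extract elements of a list with `[[]]`, using either the position in the list or the name. The `$` operator also extracts an element by name.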

```r
w[[3]] # third element of the list
```

```
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
```

```r
w[["mynumbers"]] # the element named mynumbers
```

```
## [1] 1 2 3
```

```r
w$age
```

```
## [1] 5.3
```
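
Single and double brackets behave differently on lists, which is a common source of confusion. The sketch below uses a shortened version of the list above; the names `sub` and `el` are illustrative.

```r
w <- list(name = "Fred", mynumbers = c(1, 2, 3), age = 5.3)

sub <- w["mynumbers"]     # single brackets: a list of length 1
el  <- w[["mynumbers"]]   # double brackets: the numeric vector itself

class(sub)
class(el)

# arithmetic works on the extracted element, not on the sub-list
sum(el)
```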

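### 2.4.5 Factors

Factors are used to store categorical data. They are important for statistical modeling because categorical variables are treated differently from continuous variables in statistical models. Storing such data as factors ensures that categorical data is handled correctly.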

```r
features=c("promoter","exon","intron")
f.feat=factor(features)
```
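
To see what a factor stores, it can help to inspect its levels and the integer codes behind them. This is a small sketch; note that it adds a repeated `"exon"` value, which is not in the example above, so that the counts are not all 1.

```r
features <- c("promoter", "exon", "intron", "exon")
f.feat <- factor(features)

levels(f.feat)       # distinct categories, sorted alphabetically by default
as.integer(f.feat)   # integer codes that index into the levels
table(f.feat)        # number of observations per category
```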

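Note that in older versions of R, when you read a data frame with `read.table()` or create one with `data.frame()`, character columns are stored as factors by default. To change this behavior, set `stringsAsFactors = FALSE` in either function.

> Translator's note: starting with R 4.0, the default is already `stringsAsFactors = FALSE`.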
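
Setting the argument explicitly makes the behavior the same on every R version. A minimal sketch with two hypothetical data frames, `df_chr` and `df_fac`:

```r
df_chr <- data.frame(chr = c("chr1", "chr2"), stringsAsFactors = FALSE)
df_fac <- data.frame(chr = c("chr1", "chr2"), stringsAsFactors = TRUE)

class(df_chr$chr)   # character column kept as is
class(df_fac$chr)   # character column converted to a factor
```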

