  • ipvs and eBPF (repost)

    https://lwn.net/Articles/740157/

    http://arthurchiao.art/blog/ebpf-and-k8s-zh/

    https://elixir.bootlin.com/linux/v4.15/source/tools/bpf/bpftool/Documentation/bpftool.rst

    https://elixir.bootlin.com/linux/v4.15/source/tools/bpf/bpftool

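    Reference book: https://files.cnblogs.com/files/codestack/OReilly-Linux-Observability-with-BPF-2019.rar

    In the kernel version it ships, Tencent Cloud implemented IPVS + eBPF to improve performance and to remove the performance penalty that iptables SNAT imposed on IPVS.

    The patch and related code: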

    https://github.com/Tencent/TencentOS-kernel/blob/master/net/netfilter/ipvs/ip_vs_bpf_proc.c#L56

    https://github.com/Tencent/TencentOS-kernel/blob/master/net/netfilter/ipvs/ip_vs_bpf_proc.c#L402

    https://github.com/Tencent/TencentOS-kernel/commit/d1a590ff8758af2fc1055899c9bf49f3028243d5#diff-1a592df245508d5f30565cf2860fa670bf1ae15b0d69d07e5328a0b880fc1024

    https://cloud.tencent.com/developer/inventory/600/article/1685332

    Tencent Cloud's implementation includes logic in which the IPVS module inserts data into a BPF map. The key code is shown below:

    // https://github.com/Tencent/TencentOS-kernel/blob/master/net/netfilter/ipvs/ip_vs_bpf_proc.c#L56
    struct bpf_map  *conntrack_map;
    
    
    // https://github.com/Tencent/TencentOS-kernel/blob/master/net/netfilter/ipvs/ip_vs_bpf_proc.c#L402
    static ssize_t ip_vs_bpf_write(struct file *file,
                       const char __user *ubuf,
                       size_t count,
                       loff_t *ppos)
    {
        int err = 0;
        struct bpf_map *map = NULL;
        struct bpf_prog *prog1 = NULL;
        struct bpf_prog *prog2 = NULL;
        const char delim[2] = ":";
        char ids[5][20];
        char *token;
        int tag, pid;
        unsigned int mapid, progid1, progid2;
        char buf[100];
        int i = 0;
        char *s = buf;
    ...
    
        /* singleton:conntrack_map is assigned once,
         * and be nulled in module exit
         */
        if (conntrack_map) {
            pr_err("%s %d conntrack_map exists\n",
                   __func__, __LINE__);
            return -EEXIST;
        }
    ...
    
        err = bpf_conntrack_map_get(pid, mapid,
                        (long long)(resolve_addrs.bpf_map_fops),
                        &map);
        if (err != 0 || !map) {
            pr_err("%s acquire bpf_map failed\n", __func__);
            return -EINVAL;
        }
    ...
    
        bpf_map_fd = mapid;
    ...
        // got the target bpf map via fd
        conntrack_map = map;
    ...
    }
    
    // https://github.com/Tencent/TencentOS-kernel/blob/master/net/netfilter/ipvs/ip_vs_bpf_proc.c#L306
    static int bpf_conntrack_map_get(int pid,
                     unsigned int fd,
                     unsigned long long addr,
                     struct bpf_map **map)
    {
        struct files_struct *files;
        struct task_struct *task;
        struct file *file;
        int err = 0;
    
    ...
        file = fcheck_files(files, fd);
    ...
        *map = file->private_data;
        if (*map)
            bpf_map_inc2(*map);
    ...
    }

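    A brief explanation of ip_vs_bpf_proc.c:

    • Near its top, ip_vs_bpf_proc.c declares conntrack_map, a pointer to struct bpf_map (a type from the kernel's BPF subsystem). This is the BPF map object that IPVS later operates on.
    • In the same file, ip_vs_bpf_write() calls bpf_conntrack_map_get(). One of its arguments, mapid, is the fd of the BPF map. The function resolves that fd to a struct file with fcheck_files(), takes file->private_data as the struct bpf_map pointer (nothing is serialized; the existing kernel object is used directly), and increments the map's reference count. The result is assigned to conntrack_map, which completes the acquisition of the target BPF map.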
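
    The acquire-once flow described above can be modeled in userspace. The following is a minimal sketch, not the kernel code: struct toy_map, struct toy_file, struct toy_files, toy_map_get(), toy_attach() and toy_detach() are all hypothetical stand-ins, and dropping the reference in toy_detach() is an assumption (the excerpt only says the pointer is nulled at module exit).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct bpf_map. */
struct toy_map {
    int refcnt;
};

/* Hypothetical stand-in for struct file; private_data points at the map. */
struct toy_file {
    void *private_data;
};

#define TOY_MAX_FDS 8

/* Hypothetical fd table (the kernel side uses struct files_struct). */
struct toy_files {
    struct toy_file *fd_array[TOY_MAX_FDS];
};

/* The singleton: assigned once, cleared only by toy_detach(). */
static struct toy_map *conntrack_map;

/* Resolve fd -> file -> map and take a reference,
 * following the shape of bpf_conntrack_map_get(). */
static int toy_map_get(struct toy_files *files, unsigned int fd,
                       struct toy_map **map)
{
    struct toy_file *file;

    assert(files && map);
    *map = NULL;
    if (fd >= TOY_MAX_FDS)
        return -EBADF;
    file = files->fd_array[fd];
    if (!file)
        return -EBADF;
    *map = file->private_data;
    if (*map)
        (*map)->refcnt++;
    return 0;
}

/* Follows the control flow of ip_vs_bpf_write(): a second attach is refused. */
static int toy_attach(struct toy_files *files, unsigned int fd)
{
    struct toy_map *map = NULL;
    int err;

    if (conntrack_map)
        return -EEXIST;
    err = toy_map_get(files, fd, &map);
    if (err != 0 || !map)
        return -EINVAL;
    conntrack_map = map;
    return 0;
}

/* Models module exit. Dropping the reference here is an assumption. */
static void toy_detach(void)
{
    if (conntrack_map) {
        conntrack_map->refcnt--;
        conntrack_map = NULL;
    }
}
```

    In this model, calling toy_attach() with an fd that has no file yields -EINVAL, the first successful call stores the map and raises its refcnt by one, and any later call returns -EEXIST until toy_detach() runs.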

    Then, when the IPVS module creates a new connection, it inserts the required entries into the conntrack_map obtained above. The key code is shown below:

    // https://github.com/Tencent/TencentOS-kernel/blob/master/net/netfilter/ipvs/ip_vs_conn.c#L1098
    static bool ip_vs_conn_new_bpf(struct ip_vs_dest *dest,
                       unsigned int flags,
                       const struct ip_vs_conn_param *p,
                       int *skip)
    {
        int i;
        struct ip_vs_service *svc;
        struct bpf_lb_conn_key key = {};
        struct bpf_lb_conn_key reply_key = {};
        struct bpf_lb_conn_value value = {};
        struct bpf_lb_conn_value reply_value = {};
        struct bpf_lb_conn_value *v;
        int inserted = 0;
        struct bpf_map *map;
    
    ...
        if (!bpf_mode_on)
            return true;
    ...
        // conntrack_map is the same one
        map = conntrack_map;
    ...
        lip = alloc_localip();
        reply_key.sip = key.dip;
        reply_key.sport = key.dport;
        reply_key.dip = lip;
        reply_key.dport = key.sport;
        reply_key.proto = p->protocol;
        reply_key.vip = 0;
        reply_key.vport = 0;
        reply_key.pad = 0;
    
        atomic_set(&reply_value.ref, 0);
        reply_value.sip = key.dip;
        reply_value.sport = key.dport;
        reply_value.dip = key.sip;
        reply_value.dport = key.sport;
        reply_value.proto = p->protocol;
    ...
        if (likely(!map->ops->map_lookup_elem(map, &reply_key))) {
            if (likely(map->ops->map_update_elem(map,
                                                 &reply_key,
                                                 &reply_value,
                                                 BPF_ANY) == 0)) {
                /* the common case! break the loop */
                inserted = 1;
                nf_conntrack_single_unlock(&reply_key,
                                           map->key_size);
                break;
            }
            /* if lookup ok, shall insert ok since lock is held!*/
            pr_err("map insert key failed\n");

            BPF_STAT_INC(p->ipvs, BPF_NEW_INSERT);
            nf_conntrack_single_unlock(&reply_key, map->key_size);
            return false;
        }
    ...
    }

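    The code above uses two BPF map operations, map_lookup_elem() and map_update_elem(). It looks the key up first and, only if no entry exists, updates the map (inserting a new element), which completes the BPF map operation.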
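
    The lookup-then-insert pattern and the way the reply-direction tuple is derived can be tried out in plain userspace C. This is a minimal sketch under stated assumptions: the struct layouts and field widths are guessed from the field names in the excerpt, struct toy_ct_map is a hypothetical fixed-size table rather than a real BPF map, locking is omitted, the protocol is taken from the key instead of p->protocol, and the sketch simply reports failure when the tuple already exists, since the excerpt elides what the kernel does in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key layout: names follow the excerpt, widths are assumed. */
struct toy_conn_key {
    uint32_t sip, dip, vip;
    uint16_t sport, dport, vport;
    uint8_t proto;
    uint8_t pad;
};

/* Hypothetical value layout (the ref counter from the excerpt is omitted). */
struct toy_conn_value {
    uint32_t sip, dip;
    uint16_t sport, dport;
    uint8_t proto;
};

#define TOY_SLOTS 16

struct toy_slot {
    bool used;
    struct toy_conn_key key;
    struct toy_conn_value value;
};

/* Hypothetical stand-in for the BPF map: a small linear-scan table. */
struct toy_ct_map {
    struct toy_slot slots[TOY_SLOTS];
};

/* Returns a pointer to the stored value, or NULL when the key is absent. */
static struct toy_conn_value *toy_lookup(struct toy_ct_map *map,
                                         const struct toy_conn_key *key)
{
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        struct toy_slot *s = &map->slots[i];

        if (s->used && memcmp(&s->key, key, sizeof(*key)) == 0)
            return &s->value;
    }
    return NULL;
}

/* Create-or-overwrite, in the spirit of BPF_ANY. Returns 0 on success. */
static int toy_update(struct toy_ct_map *map,
                      const struct toy_conn_key *key,
                      const struct toy_conn_value *value)
{
    struct toy_conn_value *v = toy_lookup(map, key);

    if (v) {
        *v = *value;
        return 0;
    }
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        struct toy_slot *s = &map->slots[i];

        if (!s->used) {
            s->used = true;
            s->key = *key;
            s->value = *value;
            return 0;
        }
    }
    return -1; /* table full */
}

/* Derive the reply-direction key and value from the forward key and the
 * allocated local IP, using the same field assignments as the excerpt. */
static void toy_build_reply(const struct toy_conn_key *key, uint32_t lip,
                            struct toy_conn_key *reply_key,
                            struct toy_conn_value *reply_value)
{
    memset(reply_key, 0, sizeof(*reply_key)); /* vip, vport, pad stay 0 */
    reply_key->sip = key->dip;
    reply_key->sport = key->dport;
    reply_key->dip = lip;
    reply_key->dport = key->sport;
    reply_key->proto = key->proto;

    memset(reply_value, 0, sizeof(*reply_value));
    reply_value->sip = key->dip;
    reply_value->sport = key->dport;
    reply_value->dip = key->sip;
    reply_value->dport = key->sport;
    reply_value->proto = key->proto;
}

/* Look up first, insert only when the reply tuple is not present yet. */
static bool toy_conn_new(struct toy_ct_map *map,
                         const struct toy_conn_key *key, uint32_t lip)
{
    struct toy_conn_key reply_key;
    struct toy_conn_value reply_value;

    assert(map && key);
    toy_build_reply(key, lip, &reply_key, &reply_value);
    if (toy_lookup(map, &reply_key))
        return false; /* tuple already in use */
    return toy_update(map, &reply_key, &reply_value) == 0;
}
```

    The key is zeroed with memset() before its fields are filled in because the table compares keys byte for byte, which matches how hash-type BPF maps treat keys as opaque byte strings. Calling toy_conn_new() twice with the same forward key and local IP succeeds the first time and fails the second, while a different local IP produces a distinct reply tuple and succeeds.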

  • Original article: https://www.cnblogs.com/codestack/p/14006137.html