eBPF Syscall
- Authors:
Alexei Starovoitov <ast@kernel.org>
Joe Stringer <joe@wand.net.nz>
Michael Kerrisk <mtk.manpages@gmail.com>
The primary info for the bpf syscall is available in the man-pages for bpf(2).
bpf() subcommand reference
The operation to be performed by the bpf() system call is determined by the cmd argument. Each operation takes an accompanying argument, provided via attr, which is a pointer to a union of type bpf_attr (see below). The size argument is the size of the union pointed to by attr.
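As a rough illustration of the calling convention described above, the sketch below wraps the raw system call. It assumes a Linux system with the UAPI header linux/bpf.h installed; the names sys_bpf and bpf_cmd_zeroed are hypothetical helpers introduced here, not part of any kernel or library API.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: invoke bpf() through syscall(2) with cmd, attr, size. */
static long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}

/* Hypothetical helper: issue a command with a fully zeroed attr union. */
static long bpf_cmd_zeroed(int cmd)
{
	union bpf_attr attr;

	/* Zero the whole union so that fields the command does not use are 0. */
	memset(&attr, 0, sizeof(attr));
	return sys_bpf(cmd, &attr, sizeof(attr));
}
```

Zeroing the union before filling in the fields for a given command is a common defensive habit, since stale bytes in unused fields may cause the call to be rejected.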
- BPF_MAP_CREATE
- Description
Create a map and return a file descriptor that refers to the map. The close-on-exec file descriptor flag (see fcntl(2)) is automatically enabled for the new file descriptor.
Applying close(2) to the file descriptor returned by BPF_MAP_CREATE will delete the map (but see NOTES).
- Return
A new file descriptor (a nonnegative integer), or -1 if an error occurred (in which case, errno is set appropriately).
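A minimal sketch of BPF_MAP_CREATE follows. The map_type, key_size, value_size and max_entries fields come from union bpf_attr in linux/bpf.h rather than from the text above, and create_hash_map is a hypothetical helper name. Whether the call succeeds depends on the privileges of the caller.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: create a hash map with 4-byte keys and 8-byte values. */
static int create_hash_map(unsigned int max_entries)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_HASH;
	attr.key_size = sizeof(__u32);
	attr.value_size = sizeof(__u64);
	attr.max_entries = max_entries;

	/* Returns a new file descriptor (>= 0), or -1 with errno set. */
	return (int)syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}
```

The caller owns the returned descriptor and would normally close(2) it once the map is no longer needed.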
- BPF_MAP_LOOKUP_ELEM
- Description
Look up an element with a given key in the map referred to by the file descriptor map_fd.
The flags argument may be specified as one of the following:
- BPF_F_LOCK
Look up the value of a spin-locked map without returning the lock. This must be specified if the elements contain a spinlock.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- BPF_MAP_UPDATE_ELEM
- Description
Create or update an element (key/value pair) in a specified map.
The flags argument should be specified as one of the following:
- BPF_ANY
Create a new element or update an existing element.
- BPF_NOEXIST
Create a new element only if it did not exist.
- BPF_EXIST
Update an existing element.
- BPF_F_LOCK
Update a spin_lock-ed map element.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
May set errno to EINVAL, EPERM, ENOMEM, E2BIG, EEXIST, or ENOENT.
- E2BIG
The number of elements in the map reached the max_entries limit specified at map creation time.
- EEXIST
If flags specifies BPF_NOEXIST and the element with key already exists in the map.
- ENOENT
If flags specifies BPF_EXIST and the element with key does not exist in the map.
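The flag and errno combinations above can be sketched as follows. The helper names map_update and map_insert_new are hypothetical; the map_fd, key, value and flags fields are taken from union bpf_attr in linux/bpf.h.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: thin wrapper around BPF_MAP_UPDATE_ELEM. */
static int map_update(int map_fd, const void *key, const void *value, __u64 flags)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;
	attr.key = (__u64)(unsigned long)key;
	attr.value = (__u64)(unsigned long)value;
	attr.flags = flags;
	return (int)syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}

/*
 * Hypothetical helper: insert only if the key is absent.
 * Returns 0 if inserted, 1 if the key already existed, -1 on other errors.
 */
static int map_insert_new(int map_fd, const void *key, const void *value)
{
	if (map_update(map_fd, key, value, BPF_NOEXIST) == 0)
		return 0;
	if (errno == EEXIST)
		return 1;	/* key already present, map left unchanged */
	return -1;		/* errno still describes the failure */
}
```

Treating EEXIST as a separate outcome lets a caller distinguish "someone else created it first" from genuine failures such as E2BIG or EPERM.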
- BPF_MAP_DELETE_ELEM
- Description
Look up and delete an element by key in a specified map.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- BPF_MAP_GET_NEXT_KEY
- Description
Look up an element by key in a specified map and return the key of the next element. Can be used to iterate over all elements in the map.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
The following cases can be used to iterate over all elements of the map:
If key is not found, the operation returns zero and sets the next_key pointer to the key of the first element.
If key is found, the operation returns zero and sets the next_key pointer to the key of the next element.
If key is the last element, returns -1 and errno is set to ENOENT.
May set errno to ENOMEM, EFAULT, EPERM, or EINVAL on error.
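The iteration protocol above might be used as in the sketch below, which counts the entries of a map with 4-byte keys. The helper name count_keys_u32 is hypothetical. Passing a NULL key to request the first key is an assumption about the running kernel; on kernels that reject it, a key known to be absent could be passed instead, as described above.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: count entries in a map whose keys are 4 bytes wide. */
static long count_keys_u32(int map_fd)
{
	union bpf_attr attr;
	__u32 cur, next;
	long n = 0;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;
	attr.key = 0;	/* assumption: a NULL key yields the first key */
	attr.next_key = (__u64)(unsigned long)&next;

	while (syscall(__NR_bpf, BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr)) == 0) {
		n++;
		cur = next;	/* continue from the key just returned */
		attr.key = (__u64)(unsigned long)&cur;
	}

	/* ENOENT marks the end of the map; anything else is a real error. */
	return errno == ENOENT ? n : -1;
}
```

If the map is modified concurrently, the count is only a snapshot and keys may be visited more than once or missed.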
- BPF_PROG_LOAD
- Description
Verify and load an eBPF program, returning a new file descriptor associated with the program.
Applying close(2) to the file descriptor returned by BPF_PROG_LOAD will unload the eBPF program (but see NOTES).
The close-on-exec file descriptor flag (see fcntl(2)) is automatically enabled for the new file descriptor.
- Return
A new file descriptor (a nonnegative integer), or -1 if an error occurred (in which case, errno is set appropriately).
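For illustration, the sketch below loads about the smallest program the verifier is likely to accept: set r0 to 0 and exit. The prog_type, insns, insn_cnt and license fields and the instruction encoding macros come from linux/bpf.h, not from the text above; load_trivial_prog is a hypothetical helper, and the choice of BPF_PROG_TYPE_SOCKET_FILTER is arbitrary.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: load a two-instruction program ("r0 = 0; exit"). */
static int load_trivial_prog(void)
{
	struct bpf_insn insns[] = {
		{ .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
		{ .code = BPF_JMP | BPF_EXIT },
	};
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
	attr.insns = (__u64)(unsigned long)insns;
	attr.insn_cnt = sizeof(insns) / sizeof(insns[0]);
	attr.license = (__u64)(unsigned long)"GPL";

	/* Returns a new file descriptor (>= 0), or -1 with errno set. */
	return (int)syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
```

Real loaders usually also pass a log buffer so that verifier rejections can be diagnosed; that is omitted here to keep the sketch short.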
- BPF_OBJ_PIN
- Description
Pin an eBPF program or map referred to by the specified bpf_fd to the provided pathname on the filesystem.
The pathname argument must not contain a dot (“.”).
On success, pathname retains a reference to the eBPF object, preventing deallocation of the object when the original bpf_fd is closed. This allows the eBPF object to live beyond close(bpf_fd), and hence beyond the lifetime of the parent process.
Applying unlink(2) or similar calls to the pathname unpins the object from the filesystem, removing the reference. If no other file descriptors or filesystem nodes refer to the same object, it will be deallocated (see NOTES).
The filesystem type for the parent directory of pathname must be BPF_FS_MAGIC.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- BPF_OBJ_GET
- Description
Open a file descriptor for the eBPF object pinned to the specified pathname.
- Return
A new file descriptor (a nonnegative integer), or -1 if an error occurred (in which case, errno is set appropriately).
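Pinning and reopening an object could look like the sketch below. The helper names obj_pin and obj_get are hypothetical; the pathname and bpf_fd fields are from union bpf_attr in linux/bpf.h. The path passed in is assumed to live on a mounted BPF filesystem.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: pin the object behind bpf_fd at path. */
static int obj_pin(int bpf_fd, const char *path)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.pathname = (__u64)(unsigned long)path;
	attr.bpf_fd = bpf_fd;
	return (int)syscall(__NR_bpf, BPF_OBJ_PIN, &attr, sizeof(attr));
}

/* Hypothetical helper: open a new descriptor for the object pinned at path. */
static int obj_get(const char *path)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.pathname = (__u64)(unsigned long)path;
	return (int)syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
}
```

A process that pins an object and then exits leaves the object alive; a later, unrelated process can recover it with obj_get() on the same path.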
- BPF_PROG_ATTACH
- Description
Attach an eBPF program to a target_fd at the specified attach_type hook.
The attach_type specifies the eBPF attachment point to attach the program to, and must be one of bpf_attach_type (see below).
The attach_bpf_fd must be a valid file descriptor for a loaded eBPF program of a cgroup, flow dissector, LIRC, sockmap or sock_ops type corresponding to the specified attach_type.
The target_fd must be a valid file descriptor for a kernel object which depends on the attach type of attach_bpf_fd:
BPF_PROG_TYPE_CGROUP_DEVICE, BPF_PROG_TYPE_CGROUP_SKB, BPF_PROG_TYPE_CGROUP_SOCK, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_PROG_TYPE_CGROUP_SOCKOPT, BPF_PROG_TYPE_CGROUP_SYSCTL, BPF_PROG_TYPE_SOCK_OPS
Control Group v2 hierarchy with the eBPF controller enabled. Requires the kernel to be compiled with CONFIG_CGROUP_BPF.
BPF_PROG_TYPE_FLOW_DISSECTOR
Network namespace (e.g., /proc/self/ns/net).
BPF_PROG_TYPE_LIRC_MODE2
LIRC device path (e.g., /dev/lircN). Requires the kernel to be compiled with CONFIG_BPF_LIRC_MODE2.
BPF_PROG_TYPE_SK_SKB, BPF_PROG_TYPE_SK_MSG
eBPF map of socket type (e.g., BPF_MAP_TYPE_SOCKHASH).
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- BPF_PROG_DETACH
- Description
Detach the eBPF program associated with the target_fd at the hook specified by attach_type. The program must have been previously attached using BPF_PROG_ATTACH.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- BPF_PROG_TEST_RUN
- Description
Run the eBPF program associated with the prog_fd a repeat number of times against a provided program context ctx_in and data data_in, and return the modified program context ctx_out, data_out (for example, packet data), result of the execution retval, and duration of the test run.
The sizes of the buffers provided as input and output parameters ctx_in, ctx_out, data_in, and data_out must be provided in the corresponding variables ctx_size_in, ctx_size_out, data_size_in, and/or data_size_out. If any of these parameters are not provided (i.e., set to NULL), the corresponding size field must be zero.
Some program types have particular requirements:
- BPF_PROG_TYPE_SK_LOOKUP
data_in and data_out must be NULL.
BPF_PROG_TYPE_RAW_TRACEPOINT, BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE
ctx_out, data_in and data_out must be NULL. repeat must be zero.
BPF_PROG_RUN is an alias for BPF_PROG_TEST_RUN.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- ENOSPC
Either data_size_out or ctx_size_out is too small.
- ENOTSUPP
This command is not supported by the program type of the program referred to by prog_fd.
- BPF_PROG_GET_NEXT_ID
- Description
Fetch the next eBPF program currently loaded into the kernel.
Looks for the eBPF program with an id greater than start_id and updates next_id on success. If no other eBPF programs remain with ids higher than start_id, returns -1 and sets errno to ENOENT.
- Return
Returns zero on success. On error, or when no id remains, -1 is returned and errno is set appropriately.
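The start_id and next_id handshake can be sketched as a simple loop. The helper name count_loaded_progs is hypothetical. The call typically needs elevated privileges, so an unprivileged caller should expect -1.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: count the eBPF programs currently loaded. */
static long count_loaded_progs(void)
{
	union bpf_attr attr;
	long n = 0;

	memset(&attr, 0, sizeof(attr));
	attr.start_id = 0;	/* begin below the smallest possible id */

	while (syscall(__NR_bpf, BPF_PROG_GET_NEXT_ID, &attr, sizeof(attr)) == 0) {
		n++;
		attr.start_id = attr.next_id;	/* resume after the id just found */
	}

	/* ENOENT means no higher id remains; other errno values are failures. */
	return errno == ENOENT ? n : -1;
}
```

The same loop shape applies to the map, BTF and link variants of this command, with only the cmd value changed.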
- BPF_MAP_GET_NEXT_ID
- Description
Fetch the next eBPF map currently loaded into the kernel.
Looks for the eBPF map with an id greater than start_id and updates next_id on success. If no other eBPF maps remain with ids higher than start_id, returns -1 and sets errno to ENOENT.
- Return
Returns zero on success. On error, or when no id remains, -1 is returned and errno is set appropriately.
- BPF_PROG_GET_FD_BY_ID
- Description
Open a file descriptor for the eBPF program corresponding to prog_id.
- Return
A new file descriptor (a nonnegative integer), or -1 if an error occurred (in which case, errno is set appropriately).
- BPF_MAP_GET_FD_BY_ID
- Description
Open a file descriptor for the eBPF map corresponding to map_id.
- Return
A new file descriptor (a nonnegative integer), or -1 if an error occurred (in which case, errno is set appropriately).
- BPF_OBJ_GET_INFO_BY_FD
- Description
Obtain information about the eBPF object corresponding to bpf_fd.
Populates up to info_len bytes of info, which will be in one of the following formats depending on the eBPF object type of bpf_fd:
struct bpf_prog_info
struct bpf_map_info
struct bpf_btf_info
struct bpf_link_info
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
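As an example of the info query for one object type, the sketch below fills a struct bpf_map_info. The helper name get_map_info is hypothetical; the info.bpf_fd, info.info_len and info.info fields are from union bpf_attr in linux/bpf.h.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: query map metadata for the map behind map_fd. */
static int get_map_info(int map_fd, struct bpf_map_info *info)
{
	union bpf_attr attr;

	memset(info, 0, sizeof(*info));
	memset(&attr, 0, sizeof(attr));
	attr.info.bpf_fd = map_fd;
	attr.info.info_len = sizeof(*info);	/* at most this many bytes are filled */
	attr.info.info = (__u64)(unsigned long)info;

	return (int)syscall(__NR_bpf, BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr));
}
```

On success, fields such as info->key_size and info->max_entries describe the map; for programs, BTF objects or links the same call would be made with the matching info structure.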
- BPF_PROG_QUERY
- Description
Obtain information about eBPF programs associated with the specified attach_type hook.
The target_fd must be a valid file descriptor for a kernel object which depends on the attach type of attach_bpf_fd:
BPF_PROG_TYPE_CGROUP_DEVICE, BPF_PROG_TYPE_CGROUP_SKB, BPF_PROG_TYPE_CGROUP_SOCK, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_PROG_TYPE_CGROUP_SOCKOPT, BPF_PROG_TYPE_CGROUP_SYSCTL, BPF_PROG_TYPE_SOCK_OPS
Control Group v2 hierarchy with the eBPF controller enabled. Requires the kernel to be compiled with CONFIG_CGROUP_BPF.
BPF_PROG_TYPE_FLOW_DISSECTOR
Network namespace (e.g., /proc/self/ns/net).
BPF_PROG_TYPE_LIRC_MODE2
LIRC device path (e.g., /dev/lircN). Requires the kernel to be compiled with CONFIG_BPF_LIRC_MODE2.
BPF_PROG_QUERY always fetches the number of programs attached and the attach_flags which were used to attach those programs. Additionally, if prog_ids is nonzero and the number of attached programs is less than prog_cnt, populates prog_ids with the eBPF program ids of the programs attached at target_fd.
The following flags may alter the result:
- BPF_F_QUERY_EFFECTIVE
Only return information regarding programs which are currently effective at the specified target_fd.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- BPF_RAW_TRACEPOINT_OPEN
- Description
Attach an eBPF program to a tracepoint name to access kernel internal arguments of the tracepoint in their raw form.
The prog_fd must be a valid file descriptor associated with a loaded eBPF program of type BPF_PROG_TYPE_RAW_TRACEPOINT.
No ABI guarantees are made about the content of tracepoint arguments exposed to the corresponding eBPF program.
Applying close(2) to the file descriptor returned by BPF_RAW_TRACEPOINT_OPEN will detach the program from the tracepoint (but see NOTES).
- Return
A new file descriptor (a nonnegative integer), or -1 if an error occurred (in which case, errno is set appropriately).
- BPF_BTF_LOAD
- Description
Verify and load BPF Type Format (BTF) metadata into the kernel, returning a new file descriptor associated with the metadata. BTF is described in more detail at https://www.kernel.org/doc/html/latest/bpf/btf.html.
The btf parameter must point to valid memory providing btf_size bytes of BTF binary metadata.
The returned file descriptor can be passed to other bpf() subcommands such as BPF_PROG_LOAD or BPF_MAP_CREATE to associate the BTF with those objects.
Similar to BPF_PROG_LOAD, BPF_BTF_LOAD has optional parameters to specify a btf_log_buf, btf_log_size and btf_log_level which allow the kernel to return freeform log output regarding the BTF verification process.
- Return
A new file descriptor (a nonnegative integer), or -1 if an error occurred (in which case, errno is set appropriately).
- BPF_BTF_GET_FD_BY_ID
- Description
Open a file descriptor for the BPF Type Format (BTF) corresponding to btf_id.
- Return
A new file descriptor (a nonnegative integer), or -1 if an error occurred (in which case, errno is set appropriately).
- BPF_TASK_FD_QUERY
- Description
Obtain information about eBPF programs associated with the target process identified by pid and fd.
If the pid and fd are associated with a tracepoint, kprobe or uprobe perf event, then the prog_id and fd_type will be populated with the eBPF program id and file descriptor type of type bpf_task_fd_type. If associated with a kprobe or uprobe, the probe_offset and probe_addr will also be populated. Optionally, if buf is provided, then up to buf_len bytes of buf will be populated with the name of the tracepoint, kprobe or uprobe.
The resulting prog_id may be introspected in deeper detail using BPF_PROG_GET_FD_BY_ID and BPF_OBJ_GET_INFO_BY_FD.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- BPF_MAP_LOOKUP_AND_DELETE_ELEM
- Description
Look up an element with the given key in the map referred to by the file descriptor fd, and if found, delete the element.
For BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_STACK map types, the flags argument needs to be set to 0, but for other map types, it may be specified as:
- BPF_F_LOCK
Look up and delete the value of a spin-locked map without returning the lock. This must be specified if the elements contain a spinlock.
The BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_STACK map types implement this command as a “pop” operation, deleting the top element rather than one corresponding to key. The key and key_len parameters should be zeroed when issuing this operation for these map types.
This command is only valid for the following map types:
* BPF_MAP_TYPE_QUEUE
* BPF_MAP_TYPE_STACK
* BPF_MAP_TYPE_HASH
* BPF_MAP_TYPE_PERCPU_HASH
* BPF_MAP_TYPE_LRU_HASH
* BPF_MAP_TYPE_LRU_PERCPU_HASH
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- BPF_MAP_FREEZE
- Description
Freeze the permissions of the specified map.
Write permissions may be frozen by passing zero flags. Upon success, no future syscall invocations may alter the map state of map_fd. Write operations from eBPF programs are still possible for a frozen map.
Not supported for maps of type BPF_MAP_TYPE_STRUCT_OPS.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- BPF_BTF_GET_NEXT_ID
- Description
Fetch the next BPF Type Format (BTF) object currently loaded into the kernel.
Looks for the BTF object with an id greater than start_id and updates next_id on success. If no other BTF objects remain with ids higher than start_id, returns -1 and sets errno to ENOENT.
- Return
Returns zero on success. On error, or when no id remains, -1 is returned and errno is set appropriately.
- BPF_MAP_LOOKUP_BATCH
- Description
Iterate and fetch multiple elements in a map.
Two opaque values are used to manage batch operations, in_batch and out_batch. Initially, in_batch must be set to NULL to begin the batched operation. After each subsequent BPF_MAP_LOOKUP_BATCH, the caller should pass the resultant out_batch as the in_batch for the next operation to continue iteration from the current point. Both in_batch and out_batch must point to memory large enough to hold a key, except for maps of type BPF_MAP_TYPE_{HASH, PERCPU_HASH, LRU_HASH, LRU_PERCPU_HASH}, for which batch parameters must be at least 4 bytes wide regardless of key size.
The keys and values are output parameters which must point to memory large enough to hold count items based on the key and value size of the map map_fd. The keys buffer must be of key_size * count. The values buffer must be of value_size * count.
The elem_flags argument may be specified as one of the following:
- BPF_F_LOCK
Look up the value of a spin-locked map without returning the lock. This must be specified if the elements contain a spinlock.
On success, count elements from the map are copied into the user buffer, with the keys copied into keys and the values copied into the corresponding indices in values.
If an error is returned and errno is not EFAULT, count is set to the number of successfully processed elements.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
May set errno to ENOSPC to indicate that keys or values is too small to dump an entire bucket during iteration of a hash-based map type.
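The in_batch and out_batch handshake might be driven as in the sketch below, for a map with 4-byte keys and 8-byte values. The helper name dump_map_u32_u64 is hypothetical, and the batch fields are from union bpf_attr in linux/bpf.h (recent headers only). Treating ENOENT as the end-of-iteration signal is an assumption not stated in the text above.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: copy up to cap entries; returns the count or -1. */
static long dump_map_u32_u64(int map_fd, __u32 *keys, __u64 *vals, __u32 cap)
{
	union bpf_attr attr;
	__u32 in_tok = 0, out_tok = 0;	/* opaque batch positions */
	__u32 total = 0;
	int first = 1;
	long ret;

	while (total < cap) {
		memset(&attr, 0, sizeof(attr));
		attr.batch.map_fd = map_fd;
		/* NULL in_batch starts the iteration; later calls pass the token. */
		attr.batch.in_batch = first ? 0 : (__u64)(unsigned long)&in_tok;
		attr.batch.out_batch = (__u64)(unsigned long)&out_tok;
		attr.batch.keys = (__u64)(unsigned long)(keys + total);
		attr.batch.values = (__u64)(unsigned long)(vals + total);
		attr.batch.count = cap - total;	/* in: room left, out: copied */

		ret = syscall(__NR_bpf, BPF_MAP_LOOKUP_BATCH, &attr, sizeof(attr));
		if (ret < 0 && errno != ENOENT)
			return -1;	/* includes ENOSPC: buffers too small */

		total += attr.batch.count;
		if (ret < 0 || attr.batch.count == 0)
			break;	/* assumption: ENOENT means no more elements */

		in_tok = out_tok;
		first = 0;
	}
	return total;
}
```

Because count is both an input and an output, it has to be reset on every call; reusing the value left by the previous call would shrink the window each time.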
- BPF_MAP_LOOKUP_AND_DELETE_BATCH
- Description
Iterate and delete all elements in a map.
This operation has the same behavior as BPF_MAP_LOOKUP_BATCH with two exceptions:
Every element that is successfully returned is also deleted from the map. This is at least count elements. Note that count is both an input and an output parameter.
Upon returning with errno set to EFAULT, up to count elements may be deleted without returning the keys and values of the deleted elements.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- BPF_MAP_UPDATE_BATCH
- Description
Update multiple elements in a map by key.
The keys and values are input parameters which must point to memory large enough to hold count items based on the key and value size of the map map_fd. The keys buffer must be of key_size * count. The values buffer must be of value_size * count.
Each element specified in keys is sequentially updated to the value in the corresponding index in values. The in_batch and out_batch parameters are ignored and should be zeroed.
The elem_flags argument should be specified as one of the following:
- BPF_ANY
Create new elements or update existing elements.
- BPF_NOEXIST
Create new elements only if they do not exist.
- BPF_EXIST
Update existing elements.
- BPF_F_LOCK
Update spin_lock-ed map elements. This must be specified if the map value contains a spinlock.
On success, count elements from the map are updated.
If an error is returned and errno is not EFAULT, count is set to the number of successfully processed elements.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
May set errno to EINVAL, EPERM, ENOMEM, or E2BIG. E2BIG indicates that the number of elements in the map reached the max_entries limit specified at map creation time.
May set errno to one of the following error codes under specific circumstances:
- EEXIST
If flags specifies BPF_NOEXIST and the element with key already exists in the map.
- ENOENT
If flags specifies BPF_EXIST and the element with key does not exist in the map.
- BPF_MAP_DELETE_BATCH
- Description
Delete multiple elements in a map by key.
The keys parameter is an input parameter which must point to memory large enough to hold count items based on the key size of the map map_fd, that is, key_size * count.
Each element specified in keys is sequentially deleted. The in_batch, out_batch, and values parameters are ignored and should be zeroed.
The elem_flags argument may be specified as one of the following:
- BPF_F_LOCK
Look up the value of a spin-locked map without returning the lock. This must be specified if the elements contain a spinlock.
On success, count elements from the map are deleted.
If an error is returned and errno is not EFAULT, count is set to the number of successfully processed elements. If errno is EFAULT, up to count elements may have been deleted.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- BPF_LINK_CREATE
- Description
Attach an eBPF program to a target_fd at the specified attach_type hook and return a file descriptor handle for managing the link.
- Return
A new file descriptor (a nonnegative integer), or -1 if an error occurred (in which case, errno is set appropriately).
- BPF_LINK_UPDATE
- Description
Update the eBPF program in the specified link_fd to new_prog_fd.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- BPF_LINK_GET_FD_BY_ID
- Description
Open a file descriptor for the eBPF Link corresponding to link_id.
- Return
A new file descriptor (a nonnegative integer), or -1 if an error occurred (in which case, errno is set appropriately).
- BPF_LINK_GET_NEXT_ID
- Description
Fetch the next eBPF link currently loaded into the kernel.
Looks for the eBPF link with an id greater than start_id and updates next_id on success. If no other eBPF links remain with ids higher than start_id, returns -1 and sets errno to ENOENT.
- Return
Returns zero on success. On error, or when no id remains, -1 is returned and errno is set appropriately.
- BPF_ENABLE_STATS
- Description
Enable eBPF runtime statistics gathering.
Runtime statistics gathering for the eBPF runtime is disabled by default to minimize the corresponding performance overhead. This command enables statistics globally.
Multiple programs may independently enable statistics. After gathering the desired statistics, eBPF runtime statistics may be disabled again by calling close(2) for the file descriptor returned by this function. Statistics will only be disabled system-wide when all outstanding file descriptors returned by prior calls for this subcommand are closed.
- Return
A new file descriptor (a nonnegative integer), or -1 if an error occurred (in which case, errno is set appropriately).
- BPF_ITER_CREATE
- Description
Create an iterator on top of the specified link_fd (as previously created using BPF_LINK_CREATE) and return a file descriptor that can be used to trigger the iteration.
If the resulting file descriptor is pinned to the filesystem using BPF_OBJ_PIN, then subsequent read(2) syscalls for that path will trigger the iterator to read kernel state using the eBPF program attached to link_fd.
- Return
A new file descriptor (a nonnegative integer), or -1 if an error occurred (in which case, errno is set appropriately).
- BPF_LINK_DETACH
- Description
Forcefully detach the specified link_fd from its corresponding attachment point.
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- BPF_PROG_BIND_MAP
- Description
Bind a map to the lifetime of an eBPF program.
The map identified by map_fd is bound to the program identified by prog_fd and only released when prog_fd is released. This may be used in cases where metadata should be associated with a program which otherwise does not contain any references to the map (for example, embedded in the eBPF program instructions).
- Return
Returns zero on success. On error, -1 is returned and errno is set appropriately.
- BPF_TOKEN_CREATE
- Description
Create BPF token with embedded information about what BPF-related functionality it allows:
- a set of allowed bpf() syscall commands;
- a set of allowed BPF map types to be created with BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
- a set of allowed BPF program types and BPF program attach types to be loaded with BPF_PROG_LOAD command, if BPF_PROG_LOAD itself is allowed.
BPF token is created (derived) from an instance of BPF FS, assuming it has necessary delegation mount options specified. This BPF token can be passed as an extra parameter to various bpf() syscall commands to grant BPF subsystem functionality to unprivileged processes.
When created, BPF token is “associated” with the owning user namespace of BPF FS instance (super block) that it was derived from, and subsequent BPF operations performed with BPF token would be performing capabilities checks (i.e., CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within that user namespace. Without BPF token, such capabilities have to be granted in init user namespace, making bpf() syscall incompatible with user namespace, for the most part.
- Return
A new file descriptor (a nonnegative integer), or -1 if an error occurred (in which case, errno is set appropriately).
- NOTES
eBPF objects (maps and programs) can be shared between processes.
After fork(2), the child inherits file descriptors referring to the same eBPF objects.
File descriptors referring to eBPF objects can be transferred over unix(7) domain sockets.
File descriptors referring to eBPF objects can be duplicated in the usual way, using dup(2) and similar calls.
File descriptors referring to eBPF objects can be pinned to the filesystem using the BPF_OBJ_PIN command of bpf(2).
An eBPF object is deallocated only after all file descriptors referring to the object have been closed and no references remain pinned to the filesystem or attached (for example, bound to a program or device).