Tuesday, July 14, 2009

kernel symbols

A kernel symbol can mean any symbol (variable, routine, etc.) in the kernel, but is usually understood to mean the symbols visible to the user in /proc/kallsyms (cat /proc/kallsyms). Exported symbols are the symbols that a device driver (or other kernel code) makes available to the rest of the kernel. This is necessary because some drivers need to share information with one another, while by default all symbols in a module are hidden from the rest of the kernel (to avoid the 'namespace pollution' problem). So, for example, any routine or variable in a device driver that needs to be accessible to another device driver is exported with the EXPORT_SYMBOL macro (or EXPORT_SYMBOL_GPL if the author wants it to be available only to GPL drivers and not to non-free drivers).
http://www.linuxquestions.org/questions/programming-9/can-someone-give-me-the-definition-of-kernel-symbol-and-exportsymbol-666456/

The kernel has a couple of macros for making internal symbols available to loadable modules:

EXPORT_SYMBOL(symbol);
EXPORT_SYMBOL_GPL(symbol);

The second form is used for kernel symbols which are only available to modules with a GPL-compatible license. The idea behind GPL-only symbols is that they are so deeply internal to the kernel that any module using them can only be a derived product of the kernel. Either that, or it's a relatively new symbol whose creator simply wanted it to be GPL-only.
http://lwn.net/Articles/171838/

What Are Symbols?

In the context of programming, a symbol is the building block of a program: it is a variable name or a function name. It should be of no surprise that the kernel has symbols, just like the programs you write. The difference is, of course, that the kernel is a very complicated piece of coding and has many, many global symbols.

What Is The Kernel Symbol Table?

The kernel doesn't use symbol names like BytesRead(). It's much happier knowing a variable or function name by the variable or function's address, like c0343f20. Humans, on the other hand, do not appreciate addresses like c0343f20. We prefer to use symbol names like BytesRead(). Normally, this doesn't present much of a problem. The kernel is mainly written in C, so the compiler/linker allows us to use symbol names when we code and allows the kernel to use addresses when it runs. Everyone is happy.

There are situations, however, where we need to know the address of a symbol (or the symbol for an address). This is done with a symbol table, and is very similar to how gdb can give you the function name from an address (or an address from a function name). A symbol table is a listing of all symbols along with their addresses. Here is an example of a symbol table:

c03441a0 B dmi_broken
c03441a4 B is_sony_vaio_laptop
c03441c0 b dmi_ident
c0344200 b pci_bios_present
c0344204 b pirq_table
c0344208 b pirq_router
c034420c b pirq_router_dev
c0344220 b ascii_buffer
c0344224 b ascii_buf_bytes

You can see that the variable named dmi_broken is at the kernel address c03441a0.


What Is The System.map File?

There are 2 files that are used as a kernel symbol table:

1. /proc/kallsyms
2. System.map

There. You now know what the System.map file is.

Every time you compile a new kernel, the addresses of various symbol names are bound to change.

/proc/kallsyms is a "proc file" that is created on the fly when a kernel boots up. Actually, it's not really a disk file; it's a representation of kernel data which is given the illusion of being a disk file. If you don't believe me, try finding the filesize of /proc/kallsyms. Because it is generated from the running kernel, it is always correct for the kernel that is currently running.

However, System.map is an actual file on your filesystem. When you compile a new kernel, your old System.map has wrong symbol information. A new System.map is generated with each kernel compile and you need to replace the old copy with your new copy.


Asked "What is it like to be this smart?", he gives this answer: Intelligence is like beauty. The more of it you have, the easier life gets.


The requested module has a dependency on another module if the other module defines symbols (variables or functions) that the requested module uses.

Kernel modules are different here, too. In the hello world example, you might have noticed that we used a function, printk() but didn't include a standard I/O library. That's because modules are object files whose symbols get resolved upon insmod'ing. The definition for the symbols comes from the kernel itself; the only external functions you can use are the ones provided by the kernel. If you're curious about what symbols have been exported by your kernel, take a look at /proc/kallsyms.



If lowercase, the symbol is local; if uppercase, the symbol is global (external). "t" and "T" mean that these symbols are defined in the text (code) section of the compiled code (local and global, respectively). For more information, look at the man page for "nm".


adsl2:~/ozan/kernelmodule/test# cat hello.c
#include <linux/module.h>
#include <linux/kernel.h>

int init_module(void)
{
        printk(KERN_INFO "init_module() called\n");
        return 0;
}

void cleanup_module(void)
{
        printk(KERN_INFO "cleanup_module() called\n");
}
adsl2:~/ozan/kernelmodule/test# man nm
adsl2:~/ozan/kernelmodule/test# nm --defined-only hello.ko
00000000 r ____versions
00000020 r __mod_vermagic5
00000000 r __module_depends
00000000 D __this_module
00000000 T cleanup_module
0000000c T init_module
adsl2:~/ozan/kernelmodule/test# nm --undefined-only hello.ko
U printk


adsl2:~/ozan/kernelmodule/test# cat /proc/kallsyms | grep printk
c0104f1e T printk_address
c01139f1 T early_printk
c012283d T printk_timed_ratelimit
c012286d T __printk_ratelimit
c0122872 T printk_ratelimit
c0122c19 T vprintk
c0122ef6 T printk


The file /proc/kallsyms holds all the symbols that the kernel knows about and which are therefore accessible to your modules since they share the kernel's codespace.

3.1.6. Device Drivers

One class of module is the device driver, which provides functionality for hardware like a TV card or a serial port. On Unix, each piece of hardware is represented by a file located in /dev named a device file which provides the means to communicate with the hardware. The device driver provides the communication on behalf of a user program. So the es1370.o sound card device driver might connect the /dev/sound device file to the Ensoniq ES1370 sound card. A userspace program like mp3blaster can use /dev/sound without ever knowing what kind of sound card is installed.



In general, a process is not supposed to be able to access the kernel. It can't access kernel memory and it can't call kernel functions. The hardware of the CPU enforces this (that's the reason why it's called `protected mode').


Using standard libraries

You can't do that. In a kernel module you can only use kernel functions, which are the functions you can see in /proc/kallsyms.

Disabling interrupts

You might need to do this for a short time and that is OK, but if you don't enable them afterwards, your system will be stuck and you'll have to power it off.

Each piece of code that can be added to the kernel at runtime is called a module. The Linux kernel offers support for quite a few different types (or classes) of modules, including, but not limited to, device drivers. Each module is made up of object code (not linked into a complete executable) that can be dynamically linked to the running kernel by the insmod program and can be unlinked by the rmmod program.

As a programmer, you know that an application can call functions it doesn't define: the linking stage resolves external references using the appropriate library of functions. printf is one of those callable functions and is defined in libc. A module, on the other hand, is linked only to the kernel, and the only functions it can call are the ones exported by the kernel; there are no libraries to link to. The printk function used in hello.c earlier, for example, is the version of printf defined within the kernel and exported to modules. It behaves similarly to the original function, with a few minor differences, the main one being lack of floating-point support.


Interested readers may want to look at how the kernel supports insmod: it relies on a system call defined in kernel/module.c. The function sys_init_module allocates kernel memory to hold a module (this memory is allocated with vmalloc; see Section 8.4 in Chapter 2); it then copies the module text into that memory region, resolves kernel references in the module via the kernel symbol table, and calls the module's initialization function to get everything going.

If you actually look in the kernel source, you'll find that the names of the system calls are prefixed with sys_. This is true for all system calls and no other functions; it's useful to keep this in mind when grepping for the system calls in the sources.

The modprobe utility is worth a quick mention. modprobe, like insmod, loads a module into the kernel. It differs in that it will look at the module to be loaded to see whether it references any symbols that are not currently defined in the kernel. If any such references are found, modprobe looks for other modules in the current module search path that define the relevant symbols. When modprobe finds those modules (which are needed by the module being loaded), it loads them into the kernel as well. If you use insmod in this situation instead, the command fails with an "unresolved symbols" message left in the system logfile.

The lsmod program produces a list of the modules currently loaded in the kernel. Some other information, such as any other modules making use of a specific module, is also provided. lsmod works by reading the /proc/modules virtual file. Information on currently loaded modules can also be found in the sysfs virtual filesystem under /sys/module.

2.4.3. Version Dependency

Bear in mind that your module's code has to be recompiled for each version of the kernel that it is linked to—at least, in the absence of modversions, not covered here as they are more for distribution makers than developers. Modules are strongly tied to the data structures and function prototypes defined in a particular kernel version; the interface seen by a module can change significantly from one kernel version to the next. This is especially true of development kernels, of course.

The kernel does not just assume that a given module has been built against the proper kernel version. One of the steps in the build process is to link your module against a file (called vermagic.o) from the current kernel tree; this object contains a fair amount of information about the kernel the module was built for, including the target kernel version, compiler version, and the settings of a number of important configuration variables. When an attempt is made to load a module, this information can be tested for compatibility with the running kernel. If things don't match, the module is not loaded; instead, you see something like:

# insmod hello.ko
Error inserting './hello.ko': -1 Invalid module format


A look in the system log file (/var/log/messages or whatever your system is configured to use) will reveal the specific problem that caused the module to fail to load.

If you need to compile a module for a specific kernel version, you will need to use the build system and source tree for that particular version. A simple change to the KERNELDIR variable in the example makefile shown previously does the trick.

Kernel interfaces often change between releases. If you are writing a module that is intended to work with multiple versions of the kernel (especially if it must work across major releases), you likely have to make use of macros and #ifdef constructs to make your code build properly. This edition of this book only concerns itself with one major version of the kernel, so you do not often see version tests in our example code. But the need for them does occasionally arise. In such cases, you want to make use of the definitions found in linux/version.h.

When using stacked modules, it is helpful to be aware of the modprobe utility. As we described earlier, modprobe functions in much the same way as insmod, but it also loads any other modules that are required by the module you want to load. Thus, one modprobe command can sometimes replace several invocations of insmod (although you'll still need insmod when loading your own modules from the current directory, because modprobe looks only in the standard installed module directories).

The Linux kernel header files provide a convenient way to manage the visibility of your symbols, thus reducing namespace pollution (filling the namespace with names that may conflict with those defined elsewhere in the kernel) and promoting proper information hiding. If your module needs to export symbols for other modules to use, the following macros should be used.

EXPORT_SYMBOL(name);
EXPORT_SYMBOL_GPL(name);

Most kernel code ends up including a fairly large number of header files to get definitions of functions, data types, and variables. We'll examine these files as we come to them, but there are a few that are specific to modules, and must appear in every loadable module. Thus, just about all module code has the following:

#include <linux/module.h>
#include <linux/init.h>


module.h contains a great many definitions of symbols and functions needed by loadable modules. You need init.h to specify your initialization and cleanup functions, as we saw in the "hello world" example above, and which we revisit in the next section. Most modules also include moduleparam.h to enable the passing of parameters to the module at load time.
