Thursday, June 12, 2014

Linux Kernel Time Calculation


Time

Linux provides several ways to obtain the time. The most frequently used one is time(), declared in time.h.
But time() returns a time_t, which holds the number of seconds since the epoch.
What if we want microsecond resolution? Use gettimeofday() instead. It fills in a struct timeval, which carries the time with microsecond resolution.
What if we want nanosecond resolution? What if we want a time interval rather than an absolute time?
Linux provides methods to get the time with nanosecond resolution, provided your hardware supports it. Linux also lets you choose the clock source used for timekeeping.
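As a minimal sketch of the two calls mentioned so far, the helpers below return the time since the epoch in seconds and in microseconds. The function names epoch_seconds and epoch_microseconds are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/time.h>   /* gettimeofday, struct timeval */
#include <time.h>       /* time, time_t */

/* Whole seconds since the epoch, as returned by time(). */
int64_t epoch_seconds(void)
{
    return (int64_t)time(NULL);
}

/* Microseconds since the epoch, built from the two timeval fields.
 * Returns -1 if gettimeofday() fails. */
int64_t epoch_microseconds(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    return (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}
```

Dividing the result of epoch_microseconds() by 1000000 gives the same value that time() would report at that instant.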

Clock Source

Linux can keep time from several hardware clock sources. The available ones are listed in /sys/devices/system/clocksource/clocksource0/available_clocksource
The clock source currently selected by the kernel can be found in /sys/devices/system/clocksource/clocksource0/current_clocksource

root@jijith-M17xR4:/home/jijith# cat /sys/devices/system/clocksource/clocksource0/available_clocksource 
tsc hpet acpi_pm 
On my machine the following clock sources are available:
  • TSC: The Time Stamp Counter is a running counter provided by the CPU, which usually counts CPU clock cycles.
  • HPET: The High Precision Event Timer is also a hardware counter that provides high-resolution timers, but it is slower to read than the TSC, so most Linux kernels prefer the TSC when the hardware provides it.
  • ACPI_PM: The ACPI Power Management Timer (or ACPI PMT) is yet another clock device, included on almost all ACPI-based motherboards. Its clock signal has a fixed frequency. It is slower to read than the HPET.
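The clocksource files can also be read from a program instead of the shell. The following is a small sketch, assuming the sysfs paths given above exist on the machine; read_first_line and print_clocksources are hypothetical helper names, not part of any library.

```c
#include <stdio.h>
#include <string.h>

#define CLOCKSOURCE_DIR "/sys/devices/system/clocksource/clocksource0/"

/* Read the first line of a text file into buf and strip the trailing
 * newline. Returns 0 on success, -1 if the file cannot be opened or read. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print the current and the available clock sources, if sysfs exposes them. */
void print_clocksources(void)
{
    char line[256];

    if (read_first_line(CLOCKSOURCE_DIR "current_clocksource", line, sizeof line) == 0)
        printf("current:   %s\n", line);
    if (read_first_line(CLOCKSOURCE_DIR "available_clocksource", line, sizeof line) == 0)
        printf("available: %s\n", line);
}
```

Reading these files does not need root; only writing to current_clocksource does.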

The Red Hat Customer Portal has a good comparison of how expensive it is to read each of the available clock sources. Here is a summary:
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
# time ./clock_timing

	real	0m0.601s
	user	0m0.592s
	sys	0m0.002s

# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
# time ./clock_timing

	real	0m12.263s
	user	0m12.197s
	sys	0m0.001s

# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm
# time ./clock_timing

	real	0m24.461s
	user	0m0.504s
	sys	0m23.776s

In these results the same program takes about 0.6 s with TSC, 12.3 s with HPET and 24.5 s with ACPI_PM, so the efficiency of generating timestamps, in descending order, is: TSC, HPET, ACPI_PM.
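The source of clock_timing is not shown in this post. A benchmark of this kind could look roughly like the sketch below, which assumes that the program simply reads the clock in a tight loop; the function name time_clock_reads is invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Call clock_gettime() `iterations` times and return the elapsed
 * time in nanoseconds, or -1 if any call fails. */
int64_t time_clock_reads(long iterations)
{
    struct timespec start, end, scratch;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;
    for (long i = 0; i < iterations; i++) {
        if (clock_gettime(CLOCK_MONOTONIC, &scratch) != 0)
            return -1;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;

    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000
         + (end.tv_nsec - start.tv_nsec);
}
```

Running something like time_clock_reads(10000000) once per clock source should show the same ordering as above; the iteration count is an arbitrary choice.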

Time Calculation

Time in Nanoseconds

clock_gettime() returns the time with nanosecond resolution (since the epoch, when CLOCK_REALTIME is used). It fills in a struct timespec, which has the following members:

struct timespec {
    time_t   tv_sec;        /* seconds */
    long     tv_nsec;       /* nanoseconds */
};
clock_gettime() also lets you specify which clock to read, identified by a clockid_t.
Possible values include:
  • CLOCK_REALTIME: The system-wide real-time clock.
  • CLOCK_MONOTONIC: A clock that cannot be set and represents monotonic time since some unspecified starting point. More like a running counter.
  • CLOCK_MONOTONIC_RAW: Similar to CLOCK_MONOTONIC, but provides access to a raw hardware-based time that is not subject to NTP adjustments.
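To check whether your system provides nanosecond resolution, call clock_getres() and look at the result. If tv_sec is 0 and tv_nsec is 1, the clock has a resolution of one nanosecond. A larger tv_nsec means a coarser clock (tv_nsec itself is always below 10^9).
Note: With glibc versions before 2.17 you have to link your application with -lrt to use the above methods. Newer versions provide them in libc itself.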
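Here is a short sketch of both calls. A resolution of one nanosecond is reported by clock_getres() as tv_sec = 0 and tv_nsec = 1. The helper names has_nanosecond_resolution and clock_now_ns are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Return 1 if the clock reports a resolution of one nanosecond,
 * 0 if it is coarser, -1 if clock_getres() fails. */
int has_nanosecond_resolution(clockid_t clk)
{
    struct timespec res;

    if (clock_getres(clk, &res) != 0)
        return -1;
    return res.tv_sec == 0 && res.tv_nsec == 1;
}

/* Read the given clock and return its value in nanoseconds,
 * or -1 if clock_gettime() fails. */
int64_t clock_now_ns(clockid_t clk)
{
    struct timespec ts;

    if (clock_gettime(clk, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}
```

For example, clock_now_ns(CLOCK_REALTIME) gives nanoseconds since the epoch, while clock_now_ns(CLOCK_MONOTONIC) gives nanoseconds since an unspecified starting point.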

Time interval

If we use CLOCK_REALTIME then we get the absolute time. It can also be used to calculate a time interval, by taking the difference of two readings. Any issues?
CLOCK_REALTIME is the system-wide real-time clock, and it can be stepped or slewed if you have an NTP daemon running (or run ntpdate periodically). If the clock is set backwards between the two readings, the computed interval can even be negative.
CLOCK_MONOTONIC_RAW solves this problem. It is a monotonically increasing count that is not subject to NTP adjustments. (CLOCK_MONOTONIC also never goes backwards, but its rate is still adjusted by NTP.)
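Measuring an interval this way could be sketched as follows. The helpers timespec_diff_ns and elapsed_ns_since are hypothetical names introduced for this example, and CLOCK_MONOTONIC_RAW is Linux-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Difference end - start in nanoseconds. The result is negative
 * if end lies before start. */
int64_t timespec_diff_ns(const struct timespec *start,
                         const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000
         + (end->tv_nsec - start->tv_nsec);
}

/* Nanoseconds elapsed since an earlier CLOCK_MONOTONIC_RAW reading,
 * or -1 if the clock cannot be read. */
int64_t elapsed_ns_since(const struct timespec *start)
{
    struct timespec now;

    if (clock_gettime(CLOCK_MONOTONIC_RAW, &now) != 0)
        return -1;
    return timespec_diff_ns(start, &now);
}
```

Take a first reading with clock_gettime(CLOCK_MONOTONIC_RAW, &start), run the code to be measured, then call elapsed_ns_since(&start). Both readings must come from the same clock for the difference to be meaningful.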

Performance

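Normally clock_gettime() is a system call, and every system call carries the overhead of switching into the kernel.
Depending on the kernel configuration, Linux can provide these methods through the vDSO (virtual dynamic shared object), a small shared library that the kernel maps into every process, so they can run in user space. If you have CONFIG_GENERIC_TIME_VSYSCALL set to y in the kernel config, then clock_gettime() will be available through the vDSO.
How can user space access the system time?
The kernel's timekeeping code calculates the time and stores it in memory that is also mapped read-only into user space. The user-space clock_gettime() reads the contents of this memory. The read has to be guarded against concurrent updates. Since user-space code cannot take a kernel spin lock, the guard is a sequence counter (seqlock): the reader retries if the counter changed while it was reading.
So basically, clock_gettime() can run in user space (if it is available through the vDSO) and read the time from shared memory that the kernel keeps up to date. With a clock source such as the TSC, the vDSO code also reads the counter itself to add the time elapsed since the last update.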
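A guarded read of this kind is commonly done with a sequence counter rather than a lock held by the reader. The sketch below only illustrates that idea with C11 atomics and is not the kernel's code: struct time_page, time_page_write and time_page_read are invented names, and the default sequentially consistent atomics are used for simplicity where a real implementation would use cheaper barriers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the time data shared between the
 * kernel (writer) and user space (reader). */
struct time_page {
    atomic_uint     seq;    /* odd while an update is in progress */
    _Atomic int64_t sec;
    _Atomic int64_t nsec;
};

/* Writer side: make the counter odd, update the fields,
 * then make the counter even again. */
void time_page_write(struct time_page *p, int64_t sec, int64_t nsec)
{
    atomic_fetch_add(&p->seq, 1);
    atomic_store(&p->sec, sec);
    atomic_store(&p->nsec, nsec);
    atomic_fetch_add(&p->seq, 1);
}

/* Reader side: never blocks the writer. It retries when the counter
 * was odd or changed, which means an update overlapped the read. */
void time_page_read(struct time_page *p, int64_t *sec, int64_t *nsec)
{
    unsigned before, after;

    do {
        before = atomic_load(&p->seq);
        *sec   = atomic_load(&p->sec);
        *nsec  = atomic_load(&p->nsec);
        after  = atomic_load(&p->seq);
    } while ((before & 1u) || before != after);
}
```

The point of this design is that readers only read shared memory, so any number of processes can fetch the time without a system call and without delaying the update.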