Under the Hood - Implementing system_clock and steady_clock
Time measurement and interval calculation are crucial components in various software systems, particularly for metrics, tracing, and logging. As these operations are frequently performed, there's a legitimate concern about their potential impact on program performance. To address this concern and gain a deeper understanding of time-related functions, this blog post delves into the implementation details of system_clock and steady_clock. By exploring their underlying mechanisms, we aim to shed light on the efficiency of these time-keeping tools and alleviate worries about performance overhead.
Unveiling the Implementation of system_clock
The key point to understand is that system_clock operates with zero syscalls. As noted in Stack Overflow: How does one do a "zero-syscall clock_gettime" without dynamic linking?:
Call into the clock_gettime implementation in the VDSO, to use code+data exported by the kernel.
According to Wikipedia: vDSO:
vDSO (virtual dynamic shared object) is a kernel mechanism for exporting a carefully selected set of kernel space routines to user space applications so that applications can call these kernel space routines in-process, without incurring the performance penalty of a mode switch from user mode to kernel mode that is inherent when calling these same kernel space routines by means of the system call interface.
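To make the "in-process" part concrete, here is a minimal sketch that asks where the kernel mapped the vDSO into the current process. It assumes Linux with glibc, where getauxval and AT_SYSINFO_EHDR are available; the helper names vdso_base_address and vdso_looks_like_elf are made up for this example.

```cpp
#include <sys/auxv.h>  // getauxval, AT_SYSINFO_EHDR (Linux with glibc assumed)
#include <cstdint>

// Address at which the kernel mapped the vDSO into this process,
// or 0 if the auxiliary vector carries no such entry.
std::uintptr_t vdso_base_address() {
    return static_cast<std::uintptr_t>(getauxval(AT_SYSINFO_EHDR));
}

// True when the mapping starts with the ELF magic bytes, i.e. the vDSO
// is an ordinary-looking shared object living in our own address space.
bool vdso_looks_like_elf() {
    const std::uintptr_t base = vdso_base_address();
    if (base == 0) return false;
    const auto* p = reinterpret_cast<const unsigned char*>(base);
    return p[0] == 0x7f && p[1] == 'E' && p[2] == 'L' && p[3] == 'F';
}
```

A non-zero address means calls such as clock_gettime can be served by code that already sits in user space, which is what makes the zero-syscall path possible.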
In the vDSO, clock_gettime uses the RDTSC instruction to obtain the time, as explained on Stack Exchange: Should I be seeing (non-VDSO) clock_gettime() syscalls on x86_64 using HPET?:
In the vDSO, clock_gettimeofday and related functions are reliant on specific clock modes; see __arch_get_hw_counter. If the clock mode is VCLOCK_TSC, the time is read without a syscall, using RDTSC; if it's VCLOCK_PVCLOCK or VCLOCK_HVCLOCK, it's read from a specific page to retrieve the information from the hypervisor.
To check the clock mode, AWS re:Post: How do I manage the clock source for EC2 instances running Linux? suggests:
To find the currently set clock source, list the contents of the current_clocksource file:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
In my virtual machine, it shows tsc.
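The same check can be done from a program, for example to log the clock source at startup. This is a small sketch that reads the sysfs file quoted above; read_first_line and current_clocksource are hypothetical helper names, and an empty result only means the file could not be read (which may happen in some containers).

```cpp
#include <fstream>
#include <string>

// Reads the first line of a file; returns an empty string if the file
// is missing or unreadable.
std::string read_first_line(const std::string& path) {
    std::ifstream in(path);
    std::string line;
    if (in) std::getline(in, line);
    return line;
}

// Name of the active clock source as reported by sysfs, e.g. "tsc".
std::string current_clocksource() {
    return read_first_line(
        "/sys/devices/system/clocksource/clocksource0/current_clocksource");
}
```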
Félix Cloutier: RDTSC describes RDTSC:
The processor monotonically increments the time-stamp counter MSR every clock cycle and resets it to 0 whenever the processor is reset. See "Time Stamp Counter" in Chapter 18 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B, for specific details of the time stamp counter behavior.
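For a feel of what the instruction returns, the counter can be read directly from user space. The sketch below assumes GCC or Clang, whose x86intrin.h header provides the __rdtsc intrinsic; read_tsc and tsc_delta are names invented here, and on non-x86 targets the helper simply returns 0.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc (GCC/Clang on x86 assumed)
#endif

// Reads the time-stamp counter on x86; returns 0 on other architectures.
std::uint64_t read_tsc() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return 0;
#endif
}

// Raw tick difference between two back-to-back reads.
std::uint64_t tsc_delta() {
    const std::uint64_t first = read_tsc();
    const std::uint64_t second = read_tsc();
    return second - first;
}
```

The value is a raw tick count, not nanoseconds. Turning it into a timestamp needs the scaling data the kernel exports alongside the vDSO code, which is why applications normally go through clock_gettime instead of issuing RDTSC themselves.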
Let's trace the system_clock::now function from GCC to Linux:
- system_clock::now calls __vdso_clock_gettime.
- __vdso_clock_gettime calls __cvdso_clock_gettime_common.
- __cvdso_clock_gettime_common calls do_hres.
- do_hres calls __arch_get_hw_counter.
- Finally, __arch_get_hw_counter calls rdtsc_ordered.
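The top of this call chain can be observed from user space: if system_clock::now ends up in clock_gettime with CLOCK_REALTIME, both should report nearly the same timestamp. The following sketch assumes libstdc++ on Linux, where system_clock counts from the Unix epoch; the two function names are made up for the comparison.

```cpp
#include <chrono>
#include <cstdint>
#include <time.h>

// Nanoseconds since the Unix epoch, read via clock_gettime(CLOCK_REALTIME).
std::int64_t realtime_ns_via_clock_gettime() {
    timespec ts{};
    clock_gettime(CLOCK_REALTIME, &ts);
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// The same quantity, read via std::chrono::system_clock.
std::int64_t realtime_ns_via_system_clock() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(
               system_clock::now().time_since_epoch())
        .count();
}
```

Calling the two functions back to back should give values that differ only by the time spent between the calls.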
system_clock vs. steady_clock: Key Differences
Both system_clock and steady_clock can be used to measure time, but at first I was unsure how they differ and which one to use.
The key difference between system_clock and steady_clock lies in their base times: system_clock::now uses CLOCK_REALTIME with clock_gettime, steady_clock::now uses CLOCK_MONOTONIC. In the vDSO, do_hres uses CLOCK_REALTIME and CLOCK_MONOTONIC as indices for vd->basetime to retrieve different base timestamps. I suspect that different indices of vd->basetime provide different base times. However, since __arch_get_vdso_data is a kernel function, I can't call it directly to test this.
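Although the vDSO data page cannot be inspected directly, the effect of the two base times is visible through the public clock_gettime interface. This is a minimal sketch; clock_seconds is a hypothetical helper, and the remark that CLOCK_MONOTONIC usually starts around boot describes typical Linux behaviour rather than a guarantee.

```cpp
#include <time.h>

// Current reading of the given clock in seconds, or -1.0 on failure.
double clock_seconds(clockid_t id) {
    timespec ts{};
    if (clock_gettime(id, &ts) != 0) return -1.0;
    return static_cast<double>(ts.tv_sec) + ts.tv_nsec / 1e9;
}
```

On a typical machine clock_seconds(CLOCK_REALTIME) is a large number (seconds since 1970), while clock_seconds(CLOCK_MONOTONIC) is much smaller (seconds since an unspecified starting point, usually around boot). Both advance at the same rate; only the base differs.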
system_clock uses vd->basetime[CLOCK_REALTIME] to get its base time, which is not monotonic and can be adjusted at any moment. As noted in the C++ reference: std::chrono::system_clock:
It may not be monotonic: on most systems, the system time can be adjusted at any moment.
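In practice this means interval measurements are safer on steady_clock, since an adjustment of the system time cannot make the result negative or inflated. A small sketch of such a helper follows; time_it is a name chosen for this example, not a standard facility.

```cpp
#include <chrono>

// The standard requires steady_clock to be monotonic.
static_assert(std::chrono::steady_clock::is_steady,
              "steady_clock must never go backwards");

// Runs f once and returns how long it took on the monotonic clock.
template <typename F>
std::chrono::nanoseconds time_it(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

system_clock remains the right choice when the timestamp itself matters, for example when writing a wall-clock time into a log line.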
Efficiency Comparison: CLOCK_THREAD_CPUTIME_ID vs. system_clock
Besides measuring real time, measuring CPU time is also useful. For example, if a step takes a lot of real time and also a lot of CPU time, it indicates heavy computation (like a for loop). If it uses little CPU time, it might be due to insufficient CPU quota.
In the implementation of __cvdso_clock_gettime_common, CLOCK_THREAD_CPUTIME_ID doesn't match any of the masks VDSO_HRES, VDSO_COARSE, or VDSO_RAW, so the function returns -1. This return value makes the caller, __cvdso_clock_gettime_data, fall back to clock_gettime_fallback, which issues a syscall. As a result, retrieving thread CPU time with CLOCK_THREAD_CPUTIME_ID is slower than calling system_clock::now.
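The mask check can be modelled in user space. The sketch below is a simplified model, not the kernel code: the mask contents and the limit of 16 clock IDs reflect my reading of the kernel's generic vDSO sources and may differ between kernel versions, and VdsoPath, clock_bit and classify are names invented for this example.

```cpp
#include <time.h>

enum class VdsoPath { HighRes, Coarse, Raw, SyscallFallback };

constexpr unsigned clock_bit(int id) { return 1u << id; }

// Assumed mask contents; check the kernel headers for your version.
constexpr unsigned kVdsoHres = clock_bit(CLOCK_REALTIME) |
                               clock_bit(CLOCK_MONOTONIC) |
                               clock_bit(CLOCK_BOOTTIME) |
                               clock_bit(CLOCK_TAI);
constexpr unsigned kVdsoCoarse = clock_bit(CLOCK_REALTIME_COARSE) |
                                 clock_bit(CLOCK_MONOTONIC_COARSE);
constexpr unsigned kVdsoRaw = clock_bit(CLOCK_MONOTONIC_RAW);

// Mirrors the dispatch order described above: clocks outside all three
// masks are not handled in the vDSO and end up in a real syscall.
inline VdsoPath classify(clockid_t id) {
    if (id < 0 || id >= 16) return VdsoPath::SyscallFallback;
    const unsigned msk = clock_bit(id);
    if (msk & kVdsoHres) return VdsoPath::HighRes;
    if (msk & kVdsoCoarse) return VdsoPath::Coarse;
    if (msk & kVdsoRaw) return VdsoPath::Raw;
    return VdsoPath::SyscallFallback;
}
```

Under these assumptions classify(CLOCK_REALTIME) and classify(CLOCK_MONOTONIC) take the high-resolution path, while classify(CLOCK_THREAD_CPUTIME_ID) lands in the syscall fallback.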
Benchmark
The benchmark results support the analysis:
- system_clock and steady_clock have similar performance, while CLOCK_THREAD_CPUTIME_ID is significantly slower due to the syscall fallback.
- Calling system_clock::now costs 25 nanoseconds, demonstrating its high efficiency.
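A comparison of this kind can be reproduced with a simple loop. The following is a rough sketch and not the benchmark used for the numbers above: ns_per_call, g_sink and the three bench_ functions are illustrative names, and a serious measurement would pin the thread to a core, warm up first, and repeat the runs.

```cpp
#include <chrono>
#include <time.h>

// Written on every iteration so the compiler keeps each call.
volatile long long g_sink = 0;

// Average cost of one invocation of call, in nanoseconds.
template <typename F>
double ns_per_call(F&& call, int iterations) {
    using namespace std::chrono;
    const auto start = steady_clock::now();
    for (int i = 0; i < iterations; ++i) call();
    const auto stop = steady_clock::now();
    return duration<double, std::nano>(stop - start).count() / iterations;
}

double bench_system_clock(int n) {
    return ns_per_call([] {
        g_sink = std::chrono::system_clock::now().time_since_epoch().count();
    }, n);
}

double bench_steady_clock(int n) {
    return ns_per_call([] {
        g_sink = std::chrono::steady_clock::now().time_since_epoch().count();
    }, n);
}

double bench_thread_cputime(int n) {
    return ns_per_call([] {
        timespec ts{};
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
        g_sink = ts.tv_nsec;
    }, n);
}
```

Printing the three results for, say, one million iterations each gives per-call costs that can be compared directly. The absolute numbers depend on the CPU, the kernel, and the active clock source.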