Under the Hood - Implementing system_clock and steady_clock

Measuring time and computing intervals are central to metrics, tracing, and logging. Because these calls happen so often, it is fair to worry about what they cost. This post walks through how system_clock and steady_clock are implemented, from GCC down to Linux, to see how cheap they really are.

Unveiling the Implementation of system_clock

The key point is that, on Linux, system_clock::now normally makes no syscall at all. As noted in Stack Overflow: How does one do a "zero-syscall clock_gettime" without dynamic linking?:

Call into the clock_gettime implementation in the VDSO, to use code+data exported by the kernel.

According to Wikipedia: vDSO:

vDSO (virtual dynamic shared object) is a kernel mechanism for exporting a carefully selected set of kernel space routines to user space applications so that applications can call these kernel space routines in-process, without incurring the performance penalty of a mode switch from user mode to kernel mode that is inherent when calling these same kernel space routines by means of the system call interface.

When the clock source is the TSC, clock_gettime in the vDSO reads the time with the RDTSC instruction, as explained on Stack Exchange: Should I be seeing (non-VDSO) clock_gettime() syscalls on x86_64 using HPET?:

In the vDSO, clock_gettimeofday and related functions are reliant on specific clock modes; see __arch_get_hw_counter. If the clock mode is VCLOCK_TSC, the time is read without a syscall, using RDTSC; if it's VCLOCK_PVCLOCK or VCLOCK_HVCLOCK, it's read from a specific page to retrieve the information from the hypervisor.

To check the clock mode, AWS re:Post: How do I manage the clock source for EC2 instances running Linux? suggests:

To find the currently set clock source, list the contents of the current_clocksource file:

cat /sys/devices/system/clocksource/clocksource0/current_clocksource

In my virtual machine, it shows tsc.

Félix Cloutier: RDTSC describes RDTSC:

The processor monotonically increments the time-stamp counter MSR every clock cycle and resets it to 0 whenever the processor is reset. See "Time Stamp Counter" in Chapter 18 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B, for specific details of the time stamp counter behavior.

Let's trace the system_clock::now function from GCC to Linux:

  1. system_clock::now calls clock_gettime(CLOCK_REALTIME, ...), which glibc routes to __vdso_clock_gettime.
  2. __vdso_clock_gettime calls __cvdso_clock_gettime_common.
  3. __cvdso_clock_gettime_common calls do_hres.
  4. do_hres calls __arch_get_hw_counter.
  5. Finally, __arch_get_hw_counter calls rdtsc_ordered.

system_clock vs. steady_clock: Key Differences

Both system_clock and steady_clock can be used to measure time, but at first I wasn't sure how they differ or which one to use.

The key difference between system_clock and steady_clock lies in their base times: system_clock::now calls clock_gettime with CLOCK_REALTIME, while steady_clock::now uses CLOCK_MONOTONIC. In the vDSO, do_hres uses CLOCK_REALTIME and CLOCK_MONOTONIC as indices into vd->basetime to retrieve different base timestamps. I suspect that the different indices of vd->basetime provide different base times, but I haven't verified it: __arch_get_vdso_data is internal to the vDSO rather than an exported symbol, so I can't call it directly to test this.

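system_clock takes its base time from vd->basetime[CLOCK_REALTIME], which tracks wall-clock time: it is not monotonic and can be adjusted at any moment (for example by NTP or a manual change), so steady_clock is the safer choice for measuring intervals. As noted in the C++ reference: std::chrono::system_clock: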

It may not be monotonic: on most systems, the system time can be adjusted at any moment.

Efficiency Comparison: CLOCK_THREAD_CPUTIME_ID vs. system_clock

Besides measuring real time, measuring CPU time is also useful. For example, if a step takes a lot of real time and also a lot of CPU time, it is doing heavy computation (like a long for loop). If it takes a lot of real time but little CPU time, the thread was mostly waiting, which might be due to an insufficient CPU quota.

In the implementation of __cvdso_clock_gettime_common, CLOCK_THREAD_CPUTIME_ID doesn't match any of the masks VDSO_HRES, VDSO_COARSE, or VDSO_RAW, so the function returns -1. This return value makes the caller, __cvdso_clock_gettime_data, fall back to clock_gettime_fallback, which issues a real syscall. Retrieving thread CPU time with CLOCK_THREAD_CPUTIME_ID is therefore slower than calling system_clock::now.

Benchmark

#include <chrono>
#include <ctime>
#include <iostream>

int main() {
    // 1,000,000 calls to system_clock::now (CLOCK_REALTIME).
    auto begin = std::chrono::system_clock::now();
    for (auto i = 0; i < 1000000; i++) {
        std::chrono::system_clock::now();
    }
    auto end = std::chrono::system_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << std::endl;

    // 1,000,000 calls to steady_clock::now (CLOCK_MONOTONIC).
    begin = std::chrono::system_clock::now();
    for (auto i = 0; i < 1000000; i++) {
        std::chrono::steady_clock::now();
    }
    end = std::chrono::system_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << std::endl;

    // 1,000,000 calls to clock_gettime(CLOCK_THREAD_CPUTIME_ID), which falls back to a syscall.
    begin = std::chrono::system_clock::now();
    for (auto i = 0; i < 1000000; i++) {
        timespec t;
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t);
    }
    end = std::chrono::system_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << std::endl;
    return 0;
}
# g++ test.cc -O0 -o test
# ./test
24975888
23460648
581884166

The benchmark results support the analysis:

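  • system_clock and steady_clock have similar performance (about 25 ns and 23 ns per call), while CLOCK_THREAD_CPUTIME_ID is significantly slower (about 582 ns per call) due to the syscall fallback.
  • Calling system_clock::now costs about 25 nanoseconds (24,975,888 ns over 1,000,000 calls), even in an unoptimized -O0 build, demonstrating its high efficiency.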

https://clcanny.github.io/2024/08/21/computer-science/programming-language/c++/under-the-hood-implementing-system-clock-and-steady-clock/
Author: JunBin
Published: August 21, 2024