Under the Hood - Implementing system_clock and steady_clock

Measuring time and computing intervals are central to metrics, tracing, and logging. Because these calls run so frequently, it is reasonable to worry about their impact on program performance. This blog post looks at how system_clock and steady_clock are implemented, to show what each call actually costs and to put those worries to rest.

Unveiling the Implementation of `system_clock`

The key point is that, on Linux, system_clock::now normally completes without a syscall. As noted in Stack Overflow: How does one do a "zero-syscall clock_gettime" without dynamic linking?:

Call into the clock_gettime implementation in the VDSO, to use code+data exported by the kernel.

According to Wikipedia: vDSO:

vDSO (virtual dynamic shared object) is a kernel mechanism for exporting a carefully selected set of kernel space routines to user space applications so that applications can call these kernel space routines in-process, without incurring the performance penalty of a mode switch from user mode to kernel mode that is inherent when calling these same kernel space routines by means of the system call interface.

In the vDSO, clock_gettime uses the RDTSC instruction to obtain the time, as explained on Stack Exchange: Should I be seeing (non-VDSO) clock_gettime() syscalls on x86_64 using HPET?:

In the vDSO, clock_gettimeofday and related functions are reliant on specific clock modes; see __arch_get_hw_counter. If the clock mode is VCLOCK_TSC, the time is read without a syscall, using RDTSC; if it's VCLOCK_PVCLOCK or VCLOCK_HVCLOCK, it's read from a specific page to retrieve the information from the hypervisor.

To check the clock mode, AWS re:Post: How do I manage the clock source for EC2 instances running Linux? suggests:

To find the currently set clock source, list the contents of the current_clocksource file:

cat /sys/devices/system/clocksource/clocksource0/current_clocksource

In my virtual machine, it shows tsc.

Félix Cloutier: RDTSC describes RDTSC:

The processor monotonically increments the time-stamp counter MSR every clock cycle and resets it to 0 whenever the processor is reset. See "Time Stamp Counter" in Chapter 18 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B, for specific details of the time stamp counter behavior.

Let's trace the system_clock::now function from GCC to Linux:

system_clock::now calls __vdso_clock_gettime.
__vdso_clock_gettime calls __cvdso_clock_gettime_common.
__cvdso_clock_gettime_common calls do_hres.
do_hres calls __arch_get_hw_counter.
Finally, __arch_get_hw_counter calls rdtsc_ordered.

`system_clock` vs. `steady_clock`: Key Differences

Both system_clock and steady_clock can be used to measure time. But how do they differ, and which one should you use?

The key difference between system_clock and steady_clock lies in their base times: system_clock::now calls clock_gettime with CLOCK_REALTIME, while steady_clock::now calls it with CLOCK_MONOTONIC. In the vDSO, do_hres uses the clock ID (CLOCK_REALTIME or CLOCK_MONOTONIC) as an index into vd->basetime to retrieve the base timestamp. I suspect that each index holds a different base time. However, since __arch_get_vdso_data is a kernel function, I can't call it directly to test this.

system_clock uses vd->basetime[CLOCK_REALTIME] to get its base time, which is not monotonic and can be adjusted at any moment. As noted in the C++ reference: std::chrono::system_clock:

It may not be monotonic: on most systems, the system time can be adjusted at any moment.

Efficiency Comparison: `CLOCK_THREAD_CPUTIME_ID` vs. `system_clock`

Besides measuring real (wall-clock) time, measuring CPU time is also useful. For example, if a step takes a lot of real time and also a lot of CPU time, it is doing heavy computation (such as a long for loop). If it takes a lot of real time but little CPU time, the thread spent most of that time off the CPU, which might be due to insufficient CPU quota.

In the implementation of __cvdso_clock_gettime_common, CLOCK_THREAD_CPUTIME_ID doesn't match any of the masks VDSO_HRES, VDSO_COARSE, or VDSO_RAW, so the function returns -1. This return value makes the caller, __cvdso_clock_gettime_data, fall back to clock_gettime_fallback, which issues a syscall. As a result, retrieving thread CPU time with CLOCK_THREAD_CPUTIME_ID is slower than calling system_clock::now.

Benchmark

#include <chrono>
#include <ctime>
#include <iostream>

int main() {
  using std::chrono::duration_cast;
  using std::chrono::nanoseconds;
  // Number of calls per measurement; each printed value is the total
  // elapsed time in nanoseconds for this many calls.
  constexpr int kIterations = 1000000;

  // 1. system_clock::now (CLOCK_REALTIME, served by the vDSO).
  auto begin = std::chrono::system_clock::now();
  for (auto i = 0; i < kIterations; i++) {
    std::chrono::system_clock::now();
  }
  auto end = std::chrono::system_clock::now();
  std::cout << duration_cast<nanoseconds>(end - begin).count() << std::endl;

  // 2. steady_clock::now (CLOCK_MONOTONIC, served by the vDSO).
  begin = std::chrono::system_clock::now();
  for (auto i = 0; i < kIterations; i++) {
    std::chrono::steady_clock::now();
  }
  end = std::chrono::system_clock::now();
  std::cout << duration_cast<nanoseconds>(end - begin).count() << std::endl;

  // 3. clock_gettime(CLOCK_THREAD_CPUTIME_ID), which falls back to a syscall.
  begin = std::chrono::system_clock::now();
  for (auto i = 0; i < kIterations; i++) {
    timespec t;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t);
  }
  end = std::chrono::system_clock::now();
  std::cout << duration_cast<nanoseconds>(end - begin).count() << std::endl;
  return 0;
}

# g++ test.cc -O0 -o test
# ./test
24975888
23460648
581884166

The benchmark results support the analysis:

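system_clock and steady_clock perform similarly (about 25 ns and 23 ns per call), while CLOCK_THREAD_CPUTIME_ID is significantly slower (about 582 ns per call, roughly 23 times the cost) because of the syscall fallback.
Calling system_clock::now costs about 25 nanoseconds (24975888 ns over 1,000,000 calls, loop overhead included), demonstrating its high efficiency.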

https://clcanny.github.io/2024/08/21/computer-science/programming-language/c++/under-the-hood-implementing-system-clock-and-steady-clock/

Author: JunBin

Published: August 21, 2024
