[Cialug] user vs. sys cpu time
Kendall Bailey
krbailey at gmail.com
Fri Jun 29 09:35:41 CDT 2007
Just an update. I replaced some std::list<> objects with
std::deque<>. The sys cpu time dropped to almost nothing and overall
performance improved by 15%. My guess right now is that the list
operations in the inner loop were causing heap contention, which
manifested as high kernel-mode time. I haven't tracked down proof of
that theory yet. I had assumed heap contention would cause threads to
block, which would show up as idle time. However, the ptmalloc
implementation in glibc apparently creates a new heap arena if none
are available. So with 4 threads there could be up to four arenas,
and the malloc/free calls must have spent a lot of time looking for
one that was not locked. Can anyone confirm that this makes sense?
Thanks,
Kendall
On 6/23/07, Morris Dovey <mrdovey at iedu.com> wrote:
> Kendall Bailey wrote:
> | I have a math optimization app that runs for several minutes on a
> | dual proc/dual core opteron SLES 9 box. Plenty of RAM. It uses 4
> | threads to do some parallel computation to take advantage of all 4
> | cpu cores. Looking at 'top', or using 'time' to measure cpu time,
> | it's logging 30% or more sys time vs. 60-70% user time during the
> | parallel phases. I ran it with 'strace -c' and that says only 0.003
> | seconds are spent in system calls. No system calls going on at all
> | on the 4 threads. So what does the 'sys' time represent as reported
> | by top?
>