[Cialug] user vs. sys cpu time
Kendall Bailey
krbailey at gmail.com
Fri Jun 29 09:35:41 CDT 2007
Just an update. I replaced some std::list<> objects with
std::deque<>. The sys cpu time dropped to almost nothing and overall
performance improved by 15%. My guess right now is that the list
operations in the inner loop were causing heap contention, which
manifested as high kernel-mode time. I haven't tracked down proof of
that theory yet. I had assumed heap contention would cause threads to
block, which would show up as idle time. However, the ptmalloc
implementation in glibc apparently creates a new heap arena if none
are available. So with 4 threads there could be up to four arenas,
and the malloc/free calls must have spent a lot of time looking for
one that was not locked. Can anyone confirm that this makes sense?
Thanks,
Kendall
On 6/23/07, Morris Dovey <mrdovey at iedu.com> wrote:
> Kendall Bailey wrote:
> | I have a math optimization app that runs for several minutes on a
> | dual proc/dual core opteron SLES 9 box. Plenty of RAM. It uses 4
> | threads to do some parallel computation to take advantage of all 4
> | cpu cores. Looking at 'top', or using 'time' to measure cpu time,
> | it's logging 30% or more sys time vs. 60-70% user time during the
> | parallel phases. I ran it with 'strace -c' and that says only 0.003
> | seconds are spent in system calls. No system calls going on at all
> | on the 4 threads. So what does the 'sys' time represent as reported
> | by top?
>