CPU spike with system but not with popen
Frederick Gotham
2020-10-28 10:58:32 UTC
I'm part of a team that develops firmware for embedded Linux devices (arm32 and aarch64).

This week we tracked down a bug whereby "ioctl" was returning error code 11 (EAGAIN) in certain circumstances, whereas previously it was never failing.

We looked through the recent revisions to the codebase, and narrowed it down to the following:
A Linux shell script was opened with "popen", and then "pclose" was called immediately afterward ('pclose' doesn't kill the process - it waits until it's finished). I replaced these two calls with one call to "system".

The process that calls "ioctl" is a different process to the one that was using "popen/pclose" to run a script.

One of my team members theorised that the use of "system" was causing a spike in CPU/RAM usage which meant that "ioctl" came back with "Resource temporarily unavailable". My colleague said that 'system' spawns a new shell, but since we're running a script instead of a binary, I thought 'popen' would need to spawn a new shell too.

I reverted the 'system' call back to 'popen/pclose' and now everything's working fine again.

Has anyone encountered behaviour like this before?
Lew Pitcher
2020-10-28 15:23:33 UTC
Post by Frederick Gotham
I'm part of a team that develops firmware for embedded Linux devices (arm32 and aarch64).
This week we tracked down a bug whereby "ioctl" was returning error code
11 (EAGAIN) in certain circumstances, whereas previously it was never
failing.
We looked through the recent revisions to the codebase, and narrowed it
down to the following:
A Linux shell script was opened with "popen", and then "pclose" was
called immediately afterward ('pclose' doesn't kill the process - it
waits until it's finished). I replaced these two calls with one call to
"system".
The process that calls "ioctl" is a different process to the one that
was using "popen/pclose" to run a script.
One of my team members theorised that the use of "system" was causing a
spike in CPU/RAM usage which meant that "ioctl" came back with "Resource
temporarily unavailable". My colleague said that 'system' spawns a new
shell, but since we're running a script instead of a binary, I thought
'popen' would need to spawn a new shell too.
Although implementations differ, the POSIX definitions of both system(3)
and popen(3) say that each call does the equivalent of a
fork(2)
with an
execl("/bin/sh", "sh", "-c", command, (char *) 0);
in the resulting child.

There should be little difference in overhead between either call; the
system(3) version of your code /might/ consume more buffer than the
popen(3) version, as system(3) I/O is typically directed to file, while
popen(3)'s I/O is handled by the invoking process.
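
As a rough illustration of the fork(2)/execl() equivalence described above, here is a minimal sketch. The helper name run_via_shell is made up for this example, and it leaves out the signal handling that a real system(3) adds and the pipe setup that popen(3) adds; it is not how any particular libc implements either call.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not a libc function): run `command` through
 * /bin/sh, roughly as POSIX describes for system(3) and popen(3).
 * Returns the wait status of the shell, or -1 on failure. */
int run_via_shell(const char *command)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */

    if (pid == 0) {
        /* Child: replace this process image with the shell. */
        execl("/bin/sh", "sh", "-c", command, (char *)0);
        _exit(127);                     /* reached only if exec failed */
    }

    /* Parent: wait for the shell to finish, retrying if interrupted. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

Calling run_via_shell("exit 3") should give a status for which WIFEXITED() is true and WEXITSTATUS() is 3. Either way a shell is spawned, so the process-creation cost of the two calls is about the same.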
Post by Frederick Gotham
I reverted the 'system' call back to 'popen/pclose' and now everything's
working fine again.
Has anyone encountered behaviour like this before?
--
Lew Pitcher
"In Skills, We Trust"
Scott Lurndal
2020-10-28 16:57:20 UTC
Post by Lew Pitcher
Post by Frederick Gotham
I'm part of a team that develops firmware for embedded Linux devices (arm32 and aarch64).
This week we tracked down a bug whereby "ioctl" was returning error code
11 (EAGAIN) in certain circumstances, whereas previously it was never
failing.
We looked through the recent revisions to the codebase, and narrowed it
down to the following:
A Linux shell script was opened with "popen", and then "pclose" was
called immediately afterward ('pclose' doesn't kill the process - it
waits until it's finished). I replaced these two calls with one call to
"system".
The process that calls "ioctl" is a different process to the one that
was using "popen/pclose" to run a script.
And this is the key to understanding what is happening, yet you don't
provide sufficient information:

- What file type was the IOCTL issued to? Block? Character? Regular? Fifo? Pipe?
- What was the IOCTL argument? There are hundreds, each of which has unique
error conditions.
Post by Lew Pitcher
Post by Frederick Gotham
One of my team members theorised that the use of "system" was causing a
spike in CPU/RAM usage which meant that "ioctl" came back with "Resource
temporarily unavailable". My colleague said that 'system' spawns a new
shell, but since we're running a script instead of a binary, I thought
'popen' would need to spawn a new shell too.
Although implementations differ, the POSIX definitions of both system(3)
and popen(3) say that each call does the equivalent of a
fork(2)
with an
execl("/bin/sh", "sh", "-c", command, (char *) 0);
in the resulting child.
There should be little difference in overhead between either call; the
system(3) version of your code /might/ consume more buffer than the
popen(3) version, as system(3) I/O is typically directed to file, while
popen(3)'s I/O is handled by the invoking process.
Concur.
Ralf Fassel
2020-10-28 17:45:14 UTC
* Frederick Gotham <***@gmail.com>
| We looked through the recent revisions to the codebase, and narrowed it down to the following:
| A Linux shell script was opened with "popen", and then "pclose" was
| called immediately afterward ('pclose' doesn't kill the process - it
| waits until it's finished). I replaced these two calls with one call
| to "system".

One notable difference is that with system() the stdin/stdout goes to
the original stdin/stdout of the calling process, whereas with popen()
one of them (depending on the mode parameter) gets redirected to the FILE*.

If pclose() is called immediately after a "r"-popen(), any write() in
the called process might run into write-to-pipe-with-no-reader (EPIPE),
which raises SIGPIPE and terminates the process unless the signal is
caught, blocked or ignored.

cf. write(2):
    EPIPE  fd is connected to a pipe or socket whose reading end is
           closed.  When this happens the writing process will also
           receive a SIGPIPE signal.  (Thus, the write return value is
           seen only if the program catches, blocks or ignores this
           signal.)

With system(), the write will not fail, and the program continues to run.
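
To see the EPIPE case in isolation, here is a small sketch that uses a plain pipe(2) instead of popen(). The helper name write_to_readerless_pipe is made up for this example. It ignores SIGPIPE for the duration of the write so that the error code becomes visible instead of the process being killed.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: write one byte to a pipe whose read end has
 * already been closed. Returns the errno reported by write(), 0 if the
 * write unexpectedly succeeded, or -1 if the pipe could not be created. */
int write_to_readerless_pipe(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    /* Ignore SIGPIPE so the failed write returns instead of killing us. */
    void (*old_handler)(int) = signal(SIGPIPE, SIG_IGN);

    close(fds[0]);                      /* no reader is left */

    int err = 0;
    if (write(fds[1], "x", 1) < 0)
        err = errno;                    /* expected: EPIPE */

    close(fds[1]);
    if (old_handler != SIG_ERR)
        signal(SIGPIPE, old_handler);   /* restore previous disposition */
    return err;
}
```

A script started via "r"-popen() is in the position of the writer here once pclose() has closed the read end. With the default SIGPIPE disposition it would normally be terminated at its first write to stdout after that point.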

| My colleague said that 'system' spawns a new shell, but since we're
| running a script instead of a binary, I thought 'popen' would need to
| spawn a new shell too.

It does.

DESCRIPTION
The popen() function opens a process by creating a pipe, forking,
and invoking the shell.

HTH
R'
Kaz Kylheku
2020-10-28 18:46:20 UTC
Post by Frederick Gotham
I'm part of a team that develops firmware for embedded Linux devices (arm32 and aarch64).
This week we tracked down a bug whereby "ioctl" was returning error
code 11 (EAGAIN) in certain circumstances, whereas previously it was
never failing.
We looked through the recent revisions to the codebase, and narrowed
it down to the following: A Linux shell script was opened with
"popen", and then "pclose" was called immediately afterward ('pclose'
doesn't kill the process - it waits until it's finished). I replaced
these two calls with one call to "system".
The process that calls "ioctl" is a different process to the one that
was using "popen/pclose" to run a script.
One of my team members theorised that the use of "system" was causing
a spike in CPU/RAM usage which meant that "ioctl" came back with
"Resource temporarily unavailable". My colleague said that 'system'
spawns a new shell, but since we're running a script instead of a
binary, I thought 'popen' would need to spawn a new shell too.
I reverted the 'system' call back to 'popen/pclose' and now
everything's working fine again.
Has anyone encountered behaviour like this before?
There is information missing, which prevents this from being identified
as the root cause.

The alleged CPU/RAM usage has not been directly confirmed; it is only
hypothesized.

The perceived cause-and-effect link between system being used instead of
popen and the ioctl failing with EAGAIN could be a coincidence. Or there
could be a link, but through something other than system resource use.

When ioctls fail, the first thing to do is to find where in the kernel
this is happening: trace that exact ioctl command to the piece of code
in the device which is handling it. Stick a printk into that return case
and reproduce it to confirm that it is that case.

errno codes are very generic and sometimes are assigned to situations
in haphazard ways. Consider for instance that EAGAIN can be returned
by a socket to report that a non-blocking operation didn't complete,
which has nothing to do with system resources.
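
A quick way to see EAGAIN with no resource shortage involved is a non-blocking read on an empty descriptor. The sketch below uses a pipe rather than a socket to stay self-contained, and the helper name read_empty_nonblocking_pipe is made up for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read from an empty pipe whose read end is in
 * non-blocking mode. Returns the errno reported by read(), 0 if the
 * read did not fail, or -1 if the setup itself failed. */
int read_empty_nonblocking_pipe(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    /* Switch the read end to non-blocking mode. */
    int flags = fcntl(fds[0], F_GETFL);
    if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* The write end is still open and nothing has been written, so the
     * read cannot complete right now and fails instead of blocking. */
    char byte;
    int err = 0;
    if (read(fds[0], &byte, 1) < 0)
        err = errno;                    /* expected: EAGAIN */

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

The machine can be completely idle and this still reports EAGAIN, so the error code by itself says nothing about CPU or RAM pressure.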
--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1
Rainer Weikusat
2020-10-29 14:34:13 UTC
Post by Frederick Gotham
I'm part of a team that develops firmware for embedded Linux devices (arm32 and aarch64).
This week we tracked down a bug whereby "ioctl" was returning error
code 11 (EAGAIN) in certain circumstances, whereas previously it was
never failing.
EAGAIN isn't usually an error which makes any sense for an ioctl
operation as it's meant to communicate that an I/O operation on a file
descriptor switched to non-blocking mode cannot be completed at the
moment, ie, there's either no input available or no buffer space for
more output.

First, one should make sure that the EAGAIN error actually comes from
the ioctl, ie, that the call actually failed (as indicated by the return
code). errno could have been set by an entirely different call. Assuming
this has been ascertained, the next step would be (as already suggested)
to determine why this error is being returned from the corresponding
kernel code.
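
For the first step, a minimal sketch of checking the return code before trusting errno could look like this. The wrapper name checked_ioctl is made up for this example, and FIONREAD in the usage note is only a stand-in request, since the actual ioctl command hasn't been stated in this thread.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical wrapper: perform the ioctl and capture errno right away,
 * but only if the call actually failed. On success *saved_errno is set
 * to 0, so a stale errno left by an earlier call is never reported. */
int checked_ioctl(int fd, unsigned long request, void *arg, int *saved_errno)
{
    int rc = ioctl(fd, request, arg);
    *saved_errno = (rc == -1) ? errno : 0;
    return rc;
}
```

For example, checked_ioctl(-1, FIONREAD, &n, &err) returns -1 with err set to EBADF, even if errno held EAGAIN from some unrelated earlier call. If the real call site logs errno without first checking that ioctl returned -1, the reported EAGAIN may not belong to the ioctl at all.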
