Discussion:
select rtns ready;read returns 0
michael potter
2004-11-16 02:00:58 UTC
I am seeing a condition in my code that causes a small looping
condition.

select returns that a pipe is ready to read, but when the pipe is read
there is no data to be read. The program returns to the select which
immediately returns that there is data to be read ...

I have added logic to detect this condition and sleep for one second.
This sleep seems to allow the data that is supposed to be in the pipe
to "catch up" with whatever select is looking at to determine that
there is data in the pipe. Adding the one second sleep probably frees
up the CPU so it can deliver the data.

I am getting ready to add some more logic to gather more information
about what I am seeing. It does not happen very often so it is a
difficult problem to track. The one second sleep has turned it into a
low priority problem too.

Can someone in the group give me some insight on what I am seeing?

Not surprisingly, it seems to happen when the CPU is busy.

I have seen this on AIX 5.x and Solaris 9.x.
--
Michael Potter
***@gmail.com
Barry Margolin
2004-11-16 03:03:16 UTC
Post by michael potter
I am seeing a condition in my code that causes a small looping
condition.
select returns that a pipe is ready to read, but when the pipe is read
there is no data to be read. The program returns to the select which
immediately returns that there is data to be read ...
When read() returns 0, it means that you have reached EOF. In the case
of a pipe, it means that the writing process has closed its end of the
pipe. There's nothing more to read, you should exit the loop.
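
A minimal sketch of a select()/read() loop that treats a return of 0 as
EOF and leaves the loop instead of going back to select(). The helper
name drain_until_eof and the 4096-byte buffer are illustrative
assumptions, not taken from the original poster's code.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until EOF.
 * Returns the total number of bytes read, or -1 on error. */
long drain_until_eof(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        /* Block until fd has data available or has reached EOF. */
        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;          /* got data, keep going */
        } else if (n == 0) {
            return total;        /* EOF: all writers have closed */
        } else if (errno != EINTR && errno != EAGAIN) {
            return -1;           /* real error */
        }
    }
}
```

Without the n == 0 branch, the loop would spin: select() keeps reporting
the descriptor as readable once EOF has been reached.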
--
Barry Margolin, ***@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
michael potter
2004-11-16 23:44:39 UTC
read will return 0 if O_NDELAY is set, but as far as I can tell it is
not set, so...

Let's assume that read is not returning zero. I would like to explain
why select and read are in a tight loop. It could be explained if the
buffer on the pipe is small so it takes many reads to get all the
data.

What is the buffer size on an unnamed pipe?
Can I increase it?

My goal is to increase efficiency by reducing the number of reads it
takes to get all the data from the pipe.

Thanks for the "listening" and responding.
Alex Fraser
2004-11-17 01:23:52 UTC
Post by michael potter
read will return 0 if O_NDELAY is set, but as far as i can tell it is
not set, so...
A read() on a pipe will return zero when all data have been read and there
are no writers, irrespective of O_NDELAY (or O_NONBLOCK). If you select() on
a pipe in this condition, it will be marked as ready for reading.
Post by michael potter
Lets assume that read is not returning zero. [...]
Is this a hypothetical situation?
Post by michael potter
What is the buffer size on an unnamed pipe?
Implementation defined, but at least 512 bytes, and probably more these days
in most cases.
Post by michael potter
Can I increase it?
I don't think so, at least not in a portable manner. Note that bigger
buffers don't always increase throughput.
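
For reference, a small sketch of how to ask the system for its pipe
limit. It assumes the "at least 512 bytes" figure refers to PIPE_BUF,
the largest write that POSIX guarantees to be atomic; the total capacity
of the pipe may be larger and has no portable query. The helper name
pipe_atomic_write_limit is hypothetical.

```c
#include <unistd.h>

/* Hypothetical helper: return the atomic-write limit (PIPE_BUF) that
 * applies to the pipe or FIFO referred to by fd. */
long pipe_atomic_write_limit(int fd)
{
    long n = fpathconf(fd, _PC_PIPE_BUF);
    if (n < 0)
        n = 512;   /* assumption: fall back to the POSIX minimum */
    return n;
}
```

A reader that passes a buffer at least this large to read() will not
split any single atomic write across two reads because of buffer size
alone.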

Alex
michael potter
2004-11-17 17:12:19 UTC
Post by Alex Fraser
Post by michael potter
read will return 0 if O_NDELAY is set, but as far as i can tell it is
not set, so...
A read() on a pipe will return zero when all data have been read and there
are no writers, irrespective of O_NDELAY (or O_NONBLOCK). If you select() on
a pipe in this condition, it will be marked as ready for reading.
I am not saying what you wrote was wrong, but here is an edited
fragment from the read man page on AIX:
------------
When attempting to read from an empty pipe:
If some process has the pipe open for writing:
If O_NDELAY is set, the read subroutine returns a value of 0.
------------
Post by Alex Fraser
Post by michael potter
Lets assume that read is not returning zero. [...]
Is this a hypothetical situation?
No, I should have been clear: assume there is a writer and there is
data and read is not returning zero...
Post by Alex Fraser
Post by michael potter
What is the buffer size on an unnamed pipe?
Implementation defined, but at least 512 bytes, and probably more these days
in most cases.
Post by michael potter
Can I increase it?
I don't think so, at least not in a portable manner. Note that bigger
buffers don't always increase throughput.
Alex
I have gotten what I want out of this thread: some idea of what to
look for while trying to explain the tight loop that I am seeing. A
large amount of data with a small buffer might explain it.

The way I detect loops in my select is to check the time every N
returns from select. If the time does not increase after N returns,
then I report a potential loop in the log file for my application.
This method has its faults, but it is low overhead.
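
A rough sketch of that check, for anyone following along. The struct
and function names are made up for illustration, and the current time
is passed in by the caller (for example from time(NULL)) rather than
read inside the function.

```c
#include <time.h>

/* Hypothetical state for the time-based loop check. */
struct loop_detector {
    unsigned count;   /* select() returns since the last clock check */
    unsigned every;   /* N: how many returns between clock checks */
    time_t   last;    /* clock value seen at the previous check */
};

void loop_detector_init(struct loop_detector *d, unsigned every, time_t now)
{
    d->count = 0;
    d->every = every;
    d->last = now;
}

/* Call once per select() return. Returns 1 when N returns have gone by
 * without the clock advancing, which suggests a possible tight loop. */
int loop_detector_tick(struct loop_detector *d, time_t now)
{
    if (++d->count < d->every)
        return 0;
    d->count = 0;
    int suspicious = (now <= d->last);
    d->last = now;
    return suspicious;
}
```

With one-second clock resolution this also fires on a legitimate burst
of N reads within the same second, which is one of the faults of the
method.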

Thanks for the discussion.
Michael Fuhr
2004-11-17 20:18:45 UTC
Here's a more complete excerpt:

When attempting to read from an empty pipe (first-in-first-out (FIFO)):

* If no process has the pipe open for writing, the read returns 0
to indicate end-of-file.

* If some process has the pipe open for writing:
o If O_NDELAY and O_NONBLOCK are clear (the default), the read
blocks until some data is written or the pipe is closed by all
processes that had opened the pipe for writing.
o If O_NDELAY is set, the read subroutine returns a value of 0.
o If O_NONBLOCK is set, the read subroutine returns a value of -1
and sets the global variable errno to EAGAIN.

AIX appears to treat O_NDELAY and O_NONBLOCK differently; on many
systems they behave the same way and one may be #define'd as the
other. People using other systems are probably accustomed to the
behavior AIX uses for O_NONBLOCK. Is there a reason you're using
O_NDELAY instead of O_NONBLOCK?
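
To illustrate the difference, here is a sketch that sets O_NONBLOCK
with fcntl() and then tells the three outcomes apart. The enum and the
function names are hypothetical; the point is only that with
O_NONBLOCK "no data yet" shows up as -1/EAGAIN, while 0 is reserved for
EOF.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

enum read_status { READ_DATA, READ_EOF, READ_WOULD_BLOCK, READ_ERROR };

/* Add O_NONBLOCK to the descriptor's existing status flags.
 * Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Perform one read() and classify the result; *n receives the raw
 * return value of read(). */
enum read_status classify_read(int fd, char *buf, size_t len, ssize_t *n)
{
    *n = read(fd, buf, len);
    if (*n > 0)
        return READ_DATA;
    if (*n == 0)
        return READ_EOF;            /* no writers left */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_WOULD_BLOCK;    /* writers exist, no data right now */
    return READ_ERROR;
}
```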

I don't know how AIX's select() behaves in the face of these settings,
but are you sure you're using it correctly? Could we see an example
of your code?
--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
Alex Fraser
2004-11-17 22:34:51 UTC
Post by Michael Fuhr
* If no process has the pipe open for writing, the read returns 0
to indicate end-of-file.
o If O_NDELAY and O_NONBLOCK are clear (the default), the read
blocks until some data is written or the pipe is closed by all
processes that had opened the pipe for writing.
o If O_NDELAY is set, the read subroutine returns a value of 0.
o If O_NONBLOCK is set, the read subroutine returns a value of -1
and sets the global variable errno to EAGAIN.
So for a pipe with O_NDELAY set, it's impossible to tell the difference
between "no writer" and "one or more writers but no data at this time"? That
doesn't seem terribly useful(!). What am I missing?

Alex
Geoff Clare
2004-11-18 14:02:26 UTC
Post by Alex Fraser
So for a pipe with O_NDELAY set, it's impossible to tell the difference
between "no writer" and "one or more writers but no data at this time"? That
doesn't seem terribly useful(!). What am I missing?
You're not missing anything. This misfeature of O_NDELAY is exactly
the reason for it being withdrawn from the UNIX standards in 1988 (XPG3)
and replaced with O_NONBLOCK.
--
Geoff Clare <***@gclare.org.uk>
Barry Margolin
2004-11-18 02:13:57 UTC
Post by Michael Fuhr
* If no process has the pipe open for writing, the read returns 0
to indicate end-of-file.
o If O_NDELAY and O_NONBLOCK are clear (the default), the read
blocks until some data is written or the pipe is closed by all
processes that had opened the pipe for writing.
o If O_NDELAY is set, the read subroutine returns a value of 0.
o If O_NONBLOCK is set, the read subroutine returns a value of -1
and sets the global variable errno to EAGAIN.
But since he's using select() to wait for data to be available in the
pipe, these last two cases shouldn't be relevant.
--
Barry Margolin, ***@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
Michael Fuhr
2004-11-18 02:48:29 UTC
Post by Barry Margolin
Post by Michael Fuhr
o If O_NDELAY is set, the read subroutine returns a value of 0.
o If O_NONBLOCK is set, the read subroutine returns a value of -1
and sets the global variable errno to EAGAIN.
But since he's using select() to wait for data to be available in the
pipe, these last two cases shouldn't be relevant.
True, assuming that he's using select() correctly and that AIX's
select() behaves as on other systems with respect to marking a
descriptor as readable. I requested that he post some code so we
can check out the first assumption before challenging the second.
--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
Alex Fraser
2004-11-17 22:34:04 UTC
[snip]
Post by michael potter
Post by Alex Fraser
Post by michael potter
Lets assume that read is not returning zero. [...]
Is this a hypothetical situation?
no, i should have been clear: assume there is a writer and there is
data and read is not returning zero...
Why assume something that can easily be established as a fact (or not)?

[snip]
Post by michael potter
The way I detect loops in my select is to check the time every N
returns from select. If the time does not increase after N returns,
then I report a potential loop in the log file for my application.
this method has its faults, but it is low overhead.
Can't you reset the "select() returns" counter every time you successfully
read/write some data on any descriptor?
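
Something along these lines, as a sketch only (the names and the idea
of a fixed limit are illustrative assumptions):

```c
/* Hypothetical counter of consecutive select() returns that moved no
 * data on any descriptor. */
struct idle_counter {
    unsigned idle;    /* consecutive returns with no progress */
    unsigned limit;   /* report a possible loop at this many */
};

/* Call once per select() return with the number of bytes read or
 * written during that iteration. Returns 1 once `limit` consecutive
 * iterations have made no progress, 0 otherwise. */
int idle_counter_update(struct idle_counter *c, long bytes_moved)
{
    if (bytes_moved > 0) {
        c->idle = 0;      /* progress was made: start counting again */
        return 0;
    }
    return ++c->idle >= c->limit;
}
```

That way a large burst of data never trips the detector, only
iterations where select() returned and nothing was transferred.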

Alex
michael potter
2004-11-18 21:11:33 UTC
Guys,

Thanks for all your comments. I think the select/read
combination is working as documented. I think I was fooled by an
unusually large amount of data being written to the pipe and the weak
nature of my loop detection algorithm.

As for posting my code: I have a wrapper around select so that it is
easier to port to other systems. Are you interested in seeing that
wrapper?
I would not mind getting feedback on it. I am a self-taught C/Unix
programmer (thank you, K&R and Richard Stevens) and would not mind a
critique.

Thanks,
potter.
Michael Fuhr
2004-11-19 06:56:37 UTC
Post by michael potter
Thanks for all your comments. I think the select/read
combination is working as documented. I think I was fooled by an
unusually large amount of data being written to the pipe and the weak
nature of my loop detection algorithm.
I wonder if the writer is closing the pipe, which, as Barry has
pointed out, would cause read() to return 0. Although O_NDELAY
causes read() to return 0 if a writer has the pipe open but there's
no data in the pipe, if you're using select() correctly then you
shouldn't encounter this case. Select() should mark the descriptor
as readable only if data is present or upon detecting EOF, so if
read() returns 0 then you've reached EOF; for a FIFO this means
that all writers have closed. With other file types you'd typically
close the descriptor because no more data will arrive, but with a
FIFO you can keep reading and eventually get new data when another
process opens the FIFO for writing. However, between the EOF and
the next writer's open(), select() will report that the descriptor
is readable and read() will return 0, which is what might be causing
your tight loop. I'm basing this analysis on experiments I performed
on Solaris 9.

I don't know of a reliable way to avoid the tight loop. You could
close and immediately re-open the FIFO upon detecting EOF, which
should cause select() to block until data is present or you reach
EOF again. Unfortunately this introduces a race condition: if a
writer opens the FIFO before the reader's close() then you could
lose data.

Can anybody find fault with my analysis or come up with a way to
avoid the tight loop that occurs after EOF and before the next
writer opens the FIFO? If not then I wonder if introducing a sleep()
might indeed be the best workaround.
--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
Barry Margolin
2004-11-17 04:43:20 UTC
Post by michael potter
Thanks for the "listening" and responding.
Then why didn't you actually read my response? I told you why read
returns 0 -- you've reached EOF.
--
Barry Margolin, ***@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***