Discussion:
Webserver PHP runs C++ program
Frederick Gotham
2020-09-24 09:50:07 UTC
I've recently started doing web GUI programming.

On the web server, I have a PHP script that uses the "exec" function to run my C++ program.

My C++ program performs two HTTPS requests, and depending on the data it gets back, it might perform 2 or 3 more HTTPS requests. My program then prints HTML code to stdout. The PHP script takes this HTML and throws it up on the end user's screen as a webpage.

My C++ program could fall down in several ways. Any of the HTTPS requests could fail, or return partial (or corrupt) data. There could be an uncaught exception from the networking code, or a segfault in a 3rd party library. It could fail in lots of ways.

My C++ code at the moment is quite clean, and I don't want to litter it with error-handling code.

One thing I could do is throw an "std::runtime_error" whenever anything goes wrong, then let these exceptions propagate up to 'main', and then in 'main' just restart the whole program.
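For illustration, a rough sketch of that approach (GenerateHtml here is just a placeholder for the part that does the HTTPS work; its body simulates a failure so the sketch is self-contained):

// Sketch of the "let exceptions bubble up to main and retry" idea.
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>

std::string GenerateHtml()    // placeholder for the real HTTPS + HTML code
{
    throw std::runtime_error("HTTPS request returned corrupt data");
}

int main()
{
    for (int attempt = 1; attempt <= 5; ++attempt)
    {
        try
        {
            std::cout << GenerateHtml();    // success: HTML goes to stdout for the PHP script
            return EXIT_SUCCESS;
        }
        catch (std::exception const &e)
        {
            std::cerr << "Attempt " << attempt << " failed: " << e.what() << '\n';
        }
    }
    return EXIT_FAILURE;    // five failures in a row: give up
}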

Another option would be to kill my program with "std::exit(EXIT_FAILURE)" when anything goes wrong. Then I would have a Linux shell script that restarts my program. The rationale of the Linux shell script would be:
"Run the C++ program and check that it returns EXIT_SUCCESS. If it doesn't return EXIT_SUCCESS, then try to restart it. If it fails 5 times in a row, stop trying."

I would also make it a little more complicated:
"Put a time limit of 4 seconds on the C++ program -- if it runs past that limit then kill it and start it again (up to a max of 5 times)".

A simple Linux script to constantly restart a program if it fails looks like this:

#!/bin/sh
until my_program; do
    echo "Program 'my_program' crashed with exit code $?. Respawning.." >&2
    sleep 1
done

So then, to try up to 5 times, I could do:

#!/bin/bash    # bash rather than sh, because {1..5} is a bash feature
succeeded=0

for i in {1..5}
do
    output=$(./myprogram)
    status=$?

    if [ "${status}" -eq 0 ]; then
        printf '%s' "${output}"    # This prints the HTML to stdout
        succeeded=1
        break
    fi

    sleep 1
done

if [ "${succeeded}" -eq 0 ]; then
    printf '%s' "<h2>Error</h2>"
    exit 1
fi

And then finally, to give it a max time of 4 seconds, use the "timeout" program, which kills the process and exits with a non-zero status when the limit is hit (124 with its default signal, 137 when it's told to send SIGKILL as below):

#!/bin/bash    # bash rather than sh, because {1..5} is a bash feature
succeeded=0

for i in {1..5}
do
    output=$(timeout --signal SIGKILL 4 ./myprogram)
    status=$?

    if [ "${status}" -eq 0 ]; then
        printf '%s' "${output}"    # This prints the HTML to stdout
        succeeded=1
        break
    fi

    sleep 1
done

if [ "${succeeded}" -eq 0 ]; then
    printf '%s' "<h2>Error</h2>"
    exit 1
fi

And so then in my C++ program, I'd have:

inline void exitfail(void) { std::exit(EXIT_FAILURE); }    // requires <cstdlib>

And then in my C++ program if I'm parsing the HTML I get back, and something's wrong:

string const html = PerformHTTPSrequest(. . .);

size_t const i = html.rfind("<diameter>");

if ( string::npos == i ) exitfail();

So this way, if my C++ program fails in any way, an entire new process is spawned to try again (which might be the right thing to do if it's a runtime error, for example one to do with loading a shared library).

Any thoughts or advice on this?
James K. Lowden
2020-09-24 15:14:22 UTC
On Thu, 24 Sep 2020 02:50:07 -0700 (PDT)
Post by Frederick Gotham
My C++ code at the moment is quite clean, and I don't want to litter
it with error-handling code.
80% of any reliable program is error handling. A program is easier to
*read* without error handling, but will fail in mysterious ways, far
from the source of trouble.

Without error handling, your code is not "clean"; it is merely simple.
Post by Frederick Gotham
"Run the C++ program and check that it returns EXIT_SUCCESS. If
it doesn't return EXIT_SUCCESS, then try to restart it. If it fails 5
times in a row, stop trying."
Why 5? Why not 2 or 15? Or better: zero.

You're incorrectly assuming that all failures should be treated the
same and that every failure deserves a retry. But if the file is not
there, no amount of trying to open it will succeed. If the network is
down, only by chance might it come back up in time for the Nth retry.

What you really want to do is tell the user what happened, and either
give him a chance to fix it, or die with an anguished message about the
futility of it all.
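To make the distinction concrete, here is a sketch (the names are mine, not anyone's production code) of deciding per error whether a retry can possibly help:

// Sketch: classify errors instead of blindly retrying everything.
#include <cerrno>

bool retry_may_help(int err)
{
    switch (err)
    {
    case EINTR:          // interrupted by a signal
    case EAGAIN:         // resource temporarily unavailable
    case ETIMEDOUT:      // network timed out
    case ECONNRESET:     // connection dropped mid-transfer
        return true;     // transient: a retry has a chance
    case ENOENT:         // the file simply is not there
    case EACCES:         // we are not allowed to touch it
    default:
        return false;    // permanent: retrying fails the same way every time
    }
}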

Broadly, there are two kinds of errors: logic errors and runtime
errors.

Logic errors are the programmer's fault, allowing things to happen that
"can't happen". No amount of user action can repair what can only be
fixed by changing the program. Such errors are usually fatal.

Runtime errors are external failures or pilot error. A file may be
missing, a wrong key pressed, a misconfiguration. Some are
fatal -- say, because the program can't parse its configuration file --
and others are recoverable. Part of the art of programming is
distinguishing between the two, and making the user's path to recovery
as painless as possible.

It all comes back to the UI. Every program has a layer that "talks to"
the user. Every recoverable problem has a meaning in that context.
The lower levels' job is to report the status back to the UI, where it
can be interpreted in the user's context.

MS-DOS had an error-handling system that violated that principle. If a
system call failed, it would print a message on the screen that later
became a joke: "a)bort, r)etry, or i)gnore?" Because the message was
produced by the OS at the point of failure, any full-screen application
(which was most of them) risked having its pretty screen corrupted by
the message. Worse, it was often unclear (in the user's context) what
had failed and which action should be taken. One eventually learned
that "retry" only worked reliably if some positive action was taken,
such as inserting the right floppy disk, or turning the printer on.
Ignoring the error only led to more errors. Usually, the only real
answer was "abort". Frequently, the next step was to reach for the Big
Red Switch and reboot the computer.

I hope you see the parallel to your own question. Decades later,
you're asking whether the supervisor should blindly press "r" 5 times
for every error. We know from experience that won't turn out well!

Happy hacking.

--jkl
Frederick Gotham
2020-09-28 13:31:57 UTC
Post by James K. Lowden
Post by Frederick Gotham
My C++ code at the moment is quite clean, and I don't want to litter
it with error-handling code.
80% of any reliable program is error handling.
I don't agree with this figure. I have open-source programs of my own, and in my day job I am employed to develop firmware 37 hours a week. As with anything in code (e.g. C++, JavaScript), duplication should be avoided -- including in error handling.

If a program has 80% error-handling code then I would be very sceptical, unless the code is doing something that is _extremely_ error-prone (for example, generating an analogue signal from a source that is bit-banging digital samples, where the two ends have independent clock sources).

A robust approach to error-handling is to avoid the pursuit of perfection, and instead to make the best of a bad situation. Let me give an example:
Let's say there's a command line program that takes two command line arguments as follows:

./some_program first_arg second_arg

And now let's say that this program, when it succeeds, takes at most 5 milliseconds to run. (If it takes more than 5ms then it invariably fails).

So we can use the Linux "timeout" program to discard the processes that last longer than 5ms:

timeout -s SIGKILL 0.005 ./some_program first_arg second_arg    # GNU timeout takes the limit in seconds, so 0.005 is 5ms

Furthermore from testing we know that if the program takes less than 5ms to run, then it succeeds 45% of the time.

Lastly, let's say that this program fails in such an unpredictable fashion that it is extremely unlikely (less than 1 in a million chance) that two erroneous runs will give the same output.

So here's how we turn our 5-millisecond program that succeeds 45% of the time into a 20-millisecond program that succeeds 99.9% of the time:

(1) Discard any process lasting longer than 5ms.
(2) Take the md5 sum of an output that might be correct. Run the program again and take the md5 sum of that second output.
(3) If the two md5 sums match, then take the output to be correct (a rough sketch of this follows below).
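A rough sketch of steps (2) and (3) -- here I compare the two captured outputs directly, which has the same effect as comparing their md5 sums; the command line and the number of rounds are placeholders:

// Sketch: accept the output only once two independent runs agree.
#include <array>
#include <cstdio>      // popen/pclose are POSIX, declared here on Linux
#include <optional>
#include <string>

std::optional<std::string> run_once(char const *cmd)
{
    FILE *p = popen(cmd, "r");
    if ( !p ) return std::nullopt;

    std::string out;
    std::array<char, 4096> buf;
    std::size_t n;
    while ( (n = std::fread(buf.data(), 1, buf.size(), p)) > 0 )
        out.append(buf.data(), n);

    if ( pclose(p) != 0 ) return std::nullopt;    // non-zero exit: discard this run
    return out;
}

std::optional<std::string> run_until_two_agree(char const *cmd, int max_rounds)
{
    for ( int round = 0; round < max_rounds; ++round )
    {
        auto const a = run_once(cmd);
        auto const b = run_once(cmd);
        if ( a && b && *a == *b ) return a;    // two independent runs agree
    }
    return std::nullopt;                       // never saw two matching outputs
}

// e.g. run_until_two_agree("timeout -s SIGKILL 0.005 ./some_program first_arg second_arg", 5);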

It's nice to be able to code something knowing that it will work every time, but it's also important to be able to write and use programs that will fail frequently.

In my day job writing firmware, the device has to be able to recover very quickly from all kinds of failure -- and the most reliable way to do that is to kill the process and restart it.
Rainer Weikusat
2020-09-28 14:13:52 UTC
Post by Frederick Gotham
Post by James K. Lowden
Post by Frederick Gotham
My C++ code at the moment is quite clean, and I don't want to litter
it with error-handling code.
80% of any reliable program is error handling.
I don't agree with this figure. I have open-source programs of my own,
and in my day job I am employed to develop firmware 37 hours a
week. Like anything in programming code (e.g. C++, Javascript),
duplication should be avoided -- including in error-handling.
This presumably refers to handling system call errors: usually, the code depends on the actions initiated by system calls actually being performed. Hence, system calls shouldn't be allowed to fail silently.
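One common way to do that without cluttering every call site is a small checking wrapper, eg (a sketch, the names are mine):

// Sketch: a system call can't fail silently if every call goes through this.
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

long checked(long rc, char const *what)
{
    if (rc == -1)
        throw std::runtime_error(std::string(what) + ": " + std::strerror(errno));
    return rc;
}

// Usage, eg:
//   int fd = checked(open(path, O_RDONLY), "open");    // needs <fcntl.h>
//   checked(write(fd, buf, len), "write");             // needs <unistd.h>
//   checked(close(fd), "close");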

[...]
Post by Frederick Gotham
A robust approach to error-handling is to avoid the pursuit of
perfection, and instead to make the best of a bad situation. Let
me give an example: Let's say there's a command line program that
That's presumably the same "robust approach" which is responsible for
all these everyday computer systems which keep falling over causing more
or less damage to their users, eg, real example, an ATM suddenly
shutting itself down with a user's bank card inside (happened to me a
couple of weeks ago).
Post by Frederick Gotham
./some_program first_arg second_arg
And now let's say that this program, when it succeeds, takes at most 5
milliseconds to run. (If it takes more than 5ms then it invariably
fails).
This is impossible to know on a general purpose, multi-programming
computer system running an unknown set of applications. For a very
simplistic example, someone could have stopped the program with the
intent to continue it later.

And if "the internet" is involved, things start to become really
variable.
Post by Frederick Gotham
So we can use the Linux "timeout" program to discard the processes
timeout -s SIGKILL 5ms ./some_program first_arg second_arg
Furthermore from testing we know that if the program takes less than
5ms to run, then it succeeds 45% of the time.
Translated into plain English, this means "from testing, we know that we
have absolutely no idea what the code is doing in practice but we've
managed to convince ourselves that 'fails at least 55% of the time' is
good enough for users" (whose time and assets can be wasted freely as
our employer doesn't have to pay for that!).
t***@gmail.com
2020-09-28 17:16:10 UTC
Post by Rainer Weikusat
That's presumably the same "robust approach" which is responsible for
all these everyday computer systems which keep falling over causing more
or less damage to their users, eg, real example, an ATM suddenly
shutting itself down with a user's bank card inside (happened to me a
couple of weeks ago).
I've seen ATM's spontaneously reboot like that and swallow a card.

The way to deal with things like this is:
If the process fails or if the watchdog timer runs out, or if the entire device loses power and regains power, just release anything that might need to be given back (e.g. an ATM card or a fuel rod).
Kaz Kylheku
2020-09-28 18:09:53 UTC
Post by t***@gmail.com
Post by Rainer Weikusat
That's presumably the same "robust approach" which is responsible for
all these everyday computer systems which keep falling over causing more
or less damage to their users, eg, real example, an ATM suddenly
shutting itself down with a user's bank card inside (happened to me a
couple of weeks ago).
I've seen ATM's spontaneously reboot like that and swallow a card.
I suspect it retains the card deliberately. After booting up, the
system has no idea whether the cardholder is still present.

It can't just hand out the card to any random passerby.

To be able to return the card, the system needs an authenticated session
which has not timed out.

We can make systems that return objects, such as compact discs, even if
power cycled unexpectedly.
--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1
Frederick Gotham
2020-09-28 20:56:13 UTC
Post by Kaz Kylheku
It can't just hand out the card to any random passerby.
To be able to return the card, the system needs an authenticated session
which has not timed out.
My proposal for a solution:

Have a totally separate 8-bit microcontroller, for example Microchip PIC12LF1552, that behaves like a watchdog. On the main computer, use a GPIO pin to set one of the PIC's pins high every second to restart the timer. If the PIC goes 8 seconds without being kicked, the PIC immediately ejects the bank card.

So if the main computer freezes or spontaneously reboots, the PIC will shoot the card out within 8 seconds. However if there's a power cut and the PIC power cycles, then it does _not_ eject the card (because some time may have passed and now somebody else is standing at the ATM).
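The main computer's side of this is just a trivial kick loop. A sketch, assuming the pin is exposed through the sysfs GPIO interface ("gpio17" is a made-up pin number):

// Sketch: pulse the watchdog pin once a second so the PIC keeps its timer reset.
#include <chrono>
#include <fstream>
#include <thread>

int main()
{
    char const *const pin_path = "/sys/class/gpio/gpio17/value";

    for (;;)
    {
        { std::ofstream pin(pin_path); pin << 1; }    // raise the line: PIC restarts its 8-second timer
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        { std::ofstream pin(pin_path); pin << 0; }    // drop it again so the next kick is a fresh pulse
        std::this_thread::sleep_for(std::chrono::milliseconds(900));
    }
}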
b***@nuttyella.co.uk
2020-09-29 08:17:32 UTC
On Mon, 28 Sep 2020 13:56:13 -0700 (PDT)
Post by Kaz Kylheku
It can't just hand out the card to any random passerby.
To be able to return the card, the system needs an authenticated session
which has not timed out.
Have a totally separate 8-bit microcontroller, for example Microchip PIC12LF1552, that behaves like a watchdog. On the main computer, use a GPIO pin to set one of the PIC's pins high every second to restart the timer. If the PIC goes 8 seconds without being kicked, the PIC immediately ejects the bank card.
I suspect all cash machines, just like servers, already have some kind of
separate lights-out CPU sitting on the motherboard which could do this.
Rainer Weikusat
2020-09-28 21:07:49 UTC
Post by Kaz Kylheku
Post by t***@gmail.com
Post by Rainer Weikusat
That's presumably the same "robust approach" which is responsible for
all these everyday computer systems which keep falling over causing more
or less damage to their users, eg, real example, an ATM suddenly
shutting itself down with a user's bank card inside (happened to me a
couple of weeks ago).
I've seen ATM's spontaneously reboot like that and swallow a card.
I suspect it retains the card deliberately. After booting up, the
system has no idea whether the cardholder is still present.
It can't just hand out the card to any random passerby.
Based on the same logic, the system shouldn't ever hand out money or
return a card because it cannot possibly know if the card holder hasn't
meanwhile run away for some reason :->.

In this case, it just initiated what appeared to be a regular Windows
shutdown after the amount was entered and came back up with a message
stating that it would be out of order now --- a classic case of "Shit
happened. You lost."
Jorgen Grahn
2020-09-30 19:01:46 UTC
Post by Kaz Kylheku
Post by t***@gmail.com
Post by Rainer Weikusat
That's presumably the same "robust approach" which is responsible for
all these everyday computer systems which keep falling over causing more
or less damage to their users, eg, real example, an ATM suddenly
shutting itself down with a user's bank card inside (happened to me a
couple of weeks ago).
I've seen ATM's spontaneously reboot like that and swallow a card.
I suspect it retains the card deliberately. After booting up, the
system has no idea whether the cardholder is still present.
If it trusts its clock and can tell it was down only $short_time,
releasing the card would make everyone happier.

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Bit Twister
2020-09-28 14:44:56 UTC
Post by Frederick Gotham
Post by James K. Lowden
Post by Frederick Gotham
My C++ code at the moment is quite clean, and I don't want to litter
it with error-handling code.
Christ on a cracker. How hard would it be to have a check_status function
after any operation that returns a status to be checked?
Code would be no dirtier.
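Something along these lines would do (a sketch; the macro name and message format are mine):

// Sketch: one-line status checking that records exactly where the failure happened.
#include <cstdio>
#include <cstdlib>

#define CHECK_STATUS(expr)                                              \
    do {                                                                \
        int check_status_rc_ = (expr);                                  \
        if (check_status_rc_ != 0) {                                    \
            std::fprintf(stderr, "%s:%d: '%s' failed with status %d\n", \
                         __FILE__, __LINE__, #expr, check_status_rc_);  \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Usage, eg:
//   CHECK_STATUS(do_https_request(url, &reply));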
Post by Frederick Gotham
Post by James K. Lowden
80% of any reliable program is error handling.
I don't agree with this figure. I have open-source programs of my own, and in my day job I am employed to develop firmware 37 hours a week. Like anything in programming code (e.g. C++, Javascript), duplication should be avoided -- including in error-handling.
If a program has 80% error-handling code then I would be very sceptical unless the code is doing something that is _extremely_ error-prone (for example generating an analogue signal from a source that is big-banging digital samples and which have independent clock sources).
A robust approach to error-handling is to avoid the pursuit of perfection, and instead to make the best of a bad situation.
Sounds like bullshit to me and maybe the methodology of some OS that is
to remain nameless.

As a maintenance programmer I took over an application. Got a call about
2am Sunday from the computer operator that "my app's cron job" failed.

Drove into work, quick glance at the log, a few tests later, determined it
was a network problem. Told the computer operator to call network services.

Spent Monday expanding the batch job by ~97% to make it identify all the
possible system failure points and provide the operator with instructions
on who to contact about a system failure. It was quite rewarding every
time I saw an email about a failure outside my control.

I also found that setting strict compile/link/runtime controls pointed out
syntax/coding errors. Testing all return status/code values with good
error messages greatly improved fault isolation/locating time.

Extensive testing of user land input with detailed reason for input
rejection reduced time spent on the phone helping users.
Rainer Weikusat
2020-09-24 16:14:40 UTC
Frederick Gotham <***@gmail.com> writes:

[...]
Post by Frederick Gotham
My C++ program performs two HTTPS requests, and depending on the data
it gets back, it might perform 2 or 3 more HTTPS requests. My program
then prints HTML code to stdout. The PHP script takes this HTML and
throws it up on the end user's screen as a webpage.
[...]
Post by Frederick Gotham
"Put a time limit of 4 seconds on the C++ program -- if it runs
into 5 seconds then kill it and start it again (up to a max of 5
times)".
While the principle is generally sensible, I consider this (very short)
timeout a really bad idea. Stuff sometimes takes longer on the internet,
and software should be capable of handling that. If this is somehow
interactive, a way for the user to abort if he doesn't want to wait any
longer would IMO make more sense.

Otherwise, the program should implement the timeout in a sensible way, ie,
abort an HTTPS request after not receiving any data for a "long" time (I
would make this at least a minute).
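If the HTTPS requests are made with libcurl (an assumption on my part, the original post doesn't say what it uses), that kind of timeout maps directly onto existing options, eg:

// Sketch: abort only when no data has arrived for a long time,
// instead of capping the total run time of the program.
#include <curl/curl.h>

void configure_timeouts(CURL *handle)
{
    curl_easy_setopt(handle, CURLOPT_CONNECTTIMEOUT, 30L);     // give up if we cannot even connect
    curl_easy_setopt(handle, CURLOPT_LOW_SPEED_LIMIT, 1L);     // "no data" means under 1 byte/second...
    curl_easy_setopt(handle, CURLOPT_LOW_SPEED_TIME, 60L);     // ...sustained for 60 seconds
}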
Lew Pitcher
2020-09-25 14:54:57 UTC
Post by Frederick Gotham
I've recently started doing web GUI programming.
On the web server, I have a PHP script that uses the "exec" function to run my C++ program.
[snip]
Post by Frederick Gotham
So this way, if my C++ program fails in any way, an entire new process
is spawned to try again (which might be the right thing to do if it's a
runtime error for example to do with loading a shared library).
Any thoughts or advice on this?
I'd look at this from the web-application user's point of view.

As a user, I visit a web page, possibly fill out a form, and click
"Submit" (or do something else that kicks off your program). I expect the
web page to change (even if it is just a "confirmation" message), or some
other recognizable activity to occur (such as an email to be sent, etc)
as a result.

As that user, I have a problem if
- the web page does not respond in a timely manner, or
- the web page does not perform the action requested, or
- the web page provides a response that is not relevant to any activity
I can perform

Let me explain:

If the web page takes "too long" (I believe the UX people measure this in
fractions of seconds), then, as a user, I'm likely to either retry my
action, or abandon it completely. Your program /cannot/ wait too long to
respond; it must complete and report back (either success /or/ failure)
in no more than a few seconds.

If the web page doesn't visibly perform the action requested (update a
database, send an email, play a video, etc.), then, as a user, I'm again
likely to either retry or abandon my action. Your program /must/ report
back with timely, relevant information.

If the web page (or your program below it) cannot, for any reason,
perform the requested action, it should /only/ report (to me, the user)
that the action cannot be performed, give a short nontechnical
explanation as to /why/ the action cannot be performed, and (if possible)
give instructions as to what /I, the user/ can do about it. The web page
should /not/ report /to me/ such irrelevant things as which line of code
failed, what the network error message was that the program received, or
any other /technical/ stuff. The page should still generate all that
stuff, but report it to YOU, the technical support (say, in an email or a
dump file), not me.

What does all this mean to your C++ program?

Don't retry for too long; think total elapsed time in milliseconds, not
seconds.

Don't just hang on failure; report /something/ to the user.

Don't present obscure error messages on the web page; send them in an
email to the appropriate tech support contact.
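In your setup (PHP exec'ing the program), one simple way to keep those two
audiences apart is to put them on separate streams: friendly HTML on stdout
for the page, technical detail on stderr for the wrapper script to log or
mail. A sketch, with placeholder wording:

// Sketch: the visitor sees stdout; tech support sees whatever lands on stderr.
#include <cstdlib>
#include <iostream>

[[noreturn]] void fail(char const *user_message, char const *technical_detail)
{
    std::cout << "<h2>Sorry</h2><p>" << user_message << "</p>\n";    // for the web page
    std::cerr << technical_detail << '\n';                           // for the support log/email
    std::exit(EXIT_FAILURE);
}

// eg: fail("We could not fetch your data. Please try again in a minute.",
//          "GET https://example.invalid/api returned HTTP 502 at step 3");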

Just my 2 cents worth.

HTH
--
Lew Pitcher
"In Skills, We Trust"
Rainer Weikusat
2020-09-25 20:13:34 UTC
Post by Lew Pitcher
Post by Frederick Gotham
I've recently started doing web GUI programming.
On the web server, I have a PHP script that uses the "exec" function to
run my C++ program.
[snip]
Post by Frederick Gotham
So this way, if my C++ program fails in any way, an entire new process
is spawned to try again (which might be the right thing to do if it's a
runtime error for example to do with loading a shared library).
Any thoughts or advice on this?
I'd look at this from the web-application user's point of view.
As a user, I visit a web page, possibly fill out a form, and click
"Submit" (or do something else that kicks off your program). I expect the
web page to change (even if it is just a "confirmation" message), or some
other recognizable activity to occur (such as an email to be sent, etc)
as a result.
As that user, I have a problem if
- the web page does not respond in a timely manner, or
- the web page does not perform the action requested, or
- the web page provides a response that is not relevant to any activity
I can perform
If the web page takes "too long" (I believe the UX people measure this in
fractions of seconds), then, as a user, I'm likely to either retry my
action, or abandon it completely. Your program /cannot/ wait too long to
respond; it must complete and report back (either success /or/ failure)
in no more than a few seconds.
"The UX people" don't understand the difference between local and
distributed applications because that's "something technical" :->. A
networked application cannot "report success or failure" "in a few
seconds" because that's not how stuff works on the internet. The only
thing such an application is guaranteed to be able to do ("within a few
seconds") is to tell the user that it refuses to perform the requested
action because "the UX people" believe "giving up quickly" is better
than "eventually succeeding".

This means a user needs some sort of feedback that work on the request
is ongoing and a way to stop processing of the request if he so
desires. Eg, assuming I'm trying to buy some item from a limited
quantity, in happy pre-corona-times, this would usually have been a
concert ticket. I absolutely do not want processing to be aborted
"after a few seconds" with a message "UX people are impatient. The
money will be booked back onto your account automatically within two
weeks. Better luck next time!".