Post by Kaz Kylheku Post by Kenny McCormack
If I use popen(str,"r") where str is supplied by an untrusted user, what can
go wrong? Note: I'm not debating whether or not it is safe (I'm pretty
sure of the answer), but rather, I'm looking for an example of an unsafe
string (I.e., something an attacker would do).
For instance, let str = "rm -rf ~".
popen() runs arbitrary shell commands.
Executing a shell command from an untrusted source is exactly the same
thing as logging in remotely to the system via SSH using a public
terminal, and then walking away so that anyone else can use the session.
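To make that concrete without deleting anything, here is a small sketch. The helper name run_and_capture is made up for this post, and a harmless "echo" stands in for whatever the attacker would really run. The point is only that the whole string goes to the shell, so ";" and every other metacharacter does what it normally does.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper: run cmd through popen() and copy its output into
   out (at most outsz - 1 bytes).  Returns 0 on success, -1 on error. */
int run_and_capture(const char *cmd, char *out, size_t outsz)
{
    FILE *p;
    size_t n;

    if (outsz == 0)
        return -1;
    p = popen(cmd, "r");          /* equivalent to handing cmd to sh -c */
    if (p == NULL)
        return -1;
    n = fread(out, 1, outsz - 1, p);
    out[n] = '\0';
    return pclose(p) == -1 ? -1 : 0;
}
```

Calling it with an "untrusted" string such as "echo hello; echo INJECTED" runs both commands; replace the second one with "rm -rf ~" and you have the attack.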
Post by Kenny McCormack
Also, and this is related, is there a version of popen() (or some library
or something available) that is bidirectional - i.e., you can both write
and read from it - for example, you could run the Unix 'sort' utility this
way - send it some data, then read back the sorted result (*).
No. You have to "sandbox" the contents of "str" yourself before passing
it to popen.
For instance you could define your own scripting language (some safe
subset of the shell, probably). In this sandboxed language, unsafe things are
somehow impossible to write (in what ways, to be decided by your design).
You write a compiler for this language whose output is the regular shell
language, and that output is fed to popen(), system(), or to
execl("/bin/sh", "/bin/sh", "-c", str, ...) etc.
Even if that compiler outputs code that uses unsafe features of the
shell language, they are not used in unsafe ways, because the
translation preserves the safe semantics of the sandboxed language.
Here is a trivial example.
Suppose we write a validator for this language.
  expr -> expr '+' expr
       |  expr '-' expr
       |  expr '*' expr
       |  expr '/' expr
       |  '(' expr ')'
       |  ident
       |  number

  ident  := [a-zA-Z][a-zA-Z0-9]*
  number := [-+]?[0-9]+
If we validate the string to conform to this language, then it looks
like "a + 3 / 4" and whatnot.
We reject strings that don't conform.
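A validator along those lines could be sketched as a small recursive-descent recognizer. This is only a sketch under some assumptions of mine: blanks and tabs may separate tokens, ident and number are the operands of expr, a one-character identifier such as the "a" in "a + 3 / 4" is accepted, a number's sign is only '-' or '+', and parenthesis nesting is capped at a made-up MAX_DEPTH so hostile input cannot exhaust the stack. The function names are hypothetical.

```c
#include <stddef.h>

#define MAX_DEPTH 64   /* hypothetical cap on '(' nesting */

static int is_letter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

static const char *skip_ws(const char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    return s;
}

/* Each parse_* function returns a pointer just past what it matched,
   or NULL if the input does not conform. */
static const char *parse_expr(const char *s, int depth);

static const char *parse_operand(const char *s, int depth)
{
    s = skip_ws(s);
    if (*s == '(') {
        if (depth >= MAX_DEPTH)
            return NULL;
        s = parse_expr(s + 1, depth + 1);
        if (s == NULL)
            return NULL;
        s = skip_ws(s);
        return *s == ')' ? s + 1 : NULL;
    }
    if (is_letter(*s)) {                     /* ident */
        while (is_letter(*s) || is_digit(*s))
            s++;
        return s;
    }
    if (*s == '-' || *s == '+')              /* optional sign of a number */
        s++;
    if (!is_digit(*s))
        return NULL;
    while (is_digit(*s))                     /* number */
        s++;
    return s;
}

static const char *parse_expr(const char *s, int depth)
{
    s = parse_operand(s, depth);
    while (s != NULL) {
        s = skip_ws(s);
        if (*s != '+' && *s != '-' && *s != '*' && *s != '/')
            break;
        s = parse_operand(s + 1, depth);
    }
    return s;
}

/* Returns 1 if the whole string conforms to the grammar, 0 otherwise. */
int is_valid_expr(const char *str)
{
    const char *end = parse_expr(str, 0);
    return end != NULL && *end == '\0';
}
```

With this, "a + 3 / 4" is accepted, while "1; rm -rf ~" and "$(id)" are rejected because ';' and '$' are not tokens of the language.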
Then we can safely do this---almost!
snprintf(big_buffer, .... "echo $(( %s ))", str);
/* check for truncation */
FILE *pipe = popen(big_buffer, "r");
We have defined a safe arithmetic language that we can use the shell to
execute. It won't clobber anything in our host environment.
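Spelled out a little more fully, the snprintf/popen step might look like the sketch below. The function name eval_validated and the 512-byte buffer are my own choices for illustration, and the function assumes str has already passed validation; it does no checking of its own beyond the truncation test.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper: evaluates an already-validated expression via the
   shell and stores the result in *result.
   Returns 0 on success, -1 on any error. */
int eval_validated(const char *str, long *result)
{
    char cmd[512];
    FILE *pipe;
    int n, got;

    n = snprintf(cmd, sizeof cmd, "echo $(( %s ))", str);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;                 /* encoding error or truncated command */

    pipe = popen(cmd, "r");
    if (pipe == NULL)
        return -1;
    got = fscanf(pipe, "%ld", result);
    if (pclose(pipe) != 0 || got != 1)
        return -1;                 /* shell failed or printed no number */
    return 0;
}
```

The truncation check matters because a command cut off in the middle is no longer the command that was validated.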
However, it provides unfettered access to environment variables.
Suppose that the environment has a sensitive, integer-valued environment
variable SECRET_ENV_VAR. The untrusted user can supply the expression
"SECRET_ENV_VAR", which is a valid identifier in our language, and
thereby learn the value of that variable.
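Here is a small demonstration of that leak. The value 1234 and the function name leak_demo are made up; setenv() merely stands in for a secret that is already in the host program's environment.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Demonstration only: a grammar-conforming expression reads an environment
   variable.  Returns the leaked value, or -1 on error. */
long leak_demo(void)
{
    long value = -1;
    FILE *pipe;

    /* Stand-in for a secret already present in the environment. */
    if (setenv("SECRET_ENV_VAR", "1234", 1) != 0)
        return -1;

    /* The "expression" supplied by the untrusted user is just the name. */
    pipe = popen("echo $(( SECRET_ENV_VAR ))", "r");
    if (pipe == NULL)
        return -1;
    if (fscanf(pipe, "%ld", &value) != 1)
        value = -1;
    pclose(pipe);
    return value;
}
```

The child shell inherits the environment, and arithmetic expansion treats a bare name as a variable reference, so the value comes straight back over the pipe.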
Thus, suppose we take this idea further and define a more useful
language than just a calculator language. We have to guard against
leaking secrets from the environment.
One way would be namespacing. The variables in our language like ABC
or def would not translate into the same-named shell variables, but
into, say, sb_ABC and sb_def ("sb" == sandbox).
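The renaming itself is mechanical. A rough sketch, assuming the input has already been validated against the sandboxed grammar (so a letter always starts an identifier) and using a made-up function name:

```c
#include <stddef.h>
#include <string.h>

static int is_letter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Copies expr to out, prefixing every identifier with "sb_".
   Assumes expr was validated beforehand.
   Returns 0 on success, -1 if out is too small. */
int namespace_idents(const char *expr, char *out, size_t outsz)
{
    static const char prefix[] = "sb_";
    const size_t plen = sizeof prefix - 1;
    size_t o = 0;

    while (*expr != '\0') {
        if (is_letter(*expr)) {
            if (o + plen >= outsz)
                return -1;
            memcpy(out + o, prefix, plen);
            o += plen;
            while (is_letter(*expr) || is_digit(*expr)) {
                if (o + 1 >= outsz)
                    return -1;
                out[o++] = *expr++;
            }
        } else {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *expr++;
        }
    }
    if (o >= outsz)
        return -1;
    out[o] = '\0';
    return 0;
}
```

So "ABC + def * 2" becomes "sb_ABC + sb_def * 2", and a user who types SECRET_ENV_VAR only ever reaches sb_SECRET_ENV_VAR, which the host environment does not define.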
We could allow that language to have some environment manipulation.
For that we would provide some API. Only certain environment variables
would be loaded into sandboxed variables. For instance if we consider
TERM to be safe, we could pre-load sb_TERM with the value of TERM.
Likewise, we would have a carefully controlled "export" feature, which
only allows certain variables.
If ABC is an export-allowed variable, then the statement
"export ABC=42" in the sandboxed scripting language would
translate to "sb_ABC=42; export ABC=$sb_ABC". I.e. set the local
variable, and then also export the corresponding environment variable
which really has to be called ABC.
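As a sketch of that translation step: the allowlist contents, the function name, and the restriction to integer values are all assumptions of mine to keep the example short.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical allowlist of variables the sandboxed language may export. */
static const char *const export_allowed[] = { "ABC" };

/* Translates the sandboxed statement "export NAME=VALUE" into shell code.
   Returns 0 on success, -1 if NAME is not allowed or out is too small. */
int translate_export(const char *name, long value, char *out, size_t outsz)
{
    size_t i;
    int n;

    for (i = 0; i < sizeof export_allowed / sizeof export_allowed[0]; i++) {
        if (strcmp(name, export_allowed[i]) == 0) {
            n = snprintf(out, outsz, "sb_%s=%ld; export %s=$sb_%s",
                         name, value, name, name);
            return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
        }
    }
    return -1;   /* not on the allowlist: refuse to translate */
}
```

Because the name is only ever emitted after an exact match against the allowlist, nothing the user types can end up in the generated shell code as a variable name of their choosing.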
Our compiler would gather a list of all variables referenced by the
program, and then for that subset of those variables which are
"environment-allowed", it would emit an initial code block like:
sb_FOO=$FOO ; sb_BAR=$BAR ; ...
# BAZ is not on the whitelist so doesn't appear above
to fetch the values of all referenced whitelisted variables from the
environment. Thus the language could access the env vars $FOO and $BAR,
but $BAZ would appear uninitialized even if there is such an environment
variable.
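Generating that initial block could look roughly like this. The whitelist contents and the function names are invented for the example, and the list of referenced variables is assumed to come from an earlier pass of the compiler.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical whitelist of environment variables the sandbox may read. */
static const char *const env_allowed[] = { "FOO", "BAR", "TERM" };

static int is_env_allowed(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof env_allowed / sizeof env_allowed[0]; i++)
        if (strcmp(name, env_allowed[i]) == 0)
            return 1;
    return 0;
}

/* Writes "sb_X=$X ; sb_Y=$Y ..." for every referenced variable that is
   on the whitelist.  Returns 0 on success, -1 if out is too small. */
int emit_preamble(const char *const *refs, size_t nrefs,
                  char *out, size_t outsz)
{
    size_t used = 0, i;
    int n;

    if (outsz == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < nrefs; i++) {
        if (!is_env_allowed(refs[i]))
            continue;              /* e.g. BAZ: silently left unset */
        n = snprintf(out + used, outsz - used, "%ssb_%s=$%s",
                     used ? " ; " : "", refs[i], refs[i]);
        if (n < 0 || (size_t)n >= outsz - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

For a program that references FOO, BAZ and BAR, this yields "sb_FOO=$FOO ; sb_BAR=$BAR", with BAZ skipped.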