HTML Form Processing with PHP - php[architect] Magazine August 2014

Joe • July 10, 2023


Warning:

This post content may not be current, please double check the official documentation as needed.

This post may also be in an unedited form with grammatical or spelling mistakes, purchase the August 2014 issue from http://phparch.com for the professionally edited version.

HTML Form Processing with PHP

Forms are everywhere. Every application that interacts with a user uses forms in some manner. Whether you are logging into a website, filling out a survey, or signing up for a service, you are interacting with a form that someone had to create in order to get some information from you.

You have a PHP application that needs input. How do you safely, securely, and reliably obtain that input or other data from your users?

The act of terminating a SQL statement and adding extra statements into an input to be handled by a database is called SQL injection. This type of attack is used to manipulate the database, often with malicious intent. If you have ever done any reading on SQL injection, you have probably come across Little Bobby Tables. If you are not familiar with him, I would like to direct your attention to http://xkcd.com/327. If the idea of SQL injection is new to you, allow me to explain what is happening. When the school ran the name "Robert'); DROP TABLE Students;--" through their application, the quote and parenthesis closed the INSERT statement early, and the database then ran the rest as a second statement, dropping the table named Students. This is a great example of what can happen when you are careless with your user-submitted data.
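To make the mechanics concrete, here is a minimal sketch of how that name turns one statement into two when input is glued straight into SQL. The buildUnsafeInsert() helper and the Students table layout are hypothetical, made up for illustration; nothing here touches a real database.

```php
<?php
// Hypothetical helper: builds the query the unsafe way, by concatenating
// raw input into the SQL string.
function buildUnsafeInsert(string $name): string
{
    return "INSERT INTO Students (name) VALUES ('" . $name . "');";
}

$input = "Robert'); DROP TABLE Students;--";
$sql = buildUnsafeInsert($input);

// The quote and parenthesis in the input close the INSERT early, and the
// trailing -- comments out the leftover characters.
echo $sql, PHP_EOL;
// INSERT INTO Students (name) VALUES ('Robert'); DROP TABLE Students;--');
```

The database never sees a "name" here, only a string of SQL that the attacker got to finish writing.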

<?
include("../includes/fmGlobals.php");

OpenConnection($hostName,$userName,$password,$database);

$SQL = "INSERT INTO fmDownloads (DocID, UserID, DLUnix,
    DLDate, DLTime)
        VALUES (" . $_REQUEST["id"] . ",
                " . $_REQUEST["userid"] . ",
                " . time() . ",
                '" . WriteDate(StraightDate(localtime()))
                    . "',
                '" . GetTime(localtime()) . "')";
DoQuery1($SQL);

$SQL = "SELECT DocFile FROM fmDocuments WHERE ID = "
    . $_REQUEST["id"];
$RS = mysql_fetch_array(DoQuery1($SQL));

$rd = "Location: ../documents/" . trim($RS["DocFile"]);

CloseAll();
header($rd);

Listing 1

Listing 1 is an example of bad code. A friend of mine had posted a similar code example taken from an application that he was working on. He was amazed at how terrible this code was and that the client had been sold a product with such glaring security risks.

What is this code even doing? Some new PHP developers may look at this and see nothing wrong. Seasoned veterans, especially those familiar with the renaissance PHP has been going through lately, would be quick to point out the various ways this code could be better. Lines 1-4 include the database connection information and open a SQL connection using the information from the included file. On line 6 we assign a SQL insert query to a variable. On line 14 we have a function that runs the query. On line 16 we assign a new select query to the previously used variable, and on line 18 we run this query, fetch the results, and assign those results to a variable. On line 20 we build the path we will redirect to from a static string and a dynamic variable. On lines 22 and 23 we close any open database connections, pass our redirect string to the header function, and send our user along.

"But the code runs, why is it bad?"

PHP short tags should be avoided because they only work when short_open_tag is enabled in the php.ini file. I once spent more hours than I am comfortable disclosing chasing down a bug that appeared on a Vagrant development machine but not on the same site on the production machine. After some hours going back and forth, I realized that the production server had short_open_tag enabled but my Vagrant machine did not. This is a prime example of why you should always practice like you play, or develop in an environment that matches production.

Listing 1 contains no data sanitization or validation. This is a common attack vector for malicious users who are looking to exploit any hole in your application. You need to make sure the data you are receiving from your users is not only safe, but also the type of data you expect. You want to make sure that their phone number is the correct length and contains numbers, not a long, complex SQL string designed to drop your students table.

Since the code is not validating or sanitizing the user-submitted input, it is saving the input directly into the database. This is a real-world example of a SQL injection attack waiting to happen. The code is also not using PHP Data Objects (PDO). PDO is a lightweight extension for interacting with databases. PDO has been around since PHP 5.1 and is a PECL extension for 5.0. There is no reason for any code to not be using this instead of the legacy mysql_* functions that Listing 1 relies on.

How do you safely get form data? Sanitize it! This removes malicious content from your input.

How do you securely get form data? Submit your forms over SSL! This prevents anyone from eavesdropping on your form's data.

How do you reliably get form data? Validate it! This ensures the form's email field actually contains a properly formatted email address.

To set up the next examples, I want to establish some assumptions. We are going to assume the code already has some validation that input exists. This is typically done in JavaScript but could also be handled on the server side. We assume the form will be using a POST or GET method. Our last assumption is that everything is happening over SSL. (You have updated OpenSSL and reissued your certificates against new keys since Heartbleed, right?)

<div class="container">
<?php
if(isset($_POST['name'])){
    //form has been submitted
    ?>
    Hello <?php echo $_POST['name']; ?>
    <?php
} else {
    ?>
    <h1>Basic HTML Form</h1>
    <form name="basic_form" method="POST" action="#">
        <label>Name:
        <input type="text" name="name" id="name" value="">
        </label>
        <label>
        <input type="submit" id="submit" value="Submit">
        </label>
    </form>
    <?php
}
?>
</div> <!-- /container -->

Listing 2

Listing 2 is a basic HTML form using Bootstrap elements for the styling. Lines 3 through 8 are our submitted form processing code and lines 10 through 18 are our form's HTML code. If we entered "Joe" into the input box and clicked submit, our output would be "Hello Joe".

What if someone clever comes along? What if someone malicious comes along?

Figure 1: Someone clever came along

Figure 1 answers both questions. This is an example of what can go wrong when you just print out input to the browser. If this input were being saved in a database, it could easily be adjusted into a proper SQL injection attack in order to reveal, modify, or delete the contents of the database.

Where did we go wrong in Listing 2? On line 6 we echo out the post variable without sanitizing or validating it. This leaves our form exposed to different types of attacks. The pop-up window shown in Figure 1 was generated by the user entering a JavaScript alert in our name field. <script>alert('I aM l3et HaX0R..')</script> is all it takes.

If you are thinking to yourself, "Oh, I should go review some of my forms right NOW!", hold that thought. Yes, you should, but first let us take a look at how to prevent such an attack so you know what to do to protect your forms.

Looking back to Listing 2 on line 6:

Hello <?php echo $_POST['name']; ?>

We will wrap the variable with a function:

Hello <?php echo htmlentities($_POST['name']); ?>

The function htmlentities() means "don't parse the contents as HTML". So when a user submits our form with a JavaScript alert the output of the form can be seen here:

Hello <script>alert('I aM l3et HaX0R..')</script>

The variable's content is parsed, and any character that has an HTML character entity equivalent is translated to that entity instead of being rendered as HTML, so there is no JavaScript alert. If you view the source of our submitted form you will see what this means:

Hello &lt;script&gt;alert('I aM l3et HaX0R..')&lt;/script&gt;

There are different ways to sanitize and even validate data. Often the terms sanitize and validate are used interchangeably but it is important to remember the difference. Sanitizing your data is only removing any potential malicious elements from it. Validating your data is only ensuring that it is a string, number, or email address. You should always sanitize your data and always validate your data where appropriate.

There are three common ways to sanitize your data:

  • htmlentities()
  • htmlspecialchars()
  • filter_var()
    • Supports filters for sanitizing and validating

The htmlentities() function will translate all characters which have HTML character entity equivalents to these entities. Example:

echo htmlentities("<script>alert('My Résumé')</script>");
// Outputs: &lt;script&gt;alert('My R&eacute;sum&eacute;')&lt;/script&gt;
// (PHP 8.1+ also encodes the single quotes as &#039; by default)

The htmlspecialchars() function will only translate characters that have special significance in HTML. Example:

echo htmlspecialchars("<script>alert('My Résumé')</script>");
// Outputs: &lt;script&gt;alert('My Résumé')&lt;/script&gt;
// (PHP 8.1+ also encodes the single quotes as &#039; by default)

There is a great explanation of htmlentities() vs. htmlspecialchars() on Stack Overflow: http://stackoverflow.com/questions/46483/htmlentities-vs-htmlspecialchars

Should you use filter_var(), htmlentities(), or htmlspecialchars()? If you are working with PHP prior to version 5.2.0, you will not have access to filter_var(). If you are working with special characters and want to ensure they are saved to a database without being encoded, you will want to use htmlspecialchars(). Personally, I prefer to use filter_var() since it allows very specific validation and sanitization flags.
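Whichever function you pick, it helps to wrap it once so that every template escapes output the same way. The sketch below assumes UTF-8 pages; the e() helper name is hypothetical, not something PHP provides.

```php
<?php
// Hypothetical helper, not part of PHP: escape a value for HTML output.
// ENT_QUOTES encodes both quote styles; ENT_SUBSTITUTE replaces invalid
// byte sequences instead of returning an empty string.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo e("<a href='x'>Résumé & \"more\"</a>"), PHP_EOL;
// &lt;a href=&#039;x&#039;&gt;Résumé &amp; &quot;more&quot;&lt;/a&gt;
```

Passing the flags and the character set explicitly keeps the output the same no matter which PHP version's defaults are in effect.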

"But I am a one-person shop." "I trust my users." If you have said either of these statements about your application, you are sorely mistaken. You cannot trust input from anyone. This includes APIs or other applications. Always sanitize and validate to the farthest extent you possibly can. Just because you are unable to see how your form could be exploited or abused does not mean it is impervious to abuse. You do not have to be a security expert. You have to be security conscious and take the proper steps to protect your forms.

A basic example of how your form could be exploited would be via a cURL request. A cURL request bypasses all client-side error checking you may be doing via JavaScript. Assume we have Listing 2 on a server at http://some-server/form.php. We could use a one-line cURL request from the terminal to post data to our form: curl --data "name=CurlPosting" http://some-server/form.php. In the terminal we would see the raw HTML output from the submitted form. This example would print out our HTML elements and "Hello CurlPosting". This is an incredibly basic way someone could interact with your form in a way you may not have considered. There are much more elaborate ways to use cURL to interact with forms. This is a prime example of why you should not rely on anything from the client side to sanitize or validate your data. Use JavaScript to make sure that form fields contain data or that the options you require are selected, and then check it all again in PHP.
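Checking again in PHP can be as small as the sketch below. The missingFields() helper and the field names are hypothetical; in a real request you would hand it $_POST instead of a hand-built array.

```php
<?php
// Hypothetical helper: return the names of required fields that are
// missing, blank, or not plain strings.
function missingFields(array $input, array $required): array
{
    $missing = [];
    foreach ($required as $field) {
        if (!isset($input[$field])
            || !is_string($input[$field])
            || trim($input[$field]) === '') {
            $missing[] = $field;
        }
    }
    return $missing;
}

// Stand-in for $_POST after the cURL request, which skipped every
// JavaScript check and sent only a name.
$fromCurl = ['name' => 'CurlPosting'];
print_r(missingFields($fromCurl, ['name', 'email']));
```

The is_string() check matters because a request can send name[]=x, which arrives in PHP as an array rather than a string.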

Should you sanitize or validate? Both! Remember that sanitizing your data only removes any potentially malicious contents while validating your data ensures that what you expect to be an email address or a phone number actually is.

Data validation is not just a way to make your application handle input better. Say you have a contact form that is requesting a phone number and an email address from a user so that your company can contact them later with some information. What happens if you do not validate the user's input and you only end up receiving half the phone number or the email address is missing the top-level domain? If you are unable to contact the person you have lost a potential sale or some other request that could negatively impact your business. If your data was properly validated, this would not be an issue.
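As a rough sketch of catching half a phone number before it reaches your sales team, the helper below accepts a number only if exactly 10 digits remain after common punctuation is removed. The 10-digit rule and the normalizePhone() name are assumptions for illustration; real phone formats vary by country.

```php
<?php
// Hypothetical rule: a phone number is valid when exactly 10 digits remain
// once spaces, dashes, dots, and parentheses are removed.
function normalizePhone(string $raw): ?string
{
    $digits = preg_replace('/[\s().-]+/', '', $raw);
    return preg_match('/^\d{10}$/', $digits) === 1 ? $digits : null;
}

var_dump(normalizePhone('(555) 123-4567')); // string(10) "5551234567"
var_dump(normalizePhone('555-1234'));       // NULL
```

Returning null for a bad value lets the form ask the user to correct it instead of silently saving something nobody can dial.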

Sanitizing Data

We now know why sanitizing our data is important. Imagine you have a form with a drop-down select field and an input box. The select field has options for various salutations: "Miss", "Mrs.", "Mr.", and so on. The input field is just asking for a name. Listing 5 is an example of the form and how we could sanitize the user-submitted data. We use the filter_var() function and pass it three arguments. First we pass in the variable we want to sanitize. Next we pass in FILTER_SANITIZE_STRING, the filter that strips out tags and anything else that could potentially be malicious (script tags, for example). Note that this filter is deprecated as of PHP 8.1. Lastly, we pass FILTER_FLAG_STRIP_HIGH, which is optional. FILTER_FLAG_STRIP_HIGH strips bytes with a numerical value greater than 127 from the variable. Bytes with values greater than 127 belong to non-ASCII characters. You should be very careful with this flag, especially when dealing with names. I only mention it here to demonstrate how finely you are able to sanitize user input.

<div class="container">
<?php
if(isset($_POST['name'])){
    //form has been submitted
    $salutation = filter_var($_POST['salutation'],
        FILTER_SANITIZE_STRING,
        FILTER_FLAG_STRIP_HIGH);
    $name = filter_var($_POST['name'],
        FILTER_SANITIZE_STRING,
        FILTER_FLAG_STRIP_HIGH);
    $greeting = 'Hello ' . $salutation . ' ' . $name;
    echo $greeting;
} else {
    ?>
    <h1>Sanitizing Data</h1>
    <form name="basic_form" method="POST" action="#">
        <label>Salutation:
            <select name="salutation" id="salutation">
                <option value="Miss">Miss</option>
                <option value="Mrs.">Mrs.</option>
                <option value="Ms.">Ms.</option>
                <option value="Mr.">Mr.</option>
                <option value="Dr.">Dr.</option>
            </select>
        </label>
        <label>Name:
        <input type="text" name="name" id="name" value="">
        </label>
        <label>
        <input type="submit" id="submit" value="Submit">
        </label>
    </form>
    <?php
}
?>
</div> <!-- /container -->

Listing 5
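To see why FILTER_FLAG_STRIP_HIGH deserves caution with names, here is a small sketch of what it does to accented characters. It pairs the flag with FILTER_UNSAFE_RAW rather than the filter used in Listing 5, so only the effect of the flag is visible, and it assumes the file is saved as UTF-8.

```php
<?php
// FILTER_UNSAFE_RAW leaves the string alone except for the flags applied.
// Assumes this file is saved as UTF-8, where each accented letter is stored
// as two bytes, both greater than 127.
$name = "Zoë Résumé";
$stripped = filter_var($name, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_HIGH);

var_dump($stripped); // string(7) "Zo Rsum"
```

The accented letters are not converted to plain ones; they are removed entirely, which is rarely what a user with such a name would want.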

Wait a minute! Why are we sanitizing the salutation variable? We specified that in the form so it must be safe. Wrong! Just because the user can't freely edit the input as they are able to in an input box or text area does not mean the data is safe. Malicious users will often modify your form and try to submit it with altered values to try to catch you off guard. Remember you cannot trust anything from the user. Always sanitize everything from the user. This includes checkboxes and any other form field types you may be using.
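For a field with a fixed set of options, one further step is to compare the submitted value against the same list the form offers. This is a sketch under that assumption; the ALLOWED_SALUTATIONS constant and the pickSalutation() helper are hypothetical names.

```php
<?php
// The same options the form offers; anything else is treated as tampering.
const ALLOWED_SALUTATIONS = ['Miss', 'Mrs.', 'Ms.', 'Mr.', 'Dr.'];

// Hypothetical helper: return the salutation if it is on the list,
// otherwise null. The strict flag avoids loose type juggling.
function pickSalutation($submitted): ?string
{
    return in_array($submitted, ALLOWED_SALUTATIONS, true) ? $submitted : null;
}

var_dump(pickSalutation('Dr.'));                        // string(3) "Dr."
var_dump(pickSalutation('<script>alert(1)</script>')); // NULL
```

A value that is not on the list never reaches the greeting at all, so there is nothing left to sanitize.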

Validating User Input

Imagine we have a form with multiple input boxes for name, age, email, and a select box for salutation. An example of how we could validate all these fields can be seen in Listing 3.

//form has been submitted
$salutation = filter_var($_POST['salutation'],
    FILTER_SANITIZE_STRING);
$name = filter_var($_POST['name'], FILTER_SANITIZE_STRING);
$age = filter_var($_POST['age'],
    FILTER_SANITIZE_NUMBER_INT);
$age_filter = filter_var($age, FILTER_VALIDATE_INT,
  array('options'=>array(
    'min_range'=>'13',
    'max_range'=>'110'
    )));
if($age_filter){
    $age_message = 'You are ' . $age . ' years old.<br />';
} else {
    $age_message = "We don't know how old you are.<br />";
}
$email = filter_var($_POST['email'], FILTER_SANITIZE_EMAIL);
$filter_email = filter_var($email,FILTER_VALIDATE_EMAIL);
if($filter_email){
    $email_message = 'Your email is ' . $email . '<br />';
} else {
    $email_message = "We don't know your email.<br />";
}
$greeting = 'Hello ' . $salutation . ' ' . $name . '<br />';

echo $greeting;
echo $age_message;
echo $email_message;

Listing 3

As previously mentioned, you should validate your data to the farthest extent you can. Name fields are one of those fields that are very difficult to validate. I often skip validating these fields entirely since I don't feel I should tell a user what their name should be. The fields in Listing 3 that we want to validate are the age and email fields. We want to ensure the age is between 13 and 110. We want to make sure the email appears to be a valid email address made up of a username, the @ symbol, and a domain. We also want to ensure we are sanitizing all of the input.

Normally you can check if a number is within a range by simply writing an if statement to see if the variable is greater than or less than the bounds of your expected range. With filter_var() you can pass an options array along with the FILTER_VALIDATE_INT filter and have this check done for you. The options array should have min_range and max_range as keys with the values of your range. If this returns anything other than false, we can safely assume that the number was within our supplied range. You can see this in Listing 3 lines 7-16.
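Stripped down to just the range check, the call looks like the sketch below. The 13 to 110 bounds come from Listing 3; the sample inputs are made up.

```php
<?php
// Range options for FILTER_VALIDATE_INT, using the bounds from Listing 3.
$options = ['options' => ['min_range' => 13, 'max_range' => 110]];

var_dump(filter_var('42', FILTER_VALIDATE_INT, $options));  // int(42)
var_dump(filter_var('12', FILTER_VALIDATE_INT, $options));  // bool(false)
var_dump(filter_var('abc', FILTER_VALIDATE_INT, $options)); // bool(false)
```

Note that a passing value comes back as an integer, not as the original string.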

Validating an email address is just as easy. You pass the FILTER_VALIDATE_EMAIL filter, and if anything other than false is returned, the email is formatted correctly. An important note is that this does not mean the email is actually a working email address. You should still send out a test message if you require a functional email.
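A quick sketch of the email check on its own, using made-up addresses:

```php
<?php
// A well-formed address is returned unchanged.
$good = filter_var('joe@example.com', FILTER_VALIDATE_EMAIL);
// Malformed input comes back as false.
$noDomain = filter_var('joe@', FILTER_VALIDATE_EMAIL);
$noAtSign = filter_var('joe example.com', FILTER_VALIDATE_EMAIL);

var_dump($good);     // string(15) "joe@example.com"
var_dump($noDomain); // bool(false)
var_dump($noAtSign); // bool(false)
```

The first address passes purely on its format; nothing here checks that the mailbox or even the domain exists.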

Another important note is that filter_var() returns the filtered data if the input passes the filter and any flags sent with it. If something does not pass, filter_var() returns false.
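Because failure is signalled with false, it is worth comparing against false strictly whenever a legitimate value could itself be falsy. A minimal sketch, using a hypothetical quantity field where zero is a valid answer:

```php
<?php
// Hypothetical field: a quantity where 0 is a perfectly valid value.
$quantity = filter_var('0', FILTER_VALIDATE_INT);

var_dump($quantity);           // int(0): the input was valid
var_dump((bool) $quantity);    // bool(false): a loose if() would reject it
var_dump($quantity !== false); // bool(true): the strict check accepts it
```

The loose checks in Listing 3 are safe only because neither a valid age in its range nor a valid email address can be falsy.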

Our data is validated and sanitized. What do we do with it now?

The hard part is over. The input is free of any potentially malicious code and is in the format that we expect. You can now pass the data along to the next part of your application.

Remember that bad idea?

Listing 1 was terrifyingly terrible code. Listing 4 would be one way to clean up the mess and take advantage of some of the tools we have previously used.

<?php
include("../includes/fmGlobals.php");
$id = htmlentities($_REQUEST["id"]);
$userid = htmlentities($_REQUEST["userid"]);
$unixtime = time();
$date = date('d-m-Y');
$localtime = date('H:i:s'); // hours:minutes:seconds, 24-hour clock
$dbh = new PDO('mysql:host=localhost;dbname=test',
    $user,
    $pass);

$stmt = $dbh->prepare("INSERT INTO fmDownloads (DocID,
    UserID, DLUnix, DLDate, DLTime)
                        VALUES (:DocID, :UserID, :DLUnix,
                        :DLDate, :DLTime)");

$params = array(':DocID' => $id,
                ':UserID' => $userid,
                ':DLUnix' => $unixtime,
                ':DLDate' => $date,
                ':DLTime' => $localtime);
$stmt->execute($params);

$stmt2 = $dbh->prepare("SELECT DocFile FROM fmDocuments
    WHERE ID = :id");
$params2 = array(':id' => $id); // key must match the :id placeholder
$stmt2->execute($params2);
$data = $stmt2->fetchAll();

$redirect = "Location: ../documents/"
    . trim($data['0']["DocFile"]);
header($redirect);

Listing 4

The most significant change here that we have not previously explored is that the code is now using PDO to handle the database connection. In addition to PDO, we are using a prepared statement to insert the data into the database. Prepared statements are like a template that you can pass variables to as parameters. If you are familiar with any of the PHP template engines, prepared statements are very much the same thing for your SQL statements.

Line 12 of Listing 4 shows how we can assign placeholders in our SQL statement, and line 17 shows how we relate those placeholders to our actual data. When PDO executes the statement, it looks at the $params array and substitutes the real data for the placeholders.

Prepared statements are such an incredibly useful tool that PDO will emulate this functionality if the database driver itself does not support them. The query can be run multiple times with different parameters without having to call the prepare() method each time. Using prepared statements helps prevent SQL injection because you no longer need to quote the parameters manually; PDO handles this for you when the values are bound and the statement is executed.
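Here is a small sketch of preparing once and executing several times. It assumes the pdo_sqlite extension is available and uses an in-memory database so it can run on its own; the students table is made up and is not part of the application in Listing 4.

```php
<?php
// Assumes the pdo_sqlite extension is available; an in-memory database
// keeps the sketch self-contained. Table and column names are made up.
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE students (name TEXT)');

// Prepare once, then execute with different parameters each time.
$stmt = $dbh->prepare('INSERT INTO students (name) VALUES (:name)');
foreach (['Alice', "Robert'); DROP TABLE students;--"] as $name) {
    $stmt->execute([':name' => $name]);
}

// The table still exists, and the hostile string was stored as plain text.
$count = (int) $dbh->query('SELECT COUNT(*) FROM students')->fetchColumn();
echo $count, PHP_EOL; // 2
```

Little Bobby Tables ends up as just another row, because his name is passed as data and never becomes part of the SQL text.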

Recap

It is important to remember that there are always different ways to do things. This should not be taken as a hard rule for how you should sanitize your data. All of the examples here have been broken down to the bare minimum to help newcomers and old veterans alike. Most modern frameworks will already do a lot of this validation and sanitization for you. Make sure you consult the documentation so you know where and how the framework is handling this for you.
