Sunday, December 20, 2009

PHP coding guidelines

PHP coding guidelines

The guidelines that I follow when writing my PHP scripts; can be helpful to have something like this if you're working on a joint project.


N.B. These are only the guidelines that I personally chose to follow for the code I write. This is not an indication that any other coding styles, guidelines or standards are wrong. Feel free to use this document as a template to manage your own coding guideline and change whatever you wish as you see fit.


Why are guidelines important?


First of all, let's make one point crystal clear: it doesn't matter what your guidelines are, so long as everyone understands and sticks to them. I've worked on a team where the person in charge preferred to put braces following the expression, rather than on a line all by themselves. Whilst I didn't necessarily agree with this convention, I stuck to it because it made maintaining the whole project much easier.


It cannot be emphasised enough that guidelines are only useful if they are followed. It's no use having a definitive set of coding guidelines for a joint programming project if everyone continues to write in their own style regardless. It is arguable that you can get away with different coding styles if every team member works on a different section which is encapsulated and therefore their coding style doesn't affect the other developers. Unfortunately, this only holds true until a developer leaves and someone else has to take over their role.


If you are running a joint project, you might consider putting your foot down and basically refuse to accept any code that does not conform to your published guidelines, regardless of its technical merit (or lack thereof). This may sound somewhat draconian and off-putting to developers at first, but once everyone begins to code to the guidelines you'll find it a lot easier to manage the project and you'll get more work done in the same time. It will require a lot of effort from some of your developers who don't want to abandon their coding habits, but at the end of the day different coding styles will cause more problems than they're worth.


Editor settings



Tabs v. spaces


Ahh, the endless debate of tabs v. spaces. To be honest, I recommend using tabs all the time for the simple reasons that it is much easier to align everything using them, and tabs require less space in a file than multiple spaces. Also, if you set your editor to display a tab character as 8 spaces, but another developer prefers to have them represented as 4 spaces, it doesn't matter in the slightest.


A mention needs to be made of the PEAR project, who insist on using four spaces instead of tabs in their guidelines. If you want to write code for them, just do it in tabs and then do a quick search/replace to change them all to four spaces.


Linefeeds


The three major operating systems (Unix, Windows and Mac OS) use different ways to represent the end of a line. Unix systems use the newline character (\n), Mac systems use a carriage return (\r), and Windows systems are terribly wasteful in that they use a carriage return followed by a line feed (\r\n).



I use simple newlines all the time, because the Windows way just doubles the size of your line breaks and the Mac OS way is technically incorrect in that a carriage return should only return you to the beginning of the line, ala the old typewriter systems.


If you develop on Windows (and many people do), either set up your editor to save files in Unix format or run a utility that converts between the two file formats.


Naming conventions


Variable names


A lot of textbooks (particulary those about Visual C++) will try to drum hungarian notation into your head. Basically, this means having rules such as pre-pending g_ to global variables, i to integer data types etc. Not only is a lot of this irrelevant to PHP (being a typeless language), it also produces variable names such as g_iPersonAge which, to be honest, are not easy to read at a glance and often end up looking like a group of random characters strung together without rhyme or reason.


Variable names should be all lowercase, with words separated by underscores. For example, $current_user is correct, but $currentuser, $currentUser and $CurrentUser are not.



Names should be descriptive, but also concise. Wherever possible, keep variable names to under 15 characters, although be prepared to sacrifice a few extra characters to improve clarity. There's no hard and fast rule when it comes to the length of a variable name, so just try and be as concise as possible without affecting clarity too much. Generally speaking, the smaller the scope of a variable, the more concise you should be, so global variables will usually have the longest names (relative to all others) whereas variables local to a loop might have names consisting only of a single character.


Constants should follow the same conventions as variables, except use all uppercase to distinguish them from variables. So USER_ACTIVE_LEVEL is correct, but USERACTIVELEVEL or user_active_level would be incorrect.


Loop indices


This is the only occassion where short variable names (as small as one character in length) are permitted, and indeed encouraged. Unless you already have a specific counting variable, use $i as the variable for the outermost loop, then go onto $j for the next most outermost loop etc. However, do not use the variable $l (lowercase 'L') in any of your code as it looks too much like the number 'one'.



Example of nested loops using this convention:



<?php
for ( $i = 0; $i < 5; $i++ )
{
    for (
$j = 0; $j < 4; $j++ )
    {
        for (
$k = 0; $k < 3; $k++ )
        {
            for (
$m = 0; $m < 2; $m++ )
            {
                
foo($i, $j, $k, $m);
            }
        }
    }
}
?>



If, for some reason, you end up nesting loops so deeply that you get to $z, consider re-writing your code. I've written programs (in Visual Basic, for my sins) with loops nested four levels deep and they were complicated enough. If you use these guidelines in a joint project, you may way to impose an additional rule that states a maximum nesting of x levels for loops and perhaps for other constructs too.


Function names


Function names should follow the same guidelines as variable names, although they should include a verb somewhere if at all possible. Examples include get_user_data() and validate_form_data(). Basically, make it as obvious as possible what the function does from its name, whilst remaining reasonably concise. For example, a_function_which_gets_user_data_from_a_file() would not be appropriate!



Function arguments


Since function arguments are just variables used in a specific context, they should follow the same guidelines as variable names.


It should be possible to tell the main purpose of a function just by looking at the first line, e.g. get_user_data($username). By examination, you can make a good guess that this function gets the user data of a user with the username passed in the $username argument.


Function arguments should be separated by spaces, both when the function is defined and when it is called. However, there should not be any spaces between the arguments and the opening/closing brackets.


Some examples of correct/incorrect ways to write functions:



<?php
get_user_data
( $username, $password ); // incorrect: spaces next to brackets
get_user_data($username,$password); // incorrect: no spaces between arguments
get_user_data($a, $b); // ambiguous: what do variables $a and $b hold?
get_user_data($username, $password); // correct
?>



Code layout


Including braces


Braces should always be included when writing code using if, for, while etc. blocks. There are no exceptions to this rule, even if the braces could be omitted. Leaving out braces makes code harder to maintain in the future and can also cause bugs that are very difficult to track down.



Some examples of correct/incorrect ways to write code blocks using braces:



<?php
/* These are all incorrect */

if ( condition ) foo();

if (
condition )
    
foo();

while (
condition )
    
foo();

for (
$i = 0; $i < 10; $i++ )
    
foo($i);

/* These are all correct */

if ( condition )
{
    
foo();
}

while (
condition )
{
    
foo();
}

for (
$i = 0; $i < 10; $i++ )
{
    
foo($i);
}
?>



Where to put the braces


Braces should always be placed on a line on their own; again there are no exceptions to this rule. Braces should also align properly (use tabs to achieve this) so a closing brace is always in the same column as the corresponding opening brace. For example:



<?php
if ( condition )
{
    while (
condition )
    {
        
foo();
    }
}
?>



I know that a lot of programmers prefer to put the first brace on the first line of the block they are encoding, to prevent wasting a line. However, I strongly disagree with this as it makes code much easier to read (in my opinion) if braces line up correctly.


Spaces between tokens


There should always be one space on either side of a token in expressions, statements etc. The only exceptions are commas (which should have one space after, but none before), semi-colons (which should not have spaces on either side if they are at the end of a line, and one space after otherwise). Functions should follow the rules laid out already, i.e. no spaces between the function name and the opening bracket and no space between the brackets and the arguments, but one space between each argument.


Control statements such as if, for, while etc. should have one space on either side of the opening bracket, and one space before the closing bracket. However, individual conditions inside these brackets (e.g. ($i < 9) || ($i > 16)) should not have spaces between their conditions and their opening/closing brackets.



In these examples, each pair shows the incorrect way followed by the correct way:



<?php
$i
=0;
$i = 0;

if((
$i<2 )||( $i>5 ))
if ( (
$i < 2) || ($i > 5) )

foo ( $a,$b,$c )
foo($a, $b, $c)

$i=($j<5)?$j:5
$i
= ($j < 5) ? $j : 5
?>




Operator precedence


I doubt very much that any developer knows the exact precedence of all the operators in PHP. Even if you think you know the order, don't guess because chances are you'll get it wrong and cause an unexpected bug that will be very difficult to find. Also, it will make maintaining your program a living nightmare for anyone who doesn't know the precedence tables in so much depth. Always use brackets to make it absolutely clear what you are doing.



<?php
$i
= $j < 5 || $k > 6 && $m == 9 || $n != 10 ? 1 : 2; // What *is* going on here?!?
$i = ( (($j < 5) || $k > 6)) && (($m == 9) || ($n != 10)) ) ? 1 : 2; // Much clearer
?>



N.B. If you are using expressions like the one above you should split them up into smaller chunks if possible, even if you are already laying them out in the correct format. No-one should have to debug code like that, there's too many conditions and logic operators.


SQL code layout


When writing SQL queries, capitialise all SQL keywords (SELECT, FROM, VALUES, AS etc.) and leave everything else in the relevant case. If you are using WHERE clauses to return data corresponding to a set of conditions, enclose those conditions in brackets in the same way you would for PHP if blocks, e.g. SELECT * FROM users WHERE ( (registered = 'y') AND ((user_level = 'administrator') OR (user_level = 'moderator')) ).



General guidelines


Quoting strings


Strings in PHP can either be quoted with single quotes ('') or double quotes (""). The difference between the two is that the parser will use variable-interpolation in double-quoted strings, but not with single-quoted strings. So if your string contains no variables, use single quotes and save the parser the trouble of attempting to interpolate the string for variables, like so:



<?php
$str
= "Avoid this - it just makes more work for the parser."; // Double quotes
$str = 'This is much better.' // Single quotes
?>



Likewise, if you are passing a variable to a function, there is no need to use double quotes:



<?php
foo
("$bar"); // No need to use double quotes
foo($bar); // Much better
?>



Finally, when using associative arrays, you should include the key within single quotes to prevent any ambiguities, especially with constants:



<?php
$foo
= $bar[example]; // Wrong: what happens if 'example' is defined as a constant elsewhere?
$foo = $bar['example']; // Correct: no ambiguity as to the name of the key
?>



However, if you are accessing an array with a key that is stored in a variable, you can simply use:



<?php
$foo
= $bar[$example];
?>


Shortcut operators



The shortcut operators ++ and -- should always be used on a line of their own, with the exception of for loops. Failure to do this can cause obscure bugs that are incredibly difficult to track down. For example:



<?php
$foo
[$i++] = $j; // Wrong: relies on $i being incremented after the expression is evaluated
$foo[--$j] = $i; // Wrong: relies on $j being decremented before the expression is evaluated
$foo[$i] = $j;
$i++; // Correct: obvious when $i is incremented

$j--;
$foo[$j] = $i; // Correct: obvious when $j is decremented
?>



Optional shortcut constructs


As well as the useful increment and decrement shortcuts, there are two other ways in which you can make your PHP code easier to use. The first is to replace if statements where you are assigning one of two values to a variable based on a conditional. You may be tempted to write something like this:



<?php
if ( isset($_POST['username']) )
{
    
$username = $_POST['username'];
}
else
{
    
$username = '';
}

if ( isset(
$_POST['password']) )
{
    
$password = md5($_POST['password']);
}
else
{
    
$password = '';
}
?>



Whilst the above code works and makes it obvious what you are doing, it's not the easiest or clearest way if you want to run through a list of different variables and do a similar thing to all of them. A more compact way would be to use the ternary operator ? : like so:



<?php
$username
= isset($_POST['username']) ? $_POST['username'] : '';
$password = isset($_POST['password']) ? md5($_POST['password']) : '';
?>



I would recommend using the latter notation wherever you are checking assigning a number of variables one of two values depending on a boolean expression, simply because it makes the code easier to scan and also makes it obvious what you are doing without being unnecessarily verbose.


Use constants where possible


If a value is not going to change throughout the execution of your script, then use a constant rather than a variable to define it. That way, if you do change the value by accident, the PHP parser will output an error and allow you to fix the problem, without it causing any unforeseen side effects.


Remember that constants should never be enclosed in strings, either single or double. You must always use concatenation if you wish to include a constant's value in a string.


Turn on all error reporting


A lot of code I've downloaded from the web and tried to use has failed on my machines because the developers switched off the E_NOTICE flag in their PHP configuration for some reason or another. As soon as I bring it onto my system, where error reporting is set to E_ALL (i.e. show all errors, including references to variables being used without being initialised), the PHP interpreter spews out dozens of error messages which I then have to fix before I can proceed to use the script.



What you need to remember as a developer is that the person who uses your script may not have exactly the same php.ini configuration as you so you aim for the worst possible case, i.e. when some awkard person like me has all error reporting enabled. If your code works with E_ALL set, then it will also work with any other error reporting configuration, including when all error reporting is turned off (e.g. on sites where PHP errors are logged to a file instead).


It's not impossible to get scripts to work with all error reporting turned on, because I've managed it for all the code on all the sites I've ever written PHP code for. Admittedly I've not written any major projects such as a CMS, but if you write proper object oreintated code and make sure that each class doesn't produce any errors, you should be able to make your program as a whole run without errors under normal circumstances.


Of course, on a production site you might want to turn off all errors, or at least redirect them to a file, to avoid admitting to users that your scripts are broken in some way (we know that scripts can't handle every possible error, but users often don't see things this way!). That's perfectly fine and in many cases the recommended action to take. So long as your scripts work with all error reporting turned on, it doesn't matter where they are deployed.