Phptidy


phptidy is a great utility to re-format your php code. It is open source, so I can do my hobby programming on it. Don't be confused if I say "hacking" instead of "programming" :)

tokenizing

If you want to program a compiler, or understand how phptidy does its work, you will need to understand tokenizing (also called lexing) and parsing. phptidy is a wrapper around the tokenizer that is built into php (token_get_all).

As a short example, let's look at the following php program:

<?php
eCho "hello world";
?>

In order to execute this, the php interpreter needs to understand that echo is the echo command, not a comment and not something to output. See it here:

# php phptidy.php tokens hello.php
Running without configuration file
Find functions and includes .
Process files
 hello.php
T_OPEN_TAG <?php\n
T_ECHO eCho
T_WHITESPACE \
T_CONSTANT_ENCAPSED_STRING "hello\ world"
;
T_WHITESPACE \n
T_CLOSE_TAG ?>\n

You see, the source code is not strictly divided into words, but into tokens. "hello world" forms one token, echo forms one, and the semicolon as well.

So php knows internally

  • which strings belong together (eCho is separated by spaces, "hello world" by quotation marks)
  • how the echo command is written (the strange "eCho")
  • that "eCho" is a token of the type T_ECHO, the php language construct that outputs text

Every token that is not as trivial as a semicolon gets a number that says how it is understood. This number is a constant with a name, e.g. T_ECHO or T_CONSTANT_ENCAPSED_STRING. The token also gets a value: the text it contains, exactly as written in the source. For a T_ECHO token this can be "eCho", "ecHo" or "echo".
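To try this without phptidy, here is a minimal sketch that calls token_get_all directly and returns one line per token, in roughly the same shape as the listing above. The function name dump_tokens and the way it escapes spaces and newlines are my own assumptions for illustration, not phptidy's code.

```php
<?php
// Hypothetical helper (not part of phptidy): describe every token of $source.
function dump_tokens(string $source): array {
    $lines = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // $token[0] = numeric token id, $token[1] = text as written
            $text = strtr($token[1], ["\n" => '\n', ' ' => '\ ']);
            $lines[] = token_name($token[0]) . ' ' . $text;
        } else {
            // trivial one-character tokens such as ";" are plain strings
            $lines[] = $token;
        }
    }
    return $lines;
}

print implode("\n", dump_tokens("<?php\neCho \"hello world\";\n?>\n")) . "\n";
```

token_name() turns the number back into the constant's name, so you never have to remember the number itself.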


Now comes the cool part: we modify the code slightly and see that php understands it differently:

# cat hello.php
<?php
//echo "hello world";
?>
# php phptidy.php tokens hello.php
Running without configuration file
Find functions and includes .
Process files
 hello.php
T_OPEN_TAG <?php\n
T_COMMENT //echo\ "hello\ world";\n
T_CLOSE_TAG ?>\n

Beautiful, isn't it? You see that echo is now part of a token that is understood as a comment (T_COMMENT). And this is exactly what the php code says.
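The same check can be done in a few lines of php. This is only a sketch: has_token is a hypothetical helper of mine, not something phptidy provides. It asks the tokenizer whether a given token type occurs in a piece of source code.

```php
<?php
// Hypothetical helper (not part of phptidy): does $source contain a token
// of the given type, e.g. T_ECHO?
function has_token(string $source, int $id): bool {
    foreach (token_get_all($source) as $token) {
        if (is_array($token) && $token[0] === $id) {
            return true;
        }
    }
    return false;
}

// The real echo statement contains a T_ECHO token ...
var_dump(has_token("<?php\necho \"hello world\";\n?>\n", T_ECHO));
// ... the commented-out one does not, the word echo is just comment text.
var_dump(has_token("<?php\n//echo \"hello world\";\n?>\n", T_ECHO));
```

A plain text search for "echo" would find both files, which is why a formatter has to work on tokens and not on strings.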

Call graphs for phptidy

phptidy() -(calls)-> combine_tokens()

phptidy() is called repeatedly until the source is consistent (another pass no longer changes it) or a pass counter reaches its limit.
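The "call it until nothing changes" idea can be sketched like this. The names format_until_stable and $pass, and the limit of 10 passes, are assumptions for illustration; $pass stands in for phptidy() and the real loop in phptidy may look different.

```php
<?php
// Sketch: apply a formatting pass again and again until the source stops
// changing, or until $maxPasses is reached (guards against endless loops).
function format_until_stable(callable $pass, string $source, int $maxPasses = 10): string {
    for ($i = 0; $i < $maxPasses; $i++) {
        $next = $pass($source);
        if ($next === $source) {
            break; // consistent: another pass would change nothing
        }
        $source = $next;
    }
    return $source;
}

// Toy pass that replaces each pair of spaces with a single space.
$collapse = function (string $s): string {
    return str_replace('  ', ' ', $s);
};
echo format_until_stable($collapse, "a    b"), "\n";
```

The toy pass needs two rounds to turn four spaces into one, which is exactly the situation where a single call is not enough.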

Know How

find newlines

How to find a newline that phptidy outputs:

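function combine_tokens($tokens) {
  $out = "";
  foreach ( $tokens as $key => $token ) {
    // debug marker; is_array() skips one-character tokens like ";", which are plain strings
    if (is_array($token) && strpos($token[1], "\n")!==false) $out.="new";
    // ... rest of combine_tokens() unchanged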

add a token

Here is an example of how you can add a token inside phptidy's combine_tokens function. I add a docblock directly after the initial

<?php

token:

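function combine_tokens($tokens) {
  $out = "";
  $err = "";
  // T_DOC_COMMENT (the docblock token type) instead of a hard-coded 366: token numbers differ between php versions
  $mytoken=array(T_DOC_COMMENT,"/**tstaerk was here*/");
  // wrap $mytoken in an array, otherwise array_splice inserts its two elements separately
  array_splice($tokens,1,0,array($mytoken));
  foreach ( $tokens as $key => $token ) {
    // ... rest of combine_tokens() unchanged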
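If you want to play with this outside of phptidy, here is a minimal self-contained sketch of the same idea. insert_token and tokens_to_source are hypothetical helpers of mine, not phptidy functions. Note that array_splice takes a list of replacement elements, so a single token (itself an array) has to be wrapped in another array to arrive as one element.

```php
<?php
// Hypothetical helper: insert one token at $position.
function insert_token(array $tokens, int $position, $token): array {
    // [$token] makes sure the token is inserted as ONE element
    array_splice($tokens, $position, 0, [$token]);
    return $tokens;
}

// Hypothetical helper: glue the tokens back together into source code.
function tokens_to_source(array $tokens): string {
    $out = '';
    foreach ($tokens as $token) {
        $out .= is_array($token) ? $token[1] : $token;
    }
    return $out;
}

$tokens = token_get_all("<?php\necho 1;\n");
$tokens = insert_token($tokens, 1, [T_DOC_COMMENT, "/** tstaerk was here */\n"]);
echo tokens_to_source($tokens);
```

Position 1 is directly after the T_OPEN_TAG token, so the docblock ends up on the line after <?php.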

Specific patches

wrap lines

I want phptidy to be able to wrap lines that get too long. The wrapped source and its tokens will look like this:

# cat little.php
<?php
echo "123456789012345678901234567890123456789012345678901234567890"
."123456789012345678901234567890";
?>
# php phptidy.php tokens little.php
Running without configuration file
Find functions and includes .
Process files
 little.php
T_OPEN_TAG <?php\n
T_ECHO echo
T_WHITESPACE \
T_CONSTANT_ENCAPSED_STRING "123456789012345678901234567890123456789012345678901234567890"
T_WHITESPACE \n
.
T_CONSTANT_ENCAPSED_STRING "123456789012345678901234567890"
;
T_WHITESPACE \n
T_CLOSE_TAG ?>\n
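One way such wrapping could work on the token level is sketched below. This is not the phptidy patch itself, only an illustration under my own assumptions: the function name wrap_concatenations, the default width of 80 and the rule "break before a . operator when the next token would not fit" are all hypothetical.

```php
<?php
// Sketch: insert a newline before a "." operator whenever the "." plus the
// following token would push the current line past $maxWidth characters.
function wrap_concatenations(string $source, int $maxWidth = 80): string {
    $tokens = token_get_all($source);
    $out = '';
    $column = 0; // characters already written on the current output line
    foreach ($tokens as $i => $token) {
        $text = is_array($token) ? $token[1] : $token;
        if ($token === '.' && isset($tokens[$i + 1])) {
            $next = $tokens[$i + 1];
            $nextText = is_array($next) ? $next[1] : $next;
            if ($column + 1 + strlen($nextText) > $maxWidth) {
                $out .= "\n";
                $column = 0;
            }
        }
        $out .= $text;
        // keep track of the column, restarting after the last newline
        $lastNewline = strrpos($text, "\n");
        $column = ($lastNewline === false)
            ? $column + strlen($text)
            : strlen($text) - $lastNewline - 1;
    }
    return $out;
}

echo wrap_concatenations("<?php\necho \"aaaaaaaaaa\".\"bbbbbbbbbb\";\n", 20);
```

The sketch only looks at the token directly after the dot, so whitespace around the operator and indentation of the wrapped line are left for a real implementation.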

See also