
Put your Laravel controllers on a diet


MVC frameworks are a tremendously useful tool for modern web development. They offer easy ways to carry out common tasks, and enforce a certain amount of structure on a project.

However, that doesn’t mean using them makes you immune to bad practices, and it’s quite easy to wind up falling into certain anti-patterns. Probably the most common is the Fat Controller.

What is a fat controller?

When I first started out doing professional web development, CodeIgniter 2 was the first MVC framework I used. While I hadn’t used it before, I was familiar with the general concept of MVC. However, I didn’t appreciate that the model layer, the recommended home for business logic, wasn’t necessarily the same thing as the database models.

As such, my controllers became a dumping ground for anything that didn’t fit into the models. If it didn’t interact with the database, I put it in the controller. They quickly became bloated, with many methods running to hundreds of lines of code. The code base became hard to understand, and when I was under the gun on projects I found myself copying and pasting functionality between controllers, making the situation even worse. Not having an experienced senior developer on hand to offer criticism and advice, it was a long time before I realised that this was a problem or how to avoid it.

Why are fat controllers bad?

Controllers are meant to be simple glue code that receives requests and returns responses. Anything else should be handed off to the model layer. As noted above, however, that’s not the same as putting it in the models. Your model layer can consist of many different classes, not just your Eloquent models, and you should not fall into the trap of thinking your application should consist of little more than models, views and controllers.

Placing business logic in controllers can be bad for many reasons:

  • Code in controllers can be difficult to write automated tests for
  • Any logic in a controller method may need to be repeated elsewhere if the same functionality is needed for a different route, unless it’s in a private or protected method that is called from elsewhere, in which case it’s very hard to test in isolation
  • Placing it in the controller makes it difficult to pull out and re-use on a different project
  • Making your controller methods too large makes them complex and hard to follow

As a general rule of thumb, I find that 10 lines of code for any one method for a controller is where it starts getting a bit much. That said, it’s not a hard and fast rule, and for very small projects it may not be worthwhile. But if a project is large and needs to be maintained for any reasonable period of time, you should take the trouble to ensure your controllers are as skinny as is practical.

Nowadays Laravel is my go-to framework and I’ve put together a number of strategies for avoiding the fat controller anti-pattern. Here are some examples of how I would move various parts of the application out of my controllers.

Validation

Laravel has a nice, easy way of getting validation out of the controllers. Just create a custom form request for your input data, as in this example:

<?php

namespace App\Http\Requests;

use Illuminate\Foundation\Http\FormRequest;

class CreateRequest extends FormRequest
{
    /**
     * Determine if the user is authorized to make this request.
     *
     * @return bool
     */
    public function authorize()
    {
        return true;
    }

    /**
     * Get the validation rules that apply to the request.
     *
     * @return array
     */
    public function rules()
    {
        return [
            'email' => 'required|email'
        ];
    }
}

Then type-hint the form request in the controller method, instead of Illuminate\Http\Request:

<?php

namespace App\Http\Controllers;

use App\Http\Requests\CreateRequest;

class HomeController extends Controller
{
    public function store(CreateRequest $request)
    {
        // Process request here..
    }
}

Database access and caching

For non-trivial applications I normally use decorated repositories to handle caching and database access in one place. That way my caching and database layers are abstracted out into separate classes, and caching is nearly always handled seamlessly without having to do much work.
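As a rough sketch of the idea (the interface, class and method names here are invented for illustration), a caching decorator wraps a repository that implements the same contract, so the rest of the application never knows the cache exists:

<?php

namespace App\Repositories\Decorators;

use App\Contracts\Repositories\Foo as Contract;
use Illuminate\Contracts\Cache\Repository as Cache;

class CachedFoo implements Contract
{
    protected $repository;

    protected $cache;

    public function __construct(Contract $repository, Cache $cache)
    {
        $this->repository = $repository;
        $this->cache = $cache;
    }

    /**
     * Fetch all items, caching the result.
     */
    public function all()
    {
        return $this->cache->remember('foo.all', 60, function () {
            return $this->repository->all();
        });
    }
}

The decorator is then bound to the contract in a service provider, in much the same way as the persister example below.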

Complex object creation logic

If I have a form or API endpoint that needs to:

  • Create more than one object
  • Transform the incoming data in some way
  • Or is non-trivial in any other way

I will typically pull it out into a separate persister class. First, you should create an interface for this persister class:

<?php

namespace App\Contracts\Persisters;

use Illuminate\Database\Eloquent\Model;

interface Foo
{
    /**
     * Create a new Model
     *
     * @param array $data
     * @return Model
     */
    public function create(array $data);

    /**
     * Update the given Model
     *
     * @param array $data
     * @param Model $model
     * @return Model
     */
    public function update(array $data, Model $model);
}

Then create the persister class itself:

<?php

namespace App\Persisters;

use Illuminate\Database\Eloquent\Model;
use App\Contracts\Repositories\Foo as Repository;
use App\Contracts\Persisters\Foo as FooContract;
use Illuminate\Database\DatabaseManager;
use Carbon\Carbon;

class Foo implements FooContract
{
    protected $repository;

    protected $db;

    public function __construct(DatabaseManager $db, Repository $repository)
    {
        $this->db = $db;
        $this->repository = $repository;
    }

    /**
     * Create a new Model
     *
     * @param array $data
     * @return Model
     */
    public function create(array $data)
    {
        $this->db->beginTransaction();
        $model = $this->repository->create([
            'date' => Carbon::parse($data['date'])->toDayDateTimeString(),
        ]);
        $this->db->commit();
        return $model;
    }

    /**
     * Update the given Model
     *
     * @param array $data
     * @param Model $model
     * @return Model
     */
    public function update(array $data, Model $model)
    {
        $this->db->beginTransaction();
        $updatedModel = $this->repository->update([
            'date' => Carbon::parse($data['date'])->toDayDateTimeString(),
        ], $model);
        $this->db->commit();
        return $updatedModel;
    }
}

Then you can set up the persister in a service provider so that type-hinting the interface returns the persister:

<?php

namespace App\Providers;

use Illuminate\Support\ServiceProvider;

class AppServiceProvider extends ServiceProvider
{
    /**
     * Bootstrap any application services.
     *
     * @return void
     */
    public function boot()
    {
        //
    }

    /**
     * Register any application services.
     *
     * @return void
     */
    public function register()
    {
        $this->app->bind(
            'App\Contracts\Persisters\Foo',
            'App\Persisters\Foo'
        );
    }
}

This approach means that complex logic, such as creating multiple related objects, can be handled in a consistent way, even if it needs to be called from multiple places.

Triggering actions as a result of something

Events are tailor-made for this use case, and Laravel documents them very well, so I won’t repeat it here. Suffice to say, if something needs to happen, but the response sent by the application doesn’t necessarily depend on it returning something immediately, then it’s probably worth considering making it an event. If it’s going to be called from multiple places, it’s even more worthwhile.

For instance, if you have a contact form, it’s worth taking the time to create an event for when a new contact is received, and handle processing the contact within the listener for that event. Also, doing so means you can queue that event and have it handled outside the context of the application, so that it responds to the user more quickly. If you’re sending an acknowledgement email for a new user registration, you don’t need to wait for that email to be sent before you return the response, so queueing it can improve response times.
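As a rough sketch of that contact form example (the event and listener names are invented for illustration), the listener implements the ShouldQueue interface so the work is pushed onto the queue:

<?php

namespace App\Listeners;

use App\Events\ContactReceived;
use Illuminate\Contracts\Queue\ShouldQueue;

class ProcessContact implements ShouldQueue
{
    /**
     * Handle the event on the queue, outside the request cycle.
     */
    public function handle(ContactReceived $event)
    {
        // Save the contact, notify staff, send the acknowledgement email, etc
    }
}

The controller then only needs to call event(new ContactReceived($contact)) and return its response straight away.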

Interacting with third-party services

If you have some code that needs to interact with a third-party service or API, it can get quite complex, especially if you need to process the content in some way. It therefore makes sense to pull that functionality out into a separate class.

For instance, say you have some code in your controller that uses an HTTP client to fetch some data from a third-party API and display it in the view:

public function index(Request $request)
{
    $data = $this->client->get('http://api.com/api/items');
    $items = [];
    foreach ($data as $k => $v) {
        $item = [
            'name' => $v['name'],
            'description' => $v['data']['description'],
            'tags' => $v['data']['metadata']['tags']
        ];
        $items[] = $item;
    }
    return view('template', [
        'items' => $items
    ]);
}

This is a very small example (and a lot simpler than most real-world instances of this issue), but it illustrates the principle. Not only does this code bloat the controller, it might also be used elsewhere in the application, and we don’t want to copy and paste it elsewhere - therefore it makes sense to extract it to a service class.

<?php

namespace App\Services;

use GuzzleHttp\ClientInterface as GuzzleClient;

class Api
{
    protected $client;

    public function __construct(GuzzleClient $client)
    {
        $this->client = $client;
    }

    public function fetch()
    {
        $data = $this->client->get('http://api.com/api/items');
        $items = [];
        foreach ($data as $k => $v) {
            $item = [
                'name' => $v['name'],
                'description' => $v['data']['description'],
                'tags' => $v['data']['metadata']['tags']
            ];
            $items[] = $item;
        }
        return $items;
    }
}

Our controller can then type-hint the service and refactor that functionality out of the method:

public function __construct(App\Services\Api $api)
{
    $this->api = $api;
}

public function index(Request $request)
{
    $items = $this->api->fetch();
    return view('template', [
        'items' => $items
    ]);
}

Including common variables in the view

If data is needed in more than one view (eg showing the user’s name on every page when logged in), consider using view composers to retrieve this data rather than fetching it in the controller. That way you’re not having to repeat that logic in more than one place.
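For instance, a composer registered in a service provider could attach the current user to the views that need it (the provider and view names here are placeholders):

<?php

namespace App\Providers;

use Illuminate\Support\Facades\View;
use Illuminate\Support\ServiceProvider;

class ViewComposerServiceProvider extends ServiceProvider
{
    public function boot()
    {
        // Share the logged-in user with the views that display it
        View::composer(['partials.nav', 'partials.sidebar'], function ($view) {
            $view->with('user', auth()->user());
        });
    }
}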

Formatting content for display

Logically this belongs in the view layer, so you should write a helper to handle things like formatting dates. For more complex stuff, such as formatting HTML, you should be doing this in Blade (or another templating system, if you’re using one) - for instance, when generating an HTML table, you should consider using a view partial to loop through them. For particularly tricky functionality, you have the option of writing a custom Blade directive.
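For example, a small helper for dates, plus a custom Blade directive that uses it, might look something like this (the names are my own, not anything Laravel ships with):

<?php

use Carbon\Carbon;
use Illuminate\Support\Facades\Blade;

if (!function_exists('format_date')) {
    /**
     * Format a date for display in the view layer.
     */
    function format_date($date)
    {
        return Carbon::parse($date)->toDayDateTimeString();
    }
}

// In a service provider's boot() method
Blade::directive('datetime', function ($expression) {
    return "<?php echo format_date($expression); ?>";
});

In a template you’d then write @datetime($post->created_at) rather than repeating the formatting logic.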

The same applies for rendering other content - for rendering JSON you should consider using API resources or Fractal to get any non-trivial logic for your API responses out of the controller. Blade templates can also work for non-HTML content such as XML.

Anything else…

These examples are largely to get you started, and there will be occasions where something doesn’t fall into any of the above categories. However, the same principle applies. Your controllers should stick to just receiving requests and sending responses, and anything else should normally be deferred to other classes.

Fat controllers make developer’s lives very difficult, and if you want your code base to be easily maintainable, you should be willing to refactor them ruthlessly. Any functionality you can pull out of the controller becomes easier to reuse and test, and as long as you name your classes and methods sensibly, they’re easier to understand.


Unit testing your Laravel controllers


In my previous post I mentioned some strategies for refactoring Laravel controllers to move unnecessary functionality elsewhere. However, I didn’t cover testing them. In this post I will demonstrate the methodology I use for testing Laravel controllers.

Say we have the following method in a controller:

public function store(Request $request)
{
    $document = new Document($request->only([
        'title',
        'text',
    ]));
    $document->save();
    event(new DocumentCreated($document));
    return redirect()->route('/');
}

This controller method does three things:

  • Return a response
  • Create a model instance
  • Fire an event

Our tests therefore need to pass it all its external dependencies and check it carries out the required actions.

First we fake the event facade:

    Event::fake();

Next, we create an instance of Illuminate\Http\Request to represent the HTTP request passed to the controller:

$request = Request::create('/store', 'POST', [
    'title' => 'foo',
    'text' => 'bar',
]);

If you’re using a custom form request class, you should instantiate that in exactly the same way.

Then, instantiate the controller, and call the method, passing it the request object:

$controller = new MyController();
$response = $controller->store($request);

You can then test the response from the controller. You can test the status code like this:

    $this->assertEquals(302, $response->getStatusCode());

You may also need to check that the content of the response matches what you expect to see, by retrieving it with $response->getContent().

Next, retrieve the newly created model instance, and verify it exists:

$document = Document::where('title', 'foo')->first();
$this->assertNotNull($document);

You can also use assertEquals() to check the attributes on the model if appropriate. Finally, you check the event was fired:

Event::assertDispatched(DocumentCreated::class, function ($event) use ($document) {
    return $event->document->id === $document->id;
});

This test should not concern itself with any functionality triggered by the event, only that the event gets triggered. The event should have separate unit tests in which the event is triggered, and then the test verifies it carried out the required actions.
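Putting those steps together, the whole test might look something like this (assuming the controller and event names used above):

public function testStore()
{
    Event::fake();

    $request = Request::create('/store', 'POST', [
        'title' => 'foo',
        'text' => 'bar',
    ]);

    $controller = new MyController();
    $response = $controller->store($request);

    $this->assertEquals(302, $response->getStatusCode());

    $document = Document::where('title', 'foo')->first();
    $this->assertNotNull($document);

    Event::assertDispatched(DocumentCreated::class, function ($event) use ($document) {
        return $event->document->id === $document->id;
    });
}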

Technically, these don’t quite qualify as being unit tests because they hit the database, but they should cover the controller adequately. To make them true unit tests, you’d need to implement the repository pattern for the database queries rather than using Eloquent directly, and mock the repository, so you can assert that the mocked repository receives the right data and have it return the expected response.

Here is how you might do that with Mockery:

$mock = Mockery::mock('App\Contracts\Repositories\Document');
$mock->shouldReceive('create')->with([
    'title' => 'foo',
    'text' => 'bar',
])->once()->andReturn(true);

$controller = new MyController($mock);

As long as your controllers are kept as small as possible, it’s generally not too hard to test them. Unfortunately, fat controllers become almost impossible to test, which is another good reason to avoid them.

Check your code base is PHP 7 ready with PHP Compatibility


I’ve recently started a new job and as part of that I’m working on a rather substantial legacy code base. In fact, it was so legacy that it was still in Subversion - needless to say the very first thing I did was migrate it to Git. One of the jobs on our radar for this code base is to migrate it from PHP 5.4 to 5.6, and subsequently to PHP 7. I’ve been using it locally in 5.6 without issue so far, but I’ve also been looking around for an automated tool to help catch potential problems.

I recently discovered PHP Compatibility which is a set of sniffs for PHP CodeSniffer that can be used to detect code that will be problematic in a particular PHP version. As I use CodeSniffer extensively already, it’s a good fit for my existing toolset.

To install it, add the following dependencies to your composer.json:

"require-dev": {
    "dealerdirect/phpcodesniffer-composer-installer": "^0.4.3",
    "squizlabs/php_codesniffer": "^2.5",
    "wimg/php-compatibility": "^8.1"
},

Then update your phpcs.xml to look something like this:

<ruleset name="PHP_CodeSniffer">
    <description>The coding standard for my app.</description>
    <file>./</file>
    <arg value="np"/>
    <rule ref="PSR2"/>
    <rule ref="PHPCompatibility"/>
    <config name="testVersion" value="7.2-"/>
</ruleset>

As you can see, it’s possible to use it alongside existing coding standards such as PSR2. Note the testVersion config key - the value specified is the PHP version we’re testing against. Here we’re specifying PHP 7.2 and above (the trailing hyphen means that version or newer).

Obviously, the very best way to guard against breakages in newer versions of PHP is to have a comprehensive test suite, but legacy code bases by definition tend to have little or no tests. By using PHP Compatibility, you should at least be able to catch syntax problems without having to audit the code base manually.

Using stored procedures in your web app


In the last few days I’ve done something I’ve never done before, namely written a stored procedure for a web app. Like most web developers, I know enough about SQL to be able to formulate some fairly complex queries, but I hadn’t really touched on control flow functions or stored procedures, and in my experience they tend to be the province of the dedicated database administrator, not us web devs, who will typically delegate more complex functionality to our application code.

In this case, there were a number of factors influencing my decision to use a stored procedure for this:

  • The application was a legacy application which had been worked on by developers of, shall we say, varying skill levels. As a result the database schema was badly designed, with no prospect of changing it without causing huge numbers of breakages
  • The query in question was used to generate a complex report that was quite time-consuming, therefore the optimisations from using a stored procedure were worthwhile.
  • The report required that data be grouped by a set of categories which were stored in a separate table, which meant the table had to be pivoted (transformed from rows to columns), resulting in an incredibly complex dynamic query that had to be constructed on the fly by concatenating different SQL strings. In PostgreSQL, this can be done fairly easily using the crosstab function, but MySQL doesn’t have native support for anything like this.

Historically, one issue with using stored procedures has been that it kept business logic out of the application code, meaning they are not stored in version control. However, most modern frameworks provide some support for migrations, and since they are intended to be used to make changes to the database, they are the obvious place to define the stored procedure. This particular application was built with an older framework that didn’t come with migrations, so we’d installed Phinx to handle those for us. Initially, I defined the stored procedure inside a migration that ran a raw query to create the stored procedure, as in this example:

public function up()
{
    $query = <<<EOF
CREATE PROCEDURE IF NOT EXISTS foo()
BEGIN
    SELECT * FROM foo;
END
EOF;
    $this->execute($query);
}

public function down()
{
    $this->execute('DROP PROCEDURE IF EXISTS foo');
}

Once this is done, you can then use your framework’s particular support for raw queries to call CALL foo() whenever your stored procedure needs to be executed.
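For instance, with Laravel’s query builder or with plain PDO, calling it looks something like this (a sketch rather than this application’s actual code):

// With Laravel's query builder
$results = DB::select('CALL foo()');

// Or with plain PDO
$statement = $pdo->query('CALL foo()');
$results = $statement->fetchAll(PDO::FETCH_ASSOC);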

However, we soon ran into an issue. It turns out mysqldump doesn’t export stored procedures by default, so there was a risk that anyone working on the code base might import the database from an SQL file and not get the migrations. I’d used the Symfony Console component to create a simple command-line tool, reminiscent of Laravel’s Artisan, so I used that to create a command to set up the stored procedure, amended the migration to call that command, and placed a check in the application where the procedure was called, so that if it was not defined the command would be run and the procedure created. That way a missing procedure wouldn’t be an issue in most cases.

Having now had experience using stored procedures in a web application, there are a number of issues they raise:

  • It’s hard to make queries flexible, whereas with something like Eloquent it’s straightforward to conditionally apply WHERE statements.
  • While storing them in migrations is a practical solution, if the database is likely to be imported rather than created from scratch during development it can be problematic.
  • They aren’t easily portable, not just between database management systems, but between different versions - the production server was using an older version of MySQL, and it failed to create the procedure. It’s therefore good practice for your migrations to check the procedure was created successfully and raise a noisy exception if they failed.

Conversely, they do bring certain benefits:

  • For particularly complex transactions that don’t change, such as generating reports, they are a good fit since they reduce the amount of data that needs to be sent to the database and allow the query to be pre-optimised somewhat.
  • If a particular query is unusually expensive, is called often, and can’t be cached, it may improve performance to make it a stored procedure.
  • Doing a query in a for loop is usually a very big no-no. However, if there really is no way to avoid it (and this should almost never happen), it would make sense to try to do it in a stored procedure using SQL rather than in application code since that would minimise the overhead.
  • If multiple applications need to work with the same database, using stored procedures for queries in more than one application removes the need to reimplement or copy over the code for the query in the second application - they can just call the same procedure, and if it needs to be changed it need only be done once.

Honestly, I’m not sure I’ll ever again come across a scenario where using a stored procedure in a web application would be beneficial, but it’s been very interesting delving into aspects of SQL that I don’t normally touch on, and I’ve picked up some rarely-used SQL statements that I hadn’t used before, such as GROUP_CONCAT() and CASE. With the widespread adoption of migrations in most frameworks, I don’t think the argument that using stored procedures keeps application logic out of version control holds water any more, since developers can generally be trusted to store changes to database structure in their migrations and not start messing them around, and the same applies to stored procedures. Report generation seems to be the ideal use case, since it invariably involves complex queries that run regularly and don’t change often, and this is where I expect it would be most likely I’d have cause to use them again.

Making Wordpress less shit


I’m not going to sugarcoat it. As a developer, I think Wordpress is shit, and I’m not alone in that opinion. Its code base dates from a time before many of the developments of the last few years that have hugely improved PHP as a language, as well as the surrounding ecosystem such as Composer and PSR-FIG, and it’s likely it couldn’t adopt many of those without making backward-incompatible changes that would affect its own ecosystem of plugins and themes. It actively forces you to write code that is far less elegant and efficient than what you might write with a proper framework such as Laravel, and the quality of many of the plugins and themes around is dire.

Unfortunately, it’s also difficult to avoid. Over a quarter of all websites run Wordpress, and most developers will have to work with it at some point in their careers. However, there are ways that you can improve your experience when working with Wordpress somewhat. In this post I’m going to share some methods you can use to make Wordpress less painful to use.

This isn’t a post about the obvious things like “Use the most recent version of PHP you can”, “Use SSL”, “Install this plugin”, “Use Vagrant/Lando” etc - I’m assuming you already know stuff like that for bog standard Wordpress development. Nor is it about actually developing Wordpress plugins or themes. Instead, this post is about bringing your Wordpress development workflow more into line with how you develop with MVC frameworks like Laravel, so that you have a better experience working with and maintaining Wordpress sites. We can’t solve the fundamental issues with Wordpress, but we can take some steps to make it easier to work with.

Use Bedrock

Bedrock is still Wordpress, but reorganized so that:

  • The Wordpress core, plugins and themes can be managed with Composer for easier updates
  • The configuration can be done with a .env file that can be kept out of version control, rather than putting it in wp-config.php
  • The web root is isolated to limit access to the files

In short, it optimises Wordpress for how modern developers work. Arguably that’s at the expense of site owners, since it makes it harder for non-developers to manage the site; however, for any Wordpress site that’s sufficiently complex to need development work done, that’s a trade-off worth making. I’ve been involved in projects where Wordpress got used alongside an MVC framework for some custom functionality, and in my experience it caused a world of problems when updating plugins and themes because version control would get out of sync, so managing them with Composer instead would have been a huge win.

Using Bedrock means that if you have a parent theme you use all the time, or custom plugins of your own, you can install them using Composer by adding the Git repositories to your composer.json, making it easier to re-use functionality you’ve already developed. It also makes recovery easier in the event of the site being compromised, because the files outside the vendor directory will be in version control, and you can delete the vendor directory and re-run composer install to replace the rest. By comparison, with a regular Wordpress install, if it’s compromised you can’t always be certain you’ve got all of the files that have been changed. Also, keeping Wordpress up to date becomes a simple matter of running composer update regularly, verifying it hasn’t broken anything, and then deploying it to production.

Bedrock uses WPackagist, which regularly scans the Wordpress Subversion repository for plugins and themes, so at least for plugins and themes published on the Wordpress site, it’s easy to install them. Paid plugins may be more difficult - I’d be inclined to put those in a private Git repository and install them from there, although I’d be interested to know if anyone else uses another method for that.
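For example, the relevant parts of a Bedrock composer.json might look something like this - Bedrock already registers the WPackagist repository, and the paid plugin and its repository URL here are placeholders:

{
    "repositories": [
        {
            "type": "composer",
            "url": "https://wpackagist.org"
        },
        {
            "type": "vcs",
            "url": "git@example.com:acme/paid-plugin.git"
        }
    ],
    "require": {
        "wpackagist-plugin/akismet": "^4.0",
        "wpackagist-theme/twentyseventeen": "^1.0",
        "acme/paid-plugin": "^1.0"
    }
}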

If you can’t use Bedrock, use WP CLI

If for any reason you can’t use Bedrock for a site, then have a look at WP CLI. On the server, you can use it to install and manage both plugins and themes, as well as the Wordpress core.

It’s arguably even more useful locally, as it can be used to generate scaffolding for plugins, themes (including child themes based on an existing theme), and components such as custom post types or taxonomies. In short, if you do any non-trivial amount of development with Wordpress you’ll probably find a use for it. Even if you can use Bedrock, you’re likely to find WP CLI handy for the scaffolding.
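A few examples of the sort of thing it can do, assuming it’s installed and you’re in the root of a Wordpress install:

$ wp core update
$ wp plugin install akismet --activate
$ wp scaffold child-theme my-child-theme --parent_theme=twentyseventeen
$ wp scaffold post-type book --theme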

Upgrade the password encryption

I said this wouldn’t be about using a particular plugin, but this one is too important. Wordpress’s password hashing still relies on MD5, which is far too weak to be considered safe. Unfortunately, Wordpress still supports PHP versions as old as 5.2, and until they drop it they can’t really switch to something more secure.

wp-password-bcrypt overrides the password functionality of Wordpress to use Bcrypt, which is what modern PHP applications use. As a result, the hashes are considerably stronger. Given that Wordpress is a common target for hackers, it’s prudent to ensure your website is as secure as you can possibly make it.

If you use Bedrock, it uses this plugin by default, so it’s already taken care of for you.

Use a proper templating system

PHP is a weird hybrid of a programming language and a templating system. As such, it’s all too easy to wind up with too much logic in your view layer, so it’s a good idea to use a proper templating system if you can. Unfortunately, Wordpress doesn’t support that out of the box.

However, there are some third-party solutions for this. Sage uses Laravel’s Blade templating system (and also comes with Webpack preconfigured), while Timber lets you use Twig.

Use the Wordpress REST API for AJAX where you can

Version 4.7 of Wordpress introduced the Wordpress REST API, allowing the data to be exposed via RESTful endpoints. As a result, it should now be possible to build more complex and powerful user interfaces for that data. For instance, if you were using Wordpress to build a site for listing items for sale, you could create a single-page web app for the front end using React.js and Redux, use the API to submit new items, and then display the submitted items.

I’m not a fan of the idea the Wordpress developers seem to have of trying to make it some kind of all-singing, all-dancing universal platform for the web, and the REST API seems to be part of that idea, but it does make it a lot easier than it was in the past to do something a bit out of the ordinary with Wordpress. In some cases it might be worth using Wordpress as the backend for a headless CMS, and the REST API makes that a practical approach. For simpler applications that just need to make a few AJAX calls, using the REST API is generally going to be more elegant and practical than any other approach to AJAX with Wordpress. It’s never going to perform as well or be as elegant as a custom-built REST API, but it’s definitely a step forward compared to the hoops you used to have to jump through to handle AJAX requests in Wordpress.

Summary

Wordpress is, and will remain for the foreseeable future, a pain in the backside to develop for compared to something like Laravel, and I remain completely mystified by the number of people who seem to think it’s the greatest thing since sliced bread. However, it is possible to make things better if you know how - it’s just that some of this stuff seems to be relatively obscure. In particular, discovering Bedrock is potentially game-changing because it makes it so much easier to keep the site under version control.

Rendering different views for mobile and desktop clients in Laravel


This was a bit of a weird post to write. It started out as an explanation of how I resolved an issue years ago on a CodeIgniter site, amended to work with Laravel. In the process, I realised it made sense to implement it as middleware, and I ended up pulling it out into a package. However, it’s still useful to understand the concept behind it, even if you prefer to just install the complete package, because your needs might be slightly different to mine.

On web development forums, it’s quite common to see variants of the following question:

How do I redirect a user on a mobile device to a mobile version of the site?

It’s quite surprising that this is still an issue that crops up. For many years, it’s been widely accepted that the correct solution for this problem is responsive design. However, there are ways in which this may not be adequate for certain applications. For instance, you may have an application where certain functionality only makes sense in a certain context, or your user interface may need to be optimised for specific environments.

The trouble is that a dedicated mobile site isn’t a good idea either. Among other things, it means that users can’t easily use the same bookmarks between desktop and mobile versions, and can result in at least some of the server-side logic being duplicated.

Fortunately, there is another way - dynamic serving allows you to render different content based on the user agent. You can also easily enable users to switch between desktop and mobile versions themselves if their client isn’t detected correctly or they just prefer the other one. I implemented this years ago for a CodeIgniter site. Here’s how you might implement it in Laravel, although if you understand the principle behind it, it should be easy to adapt for any other framework.

Don’t try to implement mobile user agent detection yourself. Instead, find an implementation that’s actively maintained and install it with Composer. That way you can be reasonably sure that as new mobile devices come onto the market the package will detect them correctly as long as you keep it up to date. I would be inclined to go for Agent, since it has Laravel support baked in.

We could just use Agent to serve up different content based on the user agent. However, user agent strings are notoriously unreliable - if a new mobile device appears and it doesn’t show up correctly in Agent, users could find themselves forced to use the wrong UI. Instead, we need to check for a flag in the session that indicates if the session is mobile or not. If it’s not set, we set it based on the user agent. That way, if you need to offer functionality to override the detected session type, you can just update that session variable to correct that elsewhere in the application. I would be inclined to use a button in the footer that makes an AJAX request to toggle the flag, then reloads the page.
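That toggle only needs a tiny route or controller method; something along these lines would do (the route itself is up to you):

public function toggleMobile()
{
    // Flip the flag set by the middleware
    session(['mobile' => !session('mobile')]);
    return response()->json(['mobile' => session('mobile')]);
}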

You also need to set the HTTP response header Vary: User-Agent to notify clients (including not only search engines, but also proxies at either end of the connection, such as Varnish or Squid) that the response will differ by user agent, in order to prevent users being served the wrong version.

Middleware is the obvious place to do this. Here’s a middleware that sets the session variable and the appropriate response headers:

<?php

namespace App\Http\Middleware;

use Closure;
use Jenssegers\Agent\Agent;
use Illuminate\Contracts\Session\Session;

class DetectMobile
{
    protected $agent;

    protected $session;

    public function __construct(Agent $agent, Session $session)
    {
        $this->agent = $agent;
        $this->session = $session;
    }

    /**
     * Handle an incoming request.
     *
     * @param \Illuminate\Http\Request $request
     * @param \Closure $next
     * @return mixed
     */
    public function handle($request, Closure $next)
    {
        if (!$this->session->exists('mobile')) {
            if ($this->agent->isMobile() || $this->agent->isTablet()) {
                $this->session->put('mobile', true);
            } else {
                $this->session->put('mobile', false);
            }
        }
        $response = $next($request);
        return $response->setVary('User-Agent');
    }
}
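For the middleware to actually run it also needs registering; since it relies on the session, appending it to the web middleware group in app/Http/Kernel.php is the natural place:

// app/Http/Kernel.php
protected $middlewareGroups = [
    'web' => [
        // ... the existing middleware ...
        \App\Http\Middleware\DetectMobile::class,
    ],
];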

Now, you could then work with the session directly to retrieve the mobile flag, but as you may be working in the view, it makes sense to create helpers for this:

<?php

if (!function_exists('is_mobile')) {
    function is_mobile()
    {
        $session = app()->make('Illuminate\Contracts\Session\Session');
        return $session->get('mobile') == true;
    }
}

if (!function_exists('is_desktop')) {
    function is_desktop()
    {
        $session = app()->make('Illuminate\Contracts\Session\Session');
        return $session->get('mobile') == false;
    }
}

Now, if you want to serve up completely different views, you can use these helpers in your controllers. If you instead want to selectively show and hide parts of the UI based on the user agent, you can instead use these in the views to determine what parts of the page should be shown.

Agent offers more functionality than just detecting if a user agent is a mobile or desktop device, and you may find this useful as a starting point for developing middleware for detecting bots, or showing different content to users based on their device type or operating system. If you just need to detect if a user is a mobile or desktop client, this middleware should be sufficient.

Console applications with the Symfony Console component


Recently I’ve had the occasion to add a series of console commands to a legacy application. This can be made straightforward by using the Symfony console component. In this post I’ll demonstrate how to write a simple console command for clearing a cache folder.

The first step is to install the Console component:

$ composer require symfony/console

Then we write the main script for the application. I usually save mine as console - note that we don’t want to have to type out a file extension, so instead we use the shebang:

#!/usr/bin/env php
<?php

require __DIR__.'/vendor/autoload.php';

use Symfony\Component\Console\Application;

define('CONSOLE_ROOT', __DIR__);

$app = new Application();
$app->run();

In this case, I’ve defined CONSOLE_ROOT as the directory in which the console command is run - that way, the commands can use it to refer to the application root.

We can then run our console application as follows:

$ php console
Console Tool
Usage:
command [options] [arguments]
Options:
-h, --help Display this help message
-q, --quiet Do not output any message
-V, --version Display this application version
--ansi Force ANSI output
--no-ansi Disable ANSI output
-n, --no-interaction Do not ask any interactive question
-v|vv|vvv, --verbose Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
Available commands:
help Displays help for a command
list Lists commands

This displays the available commands, but you’ll note that there are none except for help and list. We’ll remedy that. First, we’ll register a command:

$app->add(new App\Console\ClearCacheCommand);

This has to be done in console, after we create $app, but before we run it.
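So the console script now looks like this:

#!/usr/bin/env php
<?php

require __DIR__.'/vendor/autoload.php';

use Symfony\Component\Console\Application;

define('CONSOLE_ROOT', __DIR__);

$app = new Application();
$app->add(new App\Console\ClearCacheCommand);
$app->run();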

Don’t forget to update the autoload section of your composer.json to register the namespace:

"autoload": {
    "psr-4": {
        "App\\Console\\": "src/Console/"
    }
},

Then create the class for that command. This class must extend Symfony\Component\Console\Command\Command, and must have two methods:

  • configure()
  • execute()

In addition, the execute() method must accept two arguments, an instance of Symfony\Component\Console\Input\InputInterface, and an instance of Symfony\Component\Console\Output\OutputInterface. These are used to retrieve input and display output.

Let’s write our command:

<?php

namespace App\Console;

use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class ClearCacheCommand extends Command
{
    protected function configure()
    {
        $this->setName('cache:clear')
            ->setDescription('Clears the cache')
            ->setHelp('This command clears the application cache');
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $dir = CONSOLE_ROOT.DIRECTORY_SEPARATOR.'cache';
        $this->deleteTree($dir);
        $output->writeln('Cache cleared');
    }

    private function deleteTree($dir)
    {
        $files = array_diff(scandir($dir), array('.', '..'));
        foreach ($files as $file) {
            (is_dir("$dir/$file")) ? $this->deleteTree("$dir/$file") : unlink("$dir/$file");
        }
        return rmdir($dir);
    }
}

As you can see, in the configure() method, we set the name, description and help text for the command.

The execute() method is where the actual work is done. In this case, we have some code that needs to be called recursively, so we have to pull it out into a private method. Once that’s done we use $output->writeln() to write a line to the output.

Now, if we run our console task, we should see our new command:

$ php console
Console Tool
Usage:
command [options] [arguments]
Options:
-h, --help Display this help message
-q, --quiet Do not output any message
-V, --version Display this application version
--ansi Force ANSI output
--no-ansi Disable ANSI output
-n, --no-interaction Do not ask any interactive question
-v|vv|vvv, --verbose Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
Available commands:
help Displays help for a command
list Lists commands
cache
cache:clear Clears the cache

And we can see it in action too:

$ php console cache:clear
Cache cleared

For commands that need to accept additional arguments, you can define them in the configure() method:

$this->addArgument('file', InputArgument::REQUIRED, 'Which file do you want to delete?')

Then, you can access it in the execute() method using InputInterface:

$file = $input->getArgument('file');

This tutorial is just skimming the surface of what you can do with the Symfony Console components - indeed, many other console interfaces, such as Laravel’s Artisan, are built on top of it. If you have a legacy application built in a framework that lacks any sort of console interface, such as CodeIgniter, then you can quite quickly produce basic console commands for working with that application. The documentation is very good, and with a little work you can soon have something up and running.

Building a letter classifier in PHP with Tesseract OCR and PHP ML


PHP isn’t the first language that springs to mind when it comes to machine learning. However, it is practical to use PHP for machine learning purposes. In this tutorial I’ll show you how to build a pipeline for classifying letters.

The brief

Before I was a web dev, I was a clerical worker for an FTSE-100 insurance company, doing a lot of work that nowadays is possible to automate away, if you know how. When they received a letter or other communication from a client, it would be sent to be scanned on. Once scanned, a human would have to look at it to classify it, eg was it a complaint, a request for information, a request for a quote, or something else, as well as assign it to a policy number. Let’s imagine we’ve been asked to build a proof of concept for automating this process. This is a good example of a real-world problem that machine learning can help with.

As this is a proof of concept we aren’t looking to build a web app for this - for simplicity’s sake this will be a command-line application. Unlike emails, letters don’t come in an easily machine-readable format, so we will be receiving them as PDF files (since they would have been scanned on, this is a reasonable assumption). Feel free to mock up your own example letters using your own classifications, but I will be classifying letters into four groups:

  • Complaints - letters expressing dissatisfaction
  • Information requests - letters requesting general information
  • Surrender quotes - letters requesting a surrender quote
  • Surrender forms - letters requesting surrender forms

Our application will therefore take in a PDF file at one end, and perform the following actions on it:

  • Convert the PDF file to a PNG file
  • Use OCR (optical character recognition) to convert the letter to plain text
  • Strip out unwanted whitespace
  • Extract any visible policy number from the text
  • Use a machine learning library to classify the letter, having taught it using prior examples

Sound interesting? Let’s get started…

Introducing pipelines

As our application will be carrying out a series of discrete steps on our data, it makes sense to use the pipeline pattern for this project. Fortunately, the PHP League have produced an excellent package implementing this. We can therefore create a single class for each step in the process and have it handle that in isolation.

We’ll also use the Symfony Console component to implement our command-line application. For our machine learning library we will be using PHP ML, which requires PHP 7.1 or greater. For OCR, we will be using Tesseract, so you will need to install the underlying Tesseract OCR library, as well as support for your language. On Ubuntu you can install these as follows:

$ sudo apt-get install tesseract-ocr tesseract-ocr-eng

This assumes you are using English, however you should be able to find packages to support many other languages. Finally, we need ImageMagick to be installed in order to convert PDF files to PNGs.

Your composer.json should look something like this:

{
    "name": "matthewbdaly/letter-classifier",
    "description": "Demo of classifying letters in PHP",
    "type": "project",
    "require": {
        "league/pipeline": "^0.3.0",
        "thiagoalessio/tesseract_ocr": "^2.2",
        "php-ai/php-ml": "^0.6.2",
        "symfony/console": "^4.0"
    },
    "require-dev": {
        "phpspec/phpspec": "^4.3",
        "psy/psysh": "^0.8.17"
    },
    "autoload": {
        "psr-4": {
            "Matthewbdaly\\LetterClassifier\\": "src/"
        }
    },
    "license": "MIT",
    "authors": [
        {
            "name": "Matthew Daly",
            "email": "matthewbdaly@gmail.com"
        }
    ]
}

Next, let’s write the outline of our command-line client. We’ll load a single class for our processor command. Save this as app:

#!/usr/bin/env php
<?php

require __DIR__.'/vendor/autoload.php';

use Symfony\Component\Console\Application;
use Matthewbdaly\LetterClassifier\Commands\Processor;

$application = new Application();
$application->add(new Processor());
$application->run();

Next, we create our command. Save this as src/Commands/Processor.php:

<?php

namespace Matthewbdaly\LetterClassifier\Commands;

use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Input\InputArgument;
use League\Pipeline\Pipeline;
use Matthewbdaly\LetterClassifier\Stages\ConvertPdfToPng;
use Matthewbdaly\LetterClassifier\Stages\ReadFile;
use Matthewbdaly\LetterClassifier\Stages\Classify;
use Matthewbdaly\LetterClassifier\Stages\StripTabs;
use Matthewbdaly\LetterClassifier\Stages\GetPolicyNumber;

class Processor extends Command
{
    protected function configure()
    {
        $this->setName('process')
            ->setDescription('Processes a file')
            ->setHelp('This command processes a file')
            ->addArgument('file', InputArgument::REQUIRED, 'File to process');
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $file = $input->getArgument('file');
        $pipeline = (new Pipeline)
            ->pipe(new ConvertPdfToPng)
            ->pipe(new ReadFile)
            ->pipe(new StripTabs)
            ->pipe(new GetPolicyNumber)
            ->pipe(new Classify);
        $response = $pipeline->process($file);
        $output->writeln("Classification is ".$response['classification']);
        $output->writeln("Policy number is ".$response['policy']);
    }
}

Note how our command accepts the file name as an argument. We then instantiate our pipeline and pass it through a series of classes, each of which has a single role. Finally, we retrieve our response and output it.

With that done, we can move on to implementing our first step. Save this as src/Stages/ConvertPdfToPng.php:

<?php

namespace Matthewbdaly\LetterClassifier\Stages;

use Imagick;

class ConvertPdfToPng
{
    public function __invoke($file)
    {
        $tmp = tmpfile();
        $uri = stream_get_meta_data($tmp)['uri'];
        $img = new Imagick();
        $img->setResolution(300, 300);
        $img->readImage($file);
        $img->setImageDepth(8);
        $img->setImageFormat('png');
        $img->writeImage($uri);
        return $tmp;
    }
}

This stage fetches the file passed through, and converts it into a PNG file, stores it as a temporary file, and returns a reference to it. The output of this stage will then form the input of the next. This is how pipelines work, and it makes it easy to break up a complex process into multiple steps that can be reused in different places, facilitating easier code reuse and making your code simpler to understand and reason about.

Our next step carries out optical character recognition. Save this as src/Stages/ReadFile.php:

<?php

namespace Matthewbdaly\LetterClassifier\Stages;

use thiagoalessio\TesseractOCR\TesseractOCR;

class ReadFile
{
    public function __invoke($file)
    {
        $uri = stream_get_meta_data($file)['uri'];
        $ocr = new TesseractOCR($uri);
        return $ocr->lang('eng')->run();
    }
}

As you can see, this accepts the link to the temporary file as an argument, and runs Tesseract on it to retrieve the text. Note that we specify a language of eng - if you want to use a language other than English, you should specify it here.

At this point, we should have some usable text, but there may be unknown amounts of whitespace, so our next step uses a regex to strip them out. Save this as src/Stages/StripTabs.php:

<?php

namespace Matthewbdaly\LetterClassifier\Stages;

class StripTabs
{
    public function __invoke($content)
    {
        return trim(preg_replace('/\s+/', ' ', $content));
    }
}

With our whitespace issue sorted out, we now need to retrieve the policy number the communication should be filed under. These are generally regular alphanumeric patterns, so regexes are a suitable way of matching them. As this is a proof of concept, we’ll assume a very simple pattern for policy numbers in that they will consist of between seven and nine digits. Save this as src/Stages/GetPolicyNumber.php:

<?php

namespace Matthewbdaly\LetterClassifier\Stages;

class GetPolicyNumber
{
    public function __invoke($content)
    {
        $matches = [];
        $policyNumber = '';
        preg_match('/\d{7,9}/', $content, $matches);
        if (count($matches)) {
            $policyNumber = $matches[0];
        }
        return [
            'content' => $content,
            'policy' => $policyNumber
        ];
    }
}

Finally, we’re onto the really tough part - using machine learning to classify the letters. Save this as src/Stages/Classify.php:

<?php

namespace Matthewbdaly\LetterClassifier\Stages;

use Phpml\Dataset\CsvDataset;
use Phpml\Dataset\ArrayDataset;
use Phpml\FeatureExtraction\TokenCountVectorizer;
use Phpml\Tokenization\WordTokenizer;
use Phpml\CrossValidation\StratifiedRandomSplit;
use Phpml\FeatureExtraction\TfIdfTransformer;
use Phpml\Metric\Accuracy;
use Phpml\Classification\SVC;
use Phpml\SupportVectorMachine\Kernel;

class Classify
{
    protected $dataset;

    protected $classifier;

    protected $vectorizer;

    protected $tfIdfTransformer;

    public function __construct()
    {
        $this->dataset = new CsvDataset('data/letters.csv', 1);
        $this->vectorizer = new TokenCountVectorizer(new WordTokenizer());
        $this->tfIdfTransformer = new TfIdfTransformer();
        $samples = [];
        foreach ($this->dataset->getSamples() as $sample) {
            $samples[] = $sample[0];
        }
        $this->vectorizer->fit($samples);
        $this->vectorizer->transform($samples);
        $this->tfIdfTransformer->fit($samples);
        $this->tfIdfTransformer->transform($samples);
        $dataset = new ArrayDataset($samples, $this->dataset->getTargets());
        $randomSplit = new StratifiedRandomSplit($dataset, 0.1);
        $this->classifier = new SVC(Kernel::RBF, 10000);
        $this->classifier->train($randomSplit->getTrainSamples(), $randomSplit->getTrainLabels());
        $predictedLabels = $this->classifier->predict($randomSplit->getTestSamples());
        echo 'Accuracy: '.Accuracy::score($randomSplit->getTestLabels(), $predictedLabels);
    }

    public function __invoke(array $message)
    {
        $newSample = [$message['content']];
        $this->vectorizer->transform($newSample);
        $this->tfIdfTransformer->transform($newSample);
        $message['classification'] = $this->classifier->predict($newSample)[0];
        return $message;
    }
}

In our constructor, we train up our model by passing our sample data through the following steps:

  • First, we use the token count vectorizer to convert our samples to a vector of token counts - replacing every word with a number and keeping track of how often that word occurs.
  • Next, we use TfIdfTransformer to get statistics about how important a word is in a document.
  • Then we instantiate our classifier and train it on a random subset of our data.
  • Finally, we pass our message to our now-trained classifier and see what it tells us.

Now, bear in mind I don’t have a background in machine learning and this is the first time I’ve done anything with machine learning, so I can’t tell you much more than that - if you want to know more I suggest you investigate on your own. In figuring this out I was helped a great deal by this article on Sitepoint, so you might want to start there.

The finished application is on GitHub, and the repository includes a CSV file of training data, as well as the examples folder, which contains some example PDF files. You can run it as follows:

$ php app process examples/Quote.pdf

I found that once I had trained it up using the CSV data from the repository, it was around 70-80% accurate, which isn’t bad at all considering the comparatively small size of the dataset. If this were genuinely being used in production, there would be an extremely large dataset of historical scanned letters to use for training purposes, so it wouldn’t be unreasonable to expect much better results under those circumstances.

Exercises for the reader

If you want to develop this concept further, here are some ideas:

  • We should be able to correct the model when it’s wrong. Add a separate command to train the model by passing through a file and specifying how it should be categorised, eg php app train File.pdf quote.
  • Try processing information from different sources. For instance, you could replace the first two stages with a stage that pulls all unread emails from a specified mailbox using PHP’s IMAP support, or fetching data from the Twitter API. Or you could have a telephony service such as Twilio set up as your voicemail, and automatically transcribe them, then pass the text to PHP ML for classification.
  • If you’re multilingual, you could try adding a step to sort letters by language and have separate models for classifying in each language

Summary

It’s actually quite a sobering thought that it’s already possible to use techniques like these to produce tools that replace people in various jobs, and as the tooling matures, more and more tasks involving classification are going to become amenable to automation using machine learning.

This was my first experience with machine learning and it’s been very interesting for me to solve a real-world problem with it. I hope it gives you some ideas about how you could use it too.


Full-text search with MariaDB


Recently I had the occasion to check out MariaDB’s implementation of full-text search. As it’s a relatively recent arrival in MySQL and MariaDB, it doesn’t seem to get all that much attention. In this post I’ll show you how to use it, with a few Laravel-specific pointers. We’ll be using the default User model in a new Laravel installation, which has columns for name and email.

Our first task is to create the fulltext index, which is necessary to perform the query. Run the following command:

ALTER TABLE users ADD FULLTEXT (name, email);

As you can see, we can specify multiple columns in our table to index.

If you’re using Laravel, you’ll want to create the following migration for this:

<?php

use Illuminate\Support\Facades\Schema;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Database\Migrations\Migration;

class AddFulltextIndexForUsers extends Migration
{
    /**
     * Run the migrations.
     *
     * @return void
     */
    public function up()
    {
        DB::statement('ALTER TABLE users ADD FULLTEXT(name, email)');
    }

    /**
     * Reverse the migrations.
     *
     * @return void
     */
    public function down()
    {
        DB::statement('ALTER TABLE users DROP INDEX IF EXISTS name');
    }
}

Note that the index is named after the first field passed to it, so when we drop it we refer to it as name. Then, to actually query the index, you should run a command something like this:

SELECT * FROM users WHERE MATCH(name, email) AGAINST ('jeff' IN NATURAL LANGUAGE MODE);

Note that NATURAL LANGUAGE MODE is actually the default, so you can leave it off if you wish. We also have to specify the columns to match against.

If you’re using Laravel, you may want to create a reusable local scope for it:

public function scopeSearch($query, $search)
{
    if (!$search) {
        return $query;
    }
    return $query->whereRaw('MATCH(name, email) AGAINST (?)', [$search]);
}

Then you can call it as follows:

User::search('jeff')->get();

I personally have noticed that the query using the MATCH keywords seems to be far more performant, with the response time being between five and ten times less than a similar command using LIKE, however this observation isn’t very scientific (plus, we are talking about queries that still run in a fraction of a second). However, if you’re doing a particularly expensive query that currently uses a LIKE statement, it’s possible you may get better results by switching to a MATCH statement. Full-text search probably isn’t all that useful in this context - it’s only once we’re talking about longer text, such as blog posts, that some of the advantages, like support for stopwords, come into play.

From what I’ve seen this implementation of full-text search is a lot simpler than in PostgreSQL, which has ups and downs. On the one hand, it’s a lot easier to implement, but conversely it’s less useful - there’s no obvious way to perform a full-text search against joined tables. However, it does seem to be superior to using a LIKE statement, so it’s probably a good fit for smaller sites where something like Elasticsearch would be overkill.

Logging to the ELK stack with Laravel


Logging to text files is the simplest and most common logging setup for web apps, and it works fine for relatively small and simple applications. However, it does have some downsides:

  • It’s difficult to make the log files accessible - normally users have to SSH in to read them.
  • The tools used to filter and analyse log files have a fairly high technical barrier to access - grep and sed are not exactly easy for non-programmers to pick up, so business information can be hard to get.
  • It’s hard to visually identify trends in the data.
  • Log files don’t let you know immediately when something urgent happens
  • You can’t access logs for different applications through the same interface.

For rare, urgent issues where you need to be informed immediately they occur, it’s straightforward to log to an instant messaging solution such as Slack or Hipchat. However, these aren’t easily searchable, and can only be used for the most important errors (otherwise, there’s a risk that important data will be lost in the noise). There are third-party services that allow you to search and filter your logs, but they can be prohibitively expensive.

The ELK stack has recently gained a lot of attention as a sophisticated solution for logging application data. It consists of:

  • Logstash for processing log data
  • Elasticsearch as a searchable storage backend
  • Kibana as a web interface

By making the log data available using a powerful web interface, you can easily expose it to non-technical users. Kibana also comes with powerful tools to aggregate and filter the data. In addition, you can run your own instance, giving you a greater degree of control (as well as possibly being more cost-effective) compared to using a third-party service.

In this post I’ll show you how to configure a Laravel application to log to an instance of the ELK stack. Fortunately, Laravel uses the popular Monolog logging library by default, which is relatively easy to get to work with the ELK stack. First, we need to install support for the GELF logging format:

$ composer require graylog2/gelf-php

Then, we create a custom logger class:

<?php
namespace App\Logging;
use Monolog\Logger;
use Monolog\Handler\GelfHandler;
use Gelf\Publisher;
use Gelf\Transport\UdpTransport;
class GelfLogger
{
/**
* Create a custom Monolog instance.
*
* @param array $config
* @return \Monolog\Logger
*/
public function __invoke(array $config)
{
$handler = new GelfHandler(new Publisher(new UdpTransport($config['host'], $config['port'])));
return new Logger('main', [$handler]);
}
}

Finally, we configure our application to use this as our custom driver and specify the host and port in config/logging.php:

'custom' => [
'driver' => 'custom',
'via' => App\Logging\GelfLogger::class,
'host' => '127.0.0.1',
'port' => 12201,
],

You can then set up whatever logging channels you need for your application, and specify whatever log level you feel is appropriate.
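
For instance, if you want to keep writing to the local log file as well as shipping entries to Logstash, one option is Laravel’s stack driver, which wraps several channels in one. Here’s a rough sketch of config/logging.php, assuming the GELF channel above is registered under the name gelf rather than custom:

'default' => env('LOG_CHANNEL', 'stack'),

'channels' => [
    'stack' => [
        'driver' => 'stack',
        'channels' => ['single', 'gelf'],
    ],
    'gelf' => [
        'driver' => 'custom',
        'via' => App\Logging\GelfLogger::class,
        'host' => '127.0.0.1',
        'port' => 12201,
    ],
],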

Please note that this requires at least Laravel 5.6 - this file doesn’t exist in Laravel 5.5 and earlier, so you may have more work on your hands to integrate it with older versions.
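
On 5.5 and earlier, one possible approach is to customise Monolog directly in bootstrap/app.php using configureMonologUsing(), along these lines (a sketch rather than a drop-in solution):

// bootstrap/app.php, before $app is returned
$app->configureMonologUsing(function ($monolog) {
    // Push the same GELF handler we used in the custom logger above
    $transport = new Gelf\Transport\UdpTransport('127.0.0.1', 12201);
    $monolog->pushHandler(new Monolog\Handler\GelfHandler(new Gelf\Publisher($transport)));
});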

If you already have an instance of the ELK stack set up on a remote server that’s already set up to accept input as GELF, then you should be able to point it at that and you’ll be ready to go. If you just want to try it out, I’ve been using a Docker-based project that makes it straightforward to run the whole stack locally. However, you will need to amend logstash/pipeline/logstash.conf as follows to allow it to accept log data:

input {
tcp {
port => 5000
}
gelf {
port => 12201
type => gelf
codec => "json"
}
}
## Add your filters / logstash plugins configuration here
output {
elasticsearch {
hosts => "elasticsearch:9200"
}
}

Then you can start it up using the instructions in the repository and it should be ready to go. Now, if you run the following command from Tinker:

Log::info('Just testing');

Then if you access the web interface, you should be able to find that log message without any difficulty.

Now, this only covers the Laravel application logs. You may well want to pass other logs through to Logstash, such as Apache, Nginx or MySQL logs, and a quick Google should be sufficient to find ideas on how you might log for these services. Creating visualisations with Kibana is a huge subject, and the existing documentation covers that quite well, so if you’re interested in learning more about that I’d recommend reading the documentation and having a play with the dashboard.
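
For example, ingesting an Nginx access log is largely a matter of adding a file input to the Logstash pipeline, something like the snippet below - the path will depend on your setup, and with the Docker-based project you’d also need to mount the log directory into the container:

input {
  file {
    path => "/var/log/nginx/access.log"
    type => "nginx"
  }
}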

Forcing SSL in CodeIgniter


I haven’t started a new CodeIgniter project since 2014, and don’t intend to, but on occasion I’ve been asked to do maintenance work on legacy CodeIgniter projects. This week I was asked to help out with a situation where a CodeIgniter site was being migrated to HTTPS and there were issues resulting from the migration.

Back in 2012, while working on my first solo project, I built a CodeIgniter site that used HTTPS, but it also had to support an affiliate marketing system that didn’t, so certain pages had to force HTTP and others had to force HTTPS. I used the hook system to create hooks to enforce this. That kind of requirement is unlikely to recur now that HTTPS is so much more prevalent, but sometimes it may be easier to enforce HTTPS at application level than in the web server configuration or using htaccess, and it’s relatively straightforward to do that in CodeIgniter.

The first step is to create the hook. Save this as application/hooks/ssl.php:

<?php
function force_ssl()
{
$CI =& get_instance();
$CI->config->config['base_url'] = str_replace('http://', 'https://', $CI->config->config['base_url']);
if ($_SERVER['SERVER_PORT'] != 443) redirect($CI->uri->uri_string());
}
?>

Next, we register the hook. Update application/config/hooks.php as follows:

<?php if ( ! defined('BASEPATH')) exit('No direct script access allowed');
/*
| -------------------------------------------------------------------------
| Hooks
| -------------------------------------------------------------------------
| This file lets you define "hooks" to extend CI without hacking the core
| files. Please see the user guide for info:
|
| http://codeigniter.com/user_guide/general/hooks.html
|
*/
$hook['post_controller_constructor'][] = array(
'function' => 'force_ssl',
'filename' => 'ssl.php',
'filepath' => 'hooks'
);
/* End of file hooks.php */
/* Location: ./application/config/hooks.php */

This tells CodeIgniter that it should look in the application/hooks directory for a file called ssl.php, and call the function force_ssl once each controller has been constructed.

Finally, we enable hooks. Update application/config/config.php:

$config['enable_hooks'] = TRUE;

If you only want to force SSL in production, not development, you may want to amend the ssl.php file to only perform the redirect in non-development environments, perhaps by using an environment variable via DotEnv.
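
As a minimal sketch, you could use CodeIgniter’s own ENVIRONMENT constant (set in index.php); if you’d rather use DotEnv, you could check an environment variable such as a hypothetical APP_ENV instead:

<?php
function force_ssl()
{
    // Skip the redirect entirely in development
    if (ENVIRONMENT === 'development') {
        return;
    }
    $CI =& get_instance();
    $CI->config->config['base_url'] = str_replace('http://', 'https://', $CI->config->config['base_url']);
    if ($_SERVER['SERVER_PORT'] != 443) {
        redirect($CI->uri->uri_string());
    }
}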

Better strings in PHP


One of the weaknesses of PHP as a programming language is the limitations of some of the fundamental types. For instance, a string in PHP is a simple value, rather than an object, and doesn’t have any methods associated with it. Instead, to manipulate a string, you have to call all manner of functions. By comparison, in Python, not only can you call methods on a string, and receive a new string as the response, making them easily chainable, but you can also iterate through a string, as in this example:

>>> a = 'foo'
>>> a.upper()
'FOO'
>>> a.lower()
'foo'
>>> for letter in a:
... print(letter)
...
f
o
o

A little while back, I read Adam Wathan’s excellent book Refactoring to Collections, which describes how you can use a collection implementation (such as the one included with Laravel) to replace convoluted array manipulation with simpler, chainable calls to a collection object. Using this approach, you can turn something like this:

$result = array_filter(
array_map(function ($item) {
return $item->get('foo');
}, $items),
function ($item) {
return $item->bar == true;
});

Or, even worse, this:

$result1 = array_map(function ($item) {
return $item->get('foo');
}, $items);
$result2 = array_filter($result1, function ($item) {
return $item->bar == true;
});

Into this:

$result = Collection::make($items)
->map(function ($item) {
return $item->get('foo');
})->filter(function ($item) {
return $item->bar == true;
})->toArray();

Much cleaner, more elegant, and far easier to understand.

A while back, after some frustration with PHP’s native strings, I started wondering how practical it would be to produce a string implementation that was more like the string objects in languages like Python and Javascript, with inspiration from collection implementations such as that used by Laravel. I soon discovered that it was very practical, and with a bit of work it’s not hard to produce your own, more elegant string class.

The most fundamental functionality required is to be able to create a string object, either by passing a string to the constructor or calling a static method. Our string class should be able to do both:

<?php
class Str
{
protected $string;
public function __construct(string $string = '')
{
$this->string = $string;
}
public static function make(string $string)
{
return new static($string);
}
}

Making it iterable

To be able to get the length of a string, it needs to implement the Countable interface:

use Countable;
class Str implements Countable
{
...
public function count()
{
return strlen($this->string);
}
}
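
With that in place, PHP’s count() function works on an instance and returns the string’s length:

$str = Str::make('foo');
echo count($str); // 3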

To access it as an array, it needs to implement the ArrayAccess interface:

...
use ArrayAccess;
class Str implements Countable, ArrayAccess
{
...
public function offsetExists($offset)
{
return isset($this->string[$offset]);
}
public function offsetGet($offset)
{
return isset($this->string[$offset]) ? $this->string[$offset] : null;
}
public function offsetSet($offset, $value)
{
if (is_null($offset)) {
// The [] operator isn't supported for strings, so append by concatenation instead
$this->string .= $value;
} else {
$this->string[$offset] = $value;
}
}
public function offsetUnset($offset)
{
$this->string = substr_replace($this->string, '', $offset, 1);
}
}

And to make it iterable, it needs to implement the Iterator interface:

use Iterator;
class Str implements Countable, ArrayAccess, Iterator
{
...
// We also need a property to track the current position during iteration
protected $position = 0;
public function current()
{
return $this->string[$this->position];
}
public function key()
{
return $this->position;
}
public function next()
{
++$this->position;
}
public function rewind()
{
$this->position = 0;
}
public function valid()
{
return isset($this->string[$this->position]);
}
}

Making it work as a string

To be useful, it also needs to be possible to actually use it as a string - for instance, you should be able to do this:

$foo = Str::make('I am the very model of a modern major general');
echo $foo;

Fortunately, the __toString() magic method allows this:

public function __toString()
{
return $this->string;
}

Adding methods

With that functionality in place, you can then start adding support for the methods you need in your string objects. If you’re looking to be able to use the same functionality as existing PHP methods, you can call those functions inside your methods. However, be sure to return a new instance of your string object from each method - that way, you can continually chain them:

public function replace($find, $replace)
{
return new static(str_replace($find, $replace, $this->string));
}
public function toUpper()
{
return new static(strtoupper($this->string));
}
public function toLower()
{
return new static(strtolower($this->string));
}
public function trim()
{
return new static(trim($this->string));
}
public function ltrim()
{
return new static(ltrim($this->string));
}
public function rtrim()
{
return new static(rtrim($this->string));
}

Now, you can write something like this:

return Str::make('I am the very model of a modern major general ')
->trim()
->replace('modern major general', 'scientist Salarian')
->toLower();

While you could do this with PHP’s native string functions alone, it would be a lot less elegant. In addition, if you have other, more complex string manipulations that you often do in a particular application, it may make sense to write a method for that so that your string objects can encapsulate that functionality for easier reuse.
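
For instance, if an application regularly needed to truncate strings for display, you could add a hypothetical truncate() method along these lines and chain it like any other:

// Hypothetical application-specific helper - cuts the string down to $length
// characters and appends a suffix, returning a new instance so it stays chainable
public function truncate(int $length, string $suffix = '...')
{
    if (strlen($this->string) <= $length) {
        return new static($this->string);
    }
    return new static(rtrim(substr($this->string, 0, $length)) . $suffix);
}

Str::make('I am the very model of a modern major general')->truncate(19) would then give you 'I am the very model...'.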

As our string objects are iterable, we can also do this:

>>> $foo = Str::make('foo');
>>> foreach ($foo as $letter) { echo "$letter\n"; }
f
o
o

If you have an application that does some complex string manipulation, having a string utility class like this can make for much more expressive, elegant and easy-to-comprehend code than PHP’s native string functions. If you want to see a working implementation for this, check out my proof of concept collection and string utility library Proper.

Switching from Vim to Neovim

$
0
0

I honestly thought it would never happen. I’ve been using Vim since 2008, and every other editor I’ve tried (including VSCode, Emacs, Sublime Text and Atom) hasn’t come up to scratch. There were a few useful features in PHPStorm, to be fair, but nothing that justified the bother of moving. Also, I suffer from a degree of RSI from my prior career as an insurance clerk (years of using crap keyboards and mice on Windows XP took its toll…), and Vim has always been the most RSI-friendly editor I found.

Yet I have actually gone ahead and migrated away… to Neovim. Of course, the fact that the workflow is essentially identical helps in the migration process, as does the fact that it supports most of the same plugins.

My workflow has always been strongly CLI-based. I use GNU Screen and Byobu together to run multiple “tabs” in the terminal, so the lack of GUI support in Neovim doesn’t bother me in the slightest. The only change I really made was to my .bash_aliases, so that the vim command ran screen -t Vim nvim and opened Neovim rather than Vim in a new Screen tab.
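
In other words, something like this in ~/.bash_aliases:

# Open Neovim in a new Screen tab whenever I type vim
alias vim='screen -t Vim nvim'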

Initially I switched straight over to using the same settings and plugins I had with Vim, and they worked seamlessly. However, after a while I decided to use the opportunity to completely overhaul the plugins and settings I used and largely start over - cull the ones I no longer needed, add some new ones, and comment it properly.

Loading plugins

I used to use Pathogen to manage my Vim plugins, but it didn’t actually import the plugins itself, and just provided a structure for them. This meant that the only practical way I found to pull in third-party plugins was to set them up as Git submodules, meaning I had to store my configuration in version control and clone it recursively onto a new machine. It also made updating cumbersome.

Now I’ve switched to vim-plug, which makes things much easier. I can define my dependencies in my .config/nvim/init.vim and pull them in with PlugInstall. If I want to update them, I run PlugUpdate, or if I need to add something else, I merely add it in the file and run PlugInstall again. Nice and easy.

The first section of my configuration file loads the dependencies:

call plug#begin()
" NERDTree
Plug 'scrooloose/nerdtree'
" Git integration
Plug 'tpope/vim-fugitive'
Plug 'airblade/vim-gitgutter'
" Linting
Plug 'neomake/neomake'
Plug 'w0rp/ale'
" PHP-specific integration
Plug 'phpactor/phpactor' , {'do': 'composer install', 'for': 'php'}
Plug 'ncm2/ncm2'
Plug 'roxma/nvim-yarp'
Plug 'phpactor/ncm2-phpactor'
" Snippets
Plug 'SirVer/ultisnips'
Plug 'honza/vim-snippets'
" Comments
Plug 'tpope/vim-commentary'
" Search
Plug 'ctrlpvim/ctrlp.vim'
" Syntax
Plug 'sheerun/vim-polyglot'
Plug 'matthewbdaly/vim-filetype-settings'
" Themes
Plug 'nanotech/jellybeans.vim' , {'as': 'jellybeans'}
call plug#end()

As always, it’s a good idea to comment your config and try to group things logically. Note that I have one plugin of my own listed here - this is just a collection of settings for different filetypes, such as making Javascript files use 2 spaces for indentation, and it’s easier to keep that in a repository and pull it in as a dependency.

Completion

The next part of the config deals with completion. Most of the time the default omnicompletion is pretty good, but in the process of building out this config, I discovered PHPActor, which has massively improved my development experience with PHP - it finally provides completion as good as most IDEs, and also provides similar refactoring tools. My config for completion currently looks like this:

"Completion
autocmd FileType * setlocal formatoptions-=c formatoptions-=r formatoptions-=o
set ofu=syntaxcomplete#Complete
autocmd FileType php setlocal omnifunc=phpactor#Complete
let g:phpactorOmniError = v:true
autocmd BufEnter * call ncm2#enable_for_buffer()
set completeopt=noinsert,menuone,noselect

General config

This is a set of standard settings for the general behaviour of the application, such as setting the colorscheme and default indentation levels. I also routinely disable the mouse because it bugs me.

"General
syntax on
colorscheme jellybeans
set nu
filetype plugin indent on
set nocp
set ruler
set wildmenu
set mouse-=a
set t_Co=256
"Code folding
set foldmethod=manual
"Tabs and spacing
set autoindent
set cindent
set tabstop=4
set expandtab
set shiftwidth=4
set smarttab
"Search
set hlsearch
set incsearch
set ignorecase
set smartcase
set diffopt +=iwhite

Markdown configuration

This section sets the file type for Markdown. It disables the Markdown plugin included in vim-polyglot as I had problems with it, and sets the languages that will be highlighted in fenced code blocks. I may at some point migrate this to the filetype repository.

"Syntax highlighting in Markdown
au BufNewFile,BufReadPost *.md set filetype=markdown
let g:polyglot_disabled = ['markdown']
let g:markdown_fenced_languages = ['bash=sh', 'css', 'django', 'javascript', 'js=javascript', 'json=javascript', 'perl', 'php', 'python', 'ruby', 'sass', 'xml', 'html', 'vim']

Neomake

I used to use Syntastic for checking my code for errors, but I’ve always found it problematic - it was slow and would often block the editor for some time. Neovim does have support for asynchronous jobs (as does Vim 8), but Syntastic doesn’t use it, so I decided to look elsewhere.

Neomake seemed a lot better, so I migrated over to it. It doesn’t require much configuration, and it’s really fast - unlike Syntastic, it supports asynchronous jobs. This part of the config sets it up to run on changes, with no delay when writing, so I get near-instant feedback if a syntax error creeps in, and it doesn’t block the editor the way Syntastic used to.

" Neomake config
" Full config: when writing or reading a buffer, and on changes in insert and
" normal mode (after 1s; no delay when writing).
call neomake#configure#automake('nrwi', 500)

PHPActor

As mentioned above, PHPActor has dramatically improved my experience when coding in PHP by providing access to features normally found only in full IDEs. Here’s the fairly standard config I use for the refactoring functionality:

" PHPActor config
" Include use statement
nmap <Leader>u :call phpactor#UseAdd()<CR>
" Invoke the context menu
nmap <Leader>mm :call phpactor#ContextMenu()<CR>
" Invoke the navigation menu
nmap <Leader>nn :call phpactor#Navigate()<CR>
" Goto definition of class or class member under the cursor
nmap <Leader>o :call phpactor#GotoDefinition()<CR>
" Transform the classes in the current file
nmap <Leader>tt :call phpactor#Transform()<CR>
" Generate a new class (replacing the current file)
nmap <Leader>cc :call phpactor#ClassNew()<CR>
" Extract expression (normal mode)
nmap <silent><Leader>ee :call phpactor#ExtractExpression(v:false)<CR>
" Extract expression from selection
vmap <silent><Leader>ee :<C-U>call phpactor#ExtractExpression(v:true)<CR>
" Extract method from selection
vmap <silent><Leader>em :<C-U>call phpactor#ExtractMethod()<CR>

Summary

Vim or Neovim configuration files are never static. Your needs are always changing, and you’re constantly discovering new plugins and new settings to try out, and keeping ones that prove useful. It’s been helpful to start over and ditch some plugins I no longer needed, pull in some new ones, and organise my configuration a bit better.

Now that I can set the dependencies in a text file rather than pulling them in as Git submodules, it makes more sense to keep my config in a Github Gist rather than a Git repository, and that’s where I plan to retain it for now. Feel free to fork or cannibalize it for your own purposes if you wish.

Mutation testing with Infection


Writing automated tests is an excellent way of catching bugs during development and maintenance of your application, not to mention the other benefits. However, it’s hard to gauge the quality of your tests, particularly when you first start out. Coverage will give you a good idea of what code was actually run during the test, but it won’t tell you if the test itself actually tests anything worthwhile.

Infection is a mutation testing framework. The documentation defines mutation testing as follows:

Mutation testing involves modifying a program in small ways. Each mutated version is called a Mutant. To assess the quality of a given test set, these mutants are executed against the input test set to see if the seeded faults can be detected. If mutated program produces failing tests, this is called a killed mutant. If tests are green with mutated code, then we have an escaped mutant.

Infection works by running the test suite, carrying out a series of mutations on the source code in order to try to break the tests, and then collecting the results. The actual mutations carried out are not random - there is a set of mutations that get carried out every time, so results should be consistent. Ideally, all mutants should be killed by your tests - escaped mutants can indicate that either the line of mutated code is not tested, or the tests for that line are not very useful.

I decided to add mutation testing to my Laravel shopping cart package. In order to use Infection, you need to be able to generate code coverage, which means having either XDebug or phpdbg installed. Once Infection is installed (refer to the documentation for this), you can run this command in the project directory to configure it:

$ infection

Infection defaults to using PHPUnit for the tests, but it also supports PHPSpec. If you’re using PHPSpec, you will need to specify the testing framework like this:

$ infection --test-framework=phpspec

Since PHPSpec doesn’t support code coverage out of the box, you’ll need to install a package for that - I used leanphp/phpspec-code-coverage.

On first run, you’ll be prompted to create a configuration file. Your source directory should be straightforward to set up, but at the next step, if your project uses interfaces in the source directory, you should exclude them. The rest of the defaults should be fine.
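
For reference, the generated config (infection.json.dist) ends up looking something like the sketch below - the Contracts exclusion here is just an example of keeping interfaces out of the mutated source:

{
    "source": {
        "directories": ["src"],
        "excludes": ["Contracts"]
    },
    "logs": {
        "text": "infection-log.txt"
    },
    "mutators": {
        "@default": true
    }
}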

I found that the first run gave a large number of uncovered results, but the second and later ones were more consistent - not sure if it’s an issue with my setup or not. Running it gave me this:

$ infection
You are running Infection with xdebug enabled.
____ ____ __ _
/ _/___ / __/__ _____/ /_(_)___ ____
/ // __ \/ /_/ _ \/ ___/ __/ / __ \/ __ \
_/ // / / / __/ __/ /__/ /_/ / /_/ / / / /
/___/_/ /_/_/ \___/\___/\__/_/\____/_/ /_/
0 [>---------------------------] < 1 sec
Running initial test suite...
PHPUnit version: 6.5.13
27 [============================] 3 secs
Generate mutants...
Processing source code files: 5/5
Creating mutated files and processes: 43/43
.: killed, M: escaped, S: uncovered, E: fatal error, T: timed out
...................MMM...M.......M......... (43 / 43)
43 mutations were generated:
38 mutants were killed
0 mutants were not covered by tests
5 covered mutants were not detected
0 errors were encountered
0 time outs were encountered
Metrics:
Mutation Score Indicator (MSI): 88%
Mutation Code Coverage: 100%
Covered Code MSI: 88%
Please note that some mutants will inevitably be harmless (i.e. false positives).
Time: 21s. Memory: 12.00MB

Our test run shows 5 escaped mutants, and the remaining 38 were killed. We can view the results by looking at the generated infection-log.txt:

Escaped mutants:
================
1) /home/matthew/Projects/laravel-cart/src/Services/Cart.php:132 [M] DecrementInteger
--- Original
+++ New
@@ @@
{
$content = Collection::make($this->all())->map(function ($item) use($rowId) {
if ($item['row_id'] == $rowId) {
- if ($item['qty'] > 0) {
+ if ($item['qty'] > -1) {
$item['qty'] -= 1;
}
}
2) /home/matthew/Projects/laravel-cart/src/Services/Cart.php:132 [M] OneZeroInteger
--- Original
+++ New
@@ @@
{
$content = Collection::make($this->all())->map(function ($item) use($rowId) {
if ($item['row_id'] == $rowId) {
- if ($item['qty'] > 0) {
+ if ($item['qty'] > 1) {
$item['qty'] -= 1;
}
}
3) /home/matthew/Projects/laravel-cart/src/Services/Cart.php:132 [M] GreaterThan
--- Original
+++ New
@@ @@
{
$content = Collection::make($this->all())->map(function ($item) use($rowId) {
if ($item['row_id'] == $rowId) {
- if ($item['qty'] > 0) {
+ if ($item['qty'] >= 0) {
$item['qty'] -= 1;
}
}
4) /home/matthew/Projects/laravel-cart/src/Services/Cart.php:133 [M] Assignment
--- Original
+++ New
@@ @@
$content = Collection::make($this->all())->map(function ($item) use($rowId) {
if ($item['row_id'] == $rowId) {
if ($item['qty'] > 0) {
- $item['qty'] -= 1;
+ $item['qty'] = 1;
}
}
return $item;
5) /home/matthew/Projects/laravel-cart/src/Services/Cart.php:197 [M] OneZeroInteger
--- Original
+++ New
@@ @@
*/
private function hasStringKeys(array $items)
{
- return count(array_filter(array_keys($items), 'is_string')) > 0;
+ return count(array_filter(array_keys($items), 'is_string')) > 1;
}
/**
* Validate input
Timed Out mutants:
==================
Not Covered mutants:
====================

This displays the mutants that escaped, and include a diff of the changed code, so we can see that all of these involve changing the comparison operators.

The last one can be resolved easily because the comparison is superfluous - the result of count() can be evaluated as true or false by itself, so removing the > 0 at the end of that line in hasStringKeys() leaves nothing for that mutator to change, which solves the problem quite neatly.
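
After that change, the method might look something like this (with a cast added so it still returns a boolean):

private function hasStringKeys(array $items)
{
    // A non-empty result from array_filter() is already truthy, so no comparison is needed
    return (bool) count(array_filter(array_keys($items), 'is_string'));
}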

The other four mutations are somewhat harder. They all amend the decrement method’s conditions, showing that a single assertion doesn’t really fully check the behaviour. Here’s the current test for that method:

<?php
namespace Tests\Unit\Services;
use Tests\TestCase;
use Matthewbdaly\LaravelCart\Services\Cart;
use Mockery as m;
class CartTest extends TestCase
{
/**
* @dataProvider arrayProvider
*/
public function testCanDecrementQuantity($data)
{
$data[0]['row_id'] = 'my_row_id_1';
$data[1]['row_id'] = 'my_row_id_2';
$newdata = $data;
$newdata[1]['qty'] = 1;
$session = m::mock('Illuminate\Contracts\Session\Session');
$session->shouldReceive('get')->with('Matthewbdaly\LaravelCart\Services\Cart')->once()->andReturn($data);
$session->shouldReceive('put')->with('Matthewbdaly\LaravelCart\Services\Cart', $newdata)->once();
$uniqid = m::mock('Matthewbdaly\LaravelCart\Contracts\Services\UniqueId');
$cart = new Cart($session, $uniqid);
$this->assertEquals(null, $cart->decrement('my_row_id_2'));
}
}

It should be possible to decrement it if the quantity is more than zero, but not to go any lower. However, our current test does not catch anything but decrementing it from 2 to 1, which doesn’t fully demonstrate this. We therefore need to add a few more assertions to cover taking it down to zero, and then trying to decrement it again. Here’s how we might do that.

<?php
namespace Tests\Unit\Services;
use Tests\TestCase;
use Matthewbdaly\LaravelCart\Services\Cart;
use Mockery as m;
class CartTest extends TestCase
{
/**
* @dataProvider arrayProvider
*/
public function testCanDecrementQuantity($data)
{
$data[0]['row_id'] = 'my_row_id_1';
$data[1]['row_id'] = 'my_row_id_2';
$newdata = $data;
$newdata[1]['qty'] = 1;
$session = m::mock('Illuminate\Contracts\Session\Session');
$session->shouldReceive('get')->with('Matthewbdaly\LaravelCart\Services\Cart')->once()->andReturn($data);
$session->shouldReceive('put')->with('Matthewbdaly\LaravelCart\Services\Cart', $newdata)->once();
$uniqid = m::mock('Matthewbdaly\LaravelCart\Contracts\Services\UniqueId');
$cart = new Cart($session, $uniqid);
$this->assertEquals(null, $cart->decrement('my_row_id_2'));
$newerdata = $newdata;
$newerdata[1]['qty'] = 0;
$session->shouldReceive('get')->with('Matthewbdaly\LaravelCart\Services\Cart')->once()->andReturn($newdata);
$session->shouldReceive('put')->with('Matthewbdaly\LaravelCart\Services\Cart', $newerdata)->once();
$this->assertEquals(null, $cart->decrement('my_row_id_2'));
$session->shouldReceive('get')->with('Matthewbdaly\LaravelCart\Services\Cart')->once()->andReturn($newerdata);
$session->shouldReceive('put')->with('Matthewbdaly\LaravelCart\Services\Cart', $newerdata)->once();
$this->assertEquals(null, $cart->decrement('my_row_id_2'));
}
}

If we re-run Infection, we now get a much better result:

$ infection
You are running Infection with xdebug enabled.
____ ____ __ _
/ _/___ / __/__ _____/ /_(_)___ ____
/ // __ \/ /_/ _ \/ ___/ __/ / __ \/ __ \
_/ // / / / __/ __/ /__/ /_/ / /_/ / / / /
/___/_/ /_/_/ \___/\___/\__/_/\____/_/ /_/
Running initial test suite...
PHPUnit version: 6.5.13
22 [============================] 3 secs
Generate mutants...
Processing source code files: 5/5
Creating mutated files and processes: 41/41
.: killed, M: escaped, S: uncovered, E: fatal error, T: timed out
......................................... (41 / 41)
41 mutations were generated:
41 mutants were killed
0 mutants were not covered by tests
0 covered mutants were not detected
0 errors were encountered
0 time outs were encountered
Metrics:
Mutation Score Indicator (MSI): 100%
Mutation Code Coverage: 100%
Covered Code MSI: 100%
Please note that some mutants will inevitably be harmless (i.e. false positives).
Time: 19s. Memory: 12.00MB

Code coverage only tells you what lines of code are actually executed - it doesn’t tell you much about how effectively that line of code is tested. Infection gives you a different insight into the quality of your tests, helping to write better ones. I’ve so far found it very useful for getting feedback on the quality of my tests. It’s interesting that PHPSpec tests seem to have a consistently lower proportion of escaped mutants than PHPUnit ones - perhaps the more natural workflow when writing specs with PHPSpec makes it easier to write good tests.

How I'm refactoring a Zend 1 legacy project


In my current job I’ve been maintaining and developing a Zend 1 legacy project for the best part of a year. It has to be said, it’s the worst code base I have ever seen, with textbook examples of many antipatterns, spaghetti jQuery, copy-pasted code and overly complex methods. It’s a fairly typical example of a project built on an older MVC framework by inexperienced developers (I’ve been responsible for building similar things in my CodeIgniter days).

In this article I’ll go through some of the steps I’ve taken to help bring this legacy project under control. Not all of them are complete as at time of writing, but they’ve all helped to make this decidedly crappy project somewhat better. In working with this legacy project, I’ve found Paul Jones’ book Modernizing Legacy Applications in PHP to be very useful, and if you’re working on a similar legacy project, I highly recommend investing in a copy. I’ve also found Sourcemaking to be a useful resource in identifying antipatterns in use, refactoring strategies, and applicable design patterns.

Moving to Git

When I first started working on the project, the repository was in Subversion, and was absolutely colossal - checking it out took two hours! Needless to say, my first action was to migrate it to Git. I used this post as a guide, and it was pretty straightforward, but took all of my first day.

Adding migrations

The next job involved making some changes to the database. Unfortunately, Zend 1 doesn’t include migrations, and no-one had added a third party solution. I therefore did some research and wound up stumbling across Phinx, which is a standalone migration package with a command-line runner. Using that, it was straightforward to start adding migrations to make any necessary changes to the database structure and fixtures.
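
A Phinx migration is just a class with up()/down() or change() methods, so a minimal sketch looks something like this (the archived column here is purely illustrative):

<?php
use Phinx\Migration\AbstractMigration;

class AddArchivedFlagToUsers extends AbstractMigration
{
    public function change()
    {
        // Phinx can reverse this automatically when rolling back
        $this->table('users')
            ->addColumn('archived', 'boolean', ['default' => false])
            ->update();
    }
}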

Moving dependencies to Composer

The project was using Composer, but only to a limited degree - the framework itself was in the library/ folder, and several other dependencies were also stored here. The vendor/ directory was also checked into version control. I therefore took the vendor folder out of Git, and added zendframework/zendframework1 as a dependency. This drastically reduced the size of the repository.

Cleaning up commented code

There was an awful lot of commented code. Some of it was even commented out incorrectly (PHP code commented out with HTML comments). I’m of the school of thought that commented code is best deleted without a second thought, since it can be retrieved from version control, and it can be confusing, so I’ve been removing any commented code I come across.

Refactoring duplicate code

One of the biggest problems with the code base was the high level of duplication - a lot of code, particularly in the view layer, had been copied and pasted around. Running PHPCPD on the repository showed that, not including the views, around 12% of the code base was copied-and-pasted, which is a horrific amount. I therefore started aggressively refactoring duplicate code out into helpers and traits. As at today, the amount of duplication excluding the views is around 2.6%, which is obviously a big improvement.

Refactoring object creation code into persisters

There was some extremely complex code for creating and updating various objects that was jammed into the controllers, and involved a lot of duplicate code. I’ve used dedicated persister classes in the past with great effect, so I pulled that code out into persisters to centralise the logic about the creation of different objects. It’s still a lot more convoluted than I’d like, but at least now it’s out of the controllers and can be tested to some extent.

Creating repositories

One of the most problematic parts of the code base is the models. Whoever was responsible for them couldn’t seem to decide whether they represented a single domain object, or a container for methods for getting those objects, so both responsibilities were mixed up in the same class. This means you had to instantiate an object, then use it to call one of the methods to get another instance of that object, as in this example:

$media = new Application_Model_Media;
$media = $media->find(1);

I’ve therefore resolved to pull those methods out into separate repository classes, leaving the models as pure domain objects. Unfortunately, the lack of dependency injection makes it problematic to instantiate the repositories. For that reason, right now the repositories only implement static methods - it’s not ideal, but it’s better than what we have now.

I started out by creating interfaces for the methods I wanted to migrate, and had the models implement them. Then, I moved those methods from the model to the repository classes and amended all references to them, before removing the interfaces from the models. Now, a typical find request looks like this:

$media = App\Repository\Media::find(1);

It’s not done yet, but over half of them have been migrated.

Once that’s done, I’ll then be in a position to look at refactoring the logic in the models to make them easier to work with - right now each model has dedicated setters and getters (as well as some horrific logic to populate them), and I’m considering amending them to allow access to the properties via the __get() and __set() magic methods. Another option is to consider migrating the database layer to Doctrine, since that way we can reuse the getters and setters, but I haven’t yet made a firm decision about that.
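
If I do go down the magic methods route, the models could end up looking something like this sketch (the internal $data array is an assumption about how the attributes would be stored):

class Application_Model_Media
{
    protected $data = [];

    // Replaces the dedicated getters - $media->name instead of $media->getName()
    public function __get($name)
    {
        return isset($this->data[$name]) ? $this->data[$name] : null;
    }

    // Replaces the dedicated setters - $media->name = 'foo' instead of $media->setName('foo')
    public function __set($name, $value)
    {
        $this->data[$name] = $value;
    }
}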

Adding tests

The poor design of this application makes it difficult to test, so right now the coverage is poor. I’ve been using Behat to produce a basic set of acceptance tests for some of the most fundamental functionality, but they’re brittle and can be broken by database changes. I’ve also added some (even more brittle) golden master tests using a technique I’ll mention in a later blog post. I have got unit tests for three of the persister classes and some utility classes I’ve added, but nowhere near the level I want.

Refactoring code out of the fat controllers

Fat controllers are an antipattern I’ve seen (and indeed been responsible for myself) in the past, and this project has them in spades - running PHP Mess Detector on them is pretty sobering. The overwhelming majority of the code base is concentrated in the controllers, and it’s going to take a long time to refactor it into other classes.

Zend 1 does have the concept of controller helpers, and that’s been useful for removing some duplicate code, while more shared coded has been refactored out into traits. In addition, the utilities I’ve added include a Laravel-style collection class, and using that I’ve been able to refactor a lot of quite complex array handling into much simpler chained collection handling. However, this is still going to take a lot of effort.

Adding events

The lack of a decent event system caused particular problems when I was asked to add tracking of when a user views certain resources, so I used the PHP League’s Event package for this. I’ve started moving some other functionality to event listeners too, but this is another thing that will take a long time.

Refactoring the front end

Like many legacy projects, the front end is a horrible mess of jQuery spaghetti code, with some Handlebars templates thrown in here and there for good measure. It’s easily complex enough that it would benefit from a proper front-end framework, but a full rewrite is out of the question.

I was recently asked to add two new modals in the admin interface, and decided that it was worth taking a new approach rather than adding yet more jQuery spaghetti. Angular 1 is on its way out, so that wasn’t an option, and Angular 2+ would necessitate using Typescript, which would likely be problematic in the context of a legacy app, as well as the complexity being an issue. Vue was a possibility, but I always feel like Vue tries to do too much. Instead, I decided to go for React, because:

  • I’ve always enjoyed working with React, even though I haven’t had much chance to do so in the past.
  • We’re using Laravel Mix for processing the CSS and JS files (it can be used on non-Laravel projects), and it has a preset for React
  • React is well-suited to being added incrementally to existing projects without the need for a full rewrite (after all, it works for Facebook…), so it was straightforward to do a single modal with it
  • It’s easy to test - you can use snapshot tests to check it remains consistent, and using Enzyme it’s straightforward to navigate the rendered component for other tests

Both modals turned out very well, and went live recently. The first one took a fair while to write, and then when I wrote the second one, I had to spend some time making the sub-components more generic and pulling some functionality out into a higher order component, but now that that’s done it should be straightforward to write more.

In the longer term I plan to migrate more and more of the admin to React over time. The front end also has a new home page on the cards, and the plan is to use React for that too. Once the whole UI is using React, that will have eliminated most, if not all, of the problems with duplicate code in the view layer, as well as allowing for eventually turning the application into a single-page web app.

Upgrading the PHP version and migrating to a new server

When I started work on the project, it was running on an old server running PHP 5.4, but there were plans to migrate to a new server running PHP 5.6. The lack of tests made it difficult to verify it wouldn’t break in 5.6, but using PHP Compatibility and CodeSniffer I was able to find most of the problems. I ran it on PHP 5.6 locally during development so that any new development would be done on a more modern version. In the end, the migration to the new server was fairly seamless.

We will have to consider migrating to a newer PHP version again, since 5.6 will no longer be supported as of the end of this year, but it may be too risky for now.

Namespacing the code

As Zend 1 predates PHP namespaces, the code wasn’t namespaced. This is something I do plan to remedy - the form and model classes should be straightforward to namespace, but the controllers are a bit more problematic. I’m waiting on completing the repositories before I look at this.

Adding PSR-3 logging

The existing logging solution was not all that great. It had drivers for several different logging solutions, but nothing terribly modern - one was for the now-discontinued Firebug extension for Firefox. However, it was fairly similar to PSR-3, so it wasn’t too much work to replace it. I installed Monolog, and amended the bootstrap file to store that as the logger in the Zend registry - that way, we could set up many different handlers. I now have it logging to a dedicated Slack channel when an error occurs in staging or production, which makes it much easier to detect problems. This would also make it easy to set up many other logging handlers, such as the ELK stack.
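
The bootstrap code for this is fairly short. Here’s a rough sketch of the idea - the log path, environment check and webhook environment variable are assumptions, and note that Monolog’s SlackWebhookHandler only sends critical-level records by default unless you lower its level:

use Monolog\Logger;
use Monolog\Handler\StreamHandler;
use Monolog\Handler\SlackWebhookHandler;

$logger = new Logger('app');
// Always log to a local file
$logger->pushHandler(new StreamHandler(APPLICATION_PATH . '/../data/logs/app.log', Logger::DEBUG));
// Notify Slack outside of development
if (APPLICATION_ENV !== 'development') {
    $logger->pushHandler(new SlackWebhookHandler(getenv('SLACK_WEBHOOK_URL'), '#errors'));
}
// Store it in the registry so it's available throughout the application
Zend_Registry::set('logger', $logger);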

Debugging

Clockwork is my usual PHP debugging solution, and the absence of support for it in Zend 1 made it difficult to work with. Fortunately, it’s quite straightforward to implement your own data sources for Clockwork. I set it up to use the aforementioned logger as a data source, as well as the Zend 1 profiler. I also added a data source for the events implementation, and added a global clock() helper function, as well as one for the Symfony VarDumper component. Together these give me a reasonably good debugging experience.

Adding console commands

I’ve mentioned before that I’ve been using Symfony’s console component a lot lately, and this project is why. Zend 1 does not come with any sort of console task runner, and we needed an easy way to carry out certain tasks, such as:

  • Setting up a stored procedure
  • Anonymizing user data with Faker
  • Regenerating durations for audio and video files

In addition, I wanted a Laravel Tinker-style interactive shell. I was able to accomplish this with PsySh and the console components. For legacy projects that lack a console task runner, it’s worth considering adding one.
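
The Tinker-style shell needs surprisingly little code. A sketch of a Symfony console command wrapping PsySh might look like this (the command name is arbitrary), registered with the console application in the usual way:

<?php
use Psy\Shell;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class ShellCommand extends Command
{
    protected function configure()
    {
        $this->setName('shell')
            ->setDescription('Interactive PsySh shell');
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        // PsySh provides the REPL itself - we just need to start it
        $shell = new Shell();
        $shell->run();
    }
}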

Configuration

The configuration system in Zend 1 is downright painful - it requires that you define multiple environments in there. I have integrated DotEnv, but only part of the configuration has been migrated over, so there’s still plenty of work there.

What’s left to do

The code base is in a much better state than it was, but there’s still an awful lot to do. Zend 1 does apparently still work with PHP 7.1, but not with 7.2, so at some point we’ll likely need to leave Zend 1 behind entirely. This process has already started with us ditching Zend_Log for Monolog, and over time I plan to replace the various components of Zend 1 with other packages, either ones from newer versions of Zend Framework, or elsewhere. While there are many articles about migrating Zend 1 to later versions, very few of them actually seem to go into much detail - certainly nothing as useful as a step-by-step guide.

The database layer is particularly bad, and refactoring some of the methods into repository classes is only the first step in bringing that under control. Once that’s finished, I’m going to start going through the models and seeing if any more methods would make more sense as static methods on the repository, and possibly rename some of them. Then, we can think about the possibility of either incrementally migrating to another database interface (either a newer version of Zend DB, or Doctrine), or refactoring the existing models to have less boilerplate by using magic methods instead of getters and setters.

Dependency injection is a must at some point, but isn’t practical right now - Zend 1 controllers implement an interface that defines the constructor arguments, so you can’t pass in any additional parameters, so that will need to wait until the controllers no longer use Zend 1. I have been using the Zend Registry as a poor man’s DI container, since it allows sharing of a single object throughout the application, but it’s not a good solution in the long term.

The routing is also painful - Zend 1’s routes are all stored in the bootstrap file. I’d prefer to use something like league/route, which would allow for handling different HTTP methods to the same route using different controller methods, making it easier to separate out handling of GET and POST requests.

I also want at some point to set up a queue system for processing video and audio content - at present it’s handled by running a shell command from PHP, which means you can’t easily get feedback if something goes wrong. Migrating that to a queue system, backed with something like Redis, would help a great deal.

Share your stories

I’d love to hear any similar stories about refactoring legacy applications - how you’ve solved various problems with those legacy apps (or how you’d solve the ones I’ve had), tools you’ve used, and so on. Feel free to provide details in the comments.

A legacy project like this can be very frustrating to work with, but it can also feel quite rewarding to bring it under control over a period of time. My experience has been that you get the best results by working in small, regular steps, and over time your experience working with the code base will improve.


Career direction after seven years


Earlier this month, I passed the seven year anniversary of starting my first web dev job. That job never really worked out, for various reasons, but since then I’ve had an interesting time of it. I’ve diversified into app development via Phonegap, and I’ve worked with frameworks that didn’t exist when I first started. So it seems a good opportunity to take stock and think about where I want to head next.

Sometimes these posts are where someone announces they’re leaving their current role, but that’s not the case here - I’m pretty happy where I am right now. I am maintaining a legacy project, but I do feel like I’m making a difference and it’s slowly becoming more pleasant to work with, and I’m learning a lot about applying design patterns, so I think where I am right now is a good place for me. However, it’s a useful exercise to think about what I want to do, where I want to concentrate my efforts, and what I want to learn about.

So, here are my thoughts about where I want to go in future:

  • I really enjoy working with React, and I want to do so much more than I have in the past, possibly including React Native. Ditto with Redux.
  • Much as I love Django, it’s unlikely I’ll be using it again in the future, as it’s simply not in much demand where I live. In 2015, I was working at a small agency with a dev team of three, including me, and it became apparent that we needed to standardise on a single framework. I’d been using CodeIgniter on and off for several years, but it was tired and dated, yet I couldn’t justify using Django because no-one else was familiar with Python, so we settled on Laravel. Ever since, Laravel has been my go-to framework - Django does some things better (Django REST Framework remains the best way I’ve ever found to create a REST API), but Laravel does enough stuff well enough that I can use it for most things I need, so it’s a good default option.
  • I really don’t want to work with Wordpress often, and if I do, I’d feel a lot better about it if I used Bedrock. Just churning out boilerplate sites is anathema to me - I’d much rather do something more interesting, even if it were paid worse.
  • PHP is actually pretty nice these days (as long as you’re not dealing with a legacy application), and I generally don’t mind working with it, as long as it’s fairly modern.
  • I enjoy mentoring and coaching others, and I’d like to do that a lot more often than I have been doing. Mentoring and coaching is a big part of being a senior developer, since a good mentor can quickly bring inexperienced developers up to a much higher standard, and hugely reduces the amount of horrible legacy code that needs to be maintained. I was without an experienced mentor for much of my career, and in retrospect it held me back - having someone around to teach me about TDD and design patterns earlier would have helped no end. Also, I find it the single most rewarding part of my job.
  • I have absolutely no desire whatsoever to go into management, or leave coding behind in any way, shape or form. I’ve heard it said before that Microsoft have two separate career tracks for developers, one through people management, the other into a software architect role, and were I there, I would definitely opt for the latter.
  • I’m now less interested in learning new frameworks or languages than I am in picking up and applying new design patterns, and avoiding antipatterns - they’re the best way to improve your code quality. I’ve learned the hard way that the hallmark of a skilled developer’s code is not the complexity, but the simplicity - I can now recognise the convoluted code I wrote earlier in my career as painful to maintain, and can identify it in legacy projects.
  • I’ve always used linters and other code quality tools, and I’m eager to evangelise their usage.
  • I’ve been a proponent of TDD for several years now, and that’s going to continue - I’ve not only seen how many things it catches when you have tests, but also how painful it is when you have a large legacy project with no tests at all, and I’m absolutely staggered that anyone ever continues to write non-trivial production code without any sort of tests.
  • I want to understand the frameworks I use at a deeper level - it’s all too easy to just treat them as magic, when there are huge benefits to understanding how your framework works under the bonnet, and how to swap out the framework’s functionality for alternative implementations.
  • I’d like to get involved in more IoT-related projects - guess the 3 Raspberry Pi’s and the Arduino I have gathering dust at home need to get some more use…
  • Chat interfaces are interesting - I built an Alexa skill recently, which was fun and useful, and I’d like to do stuff like that more often.

So, after seven years, that’s where I see myself going in future. I think I’m in a good place to do that right now, and I’ll probably stay where I am for a good long while yet. The first seven years of my web dev career have been interesting, and I’m eager to see what the next seven bring.

Replacing switch statements with polymorphism in PHP


For the last few months, I’ve been making a point of picking up on certain antipatterns, and ways to avoid or remove them. One I’ve seen a lot recently is unnecessary large switch-case or if-else statements. For instance, here is a simplified example of one of these, which renders links to different objects:

<?php
switch ($item->getType()) {
case 'audio':
$media = new stdClass;
$media->type = 'audio';
$media->duration = $item->getLength();
$media->name = $item->getName();
$media->url = $item->getUrl();
break;
case 'video':
$media = new stdClass;
$media->type = 'video';
$media->duration = $item->getVideoLength();
$media->name = $item->getTitle();
$media->url = $item->getUrl();
break;
}
return '<a href="'.$media->url.'" class="'.$media->type.'" data-duration="'.$media->duration.'">'.$media->name.'</a>';

There are a number of problems with this, most notably the fact that it’s doing a lot of work to try and create a new set of objects that behave consistently. Instead, your objects should be polymorphic - in other words, you should be able to treat the original objects the same.

While strictly speaking you don’t need one, it’s a good idea to create an interface that defines the required methods. That way, you can have those objects implement that interface, and be certain that they have all the required methods:

<?php
namespace App\Contracts;
interface MediaItem
{
public function getLength(): int;
public function getName(): string;
public function getType(): string;
public function getUrl(): string;
}

Then, you need to implement that interface in your objects. It doesn’t matter if the implementations are different, as long as the methods exist. That way, objects can define how they return a particular value, which is simpler and more logical than defining it in a large switch-case statement elsewhere. It also helps to prevent duplication. Here’s what the audio object might look like:

<?php
namespace App\Models;
use App\Contracts\MediaItem;
class Audio implements MediaItem
{
public function getLength(): int
{
return $this->length;
}
public function getName(): string
{
return $this->name;
}
public function getType(): string
{
return $this->type;
}
public function getUrl(): string
{
return $this->url;
}
}

And here’s a similar example of the video object:

<?php
namespace App\Models;
use App\Contracts\MediaItem;
class Video implements MediaItem
{
public function getLength(): int
{
return $this->getVideoLength();
}
public function getName(): string
{
return $this->getTitle();
}
public function getType(): string
{
return $this->type;
}
public function getUrl(): string
{
return $this->url;
}
}

With that done, the code to render the links can be greatly simplified:

<?php
return '<a href="'.$item->getUrl().'" class="'.$item->getType().'" data-duration="'.$item->getLength().'">'.$item->getName().'</a>';

Because we can use the exact same methods and get consistent responses, yet also allow for the different implementations within the objects, this approach allows for much more elegant and readable code. Different objects can be treated in the same way without the need for writing extensive if or switch statements.

I haven’t had the occasion to do so, but in theory this approach is applicable in other languages, such as Javascript or Python (although these languages don’t have the concept of interfaces). Since discovering the switch statement antipattern and how to replace it with polymorphism, I’ve been able to remove a lot of overly complex code.

Understanding the pipeline pattern


In a previous post, I used the pipeline pattern to demonstrate processing letters using optical recognition and machine learning. The pipeline pattern is something I’ve found very useful in recent months. For a sequential series of tasks, this approach can make your code easier to understand by allowing you to break it up into simple, logical steps which are easy to test and understand individually. If you’re familiar with pipes and redirection in Unix, you’ll be aware of how you can chain together multiple, relatively simple commands to carry out some very complex transformations on data.

A few months back, I was asked to build a webhook for a Facebook lead form at work. One of my colleagues was having to manually export the lead data from Facebook as a CSV, and then import it into a MySQL database and a Campaign Monitor mailing list, which was an onerous task, so they asked me to look at more automated solutions. I wound up building a webhook with Lumen that would go through the following steps:

  • Get the lead IDs from the webhook
  • Pull the leads from the Facebook API using those IDs
  • Process the raw data into a more suitable format
  • Save the data to the database
  • Push the data to Campaign Monitor

Since this involved a number of discrete steps, I chose to implement each step as a separate stage. That way, each step was easy to test in isolation, and it was easily reusable. As it turned out, this approach saved us because Facebook needed to approve this app (and ended up rejecting it - their documentation at the time wasn’t clear on implementing server-to-server apps, making it hard to meet their guidelines), so we needed an interim solution. I instead wrote an Artisan task for importing the file from a CSV, which involved the following steps:

  • Read the rows from the CSV file
  • Format the CSV data into the desired format
  • Save the data to the database
  • Push the data to Campaign Monitor

This meant that two of the existing steps could be reused, as is, without touching the code or tests. I just added two new classes to read the data and format the data, and the Artisan command, which simply called the various pipeline stages, and that was all. In this post, I’ll demonstrate how I implemented this.

While there is more than one implementation of this available, and it wouldn’t be hard to roll your own, I generally use the PHP League’s Pipeline package, since it’s simple, solid and well-tested. Let’s say our application has three steps:

  • Format the request data
  • Save the data
  • Push it to a third party service.

We therefore need to write a stage for each step in the process. Each one must be a callable, such as a closure, a callback, or a class that implements the __invoke() magic method. I usually go for the latter as it allows you to more easily inject dependencies into the stage via its constructor, making it easier to use and test. Here’s what our first stage might look like:

<?php
namespace App\Stages;
use Illuminate\Support\Collection;
class FormatData
{
public function __invoke(Collection $data): Collection
{
return $data->map(function ($item) {
return [
'name' => $item->fullname,
'email' => $item->email
];
});
}
}

This class does nothing more than receive a collection, and format the data as expected. We could have it accept a request object instead, but I opted not to because I felt it made more sense to pass the data in as a collection so it’s not tied to an HTTP request. That way, it can also handle data passed through from a CSV file using an Artisan task, and the details of how it receives the data in the first place are deferred to the class that calls the pipeline in the first place. Note this stage also returns a collection, for handling by the next step:

<?php

namespace App\Stages;

use App\Lead;
use Illuminate\Support\Collection;

class SaveData
{
    public function __invoke(Collection $data): Collection
    {
        return $data->map(function ($item) {
            $lead = new Lead;
            $lead->name = $item['name'];
            $lead->email = $item['email'];
            $lead->save();
            return $lead;
        });
    }
}

This step saves each lead as an Eloquent model, and returns a collection of the saved models, which are passed to the final step:

<?php

namespace App\Stages;

use App\Contracts\Services\MailingList;
use Illuminate\Support\Collection;

class AddDataToList
{
    protected $list;

    public function __construct(MailingList $list)
    {
        $this->list = $list;
    }

    public function __invoke(Collection $data)
    {
        return $data->each(function ($item) {
            $this->list->add([
                'name' => $item->name,
                'email' => $item->email
            ]);
        });
    }
}

This step uses a wrapper class for a mailing service, which is passed in as a dependency via the constructor. The __invoke() method then loops through the saved Eloquent models, pulls out the name and email address for each one, and adds them to the list.
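
The mailing list wrapper itself isn’t shown above, but here’s a minimal sketch of what the App\Contracts\Services\MailingList contract might look like. The add() method and its array argument are assumptions based purely on how the stage above uses it, and the concrete class would wrap whichever Campaign Monitor client you use:

<?php

namespace App\Contracts\Services;

interface MailingList
{
    /**
     * Add a subscriber, passed as an array with 'name' and 'email' keys, to the list.
     */
    public function add(array $subscriber);
}

Binding a concrete implementation to this interface in a service provider means the AddDataToList stage receives it automatically via constructor injection. With our stages complete, we can now put them together in our controller: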

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use Illuminate\Support\Collection;
use App\Stages\FormatData;
use App\Stages\SaveData;
use App\Stages\AddDataToList;
use League\Pipeline\Pipeline;

class WebhookController extends Controller
{
    public function store(Request $request, Pipeline $pipeline, FormatData $formatData, SaveData $saveData, AddDataToList $addData)
    {
        try {
            $data = Collection::make($request->get('data'));
            $pipe = $pipeline->pipe($formatData)
                ->pipe($saveData)
                ->pipe($addData);
            $pipe->process($data);
        } catch (\Exception $e) {
            // Handle exception
        }
    }
}

As mentioned above, we extract the request data (assumed to be an array of webhook data) and convert it into a collection. Then we put our pipeline together. Note that we use dependency injection to fetch the stages - feel free to use method or constructor injection as appropriate. We then call the pipe() method once for each stage we want to add.

Finally, we pass the data through the pipeline by calling the process() method with the initial data. Note that we can wrap the whole thing in a try...catch statement to handle exceptions, so if something happens that means we want to cease processing at that point, we can throw an exception in the stage and handle it outside the pipeline.
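
As an illustration, here’s a hypothetical stage (not part of the original application) that halts processing by throwing an exception when a lead already exists. If it were placed between the formatting and saving stages, the exception would surface in the controller’s catch block:

<?php

namespace App\Stages;

use App\Lead;
use Illuminate\Support\Collection;

class VerifyNotDuplicate
{
    public function __invoke(Collection $data): Collection
    {
        $data->each(function ($item) {
            // Bail out of the whole pipeline if this lead has already been imported
            if (Lead::where('email', $item['email'])->exists()) {
                throw new \RuntimeException("Lead {$item['email']} already exists");
            }
        });

        return $data;
    }
}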

This means that our controller is kept very simple. It just gets the data as a collection, then puts the pipeline together and passes the data through. If we subsequently had to write an Artisan task to do something similar from the command line, we could fetch the data via a CSV reader class, and then pass it to the same pipeline. If we needed to change the format of the initial data, we could swap the FormatData class for a different one with very little trouble.
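
To make that concrete, here’s a rough sketch of what such an Artisan command might look like. The ReadCsvFile and FormatCsvData stages and the command signature are hypothetical - they aren’t from the original application - but the SaveData and AddDataToList stages are reused untouched:

<?php

namespace App\Console\Commands;

use App\Stages\AddDataToList;
use App\Stages\FormatCsvData;
use App\Stages\ReadCsvFile;
use App\Stages\SaveData;
use Illuminate\Console\Command;
use League\Pipeline\Pipeline;

class ImportLeads extends Command
{
    protected $signature = 'leads:import {file : Path to the CSV file}';

    protected $description = 'Import leads from a CSV export and push them to the mailing list';

    public function handle(Pipeline $pipeline, ReadCsvFile $readCsv, FormatCsvData $formatCsv, SaveData $saveData, AddDataToList $addData)
    {
        // ReadCsvFile is assumed to accept a file path and return a collection of rows,
        // and FormatCsvData to return the same structure FormatData produces for the webhook
        $pipe = $pipeline->pipe($readCsv)
            ->pipe($formatCsv)
            ->pipe($saveData)
            ->pipe($addData);

        $pipe->process($this->argument('file'));

        $this->info('Leads imported');
    }
}

The command itself stays as thin as the controller did: it just decides where the data comes from and hands it to the pipeline.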

Another thing you can do with the League pipeline package, but I haven’t yet had the occasion to try, is use League\Pipeline\PipelineBuilder to build pipelines in a more dynamic fashion. You can make steps conditional, as in this example:

<?php

use League\Pipeline\PipelineBuilder;

$builder = (new PipelineBuilder)
    ->add(new FormatData);

// Only add this stage when the data calls for it
if ($data['type'] === 'foo') {
    $builder->add(new HandleFooType);
}

$builder->add(new SaveData);

$pipeline = $builder->build();
$pipeline->process($data);

The pipeline pattern isn’t appropriate for every situation, but for anything that involves a set of operations on the same data, it makes a lot of sense, and can make it easy to break larger operations into smaller steps that are easier to understand, test, and re-use.

An approach to writing golden master tests for PHP web applications

Apologies if some of the spelling or formatting on this post is off - I wrote it on a long train journey down to London, with sunlight at an inconvenient angle.

Recently I had to carry out some substantial changes to the legacy web app whose maintenance makes up the lion’s share of my current job. The client has several channels representing different parts of the business, each of which expects to see different content on the home page, and access to content is limited first by channel, and then by location. The client wanted an additional channel added. Due to bad design earlier in the application’s lifetime that isn’t yet practical to refactor away, each type of location has its own model, so it was necessary to add a new location model, and it had to work seamlessly, in the same way as the other location types. Unfortunately, these location types didn’t use polymorphism, relying instead on large switch statements, and it wasn’t practical to refactor all that away in one go. This was therefore quite a high-risk job, especially considering the paucity of tests on the legacy code base.

I’d heard of the concept of a golden master test before. If you haven’t come across it, the idea is that you run a process, capture its output, and then compare the output of that known-good version against future runs. It’s very much a test of last resort: in the context of a web app it’s potentially very brittle, since it depends on the state of the application remaining the same between runs to avoid false positives. I needed a set of simple “snapshot tests”, similar to how snapshot testing works with Jest, to catch unexpected breakages in a large number of pages, and this approach seemed to fit the bill. Unfortunately, I hadn’t been able to find a good example of how to do this for PHP applications, so it took a while to figure out something that worked.

Here is an example base test case I used for this approach:

<?php

namespace Tests;

use PHPUnit_Framework_TestCase as BaseTestCase;
use Behat\Mink\Driver\GoutteDriver;
use Behat\Mink\Session;

class GoldenMasterTestCase extends BaseTestCase
{
    protected $driver;

    protected $session;

    protected $baseUrl = 'http://localhost:8000';

    protected $snapshotDir = "tests/snapshots/";

    public function setUp()
    {
        $this->driver = new GoutteDriver();
        $this->session = new Session($this->driver);
    }

    public function tearDown()
    {
        $this->session = null;
        $this->driver = null;
    }

    public function loginAs($username, $password)
    {
        $this->session->visit($this->baseUrl.'/login');
        $page = $this->session->getPage();
        $page->fillField("username", $username);
        $page->fillField("password", $password);
        $page->pressButton("Sign In");
        return $this;
    }

    public function goto($path)
    {
        $this->session->visit($this->baseUrl.$path);
        $this->assertNotEquals(404, $this->session->getStatusCode());
        return $this;
    }

    public function saveHtml()
    {
        if (!$this->snapshotExists()) {
            $this->saveSnapshot();
        }
        return $this;
    }

    public function assertSnapshotsMatch()
    {
        $path = $this->getPath();
        $newHtml = $this->processHtml($this->getHtml());
        $oldHtml = $this->getOldHtml();
        $diff = "";
        if (function_exists('xdiff_string_diff')) {
            $diff = xdiff_string_diff($oldHtml, $newHtml);
        }
        $message = "The path $path does not match the snapshot\n$diff";
        self::assertThat($newHtml == $oldHtml, self::isTrue(), $message);
    }

    protected function getHtml()
    {
        return $this->session->getPage()->getHtml();
    }

    protected function getPath()
    {
        $url = $this->session->getCurrentUrl();
        $path = parse_url($url, PHP_URL_PATH);
        $query = parse_url($url, PHP_URL_QUERY);
        $frag = parse_url($url, PHP_URL_FRAGMENT);
        return $path.$query.$frag;
    }

    protected function getEscapedPath()
    {
        return $this->snapshotDir.str_replace('/', '_', $this->getPath()).'.snap';
    }

    protected function snapshotExists()
    {
        return file_exists($this->getEscapedPath());
    }

    protected function processHtml($html)
    {
        return preg_replace('/<input type="hidden"[^>]+\>/i', '', $html);
    }

    protected function saveSnapshot()
    {
        $html = $this->processHtml($this->getHtml());
        file_put_contents($this->getEscapedPath(), $html);
    }

    protected function getOldHtml()
    {
        return file_get_contents($this->getEscapedPath());
    }
}

Because this application is built with Zend 1 and doesn’t have an easy way to get the HTML response without actually running the application, I was forced to use an actual HTTP client to fetch the content while the web server is running. I’ve used Mink together with Behat many times in the past, and the Goutte driver is fast and doesn’t rely on Javascript, so that was the best bet for a simple way of retrieving the HTML. Had I been taking this approach with a Laravel application, I could have populated the testing database with a common set of fixtures, and passed a request object through the application and captured the response object’s output rather than using an HTTP client, thereby eliminating the need to run a web server and making the tests faster and less brittle.
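
For comparison, here’s a rough sketch of how the same idea might look in a Laravel application using the framework’s built-in HTTP testing layer rather than Mink. This is an assumption on my part rather than something from the original post - the class and method names are illustrative, though the testing helpers used are standard Laravel ones:

<?php

namespace Tests;

use Illuminate\Foundation\Testing\DatabaseMigrations;

class LaravelGoldenMasterTestCase extends TestCase
{
    use DatabaseMigrations;

    protected $snapshotDir = 'tests/snapshots/';

    protected function assertPageMatchesSnapshot($path)
    {
        // Seed a known set of fixtures here so the output is deterministic

        // Pass a request straight through the application - no web server required
        $response = $this->get($path);
        $this->assertNotEquals(404, $response->getStatusCode());

        // Strip hidden inputs, as in the Mink-based base test case above
        $html = preg_replace('/<input type="hidden"[^>]+\>/i', '', $response->getContent());

        $snapshot = $this->snapshotDir.str_replace('/', '_', $path).'.snap';
        if (!file_exists($snapshot)) {
            file_put_contents($snapshot, $html);
        }

        $this->assertEquals(file_get_contents($snapshot), $html, "The path $path does not match the snapshot");
    }
}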

Another issue was CSRF handling. A CSRF token is, by definition, generated randomly each time the page is loaded, and so it broke those pages that had forms with CSRF tokens. The solution I came up with was to strip out the hidden input fields.

When each page is tested, the first step is to fetch the content of that page. The test case then checks to see if there’s an existing snapshot. If not, the content is saved as a new snapshot file. Otherwise, the two snapshots are compared, and the test fails if they do not match.

Once that base test case was in place, it was then straightforward to extend it to test multiple pages. I wrote one test to check pages that did not require login, and another to check pages that did require login, and the paths for those pages were passed through using a data provider method, as shown below:

<?php

namespace Tests\GoldenMaster;

use Tests\GoldenMasterTestCase;

class GoldenMasterTest extends GoldenMasterTestCase
{
    /**
     * @dataProvider nonAuthDataProvider
     */
    public function testNonAuthPages($data)
    {
        $this->goto($data)
            ->saveHtml()
            ->assertSnapshotsMatch();
    }

    public function nonAuthDataProvider()
    {
        return [
            ['/login'],
        ];
    }

    /**
     * @dataProvider dataProvider
     */
    public function testPages($data)
    {
        $this->loginAs('foo', 'bar')
            ->goto($data)
            ->saveHtml()
            ->assertSnapshotsMatch();
    }

    public function dataProvider()
    {
        return [
            ['/foo'],
            ['/bar'],
        ];
    }
}

Be warned, this is not an approach I would advocate as a matter of course, and it should only ever be a last resort as an alternative to onerous manual testing for things that can’t be tested in their current form. It’s extremely brittle, and I’ve had to deal with a lot of false positives, although that would be easier if I could populate a testing database beforehand and use that as the basis of the tests. It’s also very slow, with each test taking three or four seconds to run, although again this would be less of an issue if I could pass through a request object and get the response HTML directly. Nonetheless, I’ve found it to be a useful technique as a test of last resort for legacy applications.

Do you still need jQuery?

There was a time not so long ago when jQuery was ubiquitous. It was used on almost every website as a matter of course, to the point that many HTML boilerplates included a reference to the CDN.

However, more and more I think it’s either unnecessary or insufficient for two main use cases:

jQuery is probably unnecessary for many web apps with simple Javascript

When jQuery first appeared, IE6 was commonplace, and browser APIs were notoriously inconsistent. jQuery was very useful in ironing out those inconsistencies and helping to make cross-browser development far less painful.

Nowadays, that’s no longer the case. Internet Explorer is on its way out, with IE11 being the only version still supported by Microsoft, and it’s becoming increasingly hard to justify support for older versions, especially with mobile browsers forming a bigger than ever chunk of the market. We’ll probably need to continue supporting IE11 for a good long while, and possibly IE10 for some time too, but these aren’t anything like as bad to work with as IE6. It’s worth noting that newer versions of jQuery are also dropping support for these older browsers, so in many ways it actually does less than it used to.

This is the usual thrust of articles on whether you should still be using jQuery, so I won’t labour the point, but for many smaller web apps jQuery is no longer necessary, and a lot of developers have a tendency to keep using it when it isn’t really required.

jQuery is insufficient for web apps with complex Javascript

Nowadays, there’s a lot of web applications that have moved big chunks of functionality from the server side to the client side. Beyond a certain (and quite small) level of complexity, jQuery just doesn’t do enough to cut it. For me personally, the nature of the projects I work on means that this is a far, far bigger issue than the first one.

I used to work predominantly with Phonegap, which meant that a lot of functionality traditionally done on the server side had to be moved to the client side, and for that jQuery was never sufficient. My first Phonegap app started out using jQuery, but it quickly became obvious that it was going to be problematic. It wound up as a huge mass of jQuery callbacks and Handlebars templates, which was almost impossible to test and hard to maintain. Given this experience, I resolved to switch to a full-fledged Javascript framework next time I built a mobile app, and for the next one I chose Backbone.js, which still used jQuery as a dependency, but made things more maintainable by giving a structure that it didn’t have before, which was the crucial difference.

The more modern generation of Javascript frameworks, such as Vue and React, go further in making jQuery redundant. Both of these implement a so-called Virtual DOM, which is used to calculate the minimum changes required to re-render the element in question. Subsequently using jQuery to mutate the DOM would cause problems because it would get out of sync with the Virtual DOM - in fact, in order to get a jQuery plugin working in the context of a React component, you have to actively prevent React from touching the DOM, thereby losing most of the benefits of using React in the first place. You usually see better results from using a React component designed for that purpose (or writing one, which React makes surprisingly simple), than from trying to shoehorn a jQuery plugin into it.

They also make a lot of things that jQuery does trivially easy. For instance, if you want to conditionally show or hide content in a React component, it’s just a case of rendering that content based on a particular value in the props or state, and filtering a list is just a case of applying a filter to the array containing the data and setting the state as appropriate.

In short, for single-page web apps or other ones with a lot of Javascript, you should look at other solutions first, and not just blithely assume jQuery will be up to the task. It’s technically possible to build this sort of web app using jQuery, but it’s apt to turn into a morass of spaghetti code unless approached with a great deal of discipline, one that sadly many developers don’t have, and it doesn’t exactly make it easy to promote code reuse. These days, I prefer React for complex web apps, because it makes it extremely intuitive to break my user interface up into reusable components, and test them individually. Using React would be overkill on brochure-style sites (unless you wanted to build it with something like Gatsby), but for more complex apps it’s often a better fit than jQuery.

So when should you use jQuery?

In truth, I’m finding it harder and harder to justify using it at all on new builds. I use it on my personal site because that’s built on Bootstrap 3 and so depends on jQuery, but for bigger web apps I’m generally finding myself moving to React, which renders it not just unnecessary for DOM manipulation, but counter-productive to use it. Most of what I do is big enough to justify something like React, and it generally results in code that is more declarative, easier to test and reason about, and less repetitive. Using jQuery for an application like this is probably a bad idea, because it’s difficult (not impossible, mind, if you follow some of the advice here, use a linter and consider using a proper client-side templating system alongside jQuery) to build an elegant and maintainable Javascript-heavy application.

As a rule of thumb, I find that anything which is likely to require more than a few hundred lines of Javascript is probably complex enough that jQuery isn’t sufficient, and I should instead consider something like React.

I doubt it’d be worth the bother of ripping jQuery out of a legacy application and rewriting the whole thing to not require it, but for new builds I would think very hard about:

  • Whether jQuery is sufficient, or you’d be better off using something like React, Vue or Angular
  • If it is sufficient, whether it’s actually necessary

In all honesty, I don’t think using it when it’s technically not necessary is as big a deal as using it when it’s not really sufficient. Yes, downloading a library you technically don’t need for a page is a bad practice, and it does make your site slower and heavier for users on slow mobile connections, but there are ways to mitigate that, such as CDNs, caching and minification. If you build a web app using jQuery alone when React, Vue or Angular would be more suitable, you’re probably going to have to write a lot more code that will be difficult to maintain, test and understand. Things like React were created to solve the problems that arose when developers built complex client-side applications with jQuery, and are therefore a good fit for bigger applications. The more complex setup does mean they have a threshold below which it’s not worth the bother of using them, but past that threshold they result in better, more maintainable, more testable and more reusable code.

Now React is cool, you hate jQuery, you hipster…

Don’t be a prat. Bitter experience has taught me that for a lot of my own personal use cases, jQuery is insufficient. It doesn’t suck, it’s just insufficient. If for your use case, jQuery is sufficient, then that’s fine. All I’m saying is that when a web app becomes sufficiently complex, jQuery can begin to cause more problems than it solves, and that for a sufficiently complex web app you should consider other solutions.

I currently maintain a legacy application that includes thousands of lines of Javascript. Most of it is done with jQuery and some plugins, and it’s resulted in some extremely repetitive jQuery callbacks that are hard to maintain and understand, and impossible to test. Recently I was asked to add a couple of modals to the admin interface, and rather than continuing to add them using jQuery and adding more spaghetti code, I instead opted to build them with React. During the process of building the first modal, I produced a number of components for different elements of the UI. Then, when I built the second one, I refactored those components to be more generic, and moved some common functionality into a higher-order component so that it could be reused. Now, if I need to add another modal, it will be trivial because I already have those components available, and I can just create a new component for the modal, import those components that I need, wrap it in the higher-order component if necessary, and that’s all. I can also easily test those components in isolation. In short, I’ve saved myself some work in the long run by writing it to use a library that was a better fit.

It’s not like using jQuery inevitably results in unmaintainable code, but it does require a certain amount of discipline to avoid it. A more opinionated library such as React makes it far, far harder to create spaghetti code, and makes code reuse natural in a way that jQuery doesn’t.
