fanf

UK postcode regular expression

27th Aug 2008 | 17:43

A little contribution for anyone else who searches the web for this in the future.

The UK postcode consists of two parts. The first part is the Outward Postcode, or Outcode. This is separated by a single space from the second part which is the Inward Postcode, or Incode. The Outcode directs mail to the correct local area for delivery. The Incode is used to sort the mail at the local area delivery office.

The Outcode has six possible formats, listed below (A is a letter, N is a digit). The Incode always has the format numeric, alpha, alpha.

  • AN NAA
  • ANN NAA
  • ANA NAA
  • AAN NAA
  • AANN NAA
  • AANA NAA

There are some restrictions on the letters:

  1. The letters [QVX] are not used in the first position.
  2. The letters [IJZ] are not used in the second position.
  3. The only letters to appear in the third position are [ABCDEFGHJKSTUW].
  4. The only letters to appear in the fourth position are [ABEHMNPRVWXY].
  5. The letters [CIKMOV] are not used in the second part (the Incode).

This translates into a Perl extended regex as follows (with slightly relaxed whitespace: one or two spaces or tabs between the two parts):

qr{\b
    ([A-PR-UWYZ]\d[\dA-HJKSTUW]? # rules 1,3
    |[A-PR-UWYZ][A-HK-Y]\d[\dABEHMNPRVWXY]? # rules 1,2,4
    )[\t ]{1,2}
    (\d[ABD-HJLNP-UW-Z]{2}) # rule 5
\b}x
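
For anyone not working in Perl, here is a rough Python translation of the same pattern. It is only a sketch: the names UK_POSTCODE and find_postcodes are made up for illustration, and re.VERBOSE stands in for Perl's /x flag.

```python
import re

# Python translation of the Perl pattern above; re.VERBOSE plays the role of /x.
# Whitespace inside a character class is still significant in verbose mode,
# so [\t ] matches a tab or a space just as it does in the Perl version.
UK_POSTCODE = re.compile(r"""
    \b
    ( [A-PR-UWYZ] \d [\dA-HJKSTUW]?               # AN, ANN, ANA: rules 1, 3
    | [A-PR-UWYZ] [A-HK-Y] \d [\dABEHMNPRVWXY]?   # AAN, AANN, AANA: rules 1, 2, 4
    )
    [\t ]{1,2}                                    # one or two spaces or tabs
    ( \d [ABD-HJLNP-UW-Z]{2} )                    # NAA: rule 5
    \b
""", re.VERBOSE)


def find_postcodes(text):
    """Return an (outcode, incode) pair for each postcode-shaped match in text."""
    return [(m.group(1), m.group(2)) for m in UK_POSTCODE.finditer(text)]


if __name__ == "__main__":
    print(find_postcodes("Send it to SW1A 1AA or M1 1AE."))
```

Like the Perl original, this is case-sensitive, so lowercase input would need upper-casing first.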

Update: more here.

Comments {15}

Mike

from: knell
date: 27th Aug 2008 18:45 (UTC)

There are a couple of odd exceptions too - you missed out the special postcodes for St Helena and dependencies (STHL 1ZZ, TDCU 1ZZ, etc). Other overseas territories have these too.

Mike

from: knell
date: 27th Aug 2008 19:43 (UTC)

.. though it should be noted that not even the Royal Mail's own postcode -> address conversion page doesn't pick these up.

Mike

from: knell
date: 27th Aug 2008 19:45 (UTC)

... and that I dropped in a confusing double negative there.

from: hsenag
date: 27th Aug 2008 19:14 (UTC)

Also I believe that the N/NN in the Outcode must be >=1, except for CR, which was the pilot area for postcodes and thus has a 0 for legacy reasons. I expect it's not worth trying to add this to the regex :-)

Peter Maydell

from: pm215
date: 27th Aug 2008 21:02 (UTC)

The spec you reference notes at least one exception your regex won't handle: GIR 0AA
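
One way to cope with exceptions like this is to check a short list of special cases before falling back to the general pattern. The sketch below is in Python rather than Perl, and the names STANDARD, SPECIAL_CASES and is_postcode are hypothetical. The special-case set holds only the codes named in this thread and is certainly not complete.

```python
import re

# The post's pattern, translated to Python and used with fullmatch(),
# so the surrounding \b anchors are not needed.
STANDARD = re.compile(r"""
    ( [A-PR-UWYZ] \d [\dA-HJKSTUW]?
    | [A-PR-UWYZ] [A-HK-Y] \d [\dABEHMNPRVWXY]?
    )
    [ ]
    \d [ABD-HJLNP-UW-Z]{2}
""", re.VERBOSE)

# Assumption: only the exceptions mentioned in these comments, not a full list.
SPECIAL_CASES = {"GIR 0AA", "STHL 1ZZ", "TDCU 1ZZ"}


def is_postcode(s):
    """True if s is a listed special case or fits the standard formats."""
    candidate = " ".join(s.split())  # trim and collapse runs of whitespace
    return candidate in SPECIAL_CASES or STANDARD.fullmatch(candidate) is not None
```

Keeping the oddities in a lookup set leaves the regex itself readable, at the cost of having to maintain the list by hand.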

The Bellinghman

from: bellinghman
date: 27th Aug 2008 23:11 (UTC)

Ah, rule 3 broken.

Peter Maydell

from: pm215
date: 27th Aug 2008 23:14 (UTC)

Also breaks rule 0 in that AAA isn't one of the standard outcode forms.

The Bellinghman

from: bellinghman
date: 27th Aug 2008 23:31 (UTC)

That's an 'I' for India, not a '1'?

Where is that, then?

Bridget

from: bugshaw
date: 27th Aug 2008 23:35 (UTC)

It's a rare non-geographic postcode. [EDIT: for Girobank as was, in Bootle]
The '0AA' is zero-A-A so does fit the NAA scheme.



Edited at 2008-08-28 12:08 am (UTC)

Andrew

from: nonameyet
date: 28th Aug 2008 03:58 (UTC)

And also says:

These conventions may change in the future if operationally required.

So these aren't rules that Royal Mail follow, just a description of the current state.

Pete

from: pjc50
date: 28th Aug 2008 08:31 (UTC)

Did you ever see http://bitter.ukcod.org.uk/~chris/postcodeine/ ?

Sion

from: sion_a
date: 28th Aug 2008 09:10 (UTC)

"separated by a single space"

If you're only handling full postcodes, since the incode is always NAA, the space is an optional convention. If, on the other hand, your software is expected to understand town-only, town+sector (outcode) and town+sector+district (? it's a couple of years since I was doing this—I can't remember all the terminology) as well, it's essential to distinguish AANN from AAN N.

from: kaet
date: 28th Aug 2008 17:45 (UTC)

I used to be PE19 2SW, which was sorted at Bedford. Lots of our mail was delayed, because people would write PE1 92SW, which is sorted at Peterborough. Then Peterborough would scribble PE19 on the envelope and send it to Bedford. So, there's a fair few people in the post-office who don't know that there aren't two-digit incodes (and me, until now).
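
Since the incode is always three characters, a missing or misplaced space in a full postcode can be repaired mechanically by counting from the right. A minimal Python sketch, assuming the input is a full postcode and nothing shorter (the function name split_postcode and the upper-casing are my own choices):

```python
import re


def split_postcode(raw):
    """Split a full postcode into (outcode, incode), ignoring where the space is."""
    compact = re.sub(r"\s+", "", raw).upper()  # drop all whitespace
    # Outcodes are 2 to 4 characters and the incode is always 3.
    if not 5 <= len(compact) <= 7:
        raise ValueError("not a full postcode: %r" % raw)
    return compact[:-3], compact[-3:]


if __name__ == "__main__":
    for raw in ("PE19 2SW", "PE1 92SW", "PE192SW"):
        print(raw, "->", " ".join(split_postcode(raw)))
```

This only checks the length, not the letter rules, and it cannot help with partial postcodes, where the space is the only thing separating AANN from AAN N.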

Jonny

from: jdc39
date: 28th Aug 2008 12:54 (UTC)

I heard that part of these restrictions was for letter recognition by computers. Apparently there was some point a number of years ago where they changed some postcodes to avoid misunderstandings with a computer reader. The gist was that if one character could be misread (as a letter or number) with a reasonable likelihood, then the one next to it would be identified correctly (with a high likelihood), so that the deduction was almost unique across all people's handwriting.

However this was only chat in a bar from a coursemate, so I don't know how true it is.

Andrew

from: nonameyet
date: 28th Aug 2008 19:47 (UTC)

They certainly do letter recognition by computer.
I remember being impressed, a decade ago, to hear that they computer-sort letters by postcode whilst the letters are moving at 40 mph!
