Tony Finch - UK postcode regular expression

dotatfanf wrote
on 27th August 2008 at 17:43

UK postcode regular expression

A little contribution for anyone else who searches the web for this in the future.

The UK postcode consists of two parts. The first part is the Outward Postcode, or Outcode. This is separated by a single space from the second part which is the Inward Postcode, or Incode. The Outcode directs mail to the correct local area for delivery. The Incode is used to sort the mail at the local area delivery office.

The Outcode has 6 possible formats (AN, ANN, AAN, AANN, ANA and AANA, where A is a letter and N is a digit), and the Incode is consistently in numeric, alpha, alpha (NAA) format.

There are some restrictions on the letters:

  1. The letters [QVX] are not used in the first position.
  2. The letters [IJZ] are not used in the second position.
  3. The only letters to appear in the third position are [ABCDEFGHJKSTUW].
  4. The only letters to appear in the fourth position are [ABEHMNPRVWXY].
  5. The letters [CIKMOV] are not used in the second part.

This translates into a Perl extended regex as follows (with slightly relaxed whitespace):

qr{\b
    ([A-PR-UWYZ]\d[\dA-HJKSTUW]? # rules 1,3
    |[A-PR-UWYZ][A-HK-Y]\d[\dABEHMNPRVWXY]? # rules 1,2,4
    )[\t ]{1,2}
    (\d[ABD-HJLNP-UW-Z]{2}) # rule 5
\b}x
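
A minimal usage sketch, assuming the regex above is stored in a variable. The names `$postcode_re` and `find_postcodes` and the sample text are illustrative only. It scans free text and collects each outcode/incode pair it finds.

```perl
use strict;
use warnings;

# The regex from the post, unchanged, stored under an illustrative name.
my $postcode_re = qr{\b
    ([A-PR-UWYZ]\d[\dA-HJKSTUW]? # rules 1,3
    |[A-PR-UWYZ][A-HK-Y]\d[\dABEHMNPRVWXY]? # rules 1,2,4
    )[\t ]{1,2}
    (\d[ABD-HJLNP-UW-Z]{2}) # rule 5
\b}x;

# Hypothetical helper: return every [outcode, incode] pair found in a string.
sub find_postcodes {
    my ($text) = @_;
    my @found;
    while ($text =~ /$postcode_re/g) {
        push @found, [$1, $2];
    }
    return @found;
}

my $text = "Write to SW1A 1AA or M1 1AE, but not to QA1 1AA.";
for my $pc (find_postcodes($text)) {
    print "outcode=$pc->[0] incode=$pc->[1]\n";
}
```

This should report SW1A 1AA and M1 1AE and skip QA1 1AA: Q is excluded from the first position by rule 1, and the leading `\b` stops the match from starting at the A instead.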

Update: more here.


From:knell
Date:2008-08-27 18:45 (UTC)
There are a couple of odd exceptions too - you missed out the special postcodes for St Helena and dependencies (STHL 1ZZ, TDCU 1ZZ, etc). Other overseas territories have these too.
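
One way to accommodate these, sketched on the assumption that such codes are simply listed by hand: keep the general regex as it is and add a separate alternation beside it. Only the two codes named above are included, so the list is deliberately incomplete. `$general_re`, `$special_re` and `is_postcode` are illustrative names.

```perl
use strict;
use warnings;

# General-format regex from the post (illustrative variable name).
my $general_re = qr{\b
    ([A-PR-UWYZ]\d[\dA-HJKSTUW]? # rules 1,3
    |[A-PR-UWYZ][A-HK-Y]\d[\dABEHMNPRVWXY]? # rules 1,2,4
    )[\t ]{1,2}
    (\d[ABD-HJLNP-UW-Z]{2}) # rule 5
\b}x;

# Special cases listed explicitly. Only the two codes mentioned in the
# comment are here; extend the alternation for other territories.
my $special_re = qr{\b
    (STHL|TDCU)   # four-letter outcodes, outside the six standard forms
    [\t ]{1,2}
    (1ZZ)
\b}x;

# Hypothetical helper: true if the whole string is a postcode of either kind.
sub is_postcode {
    my ($s) = @_;
    return $s =~ /\A(?:$general_re|$special_re)\z/ ? 1 : 0;
}

for my $s ("STHL 1ZZ", "TDCU 1ZZ", "CB2 3QH", "STHL 2ZZ") {
    print "$s: ", (is_postcode($s) ? "ok" : "no"), "\n";
}
```

Keeping the exceptions in their own pattern leaves the rule-based regex readable and makes the hand-maintained list easy to find.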

From:knell
Date:2008-08-27 19:43 (UTC)
.. though it should be noted that not even the Royal Mail's own postcode -> address conversion page doesn't pick these up.

From:knell
Date:2008-08-27 19:45 (UTC)
... and that I dropped in a confusing double negative there.

From:hsenag
Date:2008-08-27 19:14 (UTC)
Also I believe that the N/NN in the Outcode must be >=1, except for CR, which was the pilot area for postcodes and thus has a 0 for legacy reasons. I expect it's not worth trying to add this to the regex :-)

From:pm215
Date:2008-08-27 21:02 (UTC)
The spec you reference notes at least one exception your regex won't handle: GIR 0AA
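
A sketch of one way to admit that exception: add a literal GIR 0AA arm next to the general pattern. This is an assumption about how the regex might be patched, not the author's own update. It uses a branch-reset group (Perl 5.10 or later) so the outcode and incode still land in the first and second captures. `$postcode_re` and `split_postcode` are illustrative names.

```perl
use strict;
use warnings;

my $postcode_re = qr{\b
    (?|                                   # branch reset: both arms set groups 1 and 2
        ([A-PR-UWYZ]\d[\dA-HJKSTUW]?             # rules 1,3
        |[A-PR-UWYZ][A-HK-Y]\d[\dABEHMNPRVWXY]?  # rules 1,2,4
        )[\t ]{1,2}
        (\d[ABD-HJLNP-UW-Z]{2})                  # rule 5
    |
        (GIR)[\t ]{1,2}(0AA)                     # the single AAA-form exception
    )
\b}x;

# Hypothetical helper: return (outcode, incode) if the whole string is a
# postcode, or an empty list otherwise.
sub split_postcode {
    my ($s) = @_;
    return $s =~ /\A$postcode_re\z/ ? ($1, $2) : ();
}

for my $s ("GIR 0AA", "EC1A 1BB", "GIR 1AA") {
    my @parts = split_postcode($s);
    print "$s: ", (@parts ? "outcode=$parts[0] incode=$parts[1]" : "no match"), "\n";
}
```

Pinning the incode to 0AA keeps the exception as narrow as possible: GIR 1AA is still rejected.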

From:bellinghman
Date:2008-08-27 23:11 (UTC)
Ah, rule 3 broken.

From:pm215
Date:2008-08-27 23:14 (UTC)
Also breaks rule 0 in that AAA isn't one of the standard outcode forms.

From:bellinghman
Date:2008-08-27 23:31 (UTC)
That's an 'I' for India, not a '1'?

Where is that, then?

From:bugshaw
Date:2008-08-27 23:35 (UTC)
It's a rare non-geographic postcode. [EDIT: for Girobank as was, in Bootle]
The '0AA' is zero-A-A so does fit the NAA scheme.



Edited at 2008-08-28 00:08 (UTC)

From:nonameyet
Date:2008-08-28 03:58 (UTC)
And also says:

These conventions may change in the future if operationally required.

So these aren't rules that Royal Mail follow, just a description of the current state.

From:pjc50
Date:2008-08-28 08:31 (UTC)
Did you ever see http://bitter.ukcod.org.uk/~chris/postcodeine/ ?

From:sion_a
Date:2008-08-28 09:10 (UTC)
separated by a single space

If you're only handling full postcodes, since the incode is always NAA, the space is an optional convention. If, on the other hand, your software is expected to understand town-only, town+sector (outcode) and town+sector+district (? it's a couple of years since I was doing this—I can't remember all the terminology) as well, it's essential to distinguish AANN from AAN N.
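
A small sketch of that point, assuming the input is meant to be a full postcode: because the incode is always three characters (NAA), whitespace can be stripped and the split position recovered by counting from the end. `normalise_postcode` is an illustrative name. The function does not check the letter restrictions; it only re-inserts the space.

```perl
use strict;
use warnings;

# Hypothetical helper: re-insert the space in a full postcode.
# Returns undef (in scalar context) if the input cannot be a full postcode:
# wrong length once whitespace is removed, or last three characters not NAA.
sub normalise_postcode {
    my ($raw) = @_;
    (my $s = uc $raw) =~ s/\s+//g;    # drop all whitespace, fold to upper case
    return unless $s =~ /\A([A-Z0-9]{2,4})(\d[A-Z]{2})\z/;
    return "$1 $2";
}

for my $raw ("PE192SW", "PE1 92SW", "sw1a1aa", "PE19") {
    my $fixed = normalise_postcode($raw);
    print "$raw -> ", (defined $fixed ? $fixed : "not a full postcode"), "\n";
}
```

With this approach a mis-spaced `PE1 92SW` comes out as `PE19 2SW`, while a bare outcode such as `PE19` is rejected rather than guessed at, which is where the space becomes essential.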

From:kaet
Date:2008-08-28 17:45 (UTC)
I used to be PE19 2SW, which was sorted at Bedford. Lots of our mail was delayed, because people would write PE1 92SW, which is sorted at Peterborough. Then Peterborough would scribble PE19 on the envelope and send it to Bedford. So, there's a fair few people in the post-office who don't know that there aren't two-digit incodes (and me, until now).

From:jdc39
Date:2008-08-28 12:54 (UTC)
I heard that part of the reason for these restrictions was letter recognition by computers. Apparently, a number of years ago, they changed some postcodes to avoid misreadings by a computer reader. The gist was that if one character could be misread (as a letter or a number) with reasonable likelihood, then the one next to it would be identified correctly with high likelihood, so that the deduction was almost unique across all people's handwriting.

However this was only chat in a bar from a coursemate, so I don't know how true it is.

From:nonameyet
Date:2008-08-28 19:47 (UTC)
They certainly do letter recognition by computer.
I remember being impressed, a decade ago, to hear that they computer-sort letters by postcode whilst the letters are moving at 40mph!