


fanf

More postcodes

28th Aug 2008 | 14:48

Thanks for all the interesting comments on my previous post.

The reason I'm investigating this is to work around false positives caused by SpamAssassin's obfuscation rules. These are intended to match deliberate misspellings of commonly spammed goods such as Viagra. The specific instance that caused the bug report was a Reading postcode being treated as an obfuscated Rolex.

Therefore I'm not particularly worried about missing out obscure special cases like GIR 0AA and the overseas territories' AAAA 1ZZ format. However, it might be worth tightening up the outcode regex, based on the list of UK postcode areas, to reduce the chance of matching a bogus postcode.
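One way to do that tightening is to build the area part of the outcode from an explicit list of area codes. The sketch below uses Python rather than Perl, and it rests on a few assumptions. The `AREAS` list is a deliberately partial, illustrative subset: the eight single-letter areas plus a few two-letter ones. The helper `outcode_regex` is hypothetical. The district part is kept loose.

```python
import re

# Illustrative, deliberately partial list of postcode areas: the eight
# single-letter areas plus a handful of two-letter ones. A real filter
# would load the complete list of UK postcode areas.
AREAS = ["B", "E", "G", "L", "M", "N", "S", "W",
         "CM", "EC", "NW", "RG", "SE", "SW", "WC"]


def outcode_regex(areas):
    """Compile a pattern for outcodes whose area is one of `areas`.

    Longer area codes are listed first so that, when searching text,
    "SW" is tried before "S". The district part is loose: one digit,
    then an optional second digit or a (London-style) letter.
    """
    alternation = "|".join(
        sorted((re.escape(a) for a in areas), key=len, reverse=True))
    return re.compile(rf"(?:{alternation})[0-9](?:[0-9]|[A-HJKMNPR-Y])?")


OUTCODE_RE = outcode_regex(AREAS)
for candidate in ["RG1", "CM0", "SW1A", "ZZ1"]:
    print(candidate, bool(OUTCODE_RE.fullmatch(candidate)))
```

A regex assembled from data like this is only as good as the list behind it. That is the trade-off against the hand-written character classes: it is stricter, but it needs updating when areas change.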

Also, the Post Office's postcode FAQ mentions that only London uses the ANA and AANA outcode formats (A = letter, N = digit). (In fact it's only the E, EC, SW, W and WC areas.) I managed to find a list of postcode districts which includes these outcodes (Wikipedia omits them). It shows that Wikipedia's rule for the third position is wrong: it says M does not appear there, but there is a postcode district London W1M. Rule three also allows A and E in that position, which are not in fact used there.
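For readers unfamiliar with the A/N notation, here is a small Python sketch that rewrites an outcode as its shape. The function name `outcode_shape` and the set `LONDON_ONLY_SHAPES` are hypothetical, introduced only for illustration. The set simply encodes the FAQ's claim that only London uses ANA and AANA.

```python
def outcode_shape(outcode: str) -> str:
    """Rewrite an outcode in the A (letter) / N (digit) notation."""
    return "".join("A" if ch.isalpha() else "N" if ch.isdigit() else "?"
                   for ch in outcode)


# Per the postcode FAQ cited in the post, only London outcodes have these shapes.
LONDON_ONLY_SHAPES = {"ANA", "AANA"}

for oc in ["W1M", "EC1A", "CM0", "SW19"]:
    shape = outcode_shape(oc)
    print(oc, shape, shape in LONDON_ONLY_SHAPES)
```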

qr{\b
  ([BGLMNS][1-9][0-9]?
  |[A-PR-UWYZ][A-HK-Y][1-9]?[0-9]
  |([EW]C?|NW?|S[EW])[1-9][0-9A-HJKMNPR-Y]
  )[ ]{0,2}
  ([0-9][ABD-HJLNP-UW-Z]{2})
\b}x
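To try the pattern outside SpamAssassin, here is a Python transliteration of the regex above. The post's rule itself is Perl; Python is an assumption of convenience. Python's `re.VERBOSE`, like Perl's `/x`, ignores whitespace outside character classes, so `[ ]` still matches a literal space. The name `POSTCODE_RE` is hypothetical. The sample strings were chosen to exercise each branch and were not checked against the postcode database.

```python
import re

# Python transliteration of the Perl qr{...}x pattern. Group 1 is the
# outcode, group 2 the London area (third alternative only), group 3 the incode.
POSTCODE_RE = re.compile(r"""
    \b
    ( [BGLMNS][1-9][0-9]?                        # one-letter area, 1-2 digit district
    | [A-PR-UWYZ][A-HK-Y][1-9]?[0-9]             # two-letter area, 1-2 digit district
    | ([EW]C?|NW?|S[EW])[1-9][0-9A-HJKMNPR-Y]    # London areas: digit or letter suffix
    )[ ]{0,2}
    ([0-9][ABD-HJLNP-UW-Z]{2})                   # incode: letters exclude C, I, K, M, O, V
    \b
""", re.VERBOSE)

samples = ["SW1A 1AA", "EC1A 1BB", "W1M 1AA", "CM0 8AA", "GIR 0AA", "CM0 8CA"]
for s in samples:
    m = POSTCODE_RE.search(s)
    print(s, "->", (m.group(1), m.group(3)) if m else None)
```

As the post intends, GIR 0AA falls through. An incode containing C is also rejected.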


Comments

Peter Maydell

from: pm215
date: 28th Aug 2008 14:57 (UTC)

Aha. I was wondering what application would justify being over-strict with postcode formats rather than over-lax...


from: anonymous
date: 14th Nov 2008 02:25 (UTC)

CM0 is also a valid, regular postcode, for much of the Burnham-on-Crouch area.


Tony Finch

from: fanf
date: 14th Nov 2008 11:37 (UTC)

That's OK because the M appears in position 2, not position 3. The CM0 outcode matches the third line of the regex, which is the alternative for two-letter areas with a numeric district.
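To check which branch a given outcode takes, one can test each outcode alternative on its own. This is a Python sketch under a few assumptions. The alternatives are copied from the post's regex, with the London group made non-capturing. The helper `matching_alternatives` is hypothetical. "Line N" counts `qr{\b` as line 1, as in the reply above.

```python
import re

# The three outcode alternatives from the post's regex, in order.
ALTERNATIVES = [
    r"[BGLMNS][1-9][0-9]?",                        # regex line 2
    r"[A-PR-UWYZ][A-HK-Y][1-9]?[0-9]",             # regex line 3
    r"(?:[EW]C?|NW?|S[EW])[1-9][0-9A-HJKMNPR-Y]",  # regex line 4
]


def matching_alternatives(outcode):
    """Return the 1-based indexes of the alternatives matching the whole outcode."""
    return [i for i, pattern in enumerate(ALTERNATIVES, start=1)
            if re.fullmatch(pattern, outcode)]


for oc in ["CM0", "W1M", "M1", "GIR"]:
    print(oc, matching_alternatives(oc))
```

CM0 is matched only by the second alternative. In that alternative, M is allowed in the second position, so the third-position rule never applies to it.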
