Requirements for `account number` generator:
- Issue pseudo random consistent number (must be unique for dozen millions of records)
- Easy check validity (without a need to make a database call)
We will use
Feistel cipher to generate pseudo random number (positive only) from a sequential numbers (e.g. returned by
nextval() for a posgresql
sequence). This algorithm is taken as basis for the `make_feistel_number` function available in
wheezy.core package. Setup environment before proceed:
$ virtualenv env
$ env/bin/easy_install wheezy.core
$ env/bin/python
Let play a bit how numbers are generated (notice, the function is
reversal and consistent, all numbers are
unique, no collision):
>>> from wheezy.core.feistel import make_feistel_number
>>> from wheezy.core.feistel import sample_f
>>> feistel_number = make_feistel_number(sample_f)
>>> feistel_number(1)
573852158
>>> feistel_number(2)
1788827948
>>> feistel_number(1788827948)
2
We will use
Luhn algorithm to generate a single digit checksum. This algorithm is taken as basis for the module available in
wheezy.core package.
>>> from wheezy.core.luhn import luhn_checksum
>>> luhn_checksum(123456789)
7
There are two more useful functions to sign a number and also check it validity:
>>> from wheezy.core.luhn import luhn_sign
>>> from wheezy.core.luhn import is_luhn_valid
>>> luhn_sign(123456789)
1234567897
>>> is_luhn_valid(1234567897)
True
>>> is_luhn_valid(34518893)
False
Now let just make account number looking nice (pad with zeros, prefix with something meaningful, etc):
>>> account_number = lambda n: 'Z%011d' % luhn_sign( \
... feistel_number(n))
>>> account_number(1)
'Z05738521581'
>>> account_number(2)
'Z17888279480'
>>> account_number(3)
'Z07395350007'
Per discussion in python mail
list, there was discovered alternative, human readable representation of the same number:
>>> from base64 import b32encode
>>> human_format = lambda n: 'Z%s-%s' % (b32encode( \
... chr((n >> 24) & 255) + \
... chr((n >> 16) & 255))[:4], \
... b32encode(chr((n >> 8) & 255) + \
... chr(n & 255))[:4])
>>> human_format(5738521581)
'ZKYFA-4PWQ'
>>> human_format(17888279480)
'ZFI4Q-PO4A'
>>> human_format(7395350007)
'ZXDGA-CX3Q'
Side by side:
Z05738521581 = ZKYFA-4PWQ
Z17888279480 = ZFI4Q-PO4A
Z07395350007 = ZXDGA-CX3Q
Yes, it optimized for speed, you must provide your own `sample_f` implementation for security reasons. Source code available
here.
Nice.
ReplyDeletePlease consider promoting pip instead of easy_install.
How does one make a sample_f? what is sample_f expected to output?
ReplyDeleteLook at source code for sample_f definition as an example.
DeleteSeriously, it would have made more sense if you explained what sample_f is rather than direct us to non-documented code.
Deletesee sample_f.
Delete