Imagine you have EAN codes in the database. The values either do not have the corresponding check digit or you want to verify that the number you have stored is correct. You do not want to load all the entries from the database first and then utilize another programming language. You also do not have permissions to create a database function to calculate the check digit directly in the database. Instead you want to do this entirely in the database, because you are for example working in an SQL shell - or simply because you can.

## EAN Check Digit calculation

There are different kinds of EAN codes. Some of the prominent ones (at least to me) are EAN-8 and EAN-13.

The number in the name defines the total length of the code, where the last digit is the check digit, therefore an EAN-13 code consists of 12 code digits and 1 check digit.

The check digit calculation is best done with a concrete example.

We'll take the following 12 digits, which are the code digits for an EAN-13 code: `629104150021`

The check digit is calculated by assigning weights to the individual positions. This is done for all kinds of EAN codes by starting at the last code digit and assigning the weight `3`

. The digit before that is assigned weight `1`

. This continues in the same alternating fashion for the remaining digits. A table that visualizes this can be found here.

This means that for our example the weights will be as follows:

index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|

digit | 6 | 2 | 9 | 1 | 0 | 4 | 1 | 5 | 0 | 0 | 2 | 1 |

weight | 1 | 3 | 1 | 3 | 1 | 3 | 1 | 3 | 1 | 3 | 1 | 3 |

Another way of thinking about the weight assignment would be the following:

start with `1`

if the number of digits in the code (including check digit) is uneven, start with `3`

otherwise.

Each digit is then multiplied by its weight and summed up.

index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | sum |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

digit | 6 | 2 | 9 | 1 | 0 | 4 | 1 | 5 | 0 | 0 | 2 | 1 | |

weight | 1 | 3 | 1 | 3 | 1 | 3 | 1 | 3 | 1 | 3 | 1 | 3 | |

product | 6 | 6 | 9 | 3 | 0 | 12 | 1 | 15 | 0 | 0 | 2 | 3 | 57 |

The difference to the nearest equal or higher multiple of ten is then the check digit. This would be `60 - 57 = 3`

.

It can be avoided having take the next multiple of 10 by calculating modulo 10 first on the result first: `57 % 10 = 7`

. If one then substracts the result here from 10, the difference is available: `10 - 7 = 3`

. There is one edge case here, name if the previous number was already 0. In that case the check digit must not be 10. Calculating modulo 10 again, resolves this: `(10 - (<sum> % 10)) % 10`

.

The EAN-13 code with the check digit is `6291041500213`

GS1 provides a check digit calculator on their website, with which this can be confirmed.

## EAN check digit calculation using SQL

For calculating this in SQL, we are going to assume that the EAN code is available as a single integer or text value.

We are furthermore going to assume that we have a table `eans`

with a column `ean`

, which already has the check digit available.

### Individual digits

Since a lot of the work is done on the individual digits, we are going to have to split the digits up into individual values.

A look into the string function documentation for Postgres reveals that this can be done by using `regexp_split_to_table`

. Splitting it into a table allows us to continue with normal table operations.

This starts us out with the following query:

```
SELECT
REGEXP_SPLIT_TO_TABLE(
ean::TEXT,
''::TEXT
)
FROM eans
```

Currently this would return all individual digits for all eans in the table.

We will need to know which digit is at which position of the code.

In order to achieve that we can utilize Postgres window functions.

```
SELECT
(
SELECT
array_agg(pos || ':' || digit)::TEXT -- aggregation to make it valid SQL
FROM (
SELECT
ROW_NUMBER() over () AS pos,
digit
FROM (
SELECT
REGEXP_SPLIT_TO_TABLE(
ean::TEXT,
''::TEXT
) AS digit
) AS split
) AS temp_agg -- helper structure
)
FROM eans
```

Please note that a temporary aggregation had to be added here to allow executing this as valid SQL while still showing the positions.

Since we made the assumption that our EAN code includes its check digit already, we can not filter that check digit away, by only selecting all digits where `pos`

is smaller than the length of the `ean`

value from the database:

```
SELECT
ean,
(
SELECT
array_agg(pos || ':' || digit)::TEXT -- aggregation to make it valid SQL
FROM (
SELECT
ROW_NUMBER() over () AS pos,
digit
FROM (
SELECT
REGEXP_SPLIT_TO_TABLE(
ean::TEXT,
''::TEXT
) AS digit
) AS split
) AS temp_agg -- helper structure
WHERE pos < LENGTH(ean::TEXT) -- filter check digit
)
FROM eans
```

In order to select the correct digits for the correct weights, we can use `LENGTH`

as a helper function and modulo 2.

Modulo 2 can here help us to decide based on 0 or 1 as a result.

We want to assign the weight of `1`

to every digit that has the same modulo 2 value as the length of the code: `pos % 2 = LENGTH(ean::TEXT) % 2`

Therefore the weight of `3`

is assigned to every digit that does *not* have the same modulo 2 value as the length of the code: `pos % 2 = (LENGTH(ean::TEXT) - 1) % 2`

.

(Note: if the EAN code from the database does not contain the check digit, those two conditions have to be swapped.)

We can then directly multiply with the weights and sum those values together:

```
SELECT
ean,
(
SELECT
SUM(digit)
FROM (
SELECT
ROW_NUMBER() over () AS pos,
digit
FROM (
SELECT
REGEXP_SPLIT_TO_TABLE(
ean::TEXT,
''::TEXT
)::INT AS digit
) AS split
) AS sub
WHERE pos < LENGTH(ean::TEXT) -- filter check digit
AND pos % 2 = LENGTH(ean::TEXT) % 2
)
+
(
SELECT
SUM(digit * 3)
FROM (
SELECT
ROW_NUMBER() over () AS pos,
digit
FROM (
SELECT
REGEXP_SPLIT_TO_TABLE(
ean::TEXT,
''::TEXT
)::INT AS digit
) AS split
) AS sub
WHERE pos < LENGTH(ean::TEXT) -- filter check digit
AND pos % 2 = (LENGTH(ean::TEXT) - 1) % 2
)
FROM eans
LIMIT 10
```

Note: we are copying the digit splitting and row number part here so that we can do all of this inside the SELECT part of the query.

We can now add the modulo 10 calculation as described above:

```
SELECT
ean,
(
10
- -- substract from 10 to get the difference to next highest multiple of 10
(
(
SELECT
SUM(digit)
FROM (
SELECT
ROW_NUMBER() over () AS pos,
digit
FROM (
SELECT
REGEXP_SPLIT_TO_TABLE(
ean::TEXT,
''::TEXT
)::INT AS digit
) AS split
) AS sub
WHERE pos < LENGTH(ean::TEXT) -- filter check digit
AND pos % 2 = LENGTH(ean::TEXT) % 2
)
+
(
SELECT
SUM(digit * 3)
FROM (
SELECT
ROW_NUMBER() over () AS pos,
digit
FROM (
SELECT
REGEXP_SPLIT_TO_TABLE(
ean::TEXT,
''::TEXT
)::INT AS digit
) AS split
) AS sub
WHERE pos < LENGTH(ean::TEXT) -- filter check digit
AND pos % 2 = (LENGTH(ean::TEXT) - 1) % 2
)
) % 10 -- normalize to values from 0 to 10
) % 10 AS digit -- handle cases where the previous value was already 0
FROM eans
LIMIT 10
```

With the check digit a `UNION ALL`

and `string_agg`

we could now build up an entire EAN code.

## Where would you need that?

A couple of examples of where this can come in handy were already mentioned at the top:

- lack of permission to create a database function
- loading everything from the database is not an option

For us it is a combination of the two. We are using views to allow us to create anonymized dumps, where the critical data never leaves the database.

For that purpose we built libraries for Ruby and Golang. More information about those can be found here and here.

In order to make it as seamless as possible to setup, we wanted to avoid requiring additional permissions like creating functions.

With the query above we can calculate the check digits for anonymized EAN codes and ensure that the original ones never actually leave the database.