Metamorphing Machine
I'd rather be this walking metamorphosis
than have that old formed opinion about everything!

How to validate a date using a regular expression

For starters, let me tell you that it is a bad idea. A really, really bad idea. Regular expressions were not meant to solve a problem like this.
But that has never stopped me before, right? And you might learn a thing or two along the way.

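OK, so, what are the rules here? We want to accept dates written as month/day/year, where the month and the day may have a leading zero and the year has four digits (2/29/2020 or 02/29/2020, for example). On top of that:
January, March, May, July, August, October and December have 31 days.
April, June, September and November have 30 days.
February has 28 days, or 29 days in a leap year.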

So... What is a leap year?

Most years have 365 days. A year with one extra day (366 days) is a leap year.
To know whether a year is a leap year, we divide it by four. If there is no remainder, it is a leap year.
Except that it must also leave a remainder when divided by 100. If there is no remainder when dividing by 100, it is not a leap year after all.
But the 100 exception has an exception too: if there is also no remainder when dividing by 400, then it is a leap year again.
Confused yet?
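For comparison, here is how the same rules read when you are allowed to do arithmetic. This is a minimal Python sketch; the name is_leap_year is just an illustrative choice (Python's standard library already offers calendar.isleap for the same job). Checking the 400 rule first lets each exception override the one below it.

```python
def is_leap_year(year: int) -> bool:
    """Leap year rule: divisible by 4, except centuries, except centuries divisible by 400."""
    if year % 400 == 0:
        return True  # the exception to the exception
    if year % 100 == 0:
        return False  # the 100 exception
    return year % 4 == 0  # the basic rule


for y in (1999, 2000, 1900, 2020):
    print(y, is_leap_year(y))
```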

Some examples:
1999 was not a leap year. 1999 divided by four yields three as a remainder, so it is not a leap year. We don't need to check the 100 rule.

2000 was a leap year. We got a remainder of zero when dividing it by four, so it is a leap year. Now we have to check the 100 rule. It gives zero again, so we're back to it not being a leap year. But now we need to check the 400 rule. Dividing it by 400 also gives zero, so we're back to it being a leap year.

Working with Regular Expressions

Regular expressions check patterns in text. They cannot do arithmetic with it. So we need to find some patterns in the rules above that a regular expression can express.

What does the multiplication table for four look like?
00, 04, 08, 12, 16, 20, 24, 28, 32, 36, 40, ...

We can see that when a multiple of four ends with zero, four or eight, it is preceded by an even digit: 00, 04, 08, 20, 24, 28, 40, etc.
Also, when it ends with two or six, it is preceded by an odd digit: 12, 16, 32, 36, etc.
Hmm... It looks promising.

Fun fact: to check whether a number is evenly divisible by four, you only need to look at its last two digits, because 100 is itself a multiple of four.
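Written out, splitting a number n into its hundreds part q and its last two digits r:

```latex
\begin{aligned}
n &= 100q + r, \qquad 0 \le r \le 99,\\
n \bmod 4 &= (4 \cdot 25q + r) \bmod 4 = r \bmod 4.
\end{aligned}
```

For example, 1999 = 100 × 19 + 99, and 99 leaves a remainder of three when divided by four, just like 1999.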

In regular expression lingo, we could say that a four-digit number that is evenly divisible by four looks like this:

\d{2}([02468][048]|[13579][26])
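A quick brute-force check in Python confirms it. This sketch simply tries every four-digit string from 0000 to 9999; re.fullmatch is used so the pattern has to cover all four digits. The name DIVISIBLE_BY_FOUR is only illustrative.

```python
import re

# The pattern above; fullmatch() forces it to consume the whole four-digit string.
DIVISIBLE_BY_FOUR = re.compile(r"\d{2}([02468][048]|[13579][26])")

matches = {n for n in range(10000) if DIVISIBLE_BY_FOUR.fullmatch(f"{n:04d}")}
multiples = {n for n in range(10000) if n % 4 == 0}

print(matches == multiples)  # True
print(DIVISIBLE_BY_FOUR.fullmatch("2100") is not None)  # True, although 2100 is not a leap year
```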

We could use that expression to match leap years, except that it also matches 1500, 2100, etc., which are not leap years. We need to change it so that it does not match two zeros at the end:

\d{2}([2468][048]|[02468][48]|[13579][26])
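The same brute-force idea, again just a sketch with an illustrative variable name, shows that the revised pattern matches the multiples of four that do not end in 00, and nothing else:

```python
import re

LEAP_NON_CENTURY = re.compile(r"\d{2}([2468][048]|[02468][48]|[13579][26])")

matches = {n for n in range(10000) if LEAP_NON_CENTURY.fullmatch(f"{n:04d}")}
expected = {n for n in range(10000) if n % 4 == 0 and n % 100 != 0}

print(matches == expected)  # True
print([y for y in (1500, 2000, 2020, 2100) if LEAP_NON_CENTURY.fullmatch(str(y))])  # [2020]
```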

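To match the years covered by the 400 rule, we can adapt the expression above. A year ending in 00 is divisible by 400 exactly when its first two digits are divisible by four, so we remove the first two digits and stick two zeros at the end, like this: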

([2468][048]|[02468][48]|[13579][26])00
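Joining the two alternatives gives a leap-year check for four-digit years. This sketch compares it with Python's calendar.isleap, which applies the same divide-by-4/100/400 rule, for every year from 0001 to 9999. Year 0000 is left out of the comparison because neither alternative matches it.

```python
import calendar
import re

LEAP_YEAR = re.compile(
    r"\d{2}([2468][048]|[02468][48]|[13579][26])"  # divisible by 4, not ending in 00
    r"|([2468][048]|[02468][48]|[13579][26])00"  # centuries divisible by 400
)

mismatches = [
    y for y in range(1, 10000)
    if (LEAP_YEAR.fullmatch(f"{y:04d}") is not None) != calendar.isleap(y)
]
print(mismatches)  # []
print(LEAP_YEAR.fullmatch("0000") is not None)  # False
```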

Now that we can match leap years, what about the other constraints mentioned before?

0?[13578]|10|12 allows us to match the 31-day months.
0?[1-9]|[12]\d|3[01] allows us to match 1 through 31.
0?[469]|11 allows us to match the 30-day months.
0?[1-9]|[12]\d|30 allows us to match 1 through 30.

0?2 allows us to match February.
0?[1-9]|1\d|2[0-8] allows us to match 1 through 28.
0?[1-9]|[12]\d allows us to match 1 through 29.
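Each fragment can be checked in isolation. In this sketch, every number from 0 to 99 is tried with and without a leading zero. re.fullmatch requires the whole string to match, so a fragment such as 0?[469]|11 is tested as a whole instead of having only one of its alternatives anchored. The FRAGMENTS table is just an illustrative name.

```python
import re

# Each fragment, paired with the numbers it is supposed to accept.
FRAGMENTS = {
    r"0?[13578]|10|12": {1, 3, 5, 7, 8, 10, 12},  # 31-day months
    r"0?[469]|11": {4, 6, 9, 11},  # 30-day months
    r"0?2": {2},  # February
    r"0?[1-9]|[12]\d|3[01]": set(range(1, 32)),  # days 1-31
    r"0?[1-9]|[12]\d|30": set(range(1, 31)),  # days 1-30
    r"0?[1-9]|1\d|2[0-8]": set(range(1, 29)),  # days 1-28
    r"0?[1-9]|[12]\d": set(range(1, 30)),  # days 1-29
}

for pattern, expected in FRAGMENTS.items():
    for n in range(100):
        # Try "7" and "07" (for two-digit numbers both forms are the same string).
        for text in {str(n), f"{n:02d}"}:
            matched = re.fullmatch(pattern, text) is not None
            assert matched == (n in expected), (pattern, text)
print("all fragments OK")
```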

Now we have all the pieces of the puzzle. We just need to put them together.
Be prepared for a long regular expression at the end.

Matching dates having months with 31 days:

(0?[13578]|10|12)/(0?[1-9]|[12]\d|3[01])/\d{4}

Matching dates having months with 30 days:

(0?[469]|11)/(0?[1-9]|[12]\d|30)/\d{4}
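A sketch that checks both patterns against the number of days per month reported by Python's calendar.monthrange. The sample year 2019 is an arbitrary choice, since February is not involved yet.

```python
import calendar
import re

THIRTY_ONE = re.compile(r"(0?[13578]|10|12)/(0?[1-9]|[12]\d|3[01])/\d{4}")
THIRTY = re.compile(r"(0?[469]|11)/(0?[1-9]|[12]\d|30)/\d{4}")

year = 2019  # arbitrary sample year; February is handled separately
for month in range(1, 13):
    if month == 2:
        continue
    days_in_month = calendar.monthrange(year, month)[1]  # (weekday, number of days)
    for day in range(1, 33):
        text = f"{month}/{day}/{year}"
        accepted = bool(THIRTY_ONE.fullmatch(text) or THIRTY.fullmatch(text))
        assert accepted == (day <= days_in_month), text
print("30- and 31-day months OK")
```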

Matching February dates up to 29 days (leap years):

0?2/(0?[1-9]|[12]\d)/(\d{2}([2468][048]|[02468][48]|[13579][26])|([2468][048]|[02468][48]|[13579][26])00)

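Matching February dates up to 28 days (non-leap years; the year part also lets through some leap years, such as 2020, which does no harm because every year has a February 28):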

0?2/(0?[1-9]|1\d|2[0-8])/(\d{2}([2468][1235679]|[02468][01235679]|[13579][01345789])|([2468][1235679]|[02468][01235679]|[13579][01345789])00)
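A sketch that checks the two February patterns together, using calendar.isleap as the reference, for every year from 0001 to 9999: the 29th must be accepted exactly in leap years, and the 1st through the 28th in every year. Splitting the long patterns across adjacent raw-string literals is only for readability; Python joins them into one string.

```python
import calendar
import re

FEB_29 = re.compile(
    r"0?2/(0?[1-9]|[12]\d)/"
    r"(\d{2}([2468][048]|[02468][48]|[13579][26])|([2468][048]|[02468][48]|[13579][26])00)"
)
FEB_28 = re.compile(
    r"0?2/(0?[1-9]|1\d|2[0-8])/"
    r"(\d{2}([2468][1235679]|[02468][01235679]|[13579][01345789])"
    r"|([2468][1235679]|[02468][01235679]|[13579][01345789])00)"
)

for year in range(1, 10000):  # 0000 is never treated as a leap year by these patterns
    for day in range(1, 30):
        text = f"2/{day}/{year:04d}"
        accepted = bool(FEB_29.fullmatch(text) or FEB_28.fullmatch(text))
        should_accept = day <= 28 or calendar.isleap(year)
        assert accepted == should_accept, text
print("February OK")
```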

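Joining everything, with one extra pair of parentheses around the whole thing so that ^ and $ apply to every alternative (| has the lowest precedence, so ^A|B$ would only anchor A at the start and B at the end):

^(((0?[13578]|10|12)/(0?[1-9]|[12]\d|3[01])|(0?[469]|11)/(0?[1-9]|[12]\d|30))/\d{4}|0?2/((0?[1-9]|[12]\d)/(\d{2}([2468][048]|[02468][48]|[13579][26])|([2468][048]|[02468][48]|[13579][26])00)|(0?[1-9]|1\d|2[0-8])/(\d{2}([2468][1235679]|[02468][01235679]|[13579][01345789])|([2468][1235679]|[02468][01235679]|[13579][01345789])00)))$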

Try reading this to a friend over the phone...
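As a final sanity check, here is a sketch that runs the joined expression against Python's datetime.date, which raises ValueError for impossible dates. It covers 1895 through 2105 plus every century year from 100 to 9900, with padded and unpadded months and days; that selection of years is an arbitrary choice to keep it fast. It compiles with re.ASCII, because by default Python's \d also matches non-ASCII decimal digits. The helper is_real_date is a hypothetical name.

```python
import datetime
import re

DATE = re.compile(
    r"^(((0?[13578]|10|12)/(0?[1-9]|[12]\d|3[01])|(0?[469]|11)/(0?[1-9]|[12]\d|30))/\d{4}"
    r"|0?2/((0?[1-9]|[12]\d)/(\d{2}([2468][048]|[02468][48]|[13579][26])|([2468][048]|[02468][48]|[13579][26])00)"
    r"|(0?[1-9]|1\d|2[0-8])/(\d{2}([2468][1235679]|[02468][01235679]|[13579][01345789])"
    r"|([2468][1235679]|[02468][01235679]|[13579][01345789])00)))$",
    re.ASCII,  # make \d mean [0-9] only
)


def is_real_date(month: int, day: int, year: int) -> bool:
    """Reference check: datetime.date rejects impossible dates with ValueError."""
    try:
        datetime.date(year, month, day)
    except ValueError:
        return False
    return True


years = sorted(set(range(1895, 2106)) | set(range(100, 10000, 100)))
for year in years:
    for month in range(1, 13):
        for day in range(1, 32):
            expected = is_real_date(month, day, year)
            for text in (f"{month}/{day}/{year:04d}", f"{month:02d}/{day:02d}/{year:04d}"):
                assert (DATE.search(text) is not None) == expected, text

# ^ and $ cover every branch, so leading or trailing junk is rejected.
assert DATE.search("1/31/2020xyz") is None
assert DATE.search("x 2/29/2020") is None
print("all checks passed")
```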

Limitations

You see, back in 1582, the Gregorian calendar reform introduced the leap year rule we use today and also erased 10 days from the calendar. Which days were erased depends on the country, since some of them took centuries to catch up (and the later ones had to skip more days, such as 11 in Britain in 1752).

As you can imagine, the monster regular expression above does not take that reform into account. Or does it?
We can pretend those deleted days were there, and that the current leap year rule also applied to the years before it was created. This is called a proleptic calendar: applying the rules retroactively, as if they were valid at a time when they weren't.
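Python's datetime module works the same way: its documentation describes date as an idealized date that assumes the current Gregorian calendar always was, and always will be, in effect. This small sketch shows the consequence. October 10, 1582, one of the days skipped by the first countries to adopt the reform, is a perfectly valid date (just as the regular expression happily accepts 10/10/1582), and 1500 is not a leap year even though it was one under the older every-four-years rule.

```python
import calendar
import datetime

# Skipped in the first countries to adopt the reform, but valid in the proleptic calendar.
skipped = datetime.date(1582, 10, 10)
print(skipped.isoformat())  # 1582-10-10

# Days from October 4 to October 15, 1582, counted as if nothing had been removed.
print((datetime.date(1582, 10, 15) - datetime.date(1582, 10, 4)).days)  # 11

# The 100/400 rules are applied retroactively.
print(calendar.isleap(1500))  # False
```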

And that's it: a working regular expression to validate dates, which you should not use.

Andrej Biasic
2020-02-19