Wednesday, January 16, 2008

2038 Bug

What is the year 2038 bug?

In the first month of the year 2038 C.E. many computers will encounter a date-related bug in their operating systems and/or in the applications they run. This can result in incorrect and grossly inaccurate dates being reported by the operating system and/or applications. The effect of this bug is hard to predict, because many applications are not prepared for the resulting "skip" in reported time - anywhere from 1901 to a "broken record" repeat of the reported time at the second the bug occurs. Also, leap seconds may make some small adjustment to the actual time the bug expresses itself. I expect this bug to cause serious problems on many platforms, especially Unix and Unix-like platforms, because these systems will "run out of time". Starting at GMT 03:14:07, Tuesday, January 19, 2038, I fully expect to see lots of systems around the world breaking magnificently: satellites falling out of orbit, massive power outages (like the 2003 North American blackout), hospital life support system failures, phone system interruptions (including 911 emergency services), banking system crashes, etc. One second after this critical second, many of these systems will have wildly inaccurate date settings, producing all kinds of unpredictable consequences. In short, many of the dire predictions for the year 2000 are much more likely to actually occur in the year 2038! Consider the year 2000 just a dry run. In case you think we can sit on this issue for another 30 years before addressing it, consider that reports of temporal echoes of the 2038 problem are already starting to appear in future date calculations for mortgages and vital statistics! Just wait til January 19, 2008, when 30-year mortgages will start to be calculated.

What causes it?

What makes January 19, 2038 a special day? Unix and Unix-like operating systems do not calculate time in the Gregorian calendar, they simply count time in seconds since their arbitrary "birthday", GMT 00:00:00, Thursday, January 1, 1970 C.E. The industry-wide practice is to use a 32-bit variable for this number (32-bit signed time_t). Imagine an odometer with 32 wheels, each marked to count from 0 and 1 (for base-2 counting), with the end wheel used to indicate a positive or negative integer. The largest possible value for this integer is 2**31-1 = 2,147,483,647 (over two billion). 2,147,483,647 seconds after Unix's birthday corresponds to GMT 03:14:07, Tuesday, January 19, 2038. One second later, many Unix systems will revert to their birth date (like an odometer rollover from 999999 to 000000). Because the end bit indicating positive/negative integer may flip over, some systems may revert the date to 20:45:52, Friday, December 13, 1901 (which corresponds to GMT 00:00:00 Thursday, January 1, 1970 minus 2**31 seconds). Hence the media may nickname this the "Friday the Thirteenth Bug". I have read unconfirmed reports that the rollover could even result in a system time of December 32, 1969 on some legacy systems!

What operating systems, platforms, and applications are affected by it?


A quick check with the following Perl script may help determine if your computers will have problems (this requires Perl to be installed on your system, of course):
#!/usr/bin/perl
#
# I've seen a few versions of this algorithm
# online, I don't know who to credit. I assume
# this code to by GPL unless proven otherwise.
# Comments provided by William Porquet, February 2004.
# You may need to change the line above to
# reflect the location of your Perl binary
# (e.g. "#!/usr/local/bin/perl").
# Also change this file's name to '2038.pl'.
# Don't forget to make this file +x with "chmod".
# On Linux, you can run this from a command line like this:
# ./2038.pl
use POSIX;
# Use POSIX (Portable Operating System Interface),
# a set of standard operating system interfaces.
$ENV{'TZ'} = "GMT";
# Set the Time Zone to GMT (Greenwich Mean Time) for date calculations.
for ($clock = 2147483641; $clock <>
{
print ctime($clock);
}
# Count up in seconds of Epoch time just before and after the critical event.
# Print out the corresponding date in Gregorian calendar for each result.
# Are the date and time outputs correct after the critical event second?


I have only seen a mere handful of operating systems that appear to be unaffected by the year 2038 bug so far. For example, the output of this script on Debian GNU/Linux (kernel 2.4.22):

# ./2038.pl
Tue Jan 19 03:14:01 2038
Tue Jan 19 03:14:02 2038
Tue Jan 19 03:14:03 2038
Tue Jan 19 03:14:04 2038
Tue Jan 19 03:14:05 2038
Tue Jan 19 03:14:06 2038
Tue Jan 19 03:14:07 2038
Fri Dec 13 20:45:52 1901
Fri Dec 13 20:45:52 1901
Fri Dec 13 20:45:52 1901


Windows 2000 Professional with ActivePerl 5.8.3.809 fails in such a manner that it stops displaying the date after the critical second:

C:\>perl 2038.pl
Mon Jan 18 22:14:01 2038
Mon Jan 18 22:14:02 2038
Mon Jan 18 22:14:03 2038
Mon Jan 18 22:14:04 2038
Mon Jan 18 22:14:05 2038
Mon Jan 18 22:14:06 2038
Mon Jan 18 22:14:07 2038


So far, the few operating systems that I haven't found susceptible to the 2038 bug include very new versions of Unix and Linux ported to 64-bit platforms. Recent versions of QNX seems to take the temporal transition in stride. If you'd like to try this 2038 test yourself on whatever operating systems and platforms you have handy, download the Perl source code here. A gcc-compatible ANSI C work-alike version is available here. A Python work-alike version is available here. Feel free to email your output to me for inclusion on a future revision of this Web page. I have collected many reader-submitted sample outputs from various platforms and operating systems and posted them here.

For a recent relevant example of the wide-spread and far-reaching extent of the 2038 problem, consider the Mars rover Opportunity that had a software crash which resulted in it "phoning home" while reporting the year as 2038 (see paragraph under heading "Condition Red").

A large number of machines, platforms, and applications could be affected by the 2038 problem. Most of these will (hopefully) get decommissioned before the critical date. However, it is possible that some machines going into service now, or legacy systems which have never been upgraded due to budget constrains, may still be operating in 2038. These may include process control computers, space probe computers, embedded systems in traffic light controllers, navigation systems, etc. It may not be possible to upgrade many of these systems. For example, Ferranti Argus computers survived in service long enough to present serious maintenance problems. Clock circuit hardware which has adopted the Unix time convention may also be affected if 32-bit registers are used. While 32-bit CPUs may be obsolete in desktop computers and servers by 2038, they may still exist in microcontrollers and embedded circuits. For instance, when I last checked in 1999, the Z80 processor is still available as an embedded function within Altera programmable devices. Such embedded functions present a serious maintenance problem for all rollover issues like the year 2038 problem, since the package part number and other markings usually give no indication of the device's internal function. Also, I expect we've already forgotten how many devices are running strange mutations of embedded Microsoft Windows. I can recall encountering some telephony devices and printers running Windows CE and NT under the hood, just off the top of my head. And don't forget emulators that allow older code (both applications and operating systems) to run on newer platforms!

Many Intel x86 platforms have BIOS date issues as well. The Linux BIOS time utility hwclock has issues around the critical second in 2038 too (DO NOT try this on a production system unless you REALLY know what you're doing):

[root@alouette root]# hwclock --set --date="1/18/2038 22:14:06"
[root@alouette root]# hwclock --set --date="1/18/2038 22:14:07"
RTC_SET_TIME: Invalid argument
ioctl() to /dev/rtc to set the time failed.
[root@alouette root]# hwclock --set --date="1/18/2038 22:14:08"
date: invalid date `1/18/2038 22:14:08'
The date command issued by hwclock returned unexpected results.
The command was:
date --date="1/18/2038 22:14:08" +seconds-into-epoch=%s

I performed this test on an Intel Celeron laptop which has a Toshiba BIOS. Note that trying to set the BIOS hardware to the "critical second" (relative to my time zone) resulted in a different error than setting the BIOS to one second later (although that failed too). Usually Linux systems do not rely too heavily on the BIOS clock and will try to synchronize themselves to an NTP server after boot anyway. Again I must emphasize, do not play with the date on a product server! This story may help illustrate why one should never try critical date testing on production machines...

"Three years ago, I had several servers timed to a single (note I say single) NTP located at a university that, unbeknownst to me, was checking for the Unix equivalent of the Y2K bug that will occur in the year 2038. Well, perhaps the university didn't realize that businesses such as mine actually rely on accurate data. During their test, they advanced their real time clock to the year 2038, and suddenly, without warning, each of my log files started writing the year 2038 instead of 1999. This caused massive problems for systems across the network. The lesson learned was that most NTP clients have the ability to reference two or more stratums, writing the average of difference to the software clock, and where the difference is [significantly] out of sync, the client will either compensate to the clock closest to the last known value or exit (1), leaving the local clock(s) unchanged." - The Importance of Choosing an Accurate NTP

I believe the year 2038 problem will more likely result in air traffic control disasters, life-support systems failure, and power grid meltdown than the year 2000 problem. The year 2000 problems often involved higher-level application programs, disrupting inventory control, credit card payments, pension plans, and the like. The 2038 problem may well cause more serious problems because it involves the basic system timekeeping functions from which most other time and date information is derived. Databases using 32-bit Unix time may survive through 2038, and care will have to be taken in these cases to avoid rollover issues. Some problems related to the year 2038 have already started to show themselves, as this quote from the Web site 2038bug.com illustrates:

The first 2038 problems are already here. Many 32-bit programs calculate time averages using (t1 + t2)/2. It should be quite obvious that this calculation fails when the time values pass 30 bits. The exact day can be calculated by making a small Unix C program, as follows:

echo 'long q=(1UL<<30);int> x.c && cc x.c && ./a.out

In other words, on the 10th of January 2004 the occasional system will perform an incorrect time calculation until its code is corrected. Thanks to Ray Boucher for this observation.

The temporary solution is to replace all (t1 + t2)/2 with (((long long) t1 + t2) / 2)(POSIX/SuS) or (((double) t1 + t2) / 2) (ANSI). (Note that using t1/2 + t2/2 gives a roundoff error.)

Some Unix vendors have already started to use a 64-bit signed time_t in their operating systems to count the number of seconds since GMT 00:00:00, Thursday, January 1, 1970 C.E. Programs or databases with a fixed field width should probably allocate at least 48 bits to storing time values. 64-bit Unix time would be safe for the indefinite future, as this variable won't overflow until 2**63 or 9,223,372,036,854,775,808 (over nine quintillion) seconds after the beginning of the Unix epoch - corresponding to GMT 15:30:08, Sunday, December 4, 292,277,026,596 C.E. This is a rather artificial and arbitrary date, considering that it is several times the average lifespan of a sun like our solar system's, the very same celestial body by which we measure time. The sun is estimated at present to be about four and a half billion years old, and it may last another five billion years before running out of hydrogen and turning into a white dwarf star.

A recent example of the 2038 problem documented on Wikipedia:

In May, 2006, reports surfaced of an early Y2038 problem in the AOLServer software. The software would specify that a database request should "never" timeout by specifying a timeout date one billion seconds in the future. One billion seconds after 21:27:28 on 12 May, 2006 is beyond the 2038 cutoff date, so after this date, the timeout calculation overflowed and calculated a timeout date that was actually in the past, causing the software to crash.


What can I do about it?

If you are a programmer or a systems integrator who needs to know what you can do to fix this problem, here is a checklist of my suggestions (which come with no warranty or guarantee):

Consider testing your mission-critical code well ahead of time on a non-production test platform set just before the critical date, or with utilities such as the FakeTime Preload Library. FTPL "...intercepts various system calls which programs use to retrieve the current date and time. It can then report faked dates and times (as specified by you, the user) to these programs. This means you can modify the system time a program sees without changing the time system-wide" [emphasis theirs].

An organization called The Open Group (formerly X/Open), which maintains the Unix specification and trademark, has a number of programming recommendations which should be followed by developers to deal with the year 2038 and other problematic dates.

Also, see this article regarding Solutions to the Year 2000 Problem by my colleague Steve Manley. Many of his suggestions can be applied to the 2038 problem too. I would also suggest this very concise and well-written essay on the 2038 problem by a programmer named Roger M. Wilcox.

If you are working with Open Source code, this free library may be a useful reference for patching existing code for high-accuracy longterm time calculation: "libtai is a library for storing and manipulating dates and times. libtai supports two time scales: (1) TAI64, covering a few hundred billion years with 1-second precision; (2) TAI64NA, covering the same period with 1-attosecond precision. Both scales are defined in terms of TAI, the current international real time standard." An attosecond, defined in U.S. usage, is one quintillionth (10**18) of a second (it takes a very fast stopwatch for a researcher to clock those pesky photons). This is the kind of good timekeeping one might need for deep-space probes or high-reliability systems.

For more general applications, just using large types for storing dates will do the trick in most cases. For example, in GNU C, 64-bits (a "long long" type) is sufficient to keep the time from rolling over for literally geological eons (I can hear Carl Sagan in my head saying "beeeelions... and beeeelions..."). This just means any executables the operating systems runs will always get the correct time reported to them when queried in the correct manner. It doesn't stop the executables from having date issues of their own.

"The best way to predict the future is to engineer it." There will always be stuff we won't foresee, or simply didn't know about. Even this most exhaustive list of critical dates by Dr. Stockton may miss some critical dates - perhaps you work with emerging, proprietary, or closed source code. Check Dr. Stockton's list twice if you need to worry about "five-nines" reliability: you'd be surprised how many issues there are just within two centuries of date calculation. A lot of modern computers do a considerable amount of work within these years, since it covers a couple of recent human generations of calculable human demographics. Software for genealogy, mortgage calculation, vital statistics, and many other applications may need to frequently and reliably peer a century or two forward or backward.

Good luck, and I hope no ones flying car breaks down in 2038! :-)

How is the 2038 problem related to the John Titor story?

If you are not familiar with the name John Titor, I recommend you browse the site JohnTitor.com. I understand that my site's URL appears quoted a number of times in the discussion of this apparently transtemporal Internet celebrity. I don't know John Titor, and I have never chatted with anyone on the Internet purporting to be John Titor. The stories have not convinced me so far that he has traveled from another time. I also have some technical issues with his rather offhanded mention of the 2038 problem. Furthermore, John Titor has conveniently returned to his time stream and can not answer email on the subject. Having said that, I think it fair to say that I find John Titor's political commentary insightful and thought-provoking, and I consider him a performance artist par excellence.

source : 2038.org

1 comment:

William Porquet said...

Thanks for the shout out! :-D

Cheers,
William Porquet
william@2038.org
http://www.2038.org/


Total Pageviews