Discussion:
Of strptime() and dates
James Kuyper
2020-12-03 04:23:00 UTC
strptime() is not a C standard library function. A function with that
name is specified by the POSIX standard. I'll assume that is the one
you're referring to. In general, you would get better answers to such a
question by asking in comp.unix.programmer.
When I convert a date string from format m/d/yyyy to epoch the code
doesn't read the first value in the array correctly.
It usually assigns some massive value as Unix time.
When I convert a date string from format yyyy-mm-dd h:m:s to epoch the
code does read the first value in the array correctly.
===========================================================================
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    char minc[100], maxc[100];
    char *dtfmt1 = "%m/%d/%Y";
    char *dates1[] =
        {"1/1/2019","1/1/2020","1/1/2024","1/1/2022","1/1/2023"};
    char *dtfmt2 = "%Y-%m-%d %H:%M:%S";
    char *dates2[] = {"2100-01-01 04:15:23","2100-01-01 04:15:22",
        "2220-01-01 04:15:33","2021-01-01 04:15:34","2019-01-01 04:15:22"};
    struct tm tm;
    time_t epoch, epochmin, epochmax;

    for(int i=0;i<5;i++)
    {
        if ( strptime(dates1[i], dtfmt1, &tm) != NULL )
        {
            epoch = mktime(&tm);
            if(i==0) {epochmin = epochmax = epoch;}
            else
            {
                if(epoch < epochmin) {epochmin = epoch;}
                if(epoch > epochmax) {epochmax = epoch;}
            }
            printf("Date %s, epoch %ld, min %ld, max %ld\n",
                   dates1[i],epoch,epochmin,epochmax);
        }
    }
    strftime(minc,sizeof(minc),dtfmt1,localtime(&epochmin));
    strftime(maxc,sizeof(maxc),dtfmt1,localtime(&epochmax));
    printf("Min %s, Max %s\n\n",minc, maxc);

    for(int i=0;i<5;i++)
    {
        if ( strptime(dates2[i], dtfmt2, &tm) != NULL )
        {
            epoch = mktime(&tm);
            if(i==0) {epochmin = epochmax = epoch;}
            else
            {
                if(epoch < epochmin) {epochmin = epoch;}
                if(epoch > epochmax) {epochmax = epoch;}
            }
            printf("Date %s, epoch %ld, min %ld, max %ld\n",
                   dates2[i],epoch,epochmin,epochmax);
        }
    }
    strftime(minc,sizeof(minc),dtfmt2,localtime(&epochmin));
    strftime(maxc,sizeof(maxc),dtfmt2,localtime(&epochmax));
    printf("Min %s, Max %s\n",minc, maxc);
}
===========================================================================
Date 1/1/2019, epoch 1112022688900, min 1112022688900, max 1112022688900
Date 1/1/2020, epoch 1577920900, min 1577920900, max 1112022688900
Date 1/1/2024, epoch 1704151300, min 1577920900, max 1112022688900
Date 1/1/2022, epoch 1641079300, min 1577920900, max 1112022688900
Date 1/1/2023, epoch 1672615300, min 1577920900, max 1112022688900
Min 01/01/2020, Max 08/05/37208
note: incorrect min and max
Date 2100-01-01 04:15:23, epoch 4102478123, min 4102478123, max 4102478123
Date 2100-01-01 04:15:22, epoch 4102478122, min 4102478122, max 4102478123
Date 2220-01-01 04:15:33, epoch 7889217333, min 4102478122, max 7889217333
Date 2021-01-01 04:15:34, epoch 1609492534, min 1609492534, max 7889217333
Date 2019-01-01 04:15:22, epoch 1546334122, min 1546334122, max 7889217333
Min 2019-01-01 04:15:22, Max 2220-01-01 04:15:33
note: correct min and max
Any idea what's going on here?
It took some experimenting for me to figure it out. I instrumented the
code to print out all of the standard-specified members of struct tm:

static void print_tm(const struct tm *tm)
{
    printf("%d %d %d %d %d %d %d %d %d\n", tm->tm_sec, tm->tm_min,
           tm->tm_hour, tm->tm_mday, tm->tm_mon, tm->tm_year,
           tm->tm_wday, tm->tm_yday, tm->tm_isdst);
}

and I executed that function before and after each call to strptime()
and mktime(). What I found surprised me: strptime() only modifies the
values stored in those members of the struct tm that are specified in
the format string. The other members are left unchanged. Since tm is
uninitialized in your program, the first call to strptime() can produce
some very bizarre results, depending upon what's in that memory.

From The Open Group Base Specifications Issue 7, 2018 edition
IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008)
Copyright © 2001-2018 IEEE and The Open Group

"Any other conversion specification is executed by scanning characters
until a character matching the next directive is scanned, or until no
more characters can be scanned. These characters, except the one
matching the next directive, are then compared to the locale values
associated with the conversion specifier. If a match is found, values
for the appropriate tm structure members are set to values corresponding
to the locale information."

Notice that it does not say anything about what happens to the other tm
structure members.

Solution:
struct tm tm = {0};
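
A minimal sketch of how that fix might look when wrapped in a helper, so that every parse starts from a zeroed struct tm. The names parse_epoch() and epoch_range() are hypothetical, not from the original program, and setting tm_isdst to -1 goes one step beyond the one-line fix (it asks mktime() to decide about DST itself):

```c
#define _XOPEN_SOURCE 700   /* needed on glibc for strptime() to be declared */
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: parse `s` using `fmt` and store the epoch in *out.
 * Returns 0 on success, -1 if parsing or conversion fails. */
static int parse_epoch(const char *s, const char *fmt, time_t *out)
{
    struct tm tm = {0};   /* zeroed on every call, so nothing is left over */
    tm.tm_isdst = -1;     /* assumption: let mktime() work out DST */

    if (strptime(s, fmt, &tm) == NULL)
        return -1;

    time_t t = mktime(&tm);
    if (t == (time_t)-1)
        return -1;

    *out = t;
    return 0;
}

/* Hypothetical helper: find the smallest and largest epoch among n strings.
 * Strings that fail to parse are skipped. Returns how many parsed. */
static size_t epoch_range(const char *const dates[], size_t n,
                          const char *fmt, time_t *min, time_t *max)
{
    size_t ok = 0;

    for (size_t i = 0; i < n; i++) {
        time_t t;
        if (parse_epoch(dates[i], fmt, &t) != 0)
            continue;
        if (ok == 0 || t < *min) *min = t;
        if (ok == 0 || t > *max) *max = t;
        ok++;
    }
    return ok;
}
```

Seeding min and max from the first successful parse, rather than from i==0, also avoids reading them uninitialized if the first string fails to parse.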
dfs
2020-12-03 05:02:27 UTC
Post by James Kuyper
It took some experimenting for me to figure it out. I instrumented the
code to print out all of the standard-specified members of struct tm:
static void print_tm(const struct tm *tm)
{
printf("%d %d %d %d %d %d %d %d %d\n", tm->tm_sec, tm->tm_min,
tm->tm_hour, tm->tm_mday, tm->tm_mon, tm->tm_year,
tm->tm_wday, tm->tm_yday, tm->tm_isdst);
}
and I executed that function before and after each call to strptime()
and mktime(). What I found surprised me: strptime() only modifies the
values stored in those members of the struct tm that are specified in
the format string. The other members are left unchanged. Since tm is
uninitialized in your program, the first call to strptime() can produce
some very bizarre results, depending upon what's in that memory.
Yeah, I did see strange results: huge positive and huge negative numbers.
Post by James Kuyper
From The Open Group Base Specifications Issue 7, 2018 edition
IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008)
Copyright © 2001-2018 IEEE and The Open Group
"Any other conversion specification is executed by scanning characters
until a character matching the next directive is scanned, or until no
more characters can be scanned. These characters, except the one
matching the next directive, are then compared to the locale values
associated with the conversion specifier. If a match is found, values
for the appropriate tm structure members are set to values corresponding
to the locale information."
Notice that it does not say anything about what happens to the other tm
structure members.
struct tm tm = {0};
A 4-character fix. Nice - you are the man, James. Thanks for taking
the time to look into it.

I had previously built another method that worked: using strptime and
strftime, convert the original dates into a sortable format
(yyyy-mm-dd), copy them to a temp array, sort the temp, get first and
last values.

I like the Unix time method better: simpler, less code, fewer objects.

Thanks again.
Kenny McCormack
2020-12-03 06:37:33 UTC
In article <7k_xH.148535$***@fx39.iad>, dfs <***@dfs.com> wrote:
...
Post by dfs
A 4-character fix. Nice - you are the man, James. Thanks for taking
the time to look into it.
I looked at some old code of mine that uses strptime(), and found these two
lines of initialization:

/* struct tm tm; */
memset(&tm,0,sizeof(tm));
tm.tm_isdst = -1; /* Let the system handle DST */

If you are doing this more than once (e.g., in a loop), then you need to
initialize it like this each time - i.e., a simple compile-time
initialization won't do the job.
dfs
2020-12-03 17:48:11 UTC
Post by Kenny McCormack
...
Post by dfs
A 4-character fix. Nice - you are the man, James. Thanks for taking
the time to look into it.
I looked at some old code of mine that uses strptime(), and found these two
lines of initialization:
/* struct tm tm; */
memset(&tm,0,sizeof(tm));
tm.tm_isdst = -1; /* Let the system handle DST */
If you are doing this more than once (e.g., in a loop), then you need to
initialize it like this each time - i.e., a simple compile-time
initialization won't do the job.
It's initialized in the function in which it's used:


void getstats(char *arr[][headers], int column)
{
...
struct tm tm = {0};
time_t epoch, epochmin, epochmax;
...other code...
}


That function is called in a loop to examine each column of a .csv file
that was previously read into a 2d array. To test I created a .csv file
with consecutive date fields m/d/yyyy and the function correctly
identified the min and max of each.

Without the initialization it produced invalid results for the first
date field only - the 2nd was correct.

To test your theory, I put
struct tm tm = {0};
outside the function and it did continue to give correct results for all
m/d/yyyy calculations done in a loop.



Thanks for the followup.
James Kuyper
2020-12-03 18:06:39 UTC
Post by dfs
Post by Kenny McCormack
...
Post by dfs
A 4-character fix. Nice - you are the man, James. Thanks for taking
the time to look into it.
I looked at some old code of mine that uses strptime(), and found these two
lines of initialization:
/* struct tm tm; */
memset(&tm,0,sizeof(tm));
tm.tm_isdst = -1; /* Let the system handle DST */
If you are doing this more than once (e.g., in a loop), then you need to
initialize it like this each time - i.e., a simple compile-time
initialization won't do the job.
void getstats(char *arr[][headers], int column)
{
...
struct tm tm = {0};
time_t epoch, epochmin, epochmax;
...other code...
}
That function is called in a loop to examine each column of a .csv file
that was previously read into a 2d array. To test I created a .csv file
with consecutive date fields m/d/yyyy and the function correctly
identified the min and max of each.
Without the initialization it produced invalid results for the first
date field only - the 2nd was correct.
To test your theory, I put
struct tm tm = {0};
outside the function and it did continue to give correct results for all
m/d/yyyy calculations done in a loop.
The key point is whether it's acceptable for a later pass through the
loop to use the same values as the previous pass for all fields not set
by the strptime() call. That's often the case - it's only likely to be
an issue if different passes through the loop call strptime() with
format strings that set different fields.
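
A small sketch of the case described above: the same struct tm is reused for two strptime() calls whose formats set different fields. The function name hour_after_second_parse() is hypothetical. On implementations that leave unmentioned members alone (the behavior observed earlier in this thread, which POSIX does not promise), the hour from the first parse survives into the second unless the struct is cleared in between:

```c
#define _XOPEN_SOURCE 700   /* needed on glibc for strptime() to be declared */
#include <string.h>
#include <time.h>

/* Parse a date-time string, then a date-only string, into one struct tm.
 * If reset_between is nonzero, the struct is cleared before the second
 * parse. Returns the tm_hour found after the second parse, or -1 if
 * either parse fails. */
static int hour_after_second_parse(int reset_between)
{
    struct tm tm;

    memset(&tm, 0, sizeof tm);
    tm.tm_isdst = -1;
    if (strptime("2100-01-01 04:15:23", "%Y-%m-%d %H:%M:%S", &tm) == NULL)
        return -1;

    if (reset_between) {
        /* Same two lines again: clear everything, then mark DST unknown */
        memset(&tm, 0, sizeof tm);
        tm.tm_isdst = -1;
    }

    /* This format says nothing about hours, minutes or seconds */
    if (strptime("1/1/2019", "%m/%d/%Y", &tm) == NULL)
        return -1;

    return tm.tm_hour;
}
```

With the reset, the returned hour is 0. Without it, the value depends on what the implementation does with members the format does not mention.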

F Russell
2020-12-03 13:43:30 UTC
Post by James Kuyper
From The Open Group Base Specifications Issue 7, 2018 edition
IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008)
Copyright © 2001-2018 IEEE and The Open Group
Notice that it does not say anything about what happens to the other tm
structure members.
struct tm tm = {0};
The GNU C Library manual gives the advice:

"Before calling the strptime function for a new input string, you should prepare
the tm structure you pass. Normally this will mean initializing all values to zero.
Alternatively, you can set all fields to values like INT_MAX, allowing you to determine
which elements were set by the function call. Zero does not work here since it is a
valid value for many of the fields."
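
A rough sketch of the sentinel approach from that manual excerpt. UNSET, tm_fill_unset() and has_time_of_day() are hypothetical names. It relies on strptime() leaving untouched any member the format does not mention, which is what was observed earlier in this thread but is not spelled out by POSIX:

```c
#define _XOPEN_SOURCE 700   /* needed on glibc for strptime() to be declared */
#include <limits.h>
#include <time.h>

#define UNSET INT_MAX       /* hypothetical sentinel: "strptime() did not set this" */

/* Mark every standard-specified member as not yet set. */
static void tm_fill_unset(struct tm *tm)
{
    tm->tm_sec  = tm->tm_min  = tm->tm_hour = UNSET;
    tm->tm_mday = tm->tm_mon  = tm->tm_year = UNSET;
    tm->tm_wday = tm->tm_yday = tm->tm_isdst = UNSET;
}

/* Returns 1 if parsing `s` with `fmt` supplied hour, minute and second,
 * 0 if any of them was left at the sentinel, -1 if parsing failed. */
static int has_time_of_day(const char *s, const char *fmt)
{
    struct tm tm;

    tm_fill_unset(&tm);
    if (strptime(s, fmt, &tm) == NULL)
        return -1;

    return tm.tm_hour != UNSET && tm.tm_min != UNSET && tm.tm_sec != UNSET;
}
```

Any member still equal to UNSET has to be replaced with a real value before the struct is handed to mktime().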
James Kuyper
2020-12-03 14:18:52 UTC
Post by F Russell
Post by James Kuyper
From The Open Group Base Specifications Issue 7, 2018 edition
IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008)
Copyright © 2001-2018 IEEE and The Open Group
Notice that it does not say anything about what happens to the other tm
structure members.
struct tm tm = {0};
"Before calling the strptime function for a new input string, you should prepare
the tm structure you pass. Normally this will mean initializing all values to zero.
Alternatively, you can set all fields to values like INT_MAX, allowing you to determine
which elements were set by the function call. Zero does not work here since it is a
valid value for many of the fields."
If you follow that advice, fix the INT_MAX values before passing the
struct to mktime(). mktime()'s defined behavior is to normalize
out-of-range values. For instance, if you have a month number of -4, it
will subtract 1 from the year and add 12 to the month number.
Personally, I'd favor INT_MIN for uses like this.
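
To illustrate the normalization mentioned above, here is a small sketch using the month -4 example. The helper name normalize_month() is hypothetical, and the values follow struct tm conventions (years since 1900, months counted from 0):

```c
#include <time.h>

/* Hypothetical helper: run a year/month pair through mktime() and hand
 * back the normalized values. Both use struct tm conventions
 * (tm_year = years since 1900, tm_mon = 0..11 once normalized).
 * Returns 0 on success, -1 if mktime() reports failure. */
static int normalize_month(int *tm_year, int *tm_mon)
{
    struct tm tm = {0};

    tm.tm_year  = *tm_year;
    tm.tm_mon   = *tm_mon;
    tm.tm_mday  = 1;
    tm.tm_hour  = 12;   /* midday, to stay clear of any DST changeover */
    tm.tm_isdst = -1;   /* let mktime() decide about DST */

    if (mktime(&tm) == (time_t)-1)
        return -1;

    *tm_year = tm.tm_year;
    *tm_mon  = tm.tm_mon;
    return 0;
}
```

Starting from tm_year 120 (2020) and tm_mon -4, the normalized result is tm_year 119 and tm_mon 8, i.e. September 2019. A sentinel such as INT_MAX left in a member goes through the same machinery, so it will not be reported as "unset": mktime() will either fail or quietly turn it into some unintended date.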