Pipe problems on Windows - BUG in Universal CRT

I've sent feedback to Microsoft, we'll see what happens.

For anyone who wants to check the weirdness on their own:

sender.c

-----------------------------------------------------
#include <stdio.h>
#define MAX_TESTLINES 20000

/* Write MAX_TESTLINES identical short lines ("x\n") to stdout. */
int main(void)
{
   int i;

   for (i = 0; i < MAX_TESTLINES; i++) {
     fprintf_s(stdout, "x\n");
   }
   return 0;
}
-----------------------------------------------------

receiver.c
-----------------------------------------------------
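#include <stdio.h>
#define MAX_BUF 10

int main(void)
{
   int i = 0;   /* number of sender lines consumed so far */
   char inbuf[MAX_BUF];

   /* Every intact line is "x\n", so inbuf[1] must be the newline. */
   while (fgets(inbuf, MAX_BUF, stdin)) {
     if (inbuf[1] != '\n') {
       /* Two lines arrived merged as "xx\n"; the extra i++ accounts
          for the line whose newline was swallowed. */
       fprintf_s(stdout, "Line ending omitted from stream on line %d: \"%s\"\n", i++, inbuf);
     }
     i++;
   }
   return 0;
}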
-----------------------------------------------------

Invoke the two programs in a console as:

$ sender | receiver

With the bug present, the output will be something similar to:

Line ending omitted from stream on line 2730: "xx
"
Line ending omitted from stream on line 5461: "xx
"
Line ending omitted from stream on line 8192: "xx
"
Line ending omitted from stream on line 10923: "xx
"
Line ending omitted from stream on line 13654: "xx
"
Line ending omitted from stream on line 16385: "xx
"

Have fun!

-schorsch

On 2016-03-27 10:38, Georg Mischler wrote:

Good thing I tested with fgetline() before starting to roll my own
fgets(). The suspicion I had from stepping through fgets() was
confirmed, and it's actually the underlying stream that's broken.
Plugging in our own buffered text stream might be theoretically
possible, but is probably not worth the effort.

So pending a fix from Microsoft, we need to consider Visual Studio
2015 in default settings as unsuitable for production use.

Microsoft seems to be quite proud of having massively refactored their
C/C++ runtime libraries for Windows 10 into what they call the
"universal crt". And that new version of the CRT is now included in VS
2015.

I'll try and see if (and how) I can link to an older CRT instead, but
I'm not very optimistic there.

The bug is slightly obscure. It only happens very intermittently and
at seemingly random intervals. You need to pass a largish number of
very short text lines through a pipe to trigger it, and even then
you may only notice the problem if you happen to count the lines.
Sending a sequence of numerals simplifies that...

Of course that's not really an excuse for a multi-billion-dollar
corporation breaking one of the most basic building blocks of
potentially all of their software products. I'm actually wondering if
such a possibility to "manipulate" the contents of an interprocess
data stream (e.g. by changing the default buffer length) has any
security implications.

This drastically shows the value of having an extremely complete and
thorough battery of test cases before you start with any major
refactoring.

-schorsch

On 2016-03-27 00:28, Gregory J. Ward wrote:

I agree this is probably not the error we have seen before, though it
is an important one. We might think about writing an fgets()
replacement for Windows, rather than using fgetline(), which has
slightly different semantics. We should replace it at the library
level, so it will propagate to all potentially affected tools. It's
hard to believe that such a simple, basic function call would be
broken in this way....
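
For illustration only, a library-level fgets() replacement of the kind suggested here might look roughly like the sketch below. The name fgets_replacement is hypothetical and not part of Radiance. Note that it is built on getc() and therefore still reads through the same CRT stream, so it would only help if the fault were in fgets() itself and not in the stream underneath.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for fgets(), built on getc(). Reads at most
   n-1 characters, stops after a newline, and terminates the result
   with '\0'. Returns s, or NULL if no character was stored. */
char *fgets_replacement(char *s, int n, FILE *fp)
{
    char *p = s;
    int c;

    assert(fp != NULL);
    if (s == NULL || n <= 0)
        return NULL;
    while (--n > 0 && (c = getc(fp)) != EOF) {
        *p++ = (char)c;
        if (c == '\n')
            break;
    }
    if (p == s)         /* nothing stored: EOF, read error, or n == 1 */
        return NULL;
    *p = '\0';
    return s;
}
```

Swapping it in at the library level would then be a matter of calling it wherever fgets() is called today, with the same three arguments.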

Good sleuthing, Schorsch!

-G

From: Georg Mischler <[email protected]>
Subject: Re: [Radiance-dev] Pipe problems on Windows
Date: March 26, 2016 3:20:09 PM PDT

It looks like we're dealing with a broken fgets() included
with Visual Studio 2015 Community edition.

When a newline character falls exactly at the end of the
pipe buffer, it will be ignored. This means that instead of
"\t1328\n" the received string will be "\t1328\t1329\n".
Any time that happens, nrecs is only incremented once for
two actual input values, which accounts for the lower number
of output values in the end.
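
A rough way to make the symptom visible on the receiving side is to count the fields per line. The sketch below assumes, as in the example above, that every record is written as a tab followed by a number and a newline, so an intact line holds exactly one tab. The function name count_swallowed_newlines is made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count records that were glued onto the previous one because the
   newline between them went missing. Assumes each record is sent as
   "\t<number>\n", so a healthy line contains exactly one tab. */
long count_swallowed_newlines(FILE *in)
{
    char buf[64];
    long swallowed = 0;
    long tabs = 0;      /* tabs seen so far in the current logical line */

    assert(in != NULL);
    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strlen(buf);
        const char *p;

        for (p = buf; *p != '\0'; p++) {
            if (*p == '\t')
                tabs++;
        }
        if (len > 0 && buf[len - 1] == '\n') {  /* logical line complete */
            if (tabs > 1)
                swallowed += tabs - 1;
            tabs = 0;
        }
    }
    if (tabs > 1)       /* last line arrived without a trailing newline */
        swallowed += tabs - 1;
    return swallowed;
}
```

Called on stdin at the end of a pipe, a result of zero means every record arrived on its own line; any other value is the number of records that nrecs would have missed.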

Guess I'll have to see whether our own fgetline() has better success.

But again, this is probably not the "garbage data from binary
pipe" problem that we were previously discussing. We should still
look for test cases to identify that one.

Cheers
-schorsch

On 2016-03-25 15:35, Georg Mischler wrote:

Moving this to a separate thread.
The sequence below consistently gives me 703 on Vista, with the only
difference that the DOS box asks for double quotes.
But...
turning up n to values beyond 2000, the MSC binary of rcalc begins to
write(!) fewer(!) bytes to stdout, which obviously falsifies the
result of the chain.
Interestingly, the NREL binary doesn't do that.
Rob mentioned using gcc, so there seems to be a disagreement between
the two compilers as to the semantics of writing to stdout on program
termination.
Going to have some discussion with the debugger on this one.
I'd only be too happy if a simple fflush() would solve the problem...
Ah, and first I should probably create a few test cases to cover this
kind of bug.
Cheers
-schorsch

I've searched for similar complaints online. In the few instances I've
found, it usually was because a terminating null byte wasn't written
to the receiving buffer for some reason. The purportedly received
garbage data was then simply the previous random contents of that
buffer. That may or may not be the cause here as well.
If there really was an inherent problem with using pipes on Windows,
then I'm sure I would have found a lot more information about it.

Well, in our case, it's not about null bytes not being sent -- it's
about knowing exactly when we've reached end-of-data, which we expect
the system to tell us in some cases. With Radiance's binary formats for
octrees, ambient files, pictures, etc., we know when we've reached EOD
regardless because the file header tells us how much to expect.
However, when we're sending binary streams of floats to rcalc, which
is simply operating on them and counting on the OS to stop sending
data when it's out of data, we run into trouble if the OS doesn't tell
us exactly when the party is over.
I suppose a simple test would be something like:
  cnt 37 | rcalc -of -e '$1=recno' | total -if
This should give us a value of 703, or n*(n+1)/2 for any n (i.e.,
37*(37+1)/2==703). We could try running the above on a Windows box
with a FAT or ExFAT filesystem to determine if this is a problem or
not. We should probably try it with some large numbers as well, being
aware that we end on a 128-byte boundary when n is a multiple of 32.
We can also try it while writing with an intermediate file between
rcalc and total, to see if that makes any difference.
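
The arithmetic behind this test can be spelled out in a few lines. The sketch below assumes 4-byte floats, which is what the 128-byte remark implies (32 * 4 = 128). The function names are invented for this illustration.

```c
#include <assert.h>

#define FLOAT_BYTES 4L     /* assumed size of one binary float record */
#define BLOCK_BYTES 128L   /* the boundary mentioned in the text */

/* Sum of the record numbers 1..n, i.e. what the chain should report. */
long expected_total(long n)
{
    assert(n >= 0);
    return n * (n + 1) / 2;
}

/* Number of bytes rcalc would send for n float records. */
long float_stream_bytes(long n)
{
    return n * FLOAT_BYTES;
}

/* Nonzero if a stream of n records ends exactly on a block boundary. */
int ends_on_block_boundary(long n)
{
    return float_stream_bytes(n) % BLOCK_BYTES == 0;
}
```

For n = 37 this gives the expected 703 and a stream of 148 bytes, which does not end on a boundary. Any multiple of 32 does, so those values are the ones worth trying when hunting for an end-of-data problem.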

_______________________________________________
Radiance-dev mailing list
[email protected]
http://www.radiance-online.org/mailman/listinfo/radiance-dev

--
Georg Mischler -- simulations developer -- schorsch at schorsch com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/

Keep talking to myself...

The MS feedback site has these entries
  https://connect.microsoft.com/VisualStudio/Feedback/Details/1902345
  https://connect.microsoft.com/VisualStudio/Feedback/Details/2419638
both of which seem to describe incarnations of our problem.

In the first one, an MS person added a comment saying:
   We have fixed this bug; the fix will be present in an upcoming
   update to the Universal CRT.

So all hope is not lost.

-schorsch

Good that you tracked this down, but as you say, we still have the rumored issue with binary data we need to sort out.

Regarding Microsoft (by analogy):

  https://www.youtube.com/watch?v=CHgUN_95UAw

Power to the monopolies!

-Greg

Update:
In the report mentioned earlier, the same MS person now said that an update
of the universal CRT will probably be rolled out at around the same time
as the "anniversary update" of Windows 10 this summer. After that, I expect
we can use VS 2015 without further problems.

In the meantime, I have modified the SCons build system so that you can
select the version of Visual Studio to use. The current default is
   MSVC_VERSION=12.0
which selects VS 2013 (better not to try to understand their numbering).

The binaries created that way pass all the currently available tests,
including the ones I explicitly added to catch the text pipe CRT bug.
Those tests shove large amounts of data through both text and (all types
of) binary pipes, and currently do so without a hitch.
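
To give an idea of what a binary pipe check can look like, here is a minimal sketch. It is not the actual test code from the build system. It writes the values 1..n as raw floats on one side, then sums and counts them on the other. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Write the values 1..n as raw binary floats.
   Returns the number of records actually written. */
long write_float_records(FILE *out, long n)
{
    long i;

    assert(out != NULL);
    for (i = 1; i <= n; i++) {
        float v = (float)i;
        if (fwrite(&v, sizeof v, 1, out) != 1)
            break;
    }
    return i - 1;
}

/* Read raw floats until end of data. Returns their sum and stores
   the number of complete records in *count. */
double sum_float_records(FILE *in, long *count)
{
    float v;
    double sum = 0.0;
    long n = 0;

    assert(in != NULL && count != NULL);
    while (fread(&v, sizeof v, 1, in) == 1) {
        sum += v;
        n++;
    }
    *count = n;
    return sum;
}
```

With a writer calling the first function on stdout and a reader calling the second on stdin, the reader should see exactly n records and a sum of n*(n+1)/2. On Windows both streams would also have to be switched to binary mode first, which this sketch leaves out.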

Cheers
-schorsch
