Pipe problems on Windows

Good thing I tested with fgetline() before starting to roll my own
fgets(). The suspicion I had from stepping through fgets() was
confirmed, and it's actually the underlying stream that's broken.
Plugging our own buffering text stream might be theoretically
possible, but is probably not worth the effort.

So pending a fix from Microsoft, we need to consider Visual Studio
2015 in default settings as unsuitable for production use.

Microsoft seems to be quite proud of having massively refactored their
C/C++ runtime libraries for Windows 10 into what they call the
"universal crt". And that new version of the CRT is now included in VS

I'll try and see if (and how) I can link to an older CRT instead, but
I'm not very optimistic there.

The bug is slightly obscure. It only happens very intermittently and
at seemingly random intervals. You need to pass a largish number of
very short text lines through a pipe to trigger it, and even then
you may only notice the problem if you happen to count the lines.
Sending a sequence of numerals simplifies that...

Of course that's not really an excuse for a multi-billion-dollar
corporation breaking one of the most basic building blocks of
possibly all of their software products. I'm actually wondering if
such a possibility to "manipulate" the contents of an interprocess
data stream (e.g. by changing the default buffer length) has any
security implications.

This drastically shows the value of having an extremely complete and
thorough battery of test cases before you start with any major



On 2016-03-27 00:28, Gregory J. Ward wrote:

I agree this is probably not the error we have seen before, though it
is an important one. We might think about writing an fgets()
replacement for Windows, rather than using fgetline(), which has
slightly different semantics. We should replace it at the library
level, so it will propagate to all potentially affected tools. It's
hard to believe that such a simple, basic function call would be
broken in this way....
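
A minimal sketch of what such a replacement could look like, built on
getc() and keeping the fgets() semantics (newline retained, NULL at
end-of-data). The name pipe_fgets() is hypothetical and not part of the
Radiance library. It can only help if the fault sits in fgets() itself;
if the stream underneath drops the character, getc() will not see it
either.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical fgets() stand-in: reads at most size-1 characters,
 * stops after a newline (which is kept), and NUL-terminates buf.
 * Returns buf, or NULL on a read error or on EOF with nothing read.
 */
static char *pipe_fgets(char *buf, int size, FILE *fp)
{
    int c = 0;
    int i = 0;

    assert(buf != NULL && fp != NULL);
    if (size <= 0)
        return NULL;
    while (i < size - 1) {
        c = getc(fp);
        if (c == EOF)
            break;
        buf[i++] = (char)c;
        if (c == '\n')
            break;
    }
    if (c == EOF && (i == 0 || ferror(fp)))
        return NULL;
    buf[i] = '\0';
    return buf;
}
```

Placed at the library level, a function like this could be swapped in
for fgets() with a macro on Windows builds only, so the other platforms
keep using the native call.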

Good sleuthing, Schorsch!


From: Georg Mischler <[email protected]>
Subject: Re: [Radiance-dev] Pipe problems on Windows
Date: March 26, 2016 3:20:09 PM PDT

It looks like we're dealing with a broken fgets() included
with Visual Studio 2015 Community edition.

When a newline character falls exactly to the end of the
pipe buffer, it will be ignored. This means that instead of
"\t1328\n" the received string will be "\t1328\t1329\n".
Any time that happens, nrecs is only incremented once for
two actual input values, which accounts for the lower number
of output values in the end.
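
The effect described above can be checked independently of rcalc. The
following is a rough sketch, assuming input with exactly one numeral per
line. It counts how many lines arrive with more than one value on them,
which is what a swallowed newline produces. The function names are made
up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Count the numeric fields in one line, e.g. "\t1328\t1329\n" gives 2. */
static int count_fields(const char *line)
{
    char *end;
    int n = 0;

    assert(line != NULL);
    for (;;) {
        (void)strtol(line, &end, 10);  /* skips leading whitespace itself */
        if (end == line)               /* no further number on this line */
            break;
        n++;
        line = end;
    }
    return n;
}

/*
 * Read one-value-per-line input from fp. Stores the number of lines
 * seen in *nlines and returns how many of them carried more than one
 * value. Assumes lines are shorter than the local buffer.
 */
static long count_merged_lines(FILE *fp, long *nlines)
{
    char buf[256];
    long merged = 0;

    assert(fp != NULL && nlines != NULL);
    *nlines = 0;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        (*nlines)++;
        if (count_fields(buf) > 1)
            merged++;
    }
    return merged;
}
```

Fed from the receiving end of a pipe, a non-zero return value means
that at least one record boundary was lost on the way.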

Guess I'll have to see whether our own fgetline() fares better.

But again, this is probably not the "garbage data from binary
pipe" problem that we were previously discussing. We should still
look for test cases to identify that one.


On 2016-03-25 15:35, Georg Mischler wrote:

Moving this to a separate thread.
The sequence below consistently gives me 703 on Vista, with the only
difference that the DOS box asks for double quotes.
Turning up n to values beyond 2000, the MSC binary of rcalc begins to
write(!) some bytes less(!) to stdout. Which obviously falsifies the
result of the chain.
Interestingly, the NREL binary doesn't do that.
Rob mentioned using gcc, so there seems to be a disagreement between
the two compilers as to the semantics of writing to stdout on program
Going to have some discussion with the debugger on this one.
I'd only be too happy if a simple flush() would solve the problem...
Ah, and first I should probably create a few test cases to cover this
kind of bug.
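
For reference, an explicit flush after a binary write might look like
the sketch below. Nothing in this thread establishes that a missing
flush is the cause, so this is only the experiment, not a fix. The
helper name write_floats() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write n floats as raw binary and flush explicitly.
 * Returns 0 on success, -1 if the write came up short or the flush
 * failed.
 * Assumption: fp is already in binary mode. On Windows that would
 * need a call such as _setmode(_fileno(stdout), _O_BINARY), from
 * <io.h> and <fcntl.h>, before the first write.
 */
static int write_floats(FILE *fp, const float *vals, size_t n)
{
    assert(fp != NULL && vals != NULL);
    if (fwrite(vals, sizeof *vals, n, fp) != n)
        return -1;
    if (fflush(fp) != 0)
        return -1;
    return 0;
}
```

Checking both return values matters here, since a short write is
exactly the symptom being chased.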

I've searched for similar complaints online. In the few instances I've
found, it usually was because a terminating null byte wasn't written
to the receiving buffer for some reason. The purportedly received
garbage data was then simply the previous random contents of that
buffer. That may or may not be the cause here as well.
If there really was an inherent problem with using pipes on Windows,
then I'm sure I would have found a lot more information about it.

Well, in our case, it's not about null bytes not being sent -- it's
about knowing exactly when we've reached end-of-data, which we expect
the system to tell us in some cases. With Radiance's binary formats for
octrees, ambient files, pictures, etc., we know when we've reached EOD
regardless because the file header tells us how much to expect.
However, when we're sending binary streams of floats to rcalc, which
is simply operating on them and counting on the OS to stop sending
data when it's out of data, we run into trouble if the OS doesn't tell
us exactly when the party is over.
I suppose a simple test would be something like:
  cnt 37 | rcalc -of -e '$1=recno' | total -if
This should give us a value of 703, or n*(n+1)/2 for any n (i.e.,
37*(37+1)/2==703). We could try running the above on a Windows box
with a FAT or ExFAT filesystem to determine if this is a problem or
not. We should probably try it with some large numbers as well, being
aware that we end on a 128-byte boundary when n is a multiple of 32.
We can also try it while writing with an intermediate file between
rcalc and total, to see if that makes any difference.
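
The arithmetic behind that test can be written down as a small sketch,
assuming 4-byte floats with one float per record. The function names
are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Expected sum of 1..n, i.e. what the cnt | rcalc | total chain
 * should print. Beware of overflow for very large n where
 * unsigned long is only 32 bits wide.
 */
static unsigned long expected_total(unsigned long n)
{
    return n * (n + 1) / 2;
}

/* Size of the binary stream, assuming one float per record. */
static size_t stream_bytes(unsigned long n)
{
    return (size_t)n * sizeof(float);
}

/* Non-zero if the stream ends exactly on a multiple of boundary bytes. */
static int ends_on_boundary(unsigned long n, size_t boundary)
{
    assert(boundary > 0);
    return stream_bytes(n) % boundary == 0;
}
```

With these, expected_total(37) is 703, and ends_on_boundary(n, 128)
holds whenever n is a multiple of 32, which picks out the values of n
worth testing for boundary effects.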

Radiance-dev mailing list
[email protected]

Georg Mischler -- simulations developer -- schorsch at schorsch com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/