Pipe problems on Windows

Moving this to a separate thread.

The sequence below consistently gives me 703 on Vista, with the only
difference being that the DOS box requires double quotes.

But...

turning n up to values beyond 2000, the MSC binary of rcalc begins to
write(!) fewer bytes(!) to stdout, which obviously falsifies the
result of the chain.

Interestingly, the NREL binary doesn't do that.
Rob mentioned using gcc, so there seems to be a disagreement between
the two compilers as to the semantics of writing to stdout on program
termination.

Going to have some discussion with the debugger on this one.
I'd only be too happy if a simple flush() would solve the problem...

Ah, and first I should probably create a few test cases to cover this
kind of bug.

Cheers
-schorsch

···

I've searched for similar complaints online. In the few instances I've
found, it usually was because a terminating null byte wasn't written
to the receiving buffer for some reason. The purportedly received
garbage data was then simply the previous random contents of that
buffer. That may or may not be the cause here as well.

If there really was an inherent problem with using pipes on Windows,
then I'm sure I would have found a lot more information about it.

Well, in our case, it's not about null bytes not being sent -- it's
about knowing exactly when we've reached end-of-data, which we expect
the system to tell us in some cases. With Radiance's binary formats
for octrees, ambient files, pictures, etc., we know when we've reached
EOD regardless, because the file header tells us how much data to
expect. However, when we're sending binary streams of floats to rcalc,
which simply operates on them and counts on the OS to stop sending
data when it's out of data, we run into trouble if the OS doesn't tell
us exactly when the party is over.

I suppose a simple test would be something like:

  cnt 37 | rcalc -of -e '$1=recno' | total -if

This should give us a value of 703, or n*(n+1)/2 for any n (i.e.,
37*(37+1)/2 == 703). We could try running the above on a Windows box
with a FAT or exFAT filesystem to determine whether this is a problem
or not. We should probably try it with some large numbers as well,
being aware that we end on a 128-byte boundary when n is a multiple
of 32 (32 records times 4 bytes per float).

We can also try it with an intermediate file between rcalc and total,
to see if that makes any difference.

--
Georg Mischler -- simulations developer -- schorsch at schorsch com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/

Well, rcalc.c calls exit(0) when finished, which is supposed to flush all the output streams. Does it under MSC?

What do you get when you send the output of rcalc to a (binary) file with n>2000 -- is the byte count wrong, then?
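
Something along these lines might help separate the flush-on-exit
question from rcalc itself -- just a minimal, untested sketch
(flushtest.c is a made-up name) that writes n binary floats and relies
on exit() alone to flush them, with _setmode() putting stdout into
binary mode on Windows:

    /* flushtest.c -- write n binary floats to stdout and rely on
     * exit() alone to flush them.  Pipe or redirect the output and
     * compare the byte count against n * sizeof(float).
     */
    #include <stdio.h>
    #include <stdlib.h>
    #ifdef _WIN32
    #include <io.h>
    #include <fcntl.h>
    #endif

    int main(int argc, char *argv[])
    {
        long n = (argc > 1) ? atol(argv[1]) : 5000;
        long i;
        float f;

    #ifdef _WIN32
        _setmode(_fileno(stdout), _O_BINARY);  /* avoid CR/LF translation */
    #endif
        for (i = 1; i <= n; i++) {
            f = (float)i;
            fwrite(&f, sizeof(f), 1, stdout);
        }
        exit(0);  /* no explicit fflush() -- the same assumption rcalc makes */
    }

If the resulting file comes up short of n*4 bytes, the flush-on-exit
theory gains weight; if it's always complete, the problem lies
elsewhere.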

Thanks for having a look at this!
-Greg


> rcalc.c calls exit(0) when finished, which is supposed to flush all the
> output streams.

Harbison and Steele's *C: A Reference Manual* agrees, and so does the MS
documentation.

WtF?

I suppose it is possible this is some subtle pointer bug. :-(

Randolph

I suppose we could run rcalc under valgrind and start looking for memory
problems. :-(

Randolph

I would be surprised if this ended up being a memory issue. I don't think rcalc allocates memory, unless the stack counts.

-Greg


I'm shooting in the dark, but it's exactly what we would expect from a
memory error. I suppose it might also be a subtle data typing error, or an
error in the write calls. Can anyone think of anything else?

Schorsch, which version of MSVC are you using?

Randolph


Round two:

- I've detected the problem by redirecting the output of rcalc into a
  file, which ends up too small.
- It always happens when running in bash.
- It currently gives the correct output about one time in four when
  running in cmd.exe.
- When reading from a file instead of through a pipe from cnt, the
  output is always correct.
- Adding an fflush() doesn't change anything; as mentioned, exit()
  will do that anyway.
- There's no problem when feeding the same data through call_one()
  from pyradlib.
- Passing the output of cnt in through call_two() currently runs into
  a deadlock that I can't quite explain.

It's getting more and more mysterious...

-schorsch


The most recent one. Visual Studio 2015 Community edition.

-schorsch


Sorry for the weird sequence in here...

Deadlock issue resolved.
Now the testing with call_two() also results in the output
of rcalc being cut short by one or several values.

The problem starts at n=1330, where one value is lost.
Starting with n=4833, the last two values go missing.
Starting with n=9876, the last three values are dropped.
At least approximately; sometimes it's one value more or less.
Any logic to those numbers? I have no idea.

By the way, the effect is independent of the output format.
This also happens with ASCII output.
Somehow I get the impression that this might be a different problem
than the one we were originally looking for, and I'm not sure yet
whether cnt or rcalc is to blame.

-schorsch


Really strange... I definitely don't see problems on Unix. Did you say that the problem goes away with intermediate files? Which commands succeed and which ones fail? Can we figure out where bytes are being lost?

-Greg


No problems with the gcc binaries on Windows either.

I tend to agree with Randolph. This is likely a problem that comes to
the surface due to the different memory layout applied by the MS
compiler.

So far it only fails when cnt and rcalc are directly connected via
a pipe. When cnt writes to a file, that file is complete, and rcalc
reading from such a file also produces correct output.

My previous test cases for cnt only tested very small sequences.
I've now cranked up those tests (getting the job killed a few times
by exhausting 6 GB of RAM), and didn't find any problems under those
testing conditions.

One of the next steps might be to improvise a "tee" program for
Windows to see what happens to the data in between the two (and
whether the problem still persists then).
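
Something like this would probably do as a stand-in -- an untested
sketch ("wtee" is just a working name) that copies stdin to stdout and
to a log file, both in binary mode, and reports the byte count on
stderr at the end:

    /* wtee.c -- copy stdin to stdout and to a log file in binary mode,
     * then report the total number of bytes seen on stderr.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #ifdef _WIN32
    #include <io.h>
    #include <fcntl.h>
    #endif

    int main(int argc, char *argv[])
    {
        FILE *log;
        char buf[8192];
        size_t n;
        long total = 0;

        if (argc != 2) {
            fprintf(stderr, "usage: wtee logfile\n");
            return 1;
        }
    #ifdef _WIN32
        _setmode(_fileno(stdin), _O_BINARY);
        _setmode(_fileno(stdout), _O_BINARY);
    #endif
        if ((log = fopen(argv[1], "wb")) == NULL) {
            perror(argv[1]);
            return 1;
        }
        while ((n = fread(buf, 1, sizeof(buf), stdin)) > 0) {
            fwrite(buf, 1, n, stdout);
            fwrite(buf, 1, n, log);
            total += (long)n;
        }
        fprintf(stderr, "wtee: %ld bytes copied\n", total);
        fclose(log);
        return 0;
    }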

Then I'll have to figure out how I can get the MS debugger to invoke
such a chained pipeline for stepping through the process.
So far I could only take a glance when rcalc output was blocked for
some reason, so I had time to attach the debugger from the outside.
But that only gave me a static picture, which didn't really help.
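
One trick that might work for attaching mid-pipeline -- just an idea,
not something I've tried yet: gate a wait loop on an environment
variable at program start, so only the process of interest pauses
until a debugger is attached (RCALC_WAIT and the function name are
made up here):

    #ifdef _WIN32
    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical startup hook: when RCALC_WAIT is set, report the PID
     * on stderr and spin until a debugger attaches, so one process in
     * the middle of a pipeline can be inspected without launching the
     * whole chain from the IDE.
     */
    static void
    wait_for_debugger(void)
    {
        if (getenv("RCALC_WAIT") == NULL)
            return;
        fprintf(stderr, "rcalc: pid %lu waiting for debugger...\n",
                (unsigned long)GetCurrentProcessId());
        while (!IsDebuggerPresent())
            Sleep(500);
    }
    #endif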

-schorsch


Turns out the GitHub console comes with a nice set of Unix utilities.
The tee'd output of cnt is correct in all tested cases.
Reading that file via a filename argument in rcalc gives the correct
output again.
Feeding the same data via a pipe from cat causes rcalc to exhibit the
bug.
So it's definitively rcalc that's losing data.

-schorsch


> I tend to agree with Randolph. This is likely a problem that comes to
> the surface due to the different memory layout applied by the MS
> compiler.

I suppose it could also be a timing problem. Perhaps the Windows stdio library is multi-threaded and this somehow interacts with pipes.

Hmmm…or perhaps MSVC’s code generation “optimizer” is creating problems. Might be worth setting it to the lowest level and see if that helps.

> Then I'll have to figure out how I can get the MS debugger to invoke
> such a chained pipeline for stepping through the process.
> So far I could only take a glance when rcalc output was blocked for
> some reason, so I had time to attach the debugger from the outside.
> But that only gave me a static picture, which didn't really help.

I wonder if the Dr. Memory debugging tool (http://www.drmemory.org/) would help?

Randolph


> I suppose it could also be a timing problem. Perhaps the Windows stdio
> library is multi-threaded and this somehow interacts with pipes.

For a while I also had that suspicion, especially since in cmd.exe,
things go right (or at least slightly less wrong) once in a while.
But memory alignment doesn't have to be identical on each run, and I
don't see a reason for making straight stdio calls multi-threaded,
unless you explicitly use the "overlapped I/O" functions.
We would also find lots of complaints online if correct use of stdio
could lead to intermittent failure.

> Hmmm…or perhaps MSVC’s code generation “optimizer” is creating
> problems. Might be worth setting it to the lowest level and see if
> that helps.

This is a debug build with zero optimization.

>> Then I'll have to figure out how I can get the MS debugger to invoke
>> such a chained pipeline for stepping through the process.
>> So far I could only take a glance when rcalc output was blocked for
>> some reason, so I had time to attach the debugger from the outside.
>> But that only gave me a static picture, which didn't really help.

> I wonder if the Dr. Memory debugging tool (http://www.drmemory.org/) would help?

I'll look into it, thanks!

-schorsch


Input is read from stdin by rcalc using this line in getinputrec():

  return(fgets(inpbuf, INBSIZ, fp) != NULL);

I don't know why fgets() would fail on a pipe of ASCII data, but it sounds like that's what is happening in some cases.

-Greg


Maybe we should change that to fread()? fgets() is text-oriented.


--
Randolph M. Fritz, Lighting Design and Simulation
+1 206 659-8617 || [email protected]


It only calls fgets() when the input is ASCII, as it is in the case of reading from cnt. I do call fread() for loading binary input.

-G


It seems that Windows fread() sometimes returns a short record when reading
from a pipe. See
https://github.com/jmacd/xdelta/issues/101#issuecomment-85332549.

I still don't understand rcalc's behavior; it seems to me like the short
record should be treated as EOF, but maybe I'm missing something.
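
If short reads from a pipe really can happen before EOF, the usual
defence is a loop that keeps calling fread() until the full record has
arrived or the stream genuinely ends -- a generic sketch, not how
rcalc currently reads its binary input:

    #include <stdio.h>

    /* Read exactly n bytes unless EOF or an error intervenes; the
     * return value lets the caller tell a complete record from a
     * truncated final one, instead of trusting a single fread() call.
     */
    static size_t
    fread_full(void *ptr, size_t n, FILE *fp)
    {
        char *p = (char *)ptr;
        size_t done = 0, r;

        while (done < n) {
            r = fread(p + done, 1, n - done, fp);
            if (r == 0)        /* EOF or error */
                break;
            done += r;
        }
        return done;
    }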

It looks like we're dealing with a broken fgets() included
with Visual Studio 2015 Community edition.

When a newline character falls exactly at the end of the
pipe buffer, it gets ignored. This means that instead of
"\t1328\n" the received string will be "\t1328\t1329\n".
Any time that happens, nrecs is only incremented once for
two actual input values, which accounts for the lower number
of output values in the end.

Guess I'll have to try whether our own fgetline() fares better.

But again, this is probably not the "garbage data from binary
pipe" problem that we were previously discussing. We should still
look for test cases to identify that one.

Cheers
-schorsch


I agree this is probably not the error we have seen before, though it is an important one. We might think about writing an fgets() replacement for Windows, rather than using fgetline(), which has slightly different semantics. We should replace it at the library level, so it will propagate to all potentially affected tools. It's hard to believe that such a simple, basic function call would be broken in this way....
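
Something along these lines, perhaps -- a rough, untested sketch of
the idea (win_fgets is only a placeholder name), built on getc() so
that it can't depend on how the underlying pipe read happens to be
chunked:

    #include <stdio.h>

    /* Sketch of an fgets() stand-in: copy characters one getc() at a
     * time until a newline or a full buffer, then null-terminate.
     */
    static char *
    win_fgets(char *buf, int size, FILE *fp)
    {
        int c = EOF;
        char *cp = buf;

        while (--size > 0 && (c = getc(fp)) != EOF) {
            *cp++ = c;
            if (c == '\n')
                break;
        }
        if (cp == buf && c == EOF)
            return NULL;        /* nothing read before end of file */
        *cp = '\0';
        return buf;
    }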

Good sleuthing, Schorsch!

-G
