Greg Ward wrote:
> Is everything broken on Linux, or does it just seem that way?
The problem appears to come from the specification of the NFS
protocol version 2, so it's not necessarily limited to Linux:
http://www.linux.se/doc/HOWTO/Secure-Programs-HOWTO/avoid-race.html
> if the lock file may be on an NFS-mounted filesystem, then
> you have the problem that NFS version 2 doesn't completely
> support normal file semantics.
> ...
> NFS version 3 added support for O_EXCL mode in open(2); see
> IETF RFC 1813, in particular the "EXCLUSIVE" value to the
> "mode" argument of "CREATE". Sadly, not everyone has switched
> to NFS version 3 or higher at the time of this writing, so
> you can't depend on this yet in portable programs. Still,
> in the long run there's hope that this issue will go away.
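For reference, the O_EXCL mechanism the HOWTO talks about looks roughly like this. It is only a sketch in Python rather than our C code, the helper names are made up for illustration, and it inherits exactly the NFS version 2 weakness described above:

```python
import os


def try_create_lock(path):
    """Try to create the lock file atomically; return True on success.

    O_CREAT | O_EXCL makes open() fail if the file already exists, so
    only one process can win the race. This guarantee does not hold on
    NFS version 2 mounts.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False  # somebody else holds the lock
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")  # record the owner for diagnostics
    return True


def release_lock(path):
    """Remove the lock file; the caller must be the current owner."""
    os.unlink(path)
```

A caller would loop on try_create_lock() with a short sleep until it returns True, write its ambient values, and then call release_lock().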
There's another idea that I ran into during my searches (as a
reply to someone with a problem similar to ours):
http://lists.linux.org.au/archives/linuxcprogramming/2002-July/msg00054.html
> Surely the simple solution is to write a network lock server
> (which would take a couple of hours) or grab one from
> somewhere (which might take even less time). Then you can
> forget about special cases for different platforms and file
> systems.
This is a less involved variation of what I was planning to try.
I thought that once I have a server running, I can just as well
feed all the data through it. The more lightweight implementation
suggested here just establishes the lock, while the ambient data
still goes directly to the file.
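To give an idea of how small such a lock-only server could be, here is a minimal sketch. The one-line protocol ("GRANTED" / "RELEASE") and the class names are my own invention for illustration, not taken from any existing lock server, and it handles a single lock with no queue priorities or timeouts:

```python
import socket
import socketserver
import threading


class LockHandler(socketserver.StreamRequestHandler):
    """One connection is one lock request; closing it releases the lock."""

    def handle(self):
        with self.server.ambient_lock:       # wait for our turn
            self.wfile.write(b"GRANTED\n")
            self.wfile.flush()
            self.rfile.readline()            # returns on "RELEASE" or EOF


class LockServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address):
        super().__init__(address, LockHandler)
        self.ambient_lock = threading.Lock()  # the single lock we hand out


class RemoteLock:
    """Client side: a context manager that holds the lock while inside."""

    def __init__(self, address):
        self.address = address

    def __enter__(self):
        self.sock = socket.create_connection(self.address)
        self.reader = self.sock.makefile("rb")
        if self.reader.readline() != b"GRANTED\n":
            raise ConnectionError("lock server closed the connection")
        return self

    def __exit__(self, *exc):
        try:
            self.sock.sendall(b"RELEASE\n")
        finally:
            self.reader.close()
            self.sock.close()
```

The server side is started with LockServer((host, port)).serve_forever(), and a client wraps its file update in "with RemoteLock((host, port)):". Because the lock is tied to the connection, it is also released when the server sees the connection close, for instance after the client process exits.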
An example implementation that we might even be able to use
and/or modify is the "Lock Server" project on SourceForge.net.
This does a lot more than we need, as it also queues requests,
stores locks to a file to survive a restart, and manages
priorities and TTLs for both queued requests and granted locks.
>> Maybe we just have to watch the file for at least a minute to
>> decide it's expired.
> I thought about this, and that's why I think it's best for the
> process to simply keep track of when the lock file says it was
> created, and if the date hasn't changed between checks that are
> a few minutes apart by the local clock, it's safe to assume
> that the process died during an update.
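A rough sketch of that bookkeeping, to make the idea concrete. The class name and the two-minute default are placeholders, and using the file's mtime as "the date" is an assumption; which timestamp is actually trustworthy on each platform is still an open question:

```python
import os
import time


class StaleLockWatcher:
    """Decide that a lock is stale using only the local clock.

    The lock file's timestamp serves purely as an identity token: it is
    compared for equality with the previous observation, never with the
    local time, so clock skew between hosts does not matter.
    """

    def __init__(self, path, patience=120.0, clock=time.monotonic):
        self.path = path
        self.patience = patience   # seconds; placeholder value
        self.clock = clock
        self.seen_stamp = None     # timestamp observed on the lock file
        self.seen_at = None        # local time of that first observation

    def is_stale(self):
        try:
            stamp = os.stat(self.path).st_mtime_ns  # assumed: mtime
        except FileNotFoundError:
            self.seen_stamp = self.seen_at = None
            return False           # no lock at all, nothing to break
        now = self.clock()
        if stamp != self.seen_stamp:
            # New or refreshed lock: restart the local stopwatch.
            self.seen_stamp, self.seen_at = stamp, now
            return False
        return now - self.seen_at >= self.patience
```

A waiting process would call is_stale() each time it finds the lock taken, and only consider breaking the lock once it returns True.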
Same idea. Potential pitfall: We'll need to check which of the
timestamps we can rely on under Windows. I *think* ctime should
be fine. One of the problems here is that Windows file systems
don't use inodes, but store this kind of information with the
file. If I remember correctly, this means that e.g. the atime
gets modified just by looking at it (kind of logically inevitable,
but still weird!).
> Unless we are really stupid about how often we check the lock
> file, I don't think this scenario will play out in any of our
> lifetimes.
Maybe, maybe not. All I know for sure is that it *can* happen.
That was the reason I asked for data (or even just a reasonable
estimate) on how often the lock typically gets acquired during a
large simulation. Maybe we can add a way to log this information,
to get a better idea of the risks involved.
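The logging could be as simple as a counter around the locking call. This is just an illustration with made-up names; where it would hook into the real code is left open:

```python
import time


class LockStats:
    """Count lock acquisitions and the total time spent waiting for them."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.started = clock()
        self.acquisitions = 0
        self.total_wait = 0.0

    def record(self, wait_seconds):
        """Call once per granted lock, with the time spent waiting for it."""
        self.acquisitions += 1
        self.total_wait += wait_seconds

    def summary(self):
        elapsed = self.clock() - self.started
        rate = self.acquisitions / elapsed if elapsed > 0 else 0.0
        return (f"{self.acquisitions} acquisitions in {elapsed:.1f} s "
                f"({rate:.3f}/s), {self.total_wait:.3f} s spent waiting")
```

Printing summary() at the end of a few large runs would give the acquisition rate, which is the number needed to estimate how likely a collision really is.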
I agree that there will be a point of being "good enough" (my
convoluted scenario isn't 100% safe either), but I would feel
much better if we could decide on this point based on solid data.
If the data shows that it might happen once during my lifetime,
then I'd gladly accept that. If I were to see it twice, I might
have second thoughts...
Ok, independently of the specific implementation details, I see
the following strategies that we could test:
System lock (as implemented for Unix)
* direct read access to ambient file
* direct write access to ambient file
* locking through fcntl() (or the standard Windows locking
  mechanisms, translated through Samba when needed)

Simple lock file
* direct read access to ambient file
* direct write access to ambient file
* locking through lock file, broken after a TTL of n minutes

Complex lock file
* direct read access to ambient file
* direct write access to ambient file
* locking through lock file, broken under protection of a
  secondary lock

Locking server
* direct read access to ambient file
* direct write access to ambient file
* locking through a separate server process

Unidirectional data server
* direct read access to ambient file
* ambient data written through server process
* server may use one of the above locking mechanisms if file
  writing is still shared with other processes, or none if it
  has exclusive access

Bidirectional data server
* ambient data read through server process
* ambient data written through server process
* server may use one of the above locking mechanisms if file
  writing is still shared with other processes, or none if it
  has exclusive access
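As a baseline for comparing the others, the first strategy (system lock) amounts to something like the following. This is a Unix-only sketch in Python, not our actual C code; the function name is invented, and the Windows equivalent is not shown:

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked_append(path):
    """Open path for appending while holding an exclusive fcntl() lock."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)   # blocks until the lock is granted
        try:
            yield fd
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Usage would be "with locked_append(name) as fd: os.write(fd, record)". Whether the lock is honoured across hosts depends entirely on the network file system, which is the whole problem.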
Lots of toys to play with...
-schorsch
--
Georg Mischler -- simulations developer -- schorsch at schorsch com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/