Discussion:
Behold the IBM 704!
(too old to reply)
Rainer Weikusat
2019-07-17 20:50:21 UTC
Permalink
Quote from the Python language reference:

,----
| CPython implementation detail: CPython currently uses a
| reference-counting scheme with (optional) delayed detection of
| cyclically linked garbage, which collects most objects as soon as they
| become unreachable, but is not guaranteed to collect garbage
| containing circular references. See the documentation of the gc module
| for information on controlling the collection of cyclic garbage. Other
| implementations act differently and CPython may change. Do not depend
| on immediate finalization of objects when they become unreachable (ex:
| always close files).

[...]

| Some objects contain references to “external” resources such as open
| files or windows. It is understood that these resources are freed when
| the object is garbage-collected, but since garbage collection is not
| guaranteed to happen, such objects also provide an explicit way to
| release the external resource, usually a close() method. Programs are
| strongly recommended to explicitly close such objects. The
| ‘try…finally’ statement provides a convenient way to do this.

`----

Who would believe that, in 2019, 61 years after John McCarthy being
unable to implement reference counting in IBM 704 machine code due to
the way the already existing code used the registers of the machine,
application would still leak file descriptors, locks, windows and
whatever other resource with an allocate/ release semantic exists due to
the clever observation that all memory on an IBM 704 was accessible to
the single program it ran and that detection of "unreachable objects"
could thus be postponed ...
Siri Cruise
2019-07-17 21:57:08 UTC
Permalink
Post by Rainer Weikusat
Who would believe that, in 2019, 61 years after John McCarthy being
unable to implement reference counting in IBM 704 machine code due to
the way the already existing code used the registers of the machine,
Apple uses reference count for objective c and forces the programmer to
distinguish counted forward edges and uncounted back edges. I don't know how
they handle general graphs where back and forward edges can't be distinguished.
--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted. @
'I desire mercy, not sacrifice.' /|\
The first law of discordiamism: The more energy This post / \
to make order is nore energy made into entropy. insults Islam. Mohammed
Rainer Weikusat
2019-07-18 15:35:00 UTC
Permalink
Post by Siri Cruise
Post by Rainer Weikusat
Who would believe that, in 2019, 61 years after John McCarthy being
unable to implement reference counting in IBM 704 machine code due to
the way the already existing code used the registers of the machine,
Apple uses reference count for objective c and forces the programmer to
distinguish counted forward edges and uncounted back edges. I don't know how
they handle general graphs where back and forward edges can't be distinguished.
The obvious idea would be to put all member objects of the graph into
some sort of dictionary and make all member-member links uncounted. If
complex, linked structures come into play, garbage collection is
essentially window dressing as objects will need to be unlinked with
explicit code, anyway, otherwise, they'll never turn into 'garbage'.
Kaz Kylheku
2019-07-18 18:06:13 UTC
Permalink
Post by Rainer Weikusat
Post by Siri Cruise
Post by Rainer Weikusat
Who would believe that, in 2019, 61 years after John McCarthy being
unable to implement reference counting in IBM 704 machine code due to
the way the already existing code used the registers of the machine,
Apple uses reference count for objective c and forces the programmer to
distinguish counted forward edges and uncounted back edges. I don't know how
they handle general graphs where back and forward edges can't be distinguished.
The obvious idea would be to put all member objects of the graph into
some sort of dictionary and make all member-member links uncounted. If
complex, linked structures come into play, garbage collection is
essentially window dressing as objects will need to be unlinked with
explicit code, anyway, otherwise, they'll never turn into 'garbage'.
That could work for a formal graph data structure, but isn't very
practical for /de facto/ cyclic graphs that arise due to complexities in
the function environment structure. E.g. a function contains a pointer
to a lexical environment which holds some object, which directly or
indirectly holds that function.
--
TXR Programming Lanuage: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1
Rainer Weikusat
2019-07-19 14:39:48 UTC
Permalink
Post by Kaz Kylheku
Post by Rainer Weikusat
Post by Siri Cruise
Post by Rainer Weikusat
Who would believe that, in 2019, 61 years after John McCarthy being
unable to implement reference counting in IBM 704 machine code due to
the way the already existing code used the registers of the machine,
Apple uses reference count for objective c and forces the programmer to
distinguish counted forward edges and uncounted back edges. I don't know how
they handle general graphs where back and forward edges can't be distinguished.
The obvious idea would be to put all member objects of the graph into
some sort of dictionary and make all member-member links uncounted. If
complex, linked structures come into play, garbage collection is
essentially window dressing as objects will need to be unlinked with
explicit code, anyway, otherwise, they'll never turn into 'garbage'.
That could work for a formal graph data structure, but isn't very
practical for /de facto/ cyclic graphs that arise due to complexities in
the function environment structure. E.g. a function contains a pointer
to a lexical environment which holds some object, which directly or
indirectly holds that function.
The usual solution to that would be weak/ uncounted references. This
requires a litte more work when contstructing a linked something, opens
a possibility for resource leaks and is something of a wart. But the
ability to tie dynamically allocated objects to lexical scopes such that
they're automatically freed when the scope is exited, which will also
execute whatever other "finalization code" might be needed, is useful
enough that I think it outweighs these drawbacks. Especially in
languages with support for non-local control transfers aka 'exceptions'.

Eg, Python has a weird abstraction something which was dynamically
allocated but whose lifetime should usually bound to that of a lexical
scope called 'the with statement'. This is syntactically more convenient
than full try - finally blocks. But objects managed in this way are
condemned to be executed when the scope is terminated: There's no way to
pass a file opened under control of the with out of this scope unclosed.
Kaz Kylheku
2019-07-19 17:50:36 UTC
Permalink
Post by Rainer Weikusat
Post by Kaz Kylheku
Post by Rainer Weikusat
Post by Siri Cruise
Post by Rainer Weikusat
Who would believe that, in 2019, 61 years after John McCarthy being
unable to implement reference counting in IBM 704 machine code due to
the way the already existing code used the registers of the machine,
Apple uses reference count for objective c and forces the programmer to
distinguish counted forward edges and uncounted back edges. I don't know how
they handle general graphs where back and forward edges can't be distinguished.
The obvious idea would be to put all member objects of the graph into
some sort of dictionary and make all member-member links uncounted. If
complex, linked structures come into play, garbage collection is
essentially window dressing as objects will need to be unlinked with
explicit code, anyway, otherwise, they'll never turn into 'garbage'.
That could work for a formal graph data structure, but isn't very
practical for /de facto/ cyclic graphs that arise due to complexities in
the function environment structure. E.g. a function contains a pointer
to a lexical environment which holds some object, which directly or
indirectly holds that function.
The usual solution to that would be weak/ uncounted references.
Statistically, the usual solution is garbage collection, actually.

The solution usually imagined by a C++ programmer is weak/uncounted
references.

The problem is, the virtual machine doesn't know which pointers should
be that way. Given two adjacent lexical variables in the same scope,
one may have to be weak, and one strong. Determining which one is required
is tantamount to doing the same tree traversal as a garbage
collector.
Post by Rainer Weikusat
Eg, Python has a weird abstraction something which was dynamically
allocated but whose lifetime should usually bound to that of a lexical
scope called 'the with statement'. This is syntactically more convenient
than full try - finally blocks.
In TXR Lisp, I have a with-objects macro. It will run object finalizers
(which are normally hooked to garbage collection):

This is the TXR Lisp interactive listener of TXR 220.
Quit with :quit or Ctrl-D on empty line. Ctrl-X ? for cheatsheet.
1> (defstruct widget nil
(:init (me) (put-line `@me: hello`))
(:fini (me) (put-line `@me: goodbye`)))
#<struct-type widget>
2> (with-objects ((w (new widget)))
(put-line "in-scope"))
#S(widget): hello
in-scope
#S(widget): goodbye
t

A more general with-resources can manage anything:

3> (with-resources ((x 1 (prinl x))
(y 2 (prinl y)))
(prinl 'body))
body
2
1

For each variable bound under with-resources, the clean-up forms
are executed in reverse order.
Post by Rainer Weikusat
But objects managed in this way are
condemned to be executed when the scope is terminated: There's no way to
pass a file opened under control of the with out of this scope unclosed.
That sentence is a good synopsis of the purpose of that statement;
if the file could escape without being closed, the with statement would
be broken.

In TXR Lisp, I put in a crazy mechanism to escape out of a scope without doing
the unwinding, which actually makes such a thing possible. Watch this:

4> (block foo
(with-objects ((w (new widget)))
(put-line "in-scope")
(sys:abscond-from foo w))) ;; analogous to (return-from ...)
#S(widget): hello
in-scope
#S(widget)
5>

abscond-from is in the system package because it's not supposed to be
used generally.

I invented abscond-from in order to support delimited continuations.
When we save a continuation, we can use abscond-from to leave that context.
Then later resume the context using the continuation, with all the resources
intact. If we think about an analogy to multithreading, it makes sense; we
don't unwind a thread's resources when we context switch out of it!

We can make a little demo out of this:

First, we create a suspended yield block using an "obtain" operator,
getting a function in return:

5> (obtain-block foo
(with-objects ((w (new widget)))
(put-line "in scope")
(yield-from foo w)
(put-line "leaving scope")
(yield-from foo 42)))
#<vm fun: 0 param + 1 optional>

Now we can call the function:

6> [*5]
#S(widget): hello
in scope
#S(widget)

The code starts to execute and reaches the yield. The yield gives us back
the widget! Under the hood it has used sys:abscond.

We invoke the function again (at this point, we could pass in a value:
yield-from supports two-way communication):

7> [*5]
leaving scope
42

Now we reach the second yield which produces 42.

We call it again:

8> [*5]
#S(widget): goodbye
nil

And, blam, we have hit the end. The with-objects construct terminates
and finalizes the widget; the delimited continuation now executes up
to the delimited prompt, encountering no yields, and we get the return
value of the obtain-block.

The object system also supports the concept of finalizing if construction
aborts. Suppose we make a widget2 whose :init function throws an exception:

9> (defstruct widget2 nil
(:init (me) (put-line `@me: hello`) (error 'oops))
(:fini (me) (put-line `@me: goodbye`)))
#<struct-type widget2>

Then try to instantiate it:

10> (new widget2)
#S(widget2): hello
#S(widget2): goodbye
** oops
** during evaluation at expr-9:2 of form (error 'oops)
** run with --backtrace to enable backtraces
--
TXR Programming Lanuage: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1
Siri Cruise
2019-07-18 21:38:02 UTC
Permalink
Post by Rainer Weikusat
Post by Siri Cruise
Post by Rainer Weikusat
Who would believe that, in 2019, 61 years after John McCarthy being
unable to implement reference counting in IBM 704 machine code due to
the way the already existing code used the registers of the machine,
Apple uses reference count for objective c and forces the programmer to
distinguish counted forward edges and uncounted back edges. I don't know how
they handle general graphs where back and forward edges can't be distinguished.
The obvious idea would be to put all member objects of the graph into
some sort of dictionary and make all member-member links uncounted. If
complex, linked structures come into play, garbage collection is
essentially window dressing as objects will need to be unlinked with
explicit code, anyway, otherwise, they'll never turn into 'garbage'.
When you cut an edge of a general DAG, you don't know whether that isolates part
of the graph from the sources without checking the entire graph for
connectiveness. Regarding isolated subgraphs as garbage, you can't reliably
detect garbage without some sort of mark and sweep. Since a DAG models any
pointer linking, what you say about a DAG, you say about pointer links.

I use the Boehm-Demers-Weiser collector with finalisers. I no longer have to
write free routines or worry about resource leaks.
--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted. @
'I desire mercy, not sacrifice.' /|\
The first law of discordiamism: The more energy This post / \
to make order is nore energy made into entropy. insults Islam. Mohammed
Rainer Weikusat
2019-07-19 14:20:22 UTC
Permalink
Post by Siri Cruise
Post by Rainer Weikusat
Post by Siri Cruise
Post by Rainer Weikusat
Who would believe that, in 2019, 61 years after John McCarthy being
unable to implement reference counting in IBM 704 machine code due to
the way the already existing code used the registers of the machine,
Apple uses reference count for objective c and forces the programmer to
distinguish counted forward edges and uncounted back edges. I don't know how
they handle general graphs where back and forward edges can't be distinguished.
The obvious idea would be to put all member objects of the graph into
some sort of dictionary and make all member-member links uncounted. If
complex, linked structures come into play, garbage collection is
essentially window dressing as objects will need to be unlinked with
explicit code, anyway, otherwise, they'll never turn into 'garbage'.
When you cut an edge of a general DAG, you don't know whether that isolates part
of the graph from the sources without checking the entire graph for
connectiveness. Regarding isolated subgraphs as garbage, you can't reliably
detect garbage without some sort of mark and sweep. Since a DAG models any
pointer linking, what you say about a DAG, you say about pointer links.
To someone convinced that nothing but nails matter, all tools except
hammers seem useless.
James K. Lowden
2019-07-18 15:36:53 UTC
Permalink
On Wed, 17 Jul 2019 21:50:21 +0100
| It is understood that these resources are freed when
| the object is garbage-collected, but since garbage collection is not
| guaranteed to happen, such objects also provide an explicit way to
| release the external resource, usually a close() method.
Who would believe that, in 2019, 61 years after John McCarthy ...
application would still leak file descriptors
ISTM you're blowing this out of proportion. While GC circles are
possible, and therefore close() methods need to be provided, that's a
pretty academic concern. Unless you're using Python to write a daemon,
who cares about resource leaks?

I don't know anyone who explicitly closes files in Python. The script
will terminate, and the OS will release resources allocated to the
process. GC is just a tempest in a teapot.

--jkl
Rainer Weikusat
2019-07-18 20:38:30 UTC
Permalink
Post by James K. Lowden
| It is understood that these resources are freed when
| the object is garbage-collected, but since garbage collection is not
| guaranteed to happen, such objects also provide an explicit way to
| release the external resource, usually a close() method.
Who would believe that, in 2019, 61 years after John McCarthy ...
application would still leak file descriptors
ISTM you're blowing this out of proportion. While GC circles are
possible, and therefore close() methods need to be provided, that's a
pretty academic concern. Unless you're using Python to write a daemon,
who cares about resource leaks?
I don't know anyone who explicitly closes files in Python.
A function/ method should not leak resources as this makes it less
universally useful. That's not just a "But I'm convinced I can get away
with it!"-question. There are also situations where "closing a file
descriptor" has effects beyond "freeing resources", eg, in case of some
sort of "IPC stream". Because the Python garbage collection behaviour is
undefined to accomodate tracing collectors, this means (as seen from the
perspective of the program) non-deterministich finalization, hence,
anything but memory has to be managed manually.
Ian Zimmerman
2019-07-19 17:09:59 UTC
Permalink
Post by James K. Lowden
I don't know anyone who explicitly closes files in Python. The script
will terminate, and the OS will release resources allocated to the
process. GC is just a tempest in a teapot.
The recommended style, and also the most prevalent one IME, is to use a
with block:

with open(fn) as fh:
use(fh)

# fh is closed here

Not "explicit", but it gets closed anyway :)
--
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
To reply privately _only_ on Usenet and on broken lists
which rewrite From, fetch the TXT record for no-use.mooo.com.
James K. Lowden
2019-07-19 19:15:10 UTC
Permalink
On Fri, 19 Jul 2019 10:09:59 -0700
Post by Ian Zimmerman
use(fh)
# fh is closed here
Right. Rainer's point of course is that fh isn't necessarily closed
there. It goes out of scope and is inaccessible to Python, but it's
indeterminate when GC will call close(2).

AFAIK that's a general GC problem. Either you let the GC free OS
resources in its own good time, or you allow/force the programmer to do
it explicitly, if only indirectly through "finalize" or somesuch.

Or, I guess, certain resources -- e.g., IPC connections -- could be
marked as "free immediately", and interpreter would call some kind a
destructor when it goes out of scope. That's just a guess. Maybe
someone who knows something about it would like to comment....

--jkl
Rainer Weikusat
2019-07-19 21:34:57 UTC
Permalink
Post by James K. Lowden
On Fri, 19 Jul 2019 10:09:59 -0700
Post by Ian Zimmerman
use(fh)
# fh is closed here
Right. Rainer's point of course is that fh isn't necessarily closed
there. It goes out of scope and is inaccessible to Python, but it's
indeterminate when GC will call close(2).
It will be closed there. The with statement wraps the enclosed block
(it's suite) in calls of the __enter__ and __exit__ method of a context
manage object returned by the expression which comes after the with. The
return value of the __enter__ method is bound to the name (after the
as). Exceptions raised while executing the block will be caught
automatically and will be passed to the __exit__ method which may chose
to suppress them. For the given case, the effect is the same as if

f = open(fn);
try:
use(f)
finally:
f.close()

had been used instead.

That's a pretty complicated facilitate solely intended to work around a
deficency of tracing collectors, namely, that they may end up
postponing execution of finalization code for an indeterminate time by
using special-case syntax to cover what's believed to be the most common
case where this is a problem: Allocation of something which needs to be
freed at the end of the lifetime of the scope it belonged to as the
collector cannot detect if future allocation would suceed and as
realtime deallocation might be called for regardless of this.

This follows a typical pattern in 'GC programming': Whenever another bug/
misfeature is identified, make the code more complicated to work around
it 'most of the time'. The obvious drawback of this 'pattern' is that
there's no way to pass the file out of the scope unclosed. IOW,
something like this

---------
use POSIX;
use Socket;

sub do_aton
{
my $ip;

$ip = inet_aton($_[0]);
$ip // die("no idea what $_[0] is supposed to be");

return $ip;
}

sub listen_on
{
my ($host, $port) = @_;
my ($rc, $sk);

$rc = socket($sk, AF_INET, SOCK_STREAM, 0);
$rc or die("socket: $!");

$rc = bind($sk, pack_sockaddr_in($port, do_aton($host)));
$rc or die("bind: $!");

$rc = listen($sk, 10);
$rc or die("listen");

return $sk;
}

eval {
my $sk = listen_on($ARGV[0], $ARGV[1] // 1234);
pause();
};

if ($@) {
# socket will be closed here
die("something went wrong: $@");
}
-------

with listen_on either returning a listening socket or throwing an
exception with the socket file descriptor being freed or not by the
runtime, can't be expressed in this way.

It's obviously possible to work around that by making the code more
complicated :->

[...]
Post by James K. Lowden
Or, I guess, certain resources -- e.g., IPC connections -- could be
marked as "free immediately", and interpreter would call some kind a
destructor when it goes out of scope.
That's not possible as the interpreter would need to track references in
order to accomplish this. As McCarthy found himself unable to implement
this for early LISP because of limitations of the IBM 704 hardware,
that's a Bad Thing[tm] and any amount of effort is justified to avoid
it.
m***@gmail.com
2020-09-21 11:42:32 UTC
Permalink
Post by Rainer Weikusat
,----
| CPython implementation detail: CPython currently uses a
| reference-counting scheme with (optional) delayed detection of
| cyclically linked garbage, which collects most objects as soon as they
| become unreachable, but is not guaranteed to collect garbage
| containing circular references. See the documentation of the gc module
| for information on controlling the collection of cyclic garbage. Other
| implementations act differently and CPython may change. Do not depend
| always close files).
[...]
| Some objects contain references to “external” resources such as open
| files or windows. It is understood that these resources are freed when
| the object is garbage-collected, but since garbage collection is not
| guaranteed to happen, such objects also provide an explicit way to
| release the external resource, usually a close() method. Programs are
| strongly recommended to explicitly close such objects. The
| ‘try…finally’ statement provides a convenient way to do this.
`----
Who would believe that, in 2019, 61 years after John McCarthy being
unable to implement reference counting in IBM 704 machine code due to
the way the already existing code used the registers of the machine,
application would still leak file descriptors, locks, windows and
whatever other resource with an allocate/ release semantic exists due to
the clever observation that all memory on an IBM 704 was accessible to
the single program it ran and that detection of "unreachable objects"
could thus be postponed ...
Loading...