00:38:17 <esolangs> [[Functional paradigm]] N https://esolangs.org/w/index.php?oldid=166334 * Corbin * (+1821) "Moving" a category page to main namespace, per IRC discussion.
00:38:37 <esolangs> [[Category:Functional paradigm]] https://esolangs.org/w/index.php?diff=166335&oldid=162713 * Corbin * (-1795) Fork to [[functional paradigm]] to put content in main namespace, per IRC discussion.
00:40:19 <korvo> Sgeo: I'd say yes to all of that.
01:14:42 <esolangs> [[Smalltix]] https://esolangs.org/w/index.php?diff=166336&oldid=166295 * Corbin * (+388) Document the core of the correspondence between Smalltalk and Unix. This is technically a surjection rather than an isomorphism but we will paper over that.
01:27:59 <sorear> korvo: nql compiler works for me with python 3.12.11 and pyparsing 3.2.3? is the patch something specifically required by your harness, or is there still something weird with my setup?
01:29:00 <sorear> i think i've figured out how ajwade's variable length PC works well enough to try it
02:12:54 <korvo> sorear: Probably my harness. Nix says that I'm also on CPython 3.12.11 with pyparsing 3.2.3. Perhaps order of imports is important? I import framework, nqlast, nqlgrammar in that order.
02:13:46 <korvo> ...Although I've just tried rebuilding the entire BB Gauge with your repository as upstream and the 8yr commit as the target revision, and everything appears to work. So perhaps my commit's not needed.
02:32:16 -!- ais523 has joined.
02:32:53 <ais523> Sgeo: quite a few of the languages I was working with during my PhD were call-by-name and Algol-based, so I used an actual Algol 60 implementation to test some of the programs (by translating them)
02:33:52 <ais523> the non-fixed syntax looks pretty esoteric to modern eyes – at the time, each implementation was expected to come up with its own syntax and the syntax used in the specification was designed for typesetting, not programming in
02:34:07 <ais523> did Algol 60 also allow spaces in identifiers? or was that just Algol 68?
02:34:44 <ais523> (incidentally, Algol-68 does have I/O but it looks very esoteric to modern eyes)
02:40:58 <zzo38> How is the I/O of Algol-68? Is there a program to convert the syntax for programming to the syntax for typesetting? (I think WEB does something similar, but different)
02:44:21 <Sgeo> For Algol 60, iiuc different computers had different programming syntaxes. Some put keywords in quotes (e.g. 'BEGIN'), some capitalized
02:44:31 <b_jonas> ais523: doesn't the non-fixed syntax mean only that keywords can be represented as single symbols or short combinations or full words depending on how capable your input devices (eg. card reader) are, since Algol may be running from five-bit telegraph with two shift modes and only like 55 usable characters, or an EBCDIC card reader that can recognize 256 characters that you can each punch by
02:45:28 <b_jonas> BASIC is kind of like this too: on some microcomputers you can type BASIC keywords from letters, on others you can only enter them as a single shifted keyboard symbol that's only shown on screen as letters
02:45:45 <ais523> b_jonas: yes, but also keywords and variable names could be the same and so implementations needed a way to disambiguate (in the Algol 68 specification, this was done using different fonts)
02:45:51 <zzo38> I have seen that before
02:46:24 <b_jonas> and when you don't have every printer standardized on being able to print most of ASCII then it would be silly to say that some symbol must be represented as exactly a left square bracket, another as a yen sign etc, just use whatever your printer can show
02:47:52 <b_jonas> ais523: makes sense, that means you can even have versions with built-ins represented in different natural languages, like Excel or LOGO. you could even use a terminal that doesn't have latin letters.
02:48:14 <ais523> I'm trying to remember how Algol 68's I/O works
02:48:23 <ais523> I think all the files had to be opened before the program starts, although I'm not 100% sure
02:49:27 <b_jonas> have I complained lately about how hard it is to find on the internet sources that are written in English and list *all* the Russian abbreviations for SI units and prefixes?
02:49:48 <b_jonas> why doesn't someone have a complete table somewhere? is it really that hard?
02:50:05 <b_jonas> are you supposed to learn them only from printed university textbooks for engineers or something?
02:50:23 <ais523> hmm, apparently ECMA 6 (which later became ISO 646) was first published in 1965
02:50:30 <b_jonas> also if you only list the russian abbreviation for kilogram, not for gram, then you're doing it wrong
02:50:46 <ais523> so Algol 60 couldn't have used it, and although Algol 68 could have done, there wasn't time for it to have "won" yet
02:51:07 <b_jonas> even if ASCII exists doesn't mean all computers are using it
02:51:32 <ais523> ECMA 6 isn't exactly ASCII, it's a precursor to it
02:51:56 <ais523> it's what C was designed against, which is why C has trigraphs (they make it possible to type the ASCII characters that C uses that aren't guaranteed by ECMA 6)
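As a rough illustration of the trigraph mechanism mentioned above, the sketch below replaces the nine standard trigraph sequences in a string the way an early translation phase would. The function name translate_trigraphs and the two lookup tables are hypothetical, and the tables are split so that no literal trigraph appears in the source itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Third characters of the nine trigraphs, and the characters they stand for.
   Keeping them in two parallel tables avoids writing a literal trigraph here. */
static const char TRI_KEYS[] = "=(/)'<!>-";
static const char TRI_VALS[] = "#[\\]^{|}~";

/* Replaces every two-question-mark trigraph in `in` and writes the result to
   `out`, which must have room for strlen(in) + 1 bytes.
   Returns the length of the output string. */
size_t translate_trigraphs(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    size_t n = 0;
    while (*in != '\0') {
        const char *hit = NULL;
        /* Short-circuit evaluation keeps these reads inside the string. */
        if (in[0] == '?' && in[1] == '?' && in[2] != '\0')
            hit = strchr(TRI_KEYS, in[2]);
        if (hit != NULL) {
            out[n++] = TRI_VALS[hit - TRI_KEYS];
            in += 3;
        } else {
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
    return n;
}
```

Callers that want to spell a trigraph in a string literal without triggering replacement can escape the question marks, as in "\?\?=".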
02:58:52 <strerror> Sgeo: It's a bit hard to avoid IO creeping into your standard after systems started widely adopting, well, stdio.
03:01:10 <strerror> So you might want to look at systems that still don't have that. Such as Ecmascript! It looks like the console object isn't actually in the standard. https://tc39.es/ecma262/#sec-ecmascript-standard-built-in-objects
03:02:56 <zzo38> JavaScript does have built-in date/time and random numbers though, which means that it still has some I/O, although console.log is not a core function it is common among multiple implementations (console.log with a single argument which must be a string, might be the most portable way to do output in JavaScript, and if it doesn't have it, it is easy to add it)
03:03:19 <zzo38> (Some people do not consider date/time and random numbers to be I/O, but to me, I consider that it is.)
03:08:54 <b_jonas> strerror: which is why C++ overreacted and went the opposite way. iostreams was basically the first part of C++ that got (at least unofficially) standardized between different implementations, with both the language and the standard library changing a lot since, and the weirdest parts of the language were designed specifically to serve iostreams – have you ever seen virtual inheritance used in C++ for
03:09:00 <b_jonas> anything other than implementing the classes of iostreams such that basic_iostream can inherit from both basic_istream and basic_ostream but only have one copy of ios and one format flag?
03:09:18 <ais523> I remember, at a previous job, having to explain why a program was producing progress output to the terminal in 4KiB chunks rather than immediately (libc buffering), and then having to explain why libc was involved even though the program was written in Haskell
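The chunked progress output described above matches how C stdio behaves when stdout is not an interactive device: the stream is fully buffered, so text only appears once the buffer fills or is flushed. A minimal sketch of the usual remedy, flushing after each progress line, follows. The function name report_progress is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Writes one progress line and flushes it immediately, so the text is not
   held back in the stdio buffer when `out` is a pipe or a file.
   Returns the number of characters written, or -1 on error. */
int report_progress(FILE *out, int done, int total) {
    assert(out != NULL);
    int n = fprintf(out, "progress: %d/%d\n", done, total);
    if (n < 0)
        return -1;
    if (fflush(out) != 0)
        return -1;
    return n;
}
```

An alternative is to call setvbuf(stdout, NULL, _IOLBF, 0) once at startup, before any output, which switches the stream to line buffering instead of flushing by hand.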
03:11:13 <ais523> I hate the way that on many OSes you have to go through libc to interact with the OS at all, even though it contains functions that have nothing to do with OS interaction (like strlen)
03:13:16 <b_jonas> ais523: ooh, you had to go through fifty years of computer history with that one. I recently learned that the officially documented API of the Commodore 64 kernal ROM has unix-like file description abstractions in it, where you can open numbered file descriptors that can correspond to either the screen or tape or floppy disk and then you're supposed to write them with a system call for *every byte*,
03:13:22 <b_jonas> even though this is such a silly design that the designers should really have expected that no sane program will use it
03:14:17 <ais523> it is surprising how silly many API designs end up
03:14:45 <b_jonas> and then the whole CP/M thing where there's a standardized operating system interface (without unix-like file descriptors and with files read/writable only in fixed-sized blocks and no byte-granular size) without shared language or CPU
03:15:04 <ais523> CPUID is one of my pet hates – in order to use it you first have to use CPUID request 0, which gives you a value for the highest CPUID request you can use – higher requests are undefined behaviour, lower requests might or might not be implemented, but return all-bits-zero if not implemented
03:15:37 <zzo38> I also did not like that you have to go through libc; my own design is deliberately designed so that you do not have to use libc (which, in my system, lacks the functions for interacting with the OS anyways) to interact with the OS. (Also, you cannot necessarily interact with the I/O anyways; you can interact with capabilities.)
03:15:44 <b_jonas> ais523: this one is actually less silly than it seems at first, one character at a time does make more sense than it seems at first with how the floppy drive and tape works
03:15:47 <ais523> it would have made so much more sense to say "unimplemented requests return all-bits-zero" so that using request 0 wasn't mandatory, you just make the request you want (because you have to check for the all-bits-zero case anyway)
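The two-step protocol described above (ask request 0 for the highest supported request, then make the real request) can be sketched as follows. This assumes a GCC or Clang toolchain that ships the cpuid.h helper header; the function names cpuid_basic and cpu_vendor are hypothetical, and on non-x86 builds both simply report failure.

```c
#include <assert.h>
#include <string.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>            /* GCC/Clang helper header (toolchain assumption) */
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Queries a basic CPUID request. Returns 1 and fills regs = {eax, ebx, ecx, edx}
   on success; returns 0 if the request is above the maximum reported by
   request 0, or if this is not an x86 build. Requests that take a sub-request
   in ecx are outside the scope of this sketch. */
int cpuid_basic(unsigned int leaf, unsigned int regs[4]) {
    assert(regs != NULL);
#if HAVE_X86_CPUID
    /* Step 1: request 0 returns the highest basic request in eax. */
    unsigned int max_leaf = __get_cpuid_max(0, NULL);
    if (max_leaf == 0 || leaf > max_leaf)
        return 0;
    /* Step 2: only now make the request that was actually wanted. */
    __cpuid(leaf, regs[0], regs[1], regs[2], regs[3]);
    return 1;
#else
    (void)leaf;
    return 0;
#endif
}

/* Copies the 12-byte vendor string that request 0 leaves in ebx, edx, ecx. */
int cpu_vendor(char out[13]) {
    unsigned int r[4];
    assert(out != NULL);
    out[0] = '\0';
    if (!cpuid_basic(0, r))
        return 0;
    memcpy(out, &r[1], 4);        /* ebx */
    memcpy(out + 4, &r[3], 4);    /* edx */
    memcpy(out + 8, &r[2], 4);    /* ecx */
    out[12] = '\0';
    return 1;
}
```

A caller would still have to check the returned registers for the all-bits-zero case on requests below the maximum, which is the redundancy being complained about.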
03:16:04 <zzo38> (And, I do not like CPUID either, so my own design would be one that does not have such a thing, at least for application mode.)
03:16:22 <ais523> the only logical reason I can see for the CPUID design is so that Intel can fill your ebx, ecx and edx registers with advertising
03:16:29 <ais523> (it actually does that, while it returns the result in eax)
03:16:41 <zzo38> (And that about CPUID is not very good either)
03:17:11 <b_jonas> anyway, buffering output before doing the actual write call is both older than unix's unified write operating system API, and even on a typical linux it has lots of reimplementations besides libc
03:17:16 <ais523> I think Intel encouraged people to verify that the CPU was an Intel CPU before trusting the CPUID result (which would have the consequence of programs running non-optimised on non-Intel x86 clones) – and later actually did that themselves in icc
03:17:50 <ais523> but, no sensible manufacturer would create an x86 clone for which the CPUID results had different meanings than on Intel
03:18:16 <b_jonas> it's so old that TAOCP volume 1 explains how to buffer input and output, both because you're doing IO in fixed-sized blocks and because you're using a larger ring buffer for background IO in parallel to computations
03:19:12 <ais523> (the funny thing is, later AMD created their own CPUID requests in the billions, to avoid any likely clashes with Intel's which were all small integers, and enough software started using them (even though it was defined by Intel as UB) that Intel had to partially implement some of them in order to prevent code running more slowly on Intel processors than it would on AMD)
03:19:20 <b_jonas> ais523: sure, but Intel kind of has to do that because they can't promise that every bit of their documentation will apply to all third-party CPUs made, even with how much they specifically work together with AMD to make the CPUs as compatible as reasonably possible
03:19:41 <ais523> no they don't, they just say "on Intel processors, CPUID works like this"
03:20:08 <ais523> and if a non-Intel processor doesn't match the documentation Intel just blames it on the manufacturer
03:20:58 <b_jonas> but that's not just CPUID, all of their documentation basically says "Intel processors work like this", so much that the architecture programming manual tells you the details of how all CPUs going back to the 8086 work and how to detect if you're running on a modern CPU rather than a 8086 in like five easy steps starting from distinguishing 8086 from 80286 etc
03:22:23 <ais523> right, you're supposed to start by seeing whether certain flags bits keep their value when you try to set them, I think?
03:22:27 <ais523> (rather than reverting back to 0)
03:22:46 <ais523> and that detects enough early CPUs that you can rule out all the ones that don't implement CPUID, and then use CPUID
03:23:33 <b_jonas> no, the flag bit distinguishes between pentiums with or without CPUID, that's near the last step, I think there are three or four more steps before. IIRC the first is to check what push SP pushes to see if you're on a 8086 or 80286, but I forget how you test for a 80286 vs 80386, then 80386 versus later
03:24:05 <b_jonas> I might be misremembering, I should look this up
03:24:22 <ais523> hmm, maybe INTERCAL's version test isn't so unique after all
03:26:08 <ais523> in any case, most software nowadays doesn't support anything earlier than i686 when compiling for 32-bit x86
03:26:23 <ais523> (and increasing amounts of software aren't supporting 32-bit x86 at all)
03:31:43 <b_jonas> no, you were closer to right. Intel architecture manual volume 1 chapter 20.1.2. the test between 8086 vs 80286 vs 80386 or newer is with FLAGS: top bit is always set on 8086 but always clear on others in real mode but you *can* skip that part, test the three bits below them to see if they are changeable on 80386 or newer vs fixed on older (though with different values on 8086 vs 80286);
03:32:57 <b_jonas> then there are two EFLAGS bits but you only really need one which indicates that CPUID is available, but of course to even access the top half of EFLAGS you need to know that you're on a 80386 or newer.
03:34:12 <b_jonas> so it's not as bad as I remembered, it's really only four tests, one to test if a bit in FLAGS can be both set and cleared and retains both, then test if a bit in EFLAGS can be both set and cleared and retains both, and if those all pass you have CPUID. unless of course you actually want compatibility with CPUs older than CPUID.
03:36:30 <b_jonas> though of course the later tests with CPUID are complex, because the official intel documentation doesn't promise you that SSE2 is always available in 64-bit code. mind you, that is actually *correct* from their perspective, because being able to use XMM registers requires operating system support, and even though in practice any 64-bit program can rely on that being there if they even want to use any
03:36:36 <b_jonas> OS ABI, the Intel CPU manual has to describe the more general case where there needn't be a typical operating system running.
03:39:13 <ais523> I think at least Rust does CPUID checks along the lines of "this software was compiled for Windows, so we can assume the existence of any CPU instructions that are required by Windows"
03:39:44 <ais523> …which leads to weird per-OS performance increases because some OSes require newer instructions than others, making software that isn't multiversioned run faster
03:45:38 <b_jonas> hehe, yes, that would lead to a consistent drawback on programs compiled for linux, because there will always be operating systems supporting parts of the linux system call ABI that run on the weirdest CPUs
03:46:13 <b_jonas> whereas most Windows programs these days can just check for at least Windows 10 and bail early on Windows 7 or earlier
03:47:34 <b_jonas> but I still think at least on a linux x86_64 program you can rely on SSE2 being there
03:49:25 <b_jonas> (I wouldn't rely on it being implemented completely correctly; I should go back some day and get a qemu x86_64 guest *without acceleration* to compile and test if it indeed has a bug in what NaN values some SSE instructions return and report the bug if it's there)
03:50:31 <b_jonas> I mean especially if you are linking to libc then the function call ABI requires XMM registers present
03:51:10 <b_jonas> if there's an fabs function, or a printf that can format doubles, then there has to be SSE instructions at least
03:55:23 <ais523> hmm, if the OS hasn't given permission to the CPU to use XMM registers, do attempts to use them actually fail? or do they succeed and just hide the CPUID bit?
03:56:08 <b_jonas> that said, even if the CPU instruction subset part is complicated, the OS ABI part turns out to be pretty simple, because most programs will, whenever they do a unix system call other than group_exit, check if it returns an error with an errno code that they don't specifically handle, and that automatically checks for old OSes not implementing any particular system call.
03:56:11 <ais523> I guess it doesn't really matter
03:56:31 <ais523> your programs will still break if they get context-switched while using registers the OS doesn't know exist
03:56:33 <b_jonas> ais523: IIRC yes, the CPU ABI is that the CPUID bit is only enabled when the operating system has explicitly enabled support for XMM registers;
03:57:02 <b_jonas> though that applies for XMM and YMM and ZMM only, x87/MMX registers also require OS support and I don't know how you test that
03:57:34 <ais523> and now you've made me wonder whether glibc actually puts ENOSYS in errno or whether it just aborts upon seeing that the OS doesn't support a system call it expected to exist
03:57:59 <ais523> it might plausibly depend on which system call you ask for – there are some that glibc will recognise as being conditionally supported
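The error-checking pattern described above, where an unhandled errno doubles as a check for a missing system call, can be sketched as below. The enum and the function name classify_call are hypothetical; the only assumption is the usual Unix wrapper convention of returning -1 and setting errno, with ENOSYS meaning the call is not implemented.

```c
#include <assert.h>
#include <errno.h>

enum call_outcome { CALL_OK, CALL_UNSUPPORTED, CALL_FAILED };

/* Classifies the result of a system call wrapper. `ret` is the wrapper's
   return value and `err` is a copy of errno taken right after the call. */
enum call_outcome classify_call(long ret, int err) {
    assert(err >= 0);
    if (ret != -1)
        return CALL_OK;
    if (err == ENOSYS)
        return CALL_UNSUPPORTED;   /* the kernel does not implement this call */
    return CALL_FAILED;            /* any other error, handled by the caller */
}
```

A program that treats CALL_FAILED as fatal gets the old-kernel check for free; treating CALL_UNSUPPORTED separately allows a fallback path instead.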
03:58:59 <ais523> it took me a surprisingly long time to realise that the reason why MMX registers are mapped over x87 registers is so that the OS will know how to context-switch them even if it doesn't know about MMX
03:59:53 <b_jonas> ais523: yes, but it's not *just* the OS, I think it's also user-space context switch or light threading libraries
04:00:00 <ais523> it would be logical to map mask registers over x87 for the same reason (so that programs could use masked EVEX-encoded instructions on 128-bit and 256-bit registers even if the OS didn't know about AVX-512)
04:00:07 <b_jonas> well no, sorry, ignore that
04:00:27 <b_jonas> the ABI basically says that MMX can't be initialized around generic function calls, the CPU is always in x87 mode
04:00:35 <b_jonas> so a user-space library doesn't have to support MMX
04:01:04 <ais523> well the ABI also doesn't let you initialise ymm registers around function calls either
04:01:32 <ais523> (and that's *on top* of making all the xmm registers call-clobbered)
04:01:54 <ais523> or maybe it's specifically returns rather than calls
04:01:57 <b_jonas> ais523: I don't think so, mask registers were introduced in AVX512, that's late enough that by that time the CPU architecture exposed a generic interface that operating systems can use to save all the state of a process
04:02:43 <b_jonas> but x87 also has enough complications that it would be very annoying if you mapped something else onto it now
04:03:32 <b_jonas> and they're technically still in the x86_64 linux ABI for passing a long double to a function (like fabsl)
04:03:42 <b_jonas> or do I remember that part wrong?
04:06:03 <ais523> b_jonas: I just checked, long doubles as arguments are passed using stack slots, but a long double return value is returned in ST(0)
04:06:46 <ais523> ST(1) can be used in the specific case of returning a complex long double (but no other cases, e.g. a structure containing two long doubles is returned via outpointer)
04:07:12 <ais523> so you were almost right
04:08:48 <ais523> (this seems slightly illogical to me – "first long double in ST(0), rest in stack slots" would be more consistent with the rest of the ABI – but I didn't design it)
04:31:01 <korvo> sorear: Okay, let's assume that my commit is not necessary. It is meant to patch up an upstream change in pyparsing anyway; as long as everything else works, it's not relevant to our main goal.
04:37:01 <sorear> ais523: preventing user code from writing registers the OS doesn't know about is crucial unless you want covert channels
04:37:29 <ais523> sorear: I don't think Intel has a very good track record of stopping those :-(
04:39:15 <ais523> this reminds me of the trick that was used to implement rseq before it was added as a system call (you change the base of the gs segment in such a way that it looks unchanged to the OS, then if you get context-switched the OS restores gs incorrectly, and you use that to cause the last command in the rseq to fail)
04:39:58 <ais523> although that's the opposite of a covert channel, it gets incorrectly clobbered as opposed to incorrectly non-clobbered
04:40:41 <sorear> "trap on FP register access" has been a standard feature of ISAs forever because people think they want lazy FP register restoring, e.g. cr0.TS, not quite the same as OSXSAVE but
04:41:54 <zzo38> I had wanted to design the CPU to avoid covert channels and other problems with it, as well as some enhancements; security is one issue but there are other issues too.
04:44:06 <ais523> this reminds me of when lazy FP state restore turned out to be exploitable (using speculative execution to leak other processes' FPU registers), but recent-at-the-time Linux was unaffected because they'd disabled it by default a little earlier
04:44:25 <ais523> due to it not being useful as a performance optimisation on modern CPUs
04:45:28 <ais523> I suspect the concern was mostly not so much x87 (unlikely to hold sensitive information) as SSE (which could plausibly hold sensitive information due to being used for inline memcpys)
04:46:45 <ais523> ah, it's specific to recent-at-the-time Linux on recent-at-the-time processors, older processors were still affected because the lazy restore was considered by Linux to be faster on those
04:47:53 <sorear> it's exploitable due to TS being checked too late, not due to an inherent property of the ISA
04:48:25 <ais523> yes, as usual this happened on Intel but not AMD (AMD has had its own specific vulnerabilities but they tend to look different from the Intel-specific ones)
04:48:25 <sorear> but intel did the same thing with page permissions so
04:50:13 <ais523> the recent ARM64-specific one was even more dramatic (the CPU was speculatively reading from register values if they looked like pointers, which could be attacked by getting crypto code to create numbers that looked like pointers internally if the key had a bit in a particular place)
04:50:27 <sorear> (most of my ISA stuff is indexed on riscv since that's what I've been doing since 2016)
04:51:41 <ais523> I think that one's less likely to be a problem on x86es for memory-ordering reasons, reads are acquire-ordered by default on x86 and so speculating on a read before the read instruction appears in the instruction stream is almost useless
04:53:00 <sorear> that sounds like something half-remembered that's either related to value predictors or prefetching
04:53:55 <sorear> dynamically most register values are small integers, all of which are architecturally valid pointers
04:54:28 <ais523> right, but "looking like a pointer" is different from being a valid pointer – it's something that you can predict on (e.g. by seeing which memory addresses are being accessed and looking for addresses that look similar)
04:55:01 <sorear> riscv has svukte now (negative addresses are rejected in U-mode before even hitting the PTW) and A64 probably has had something similar for a while but I don't think anyone's promoted mmap_min_addr to architecture
04:55:16 <zzo38> I think it is better to not put that many complications like that into the CPU since it can cause these kinds of problems
04:56:49 <ais523> oh, I see, mmap_min_addr doesn't need to be architectural for security reasons, but it might potentially help for performance if you're trying to figure out what might be a pointer
04:57:06 <sorear> discover problem caused by complexity, solve it by adding more complexity, repeat until full employment is achieved
04:57:50 <korvo> Except for those of us who've burnt out, I suppose.
04:58:19 <zzo38> Another way can be: Redesign most of the computer, operating system, etc. It is not only about complexity and security but also the other problems.
04:59:01 <ais523> zzo38: I would like to do that but am having problems finding enough mental energy to do it
04:59:35 <ais523> although, I think that even though it would be beneficial to redesign everything properly, some parts of it are more beneficial than others (i.e. less effort to change and greater benefit)
05:00:18 <zzo38> I would want to make a discussion group to do it. I have no name for it so far, but I do have many ideas.
05:01:11 <ais523> the "lowest effort to greatest benefit" to me is in the "generalised ABI", i.e. the rules for how a process can use the processor registers and do argument passing and interact with the OS
05:01:34 <ais523> because that could be adopted piecemeal, one program at a time, without breaking existing systems, and yet it's an area with huge scope for changing things
05:02:30 <sorear> like you want to simplify the x86_64 sysv calling convention? what does that benefit, or am I taking you too literally?
05:02:30 <ais523> for example, it would be possible to enforce an object-capability system at that level (via static analysis of the source code or binary)
05:03:06 <ais523> sorear: the calling convention is part of it, but actually I wanted to complicate it, the current calling convention is very rigid and it causes a lot of register spills as a consequence
05:03:11 <zzo38> For modifying existing systems, I suppose so, but I thought to do a new system; still the ABI (and perhaps those other things) would probably be one part of it though.
05:03:42 <korvo> But couldn't we have object-capability systems via static analysis already? Or is this like CHERI where the security property comes from a conjunction of correct hardware *and* correct software?
05:03:55 <ais523> korvo: we can but only within a single process
05:04:00 <zzo38> (Although, my idea of a system has a small number of system calls (possibly only one).)
05:04:25 <ais523> you need an expanded ABI to allow multiple processes to send capabilities between each other (e.g. via exec)
05:05:00 <zzo38> I think that a combination of correct hardware and correct software would be a good idea. However, I think CHERI is security within a process and my idea is more about security between processes.
05:05:23 <ais523> even the single-process version would be good though
05:06:32 <sorear> CHERI works within an address space. "Process" can get a bit fuzzy
05:06:36 <ais523> fwiw, I am sceptical of CHERI – I don't think it actually enforces memory safety unless you modify the software to take advantage of it
05:06:55 <ais523> whereas working with almost unmodified software is its only real claimed advantage
05:07:11 <zzo38> And, to send capabilities is by passing messages between processes, including the initial message (the process won't run if the initial message contains no capabilities, unless a debugger is attached)
05:07:47 <korvo> ais523: Or we switch to unguessability. Usually a reference within a process is "unforgeable"; there's formally no tools for constructing references. But whenever we have any sort of coding, we have "unguessable" references instead. Cryptography, ASLR, etc.
05:08:25 <ais523> my working example is C code like «enum user_mode { admin, user }; struct userinfo { char name[12]; enum user_mode mode; }; void set_username(struct userinfo *info, char *name) { if (strlen(name) > 12) return; strcpy(info->name, name); }»
05:08:34 <sorear> ? CHERI fails closed. if you have C code which relies on accessing objects with pointers derived from pointers to other objects, you have to modify it in order for it to do anything on CHERI besides segfault
05:08:34 <korvo> I'm currently looking at Smalltix as supporting capabilities by not having the tools necessary to construct paths e.g. into the Nix store. This is yet another step on the transitional path that Nix has laid out.
05:08:54 <ais523> every "C but memory-safe" I've seen won't catch the bug in this code
05:09:31 <ais523> (CHERI only does if you explicitly narrow the permission on the info->name projection, but doing that would break too much C code)
05:10:16 <sorear> the only times you need to modify CHERI C to make it *more* secure is if you have a user-level memory allocator and you want CHERI to know about and enforce the subobject boundaries
05:10:35 <ais523> korvo: I don't trust unguessability as a security feature at all, given how many speculative execution vulnerabilities there are
05:10:51 <ais523> sorear: does CHERI catch the bug in the code I posted above?
05:12:12 <korvo> ais523: Um? Maybe we're talking about different stuff. Unguessability is stuff like TLS being technically insecure in the sense that a determined attacker could crack a key. Or are you thinking of like timing attacks?
05:12:35 * korvo distracted by kitchen
05:12:47 <ais523> korvo: I'm mostly thinking of same-CPU covert channel attacks (which includes timing attacks)
05:13:16 <ais523> if the attacker can't run code on your CPU then things are safer
05:13:26 <b_jonas> I think I'll go with option 1, where trying to write a literal array has the same semantics as trying to index out of bounds into an array or use-after-free of an array. the UB or runtime check is already there because of the array indexing, so it doesn't really have extra cost to add more of it.
05:13:37 <korvo> ais523: Ah, okay. That stuff threatens unforgeability too. In general, colocation doesn't appear like it can be safe unless we're doing hypervirt.
05:13:40 <sorear> https://ctsrd-cheri.github.io/cheri-c-programming/impact/subobject-bounds.html
05:14:13 <ais523> sorear: OK, that's exactly what I thought – CHERI can't support it without code changes, but can support it with
05:14:14 <korvo> And then ISTR that there's some master theorem about how your ISA has to be hypervirt-safe from the late 80s.
05:14:22 <strerror> If your security feature is “like ASLR”, there are already a lot of attacks on that, which need not involve CPU exploits; e.g. printf("%p")
05:14:38 <sorear> https://cheri-compiler-explorer.cl.cam.ac.uk/z/aKq1n7 what code changes?
05:15:19 <ais523> sorear: in that you used a compiler option that breaks too much existing code to enable by default
05:15:37 <zzo38> I thought also that it should need a capability to be able to measure timing at all. That also partially mitigates timing attacks, although it is not the only thing to do. Application programs are deterministic except for system calls (and if the program is suspended or terminated by something external, which it cannot detect), and there are not many system calls, so hopefully that should help.
05:16:05 <ais523> zzo38: I also thought that
05:16:37 <korvo> zzo38: Yes, timers have to be tamed, and it's an open problem how to best do it.
05:17:04 <sorear> I'm struggling to interpret this as good faith. All of the provable, inviolable inter-object OCAP protections are worthless because intra-object protection fails in some cases that were never advertised?
05:18:20 <ais523> sorear: it's more "I've seen many people have a default assumption that an allocation boundary is a security boundary, and that isn't true for a substantial amount of existing C code" combined with "you need some rules for telling the compiler when a subobject boundary is supposed to be restrictive and when it isn't"
05:18:40 <ais523> programmers frequently use subobject boundaries that aren't supposed to be restrictive, and frequently use subobject boundaries that are
05:18:43 <zzo38> korvo: I think that redesigning the entire system is the way to do it, although it is possible that other people have other ideas.
05:19:19 <ais523> it doesn't make the protections worthless because a pretty high proportion of spatial exploits are cross-allocation
05:19:29 <ais523> so you're getting a pretty good mitigation percentage
05:19:43 <ais523> but there isn't a magic bullet to getting 100% memory safety from existing C programs
05:19:44 <b_jonas> zzo38: this doesn't apply to all the vulnerabilities mentioned, but it's really hard to keep a fast L1 cache, paging, SMP, large main RAM, and a fast CPU clock cycle, without also having lots of complexity that can cause vulnerabilities.
05:20:36 <zzo38> b_jonas: Yes, although some of these complexities can be avoided (and in some of the cases, they can be handled by the compiler instead)
05:21:15 <ais523> (interestingly, apparently the iOS kernel has a rule of not mixing things with different security properties within a single allocation, i.e. you need to use two different allocations with one pointing to the other – that means that an allocation boundary really is a security boundary inside the iOS kernel)
05:22:38 <b_jonas> ais523: hehe, writing past the end with strcpy.
05:23:11 <ais523> b_jonas: I wanted an example that a) is a plausible example of a bug that might occur and b) is easy for C programmers to notice as being buggy if told there's a bug there
05:23:44 <sorear> the point is compartmentalization, unaudited code can fail to be memory safe but it cannot be memory unsafe in ways that violate the security of code that _has_ been audited
05:23:59 <b_jonas> strerror: ASLR isn't trying to be unguessable in the cryptographical sense, it never did. pointers don't have enough address bits for that.
05:24:33 <ais523> sorear: OK, that's valid (although it differs from the security claims I typically see on the subject)
05:25:19 <korvo> zzo38: In Monte, the only way to get a system timer is with a top-level capability. It gives out absolute timestamps, but we called that .unsafeNow() since we were pretty sure that it's not safe. The idea is that a user might only get Timer.measureTimeTaken and nothing else. https://github.com/monte-language/typhon/blob/master/typhon/objects/timers.py#L65
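
The duration-only timer korvo describes can be illustrated with a small Python sketch. This is not Monte's API: the class name, method name, and injectable `clock` parameter are hypothetical, and Python does not enforce the privacy of `_clock` the way a capability-safe language would. The idea is only that the holder can learn how long something took, not what the absolute time is.

```python
import time


class DurationOnlyTimer:
    """Hypothetical capability that exposes elapsed durations only.

    The absolute clock is held privately; callers never see a timestamp.
    (Assumption: in a real cap-safe language this privacy is enforced;
    in Python it is only a convention.)
    """

    def __init__(self, clock=time.monotonic):
        # The powerful authority (an absolute clock) stays inside the object.
        self._clock = clock

    def measure_time_taken(self, thunk):
        """Run thunk() and return (result, elapsed_seconds)."""
        start = self._clock()
        result = thunk()
        elapsed = self._clock() - start
        return result, elapsed
```

A caller handed only a `DurationOnlyTimer` can time its own work, but has no method that returns the current date or a raw timestamp.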
05:25:34 <strerror> b_jonas: In a sufficiently large program it's not unguessable in any sense, since anything that prints a pointer (printf, JIT runtime, leftover debugging code) will disclose addresses immediately
05:26:12 <strerror> In the past “sufficiently large program” was emacs, now it's a web browser
05:27:05 <ais523> re: timing permissions, I think the ideal goal for a capability system would be "making it safe to run untrusted code" (browsers already do this!), and this raises the problem of preventing the code observing timing using network requests
05:27:29 <ais523> (it could also potentially use racing loops to create a timer, but that seems easier to fix at the compiler level)
05:27:54 <b_jonas> strerror: sure, that too, but I still think using cryptographically unguessable tokens for security makes sense in some cases, and is even hard to avoid in some, even if we suffer because it's undermined by virtualization putting untrusted code on the same CPU which may leak such values more easily than timing or power usage or other side channels
05:29:23 <b_jonas> ais523: sadly making it impossible to observe time and use it as a side channel is basically impossible in most practical settings. all settings where you want reasonable performance at the very least, and often even if you're fine with low performance.
05:29:26 <korvo> ais523: Networking's also a top-level capability (like a half-dozen fine-grained caps, actually) in Monte, and networking doesn't come with timing information by default. So it'd have to be a situation where you're permitted to call out to arbitrary webhooks to begin with, rather than a predefined situation.
05:29:28 <ais523> I have been thinking about ASLR a lot, I think too much ASLR is actually a net disbenefit to security (because it prevents you hardcoding pointers and the code that would otherwise hardcode pointers has to do something else)
05:29:30 <sorear> complexity rebalances in large systems. a big increase in µarch and compiler complexity saves a few % on time and energy, which means you get to make and install fewer chips and smaller power systems
05:29:31 <zzo38> korvo: I suppose it is one way to do it, and I would probably have the kernel to have a somewhat similar function; many application programs might use proxy capabilities, if they require any timing at all (for example, it is useful for many kinds of programs to have the current date/time, but many (probably most) programs shouldn't need it)
05:29:55 <ais523> korvo: I'm mostly thinking of the "script on web page" situation – those are expected to be able to make network requests
05:30:16 <korvo> zzo38: pledge() is another cool approach. There's nothing wrong with having the time available in the VDSO; the problem comes from the assumption that any code in the process is allowed to touch the VDSO.
05:30:35 <b_jonas> ais523: I'm not sure, we are using dynamic linkers anyway to share libraries between processes, and they have to do relocation, and our dynamic linkers are robust enough, so why would ASLR make this worse?
05:30:46 <ais523> korvo: the problem is not just the vDSO, but the RDTSC processor instruction
05:31:17 <korvo> ais523: Sure. Always worth remembering that E's authors were not able to fully rewrite ECMAScript to be cap-safe; ECMAScript is a big success story but it's still world-exposed.
05:31:24 <b_jonas> ASLR may still be a bad idea, but I don't think your argument proves that
05:31:31 <ais523> b_jonas: oh, I also see the dynamic linker as a problem – I think it allows too much
05:31:37 <ais523> but I'm not sure what the correct amount is
05:31:46 <sorear> ASLR was a big step back when the state of the art in attacks was return-to-libc (overflow a stack buffer and overwrite the return address with a pointer to system())
05:31:57 <zzo38> My own way is entirely involving capabilities, and would be a new instruction set too (so that there is no RDTSC, or at least, if there is, only the kernel is allowed to use it).
05:32:09 <b_jonas> ais523: even if the dynamic linker is allowed to load anything only when the process is starting, before it executes user code?
05:32:33 <korvo> ais523: Sure. We generally assume that a cap-safe environment must have "safe code loading", like e.g. JVM's bytecode verifier, to prove that the loaded code is not going to attempt any obvious wrongness. A code loader is safe when it respects isolation and confinement.
05:32:34 <sorear> arm and riscv both have no timer unconditionally exposed to user code
05:32:39 <zzo38> I do not want to use VDSO or whatever like that; when a program receives a capability, it might be a proxy capability, and a proxy capability can work like any other capabilities.
05:32:41 <b_jonas> I certainly see why dynamic linker invoked at runtime allows too much, but I like dynamic linker at startup time
05:32:43 <korvo> I suppose that this implies that the ISA itself must be tamed!
05:32:57 <ais523> b_jonas: one reasonable middle-ground would be what you suggest, but current dynamic linkers allow for the possibility of libraries loading at runtime and doing relocations both ways
05:33:47 <ais523> korvo: IIRC one of RDTSC and RDRAND has a way to disable it from the kernel and the other doesn't
05:34:26 <ais523> oh, right! one of my big realisations was you can prove that a program or portion of one doesn't receive via any side channel or covert channel via proving that it is deterministic
05:34:38 <sorear> the original riscv linux uABI guaranteed that user code _could_ access the cycle counter, was messily broken a couple years ago to dubious security benefit (if you have multithreading, you can estimate times by engineering a race, and noisy times can always be improved with statistics)
05:34:39 <b_jonas> ais523: certainly, but it's not like we can stop that because that function of the dynamic linker can be implemented completely in user-space by opening files with arbitrary filenames and mmap, and in contexts where you restrict those operations, you shouldn't allow invoking the dynamic linker either. now admittedly something like allowing to run untrusted code that can invoke the dynamic linker *is* a
05:34:44 <ais523> although this doesn't prevent it sending or forwarding via a side channel or covert channel
05:34:45 <b_jonas> bad idea even if people do it in practice in some high-level languages.
05:35:08 <zzo38> ais523: It is also one of my reasons for making it deterministic (by designing the instruction set and operating system in such a way)
05:35:46 <ais523> b_jonas: this would presumably be used in an environment where you need a capability to create new executable mappings
05:36:15 <korvo> ais523: Oh, there's no way that we would let the untrusted user submit ISA-specific instructions! Google tried that with Native Client, creating the amazing situation where exploits break through three distinct sandboxes like Tai Lung leaping out of prison at the beginning of Kung Fu Panda. No, we must JIT instead. That's why WASM and Monte do it, and why E had to become a distinct language from Java, Alice from ML, etc.
05:36:22 <sorear> deterministic concurrency would be an interesting security feature, but it doesn't help when mallory is doing RPCs and timing them on _her_ end
05:37:07 <ais523> sorear: yep, that's why I a) mentioned forwarding via a side channel and b) was worried about how to timing-sandbox network requests
05:38:10 <korvo> ais523: The tradeoff is something called a "spellserver". The user gives the spellserver a (cryptographic) cap and it executes (native) code on the user's behalf. The user isn't allowed to choose the code, but they are allowed to pass in other caps as arguments and delegate authority, so the spellserver can act on the user's behalf while doing optimized/privileged things.
05:38:31 <sorear> native client checks every instruction in the image against an allowlist and ensures that no instructions not in the image can be executed without going through the validator again. I don't see how "ISA-specific" makes this any worse than JIT
05:38:49 <zzo38> In some cases, you can add extra delays where needed (e.g. a proxy capability might do this; my idea is that delay is one of the proxies included in CAQL)
05:39:04 <ais523> korvo: I hadn't heard the name "spellserver" before, but understand the concept
05:39:18 <b_jonas> ais523: I think one of the main difficulties is that even if you go most of the way to run untrusted code in a deterministic way and not allow it much IO capabilities, in the end you have to put some kind of timeout on it to stop it if it takes too much time, and that will be observable. but if you want any sort of performance (like in a Browser) then untrusted code can deliberately do things where
05:39:24 <b_jonas> timing can easily vary by a factor of hundred or thousand in a way that's very hard to prevent. so either you do some deterministic cycle counting but then your timeout will be vague by a factor of a hundred or thousand, or code will be able to observe timing.
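
The "deterministic cycle counting" option b_jonas mentions can be sketched as a step budget (often called fuel). The sketch below is an assumption-laden illustration, not any real sandbox: the untrusted program is modelled as a Python generator, each `yield` counts as one step, and the names `run_with_fuel` and `OutOfFuel` are hypothetical. The cutoff depends only on the step count, so it is the same on every run; as noted above, the wall-clock time those steps take can still vary widely.

```python
class OutOfFuel(Exception):
    """Raised when the program exhausts its step budget."""


def run_with_fuel(program, fuel):
    """Run a generator-based program for at most `fuel` steps.

    `program` is a generator function; every yield is one counted step.
    Returns (return_value, steps_used) if the program finishes in time.
    The limit is a step count, not a wall-clock timeout, so whether the
    program is cut off does not depend on how fast the host is.
    """
    gen = program()
    steps = 0
    try:
        while True:
            if steps >= fuel:
                gen.close()  # stop the program deterministically
                raise OutOfFuel(steps)
            next(gen)
            steps += 1
    except StopIteration as done:
        return done.value, steps
```

A program that needs more steps than its budget is stopped at the same point every time, which is what makes the timeout unobservable as a timing signal from the inside.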
05:39:50 <ais523> b_jonas: yes, I agree that this is a main difficulty
05:40:02 <zzo38> A proxy capability can also lie about the amount of time that has elapsed (this does not prevent external timing measurement, but can prevent internal ones)
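
A proxy that lies about elapsed time could take many forms. A minimal Python sketch, assuming the lie is simply rounding every reading down to a coarse granularity: the class name, the `now` method, and the rounding policy are all hypothetical and are not taken from CAQL or any system mentioned here.

```python
import math


class CoarseClockProxy:
    """Hypothetical proxy capability wrapping a more precise clock.

    Every reading is rounded down to a multiple of `granularity`, so the
    holder cannot measure intervals shorter than that from this clock.
    It does nothing about timing measured externally.
    """

    def __init__(self, inner_clock, granularity):
        self._inner = inner_clock          # the real clock, kept private
        self._granularity = granularity    # coarsest unit the proxy reveals

    def now(self):
        reading = self._inner()
        return math.floor(reading / self._granularity) * self._granularity
```

Two readings taken within the same granule come back identical, so short internal intervals appear to take zero time.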
05:40:22 <ais523> I guess you could go along the lines of "you have to prove your program will execute within X seconds or it doesn't get run" but that would require some really worst-case timing estimates
05:40:32 <korvo> sorear: The trick is to not sandbox at all. Instead of starting with a powerful encoding and trying to limit its behavior (taming), we start with weak primitives and add specific preselected behaviors for which we case-by-case prove safety. This means that the user does not have perfect control over what the CPU executes, but we already know that that control is exploitable, so we shouldn't offer it.
05:41:12 <ais523> korvo: fwiw I consider that to be a type of sandbox, too (but agree that it's massively preferable to the taming approach)
05:41:17 <sorear> nacl doesn't start off with a powerful encoding, it starts off with no allowed instructions and adds them one at a time
05:41:59 <zzo38> korvo: That is what I thought too. Usually this is by the use of a VM code, although I think a CPU could also be designed to help with it (in cooperation with the operating system kernel), although existing CPUs and operating systems are not that way
05:42:53 <korvo> The ISA is an encoding of behaviors.
05:43:10 <strerror> AFAIK, preventing information leakage and covert channels is much harder than preventing tampering. (This is also why unforgeable caps are better than unguessable.) In practice you want to share as little as possible, with air gaps or data diodes (yes, physical ones)
05:44:12 <strerror> I think it's reasonable to want a microarchitecture that prevents the latter, but the former is a lost cause
05:44:15 <zzo38> I also think that unforgeable is better than unguessable. Within one computer, I think it could work.
05:44:49 <korvo> With a language and CPU that work together, unforgeability can extend as far as the ports of a single motherboard. That's a pretty impressive integration!
05:45:06 <ais523> I'm a bit internally conflicted because it's clear to me that at least on present CPUs, a sophisticated attacker who can run arbitrary sandboxed code almost certainly has an arbitrary read primitive available – but most of our security depends on keys and passwords which are not safe in that threat model
05:45:24 <sorear> they got bored with nacl, dropped it, and are now adding LFI which as far as I can tell is exactly the same
05:45:54 <korvo> Well, assuming we trust the memory controller. E trusts iteratees and iterators and collections to be correctly implemented; Monte doesn't, or at least Monte assumes that remote computers can have incorrectly-implemented collections that iterate wrongly.
05:46:14 <zzo38> My idea is the operating system kernel and CPU to work together to do that; programming languages (e.g. a C compiler) might not (and does not need to know about the specific implementation, although of course the instruction set and system call interface would need to be known)
05:46:19 <b_jonas> there are a few contexts where we can afford the potentially huge performance hit, but most contexts where we want to run untrusted code aren't like that. so it's probably worth to pursue both routes: the apparently harder one of defining entirely new architectures without the traditional L1 cache where there are fewer timing differences – useful anyway for proven real-time industrial control
05:46:25 <b_jonas> applications so we might as well use it for something else too –, and the less hard one where we figure out how to do cryptographic operations in a way that the keys can't be leaked by timing or other side-channel attacks even to code on the same CPU, even if admittedly we have a bad track record with this.
05:46:34 <ais523> korvo: you're reminding me of the problems with mmapping files in Rust
05:46:46 <korvo> ais523: Right. But, also, simply-typed lambda calculi are cap-safe by default and it's clear to me that we can offer just about any computational ability to users within a simply-typed context. We do this to ourselves.
05:47:07 <ais523> Rust programs frequently just do that in practice, even though it's unsound in theory (because the Rust implementation assumes that reading the same memory twice without writing it in between gives you the same value)
05:47:28 <sorear> i expect/hope we'll see a shift away from "isolation" and towards a model which distinguishes confidentiality domains from integrity domains
05:47:38 <strerror> Most of most people's security depends on them not being rich enough to be a target for serious attackers.
05:47:47 <ais523> you can work around the issue by mapping the memory as relaxed atomics rather than regular numbers, but few people actually do that
05:47:57 <sorear> integrity is cheap, confidentiality requires complete system partitioning and/or time slicing with complete state clears
05:48:27 <ais523> korvo: I assume you mean simply-typed lambda calculi with fixedpoint? not the non-TC version?
05:48:37 <korvo> Isolation's a really powerful primitive in distributed systems. Admittedly it's usually baked into spacetime and the configuration of computers in a room, but it's still useful.
05:48:40 <strerror> There are now devices like Yubikeys though, which sort of give you a segregated chip to store your keys
05:48:45 <zzo38> I have ideas about how to make it work with network transparency, although in that case many kinds of external attacks are possible. You still cannot send or use a capability that you neither have yourself nor received from the other side, but external interference can still result in undesired operations with these capabilities (but encryption can mitigate this).
05:49:32 <strerror> ais523: hm, does mapping it as atomic actually remove the unsoundness, or just admit that it's there?
05:49:50 <korvo> ais523: I mean truly simply typed with no fixpoints. The sort of thing Cammy can do, by zero coincidence. TC means that the user will learn *something* about your computational substrate.
05:49:54 <ais523> strerror: it actually removes it, if you relaxed-read an atomic twice the compiler doesn't assume it'll get the same value both times
05:50:35 <b_jonas> the difficulty with making side-channel operations hard even with untrusted code running on the same CPU is that you can only make that work if *both* CPU manufacturers and compiler writers work on it together, you can't do it with just one side or the other.
05:50:41 <ais523> korvo: now you have me wondering how many useful programs can be written like that (both in terms of "it is possible to write" and in terms of "a typical program can figure out how to write")
05:51:11 <korvo> This was Monte's biggest weakness IMO. In Monte, E, Joule, etc. as well as Python's Twisted or Ruby's EventMachine, there's simply no guarantee of productivity for a sent message. There's not even a guarantee of receipt or way to find out what happened in case of error. Great model for UDP but terrible for in-memory event queues.
05:51:15 <sorear> at the llvm level you also have "unordered atomics" which were invented to handle Java's constrained behavior for data races
05:51:29 <ais523> how does that differ from relaxed?
05:52:11 <sorear> relaxed atomics cannot be reordered if they might alias
05:52:41 <b_jonas> sorear: wait, is that true?
05:52:43 <sorear> presumably it doesn't work very well since c++ didn't copy it
05:53:11 <sorear> yes, two relaxed reads to the same address must be executed in program order
05:53:25 <b_jonas> ok, I still don't understand the C++ atomics model then
05:54:22 <sorear> 2016 riscv allowed load instructions to the same address to be executed in either order, so relaxed atomics needed a fence until the model was tightened (matching arm and ppc instead of alpha)
05:54:53 <ais523> b_jonas: relaxed doesn't require ordering between multiple threads, e.g. if thread A relaxed-reads address X then relaxed-writes address X, and thread B relaxed-reads address X then relaxed-writes a function of its value to address X, then the value thread A reads can be based on the value that thread A writes
05:54:58 <b_jonas> so this java thing would be something more relaxed than relaxed atomics, but more strict than ais523's operation that lets you read an integer with valid but undefined result in case of a data race, because it still wouldn't allow tearing so you'd keep memory-safe pointers?
05:55:45 <Sgeo> https://en.wikipedia.org/wiki/ALGOL_68#Books,_channels_and_files
05:55:54 <ais523> (although this is only allowed if the value it writes doesn't depend on the value it reads – programs aren't supposed to be able to create an actual time paradox, even using relaxed atomics)
05:56:43 <sorear> pointer tearing can't happen on any real architecture ("single-copy atomicity"). fat pointers e.g. Go are a separate issue
05:57:31 <sorear> time paradoxes are called "out of thin air reads" and are unfortunately allowed in most memory models
05:57:45 <ais523> sorear: in the C++ memory model they're disallowed by fiat
05:57:52 <ais523> the standard says not to do them, without defining what that means
05:58:15 <sorear> I do love standard requirements that don't mean anything
05:58:35 <b_jonas> sorear: of course, but I think the read operation that ais523 wants would allow pointer tearing
05:58:45 <ais523> fwiw I am now at this point almost convinced that the correct definition is "do not create a loop in the happens-before + semantically-depends relation" although I may have problems actually justifying it
05:59:18 <ais523> b_jonas: I'm fine to allow pointer tearing because part of the rules for the operation is that if there is a race condition you don't do anything with the resulting value
05:59:18 <b_jonas> I can see why these are hard problems at least
05:59:30 <ais523> having a torn pointer is safe as long as you never dereference it
05:59:47 <b_jonas> ais523: but are you allowed to use the torn value as an integer?
06:00:00 <b_jonas> or do you have to check before you're even allowed to do arithmetic or conditionals on it?
06:00:05 <ais523> err, not just happens-before + semantically-depends, it also includes read-written-value
06:00:35 <ais523> b_jonas: with my suggestion, no, but I can see an argument that it should be yes
06:01:04 <ais523> in fact there was a long discussion about that in the Rust forums or bug tracker (I forget which)
06:01:25 <sorear> (also fun: `final` affects the memory model! if you have a class with final fields and you're on Alpha which does not enforce in-order loads in the presence of an address dependency, if you receive an object pointer you need to fence before reading final fields so the fields can't appear to change)
06:01:33 <ais523> and my attempted proofs that my version was safe didn't work for the version where using the read value as an integer was possible
06:02:44 <ais523> (that doesn't necessarily mean that it is unsafe, just that it's harder to prove safe)
06:04:03 <ais523> this came up in the context of implementing LLVM's "freeze" operation in Rust (which works as follows: in LLVM, "undefined" is a separate value that all types can have; an LLVM-freeze changes undefined values to an arbitrary value and is a no-op on defined values)
06:04:24 <ais523> and there was a huge debate about whether this was safe to add to the language, in the sense of not causing miscompiles
06:07:11 <ais523> (my guess is that it probably is, but I can't prove it and it may be difficult to prove)
06:07:16 <zzo38> I think you would have to ensure that the implementation is correct, but it could be done, e.g. marking a register as "in use" without changing its value, in one case, possibly
06:19:20 -!- Sgeo has quit (Read error: Connection reset by peer).
06:22:40 -!- ^[ has quit (Ping timeout: 256 seconds).
06:34:40 -!- ^[ has joined.
06:54:47 -!- ais523 has quit (Quit: quit).
07:21:54 -!- tromp has joined.
07:32:56 -!- wob_jonas has joined.
07:34:23 <wob_jonas> ais523: so an operation that can read an integer from memory non-atomically such that the result is unpredictable but not a trap value when there's an inter-thread race would be useful for running untrusted multithreaded code, which is why I think this came up in Java
07:35:41 <wob_jonas> but I don't know if that's the same operation as reading a relaxed atomic.
07:47:55 <wob_jonas> does that LLVM-freeze only make sense for types like built-in integer or float types, or would it also apply to pointers? because I don't see how you could implement it for pointers.
07:48:15 -!- tromp has quit (Quit: My iMac has gone to sleep. ZZZzzz…).
07:49:35 <wob_jonas> hmm, in https://logs.esolangs.org/libera-esolangs/2025-10-21.html#lse you say that relaxed atomics is appropriate for mmapping areas that other processes could modify, so maybe what I'm asking for *is* just relaxed atomic reads and writes. but the problem is that in the C++ model, relaxed atomic reads are considered unsafe if another process does a
08:02:10 -!- tromp has joined.
08:12:35 <wob_jonas> oh yeah, question. can you safely use C11 call_once from a signal handler, in the sense that you share a global once_flag variable that you may use from either any thread of the process or a signal handler in any thread? and if I want this, should I declare it volatile?
08:21:15 <wob_jonas> I guess this might not be the right question, because that's rarely useful.
09:06:09 -!- ais523 has joined.
09:07:30 <ais523> although I don't know for certain, I don't see why you couldn't LLVM-freeze a pointer (although, if the value was previously undefined, you wouldn't be able to dereference the resulting pointer – it would still be safe to treat its address as an integer though)
09:09:17 <ais523> and yes, I think the C++/C11 atomics model disallows a race between an atomic read and a non-atomic write – but on most processors it's impossible to do a write weaker than relaxed unless the write splits a cache line (I think the discussion about doing relaxed-atomic reads of mmaps required you to read it a byte at a time, because of that)
09:09:45 <ais523> in other words, your atomic read at the C++ level is racing with an atomic write at the asm level, so it works if you interpret the other process as being written in asm/machine code
09:10:22 <ais523> I guess technically this would be unsound if an implementation decided to optimize across processes, but that seems like an unlikely optimisation choice
09:10:54 <wob_jonas> oh, so not only do I have to do relaxed atomic reads, I have to do bytewise relaxed atomic reads? that sounds kind of annoying at first, but the compiler can probably optimize it to larger reads when I read a whole aligned word's worth of reads.
09:12:55 <ais523> I'm not sure whether compilers are allowed to take advantage of the fact that reads are guaranteed never to be torn, if it can't tell what's doing the write
09:13:01 <ais523> * are able to take advantage
09:13:32 <ais523> I can't think of any optimisations that would allow, so just reading it atomically as aligned u32s or the like is probably going to be safe in practice
09:14:04 <ais523> the glibc documentation says that their call_once is currently safe to call in a signal handler but they aren't committing to that yet
09:14:27 <ais523> this implies that in C11 generally it isn't safe, otherwise the glibc devs would probably have committed to following the standard
09:15:17 <ais523> (and the obvious implementation of it could deadlock if called from a signal handler)
09:16:20 <ais523> …actually I'm not sure *how* it could be safe to call from a signal handler, what happens if the signal arrives halfway through the interrupted thread running the function?
09:17:59 <wob_jonas> I'm not sure C11 even tries to define how user-defined signal handlers in a multi-threaded program work. I have the feeling that the C standard doesn't even want to be concerned with signal handlers, they are just forced to because the signal function was in an early C standard as kind of a mistake and they don't want to remove it now.
09:21:53 <wob_jonas> as for calling call_once from a signal handler while the initialization function is already in progress in the same thread, I think from the perspective of call_once, that's not really worse than an ordinary recursive call of call_once from the initialization routine when signal handlers aren't involved
09:26:40 <ais523> b_jonas: looks like call_once is not safe to call from a signal handler, despite the glibc docs (I have a TIO URL but it's too long to paste and am not sure I trust any of the URL shorteners I'm aware of)
09:27:29 <ais523> https://tio.run/##fVA7a8MwEN79K46UFJnYpZnddCshcyBTQYiTbAtkKUiyl5C/HlVXOy3pUA16fPoed4d1h5jSk7ZoRqngLfZeCRle@vfiFwy6s8L8waLUjqAiRBE1grOoeGtEB7Q1P/jkTD6NguzCRXSDRh4B3Wij8rCD10ydnJbQOsfoUsKlgLw2m4XUfD@90EGx42F//Nifyqa4LrKguon3wkqjPNM2Uo4dh7sLCmM41caeqa6KYmY1cQeh7UPo3Oo9pnpwL5v/HOnr7LNpy1Zr@WlXFVA55dIEZaZ0Q5KEVNd5fjvcblNtzvPMvwA
09:27:48 <ais523> doing it as a separate line was enough
09:28:29 <ais523> this looks very much like it deadlocked (and it's hard to imagine any other reasonable result)
09:29:11 <ais523> in particular it timed out without any noticeable CPU usage
09:29:38 <ais523> (what little CPU usage is shown is probably almost entirely from the compiler)
09:32:56 <wob_jonas> ais523: but wouldn't call_once deliberately deadlock if you tried this without the signal handler, as in if foo tried to call_once(&flag, ...) ?
09:33:28 <ais523> wob_jonas: right, but it's an async signal, those can happen at any point (that's kind-of the definition of an async signal)
09:34:07 <ais523> so to be async-signal-safe, the code has to work regardless of when the signal arrives, including the most inconvenient possible time (which in this case is the middle of the function passed as an argument to call_once)
09:35:06 <ais523> a possible "fix" would be to mask all signals temporarily while running the call_once function (although that would have its own issues)
09:35:28 <wob_jonas> right, so you just shouldn't call call_once from a signal handler, there's no sane semantics, which is why my original question was a bad one.
09:36:19 <ais523> well the glibc documenters seem to have got this wrong too, so it's at least non-obvious to some people
11:29:48 -!- amby has joined.
11:37:43 -!- Lord_of_Life has quit (Ping timeout: 256 seconds).
11:40:17 -!- Lord_of_Life has joined.
11:40:44 -!- chiselfuse has quit (Remote host closed the connection).
11:40:57 -!- chiselfuse has joined.
11:58:01 -!- wob_jonas has quit (Quit: Client closed).
12:27:32 -!- sytra has joined.
12:28:32 -!- ais523 has quit (Quit: quit).
12:30:08 -!- sytra has quit (Client Quit).
12:30:21 -!- sytra has joined.
13:24:08 -!- sytra has quit (Quit: sytra).
13:54:53 <esolangs> [[Mango]] N https://esolangs.org/w/index.php?oldid=166337 * RaiseAfloppaFan3925 * (+1464) Created page with "{{Template:Stub}} Mango is an unimplemented programming language by [[User:RaiseAfloppaFan3925]] inspired by modern 2010-20s slang. {{infobox proglang | name = Mango | paradigms=imperative, procedural | author = [[User:RaiseAfloppaFan3925]] | year = 2025 | c
13:56:40 <esolangs> [[User:RaiseAfloppaFan3925]] https://esolangs.org/w/index.php?diff=166338&oldid=159606 * RaiseAfloppaFan3925 * (+179)
14:20:59 -!- tromp has quit (Quit: My iMac has gone to sleep. ZZZzzz…).
14:21:46 -!- tromp has joined.
14:23:52 -!- simcop2387 has quit (Ping timeout: 260 seconds).
14:23:52 -!- perlbot has quit (Ping timeout: 260 seconds).
14:33:51 <esolangs> [[Sorry]] https://esolangs.org/w/index.php?diff=166339&oldid=139445 * Yayimhere2(school) * (+1)
14:34:48 <esolangs> [[Sorry]] https://esolangs.org/w/index.php?diff=166340&oldid=166339 * Yayimhere2(school) * (+0)
14:54:47 <esolangs> [[Special:Log/newusers]] create * Swatinine * New user account
14:59:09 -!- simcop2387 has joined.
15:02:06 <esolangs> [[Esolang:Introduce yourself]] https://esolangs.org/w/index.php?diff=166341&oldid=166302 * Swatinine * (+150)
15:02:33 <esolangs> [[User:Aadenboy]] https://esolangs.org/w/index.php?diff=166342&oldid=166330 * Aadenboy * (-149) /* ESOLANGS */ class="rectwrap"
15:02:45 <esolangs> [[A=ab=bc=cd=d!]] https://esolangs.org/w/index.php?diff=166343&oldid=165004 * Aadenboy * (+17) /* Truth Machine */ class="rectwrap"
15:02:53 <esolangs> [[User:Swatinine]] N https://esolangs.org/w/index.php?oldid=166344 * Swatinine * (+26) Created page with "Hello! I'm Swatinine! "
15:03:58 <esolangs> [[User:Swatinine]] https://esolangs.org/w/index.php?diff=166345&oldid=166344 * Swatinine * (+38)
15:04:02 <esolangs> [[Mango]] https://esolangs.org/w/index.php?diff=166346&oldid=166337 * RaiseAfloppaFan3925 * (+992) Added Deadfish interpreter example + new keywords
15:07:43 <esolangs> [[Mango]] M https://esolangs.org/w/index.php?diff=166347&oldid=166346 * Aadenboy * (-9) remove unnecessary namespace inclusion
15:08:01 <esolangs> [[Mango]] https://esolangs.org/w/index.php?diff=166348&oldid=166347 * Aadenboy * (-1) move infobox to top
15:11:11 <esolangs> [[Mango]] M https://esolangs.org/w/index.php?diff=166349&oldid=166348 * RaiseAfloppaFan3925 * (+1) oops typo in the category, changed to high level
15:38:26 <esolangs> [[Talk:TDQ]] N https://esolangs.org/w/index.php?oldid=166350 * Yayimhere2(school) * (+143) Created page with "Hey, thanks for that!!! --~~~~"
15:38:45 -!- tromp has quit (Quit: My iMac has gone to sleep. ZZZzzz…).
15:46:09 <esolangs> [[Thing]] https://esolangs.org/w/index.php?diff=166351&oldid=135948 * Yayimhere2(school) * (+81)
16:04:41 -!- tromp has joined.
16:42:46 -!- tromp has quit (Quit: My iMac has gone to sleep. ZZZzzz…).
17:04:03 <esolangs> [[Bijection]] https://esolangs.org/w/index.php?diff=166352&oldid=138371 * Yayimhere2(school) * (+5)
17:05:38 -!- tromp has joined.
17:06:56 -!- Yayimhere has joined.
17:14:24 -!- Yayimhere has quit (Quit: Client closed).
17:14:34 -!- Yayimhere has joined.
17:17:52 <korvo> Yayimhere: Morning.
17:19:11 <Yayimhere> Korvo: Morning! are you good? and are you working on anything lol?
17:25:39 <korvo> Oh, to not be working on anything. I'm theoretically helping sorear clean up the NQL toolchain and merge in some community contributions, although frankly they do not need my help. I'm looking at Smalltix, a brand-new way of doing Smalltalk in Unix which I might operationalize. And I'm thinking of cleaning up the recent stub for Joy, but I'm not sure what should be done besides infobox.
17:28:02 <Yayimhere> cool! I actually just looked at the page for smalltix (though I don't understand it, but that's a me problem). me personally, I'm trying to make an esolang focused on being undecidable, cuz I thought it was interesting for making a simple language. but it's not going very well lol. but I'll figure it out. but yea, nice!
17:28:55 <Yayimhere> I've been very inspired to do work, cuz I read some of my older works and found them quite good, in my opinion
17:30:18 <korvo> Undecidability is fairly easy. It's undecidable whether untyped lambda terms have normal forms, for example. What might your language compute?
17:31:20 <Yayimhere> I know it's simple, but I want to do it creatively. but I think I'll work on (if possible) making the language compile, self-compiler style. but tbh idk what the hell I'm doing lol.
17:40:24 <korvo> No worries. There's no rush.
17:40:26 <Yayimhere> ok I think I've found a good concept!
17:41:23 <esolangs> [[Joy]] https://esolangs.org/w/index.php?diff=166353&oldid=166110 * Corbin * (+156) Fill out a bit more of this stub. I was going to be upset that this isn't just a contribution to catlangwiki, but on the other hand this is an opportunity to write a better article.
17:43:49 <esolangs> [[Quote]] N https://esolangs.org/w/index.php?oldid=166354 * Corbin * (+341) Stub a common concept. Not sure of the best name for this article.
17:51:42 <esolangs> [[Manfred von Thun]] N https://esolangs.org/w/index.php?oldid=166355 * Corbin * (+328) Stub for Von Thun.
17:54:05 -!- Yayimhere has quit (Ping timeout: 250 seconds).
17:58:09 <esolangs> [[Talk:Concatenative calculus]] N https://esolangs.org/w/index.php?oldid=166356 * Corbin * (+840) Still thinking on this one.
18:04:07 <esolangs> [[Concatenative language]] https://esolangs.org/w/index.php?diff=166357&oldid=156941 * Corbin * (+137) Update bluelinks. catlangwiki doesn't do underscores like MW, so I've hacked up something equivalent.
18:06:32 <esolangs> [[Concatenative language]] M https://esolangs.org/w/index.php?diff=166358&oldid=166357 * Corbin * (+14) Tighten up a bit of phrasing. Link [[monoid]] as main article for monoidal viewpoint.
18:07:51 <esolangs> [[Mlatu]] M https://esolangs.org/w/index.php?diff=166359&oldid=156894 * Corbin * (-12) Bluelink.
18:08:17 <esolangs> [[Special:Log/newusers]] create * Goodbyevoidhelloworld1 * New user account
18:10:35 <esolangs> [[Cammy]] M https://esolangs.org/w/index.php?diff=166360&oldid=165389 * Corbin * (+28) Bluelinks.
18:12:02 <korvo> One must imagine Sisyphus happy.
18:12:16 <korvo> Really, one must imagine that Sisyphus deserved it~
18:44:05 <esolangs> [[Quote]] https://esolangs.org/w/index.php?diff=166361&oldid=166354 * Aadenboy * (+64) distinguish? might be a little silly
18:46:49 -!- vista_user has joined.
18:47:07 -!- vista_user has changed hostmask to ~vista_use@user/DOS-User:11249.
18:53:17 <esolangs> [[Quote]] M https://esolangs.org/w/index.php?diff=166362&oldid=166361 * Somefan * (+33) str
18:55:24 <korvo> vista_user: Oh, I thought you were another webchat user. Welcome nonetheless.
18:58:34 -!- vista_user2 has joined.
18:58:39 -!- vista_user has quit (Ping timeout: 250 seconds).
18:58:39 <vista_user2> 20:55:24 <korvo> vista_user: Oh, I thought you were another webchat user. Welcome nonetheless.
18:59:46 -!- vista_user2 has changed nick to vista_user.
18:59:57 -!- vista_user has changed hostmask to ~vista_use@user/DOS-User:11249.
19:01:15 <esolangs> [[Quote]] https://esolangs.org/w/index.php?diff=166363&oldid=166362 * Aadenboy * (+23) format
19:01:34 <esolangs> [[Adofaiscript]] N https://esolangs.org/w/index.php?oldid=166364 * * (+1889) Started it
19:03:39 <esolangs> [[Adofaiscript]] https://esolangs.org/w/index.php?diff=166365&oldid=166364 * Aadenboy * (+50) {{WIP}} + categories
19:04:16 <esolangs> [[Quote]] M https://esolangs.org/w/index.php?diff=166366&oldid=166363 * Corbin * (+12) Funnier.
19:10:46 <esolangs> [[Quote]] https://esolangs.org/w/index.php?diff=166367&oldid=166366 * Aadenboy * (-26) merge
19:27:50 -!- sytra has joined.
19:30:43 -!- vista_user has quit (Ping timeout: 250 seconds).
19:48:08 -!- FreeFull has joined.
21:14:03 -!- sytra has quit (Quit: sytra).
21:19:48 -!- ais523 has joined.
21:19:54 <ais523> https://tio.run/##rVLRSuNAFH3PVxwrthNIW1OXhV1rHhasFkQFEV8WZJK5aYYdZ8Jkxj6sfnudJK1QX0Qw5GHmnjvnnHu4OW@qzabgDhncpMB8Pjq/WYyiQ6kL5QVh3jghzaTK9kpW6tV@jazVH9q8luHxh6dypblqa5HUDk9cavZspIjxP0L4epzdLS9ul7fnCcLhcXlxHZ/2KLnn3Jcs0Aa9BNf3V1cJHpc3iz@LBD@Of/2MTzGdoqmMVwI5gUObsamRe4ei4npFDYx3tXcdYecaZzju@eswlyvZ4Ej81YME5dpKR2xwSUqZBGtjlTgI9TT8swS9i7iXDDSa1kpq2mN6CAwhKzizbQfLqTSWUMuakFvi/@LfOGo6wRBsaDGWdb7i3dSKqGaz7e37LDdOKvUF47x0ZL/g@xOnpfJNxd4d7fUsWqyV7@
21:19:56 <ais523> FPhCw5b3Wr8xqF7Y1WRdEt89jARZOpw0k2nGGWDVOk2fAEL32kSHGIZs3r3YRci/YY9iOBq0h3s467Wdt7m9MWBzNK7NJEH0wa1rMwWmw2bw
21:20:06 <ais523> this is me trying to understand stdio buffering
21:20:27 <ais523> but the output is really confusing and not only doesn't seem to match the docs, I can't form a consistent model of it
21:24:08 <ais523> (sorry about the link being split over two lines)
21:29:51 <int-e> stderr is unbuffered by default?
21:30:10 <int-e> and the `setvbuf` changes that
21:33:40 <ais523> int-e: it's documented as line-buffered if interactive, fully buffered if not interactive (and in this case stderr is a pipe, so not interactive)
21:34:08 <ais523> but, if it were fully buffered, then the fwrite call shouldn't be able to see that the pipe has broken because it shouldn't produce output at all
21:34:16 <ais523> the second fwrite call, that is
21:34:25 <b_jonas> python 3.14 has significant changes in its garbage collector implementation, just in case anyone's interested in that sort of thing
21:34:26 <ais523> and returning 1 is just bizarre
21:35:00 <int-e> "The standard error stream stderr is always unbuffered by default." -- setvbuf(3)
21:35:41 <ais523> int-e: ah, you're reading the man page and I'm reading the info page
21:36:00 <ais523> the info page doesn't have that special case (and in fact says there are no special cases other than the interactive case)
21:36:12 <ais523> that's one mystery solved, at least – but the output for buffered stderr still doesn't make sense
21:36:33 <ais523> (and the buffering clearly does something because the second fwrite returns 0 rather than 1 without it)
21:37:13 -!- vista_user has joined.
21:38:31 <int-e> ais523: It acts normal for me in a terminal. (12, 12, -1)
21:38:49 <ais523> ooh, let me try in mine
21:39:18 <ais523> yes, normal in mine too
21:40:29 <ais523> (the original thing that prompted the experiment is a Rust bug report that was traced to stderr flushing doing something unexpected, although that was on mingw – there was speculation that it might act differently in a container for some reason, and now there's evidence of it acting differently in a sandbox/container on Linux too which is interesting)
21:41:09 <ais523> although it might be a case of different glibc version, or the like
21:41:28 -!- vista_user has quit (Remote host closed the connection).
21:45:23 <esolangs> [[Izeva]] N https://esolangs.org/w/index.php?oldid=166368 * Ivava * (+922) Created page with "{{WIP}} :'' Does not apply to '''IZEVA - International Council on Clean Transportation''' and other '''Izeva''' is easy(Maybe. It hasn't cycles) character-by-character esolang, has IO based on single accumulator. Hasn't good or useful commands. The only pleasant thing is a
21:46:47 <ais523> I think my computer has glibc 2.41 and TIO has glibc 2.28, although it's hard to be confident
21:52:22 <int-e> https://tio.run/##S0oszvj/P7lAQT8nM8nMBEQm6xrpGVnoFecr6HHpoQj8/w8A
21:53:31 <int-e> pretty confident about this one ;-)
21:56:06 <ais523> huh, I didn't realise you could just execute glibc as an executable
21:56:53 -!- CodeMelon has joined.
21:57:48 <ais523> although it's usual to wait on IRC for a while to find that out, because people aren't necessarily checking it constantly
21:58:12 <ais523> for this channel, a good option is to read the logs to see if people have been talking (but not all channels have public logs)
21:58:27 <CodeMelon> I've just started going down the esolang rabbit hole
21:59:18 <CodeMelon> And I've written my first esolang script, can you rate it for me?
22:00:23 <ais523> do you mean a program written in an esolang or a program that implements an esolang?
22:00:29 <CodeMelon> It's also a crude, very short, and not so fleshed-out brainfuck explanation.
22:01:54 <ais523> reading BF is normally quite difficult
22:02:01 <CodeMelon> Its probably been done before anyway but this is my own version of this kind of script
22:02:38 <ais523> ah, we have a huge number of those already
22:02:48 <CodeMelon> Like a version of bf thats still bf but a little different
22:02:56 <ais523> so one more won't hurt much, but most of the existing ones aren't particularly creative
22:03:17 <korvo> CodeMelon: Out of curiosity, are you here from truttle1 the Youtuber?
22:03:46 <ais523> I agree that I don't think I understand what you've done
22:04:22 <CodeMelon> Can i just show you? Like is it allowed to write a short code snippet in chat?
22:04:36 <korvo> CodeMelon: Use a pastebin please! bpa.st is an option.
22:04:48 <korvo> Or maybe webchat does it automatically?
22:04:53 <ais523> don't post things directly in chat if they're more than about two lines long, use a pastebin instead
22:06:04 <ais523> ah, it's polyglot code
22:06:10 <ais523> between English and an esolang
22:06:20 <korvo> Guessing that the ASCII case bit is a carrier, like in the PNG format.
22:06:32 <ais523> korvo: appears to be https://esolangs.org/wiki/OOo_CODE
22:07:08 <korvo> ais523: Nice find.
22:07:19 <CodeMelon> Its said in the text and it does what it says it will do while explaining the basics of bf
22:07:40 <CodeMelon> So is this as my first esolang script good?
22:07:57 <ais523> in this situation you can use arbitrary ASCII text as a carrier, so it's basically just an encoding of a BF constant string printer
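The case-bit carrier idea discussed above can be sketched in Python. This is only an illustration, not the oOo CODE specification: the helper names `embed_bits` and `extract_bits` are hypothetical, and the convention that lowercase means 0 and uppercase means 1 is an assumption made for the example. Non-letters pass through untouched, which is why arbitrary ASCII text can act as the carrier.

```python
def embed_bits(carrier: str, bits: str) -> str:
    """Hide a string of '0'/'1' characters in the letter case of carrier.

    Assumed convention (illustrative only): lowercase = 0, uppercase = 1.
    Letters after the payload ends keep the carrier's original case.
    """
    out = []
    used = 0  # number of payload bits consumed so far
    for ch in carrier:
        if used < len(bits) and ch.isascii() and ch.isalpha():
            out.append(ch.upper() if bits[used] == "1" else ch.lower())
            used += 1
        else:
            out.append(ch)  # punctuation, spaces, and leftover letters
    if used < len(bits):
        raise ValueError("carrier has too few letters for the payload")
    return "".join(out)


def extract_bits(text: str, count: int) -> str:
    """Read back the first `count` case bits, skipping non-letters."""
    letters = [ch for ch in text if ch.isascii() and ch.isalpha()]
    return "".join("1" if ch.isupper() else "0" for ch in letters[:count])


if __name__ == "__main__":
    hidden = embed_bits("hello, world! this is a carrier.", "101100111000")
    print(hidden)
    print(extract_bits(hidden, 12))
```

A real encoder would then group the recovered bits into fixed-width command codes; that table is deliberately left out here because it depends on the language being encoded.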
22:08:17 <ais523> so it isn't technically interesting, but it is artistically interesting – whether that's "good" or not is a matter of perspective
22:08:18 <korvo> Fun! I haven't checked that it's correct, but it's a decent concept. Did you generate this from another script or did you write it by hand? Impressive either way.
22:09:38 <CodeMelon> korvo I made a website that turns bf into oOo code and you can combine bf and self written text into oOo :)
22:10:08 <CodeMelon> Just a fun project i made this afternoon
22:10:17 <ais523> now I'm thinking about oOo code quines – it's easy enough if you just write the entire program with o and O, but making them use readable text as the carrier would be interesting
22:10:47 <ais523> it would by nature necessarily have to be substantially compressible, which means that the challenge would be to find a text document that was highly compressible but didn't *look* highly compressible, whilst still remaining meaningful
22:10:51 <korvo> Yep, that makes sense. Good times. Totally useless most of the time, unfortunately. Stego's the sort of thing that only makes sense during Little Brother scenarios, and then you need for your stego to be completely invisible rather than fairly obvious.
22:11:18 <ais523> korvo: it doesn't have to be steganography, it could be polyglotting instead
22:11:32 <ais523> i.e. the information channel's existence is obvious but it doesn't interfere with reading the source code a different way
22:12:00 <CodeMelon> I still have trouble writing quines theyre so scary :(
22:13:05 <ais523> something like the Code Golf Stack Exchange polyglot, being valid in over 300 different languages/implementations, is *entirely* obvious signal but much of it may be hard to decode because it's masked by the other obvious signal
22:14:58 <korvo> CodeMelon: Once you've memorized the recipe, it'll be easier. Eventually it's a matter of figuring out how to print various characters. What have you tried so far?
22:15:24 <ais523> (now at 451 languages/implementations, I just checked)
22:15:47 <ais523> to be precise, two different versions of the same language only count if it produces different output without an explicit version check
22:15:56 -!- CodeMelon has quit (Quit: Client closed).
22:16:08 -!- CodeMelon has joined.
22:16:57 <CodeMelon> Umm... ive not tried to make a quine yet, ive just learned it today what it is since ive stumbled onto esolangs
22:18:13 <ais523> almost all quines follow one of two basic patterns; Underload has built-ins for all the relevant parts of the quining pattern so the quine is very short
22:18:52 <ais523> oh, the first quine is backwards
22:19:05 <CodeMelon> Woah ok thats very cool, gonna look into underload today after ive gone to sleep(its 12pm)
22:20:09 <ais523> but writing a quine in any language is basically doing one of those two things – the hard part is normally regenerating the source code representation of a string from the string itself (Underload "a")
22:20:38 <korvo> CodeMelon: No worries. Have a good night. Glad to show you something new.
22:20:52 <ais523> (there are a few other ways to do quines, but most of them can be considered to be cheating in one way or another, or are just overly complicated versions of one of those two basic patterns)
22:22:54 <CodeMelon> Oh and heres the bf -> oOo / bf + text -> oOo site i made, Im on mobile btw (i program on mobile judge me) so the site might not look right on pc + im not a graphics designer, https://ozelotgamer.github.io/oOoCoder.html
22:23:15 <ais523> > let a = " in \"let a = \" ++ show a ++ a" in "let a = " ++ show a ++ a
22:23:16 <lambdabot> "let a = \" in \\\"let a = \\\" ++ show a ++ a\" in \"let a = \" ++ show a +...
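The same pattern as the Haskell one-liner can be sketched in Python, as a rough illustration rather than a canonical recipe. Here `%r` plays the role of `show`: it regenerates the source representation of a string from the string itself, which is the step ais523 calls the hard part. The names `TEMPLATE`, `build_quine`, and `run` are made up for this example.

```python
import contextlib
import io

# The data half: the whole program as a template, with a hole (%r) where
# the template's own source representation has to go.
TEMPLATE = 's = %r\nprint(s %% s)'


def build_quine() -> str:
    """Fill the hole in the template with the repr of the template itself."""
    return TEMPLATE % TEMPLATE


def run(source: str) -> str:
    """Execute a program text and return whatever it printed."""
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(source, {})
    return captured.getvalue()


if __name__ == "__main__":
    program = build_quine()
    print(program)
    # The generated two-line program prints exactly its own text.
    print(run(program) == program + "\n")
```

Saving the output of `build_quine()` to a file gives a standalone two-line quine; the wrapper functions are only there so the claim can be checked mechanically.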
22:23:29 <korvo> ais523: On my plate to write up: https://dl.acm.org/doi/pdf/10.1145/3759429.3762631 "Gauguin, Descartes, Bayes: A Diurnal Golem’s Brain"
22:24:06 <korvo> They propose a *gauguine*: a program that probabilistically infers its own source code given a description of its own behavior.
22:24:40 <ais523> korvo: reminds me a bit of 7, except that its 6 command is deterministic
22:24:54 <ais523> (and very useful for quines)
22:25:22 <ais523> 7 is a bit of an impoverished version of the idea, though
22:25:48 <ais523> because it's just "generate source code for a program that produces this output" which in a sense isn't interesting
22:26:25 <ais523> a gauguine doesn't have to be diagonalised, right? it could just be "probabilistically infer the source code of a program given a description of its behaviour"
22:26:37 <ais523> …and I just realised that current coding LLMs actually do that%, albeit badly
22:26:39 -!- CodeMelon has quit (Quit: Client closed).
22:26:46 <ais523> * …and I just realised that current coding LLMs actually do that, albeit badly
22:27:06 <korvo> Sure. On the other end of the power spectrum, I'm looking at the self-normalization barrier, which I think really misses that a practical Unix terminal is simply typed in bytes.
22:27:36 <ais523> you could prompt one with "write a program that takes a text description of how a program behaves, and outputs the source code of that program" and with a perfect coding LLM that would make it into a quine
22:27:50 <ais523> in practice I doubt it'd manage a very good attempt
22:28:03 <korvo> Yep! Indeed, the paper's construction only specifies its behavior at a high level, and everything else is inferred. It looks like most of the program is about encoding the syntax of the Church language, which is the probabilistic PL used to run the program.
22:28:08 <korvo> To...sample from the program?
22:28:09 <ais523> although maybe it's seen LLM source code to plagiarise, I doubt it'd be able to recreate the weights / training data
22:29:09 <korvo> Ah! Yes, that concept's been explored. The original papers are on "Gödel machines", so named because there are obvious ways that they must provably be unable to improve themselves.
22:29:45 <korvo> The most recent iteration I saw was "Darwin Gödel machines", which added genetic algorithms and language models. TBF I think that genetic algorithms would be a great fit, but maybe not so much on the language models.
22:31:56 <ais523> it has crossed my mind that given a sufficiently good estimator of "how close" a program is to implementing a given behaviour, you could use a genetic algorithm to produce a program that implements it
22:32:09 <ais523> …and this may be what coding agents are actually doing, in the case where they work (rather than using knowledge or reasoning)
22:32:20 <korvo> ais523: Also, there's probably a way to cheat with LLMs. The fundamental idea is something like https://www.pcg-random.org/party-tricks.html
22:33:13 <korvo> And then there's a variety of ways to gradient-descent in the wrong direction for reasonably cheap. Something like "reverse prompt engineering" https://arxiv.org/abs/2411.06729v3
22:34:27 <korvo> Coding agents are definitely not doing genetic algorithms. Rather, they're doing chain-of-thought and lots of scratch tokens. We know from bertology that the models *can* emit high-quality code; we just didn't know what sorts of prompts and RL would elicit agentive code-writing behavior.
22:35:32 <korvo> (Cammy's reference implementation has a coding oracle "kamis" which uses the SOTA genetic algorithm for functional programming. See the refs on the wiki page.)
22:35:49 -!- Sgeo has joined.
22:36:18 <korvo> (The supervising author on those genetic-algo papers was O'Neill, the author of PCG. Curious coincidence?)
22:37:09 <ais523> korvo: well, a coding agent is a loop – take the existing state of the repository, prompt the agent with it, apply the action that it suggests
22:37:26 <ais523> one loop iteration clearly isn't an evolutionary algorithm, but the loop as a whole may be
22:37:39 <ais523> (I guess it can't be "genetic" unless you mix in old / parallel states)
22:39:47 <korvo> ais523: It has to have some sort of pressure which causes selection. In the O'Neill paradigm, fitness minimizes towards zero, but folks often run coding agents in an open-ended mode with ill-defined stopping points.
22:40:37 <ais523> korvo: right, so the condition for this to work is basically "is the LLM able to determine whether or not the new version of the repository is a better fit for the request than the old version?"
22:40:51 <ais523> and my guess is that sometimes that condition is satisfied and you get useful output, usually it isn't and you get useless output
22:40:51 <korvo> The full Brigg-O'Neill approach is to use structure (like, homomorphic structure) to rip apart candidates. Each candidate's shreds are put through a type-checker and added to the available gene pool.
22:41:23 <ais523> but notably, I don't think most coding agents revert when a change has made things worse
22:41:48 <ais523> (they may attempt to create a counteracting change, but there's no guarantee that it's a correct revert)
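The loop ais523 describes (propose a change, judge whether the result is a better fit, revert if it is worse) can be sketched as a toy in Python. This is an assumption-laden illustration, not how any actual coding agent works: `fitness` stands in for the "how close" estimator, `propose` for one loop iteration, and the goal is just a fixed string. With a single state and no mixing of old or parallel candidates it is plain hill climbing rather than a genetic algorithm, matching the caveat above.

```python
import random

TARGET = "hello world"                    # toy stand-in for "the request"
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def fitness(candidate: str) -> int:
    """Distance from the goal: count of mismatched characters (0 is perfect)."""
    return sum(a != b for a, b in zip(candidate, TARGET))


def propose(candidate: str, rng: random.Random) -> str:
    """Stand-in for one loop iteration: rewrite one random position."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]


def evolve(steps: int, seed: int = 0) -> list:
    """Run the loop and return the fitness after every step."""
    rng = random.Random(seed)
    state = "".join(rng.choice(ALPHABET) for _ in TARGET)
    history = [fitness(state)]
    for _ in range(steps):
        candidate = propose(state, rng)
        if fitness(candidate) <= fitness(state):
            state = candidate             # keep changes that are no worse
        # otherwise the change is dropped, i.e. an exact revert
        history.append(fitness(state))
    return history


if __name__ == "__main__":
    trace = evolve(5000)
    print(trace[0], trace[-1])
```

Because a worse candidate is never kept, the recorded fitness can only stay level or fall; removing the comparison turns the loop into a random walk, which is the failure mode the conversation is worried about.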
22:42:18 <ais523> I think you're right, agentic LLMs aren't actually doing this properly (but they would probably work better if they did)
22:42:49 <korvo> The key phrase to search is "meta in-context learning". The model has to optimize three goals at three different times: predicting the next token during pretraining, decreasing regret during RL, and writing good-enough code during inference.
22:43:31 <korvo> ais523: Haaaaave you read "Simulators" yet? The simulators viewpoint is the best way to understand how correct code might arise from a pile of memes.
22:44:05 <korvo> Warning: bertology, LessWrong-style rationalism, GPT-generated text; here's the original post: https://generative.ink/posts/simulators/
22:44:10 <ais523> I'm not surprised that it's possible, but I'm sceptical about how much of it is due to the LLM itself and how much is due to the scaffolding
22:45:09 <korvo> My work-safe summary: https://genai.stackexchange.com/q/260 Language models aren't agents, genies, oracles, or tools; they are general-purpose *simulators* which *simulate* conversations that humans might have with hypothetical agents, genies, oracles, or tools.
22:45:29 <korvo> They don't reason like humans. They reason like screenwriters imagining what humans might say.
22:46:19 <Riviera> Is that not "common knowledge?"
22:46:23 <korvo> RL turns any model into an agent. Take a weather simulation and say "make it look good for humans", and you'll eventually get attractive ladies who talk about how lovely the weekend will be. But the underlying simulation is only trying to get the weather right.
22:46:46 <Riviera> They were trained with textual input, and that's what they generate.
22:46:59 <Riviera> "Stuff that looks like texts."
22:47:00 <korvo> Riviera: Sadly, most practitioners seem to either believe that it's just matrix multiplication and don't know what a meme is, or think that they're literally summoning demons into the GPUs.
22:47:50 <korvo> The best take is from Emily Bender, who agrees with me that there's a gap between syntax and semantics. She has a great quip: "Play syntactic games, win syntactic prizes." They're meme machines.
22:48:12 <Riviera> I'm less concerned with whether I am missing something.
22:48:35 <ais523> korvo: I think you're arguing at a different point than the one I'm trying to think about
22:49:11 <ais523> i.e. I don't disagree with you but I'm working on a different part of the problem
22:49:48 <ais523> (and I'm using "agent" purely in the sense of "a program that runs an LLM in a loop and changes the input based on the output")
22:50:18 <korvo> ais523: Well, that thing that Naur talked about, theory-building, it's not something that the model can do. It's just not there. So whatever code is generated is *memetic*; it's emergent from cultural practices and shaped by the languages that we use to communicate, but not necessarily *grounded*. AI researchers complain of "symbol grounding".
22:51:29 <ais523> korvo: I am not disagreeing with this – I am instead trying to work out, in effect, how stupid/simple one loop iteration of an agentic loop can be whilst still producing useful output (and have a suspicion that you don't need anything close to as powerful as today's LLMs)
22:51:59 <korvo> Ah! We usually say that an agent has a *goal*. Without a goal and RL, that sort of loop will decay to a stationary distribution because the model is Markov.
22:52:10 <ais523> a good example is the "apply the suggestions that rustc gives you until you reach a fixed point/oscillator or compiling code" technique that beginners to Rust often try, and which apparently tends to work quite well
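The three ways such a loop can end (success, fixed point, oscillator) can be sketched generically in Python. Nothing here talks to rustc; `step` and `accept` are hypothetical callbacks standing in for "apply the compiler's suggestion" and "does it compile", and the state only needs to be hashable so repeats can be detected.

```python
def iterate_until_stable(step, state, accept, limit=1000):
    """Apply `step` repeatedly and report why the loop stopped.

    Returns a (reason, state) pair where reason is one of
    "ok", "fixed point", "oscillator", or "gave up".
    """
    seen = {state}                        # every state visited so far
    for _ in range(limit):
        if accept(state):
            return ("ok", state)          # e.g. the code now compiles
        nxt = step(state)
        if nxt == state:
            return ("fixed point", state)  # the suggestion changed nothing
        if nxt in seen:
            return ("oscillator", nxt)    # back at an earlier state
        seen.add(nxt)
        state = nxt
    return ("gave up", state)


if __name__ == "__main__":
    # Toy integer states instead of source files.
    print(iterate_until_stable(lambda n: n - 1, 3, lambda n: n == 0))
    print(iterate_until_stable(lambda n: min(n + 1, 5), 0, lambda n: False))
    print(iterate_until_stable(lambda n: 1 - n, 0, lambda n: False))
```

For real source files the state could be a hash of the file contents, which keeps the `seen` set small.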
22:53:34 <ais523> one of my previous jobs was programming in OCaml, I found large refactors really easy to do, because you did the first step and then just chased compiler error messages until the refactor was finished
22:54:06 <ais523> I really enjoyed that (and Rust being OCaml-inspired, and intended to support the same basic workflow, is one of the things that initially got me to try it)
22:54:34 <ais523> this is the sort of thing that seems really automatable, although I never did automate it
22:55:51 <korvo> Idris has it automated. Write a type signature without a definition, add question marks for typed holes, and let the IDE search for candidates to recursively fill the holes. It only works in the simplest cases though.
22:56:37 <korvo> I added "kamis" to Cammy as an improvement on the older solver, "djinn", which does the simply-typed equivalent. It does great on basic plumbing but can't optimize for fitness WRT a goal.
22:56:44 <ais523> right – proof languages can get away with that, when used purely for proofs, because you can't end up with a wrong value of the right type
22:56:57 <ais523> anything of the type you want is sufficient
22:57:40 <ais523> fwiw, I have weird opinions about proof languages – I dislike tactics as a source code construct, as opposed to something you use in your IDE to generate the source of a proof
22:58:11 <ais523> because I want the proof to contain the actual reasoning, rather than a statement along the lines of "the proof is standard using these standard techniques"
22:59:26 <korvo> Makes sense to me. I'm on team Metamath; I only know Rocq, Idris, and I guess Agda for practical reasons. You'll find lots of esoteric folks agreeing; there's NQL, Metamath Zero, etc.
23:00:05 <ais523> Agda is the only one that I've used, and only for very basic things
23:00:09 <korvo> I've already tried a variety of models for generating uncompressed Metamath proofs. Totally useless, even with a constrained grammar that forces them to pick legal moves.
23:00:49 <korvo> I might as well spill the beans. One of my side projects involves the insight from the end of The Cell (2001): what if we inverted the direction in which the simulation is flowing?
23:01:33 <korvo> Rather than having a chat interface, what if we have the simulation integrate all of the available data, and only chat as an optional side-effect? Initial experiments are very promising, with the understanding that this can't be turned into exploitable labor.
23:02:15 <korvo> Wait, The Cell came out in 2000? Wow. I knew it was early 2000s, but that's early early.
23:02:41 <ais523> now you've got me thinking "by some methods of counting, 2000 was in the 1900s"
23:03:13 -!- FreeFull has quit.
23:23:59 -!- tromp has quit (Quit: My iMac has gone to sleep. ZZZzzz…).
23:25:42 <zzo38> I mentioned before attacking your ally deliberately in Pokemon, but I also remember once before I played, although I did not do so, if I was on the opponent's side, I would have deliberately attacked the ally (for damage, to attempt to knock out the ally, rather than only paralysis)
23:29:08 <ais523> zzo38: your example of intentionally paralyzing a Clefable confused me, because Clefable is one of the Pokémon that's least useful to do it on
23:29:40 <ais523> normally the reason I would want my own Pokémon paralyzed in competitive Pokémon is to protect them from being toxic-poisoned, but Clefable normally has Magic Guard and thus doesn't care about being toxic-poisoned as it is
23:30:18 <int-e> ais523: The TIO behavior is WEIRD, can't explain it, not even with the old glibc version in the picture. It would have to be patched, I think.
23:30:28 <ais523> (as this conversation suggests, Pokémon status conditions don't really reflect or act like their equivalents in real life)
23:30:53 <ais523> int-e: it's definitely running in a sandbox, which might potentially break things somehow?
23:31:16 <int-e> ais523: But it should not be doing any system calls, just fill the buffer.
23:31:54 <ais523> I can potentially see a sandbox breaking the output of fstat calls, which might cause glibc to act differently
23:31:59 <ais523> but acting that differently is strange
23:32:25 <zzo38> ais523: I know that, and it was a risky situation, so it might not have been the best move, but it was a risk I decided to take and it ended up helping.
23:32:30 <ais523> it's also possible that TIO injects extra flushes, somehow, in order to be able to show more output if a program crashes
23:34:37 <int-e> ais523: lol: https://github.com/TryItOnline/tiosetup/blob/master/files/system/tiopreload.cpp
23:35:31 <ais523> int-e: I don't think that explains the weird behaviour but it's definitely worth the link
23:35:47 <int-e> ais523: yeah it doesn't, but it's a ridiculous hack :)
23:36:03 <ais523> it isn't even hiding the output, just sending it to a different file descriptor
23:36:43 <ais523> also I'm vaguely surprised that defining a function called __builtin_printf actually works
23:36:50 <ais523> (in that it's permitted and isn't a no-op)
23:37:09 <esolangs> [[Mango]] https://esolangs.org/w/index.php?diff=166369&oldid=166349 * RaiseAfloppaFan3925 * (+0) Took me a while to figure out that this is a WIP and not a stub
23:37:26 <int-e> ais523: Oh, but there's another preload: LD_PRELOAD=libstdbuf.so:tiopreload.so
23:37:49 <ais523> both halves of that are suspicious
23:37:59 <ais523> stdbuf is a command that changes how stdio buffering works, and tiopreload could do anything
23:38:50 <int-e> ais523: and with LD_PRELOAD= ./t ... the behavior disappears (becomes 12/12/-1)
23:39:02 <int-e> so what is that thing
23:39:24 <ais523> oh, tiopreload is specifically just the thing that sends assertion failures to stdout
23:39:31 <ais523> so it must be stdbuf that's causing the problem
23:41:15 <ais523> int-e: I can reproduce locally using «stdbuf -i0 -e0 -o0 ./t 3>&2 2>&1 1>&3 | sleep 1»
23:41:19 <ais523> so it's a bug in stdbuf
23:47:12 <int-e> ais523: well, it's a bug in glibc where setvbuf(stderr, NULL, _IONBF, 0); followed by setvbuf(stderr, NULL, _IOFBF, 4096); behaves differently from just setvbuf(stderr, NULL, _IOFBF, 4096);
23:48:55 <int-e> ais523: and *that* I can actually reproduce locally. Isn't that fun.
23:49:57 <ais523> I checked the stdbuf source, and it doesn't seem to be a stdbuf bug (this conclusion is consistent with yours)
23:50:10 <ais523> in that stdbuf just calls setvbuf at program load and doesn't do anything else
23:51:02 <int-e> yeah, I decided to just emulate that behavior without the preload
23:52:53 <ais523> what if you give setvbuf an actual array as its second argument, rather than NULL?
23:53:25 <ais523> (should probably be static for lifetime reasons)
23:54:01 <int-e> then it's back to 12/12/-1
23:54:13 <ais523> the info page and man page disagree about what a NULL second argument means, which isn't particularly surprising given what we know so far
23:54:20 <ais523> (although the 12/1/0 behaviour doesn't match either of them)
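For comparison, the expected "12/12/-1" shape (both writes succeed into the buffer, and only the flush notices the broken pipe) can be reproduced with Python's own buffered I/O layer. This is an analogy only: it uses Python's `io` classes rather than glibc stdio, so it shows the model being reasoned about and does not reproduce the `setvbuf` bug. It assumes a POSIX system, and relies on CPython ignoring SIGPIPE by default so that the error surfaces as `BrokenPipeError`. The function name `probe` is made up for the example.

```python
import os

MSG = b"hello world\n"  # 12 bytes, like the writes in the discussion


def probe(buffering: int) -> list:
    """Write twice to an already-broken pipe, then flush; record each result."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)                     # no reader left: the pipe is broken
    writer = os.fdopen(write_fd, "wb", buffering=buffering)
    results = []
    steps = (lambda: writer.write(MSG), lambda: writer.write(MSG), writer.flush)
    try:
        for action in steps:
            try:
                results.append(action())
            except BrokenPipeError:
                results.append("EPIPE")
    finally:
        try:
            writer.close()                # close retries the flush
        except BrokenPipeError:
            pass
    return results


if __name__ == "__main__":
    print("fully buffered:", probe(4096))
    print("unbuffered:    ", probe(0))
```

With a 4096-byte buffer both writes should report 12 and only the flush should fail; with `buffering=0` the very first write already hits the broken pipe.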
23:54:35 <esolangs> [[Mango]] M https://esolangs.org/w/index.php?diff=166370&oldid=166369 * RaiseAfloppaFan3925 * (+502) Added 41 and 21