00:01:39 -!- tromp has quit (Ping timeout: 246 seconds).
00:11:27 <esowiki> [[Befunge]] https://esolangs.org/w/index.php?diff=80757&oldid=80726 * Quintopia * (+38) quine 2 needs a lot of trailing spaces to be a true quine
00:28:51 <esowiki> [[Befunge]] https://esolangs.org/w/index.php?diff=80758&oldid=80757 * Quintopia * (+191) /* Quine */
00:37:51 <esowiki> [[Befunge]] https://esolangs.org/w/index.php?diff=80759&oldid=80758 * Quintopia * (-44) /* Quine */
00:38:24 <esowiki> [[Befunge]] M https://esolangs.org/w/index.php?diff=80760&oldid=80759 * Quintopia * (-1) /* Quine */
00:41:59 <esowiki> [[Befunge]] M https://esolangs.org/w/index.php?diff=80761&oldid=80760 * Quintopia * (-2) /* Quine */
00:51:33 -!- tromp has joined.
00:54:56 -!- metcalf has quit (Quit: metcalf).
00:55:45 -!- tromp has quit (Ping timeout: 240 seconds).
00:59:30 -!- metcalf has joined.
01:05:18 <b_jonas> zzo38: oh! I found the third root. "http://zzo38computer.org/sql/"
01:10:47 -!- delta23 has joined.
01:28:58 -!- tromp has joined.
01:35:40 -!- tromp has quit (Ping timeout: 265 seconds).
02:12:32 <esowiki> [[Parse this sic]] https://esolangs.org/w/index.php?diff=80762&oldid=80749 * Digital Hunter * (+906) /* Example programs */ added a phi calculator program
02:13:39 <esowiki> [[Parse this sic]] M https://esolangs.org/w/index.php?diff=80763&oldid=80762 * Digital Hunter * (+0) /* Phi calculator */
02:24:35 -!- tromp has joined.
02:27:29 -!- delta23 has quit (Quit: Leaving).
02:28:39 -!- tromp has quit (Ping timeout: 246 seconds).
02:32:21 -!- Lord_of_Life has quit (Ping timeout: 264 seconds).
02:33:31 -!- Lord_of_Life has joined.
03:24:45 -!- ais523 has joined.
03:25:52 <ais523> I figured out a solution that doesn't involve mapping over NULL: put the block of memory you're iterating over at 0x80000000 exactly, use 32-bit arithmetic on your 64-bit pointers (this zero-extends the top 32 bits but they're all zeroes anyway), and then the sign flag will contain bit 31 rather than bit 63
03:26:11 <ais523> so you know you've gone below the bottom of your block when the 8 borrows down to a 7
03:26:40 <ais523> (you can also do this the other way, counting upwards and making your memory block end at 0x7fffffff)
03:28:13 -!- tromp_ has joined.
03:28:41 <ais523> you could also use any address where bit 15 is a 1 and bits 14..0 are all zeroes, then use 16-bit arithmetic, but there are potential performance hazards with 16-bit arithmetic
03:29:01 <ais523> a false dependency probably wouldn't be too bad for this sort of application, but partial register merge stalls would be horrible
03:29:10 <ais523> (also, it costs an extra byte)
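The sign-flag trick ais523 describes can be modelled in plain C without mapping any memory. This is only a sketch under stated assumptions: `BLOCK_BASE`, `sign_flag32` and `count_down` are hypothetical names, and the 32-bit arithmetic on `uint32_t` stands in for 32-bit instructions applied to a 64-bit pointer whose top half is zero.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed block start: bit 31 set, bits 30..0 clear. */
#define BLOCK_BASE UINT32_C(0x80000000)

/* Stand-in for the CPU sign flag after a 32-bit operation: bit 31 of the result. */
static int sign_flag32(uint32_t value)
{
    return (int)((value >> 31) & 1u);
}

/* Walks downwards from one past the end of a len-byte block and counts
   how many addresses are visited before the sign flag clears. */
static uint32_t count_down(uint32_t len)
{
    /* Keep BLOCK_BASE + len from wrapping past 2^32. */
    assert(len <= UINT32_C(0x7fffffff));

    uint32_t addr = BLOCK_BASE + len;
    uint32_t visited = 0;
    for (;;) {
        addr -= 1;                 /* 32-bit decrement */
        if (!sign_flag32(addr))    /* 0x80000000 - 1 == 0x7fffffff: the 8 became a 7 */
            break;
        visited++;
    }
    return visited;
}
```

The loop needs no separate end-of-block comparison: the decrement itself produces the condition that ends the walk, which is the point of the placement at 0x80000000.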
03:32:52 -!- tromp_ has quit (Ping timeout: 256 seconds).
03:32:55 <zzo38> b_jonas: There are some additional files too. What is your opinion about the format I mentioned for mirroring (that I linked the document that I wrote)?
03:33:50 <zzo38> (For transfer purposes, it may be helpful for the request and response to be compressed, too)
03:34:03 <ais523> <b_jonas> fungot: in English, do people really pronounce "chassis" with the final "s" silent? if so, why? ← yes, they do, also the initial "ch" is pronounced "sh", and the "i" is pronounced as a long "e"
03:34:03 <fungot> ais523: a good choice, then i'm going to make mistaks so bad that they kill me, and a parallel! they're not writing stories about what life would be like, if only you made some decisions okay
03:34:48 <ais523> apparently it's a loanword from French (châssis), and English often borrows approximate pronunciations of loanwords
03:38:35 <zzo38> Yes, although some people pronounce some words differently, so sometimes there is more than one way in the dictionary
03:39:14 <zzo38> (Sometimes one way to pronounce it is only applicable to one meaning of the word)
03:44:37 * ais523 vaguely wonders if loanwords ever get returned to their original languages
03:45:52 <ais523> if you're going to keep it, it's more like theft than borrowing
03:47:23 <zzo38> Many words are still used in their original languages, I think.
03:50:38 -!- naivesheep has quit (Remote host closed the connection).
03:51:03 -!- naivesheep has joined.
03:56:23 <zzo38> (Although sometimes there are already good English words, even if they aren't so common now. And sometimes you have to make up a new word, because no language has a suitable word already.)
03:59:50 -!- ais523 has quit (Remote host closed the connection).
04:01:02 -!- ais523 has joined.
04:03:19 -!- mmmattyx has quit (Quit: Connection closed for inactivity).
04:21:47 -!- ais523 has quit (Remote host closed the connection).
04:22:44 -!- tromp has joined.
04:23:01 -!- ais523 has joined.
04:27:33 -!- tromp has quit (Ping timeout: 264 seconds).
04:30:02 -!- ais523 has quit (Remote host closed the connection).
04:31:15 -!- ais523 has joined.
05:16:37 -!- tromp has joined.
05:21:33 -!- tromp has quit (Ping timeout: 264 seconds).
05:25:47 -!- delta23 has joined.
05:27:30 -!- olsner has quit (Ping timeout: 246 seconds).
05:34:01 -!- olsner has joined.
06:02:22 -!- scoofy has joined.
06:43:27 <zzo38> I watched Murdoch Mysteries; they mentioned that the mathematician proved they could not solve the puzzle with the last two letters reversed (it is a puzzle like the 15 puzzle). But, they failed to consider, there are many duplicate letters. Is that deliberate? Did they know that is relevant?
06:46:05 -!- metcalf has quit (Ping timeout: 240 seconds).
07:11:05 -!- sprock has quit (Ping timeout: 240 seconds).
07:41:00 -!- tromp has joined.
08:04:57 -!- ais523 has quit (Quit: quit).
08:27:56 -!- Sgeo has quit (Read error: Connection reset by peer).
08:58:24 -!- tromp has quit (Remote host closed the connection).
09:10:22 -!- hendursa1 has joined.
09:12:05 -!- hendursaga has quit (Ping timeout: 268 seconds).
09:13:09 -!- tromp has joined.
09:19:54 -!- Arcorann has joined.
09:29:49 -!- Arcorann has quit (Ping timeout: 265 seconds).
09:35:48 -!- LKoen has joined.
09:38:07 -!- Arcorann has joined.
11:16:08 <b_jonas> ais523 re mapping over NULL, I believe you have to pass the MAP_FIXED_NOREPLACE flag to the fourth arg of mmap, but that might not be enough. I thought there was a process-wide prctl setting that protects you from mapping to 0 address by default, but didn't require privilege to override. you might want to check sources of old versions of dosemu (or possibly other real mode emulators) that use virtual
11:16:14 <b_jonas> 8086 mode under an x86_32 kernel, because I think that will map to the zero page.
11:18:59 <b_jonas> but I never really experimented with it, because I don't want a page mapped at zero address.
11:19:59 <b_jonas> ah, fizzie found the solution
11:23:09 <b_jonas> "<ais523> maybe I should just use a %fs: prefix" => I think that's reserved for libc and similar to implement thread-local variables, so you have to be careful
11:24:06 <b_jonas> "<ais523> I don't even think there's an official piece of documentation anywhere for how you make a system call from userspace" => man 2 syscall mostly documents it. some details about connection with signal handlers might be undocumented for all I know
11:24:45 <b_jonas> but for the syscall numbers themselves, you have to look them up in the header or include the header
11:25:02 <b_jonas> and the ABI of some syscalls might not be documented, so you may have to look them up in libc source code
11:25:08 <b_jonas> it is documented for some of them
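As a small illustration of what man 2 syscall documents, the libc `syscall()` wrapper takes a syscall number from the header followed by that call's arguments. This is a Linux-specific sketch and it does still go through libc; `raw_getpid` is a hypothetical helper name, and `SYS_getpid` is used only because it takes no arguments.

```c
#define _GNU_SOURCE          /* makes sure unistd.h declares syscall() */
#include <assert.h>
#include <sys/syscall.h>     /* SYS_* syscall numbers */
#include <sys/types.h>
#include <unistd.h>

/* Asks the kernel for the process ID by syscall number rather than
   through the dedicated getpid() wrapper. */
static pid_t raw_getpid(void)
{
    long ret = syscall(SYS_getpid);
    assert(ret > 0);         /* getpid cannot fail */
    return (pid_t)ret;
}
```

The register-level calling convention (which register holds the number and which hold the arguments) is the part listed per architecture in that man page; a program that avoids libc entirely would have to follow that table by hand.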
11:26:32 -!- hendursaga has joined.
11:26:56 <b_jonas> I assume you already know that, rather than using the counter as an index, incrementing both the counter and pointers in two/three/four separate registers is often the best, and you're just doing weird golf here
11:27:45 -!- hendursa1 has quit (Ping timeout: 268 seconds).
11:31:09 <b_jonas> ah I see, the following conversation details how the fs and gs segment descriptors are used on x86_64.
11:31:13 <b_jonas> I didn't know most of that.
11:31:48 <b_jonas> I didn't even know that you can use _both_ fs and gs, with nontrivial effect, on x86_64. I thought only one of them was available.
11:36:43 <b_jonas> "<ais523> (that said, if the kernel is going to be assigning a meaning to ud1 combinations, maybe they aren't a good suggestion for intentionally illegal instructions)" => it won't, or at least there's a certain undefined instruction that is used to always trigger an error and so won't be assigned meaning, because gcc emits it in some cases for abort() or less optimized unreachable paths
11:37:18 <b_jonas> but there are plenty of system mode only instructions that will raise privilege faults, and so can be used to assign some meaning.
11:39:00 <b_jonas> plus there's also syscall, int, and int 3, for when you want to assign meaning to an instruction that will be handled by either the kernel or a signal handler; you can also use an access to an invalid address or other faults
11:47:07 <b_jonas> zzo38: I'm not sure it's useful to define a mirror protocol from your side. you just use the protocol by archive.org or debian or github or MS Onedrive or Google Drive, because they're the ones with the warehouses full of hard disks and mirroring stuff from you.
12:02:42 -!- tromp has quit (Remote host closed the connection).
12:53:05 -!- Arcorann has quit (Read error: Connection reset by peer).
12:53:25 -!- tromp has joined.
12:54:56 <esowiki> [[Imeight]] https://esolangs.org/w/index.php?diff=80764&oldid=78625 * Kekcsi * (+500) added details
13:12:07 -!- ArthurStrong has joined.
13:12:39 -!- spruit11 has quit (Ping timeout: 246 seconds).
13:16:58 -!- tromp has quit (Remote host closed the connection).
13:18:46 -!- tromp has joined.
13:23:12 -!- arseniiv has joined.
13:27:09 -!- delta23 has quit (Read error: Connection reset by peer).
13:27:29 -!- delta23 has joined.
13:36:55 -!- tromp has quit (Remote host closed the connection).
13:39:54 -!- tromp has joined.
14:20:43 -!- spruit11 has joined.
14:35:13 -!- hendursaga has quit (Ping timeout: 268 seconds).
14:55:09 -!- Sgeo has joined.
15:28:50 -!- razetime has joined.
15:39:44 -!- ArthurStrong has quit (Quit: leaving).
15:47:21 -!- razetime has quit (Quit: Connection closed).
15:55:55 -!- spruit11 has quit (Read error: Connection reset by peer).
15:56:21 -!- spruit11 has joined.
16:36:21 -!- Melvar has quit (Ping timeout: 246 seconds).
16:52:29 -!- Melvar has joined.
17:00:30 -!- arseniiv has quit (Ping timeout: 246 seconds).
17:21:28 -!- arseniiv has joined.
17:54:36 -!- arseniiv has quit (Ping timeout: 240 seconds).
18:06:17 -!- arseniiv has joined.
18:15:59 -!- TheLie has joined.
18:16:13 -!- tromp has quit (Remote host closed the connection).
18:25:15 -!- tromp has joined.
18:47:35 -!- tromp has quit (Remote host closed the connection).
19:02:28 -!- xelxebar_ has quit (Remote host closed the connection).
19:02:47 -!- xelxebar has joined.
19:13:31 -!- ais523 has joined.
19:13:53 <ais523> b_jonas: yes, this is definitely well into "weird golf" territory, this sort of thing would be insane to do for other reasons
19:14:15 <ais523> I am basically planning to define my own ABI for the program, and am not using libc
19:14:46 <ais523> (even so, you have to turn off gcc stack protectors because they assume that %fs:(0x28) is a piece of thread-local storage not used by anything else)
19:20:33 -!- tromp has joined.
19:27:36 <b_jonas> ais523: yes. that gets difficult.
19:28:02 <ais523> I'm entirely willing to write this completely in asm if necessary
19:28:20 <ais523> currently torn about whether the non-performance-sensitive parts should be in asm or in C
19:28:47 <ais523> gcc lets you reserve a register for your own use program-wide, which is pretty useful for defining your own ABIs (it does, however, spout warnings if library functions overwrite it)
19:30:27 <b_jonas> "gcc lets you reserve a register for your own use program-wide" => I wonder for what architecture they added that
19:30:28 -!- tromp has quit (Remote host closed the connection).
19:30:42 -!- tromp has joined.
19:31:45 <ais523> b_jonas: I think it's for people doing very low level high-performance asm stuff where they want to keep global variables in registers
19:32:55 -!- ais523 has quit (Quit: sorry for my connection).
19:33:07 -!- ais523 has joined.
19:33:40 <b_jonas> ais523: but keeping a global variable constantly in a register is weird, even for high-performance asm stuff. keeping a global variable in a register temporarily during a function or hot loop, that makes sense, but ordinary compiler optimizations can already do that.
19:34:30 <b_jonas> and doing it locally has far fewer problems with ABI compatibility
19:35:03 <ais523> b_jonas: there are some platforms where the number of registers is large compared to the size of RAM
19:35:38 <ais523> and some algorithms where you don't want to touch memory at all
19:38:24 <ais523> really, having a large number of global variables is normally an issue with your program in the first place, and if you have only a small number, keeping them in registers might make sense depending on how widely they're used
19:39:03 <ais523> I actually ran into a problem like this in aimake – its C output contains a lot of functions that call each other, and gcc seems unwilling to change the ABI of a function that isn't inlined
19:39:24 <b_jonas> ais523: number of registers is large => MMIX is one of them, 6502 may be one if you count the zero-page bytes as registers. maybe x86_64 with AVX512 and its 32 vector registers could count as one, but the problem is, you can't guarantee that they're preserved through callers
19:39:34 <ais523> it would make sense for all the internal state of the parsing automaton to be in global variables for just that source file
19:39:44 <ais523> which I think could be accomplished via that gcc feature
19:39:46 <b_jonas> "some algorithms where you don't want to touch memory at all" => in a hot loop. not in a whole program.
19:40:21 <ais523> some registers, like %r14, are hardly ever used as it is
19:40:54 -!- tromp has quit (Remote host closed the connection).
19:41:10 <ais523> oh, I was thinking of the PIC microcontrollers, which normally have around 100 or so bytes of registers, and often fewer bytes of general-purpose memory (40 or so)
19:41:29 <ais523> the documentation goes into a lot of detail on what registers you can use as extra memory under which circumstances
19:42:09 <ais523> (the whole purpose of the PIC is to have a lot of register-mapped hardware available, so most of the registers have side effects when read and/or written)
19:43:53 -!- sprock has joined.
19:44:11 <b_jonas> ais523: re parsing, just put that internal state to the first argument (or first few arguments) of all your functions, then the ABI will force them to be in registers (%rdi then %rsi for linux, or %xmm0 then %xmm1 depending on type), at least in the boundaries of those functions. whether it's a C function argument or a C++ *this argument is mostly irrelevant.
19:44:25 -!- hendursaga has joined.
19:44:40 <kmc> what's the syntax for globally reserving a register in GCC?
19:44:41 <ais523> b_jonas: that was my first idea, but C isn't call-by-name and much of the internal state is writable
19:45:09 <ais523> kmc: register type varname asm("registername");
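A concrete instance of that syntax might look like the following. It is a sketch, assuming GCC on x86-64: `event_count`, `record_event` and `count_events` are hypothetical names, `__asm__` is used instead of `asm` so the declaration also compiles in strict ISO modes, and the `#else` branch is a plain global so the file still builds on other compilers or targets.

```c
#include <assert.h>

#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
/* GCC extension: keep this global in %r14 for the whole translation unit.
   %r14 is callee-saved, so ordinary library calls restore it on return. */
register long event_count __asm__("r14");
#else
/* Fallback for other compilers or targets: an ordinary global. */
static long event_count;
#endif

/* Every function in this file can update the counter without touching memory
   (when the register variant is in effect). */
static void record_event(void)
{
    event_count++;
}

/* Resets the counter, records n events, and returns the final count. */
static long count_events(long n)
{
    assert(n >= 0);
    event_count = 0;   /* global register variables cannot have initializers */
    for (long i = 0; i < n; i++)
        record_event();
    return event_count;
}
```

The declaration has to appear before any function definitions in the file, and code compiled without it (libraries, other translation units) knows nothing about the reservation, which is where the ABI concerns raised in the conversation come from.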
19:45:22 <b_jonas> ais523: hmm, so you want to return the modified state in the same register? that might be harder, yes.
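One way to get writable state passed in and handed back without a global is to pass a small struct by value and return it by value. The sketch below uses hypothetical names (`pstate`, `step`, `balanced`) and a toy parenthesis counter rather than a real parser. Under the System V x86-64 calling convention a 16-byte struct of two integer-class fields is passed in two integer registers and returned in %rax and %rdx, so the state can stay in registers across the call, though not in the same registers on the way in and out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy parser state: current position and parenthesis nesting depth. */
typedef struct {
    const char *pos;
    long depth;
} pstate;

/* Consumes one character and returns the updated state by value. */
static pstate step(pstate s)
{
    if (*s.pos == '(')
        s.depth++;
    else if (*s.pos == ')')
        s.depth--;
    s.pos++;
    return s;
}

/* Returns 1 if the parentheses in text are balanced, 0 otherwise. */
static int balanced(const char *text)
{
    assert(text != NULL);
    pstate s = { text, 0 };
    while (*s.pos != '\0' && s.depth >= 0)
        s = step(s);
    return s.depth == 0;
}
```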
19:45:37 <kmc> I think GHC's old C backend used that
19:46:04 <b_jonas> I guess you could write a specialized backend to the parser generator that writes x86_64 assembly directly
19:46:31 <ais523> b_jonas: I was working on that but stalled on it
19:47:40 <kmc> there were actually two flavors of the C backend, "registerized" and "unregisterized"
19:48:27 <b_jonas> ais523: I also have a hard time imagining that this parser register thing is an optimization that will solve an actual bottleneck
19:48:31 <kmc> the unregisterized backend outputted something reasonably close to standard C, and was good for portability, but bad for performance
19:48:49 <kmc> among other things it used a "mini interpreter" for computed tail calls
19:49:15 <ais523> b_jonas: I would imagine that the bottlenecks on parsing would be one of L1 read, L1 write, instruction decode, or instruction issue
19:49:34 <ais523> and putting the important parser variables into registers would seem to reduce pressure on all four of those?
19:50:00 <kmc> something like void *(*f)(); while (1) { f = f(); }
19:50:35 <kmc> so a computed tail call / jump (which are the main ubiquitous form of control flow in compiled haskell code) compiled to a return and a call, which is not very efficient
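The "mini interpreter" loop kmc sketches can be written in standard C by wrapping the function pointer in a struct, so that a function can return "the next function to run" without casting through `void *`. This is a minimal sketch with hypothetical names (`state`, `step`, `loop`, `run_sum`); it is not GHC's actual code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long n;     /* remaining iterations */
    long acc;   /* running total */
} state;

/* A step wraps a pointer to a function that itself returns the next step. */
typedef struct step step;
struct step {
    step (*fn)(state *);
};

/* Instead of tail-calling itself, loop returns itself as the next step;
   a NULL fn means "stop". */
static step loop(state *s)
{
    if (s->n == 0)
        return (step){ NULL };
    s->acc += s->n;
    s->n -= 1;
    return (step){ loop };
}

/* The mini interpreter: each "jump" is a return followed by a call. */
static long run_sum(long n)
{
    assert(n >= 0);
    state s = { n, 0 };
    step next = { loop };
    while (next.fn != NULL)
        next = next.fn(&s);
    return s.acc;
}
```

The stack never grows no matter how many steps run, which is what makes this usable as a substitute for tail calls; the cost is the return-plus-indirect-call pair per step that the conversation goes on to discuss.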
19:50:46 <ais523> of course, the *real* bottleneck is probably reading the file that stores the information you're parsing, it wouldn't help with that bottleneck at all
19:51:03 <kmc> then there was the registerized backend, which output horribly GCC-dependent code and mangled the resulting assembly further with a huge perl script
19:51:09 <kmc> and was only ever implemented for a few architectures
19:51:29 <ais523> kmc: it's not that inefficient (2-3 clock cycles on most processors) unless mispredicted
19:51:31 <kmc> in that case the advantage of producing C was not portability but rather the ability to use GCC codegen and optimizations
19:51:56 <kmc> ais523: well, it may have been worse in the early days when this was first implemented
19:52:00 <ais523> but an unconditional jump costs ¼ of a clock cycle, so is faster
19:52:09 <ais523> (assuming that you don't space them too tightly)
19:53:15 <kmc> what do you mean by 1/4 of a clock cycle? i can see that being meaningful for arithmetic (if you can execute 4 instructions at once) but how does it work for control flow
19:53:51 <ais523> kmc: on Intel processors, there are four main execution units for arithmetic, etc., 0, 1, 5, and 6 (AMD processors also have four but they're named differently)
19:54:01 <ais523> each can handle a subset of instructions
19:54:27 <ais523> jumps tie up port 6 for a cycle, probably so that they can be reversed if they turn out to have been mispredicted
19:54:46 <ais523> (6 is also usable for very basic arithmetic/logic, things like adds and xors)
19:55:07 <b_jonas> ais523: there is probably at least some esoteric valid optimization use for this global register thing
19:55:23 <ais523> and everything else about a jump happens in parallel with your arithmetic and logic, so, as long as jumps aren't too densely packed, it costs no time at all, because it isn't on the critical path of the processor
19:55:42 <ais523> jumps also cost one instruction issue and one instruction retire, but Intel processors handle those at the rate of 4 per cycle
19:56:05 <ais523> so it's taking up almost exactly ¼ of the resources that you get each clock cycle
19:56:21 <ais523> /away for a little while
19:57:37 -!- tromp has joined.
19:58:26 <b_jonas> if the code is execution bound that is.
20:05:42 <ais523> b_jonas: it's actually close to theoretically impossible for code to be execution bound in general, on modern processors
20:06:16 <ais523> it can only really happen in the case of instructions that take multiple clock cycles and can't be pipelined
20:06:47 <ais523> in other cases, it's impossible to feed instructions into the execution units faster than the execution units can process them
20:07:09 <ais523> you can get a sort-of execution-bound when your code is trying to use one particular execution unit more than it can manage, though
20:07:33 <ais523> (e.g. recent Intel processors can only run vector shuffle operations on port 5, so your code will be execution-bound if it's trying to do those at a rate of more than one per cycle)
20:08:33 <b_jonas> "close to theoretically impossible for code to be execution bound in general" => yes.
20:09:01 <ais523> in AMD processors, the rest of the pipeline is faster compared to Intel processors, so those are more likely to become execution bound
20:09:09 * kmc wonders how they ended up named 0, 1, 5, and 6
20:09:19 -!- tromp has quit (Remote host closed the connection).
20:09:40 <kmc> it's like the old prank with the 3 pigs labeled "1", "2", and "4"
20:09:47 <ais523> e.g. if you write a few thousand register-register integer addition instructions in a row, that will be execution-bound because there are only four adders available and yet the rest of the pipeline could handle six addition instructions per cycle
20:09:56 <ais523> kmc: 2, 3, 4, 7 also exist but aren't ALUs
20:10:27 <ais523> 2 and 3 are used for memory read operations (read-modify instructions will use one of 2 or 3, and one of 0/1/5/6, simultaneously)
20:10:40 <ais523> 4 is used for all memory writes (so you can only write memory once per clock cycle)
20:10:54 <ais523> and 7 is an address generation unit for writes
20:11:04 <ais523> but what confuses me is that 7 isn't *always* used for writes, even though it can't do anything else
20:11:13 <kmc> does that mean LEA-arithmetic happens on 7?
20:11:19 <ais523> sometimes writes will borrow the AGU from port 2 or 3
20:11:41 <ais523> LEA-arithmetic has varied on how it's done from processor to processor, but most modern processors do it on an ALU rather than an AGU
20:11:48 <ais523> so that the AGUs don't need to be told how to write registers
20:12:50 <kmc> (my wife says someone actually did the thing with the pigs at her high school)
20:12:54 <ais523> it looks like on recent Intel processors, complex LEAs need to be done on port 1, while simpler LEAs can be done on port 1 or 5
20:12:57 <kmc> (this is what happens when you grow up in farm country?)
20:14:11 <ais523> I'm never sure what counts as a complex LEA, but suspect it's something like index + base + displacement all being specified
20:14:51 <ais523> also, RIP-relative addresses are "slow" on modern processors for some reason, they tend to require complex-AGU resources even though they're conceptually quite simple
20:15:04 <ais523> (e.g. needing to go via port 1 for RIP-relative LEAs)
20:15:29 <ais523> probably the difficulty is in how %rip itself gets routed, because it isn't register-renamed like most registers are
20:19:11 <b_jonas> as in as slow as a double indexing
20:21:04 <ais523> oh, I think I figured out why memory writes can use the AGU from port 2 or 3
20:21:21 <ais523> it's so that if there's a read-modify-write instruction, the read and write use the same AGU as each other
20:22:08 <ais523> it also wouldn't surprise me if port 7 were a recent addition, and if earlier processors just used 2 and 3 as the only AGUs available
20:22:40 -!- tromp has joined.
20:26:19 -!- TheLie has quit (Remote host closed the connection).
20:30:14 <b_jonas> I don't follow all the details of the execution units. the practical effect is that it's often worth interleaving simple operations if they don't have dependencies either way. which is also why incrementing two pointers in parallel can be better than indexing, and the pointer compare-and-jump can go parallel with the computation.
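The two loop shapes being compared, one counter used as an index versus separately incremented pointers with a pointer comparison as the exit test, look roughly like this in C. The function names are hypothetical, and an optimizing compiler is free to turn either form into the other, so this shows the source-level idea rather than guaranteed machine code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Indexed form: one counter, used as an index into both arrays. */
static void add_indexed(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Pointer form: both pointers advance independently, and the loop exit
   is a pointer comparison that does not depend on the additions. */
static void add_pointers(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);
    const uint32_t *end = src + n;
    while (src != end)
        *dst++ += *src++;
}
```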
20:32:04 <b_jonas> so in practice in those kinds of tight loops you want to optimize for decoder and decoded cache, after of course you arrange multiple loops to operate on nice L1-cache-sized chunks so the L2 cache doesn't bind you
20:34:03 <b_jonas> modeling the different execution units in detail rarely helps you. the execution latencies of the instructions do matter, but which execution units they can run on rarely does.
20:34:28 <b_jonas> unless you're writing a BLAS.
20:35:14 <ais523> interleaving doesn't help as much as you might think, because of how the reorder buffer works
20:35:29 <ais523> what you do need to do is to keep the loop-carried-dependency chain as short as possible, though
20:36:10 <ais523> because that will often be the limiting factor on how fast a loop can run
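A standard illustration of shortening a loop-carried dependency chain is splitting one accumulator into several. In the sketch below (hypothetical function names), the first loop carries a single chain of n dependent additions, while the second carries four independent chains of about n/4 each. Integer addition has such low latency that the difference may be small in practice; the same shape matters more for higher-latency operations, and unsigned integers are used here so that both versions give identical results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every addition depends on the previous one. */
static uint64_t sum_one_chain(const uint64_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Four accumulators: four independent chains that can overlap in time. */
static uint64_t sum_four_chains(const uint64_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    uint64_t a = 0, b = 0, c = 0, d = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += v[i];
        b += v[i + 1];
        c += v[i + 2];
        d += v[i + 3];
    }
    for (; i < n; i++)   /* leftover elements when n is not a multiple of 4 */
        a += v[i];
    return a + b + c + d;
}
```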
20:36:25 <b_jonas> ais523: do you mean the order you write the interleaved instructions doesn't matter as much? yes.
20:36:55 <ais523> b_jonas: right, the order of instructions is mostly irrelevant on out-of-order processors (which includes most modern processors apart from intentionally low-power-usage ones)
20:37:15 <ais523> they can be moved quite some distance (tens of instructions) in order to get them to run as early as possible
20:37:29 <ais523> I've discovered that the main effect from moving instructions round is to decrease register pressure
20:38:05 <ais523> if you have a temporary register that you need only for the course of a few instructions, it works best to put those instructions right next to each other so that the same register name can be reused as a temporary in a different block of instructions
20:38:12 <ais523> rather than needing to keep it live
20:38:20 -!- tromp has quit (Remote host closed the connection).
20:39:49 <b_jonas> ais523: yes, that can allow shorter instr encodings.
20:43:00 <ais523> or in extreme cases, avoid spills
20:51:51 -!- tromp has joined.
21:06:16 -!- tromp has quit (Remote host closed the connection).
21:27:31 -!- tromp has joined.
21:56:44 -!- tromp has quit (Remote host closed the connection).
22:04:15 -!- tromp has joined.
23:22:51 -!- tromp has quit (Remote host closed the connection).
23:53:42 -!- tromp has joined.
23:54:23 -!- b_jonas has quit (Quit: Lost terminal).
23:58:05 -!- tromp has quit (Ping timeout: 240 seconds).
23:59:59 -!- delta23 has quit (Quit: Leaving).