00:04:42 -!- mtm has quit (Ping timeout: 276 seconds).
00:05:23 -!- mtm has joined.
01:28:24 <esolangs> [[User:None1]] https://esolangs.org/w/index.php?diff=154317&oldid=152942 * None1 * (+56) /* My Esolangs */
01:43:48 -!- amby has quit (Quit: so long suckers! i rev up my motorcylce and create a huge cloud of smoke. when the cloud dissipates im lying completely dead on the pavement).
03:06:21 -!- ais523 has quit (Quit: quit).
03:09:39 -!- FreeFull has quit (Quit: Lost terminal).
03:38:36 <esolangs> [[General blindfolded arithmetic]] https://esolangs.org/w/index.php?diff=154318&oldid=154209 * Stkptr * (+3386) /* with +, -, * (at least FSM) */
03:40:35 <esolangs> [[General blindfolded arithmetic]] https://esolangs.org/w/index.php?diff=154319&oldid=154318 * Stkptr * (+188) /* Relation with Diophantine equations */
03:58:39 -!- tromp has quit (Ping timeout: 265 seconds).
04:24:41 <zzo38> Do you know if there are issues with optimization with the C compiler if you are storing a capability in a uint64_t variable even though arithmetic is not allowed (and will result in a run time error)?
04:27:40 <korvo> Like if the machine only has 32-bit words?
04:30:18 <int-e> (what kind of capability? in what sense is "arithmetic not allowed"?)
04:33:10 <korvo> A literal answer might be "no, C doesn't know what a capability is," but that's unfairly strict.
04:35:58 <int-e> This may turn into an ABI question very quickly, which is outside of the scope for C, though not for C compilers.
04:50:34 <zzo38> I described what I mean by "capability"; from the point of view of the instruction set it is really just a 64-bit value that is tagged in such a way that arithmetic cannot be performed on it; it is called a "capability" due to the operating system. The machine will have 64-bit words, since the program will be compiled for a specific machine that has this feature.
04:51:34 <int-e> but that doesn't explain in what way arithmetic even could break anything, as long as the right values are passed to system calls, or maybe stored in special memory locations
04:51:38 <zzo38> I mean if the C compiler is not modified, but targets whatever instruction set it is a modification of (probably RISC-V) such that it otherwise remains compatible.
04:53:33 <zzo38> Performing arithmetic on the values would cause the CPU to trap, so the program will not continue executing (unless the operating system handles the trap and allows the program to continue). It also happens if the program converts it to an array of bytes and then tries to extract the individual bytes; that is also an error resulting in a trap.
04:55:34 <zzo38> (In an actual program, typedef would probably be used, but I am asking about how the C compiler would handle such a situation if it is not modified to handle it.)
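A minimal C sketch of the situation being described (capability_t is a hypothetical typedef; the tag-and-trap behaviour is the hardware feature described above, not anything the compiler itself knows about):

    #include <stdint.h>
    #include <string.h>

    typedef uint64_t capability_t;   /* hypothetical: a tagged 64-bit value */

    void copy_cap(capability_t *dst, const capability_t *src) {
        *dst = *src;   /* safe if this stays a single 64-bit load/store */
    }

    void copy_cap_raw(capability_t *dst, const capability_t *src) {
        /* risky in this scenario: if the copy is lowered to byte-sized
           accesses, reassembling the bytes would drop the tag and trap */
        memcpy(dst, src, sizeof *src);
    }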
05:02:52 <zzo38> Would it cause a problem with unaligned memmove?
05:06:30 <int-e> So those capabilities have hardware support now? But the compiler is somehow unaware? That sounds... unlikely, those two go together.
05:08:00 <zzo38> My intention is in case you are using an existing cross-compiler and cannot modify it for whatever reason. (There are reasons why you might want to modify the compiler other than this too, but sometimes you might want to or have to use existing compilers that you cannot easily modify (even if it is FOSS, it might take too long to modify and maintain it, or too long to compile the compiler).)
05:08:04 <int-e> or, like, such capabilities would end up in special registers that are managed with special instructions that you'd use compiler intrinsics or inline assembly for; in the former case the compiler knows; in the latter case, the compiler will document how values are represented in registers passed into your assembly code
05:08:47 <int-e> I feel that this question is, simultaneously, way too specific, and too vague.
05:12:20 <zzo38> It seems to me that splitting them into individual bytes might be more of a problem than arithmetic would be. There are a few reasons I had for not using special registers, including that you can store them in ordinary memory (except areas that the operating system disallows for this use, such as video memory).
05:14:15 <zzo38> I think the Flex computer does something similar, although it has its own instruction set. (From reading the documentation, it seems to me that it is possible for the difference of two pointers to be zero even though they are two different pointers, although the documentation does not actually mention this as far as I can tell.) (What I am describing is not the Flex computer, though.)
05:14:53 <zzo38> You may be right that it is too specific and too vague, but I am not sure how to make it better.
06:39:46 <esolangs> [[Amethyst]] https://esolangs.org/w/index.php?diff=154320&oldid=154299 * PrySigneToFry * (+998)
07:15:27 -!- Lord_of_Life has quit (Ping timeout: 252 seconds).
07:15:28 -!- Lord_of_Life_ has joined.
07:16:50 -!- Lord_of_Life_ has changed nick to Lord_of_Life.
07:36:10 <esolangs> [[Special:Log/newusers]] create * Xtex * New user account
07:39:24 <esolangs> [[Esolang:Introduce yourself]] M https://esolangs.org/w/index.php?diff=154321&oldid=154307 * Xtex * (+158) /* Introductions */
07:41:52 <esolangs> [[Esolang:Community portal]] https://esolangs.org/w/index.php?diff=154322&oldid=153079 * Xtex * (-13) /* Other real-time discussion communities */ refresh link
08:20:41 -!- craigo has quit (Quit: Leaving).
09:40:21 <esolangs> [[EchoLang (None1)]] https://esolangs.org/w/index.php?diff=154323&oldid=146239 * PrySigneToFry * (+16)
09:42:03 <esolangs> [[Brainfuck 2.0]] N https://esolangs.org/w/index.php?oldid=154324 * PrySigneToFry * (+4594) Created page with "Brainfuck 2.0 is designed by PSTF, it is based on [[EA Script, It's in the code.]] but it didn't costs anything. = Instructions = All commands work on a 30,000-bit tape, a stack, and a counter. == Basics == {| class="wikitable" |+ |- ! Instruction !! Meanin
09:42:57 <esolangs> [[Language list]] https://esolangs.org/w/index.php?diff=154325&oldid=154308 * PrySigneToFry * (+20)
10:05:24 <esolangs> [[Swapfuck]] M https://esolangs.org/w/index.php?diff=154326&oldid=148596 * Rdococ * (-142) /* Computational class */
11:10:32 -!- Sgeo has quit (Read error: Connection reset by peer).
12:02:27 -!- mtm has quit (Ping timeout: 244 seconds).
12:05:00 -!- mtm has joined.
12:21:14 <esolangs> [[User:None1]] https://esolangs.org/w/index.php?diff=154327&oldid=154317 * None1 * (+17)
12:39:25 -!- FreeFull has joined.
13:02:05 -!- amby has joined.
13:02:46 <esolangs> [[User talk:Hotcrystal0]] https://esolangs.org/w/index.php?diff=154328&oldid=153549 * PrySigneToFry * (+196)
13:39:14 <esolangs> [[XD]] https://esolangs.org/w/index.php?diff=154329&oldid=136855 * PrySigneToFry * (+138)
13:40:29 <esolangs> [[NH3]] https://esolangs.org/w/index.php?diff=154330&oldid=119728 * PrySigneToFry * (+55)
13:46:44 <esolangs> [[Talk:Braindrunk]] https://esolangs.org/w/index.php?diff=154331&oldid=139868 * PrySigneToFry * (+72) /* How many chance will the classic output "Hello, world!"? */ new section
14:17:48 -!- lynndotpy6 has quit (Quit: bye bye).
14:19:06 -!- lynndotpy6 has joined.
14:29:35 -!- ais523 has joined.
14:31:05 <ais523> <zzo38> Do you know if there are issues with optimization with the C compiler if you are storing a capability in a uint64_t variable even though arithmetic is not allowed (and will result in a run time error)? ← so there is a CPU sort-of like that which gets unintentionally targeted by existing C compilers – valgrind implements a virtual CPU and it can track extra bits of data about memory, e.g. whether that data is initialised
14:32:06 <ais523> in order to work with existing C programs that were written without knowledge of it, it had to allow a number of things that would otherwise not be allowed, e.g. in Valgrind memcheck, arithmetic on uninitialised data is permitted but returns uninitialised results, and copying uninitialised data is allowed, but branching on uninitialised data isn't
14:32:46 <ais523> I imagine it wouldn't be too hard to design the sort of capability-locking CPU you want as a Valgrind backend (relative to the other possibilities for implementing it, at least)
14:33:18 <ais523> and in practice, most compilers don't optimise copies of machine-word-sized things into anything other than copies (although they may copy through memory or through a vector register)
14:33:28 <ais523> because generally that's what's fastest on the CPU
14:35:51 <ais523> that said, I don't think optimisers *guarantee* to not, e.g., spill values by representing them as an arithmetic combination of other known values – they're allowed to do so in theory, just usually don't in practice
14:39:32 <int-e> oh, https://xkcd.com/3062/ resonates with this in a weird way
14:44:35 <b_jonas> https://sourceware.org/glibc/manual/latest/html_node/Atomic-Types.html#index-sig_005fatomic_005ft has some vague wording on this: "In practice, you can assume that int is atomic [wrt signal handlers, not threads]. You can also assume that pointer types are atomic; that is very convenient. Both of these assumptions are true on all of the machines that the GNU C Library supports and on all POSIX systems
14:44:41 <b_jonas> we know of." but that text was written so many years ago that the compilers may have changed since
14:45:30 <ais523> b_jonas: I think you need a "volatile" to benefit from those guarantees
14:45:51 <ais523> otherwise it breaks in situations as simple as reading the same global variable twice
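For reference, the shape of code those glibc guarantees are written for, as a minimal sketch including the volatile that ais523 mentions:

    #include <signal.h>

    static volatile sig_atomic_t got_signal = 0;   /* volatile: re-read on every access */

    static void on_int(int sig) { (void)sig; got_signal = 1; }

    int main(void) {
        signal(SIGINT, on_int);
        while (!got_signal)
            ;   /* do work; the flag is read/written atomically wrt the handler */
        return 0;
    }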
14:46:49 <b_jonas> ais523: ok, but here we're not making a signal handler, we just want to know that the magical integer that zzo mentions is copied as a whole between memory and registers rather than eg. character by character
14:49:24 <b_jonas> though of course the processor will have to follow that tag bit for every word of a vector register separately across all loads and stores for this to work, but I think zzo38 is assuming that the processor does that
14:49:52 <ais523> right, compilers often use over-wide registers for that sort of thing, but don't tend to generate, e.g., XOR swaps
14:49:54 <b_jonas> because eg. copying a large structure or array can use vector registers
14:50:04 <ais523> oh, I think copying a large array can become a call to libc memcpy
14:50:22 <ais523> and that may attempt to copy a byte at a time in some situations
14:50:48 <b_jonas> yes, with the caveat that the compiler requires that the memcpy implementation is a no-op when the from and to addresses match, which the C standard does not generally guarantee
14:51:27 <b_jonas> but I think that wording I quoted from the libc manual does try to imply that memcpy doesn't break pointers or ints into smaller pieces
14:51:53 <b_jonas> (they have to be aligned, but C and C++ require that anyway for these types)
14:52:59 <b_jonas> and if instead of a well-written memcpy you write your own char* loop then of course you don't get such a guarantee
14:55:38 <int-e> Hmm how closely does this question tie into the earlier discussion about CHERI?
14:58:11 <b_jonas> that said, I find zzo38's hypothetical quite esoteric. normally you'd either make those capabilities a special type with copy and destroy functions in the library (with possibly compiler support if they're sufficiently magical); or make them similar to unix file descriptors so they're a plain integer but you have to close them explicitly and use extra functions to pass to a different process (except
14:58:17 <b_jonas> they need not be like unix file descriptors in that they need not have a guarantee that creating a new capability takes the lowest unused number, and they could be in a different namespace than ordinary file descriptors); or if you really want plain data then you'd make it wider, sized between 16 and 64 bytes, and filled with random data each time you create, then they can be bytewise copied fine.
15:00:50 <ais523> fwiw I've been trying to get gcc to optimise a copy of an int array followed by a char array into one big memcpy, but it refuses to do so
15:01:06 <int-e> FWIW I wish zzo38 had explained the tag bit detail a bit sooner. It makes the question much clearer.
15:01:18 <ais523> clang will generate a call to libc memcpy, though
15:01:47 <b_jonas> ais523: is it in code where you aren't allowed to write past the end of the char array and that end needn't be aligned?
15:01:52 <ais523> but oddly it generates two calls for the int-then-char
15:02:10 <ais523> b_jonas: yes, if gcc is allowed to write past the end (e.g. struct padding) then it does
15:03:40 <ais523> actually the compilers are showing a big lack of joined-up-thinking here
15:04:01 <ais523> if I have a struct with two array fields, and copy the struct, it compiles to one memcpy – if I copy both fields, it compiles to two memcpies, even though that is equivalent
15:07:58 <ais523> this is true in both gcc and clang
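A hypothetical reproduction of that experiment (field sizes chosen so the two fields are contiguous with no padding, so the two copies really are equivalent to one):

    #include <string.h>

    struct pair { int a[16]; char b[64]; };  /* no padding between a and b */

    void copy_whole(struct pair *d, const struct pair *s) {
        *d = *s;                              /* one 128-byte copy */
    }

    void copy_fields(struct pair *d, const struct pair *s) {
        memcpy(d->a, s->a, sizeof d->a);      /* compilers emit this copy... */
        memcpy(d->b, s->b, sizeof d->b);      /* ...and this one, separately */
    }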
15:08:17 <ais523> although, clang uses a call to libc memcpy and gcc uses the processor's memcpy intrinsic
15:08:36 <ais523> (modern x86 and x86-64 have a memcpy routine in microcode that you can call into)
15:09:28 <int-e> "call" meaning `rep movs[size]`?
15:09:31 <ais523> it isn't immediately clear whether the processor's microcoded memcpy is faster or slower than one running in software – the issue is that the branch predictor cares about the memory addresses of branches in order to remember their histories
15:09:39 <b_jonas> sure, gcc has the right to use either a built-in memcpy that it can optimize inline or call a memcpy that libc or you provide
15:10:02 <ais523> so if the whole thing is written as a single rep movsq then the branch predictor has nowhere to store its memory of how the copy branched
15:10:44 <int-e> Naively... it /could/ attach it to the rep movs if it wanted to?
15:10:57 <ais523> int-e: there are multiple branches within the microcode, typically
15:12:14 <ais523> modern CPUs also have predictors for not just where branches go, but whether an instruction is a branch or not
15:12:29 <ais523> in order to be able to speculate past the "branch" before the instruction is even decoded
15:13:24 <int-e> heh then again you get extra speculation opportunities for `rep movs` because you know a lot of registers that won't change
15:13:35 <int-e> I mean if you want to
15:13:39 <ais523> AMD got hit with a new Spectre variant semi-recently in which the processor was trained into expecting that a return instruction would actually be a branch instruction, sending the speculative execution off to a gadget despite the existing Spectre mitigations
15:14:04 <int-e> (hmm it's not even speculation then, just out-of-order execution)
15:14:18 <ais523> I think I can reasonably get an efficient (i.e. implemented with aligned vector moves) memcpy down to just two branches
15:14:23 <ais523> one which special-cases short inputs and one for the loop
15:14:47 <ais523> (for long inputs you unconditionally copy the first vectorful of data using misaligned vector moves, likewise the last vectorful, and then just copy the aligned portion in between with a loop)
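A minimal AVX-era C sketch of that scheme (assumes n ≥ 32, non-overlapping buffers, unaligned loads with destination-aligned stores in the middle loop; the short-input special case would be the one branch outside this function):

    #include <immintrin.h>
    #include <stdint.h>
    #include <stddef.h>

    static void copy_ge32(unsigned char *dst, const unsigned char *src, size_t n)
    {
        /* unconditional first and last 32 bytes, using unaligned moves */
        __m256i head = _mm256_loadu_si256((const __m256i *)src);
        __m256i tail = _mm256_loadu_si256((const __m256i *)(src + n - 32));
        _mm256_storeu_si256((__m256i *)dst, head);

        /* middle: 32-byte blocks, aligned on the destination side */
        unsigned char *d = (unsigned char *)(((uintptr_t)dst + 32) & ~(uintptr_t)31);
        const unsigned char *s = src + (d - dst);
        unsigned char *limit = dst + n - 32;
        for (; d < limit; d += 32, s += 32)
            _mm256_store_si256((__m256i *)d,
                               _mm256_loadu_si256((const __m256i *)s));

        _mm256_storeu_si256((__m256i *)(dst + n - 32), tail);
    }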
15:15:05 <b_jonas> ais523: sure, the cpu wants to predict the target of the ret instruction, and I think it uses both the normal branch predictor for that and some special mechanism just for rets. that's why they even tell you to use a two-byte ret instead of the one-byte near ret.
15:15:12 <int-e> memcpy has to deal with unaligned cases though and that'll be a few extra jumps?
15:15:47 <ais523> b_jonas: the two-byte ret case doesn't apply any more – when it did, the reason was that the branch predictor had a limit for how often in bytes it could make predictions
15:15:56 <ais523> so if you had two executed jumps within a couple of bytes of each other, it would cause trouble
15:16:10 <int-e> I guess you can try extra arithmetic where you use a bit-wise or of the pointer arguments, mask the bottom bits, then add the length and compare that...
15:16:23 <ais523> int-e: no, the idea is you unconditionally copy the first (e.g.) 32 and last 32 bytes, using unaligned copies
15:16:24 <b_jonas> ais523: you know https://www.agner.org/optimize/#asmlib has an optimized memcpy written by someone who understands optimizing for different x86 cpus right? he can specialize it for different processor implementations because they behave differently
15:16:29 <ais523> then copy all the aligned 32-byte blocks in between
15:17:06 <int-e> ais523: you still have to check that the alignments match
15:17:09 <ais523> this works for any input that's at least 32 bytes long regardless of alignment (although if source and destination have different alignments you have to misalign either the reads or the writes, but you'd probably do that anyway)
15:17:27 <b_jonas> int-e: no you don't, if the alignments don't match then you still want unaligned reads and aligned writes
15:17:42 <b_jonas> so you can always just do aligned writes regardless of whether the alignment matches
15:18:00 <ais523> b_jonas: so I think the two reasonable ways to do it are either a) unaligned reads and aligned writes, or b) aligned reads, a shuffle/permute, and aligned writes
15:18:21 <int-e> b_jonas: but there are special instructions for aligned reads which might be a tad faster?
15:18:28 <ais523> b) isn't generally used, but it could in theory be faster
15:18:39 <ais523> int-e: those instructions aren't faster, they just trap given unaligned inputs
15:18:55 <ais523> aligned reads are faster than unaligned reads *but* this is true regardless of whether you use the aligned or unaligned read instruction
15:19:18 <ais523> you don't pay for an unaligned read unless the input actually is misaligned, regardless of which instruction you use
15:20:03 <ais523> (this is talking about modern x86, in which unaligned reads are actually as fast as aligned as long as they don't cross a cache line boundary)
15:20:21 <int-e> I guess that makes sense. But yeah it sounds like it might vary between architectures.
15:20:33 <ais523> (but normally when doing unaligned reads in a loop, you amortize the misalignment penalty across all the reads rather than treating the crossing and non-crossing ones differently)
15:20:58 <b_jonas> ais523: I don't think shuffle/permute is ever worth it for that, because if you must do an unaligned copy then the x86 processor optimizes the unaligned read better than you can with shuffles; however, you can argue that sometimes when you have written the source recently but won't read the destination for a while you may want to do an aligned read and unaligned write to reduce latency
15:21:16 <ais523> like, for 16-byte reads, those cross boundaries ¼ of the time when misaligned so you approximate by saying they're 25% slower (the boundary-crossing read runs at half speed)
15:21:25 <int-e> especially if you find an architecture that didn't start out with supporting unaligned reads for everything the way x86 did
15:22:07 <ais523> b_jonas: so the reason it might help is that the unaligned read is reading two cache lines, then the next read is reading the same cache line again, so you have 1 more memory access
15:22:34 <ais523> but, that access is to L1 (as you only just read it) which might allow it to beat the arithmetic
15:23:31 <b_jonas> ais523: yes, I think it can beat the unaligned read when the processor can do store-load forwarding because you just stored, but I think in that case an aligned read and unaligned write won't be worse than trying to shuffle
15:24:01 <ais523> oddly, I discovered recently that x86-64 has two different misaligned-vector-read instructions nowadays (MOVDQU and LDDQU) – LDDQU is apparently faster for misaligned reads in cases where the memory isn't written to in the near future, MOVDQU if it is
15:24:06 <int-e> Anyway. I see plenty of room for zzo38's plan to go wrong. Like... would a CPU really add support for those tags to all vector registers, if it has them? That sounds like a big ask...
15:24:32 <b_jonas> ok, I concede, I think there might be some rare case when a vector byte shift is faster
15:24:51 <b_jonas> I don't think I'd ever write that, but if you're very good at optimizing you could
15:24:51 <ais523> I'm not entirely sure what's going on microarchitecturally to cause that sort of performance difference
15:25:08 <ais523> I guess the real question here is "what is the copy bound by"
15:25:26 <ais523> if either end of the copy is not in L1 then I think the bottleneck will be L2/L3/main memory speed
15:25:54 <ais523> and the manually-aligned and unaligned-read versions will run at the same speed
15:26:57 <b_jonas> and if you do copies of short ranges (which is very common in real code) then you can easily mess up, and then the bound will be the cost you're imposing on the instruction decoding caches by using more complicated code than you should have
15:26:58 <ais523> a cache-line-boundary-crossing L1 read is the same speed as an aligned L2 read, so the cost of the misaligned read doesn't matter as soon as you hit L2, that would be a good reason to do the misaligned read version
15:27:15 <ais523> so memcpy entirely within L1 is the interesting case
15:28:02 <ais523> incidentally, is L3 slower or faster than correctly prefetched main memory nowadays? e.g. if you're just reading at sequentially increasing addresses, does hitting main memory rather than L3 even matter
15:29:00 <int-e> https://stackoverflow.com/questions/47425851/whats-the-difference-between-mm256-lddqu-si256-and-mm256-loadu-si256 says there's no difference except for Pentium 4?
15:29:03 <ais523> there's a program I've been performance-optimising for ages, and discovered the hardware prefetcher being cleverer than I expected
15:30:38 <ais523> int-e: ooh! so the difference is not "will write it soon" but "recently wrote it", and the instructions had different store-forwarding behaviour once
15:32:04 <b_jonas> yeah, the store-load-forwarding is why you may want aligned reads even if it means you have to do either unaligned writes or vector shifts, even though unaligned writes are usually a bad idea, but if you won't read the destination for a while then it might not matter too much
15:32:23 <int-e> Hmm so I guess the change is that they incorporated all the optimizations into movdqu for the common case (cached write-combining memory)
15:32:33 <b_jonas> I don't really know if it's ever really worth it, I just can't exclude that it is
15:32:57 <ais523> int-e: isn't the normal sort of cache called "write-through" by Intel? write-combining is something else, and rarely used
15:33:04 <b_jonas> I'm not sure I even want to know, I don't think I want to optimize code that involves unaligned stuff (like byte-string handling for network communication) to this level
15:34:35 <ais523> b_jonas: the case that got me looking at this was writing a compiler that's operating on data that it can't prove aligned, even though it logically should be
15:34:51 <ais523> the same sort of thing as "memcpy where you don't know where the pointer comes from"
15:35:22 <b_jonas> unrelated question. in libcurl for HTTP/HTTPS, how do I tell the library that it should close persistent connections now because I won't be starting a new request for a while and they'll be too old to reuse by then anyway? the documentation is unclear. I still want to keep other data (SSL and DNS caches) between the requests, just not the network connections.
15:40:33 <ais523> hmm, in practice, I think most mallocs don't guarantee cache-line alignment when allocating large amounts of memory
15:40:56 <ais523> logically they should – there's basically no cost to doing so and it could help in some situations
15:41:06 <ais523> (there is cost for small allocations, which would be a reason not to do it there)
15:44:51 <b_jonas> these days all the malloc libraries have an API for aligned allocations and I think they all support 64-byte alignment properly because that's a common thing to ask for, so if you want cache line alignment you can get it
15:50:23 <b_jonas> hold on, that's not true, glibc doesn't seem to expose an api for aligned realloc
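For reference, the standard aligned-allocation calls referred to above, as a minimal sketch (and indeed neither has an aligned-realloc counterpart in glibc):

    #include <stdlib.h>

    void *alloc_cacheline(size_t n) {
        void *p = NULL;
        /* POSIX; aligned_alloc(64, n) is the C11 equivalent */
        if (posix_memalign(&p, 64, n) != 0)
            return NULL;
        return p;
    }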
15:53:10 <int-e> ais523: I guess we're both wrong, the normal operation for RAM is write-back. Now I'll have to read what write-combining actually means to Intel. Write-through means all writes by the CPU end up as individual bus or memory transactions going out of the CPU. Write-combining relaxes that, but how much...
15:56:14 <int-e> (I actually checked how my own PC's RAM is currently configured. Which was less obvious than I thought it would be; /proc/mtrr doesn't list the default, only the special regions...)
15:56:51 <int-e> (but rdmsr 0x2FF works)
15:58:28 <ais523> int-e: ah right, yes, write-back
15:59:05 <ais523> my mental model for how write-combining works is that it has its own separate very small cache that is only used for writes, but I don't know whether that's correct or not
16:00:18 <ais523> non-temporal writes are write-combining even on regular memory, that's what makes me think it works like that
16:00:45 <ais523> <Intel> The non-temporal hint is implemented by using a write combining (WC) memory type protocol when writing the data to memory. Using this protocol, the processor does not write the data into the cache hierarchy, nor does it fetch the corresponding cache line from memory into the cache hierarchy.
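A minimal sketch of such a non-temporal (write-combining) store, matching the quoted text; the destination must be 32-byte aligned:

    #include <immintrin.h>

    void store_nt(__m256i *dst, __m256i v) {
        _mm256_stream_si256(dst, v);   /* VMOVNTDQ: bypasses the cache hierarchy */
    }
    /* after a run of streaming stores, an _mm_sfence() is typically issued
       before other agents are expected to observe the data */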
16:06:05 <int-e> "13.3.1 Buffering of Write Combining Memory Locations" is where it goes into details of operation
16:09:23 <int-e> Also MTRRs are not the end of the story... there's a page attribute table mechanism that I've never heard of. Can I inspect those? Hmm.
16:11:34 <int-e> I can, it's exposed in the debug-fs, cf. https://wiki.gentoo.org/wiki/MTRR_and_PAT
16:13:49 <int-e> and for me it only has entries in the lower 4GB, outside of RAM, so the default write-back should still win.
16:14:14 <int-e> but there are write-combining regions in there, presumably for the GPU
16:15:15 <int-e> this was... a fun rabbit hole
16:24:10 <int-e> Actually let me retract the vector register remark regarding tracking tag bits... you're already going to maintain those tag bits in caches and RAM, might just as well go the extra mile.
17:06:31 -!- tromp has joined.
18:01:57 <esolangs> [[Special:Log/move]] move_redir * 47 * moved [[NH3]] to [[NH]] over redirect
18:01:57 <esolangs> [[Special:Log/delete]] delete_redir * 47 * 47 deleted redirect [[NH]] by overwriting: Deleted to make way for move from "[[NH3]]"
18:02:57 <esolangs> [[NH]] https://esolangs.org/w/index.php?diff=154334&oldid=154332 * 47 * (-110)
18:04:08 <korvo> Tangent to the discussion's multiple points: how do folks feel about SWAR/broadword techniques? If we're storing a relatively small value with a relatively large number of tag bits, then we might end up with values that are 4+4, 8+4, or 8+8 bits wide.
18:32:13 <ais523> korvo: so most "tagging for security/safety", like CHERI uses and like zzo38 is using with capabilities, generally only needs to be done on wide values
18:32:48 <ais523> tagging small values is useful for things like carry bits, but normally SWAR-style vectorisation doesn't store carry bits (which is something that has caused some complexity for me recently)
18:33:39 <ais523> you can sometimes check for carry by comparing one of the inputs of an addition to the output, to see if the result was lower than an input (IIRC it can't be between, so you can compare to either input safely)
18:34:01 <ais523> that said, unless I'm missing something, AVX2 has neither carry bits nor unsigned integer comparisons
18:34:34 <ais523> and the "compare to an input" doesn't work to check overflow for signed addition
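The scalar form of that compare-to-an-input trick, as a minimal sketch (unsigned only, as noted):

    #include <stdint.h>
    #include <stdbool.h>

    /* unsigned add wraps modulo 2^64; it wrapped iff the result is below an input */
    static bool add_carries_u64(uint64_t a, uint64_t b, uint64_t *sum) {
        *sum = a + b;
        return *sum < a;   /* equivalently: *sum < b */
    }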
18:44:32 <b_jonas> wait AVX2 doesn't have unsigned comparison? let me check that
18:45:07 <b_jonas> I thought even SSE2 had unsigned comparison
18:52:20 <b_jonas> huh, you're right, it takes three instructions to unsigned compare in SSE2. only AVX512 adds an instruction for it.
18:57:53 <b_jonas> nope, only *two* instructions for an unsigned compare: PMINU[BWDQ] followed by PCMPEQ[BWDQ]
19:23:28 <ais523> b_jonas: I think it's still three – that implements ≥ / ≤ not > / <
19:26:04 <ais523> and you need an extra instruction to invert the sense of the result
19:28:50 <b_jonas> ais523: kind of, but you usually don't need an extra instruction to invert a boolean vector, you can almost always just merge it for free into the next instruction that uses it (possibly by having that instruction give an inverted result)
19:29:42 <b_jonas> and this is easy to implement in library level with an inverted vector type and overloaded functions
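A minimal SSE2 sketch of that two-instruction sequence for bytes (it computes unsigned ≤, with all-ones lanes where true; any inversion is absorbed by the consumer as discussed):

    #include <emmintrin.h>

    /* 0xFF in each byte lane where a <= b (unsigned): PMINUB then PCMPEQB */
    static __m128i cmple_epu8(__m128i a, __m128i b) {
        return _mm_cmpeq_epi8(_mm_min_epu8(a, b), a);
    }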
19:30:04 <ais523> ooh, is that why PANDN exists in addition to PAND?
19:30:27 <ais523> (there isn't a PORN, maybe they didn't like the mnemonic)
19:31:34 <b_jonas> yes, this way PAND, PANDN, POR, PXOR cover all the boolean operations
19:32:31 -!- mtm has quit (Ping timeout: 252 seconds).
19:32:33 <b_jonas> I partly implemented this inverted scheme in the SSE4_1 vector library for work back when not every computer that I was working with had AVX yet
19:33:38 <b_jonas> I didn't completely implement it because there are operations that I didn't add, but while I was the only user it would be easy to just add more functions as I need them
19:34:08 <b_jonas> and I indeed added some in later versions
19:37:23 -!- mtm has joined.
19:38:29 <b_jonas> I mean PAND, PANDN, POR, PXOR cover all the two-input *bitwise* operations if you allow that sometimes the output is complemented
19:39:56 <b_jonas> one use is a PAND, PANDN, POR sequence for componentwise branchless if-then-else conditional
19:39:58 <ais523> I guess i'm thinking mostly of when you want to use the boolean as an if conditional (but in a SIMDy way, i.e. you evaluate both sides)
19:40:12 <ais523> in that case having an inverted result might not be what you want
19:43:11 <b_jonas> yes, sometimes you need to insert an extra invert.
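A minimal sketch of that PAND/PANDN/POR if-then-else, with mask lanes assumed to be all-ones or all-zeroes:

    #include <emmintrin.h>

    /* lane-wise: mask ? a : b */
    static __m128i select_epi8(__m128i mask, __m128i a, __m128i b) {
        return _mm_or_si128(_mm_and_si128(mask, a),
                            _mm_andnot_si128(mask, b));   /* (~mask) & b */
    }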
19:43:53 -!- Sgeo has joined.
20:12:22 <b_jonas> ok, yet another different question. do you think I can still claim that sqlite3 is the second most installed software library in the world? I talked about this at some point on esoteric, but that was years ago and something like that could change during the years.
20:12:59 <b_jonas> in particular, do Windowses come with a copy of tzdb, or do they have an entirely different implementation? because maybe tzdb is ahead of sqlite3
20:16:50 <esolangs> [[Language list]] M https://esolangs.org/w/index.php?diff=154335&oldid=154325 * Buckets * (+14)
20:17:26 <b_jonas> and I guess the netlib code for decimal formatting of floating point may be a candidate too
20:18:02 <b_jonas> I want this for my cv or job interviews so it doesn't have to be strictly true, but it at least has to be plausible enough that a potential employer can believe it
20:18:18 <esolangs> [[User:Buckets]] M https://esolangs.org/w/index.php?diff=154336&oldid=154309 * Buckets * (+13)
20:18:57 <esolangs> [[Go back]] N https://esolangs.org/w/index.php?oldid=154337 * Buckets * (+1286) Created page with "Go back is an Esoteric programming language created by [[User:Buckets]] in 2020. {| class="wikitable" |- ! Commands !! Instructions |- | #[] || Infinite Loop/Start. |- | [] || Infinite Loop. |- | +m || Change the current Loop by + m. |- | -n || Change the current Loop b
20:21:01 <b_jonas> oh, https://sqlite.org/mostdeployed.html now talks about this explicitly
20:21:15 <b_jonas> it says "probably one of the top five"
20:21:47 <b_jonas> and "our best guess is ... second most deployed"
20:22:23 <b_jonas> ok, then I can absolutely round that up to second most deployed in a job interview, regardless of tzdb
20:23:15 <esolangs> [[Esorn]] M https://esolangs.org/w/index.php?diff=154338&oldid=154315 * Buckets * (+1)
20:24:29 <esolangs> [[Esorn]] M https://esolangs.org/w/index.php?diff=154339&oldid=154338 * Buckets * (+1)
20:26:57 <esolangs> [[Happy]] M https://esolangs.org/w/index.php?diff=154340&oldid=153949 * Buckets * (+1)
20:31:00 <esolangs> [[]] M https://esolangs.org/w/index.php?diff=154341&oldid=154304 * Buckets * (-1)
20:34:26 <zzo38> If capabilities are required to be aligned, then it is simpler; each general-purpose register needs one tag bit and each eight bytes of general-purpose RAM needs one tag bit, and you don't have to worry so much about broken apart copies, etc, although there is still the consideration of doing unaligned copies of smaller data.
20:35:06 <zzo38> However, I think there are some reasons why you might want unaligned capabilities; this is more difficult but I think still might be able to be done.
20:35:31 <zzo38> With unaligned capabilities, each general-purpose register has one tag bit and each byte of general-purpose RAM has two tag bits.
20:37:05 <zzo38> Copying blocks of memory into a place that will overwrite part of a capability should be safe (although there are a few considerations having to do with virtual memory), but it will have to ensure that the first byte of the source is not a non-first byte of a capability and the last byte of the source is not a non-last byte of a capability.
20:37:32 <esolangs> [[Happy]] https://esolangs.org/w/index.php?diff=154342&oldid=154340 * Ractangle * (+0)
20:37:53 <esolangs> [[Happy]] https://esolangs.org/w/index.php?diff=154343&oldid=154342 * Ractangle * (+0)
20:38:06 <esolangs> [[BrainWrite]] M https://esolangs.org/w/index.php?diff=154344&oldid=96575 * Buckets * (+220)
20:38:19 <ais523> <b_jonas> ok, yet another different question. do you think I can still claim that sqlite3 is the second most installed software library in the world? I talked about this at some point on esoteric, but that was years ago and something like that could change during the years. ← I would have guessed zlib as most-installed, although I forgot about tzdb
20:38:51 <ais523> ah, the page you link also mentions zlib
20:39:43 <esolangs> [[Array?]] https://esolangs.org/w/index.php?diff=154345&oldid=149392 * Ractangle * (-6) /* Implementation */
20:40:00 <b_jonas> yes, I also assume that zlib has more copies than sqlite3
20:40:08 <esolangs> [[Special:Log/move]] move * Ractangle * moved [[Array?]] to [[UNAI]]
20:40:08 <esolangs> [[Special:Log/move]] move * Ractangle * moved [[Talk:Array?]] to [[Talk:UNAI]]
20:40:29 <ais523> libpng is also mentioned on the sqlite page, although I would expect libpng to depend on zlib
20:40:39 <ais523> in which case it couldn't reasonably beat it
20:40:44 <b_jonas> zlib will eventually lose out as more modern compression libraries that also have their own implementation of zlib take it over, but that'll take like decades
20:40:53 <esolangs> [[UNAI]] https://esolangs.org/w/index.php?diff=154350&oldid=154346 * Ractangle * (-7)
20:41:55 <ais523> I'm not convinced, most computers with a more modern compression library will still need zlib for things that have it as a dependency
20:42:24 <ais523> I used zlib as a dependency of NH4 on the basis that the user was very likely to have a copy already
20:42:47 <ais523> hmm… now I'm thinking about the way that on Windows, programs are expected to ship their own libcs
20:43:25 <zzo38> Some programs might have different implementations of DEFLATE
20:43:38 <ais523> it makes it very difficult to use open-source compilers on Windows, because the libc needs to link against OS internals which aren't widely known (I think they might be publicly available but am not sure) and the Microsoft-provided libcs have license conditions preventing you linking against them with your own non-MSVC compilers
20:43:41 <esolangs> [[UNAI]] https://esolangs.org/w/index.php?diff=154351&oldid=154350 * Ractangle * (-91) /* Commands */
20:44:04 <esolangs> [[UNAI]] https://esolangs.org/w/index.php?diff=154352&oldid=154351 * Ractangle * (+0) /* Hello, world! */
20:44:18 <ais523> mingw works by linking against an OS-provided library which has a lot of libc functionality in it, although it isn't a standard libc and not all the functions in it obey the C specification
20:44:31 <ais523> and Microsoft say not to do that, but don't offer reasonable alternatives
20:44:33 <esolangs> [[UNAI]] https://esolangs.org/w/index.php?diff=154353&oldid=154352 * Ractangle * (-2) /* Cat program */
20:44:51 <zzo38> Will ReactOS be helpful for making open-source compilers on Windows?
20:45:13 <b_jonas> I don't understand, aren't you just supposed to use the libc that comes with MSVC, even if it is not compliant with the C specification in a lot of ways?
20:45:19 <ais523> zzo38: that's an interesting point, actually – ReactOS/Wine's shared codebase would probably be a useful target to develop against if writing a Windows libc from scratch
20:46:35 <ais523> b_jonas: you can't do that legally, but I think you have confused two libcs – there is the libc that comes with MSVC which is mostly specification-compliant, and which you can distribute with your programs but can't legally link against with non-MSVC compilers, so mingw can't use it – and there is the libc-like library which comes with Windows (called MSVCRT.DLL) which is not very standards-compliant
20:46:48 <ais523> (yes, it's confusing that the one called MSVC is the one that doesn't come with MSVC)
20:47:10 <b_jonas> ais523: you can't ship that MSVC libc, but can't you link to it and have the user install the library separately, downloading it from Microsoft if it doesn't ship with Windows?
20:47:49 <b_jonas> I'm very likely confused about how all this works
20:49:54 <b_jonas> also for just the operating system interfaces, aren't they in a separate library that you can link to without linking to any libc-like thing?
20:50:13 <ais523> b_jonas: yes but the documentation for that is hard to find
20:50:23 <ais523> I think that's what Microsoft's "official" suggestion is, but AFAICT nobody has actually done that
20:50:38 <ais523> (i.e. "if you're writing your own compiler you don't get to leech off our libc")
20:51:02 <ais523> I wonder whether any of the existing Linux/Unix libcs works on Windows, using Windows kernel interfaces
20:51:08 <b_jonas> what does the clang C and C++ environment that zig distributes these days do?
20:52:22 <ais523> b_jonas: found it: "Distribution of the Visual C++ Runtime Redistributable package, merge modules, and individual binaries is limited to licensed Visual Studio users and is subject to such License Terms."
20:52:47 <ais523> so either the developer or the end user would need a Visual Studio license
20:53:12 <ais523> I do have a Visual Studio license, but for a computer that is no longer operational
20:53:28 <b_jonas> ais523: that says "distribution". you can still link to it, and have the user download it from microsoft, can't you? like with microsoft's fonts
20:54:20 <ais523> maybe? this is the point where I would start to not trust a programmer's interpretation of the law
20:54:57 <b_jonas> I mean, all this nonsense about windows being hard to develop to is part of why I prefer to develop for linux rather than windows. It's not even just libc directly, but that many other programmers find windows development difficult and so don't port their libraries to windows in a convenient form, so anything I try will run into a dependency being hard to use. Or the dependencies are available, but they
20:55:00 <ais523> it would be incompatible with the GPL (which only lets you link to closed-source system libraries in certain circumstances, and this isn't one of them)
20:55:03 <b_jonas> are for different pseudo-architectures on windows, like they use different libcs or compilers, and I can't link them together.
20:55:20 <ais523> b_jonas: I agree, although Windows is less hostile to develop for than Mac OS X nowadays
20:55:41 <ais523> Apple have lots of rules for developers to follow, which wouldn't be as bad if they didn't keep changing
20:56:07 <b_jonas> and game consoles are even worse, sure
20:56:26 <b_jonas> but OS X just never came up, whereas I use Windows for work all the time, so I do encounter that problem
20:56:41 <ais523> I haven't tried to develop for Android but a former coworker did, I'd put it somewhere around the Windows level of difficulty
20:57:12 <ais523> at least Windows can be cross-compiled for relatively simply, meaning that the only real problem is the dependencies
20:57:38 <b_jonas> exactly, needing a visual studio license as a developer isn't a problem if it's for work
20:57:45 <ais523> BSD is easy to develop for, just as good as Linux
20:57:54 <b_jonas> but I don't want to have to spend time porting every dependency
20:58:53 <ais523> I haven't targeted many other things – some bare-metal microcontrollers where the major issue is that the toolchains are usually terrible
20:59:03 <b_jonas> like most potential dependencies are well-written and should be possible to port, but I shouldn't have to
20:59:21 <ais523> for one of them, the officially supported compiler was an old version of gcc that had been patched to call out to a separate executable to see whether you'd paid for a license
20:59:44 <ais523> this is not technically a GPL breach because nothing was preventing you patching it back again, but they were hoping that people wouldn't go to the trouble of recompiling
21:00:21 <b_jonas> yeah, I don't care too much about the bare-metal microcontrollers, but they happen to be useful in that a lot of the rust and zig community do care about such uses, and that makes rust and zig get more attention and better development even for windows or linux
21:00:28 <ais523> (or you could just replace the separate executable with your own executable that implemented the same API)
21:01:12 <ais523> most of my microcontroller development was done in asm, though
21:01:27 <b_jonas> yes, shapez the game is following that model: they have a (cheap) paid licensed version sold on Steam and an open source version where you have to go through some slight inconvenience to compile it yourself
21:01:35 <ais523> because I was working with microcontrollers too small for most compilers to be able to target them, as their runtimes wouldn't fit
21:02:25 <ais523> they had substantially less than a kilobyte of RAM
21:02:36 <ais523> and a few kilobytes of flash memory to store the program
22:14:56 -!- tromp has quit (Quit: My iMac has gone to sleep. ZZZzzz…).
22:17:25 -!- tromp has joined.
22:59:09 -!- tromp has quit (Quit: My iMac has gone to sleep. ZZZzzz…).
23:00:45 <zzo38> I had seen many requests from different IP addresses and user-agents that do not seem to be legitimate, and had read that apparently they are badly implemented LLM scrapers.
23:01:23 <zzo38> I have seen, and also had, some ideas about how to work against it or how to stop it. Will the use of an EICAR test file help at all?
23:04:10 <zzo38> Can redirects be used to confuse them?
23:05:11 <zzo38> (possibly redirects to http://127.0.0.1/ if that can be used to control anything?)
23:09:52 <ais523> zzo38: there are lots of different badly implemented scrapers out there
23:10:09 <ais523> and no consensus about how to deal with them
23:11:54 <zzo38> They all seem to claim to be Mozilla-based, and use different IP addresses for every request (possibly with the exception of redirects, as far as I can tell)
23:12:26 -!- alec3660 has quit (Quit: https://quassel-irc.org - Chat comfortably. Anywhere.).
23:12:55 -!- alec3660 has joined.
23:15:31 <esolangs> [[Countup]] M https://esolangs.org/w/index.php?diff=154354&oldid=101775 * Buckets * (+2)