00:45:50 <fizzie> WDYT, if an IRC client sends a periodic PING to avoid the dreaded "TCP connection was lost but the client has nothing to write" issue, should it bother to try to also verify the server responds with a corresponding PONG, or is that superfluous? It's definitely not necessary for the TCP thing, but hypothetically there might be a server that continues to speak TCP but not respond to commands. And
00:45:52 <fizzie> often it's used to estimate latency, but that's a different feature.
00:48:44 <zzo38> I check manually. I have the F2 key bound to PING and then I can see if PONG is received or not. (For automated IRC clients, it could check automatically)
00:50:21 <fizzie> Yes, this would be for an automaton. Just wondering if it's a failure mode that really needs worrying about, assuming I don't care about estimating the latency to try to jump servers if it's too high or w/e.
00:55:00 <zzo38> At least in my experience, if I try to PING and it isn't working, there will eventually be a connection error anyways. However, it might be worth checking after some (configurable) timeout anyways.
00:56:09 <zzo38> It has also happened to me that I was able to receive but not send. In this case, eventually the server will disconnect me due to a ping timeout.
00:56:16 <shachaf> What if it continues to speak TCP and respond to pings, but not to other commands?
00:57:09 <zzo38> Then I would think that the server is defective, probably.
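The watchdog fizzie is describing can be sketched as a small state machine: the client sends its own PING on a timer and treats a long-missing PONG as a dead link, while still answering the server's PINGs. This is a minimal illustrative sketch, not any real client's code; the interval and timeout values are made up.

```python
import time

class PingWatchdog:
    """Tracks client-initiated PINGs and the matching PONGs.

    ping_interval and pong_timeout are illustrative defaults, not
    anything mandated by the IRC protocol.
    """
    def __init__(self, ping_interval=60.0, pong_timeout=120.0):
        self.ping_interval = ping_interval
        self.pong_timeout = pong_timeout
        self.last_ping = float("-inf")
        self.last_pong = time.monotonic()

    def tick(self, now):
        """Return a PING line to send now, or None; raise if the link looks dead."""
        if now - self.last_pong >= self.pong_timeout:
            raise ConnectionError("no PONG within timeout; assuming dead link")
        if now - self.last_ping >= self.ping_interval:
            self.last_ping = now
            return "PING :keepalive\r\n"
        return None

    def handle_line(self, line, now):
        """Feed each line received from the server; returns a reply to send, if any."""
        tokens = line.rstrip("\r\n").split(" ")
        # ":server PONG server :keepalive" -- our keepalive was answered
        if len(tokens) >= 2 and tokens[1] == "PONG":
            self.last_pong = now
        # "PING :token" -- the server's own liveness probe, which we must answer
        elif tokens[0] == "PING":
            return "PONG " + " ".join(tokens[1:]) + "\r\n"
        return None
```

This covers both directions of the discussion: the mandatory reply to server PINGs and the optional verification of the client's own ones.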
00:57:47 <fizzie> I guess I could have it privmsg me to solve a CAPTCHA. But then what if the server sends that message to some other human who responds to it?
00:58:11 -!- chiselfuse has quit (Remote host closed the connection).
00:58:25 -!- chiselfuse has joined.
00:58:46 <zzo38> You can write a question that you do not expect anyone else to know the answer
01:31:14 -!- earendel has joined.
01:53:17 -!- dutch has joined.
05:27:01 <nakilon> fizzie I always thought the client is just supposed to respond to PING with PONG, not also send periodic PINGs of its own
05:31:28 <nakilon> also, if we believe this line https://github.com/Nakilon/nakiircbot/blob/43bf3dfa932e78f19b656520d29629c9bf94c5bc/lib/nakiircbot.rb#L99 Quakenet used this command for measuring the latency too
05:33:34 <nakilon> I mean when I was making this comment I was reusing some old Quakenet bot that IIRC had the timestamp parsing in it
05:33:57 <nakilon> but as it says in case of Libera there is just server name there
06:08:49 <esolangs> [[School]] https://esolangs.org/w/index.php?diff=87989&oldid=87986 * AceKiron * (+391) Added the PUSH and POP memory operants
07:54:15 <esolangs> [[Matrix (data structure)]] N https://esolangs.org/w/index.php?oldid=87990 * AceKiron * (+174) Created page with "A **matrix** is a data structure that can serve as an programming language's memory. The number of stacks may vary. Many languages have other methods of data storing as well."
07:54:27 <esolangs> [[Matrix (data structure)]] https://esolangs.org/w/index.php?diff=87991&oldid=87990 * AceKiron * (+3)
07:55:43 <esolangs> [[Matrix (data structure)]] https://esolangs.org/w/index.php?diff=87992&oldid=87991 * AceKiron * (+105)
07:56:17 <b_jonas> fizzie: I check PONG replies anyway to know when the server has processed my previous commands, which I need to know to not send more commands to the server that fit in its buffer, or else it would quit me.
07:56:29 <b_jonas> and at that point you probably want a timeout too
07:58:22 <esolangs> [[Category:Matrix-based]] N https://esolangs.org/w/index.php?oldid=87993 * AceKiron * (+181) Created page with "Languages primarily using one or more [[Matrix_(data_structure)|matrix]]s for storage. ==See also== * [[:Category:Queue-based]] * [[:Category:Stack-based]] Category:Langu..."
07:59:15 <esolangs> [[School]] https://esolangs.org/w/index.php?diff=87994&oldid=87989 * AceKiron * (+225)
07:59:58 <esolangs> [[Matrix (data structure)]] https://esolangs.org/w/index.php?diff=87995&oldid=87992 * AceKiron * (+10)
08:01:00 <esolangs> [[Matrix (data structure)]] https://esolangs.org/w/index.php?diff=87996&oldid=87995 * AceKiron * (+46)
08:06:20 -!- hendursa1 has joined.
08:08:51 -!- hendursaga has quit (Ping timeout: 276 seconds).
08:21:22 <esolangs> [[Matrix (data structure)]] https://esolangs.org/w/index.php?diff=87997&oldid=87996 * AceKiron * (+1)
08:47:39 <esolangs> [[Matrix (data structure)]] https://esolangs.org/w/index.php?diff=87998&oldid=87997 * AceKiron * (+64)
08:51:10 <esolangs> [[School]] https://esolangs.org/w/index.php?diff=87999&oldid=87994 * AceKiron * (-2)
09:05:37 -!- spruit11_ has quit (Quit: https://quassel-irc.org - Chat comfortably. Anywhere.).
09:05:59 -!- spruit11 has joined.
09:31:40 -!- Koen_ has joined.
09:32:53 -!- Sgeo has quit (Read error: Connection reset by peer).
09:48:52 <esolangs> [[School]] https://esolangs.org/w/index.php?diff=88000&oldid=87999 * AceKiron * (+15) /* Memory operants */
09:51:30 -!- Trieste_ has joined.
09:51:46 -!- Trieste has quit (Ping timeout: 240 seconds).
09:58:02 -!- Oshawott has joined.
10:01:31 -!- archenoth has quit (Ping timeout: 252 seconds).
11:04:25 <esolangs> [[Special:Log/newusers]] create * Bsoelch * New user account
11:20:40 -!- hanif has joined.
11:26:01 <fizzie> Yes, I mean, the client does need to respond to PING with a PONG, but that's a different thing.
11:30:30 <riv> does IRC need ping and pong? doesn't TCP already have this basically
11:34:04 <fizzie> TCP has an *optional* keepalive option. But I don't think it's very popular compared to application protocol heartbeats.
11:38:24 <fizzie> As for not sending too many things, I'm using a credit-based system (each byte costs so and so, some commands have an extra surcharge, the client gets credit at a fixed rate capped to some maximum value) to approximate that. That's what ircd (at least the real one, the one used at IRCnet) does on the server side. Of course it's not exactly exact due to network latency and so on, but it's been
11:40:23 <fizzie> On keepalive, IIRC the default timeouts tend to be huge (hours), and configurable only system-wide.
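fizzie's credit-based system is essentially a token bucket: each outgoing byte costs credit, some commands carry a surcharge, and credit refills at a fixed rate up to a cap. A sketch under those assumptions; every constant here is illustrative, not taken from any real ircd config.

```python
class CreditThrottle:
    """Client-side approximation of ircd-style flood limits.

    Each outgoing byte costs one credit (plus an optional per-command
    surcharge); credit refills at a fixed rate up to a cap.  All the
    numbers are illustrative.
    """
    def __init__(self, rate=100.0, burst=1000.0, surcharges=None):
        self.rate = rate          # credits regained per second
        self.burst = burst        # maximum stored credit
        self.credit = burst
        self.last = 0.0
        # hypothetical surcharges for "expensive" commands
        self.surcharges = surcharges or {"JOIN": 100, "NICK": 100}

    def _refill(self, now):
        self.credit = min(self.burst, self.credit + (now - self.last) * self.rate)
        self.last = now

    def try_send(self, line, now):
        """Return True if `line` fits in the current credit, and charge for it."""
        self._refill(now)
        cost = len(line) + self.surcharges.get(line.split(" ", 1)[0], 0)
        if cost > self.credit:
            return False
        self.credit -= cost
        return True
```

As fizzie notes, this can only approximate the server's own accounting, since the client can't see network latency or the server's exact constants.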
11:48:12 <esolangs> [[Meow]] https://esolangs.org/w/index.php?diff=88001&oldid=87959 * Martsadas * (+20) /* fixed mistakes*/
11:49:14 <esolangs> [[Meow]] M https://esolangs.org/w/index.php?diff=88002&oldid=88001 * Martsadas * (+27)
12:15:51 -!- hanif has quit (Ping timeout: 276 seconds).
12:50:37 -!- earendel has quit (Quit: Connection closed for inactivity).
12:52:38 <esolangs> [[Matrix]] M https://esolangs.org/w/index.php?diff=88003&oldid=42721 * PythonshellDebugwindow * (+50) Confusion
12:52:48 <esolangs> [[Matrix (data structure)]] M https://esolangs.org/w/index.php?diff=88004&oldid=87998 * PythonshellDebugwindow * (+50) Confusion
12:52:57 <esolangs> [[Matrix (data structure)]] M https://esolangs.org/w/index.php?diff=88005&oldid=88004 * PythonshellDebugwindow * (-17) m
12:58:01 -!- hanif has joined.
13:12:27 -!- hendursa1 has quit (Quit: hendursa1).
13:12:53 -!- hendursaga has joined.
13:40:19 <esolangs> [[Special:Log/newusers]] create * 4gboframram * New user account
13:47:28 <esolangs> [[Esolang:Introduce yourself]] https://esolangs.org/w/index.php?diff=88006&oldid=87982 * 4gboframram * (+184) /* Introductions */
14:17:17 -!- delta23 has joined.
14:32:41 -!- Koen_ has quit (Remote host closed the connection).
14:33:22 -!- velik has quit (Remote host closed the connection).
14:34:00 -!- velik has joined.
14:40:11 -!- velik has quit (Remote host closed the connection).
14:40:29 -!- velik has joined.
14:41:55 -!- velik has quit (Remote host closed the connection).
14:42:13 -!- velik has joined.
14:45:28 -!- velik has quit (Remote host closed the connection).
14:47:31 -!- velik has joined.
15:03:05 -!- normsaa has joined.
15:03:22 <normsaa> https://pastebin.com/px6HUCLV how can this binary be decoded?
15:06:38 <Corbin> normsaa: Where did it come from?
15:23:26 <int-e> . o O ( it's too bad that the flag doesn't identify the CTF this is from )
15:23:39 <int-e> normsaa: tell your "friend" to solve the problem properly, by themselves.
15:39:04 <Corbin> Also tell your friend to fix the overlapping assignment, which probably breaks the script.
15:51:39 -!- hanif has quit (Ping timeout: 276 seconds).
15:56:32 -!- hanif has joined.
16:03:37 -!- Koen_ has joined.
16:11:38 <b_jonas> riv: yes, IRC sort of requires PING and PONG for at least three reasons. some servers (not freenode, I haven't looked at libera yet) require that you send *one* pong after connecting, copying an unpredictable code from the PING that the server sends, as a sort of anti-spam measure. second, some servers, including freenode (again, haven't looked at libera yet) require that the client sends something
16:11:44 <b_jonas> every five minutes, to ensure that it can drop clients that are disconnected. it ensures that clients do this by sending pings, you don't need to reply to those technically, but replying to pings is an easy way to satisfy this requirement.
16:14:38 <b_jonas> thirdly, you can use pings for flow control. the way IRC works is that the server has a very small input buffer for each client, and if the client sends more than that input buffer over what the server has handled locally, it disconnects the client. the server handles commands for one client in series, so if you pay attention to local replies (replies from that server, not other servers), you can
16:14:44 <b_jonas> sometimes tell how much the server handled, and so how full the queue is. but not all commands have local replies, or the local reply isn't always easy to identify, so sometimes you want to send a command just to force a local reply. the best command for that is a local PING (as opposed to a PING to a different server), since that does nothing but send you a reply.
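The flow-control trick b_jonas describes can be sketched as follows: tag each probe PING with a sequence number; when the matching PONG arrives, everything sent before that PING is known to have been drained from the server's input buffer. This is an illustrative sketch of the technique, not any particular client's code, and the "probe-" token format is made up.

```python
class QueueProbe:
    """Estimate how much of what we've sent the server has processed.

    After a command (or batch) we send a local "PING :probe-<seq>"; the
    matching PONG proves everything sent before that PING has been
    consumed from the server's input buffer.
    """
    def __init__(self):
        self.seq = 0
        self.sent_bytes = 0      # bytes sent so far
        self.acked_bytes = 0     # bytes known to be processed
        self.pending = {}        # seq -> sent_bytes high-water mark at PING time

    def record_send(self, line):
        self.sent_bytes += len(line)

    def make_ping(self):
        """Build the next probe PING and remember the current send mark."""
        self.seq += 1
        line = "PING :probe-%d\r\n" % self.seq
        self.pending[self.seq] = self.sent_bytes + len(line)
        return line

    def handle_pong(self, token):
        """Feed the trailing token of a PONG, e.g. 'probe-3'."""
        if token.startswith("probe-"):
            mark = self.pending.pop(int(token[len("probe-"):]), None)
            if mark is not None:
                self.acked_bytes = max(self.acked_bytes, mark)

    def in_flight(self):
        """Upper bound on bytes still sitting in the server's input buffer."""
        return self.sent_bytes - self.acked_bytes
```

A client would hold back further commands while `in_flight()` approaches the server's buffer size.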
16:20:54 -!- velik has quit (Remote host closed the connection).
16:21:23 -!- velik has joined.
16:22:00 -!- velik has quit (Remote host closed the connection).
16:22:17 -!- velik has joined.
16:23:59 <b_jonas> int-e: wait, do you actually recognize what that is, or do you just know it's homework from what it looks like and how they simultaneously cross-post on multiple channels?
16:30:29 <fizzie> int-e: I was half-expecting doing a web search for the flag value would tell you where it's from (surely all of those have answers posted online?), but apparently it doesn't.
16:31:05 <int-e> fizzie: yeah. which /could/ indicate that it's an ongoing one, or just that it's very obscure
16:31:27 <fizzie> "It looks like there aren't many great matches for your search. Tip: Try using words that might appear on the page that you’re looking for. For example, 'cake recipes' instead of 'how to make a cake'."
16:32:57 <int-e> . o ( glados instead of cake )
16:38:06 <hanif> google gave me this video https://www.youtube.com/watch?v=JMrd8PoxvPc, but the author doesn't do this challenge in the video
16:38:06 <keegan> it's a piece of cake to bake a pretty cake
16:38:36 <hanif> and the ctf site linked is dead and unarchived
16:46:43 <b_jonas> fizzie: there's technically a third, most unlikely case: that it's from a site like Advent of Code that gives every logged in user a different test input
16:59:58 -!- Koen_ has quit (Remote host closed the connection).
17:00:26 -!- j-bot has quit (Remote host closed the connection).
17:00:40 -!- j-bot has joined.
17:07:26 -!- oerjan has joined.
17:21:23 -!- arseniiv has joined.
17:30:35 -!- normsaa90 has joined.
17:33:15 -!- normsaa has quit (Ping timeout: 256 seconds).
17:36:23 -!- normsaa has joined.
17:39:29 -!- normsaa90 has quit (Ping timeout: 256 seconds).
17:40:12 -!- hanif has quit (Ping timeout: 276 seconds).
17:41:39 -!- normsaa91 has joined.
17:41:45 -!- normsaa has quit (Ping timeout: 256 seconds).
18:02:07 -!- immibis has quit (Remote host closed the connection).
18:05:43 -!- immibis has joined.
18:05:47 -!- Sgeo has joined.
18:12:21 -!- normsaa91 has quit (Ping timeout: 256 seconds).
18:14:59 -!- Guest81 has joined.
18:15:41 -!- Guest81 has quit (Client Quit).
18:16:02 -!- normsaa has joined.
18:31:02 <fizzie> "Site compatible with IE 10 or above, Mozila [sic], ..." is probably not a good sign.
18:37:06 <velik> OW -- Wikimedia disambiguation page https://en.wikipedia.org/wiki/OW
18:38:23 <nakilon> looks like sometimes there is a default page and sometimes not
18:38:53 <b_jonas> fizzie: does it also have a link to where you can download Acrobat Reader to view their PDFs and the Java and Adobe Flash plugins, without mentioning Oracle for Java?
18:39:31 <b_jonas> also do they recommend an at least 256-color, at least 1024x768 pixel resolution display for best view?
18:40:54 <b_jonas> very long ago I made a script to say "best viewed with Mozilla" or "best viewed with Internet Explorer", always the one the viewer is *not* using
18:41:49 <nakilon> b_jonas do you know who chukchas are?
18:42:22 <nakilon> ethnic Siberians who live deep in tundra with deers
18:43:02 <zzo38> b_jonas: What will do if neither is use?
18:43:22 <nakilon> smth like: "to keep chukcha busy give him a paper with 'read on the other side' written on both sides"
18:44:42 <velik> Chukchi people -- ethnic group https://en.wikipedia.org/wiki/Chukchi_people
18:51:00 <b_jonas> zzo38: one of them was the default. I don't remember which.
18:52:19 <fizzie> It didn't have those other things. Maybe it would have elsewhere on the site.
18:57:13 <esolangs> [[School]] https://esolangs.org/w/index.php?diff=88007&oldid=88000 * AceKiron * (-80)
19:00:25 -!- ais523 has joined.
19:01:08 <ais523> I think the reason why some servers ping during connection, and don't connect until they receive a matching pong, is to prevent non-IRC-related programs being tricked into connecting to IRC
19:01:33 <ais523> if your ircd ignores invalid commands (and many do), it isn't hard to put a segment of valid IRC commands in the middle of, say, an HTTP POST request
19:01:39 <riv> that is a good reason but only requires one PING right at the start
19:01:55 <keegan> I remember a spam attack on Freenode that worked by exactly that mechanism
19:02:01 <ais523> so you can create a web page with a script that causes the viewers to spam IRC, and this has been used to create IRC worms in the past
19:02:21 <keegan> it would POST a set of IRC commands that cause the user to join a bunch of channels and spam them with the URL of the page
19:02:50 <keegan> Postel's Law sounds good but is absolutely terrible for security
19:03:05 <riv> Postel's Law is bad
19:03:14 <keegan> also bad for long term maintainability
19:03:22 <riv> i remember that spam attack, that was funny
19:03:35 <velik> robustness principle -- design guideline for software that states: "be conservative in what you do, be liberal in what you accept from others" https://en.wikipedia.org/wiki/Robustness_principle
19:05:54 <keegan> as users expect the "best guess" behavior of implementations will continue working forever
19:06:11 <keegan> leading to the codification of insanely complex behavior as exemplified by the WHATWG HTML spec
19:06:48 <ais523> I do actually like what that HTML spec has done, though
19:07:06 <ais523> because it means that there are now set boundaries for exactly what you are and aren't allowed to do in HTML
19:07:29 <keegan> it's a regrettable necessity based on the early days of the web being dominated by ad hoc systems and postel's law
19:07:34 <ais523> it is Postellish in some respects, too, e.g. saying that web pages must be in UTF-8 but giving long complicated instructions for what to do if they aren't
19:08:46 -!- arseniiv has quit (Ping timeout: 260 seconds).
19:09:45 <ais523> actually, one related problem I've been having recently, which may be unsolvably difficult, and Stack Overflow has not been helpful:
19:10:06 <ais523> given a URL, which characters in it can be safely percent-decoded without changing the meaning of the URL
19:11:42 <ais523> I'm trying to write an HTML sanitizer and would prefer to avoid allowing people to put obfuscated URLs through it, but it's so hard to figure out the rules for what will and what won't work
19:12:28 <nakilon> generate string of all chars and escape it with some very common library used for that need
19:12:35 <nakilon> to see what chars it will process
19:14:10 <ais523> nakilon: that basically means assuming that the library is correct, which it probably won't be
19:14:15 <nakilon> pretty sure all libraries will process different set of chars )
19:14:28 <nakilon> there is probably no correct library
19:14:32 <ais523> that said, I have been trying various test strings on various browsers and one httpd, to see what happens
19:14:46 <nakilon> maybe some Chrome is implemented "correctly" but it won't provide a library
19:14:50 <ais523> (testing a wide range of httpds would be frustrating)
19:15:22 <ais523> one thing I did learn throughout all this is that the URL path component %2e%2e is in fact equivalent to .. and will cancel out the previous component
19:15:38 <ais523> which seems like an unwise decision from a security point of view, that's just asking for path traversal vulnerabilities
19:15:43 <nakilon> also the possible achievable "correctness" of your tool is limited by how correct the servers are
19:16:01 <nakilon> many of them work differently about URL escaping
19:16:30 <ais523> I think the only real option here is to have a parameter for what sort of dubious-looking escapings the user wants to exclude
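One conservative answer to ais523's original question comes from RFC 3986 itself: percent-escapes of the "unreserved" characters (letters, digits, `-`, `.`, `_`, `~`) are defined to be equivalent to the bare character in every URL component, so those are the only escapes that are always safe to decode; everything else should be left alone (uppercasing the hex digits is also a meaning-preserving normalization). A minimal sketch of that rule:

```python
import re

# RFC 3986 "unreserved" characters: escapes of these never change meaning.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def decode_unreserved(url):
    """Decode only escapes of unreserved chars; uppercase the rest."""
    def repl(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else m.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, url)
```

Anything beyond this (decoding `%2F`, `%3F`, dot-segments, etc.) is component- and server-dependent, which matches the thread's conclusion that stricter handling has to be a policy knob.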
19:16:37 <nakilon> also additional rules and bugs in redirects
19:22:08 <nakilon> teaching velik wolfram alpha, somehow it took the whole day to make 10 tests, and it's only a piece of Math examples; there are three other topics, maybe I'll make most of them tomorrow
19:32:17 <b_jonas> ais523: "prevent non-IRC-related programs being tricked into connecting to IRC" => yes, that might be part of the reason.
19:33:46 <b_jonas> "avoid allowing people to put obfuscated URLs through it" => yeah, that's probably impossible
20:13:55 <fizzie> Normally I don't pay *that* much attention to update sizes, but now updating blender wants to install "libembree3-3" that will take half a gigabyte of disk.
20:15:45 <ais523> that's a pretty big library!
20:16:18 <ais523> b_jonas: it just seems so wrong to let people post arbitrary URLs on, say, forums or the like, when you're supposed to be sanitising the content
20:16:40 <fizzie> "Intel® Embree is a collection of high performance ray tracing kernels that helps graphics application engineers to improve the performance of their photorealistic rendering application." I feel like they've probably got versions specifically tuned for a bazillion different (Intel) CPU models.
20:18:21 <fizzie> Heh, it's a single 485223648-byte .so file.
20:18:47 <b_jonas> ais523: I'm not sure I see why, except for the part where you might sanitize the protocol part (the part before the first colon) and add a max length
20:19:43 <ais523> fizzie: I suspect that you only need around eight different versions of your code to get peak performance on all 64-bit Intel CPUs
20:19:49 <b_jonas> ais523: though most of the time you'd probably throw in that HTML attribute that hints to search engines that this is a link by a third party submission and the search engine shouldn't think your site is deliberately linking to kiddy porn
20:20:02 <ais523> and maybe another four or five more for AMD
20:20:22 <ais523> b_jonas: I've already been looking through the list of rel= attributes
20:20:27 <b_jonas> it won't take away all the responsibility about what links you host, but you can't generally fix that by just looking at the URL
20:20:42 <ais523> I think probably at least nofollow and noreferrer should be in there for external links by default
20:21:17 <b_jonas> you absolutely want to whitelist protocols though because of javascript: links though
20:21:21 <ais523> although noreferrer is interesting because you can also set an HTTP header that tells the browser to noreferrer everything
20:21:40 <ais523> I explicitly turned it off on my website (as in, I outright said in the headers that I know this header exists and I'm choosing not to use it), which makes some security checkers really annoyed
20:22:00 <b_jonas> lol "only need around eight different versions of your code to get peak performance on all 64-bit Intel CPUs"
20:22:13 <ais523> b_jonas: I mean that there isn't a combinatorial explosion
20:22:32 <b_jonas> and then you want AMD and code running on GPU and a port for ARM64 etc
20:22:56 <b_jonas> ais523: yes, that's because it's really hard to make CPUs so there are only two or three companies making x86 cpus at a time
20:23:07 <oerjan> . o O ( rel=dontclickfortheloveofgod )
20:23:08 <ais523> although, in practice nowadays, I think you can get decent performance for CPU-bound code by writing a post-AVX2 version for most people, and a pre-AVX2 version (aiming for maximum compatibility) for people who are running on really old computers
20:23:22 <b_jonas> and they mostly develop at most three lines of them in parallel each
20:23:33 <b_jonas> one expensive, one home, and one low-power laptop one
20:23:34 <ais523> optimising for AMD does seem to be significantly different from optimising for Intel, though
20:24:01 <ais523> in particular, if the program isn't memory-bound, the next most relevant bottleneck on Intel is normally instruction dispatch, whereas on AMD it's usually something else
20:24:07 <b_jonas> ais523: and more importantly, you only need to make different versions of a few performance-critical functions, not everything in your code
20:25:01 <b_jonas> there might still be a combinatorial explosion if you want versions of your code that differ in ways other than the CPU hardware
20:25:09 <ais523> I came up against this in the fizzbuzz I've been writing over the last year or so
20:25:40 <ais523> I want to read a vector from memory, then do two instructions with that vector as one argument and a vector in a register as a second argument
20:25:52 <riv> how is the fizzbuzz coming??
20:25:54 <ais523> on Intel, it's optimal to read the vector from memory twice
20:26:08 <ais523> on AMD, you want to read it into a register and then use it from there
20:26:33 <ais523> this is because Intel is bottlenecked on instruction decode so simply using fewer instructions is a gain, the L1 cache can handle the second read
20:26:53 <fizzie> Well, yes. It is a C++ project. It's possible there's a combinatorial explosion of templates instead. There isn't that much code in terms of source code lexically.
20:27:00 <ais523> on AMD the instruction decode is faster but the L1 cache has less bandwidth, so you can spare an extra instruction to read into a register to spare the cache bandwidth
20:27:10 <b_jonas> ais523: that might change for future AMD cpus...
20:27:16 <ais523> riv: I think I have a plan, the issue is just finding the time to write this code
20:27:46 <ais523> b_jonas: it's possible, but AMD seem to have been going down the path of using hyperthreading to make use of the extra instruction dispatch capability
20:28:05 <ais523> (sorry, I meant dispatch not decode, AMD is bottlenecked on decode too but that only matters if you aren't in a loop because of the µop cache)
20:28:22 <b_jonas> that said, I agree that bottlenecking on either memory access or instruction dispatch is typical these days, the execution times don't matter as much, unless you are specifically writing matrix multiplication inner loops or things like that
20:29:08 <ais523> even matrix multiplication is bottlenecked on memory access, most of the fast techniques for it are based on trying to avoid cache spills
20:30:05 <b_jonas> ais523: yes, so it's only the inner loops where you actually have to care about the execution times of these floating point multiplication-add instructions.
20:30:22 <ais523> it's hard to think of something that wouldn't bottleneck on memory access – maybe things like prime factorization, or pathfinding
20:31:00 <ais523> b_jonas: oh, fused multiply-adds are fast, but that doesn't really matter, they're more beneficial in terms of accuracy than they are in terms of speed
20:31:35 <ais523> multiply then add is 2 cycles latency, fused multiply-add is 1 cycle latency, and they both have enormous throughput (values correct for recent Intel and also recent AMD)
20:33:55 <b_jonas> ais523: yes, the current CPUs are so optimized for that that you basically can't run out of multiplication units. I remember there was a time when the CPU was better at fused multiply-add than additions
20:34:25 <ais523> integer additions do actually beat all the floating-point stuff on most modern CPUs, though
20:35:02 <ais523> on Intel, this is primarily because the execution unit that handles jumps and branches can be used to do integer additions/subtractions if it isn't needed for the jump
20:35:13 <b_jonas> ais523: while floating point multiplications beat integer multiplications, yes
20:35:19 <ais523> (because it can handle fused compare-jump, fused subtract-jump, and friends)
20:35:38 <ais523> and yes, floating point multiplication performance is better than integer multiplication (although normally not that much better)
20:35:45 <b_jonas> only 64-bit ones though, because the mantissa is bigger
20:35:54 <ais523> Intel actually has two different floating point multipliers with different performance characteristics
20:36:04 <ais523> one has higher throughput, the other lower latency
20:36:36 <b_jonas> ais523: used for the same instructions? I didn't know that
20:36:38 <ais523> actually, the main throughput bottleneck I tend to hit is vector shuffles
20:36:49 <ais523> b_jonas: I think they're mostly used for different instructions
20:37:05 <ais523> Intel normally has only one vector shuffler, it's fast but you can only use it once per cycle
20:37:14 <ais523> and lots of useful instructions fall into the "vector shuffle" group
20:38:30 <ais523> there's also the infamous lane-crossing penalty (especially on AMD, but I think it affects Intel too)
20:39:15 <ais523> where it costs something like 3 cycles to do anything that combines the top half of a register and the bottom half of the register, when the register is "sufficiently large" (normally a recently introduced vector size)
20:39:58 <ais523> this is why lots of vector instructions are incapable of mixing the top and bottom half of a YMM register, they're instead basically designed as two XMM instructions in SIMD (even if they aren't normally SIMD instructions)
20:41:35 -!- arseniiv has joined.
20:42:04 <b_jonas> VPSHUFB for ymm registers specifically
20:42:50 <ais523> that was the example I was going to use
20:43:17 <b_jonas> I am following this because AVX2 is now available on lots of CPUs
20:43:38 <ais523> and there's an annoying lack of backwards compatibility, too – even if Intel or AMD figure out how to make five bits of the index useful, they won't be able to make their VPSHUFB instructions actually handle them
20:43:41 <b_jonas> (including my new home computer)
20:43:43 <ais523> because it would break backwards compatibility
20:43:53 <ais523> I've had an AVX2-capable computer for a few years now
20:44:56 <b_jonas> ais523: they just add a new instruction for that. they're adding lots of new vector instructions all the time anyway.
20:45:30 <ais523> but what do they even name it?
20:45:41 <b_jonas> no, I think it's VPERMsomething
20:45:46 <ais523> (with a VPSHUFB6 coming in a few years after AVX-512 is more mature?)
20:46:00 <b_jonas> let me look it up, I think it's later than AVX2
20:46:00 <ais523> oh, the PERM stuff normally has worse granularity than SHUF, this will be confusing
20:46:38 <b_jonas> no, there is now a VPERMB that is a full byte level shuffle even on a zmm register
20:46:47 <b_jonas> the lower granularity was a thing of the past
20:47:08 <ais523> knowing how AVX-512 is going, this is likely to have been specified by Intel but not actually implemented by anything
20:47:08 <b_jonas> well, thing of the past that's still in many CPUs that we're using now
20:47:15 <arseniiv> <b_jonas> very long ago I made a script to say "best viewed with Mozilla" or "best viewed with Internet Explorer", always the other one than the viewer is using => rofl oh my
20:47:24 <ais523> there's a lot of AVX-512 which was specified but with no implementations
20:48:28 <ais523> this may end up leading to another FMA3/FMA4 debacle some time in the future
20:48:52 <b_jonas> ais523: there's even a VPERMI2B instruction to byte level permute *two* zmm registers
20:48:59 <ais523> (FMA got specified prior to being implemented, with Intel and AMD proposing different plans; each then implemented the *other's* specification, leaving them incompatible)
20:49:14 <b_jonas> each implemented only the other's specification?
20:49:15 <ais523> I think AMD implemented Intel's specification because they wanted to be compatible
20:49:26 <b_jonas> I know they implemented incompatible stuff
20:49:30 <b_jonas> but I didn't know they swapped
20:49:32 <ais523> and Intel implemented AMD's specification because they couldn't get their own to work, it needed too much internal rearchitecturing
20:49:49 <ais523> (presumably this is why AMD came up with their version in the first place, it would be easier to implement)
20:49:57 <b_jonas> but 3DNow was AMD's specification that was never in Intel, right?
20:50:08 <ais523> b_jonas: yes, although a couple of 3DNow commands survived
20:50:34 <ais523> admittedly, SSE is much better-designed than 3DNow was, although both are dubious in terms of encoding
20:51:07 <b_jonas> I never really looked into the details of what 3DNow does. it was obsolete by the time I could have cared.
20:51:31 <b_jonas> we already had SSE4.1 by the time I started to care about SIMD instruction stuff
20:52:00 <ais523> b_jonas: think SSE with 64-bit-wide vectors
20:52:27 <ais523> I thought MMX wasn't vectorised at all
20:52:37 <ais523> 3DNow is, as long as you want a pair of single-precision floats
20:53:17 <ais523> it was the first vector unit; it simply just wasn't a very good one
20:53:25 <b_jonas> MMX has the drawback that it shares state with the FPU, and you have to do a slow switch of the FPU between MMX and traditional mode each time you want to use it, since the existing ABI expects the FPU to be in non-MMX mode
20:53:50 <b_jonas> MMX is "vectorized" in that it can handle two 32-bit floats in a 64-bit register
20:54:11 <ais523> hmm, maybe I got them muddled then
20:54:20 <b_jonas> but two floats per register is still a big help
20:54:21 <ais523> or maybe 3DNow uses the MMX registers for its vectors
20:54:39 <b_jonas> it also handles packed integers
20:54:52 <ais523> https://en.wikipedia.org/wiki/3DNow!
20:55:03 <ais523> right, 3DNow! seems to be an extension to use the MMX registers as vector registers
20:55:14 <b_jonas> apparently MMX *only* handles integers
20:56:23 <ais523> oh, so MMX does int vectorisation and 3DNow! does float vectorisation?
20:56:25 <b_jonas> I know these days MMX is only useful to get a few extra registers that you can sometimes access with shorter encodings than anything in the later instruction sets, and basically never worth to use
20:56:49 <b_jonas> I have no idea what 3DNow does
20:56:55 <ais523> I'm actually vaguely surprised that MMX didn't become the standard for non-vectorised floating point
20:57:15 <b_jonas> ais523: what do you mean "the standard"?
20:57:23 <ais523> it is saner than x87, and supported by all 64-bit CPUs
20:57:54 <ais523> like, the ABI passes floats in MMX registers, assumes MMX mode at call boundaries, and the like
20:58:05 <b_jonas> ais523: which ABI? we can't change the x86_32 ABI, it's too late for that, and x86_64 comes with always SSE2 so by that time the point is moot
20:58:19 <b_jonas> also if MMX only handles integers then that can't work
20:58:34 <ais523> no, MMX definitely does floats
20:58:55 -!- Lord_of_Life has quit (Ping timeout: 260 seconds).
20:59:01 <ais523> it doesn't do floats, only ints
20:59:07 <ais523> that's why people don't use it for float maths :-)
20:59:19 -!- Lord_of_Life has joined.
20:59:39 <b_jonas> ais523: also SSE2 is the standard for passing floats in the x86_64 ABI, and that's a good thing
21:00:02 <b_jonas> because with SSE2 there, MMX is almost never useful
21:00:21 <b_jonas> and SSE2 adds advantages, both wider vectors and a better instruction set
21:00:23 <ais523> so it looks like we have three sets of registers: integer; x87/MMX/3DNow!; and XMM/YMM/ZMM
21:00:35 <b_jonas> oh, 3DNow also uses the x87 registers?
21:00:43 <ais523> (also random special-purpose stuff like flags, but I'm not counting those)
21:00:56 <b_jonas> also (sigh) we also have AVX512 mask registers.
21:01:21 <b_jonas> on AVX512-capable CPUs that is
21:01:29 <ais523> x87 interprets the registers as one of three float formats (long double, plus formats which are almost but not quite the same as float and double); MMX as 64-bit integer vectors; and 3DNow! always as two floats
21:01:43 <ais523> b_jonas: to be fair those are really helpful for some applications
21:02:42 <b_jonas> ais523: no, x87 specifically stores 80-bit floats, not long doubles. there's a difference because long double is 64-bit floats in the MSVC ABI
21:03:08 <ais523> well, yes, but they're what has been known as "long double" for ages on Intellish processors
21:03:23 <ais523> but they got deprecated with the change to 64-bit
21:03:40 <b_jonas> because SSE2 handles 64-bit floats, yes
21:05:43 <esolangs> [[Cabra]] M https://esolangs.org/w/index.php?diff=88008&oldid=81202 * PythonshellDebugwindow * (+0) /* Language Definition */ Fix typo
21:05:46 <ais523> Wikipedia says that 3DNow! invented SFENCE
21:05:47 <b_jonas> wtf there's a KADDB/KADDW/KADDD/KADDQ AVX512 instruction? I never noticed that
21:06:30 <b_jonas> ais523: I admit I don't follow how the fence instructions work. I leave them to slightly higher level libraries.
21:06:31 <ais523> but that seems unlikely to me, because my understanding of the x86 memory model is that an SFENCE is only useful with non-temporal writes or write-combining memory, and I didn't think those were implemented at that point
21:07:00 <ais523> b_jonas: I can describe the general (non-x86-specific) implementation fairly easily
21:07:15 <ais523> imagine loads and stores as not happening instantly, but being spread out over time
21:07:46 <ais523> an lfence stops a load crossing it (it has to happen entirely before if it's before the lfence, or entirely after if it's after the lfence)
21:07:51 <ais523> likewise, an sfence stops a store crossing it
21:08:13 <fizzie> I think PPC conventionally has a double-double as its `long double` type.
21:08:44 <ais523> if one thread is storing two pieces of data, and another thread is loading them, then you need to sfence between the stores and lfence between the loads if you want to prevent the loading thread seeing the new value of the second store, but the old value of the first store
21:09:45 <b_jonas> spread out over time how? you mean they happen at different times to different layers of the cache hierarchy, going down the hierarchy if either the smaller caches need to free up space or to make the value known to other CPUs?
21:10:14 <ais523> b_jonas: imagine that you send a "request to write memory" but then continue executing before the request has been handled
21:10:20 <ais523> and let the motherboard respond to the request at some later time
21:10:21 <b_jonas> ais523: "stops a load crossing it" at what levels of the hierarchy?
21:11:01 <ais523> it's a logical rather than physical barrier, it's not bound to a specific level of hierarchy
21:11:08 <ais523> so you have to match an lfence on one thread with an sfence on another
21:11:32 <b_jonas> I still think I don't need to know the details of this, what I do need to know is the atomic and mutex abstractions over them that libraries provide me
21:11:50 <ais523> on x86-64 specifically I think it's handled as part of the cache coherency mechanism
21:11:56 <b_jonas> because I don't think I write inter-thread (or inter-process) communication code that is at a lower level than those
21:12:34 <b_jonas> nor CPU-level code that handles memory mapped to the video card or other memory-mapped IO
21:12:43 <ais523> one way to think about it is that sfence is one of the two main mechanisms for implementing the "release" atomic ordering, and lfence is one of the two main mechanisms for implementing the "acquire" atomic ordering
21:13:20 <ais523> atomic release is sfence then write; atomic acquire is read then lfence
21:14:03 <ais523> only, x86 has extra guarantees that most processors don't, so sfence is usually a no-op and I think many atomic libraries leave it out there on x86-64 (even though they would use it on other processors)
21:14:18 <ais523> (lfence is not a no-op, though, and is important in atomic code)
21:15:19 <b_jonas> and as far as I understand, the compilers need to know about both fences and atomics, because they have a meaning not only on the CPU level, but for what the optimizer isn't allowed to do, and current compilers indeed do this. (in contrast, I think the compiler needn't know about mutexes directly.)
21:16:22 <ais523> the compiler does need to know about the acquire/release rules on mutexes, but either it can see the atomic read/write in the function, or else it can't see anything at all and thus has to assume the worst
21:16:51 <ais523> oh, this reminded me of a weird case of wanting a compiler barrier specifically
21:17:13 <b_jonas> yes, the fast (non-contended) paths of mutex functions must be fast, so the optimizer will see into the functions when necessary
21:17:21 <ais523> the idea would be in functions that undropped permissions, did a system call with checks, then dropped them again
21:17:43 <ais523> to do the checks with permissions raised, and to compiler-barrier to ensure that the undropping is done before the checks
21:18:01 <ais523> this sounds like it violates the least-permissions principle, but the point is to protect the checks from return-oriented programming
21:18:11 <ais523> in order to get permission to do the system call, the code would need to run through the checks too
21:18:46 <b_jonas> ais523: what kind of permission checks? aren't those undropping, permission checking, and dropping three system calls, and the compiler already mustn't reorder system calls?
21:19:01 <b_jonas> it also reminds me of something similar by the way
21:19:29 <ais523> b_jonas: say, you want an mprotect() wrapper that checks that you aren't making anything executable if it was previously nonexecutable
21:19:55 <b_jonas> ais523: ah, so by permission checking you just mean accessing memory that may or may not have read/write/execute permissions?
21:20:07 <b_jonas> hmm, that might be difficult
21:20:18 <ais523> I didn't mean permission checking, just checking in general
21:20:42 <b_jonas> but even so, can't a system call basically write anything to anywhere in your user memory, so you usually can't reorder memory accesses around them anyway?
21:21:11 <ais523> oh, that's interesting – the point being that compilers wouldn't optimise system calls anyway due to not knowing what they do?
21:21:30 <b_jonas> ais523: yes, except maybe a few specific system calls of which they know the meaning
21:21:51 <b_jonas> there are system calls like pread that can write even to memory that you didn't pass a pointer to to the system call
21:21:58 <ais523> I know there are some functions that can system call, and that the compiler treats specially
21:22:30 -!- delta23 has quit (Quit: Leaving).
21:24:35 <b_jonas> preadv can write anywhere, and a compiler has to assume that an unknown system call can do things worse than that
21:24:53 <ais523> preadv seems so specific
21:25:18 <ais523> I can see why it could be useful – it saves the overhead of making multiple system calls when you want to do that operation specifically – but I'm unclear on how common that particular operation would be
21:25:25 <b_jonas> it's specific in that it's a particularly badly behaving system call, that's why I'm giving it here as an example
21:25:37 <b_jonas> most system calls are tamer than that, but the compiler can't easily rely on that
21:26:14 <ais523> I meant, I was thinking on a different line of thought when you mentioned preadv
21:26:25 <ais523> like, what was the motivation behind adding that to the kernel? who needed it, and what do they do with it?
21:27:21 <ais523> 99% of programs would just read into a large buffer and then copy the data into the appropriate final locations, rather than spending time coming up with a big description for preadv
21:27:29 <ais523> although, preadv is faster because it reduces cache pressure
21:28:21 <b_jonas> I think it might be there because they wanted to get asynchronous regular file reading to work, which turned out quite hard and they're still struggling with it, but anyway the interface of the async reading allows similar scatter-gather read because it has to allow multiple reads at the same time, so they added a normal non-async interface at that point
21:29:03 <b_jonas> but maybe someone just used a circular buffer and wanted to micro-optimize the number of system calls, since the context change for system calls used to be slower than now
21:29:19 <b_jonas> preadv is very old, you have to remember that
21:29:25 <b_jonas> so it can have some odd historical reason
21:29:33 <ais523> …now I'm wondering if preadv is faster than mmap + memcpy
21:29:47 <ais523> it could be, I guess? because the physical memory you mmap into has to be cached
21:31:24 <b_jonas> ais523: yeah, look, the manpage says "these system calls first appeared in 4.2BSD"
21:31:34 <b_jonas> so old it's hard to speculate about it
21:32:55 <b_jonas> ais523: readv is older than pread if the system call numbering can be believed
21:33:25 <ais523> b_jonas: while searching about uses of preadv, I found some mailing list archive mentions which implied that readv was newer
21:33:31 -!- oerjan has quit (Quit: Nite).
21:37:26 <b_jonas> ais523: anyway, what this reminded me of is the SSE floating point control word. it controls the rounding mode, the exception mask, and two bits to change denormal inputs and results to zero in floating point instructions because those denormals would cause a big slowdown on certain CPUs. anyway, the compiler *should* know about the semantics of the SSE floating point control word to the extent that
21:37:32 <b_jonas> it's not allowed to reorder floating point arithmetic around changing the control word, but current compilers don't yet know this, so it's not quite clear how you can write to the floating point control in a useful way without potential undefined behavior.
21:37:46 <b_jonas> the situation is similar to the atomic operations back when multithreading was new and compilers didn't yet know much about it
21:39:49 <ais523> what's the performance of changing the floating-point control word like?
21:39:57 <ais523> I can easily imagine algorithms which want to change it a lot
21:40:20 <b_jonas> and no, you can't just change the floating point control word in a non-inlinable function, partly because the ABI says that the rounding mode etc has to be in its default state between function calls, and more importantly because the compiler is normally allowed to reorder a floating point operation around an unknown function call.
21:40:34 <ais523> IIRC AVX-512 dedicates a couple of bits of the instruction to override parts of the FPU control word
21:41:17 <ais523> fwiw, on gcc you could probably get away with an asm volatile that takes the results of previous FPU instructions and inputs of subsequent FPU instructions as read-write parameters and then doesn't change them
21:41:38 <ais523> would be annoying to write, but gcc would be forced to put the control-word-changing operation in the right place
21:41:49 <b_jonas> ais523: there are two cases when you want to change the floating point control word a lot. one is if you want to use a non-default floating point control word, but also do function calls or returns to code that you don't control since technically you have to restore the default control word because any library function is allowed to do floating-point instructions; the other is interval arithmetic which can
21:41:55 <b_jonas> change the rounding mode a lot
21:42:09 <b_jonas> ais523: but more likely you just want to change the control word once, then do a lot of float operations
21:43:35 <b_jonas> well, admittedly there's a third case, if you want to read the exception flags and have to reset them for that reason
21:43:45 <ais523> I was thinking of interval arithmetic
21:43:52 <ais523> exception flags might also be relevant in some algorithms
21:44:13 <b_jonas> and I don't know about what performance writing the control word has, I'm mostly concerned about cases when that doesn't matter
21:50:07 <b_jonas> apparently you can get slowdowns for denormal results in both Intel and AMD, and the optimization manuals for the two brands detail when these can and can't happen and what you should do about them
21:50:35 <b_jonas> the actual rules do differ between Intel and AMD, at least for some of their CPUs
21:52:15 <ais523> I think for many float operations, denormals are emulated in firmware rather than having dedicated hardware
21:52:19 <ais523> so the performance is terrible
21:52:58 <b_jonas> ais523: that's the gist of it, but the details are complicated. not all instructions with a denormal input or output give a slowdown.
21:53:24 <b_jonas> that's why there's an optimization manual
21:53:30 <b_jonas> I don't claim to really understand the rules
21:53:55 -!- normsaa has quit (Ping timeout: 256 seconds).
21:53:57 <b_jonas> I just have to know where to look if this becomes important
21:54:27 <b_jonas> fungot, does a circle have no end?
21:54:28 <fungot> b_jonas: yeah you said that geographic location has to do with the code than by using only the functional aspects of scheme systems support arbitrary-size vectors...
22:16:19 <arseniiv> fungot: alright, but at least it should have a coend, does it? (I don’t know what a coend is, that’s something from category theory)
22:16:20 <fungot> arseniiv: for example because it pollutes the default namespace
22:17:31 <arseniiv> hm that’s a bit too philosophical
22:18:55 <Corbin> fungot: What is a coend but a colimit over a bifunctor?
22:18:56 <fungot> Corbin: i suppose fnord is the original?
22:19:21 <fungot> Corbin: no, he refuses to give me access to your harddrive.)
22:20:36 <arseniiv> fungot always tries to fool you with words when you ask significant questions
22:20:36 <fungot> arseniiv: even the ' web-sivu' in the xvid format... would it be for me making return values in your helper function seem to always be using fd3 itself, though
22:21:31 <arseniiv> fungot: at least tell me this: xvid or divx?
22:21:31 <fungot> arseniiv: what, thin and unnecessarily bendy? and!... but is that what you recorded was to later evaluate ( quux zot)
22:21:54 <arseniiv> exactly as I said, no definite answer :′(
22:23:54 <fungot> arseniiv: what with my poor grammar and ' be's all over the state. so you can do
22:36:08 -!- Cale has quit (Remote host closed the connection).
22:38:37 -!- Cale has joined.
22:39:16 -!- chiselfuse has quit (Write error: Connection reset by peer).
22:39:16 -!- hendursaga has quit (Write error: Connection reset by peer).
23:46:08 <esolangs> [[Esolang:Sandbox]] M https://esolangs.org/w/index.php?diff=88009&oldid=87590 * PythonshellDebugwindow * (+18) rd
23:46:38 <esolangs> [[Esolang:Sandbox]] M https://esolangs.org/w/index.php?diff=88010&oldid=88009 * PythonshellDebugwindow * (+1) Rd
23:46:51 <esolangs> [[Esolang:Sandbox]] M https://esolangs.org/w/index.php?diff=88011&oldid=88010 * PythonshellDebugwindow * (+1) :