00:06:32 <zzo38> OK, apparently zsync uses a file called .zsync to control it; I will read the documentation to see how suitable it is.
00:08:18 <zzo38> However, what I have, I think that what is helpful is each "realm" has its own timestamp, and may be by name (ordinary files and NNTP) or by hash (Fossil repositories). By hash is always immutable; by name can be mutable (ordinary files) or immutable (NNTP).
00:13:46 -!- Arcorann has joined.
00:19:06 <zzo38> Is there a backup protocol that does that?
00:31:38 -!- tromp has joined.
00:36:14 -!- tromp has quit (Ping timeout: 272 seconds).
00:39:31 <zzo38> I think .zsync files are probably not suitable for my use.
00:40:28 <zzo38> (I also do not see documentation about the zsync file format, anyways.)
00:43:59 <fizzie> Yeah, I haven't seen it documented either. There's a page that explains what it's made of conceptually -- http://zsync.moria.org.uk/paper200503/ -- but doesn't bother to describe the exact file format. But you're right that it's probably not really designed to be extensible.
00:45:39 -!- tromp has joined.
00:46:47 <esowiki> [[Chatlog]] M https://esolangs.org/w/index.php?diff=80707&oldid=80689 * PythonshellDebugwindow * (+57) Move cats to bottom; add some too
00:48:16 <esowiki> [[Language list]] M https://esolangs.org/w/index.php?diff=80708&oldid=80670 * PythonshellDebugwindow * (+14) /* C */ Add [[Chatlog]]
00:49:52 <esowiki> [[Language list]] M https://esolangs.org/w/index.php?diff=80709&oldid=80708 * PythonshellDebugwindow * (+14) /* P */ Add [[Pewlang]]
00:50:47 -!- tromp has quit (Ping timeout: 272 seconds).
00:52:03 -!- arseniiv has quit (Quit: gone too far).
01:26:33 -!- tromp has joined.
01:30:36 -!- tromp has quit (Ping timeout: 240 seconds).
01:40:48 -!- sftp has quit (Excess Flood).
01:41:25 <oren> monitors should just have feel-able markings in the shape of H D M I
01:41:32 -!- sftp has joined.
01:41:41 <oren> so that you can tell which one is the HDMI port
01:49:05 <esowiki> [[Divrac]] N https://esolangs.org/w/index.php?oldid=80710 * PythonshellDebugwindow * (+2287) Make language
01:49:38 <esowiki> [[Language list]] M https://esolangs.org/w/index.php?diff=80711&oldid=80709 * PythonshellDebugwindow * (+13) /* D */ +[[Divrac]]
01:50:08 -!- delta23 has quit (Quit: Leaving).
02:20:42 -!- tromp has joined.
02:23:00 -!- craigo has joined.
02:25:06 -!- tromp has quit (Ping timeout: 246 seconds).
02:30:05 -!- Lord_of_Life has quit (Ping timeout: 240 seconds).
02:35:35 -!- Lord_of_Life has joined.
03:02:46 -!- tromp has joined.
03:07:06 -!- tromp has quit (Ping timeout: 246 seconds).
03:43:55 -!- tromp has joined.
03:44:41 -!- imode has joined.
03:48:45 -!- tromp has quit (Ping timeout: 272 seconds).
04:10:01 -!- tromp has joined.
04:14:57 -!- tromp has quit (Ping timeout: 264 seconds).
04:49:15 -!- scoofy has joined.
05:04:06 -!- tromp has joined.
05:08:57 -!- tromp has quit (Ping timeout: 264 seconds).
05:22:00 <Sgeo> Has anyone made a spacesort analogous to sleepsort that just puts stuff into a large array?
05:37:00 <Sgeo> https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=581dc6e3ac2423bf0b82105cb731faf9
05:45:44 <shachaf> This sort of thing is sometimes called "bucket sort" or other names.
05:48:26 <shachaf> "counting sort" was what I meant, I think.
05:48:55 <shachaf> Radix sort often uses this kind of thing.
05:51:10 <Sgeo> It has a legitimate use???
05:51:49 <Sgeo> I guess I was thinking that since I was trying to tamper with the obviously esoteric sleepsort, the corresponding wasteful of space would also be esoteric
05:52:04 <shachaf> Sure, if there's a small number of values.
05:55:56 <shachaf> It's a linear-time sorting algorithm.
05:58:06 -!- tromp has joined.
06:03:02 -!- tromp has quit (Ping timeout: 272 seconds).
06:18:34 -!- tromp has joined.
06:23:21 -!- tromp has quit (Ping timeout: 264 seconds).
06:27:56 -!- tromp has joined.
06:32:47 -!- tromp has quit (Ping timeout: 272 seconds).
06:42:11 -!- tromp has joined.
06:46:33 -!- tromp has quit (Ping timeout: 246 seconds).
06:51:06 -!- tromp has joined.
07:12:40 <zzo38> How is sleepsort working?
07:18:23 <zzo38> On a television show, for a few seconds it showed a different television show for a few seconds; it displayed two overlapping station indicators. And then, it switched back to the correct show, and then a few seconds after that, it was interrupted again, for less than one second I saw a message that said "Press any key to watch TV".
07:18:56 <zzo38> (It is a message and font style which are not applicable to any of the equipment I have.)
07:45:33 -!- sprock has quit (Ping timeout: 264 seconds).
08:04:07 -!- tromp has quit (Remote host closed the connection).
08:10:35 -!- tromp has joined.
08:19:57 -!- hendursa1 has joined.
08:20:32 -!- hendursaga has quit (Ping timeout: 268 seconds).
08:27:31 -!- kritixilithos has joined.
08:34:06 -!- user24 has joined.
08:47:08 -!- Taneb has quit (*.net *.split).
08:49:40 -!- APic has quit (*.net *.split).
08:49:41 -!- iovoid has quit (*.net *.split).
08:49:41 -!- joast has quit (*.net *.split).
08:49:43 -!- naivesheep has quit (*.net *.split).
08:49:43 -!- zzo38 has quit (*.net *.split).
08:49:43 -!- ineiros has quit (*.net *.split).
08:49:43 -!- mla has quit (*.net *.split).
08:49:51 -!- kritixilithos has quit (*.net *.split).
08:49:51 -!- hendursa1 has quit (*.net *.split).
08:49:52 -!- xelxebar has quit (*.net *.split).
08:49:52 -!- V has quit (*.net *.split).
08:49:52 -!- Arcorann has quit (*.net *.split).
08:49:53 -!- shachaf has quit (*.net *.split).
08:49:53 -!- dnm has quit (*.net *.split).
08:49:53 -!- grumble has quit (*.net *.split).
08:49:54 -!- moony has quit (*.net *.split).
08:49:54 -!- jix has quit (*.net *.split).
08:49:54 -!- mniip has quit (*.net *.split).
08:49:54 -!- BWBellairs has quit (*.net *.split).
08:49:54 -!- scoofy has quit (*.net *.split).
08:49:54 -!- MDude has quit (*.net *.split).
08:49:54 -!- Melvar has quit (*.net *.split).
08:49:54 -!- lifthrasiir has quit (*.net *.split).
08:49:56 -!- olsner has quit (*.net *.split).
08:49:56 -!- sftp has quit (*.net *.split).
08:49:56 -!- Sgeo has quit (*.net *.split).
08:49:56 -!- glowcoil has quit (*.net *.split).
08:49:57 -!- ^[ has quit (*.net *.split).
08:49:58 -!- myname has quit (*.net *.split).
08:49:58 -!- wesleyac has quit (*.net *.split).
08:49:58 -!- izabera has quit (*.net *.split).
08:49:58 -!- Hooloovo0 has quit (*.net *.split).
08:49:58 -!- FireFly has quit (*.net *.split).
08:50:00 -!- aloril has quit (*.net *.split).
08:50:00 -!- pikhq has quit (*.net *.split).
08:50:11 -!- wmww has quit (*.net *.split).
08:50:16 -!- ocharles has quit (*.net *.split).
08:50:16 -!- Deewiant has quit (*.net *.split).
08:50:17 -!- Soni has quit (*.net *.split).
08:50:17 -!- mich181189 has quit (*.net *.split).
08:50:17 -!- shikhin has quit (*.net *.split).
08:50:17 -!- none30 has quit (*.net *.split).
08:50:17 -!- Discordian[m] has quit (*.net *.split).
08:50:21 -!- Lord_of_Life has quit (*.net *.split).
08:50:21 -!- sebbu has quit (*.net *.split).
08:50:21 -!- HackEso has quit (*.net *.split).
08:50:21 -!- clog has quit (*.net *.split).
08:50:21 -!- Cale has quit (*.net *.split).
08:50:21 -!- int-e has quit (*.net *.split).
08:50:22 -!- fungot has quit (*.net *.split).
08:50:22 -!- paul2520 has quit (*.net *.split).
08:50:23 -!- dionys has quit (*.net *.split).
08:50:23 -!- ski has quit (*.net *.split).
08:50:23 -!- hakatashi has quit (*.net *.split).
08:50:23 -!- ornxka_ has quit (*.net *.split).
08:50:24 -!- interruptinuse has quit (*.net *.split).
08:50:24 -!- user24 has quit (*.net *.split).
08:50:24 -!- imode has quit (*.net *.split).
08:50:25 -!- b_jonas has quit (*.net *.split).
08:50:25 -!- nakilon has quit (*.net *.split).
08:50:26 -!- Lymia has quit (*.net *.split).
08:50:26 -!- sparr has quit (*.net *.split).
08:50:27 -!- j4cbo has quit (*.net *.split).
08:50:27 -!- lambdabot has quit (*.net *.split).
08:50:27 -!- shinh has quit (*.net *.split).
08:50:27 -!- j-bot has quit (*.net *.split).
08:50:27 -!- heroux has quit (*.net *.split).
08:50:28 -!- vertrex- has quit (*.net *.split).
08:50:28 -!- atehwa has quit (*.net *.split).
08:50:28 -!- fizzie has quit (*.net *.split).
08:50:28 -!- orbitaldecay has quit (*.net *.split).
08:50:28 -!- oren has quit (*.net *.split).
08:50:29 -!- tromp has quit (*.net *.split).
08:50:29 -!- craigo has quit (*.net *.split).
08:50:29 -!- harha_ has quit (*.net *.split).
08:50:29 -!- spruit11 has quit (*.net *.split).
08:50:29 -!- zeroed has quit (*.net *.split).
08:50:29 -!- copumpkin has quit (*.net *.split).
08:50:29 -!- Bowserinator has quit (*.net *.split).
08:50:29 -!- rodgort has quit (*.net *.split).
08:50:30 -!- kmc has quit (*.net *.split).
08:50:30 -!- relrod has quit (*.net *.split).
08:50:30 -!- djanatyn has quit (*.net *.split).
08:50:30 -!- haavard has quit (*.net *.split).
08:50:32 -!- df1111 has quit (*.net *.split).
08:50:34 -!- trn has quit (*.net *.split).
08:50:35 -!- stux|RC-only has quit (Excess Flood).
09:01:44 -!- APic has joined.
09:01:44 -!- orbitaldecay has joined.
09:01:44 -!- oren has joined.
09:01:44 -!- fizzie has joined.
09:01:44 -!- interruptinuse has joined.
09:01:44 -!- trn has joined.
09:01:44 -!- haavard has joined.
09:01:44 -!- sparr has joined.
09:01:44 -!- FireFly has joined.
09:01:44 -!- djanatyn has joined.
09:01:44 -!- izabera has joined.
09:01:44 -!- olsner has joined.
09:01:44 -!- relrod has joined.
09:01:44 -!- paul2520 has joined.
09:01:44 -!- BWBellairs has joined.
09:01:44 -!- V has joined.
09:01:44 -!- atehwa has joined.
09:01:44 -!- kmc has joined.
09:01:44 -!- vertrex- has joined.
09:01:44 -!- wesleyac has joined.
09:01:44 -!- myname has joined.
09:01:44 -!- mniip has joined.
09:01:44 -!- jix has joined.
09:01:44 -!- moony has joined.
09:01:44 -!- rodgort has joined.
09:01:44 -!- pikhq has joined.
09:01:44 -!- ^[ has joined.
09:01:44 -!- aloril has joined.
09:01:44 -!- fungot has joined.
09:01:44 -!- Lymia has joined.
09:01:44 -!- heroux has joined.
09:01:44 -!- Soni has joined.
09:01:44 -!- mich181189 has joined.
09:01:44 -!- shikhin has joined.
09:01:44 -!- Deewiant has joined.
09:01:44 -!- grumble has joined.
09:01:44 -!- ornxka_ has joined.
09:01:44 -!- j-bot has joined.
09:01:44 -!- shinh has joined.
09:01:44 -!- Bowserinator has joined.
09:01:44 -!- Hooloovo0 has joined.
09:01:44 -!- hakatashi has joined.
09:01:44 -!- dnm has joined.
09:01:44 -!- lambdabot has joined.
09:01:44 -!- int-e has joined.
09:01:44 -!- copumpkin has joined.
09:01:44 -!- j4cbo has joined.
09:01:44 -!- zeroed has joined.
09:01:44 -!- Cale has joined.
09:01:44 -!- ocharles has joined.
09:01:44 -!- glowcoil has joined.
09:01:44 -!- nakilon has joined.
09:01:44 -!- clog has joined.
09:01:44 -!- b_jonas has joined.
09:01:44 -!- HackEso has joined.
09:01:44 -!- shachaf has joined.
09:01:44 -!- lifthrasiir has joined.
09:01:44 -!- Melvar has joined.
09:01:44 -!- spruit11 has joined.
09:01:44 -!- harha_ has joined.
09:01:44 -!- xelxebar has joined.
09:01:44 -!- ski has joined.
09:01:44 -!- sebbu has joined.
09:01:44 -!- dionys has joined.
09:01:44 -!- Sgeo has joined.
09:01:44 -!- MDude has joined.
09:01:44 -!- Arcorann has joined.
09:01:44 -!- craigo has joined.
09:01:44 -!- Lord_of_Life has joined.
09:01:44 -!- imode has joined.
09:01:44 -!- scoofy has joined.
09:01:44 -!- tromp has joined.
09:01:44 -!- hendursa1 has joined.
09:01:44 -!- kritixilithos has joined.
09:01:44 -!- user24 has joined.
09:01:44 -!- Taneb has joined.
09:01:44 -!- naivesheep has joined.
09:01:44 -!- zzo38 has joined.
09:01:44 -!- ineiros has joined.
09:01:44 -!- mla has joined.
09:01:44 -!- iovoid has joined.
09:01:44 -!- joast has joined.
09:01:44 -!- stux|RC-only has joined.
09:01:44 -!- sftp has joined.
09:02:57 -!- mniip has quit (Ping timeout: 624 seconds).
09:03:39 -!- mniip has joined.
09:07:19 -!- metcalf has joined.
09:10:04 -!- hendursaga has joined.
09:10:29 -!- hendursa1 has quit (Ping timeout: 268 seconds).
09:14:20 <esowiki> [[Befunge]] https://esolangs.org/w/index.php?diff=80712&oldid=80685 * Quintopia * (-74) Now DNA-code outputs codons equiprobably.
09:14:37 <metcalf> I've found an old programming game / artificial life that seems to have dropped off the internet and made it available again https://corewar.co.uk/biomass/biomass.html
09:15:36 <metcalf> The images are missing, but I'll recreate those if I manage to compile to code with X11 display.
09:24:59 -!- df1111 has joined.
09:36:02 -!- wmww has joined.
09:36:09 -!- none30 has joined.
09:38:34 <esowiki> [[Pewlang]] https://esolangs.org/w/index.php?diff=80713&oldid=80704 * Tasty Kiwi * (-23)
09:42:04 -!- none30 has quit (Ping timeout: 240 seconds).
09:42:57 -!- df1111 has quit (Ping timeout: 258 seconds).
09:43:24 -!- wmww has quit (Ping timeout: 240 seconds).
09:43:31 <esowiki> [[Joke language list]] M https://esolangs.org/w/index.php?diff=80714&oldid=80452 * Tasty Kiwi * (+14) add pewlang
09:50:33 <esowiki> [[Pewlang]] M https://esolangs.org/w/index.php?diff=80715&oldid=80713 * Tasty Kiwi * (+18) add 2021 category
09:56:32 -!- Sgeo has quit (Quit: Leaving).
09:57:04 -!- user24 has quit (Quit: We must know, we will know).
10:00:03 -!- Sgeo has joined.
10:31:13 -!- Discordian[m] has joined.
10:33:10 -!- kritixilithos has quit (Quit: quit).
10:34:50 -!- wmww has joined.
10:42:00 -!- none30 has joined.
10:46:03 -!- tromp has quit (Remote host closed the connection).
10:47:57 -!- df1111 has joined.
11:01:30 -!- Sgeo has quit (Read error: Connection reset by peer).
11:06:28 -!- tromp has joined.
12:02:39 -!- imode has quit (Ping timeout: 272 seconds).
12:11:44 <b_jonas> zzo38: you can also upload public files to archive.org
13:38:57 -!- arseniiv has joined.
13:39:35 <arseniiv> fungot do you integrate by parts or you’d better divide and conquer?
13:39:36 <fungot> arseniiv: mr president, the commission communication on sustainable aquaculture, you may ask. in 1848, during the debate on the fayot report because it is difficult for me to respond other than in exceptional cases. the globalisation of democracy. science is obviously an issue, as in the countries of the european union
13:40:22 <fizzie> "Science is obviously an issue."
13:40:34 <fizzie> fungot: Are you one of those anti-science folks?
13:40:34 <fungot> fizzie: ladies and gentlemen, that i am not in a position to take action. mr stevenson also raises the problem of organized crime. this convention is accompanied by a deadline so that we can only welcome the fact that we are spending european money to solve their own problems on to europe.
13:41:01 <esowiki> [[Pewlang]] M https://esolangs.org/w/index.php?diff=80716&oldid=80715 * PythonshellDebugwindow * (+169) /* Examples */ cats
13:41:23 <fizzie> I think we can all appreciate the fact fungot is not in any sort of position to take action.
13:41:23 <fungot> fizzie: commissioner, you stated that responsibility for the tasks that will face those two countries.
13:42:55 <HackEso> 429) <fungot> fizzie: i, myself, will bring an end to all.
13:44:23 <esowiki> [[User:PythonshellDebugwindow]] M https://esolangs.org/w/index.php?diff=80717&oldid=80664 * PythonshellDebugwindow * (+46) /* Languages */ +[[Divrac]]
14:12:20 -!- tromp has quit (Remote host closed the connection).
14:27:46 -!- tromp has joined.
14:30:31 -!- ubq323 has joined.
14:36:21 <fungot> olsner: mr president, thanks in particular to mr john fnord and the continuation of the bombing of serbia can stop. the eldr resolution is quite clear that the game is over, to visit kaliningrad. i salute the progress which is well represented by the euro very quickly in setting up a european food authority
14:36:21 <HackEso> 10) <fungot> GregorR-L: i bet only you can prevent forest fires. basically, you know. \ 13) <fizzie after embedding some of his department research into fungot> Finally I have found some actually useful purpose for it. \ 14) <fungot> oerjan: are you a man, if there weren't evil in this kingdom to you! you shall find bekkler! executing program. please let me go... put me out! he's really a tricycle! pass him! \ 56) <fungot> i am sad ( of course
14:38:17 * olsner shall now idle for a few more years
14:44:21 -!- Arcorann has quit (Ping timeout: 264 seconds).
14:46:07 <b_jonas> fungot: that war has ended for good, thank god
14:46:08 <fungot> b_jonas: mr president, mr churchill, who i hope will soon be time for decisions to be monopolized by a small lite. one might even imagine cases where codecision or majority voting in environmental matters. this requires careful supervision, as we all know this is not exactly encouraging either and detracts from the fact that the replacement fats are safe? one final point on the jobs of 2 000 observers will be placed on an aspe
14:47:08 <b_jonas> that must be some linear logic thing
14:59:24 <esowiki> [[FROM HERE TO THERE]] M https://esolangs.org/w/index.php?diff=80718&oldid=78837 * PythonshellDebugwindow * (+40) /* Create a variable, initialize it to 7, set it to 19, increment it, then divide it by 4 and print its value */ Shorten titlie
15:14:21 -!- ubq323 has quit (Ping timeout: 264 seconds).
15:56:11 <fizzie> fungot: "Placed on an aspe"? Is that some sort of woodworking tool, like an adze?
15:56:11 <fungot> fizzie: i would like to reiterate my previous remarks demonstrate that to me seems obvious, but which wish to ignore the historical fact that one country has won three world wars. the issue is one of the following: ‘to draw up and maintain a list of themes, but it has been said by mr smith, i am concerned about the content of several of the speeches this morning and i have had this afternoon.
16:06:14 <fizzie> I thought we had only had two of those world wars so far.
16:11:13 <APic> Nah, the 3rd was the Cold War
16:11:18 <APic> Now we are inside World War IV
16:11:22 <APic> Also knwon as Information War
16:15:04 <fizzie> I heard something about World War Z, was that also one of them?
16:25:46 -!- tromp has quit (Remote host closed the connection).
16:31:53 -!- ais523 has joined.
16:32:11 <ais523> sleepsort is actually just another sorting algorithm in disguise, usually heapsort
16:32:21 <ais523> because you're basically just delegating the task of actually sorting onto the OS scheduler
16:32:49 <ais523> and heapsort isn't a horrible algorithm, although it is really confusing
16:33:15 <int-e> it just moves things around quite a bit
16:33:33 <int-e> and the memory access patterns aren't too cache-friendly
16:35:07 <int-e> oh, what you have is not the usual heapsort; you're just managing the data in a priority queue (usually a heap structure)
16:35:28 -!- kritixilithos has joined.
16:35:46 <ais523> "put everything into a priority queue and take it back out again" is heapsort by definition, assuming that the priority queue is implemented as a heap
16:35:59 <ais523> but, I agree that heapsort is more often done in-place
16:36:33 <int-e> to my mind the term "heapsort" is tied to a particular binary heap implemented in place
16:39:54 <ais523> also it's apparently possible to create a heap by inserting an array into it in O(n) time
16:40:04 <ais523> err, inserting an array into an empty heap
16:40:13 <ais523> this is faster than inserting the elements into it one at a time, which would be O(n log n)
16:40:32 <ais523> but, there's no way to speed up the "extract in sorted order" part of the algorithm, which is n log n regardless
16:41:44 <int-e> there's also the trick of moving everything up and then correcting the heap from below which saves comparisons on average... maybe Knuth analyses this? I forgot where I have this from.
16:42:38 <int-e> (it's n log n, but it had a smaller constant factor in the average case)
16:45:33 -!- tromp has joined.
16:50:43 <ais523> it crosses my mind that an intrusive heap might potentially be useful
16:51:09 <ais523> probably implemented as an array of pointers to elements, and a index in each element pointing back to the appropriate element of the array
16:51:34 <ais523> that way, the elements could automatically find themselves in the heap so that they could be deleted, move themselves when their keys were modified, etc.
17:21:45 -!- tromp has quit (Remote host closed the connection).
17:26:00 -!- scoofy has quit (Ping timeout: 246 seconds).
17:36:12 -!- metcalf has quit (Quit: metcalf).
17:36:28 -!- metcalf has joined.
17:39:21 -!- tromp has joined.
17:40:44 -!- metcalf has quit (Client Quit).
17:40:46 <hendursaga> Anyone know of any software using genetic programming or similar to reduce code size for golfing? I think I had one or two bookmarks but they got lost.
17:41:00 -!- metcalf has joined.
17:42:49 <hendursaga> I'm mostly thinking stack-based ones, because I tend to use too many dup's and swaps and whatnot
17:45:32 -!- kritixilithos has quit (Quit: quit).
18:08:36 -!- imode has joined.
18:21:31 <esowiki> [[User:Hakerh400]] M https://esolangs.org/w/index.php?diff=80719&oldid=80642 * Hakerh400 * (-81)
18:23:40 -!- TheLie has joined.
18:26:47 -!- tromp has quit (Remote host closed the connection).
18:26:52 <kmc> hendursaga: some of the superoptimization techniques could be applicable maybe? https://en.wikipedia.org/wiki/Superoptimization
18:28:04 <kmc> I don't have a link but I remember a paper about a system that would guess at new peephole optimizations, test them against a bunch of random inputs, and the ones that passed those tests were sent to a SAT solver for an exhaustive proof that they are meaning-preserving
18:28:59 <kmc> maybe you could do something similar
18:32:08 <fizzie> I think I remember someone giving a talk about something like that in the context of Ethereum smart contracts, where there's a very measurable cost in the "gas" that's needed to run them.
18:45:51 <kmc> ah, makes sense
18:45:55 <kmc> good application
18:49:12 -!- tromp has joined.
19:08:44 <hendursaga> Hmmm, I recall looking at a superoptimizer recently
19:12:06 <ais523> I once tried writing a superoptimizer by implementing a subset of x86 in z3 syntax
19:12:10 <ais523> but didn't get that far
19:12:59 <shachaf> ais523: https://github.com/zwegner/x86-sat seems neat.
19:14:33 <ais523> indeed, although that isn't quite the same as what I was working on
19:14:53 <ais523> I was attempting to optimise it for SAT search, so that an exists-forall search would find a sequence of instructions that implemented some specific algorithm
19:15:49 <ais523> that's probably why I didn't get very far
19:18:18 <ais523> http://nethack4.org/pastebin/z386.smt2 if you're interested
19:18:45 <ais523> (it doesn't have a good UI yet, or indeed any real UI)
19:20:33 <shachaf> Writing SMTLIB2 directly? Courageous.
19:22:11 <ais523> writing a generator may well have been harder
19:22:56 <shachaf> I wish SMT solvers standardized on an API rather than just S-expressions.
19:23:05 <int-e> `define-fun` makes this sane
19:23:06 <HackEso> define-fun`? No such file or directory
19:23:31 <ais523> the s-expressions *are* the API
19:23:33 <int-e> oh right, I keep forgetting to mask `
19:23:51 <shachaf> I mean an API that doesn't involve parsing.
19:23:59 <ais523> although, requiring the parser to be an exact subset of Common Lisp's was probably a mistake
19:24:05 <int-e> you only have to pretty-print it :P
19:24:09 <shachaf> It's probably less important when you give the solver one big instance rather than manny small instances.
19:24:33 <int-e> I don't disagree...
19:25:04 <int-e> ...but that z386.smt2 example is not the reason why a standard API would be helpful
19:25:47 <int-e> a better example would be, say, integrating smt solving into a compiler for eliminating bounds checks
19:26:04 <int-e> where you'll have many, often small problems
19:27:49 <ais523> does that actually require SMT solving in most cases?
19:28:06 <ais523> tracking integer ranges will get rid of most of them in a purely compositional way
19:28:23 <ais523> (also, in that case, you'd probably want the SMT solver to work on the compiler's IR directly)
19:32:10 <int-e> ais523: strength reduction (to reduce the number of variables) and simple interval/offset/stride amalysis will get you quite far I suppose.
19:32:38 <ais523> strength reduction doesn't reduce the number of variables, does it?
19:33:16 <ais523> anyway, I was also thinking about range analysis in another context: code which evaluates the exact expression that the user specified, without rounding intermediate results
19:33:24 <ais523> a sort of "intermediate results are bignums" way of writing languages
19:34:03 <ais523> this would remove entire categories of security vulnerabilities, generally make code much more readable, and with good range analysis might be no more inefficient than the original
19:34:40 <ais523> but, some programs keep their integers in range with an analysis of the form "this program won't be running long enough for this counter to overflow" and I think there should be some way to formalise that
19:35:09 <esowiki> [[Special:Log/newusers]] create * Eekee * New user account
19:36:37 <ais523> it'd be nice for a program to say "you're trying to make me operate on a 860MiB file, but this program will have an integer overflow if operating on a file larger than 750MiB so I won't even try", and for those bounds to be automatically calculated by the compiler
19:36:45 <ais523> rather than getting most of the way through and then failing
19:39:59 <esowiki> [[Esolang:Introduce yourself]] https://esolangs.org/w/index.php?diff=80720&oldid=80705 * Eekee * (+353) /* Introductions */ +eekee
19:48:48 -!- ubq323 has joined.
19:49:22 <esowiki> [[Esolang talk:Introduce yourself]] https://esolangs.org/w/index.php?diff=80721&oldid=73907 * Eekee * (+805) /* Y'all know some goof has split the introduction section into two, right? */ new section
19:50:25 -!- metcalf_ has joined.
19:50:44 -!- metcalf_ has quit (Client Quit).
19:51:10 -!- metcalf_ has joined.
19:52:45 -!- metcalf has quit (Ping timeout: 264 seconds).
19:52:45 -!- metcalf_ has changed nick to metcalf.
20:03:58 <b_jonas> ais523: I do something like that, but not intrusively: a binary heap implemented as a normal array, and a dictionary of the same elements keyed on a different key, and each element in the dictionary has the index of the corresponding element in the implementation of the heap, and I update the index every time I move elements in the heap (whether up or down). this way you can find an element in the heap
20:04:04 <b_jonas> any time and increase its weight to move it later. it helps with a traditional priority queue for timers too, where you often want to postpone timers or forget them entirely.
20:09:37 <b_jonas> "this program will have an integer overflow if operating on a file larger than 750MiB so I won't even try" => ah yes, that bug in ffmpeg. that was for reading 2 gigabytes though, a bit less arbitrary.
20:13:00 <b_jonas> 'some programs keep their integers in range with an analysis of the form "this program won't be running long enough for this counter to overflow"' => yes. the rule I have for that is, if you increment a counter by 1 each time only (as opposed to adding larger numbers, or incrementing many separate counters in parallel then tallying them into it), then it will never reach 2**62 plus its starting value.
20:13:06 <b_jonas> for example, a generation counter or a reference counter is safe to keep as 64 bit long. but of course you often want an optimization that lets them be just 32 bit long, and that requires either more difficult arguments, or overflow checks every time.
20:14:21 <b_jonas> this applies even if you sometimes save the counter to the disk and later retrieve it from there, like in a database generation counter, again as long as you don't merge multiple counters into it with addition.
20:14:27 <shachaf> I wonder whether binary heaps actually do better than B-trees in practice, for many of their applications.
20:16:30 <b_jonas> that said, you can usually make runtime overflow checks very cheap, if they almost never have to detect an overflow.
20:16:51 -!- fizzie has quit (Quit: Coyote finally caught me).
20:16:58 <b_jonas> then your program will give a cryptic error message on overflow, instead of undefined behavior
20:18:08 -!- Sgeo has joined.
20:19:43 -!- fizzie has joined.
20:21:52 -!- tromp has quit (Remote host closed the connection).
20:27:52 -!- ais523 has quit (Read error: Connection reset by peer).
20:30:08 -!- ais523 has joined.
20:32:28 <ais523> <shachaf> I wonder whether binary heaps actually do better than B-trees in practice, for many of their applications. ← there's such a thing as a B-heap (which is just a binary heap laid out differently in memory), that might be a fairer comparison
20:32:54 -!- tromp has joined.
20:33:29 <ais523> b_jonas: I agree that runtime overflow detection (e.g. Rust in debug mode) is preferable to undefined behaviour or silent wrapping
20:33:50 <ais523> but, I would prefer compile-time overflow detection, or ideally, the program just functioning as the programmer expected (the programmer's intent is normally clear)
20:35:26 <ais523> for example, starting calloc with «if (size * nmemb > SIZE_MAX) return NULL;» doesn't actually work (and good compilers will warn about this), but it is clear what the programmer means and it shouldn't be hard to teach a compiler how to generate code for it
20:35:45 -!- ais523 has quit (Read error: Connection reset by peer).
20:36:02 -!- ais523 has joined.
20:36:25 <ais523> and at least x86-64 has a perfectly usable 64-to-128-bit multiply instruction
20:36:50 <ais523> s/.*/(note that gcc has an intrinsic for "multiply and check for overflow", &)/
20:36:58 <ais523> (one of my comments got eaten in the connection reset)
20:37:35 <ais523> hmm, should that be & (sed-style) or $& (perl-style)? normally the exact regex syntax doesn't matter for these s/// things, but this is a case where it does
20:45:42 <int-e> but this may well amount to asking whether someone is a Perl programmer or not
20:47:21 -!- TheLie has quit (Remote host closed the connection).
20:48:37 -!- sprock has joined.
20:50:20 -!- imode has quit (Ping timeout: 272 seconds).
21:00:32 -!- imode has joined.
21:11:19 <ais523> shachaf: we were discussing alignment a while ago; I've been doing some benchmarking of various memory writing techniques recently (on a recent Intel processor), so decided to try misaligned writes
21:12:02 <ais523> they seem to be slower than aligned writes when the written data is in L1 or L2 cache, but no slower when the cache line to write has to be fetched from L3 cache or beyond
21:12:46 <ais523> (I'm guessing that the processor can compensate for the misalignment faster than it can evict a line from L2 cache)
21:13:18 <ais523> also, for aligned writes, it doesn't matter whether you use a misalignment-capable writing instruction or not, the processor notices that the address is aligned
21:13:18 <int-e> is this for misaligned access within a single cache line?
21:13:57 <ais523> int-e: I tested using misaligned accesses that collectively cover memory, so some within a cache line and some crossing cache lines, but to consecutive addresses
21:15:01 <ais523> this is a Whiskey Lake microarchitecture, it seems (I had to look it up, there are way too many different naming schemes for Intel processors)
21:15:02 -!- craigo has quit (Ping timeout: 272 seconds).
21:15:06 <int-e> well, it seems interesting to check whether the slowdown is just due to touching more than one cache line
21:15:39 <int-e> though that interferes with linear memory access so hrm, tricky
21:15:40 <ais523> I'm not sure there's much use for misaligned writes that are guaranteed to fit within a single cache line
21:15:51 <ais523> and I'm not sure how you'd write the test, either
21:16:29 <ais523> the /other/ issue with misaligned writes is that you can't bypass the caches with them and write directly to main memory (there isn't an instruction in the instruction set for it, vmovdqnt and friends work on aligned memory only)
21:16:32 <int-e> well, access words stuff at a... 64 byte? stride
21:16:40 <int-e> and vary the offset?
21:16:57 <ais523> I guess I can imagine a packed structure that's 64 bytes long but where the internal fields are misaligned
21:17:43 <int-e> this question isn't really motivated by practical considerations
21:17:58 <ais523> right, this is #esoteric after all
21:18:16 <int-e> it's more about having a model for the slowdown, possibly even an explanation
21:18:27 <ais523> actually, another interesting test would be to have a loop that writes only half of each cache line
21:18:37 <ais523> and see whether it's faster, slower, or the same speed as a loop that writes all of it
21:18:46 <ais523> (but you could do that with both aligned and misaligned data)
21:19:16 <int-e> hmmmm. expetations... it should actually be slower?
21:20:00 <ais523> I think it might depend on the number of reads of other cache lines you're doing in parallel
21:20:19 <int-e> if you write a whole cache line you don't have to wait for it to be fetched from higher up the memory hierarchy... or is that down?
21:20:34 <ais523> I'm not sure which direction in memor is which
21:21:09 <ais523> but, it wouldn't surprise me if processors couldn't map a line into cache without fetching it from the next-larger cache
21:22:18 <int-e> I agree it's an interesting experiment.
21:22:20 <shachaf> ais523: I saw some discussion of this on Twitter recently. Let me see if I can find it.
21:22:34 <ais523> prefetchw, which prepares a line for writing, is documented as reading the line into cache and taking an exclusive lock on it
21:23:17 <ais523> ooh! that's probably why vmovdqnt seems to be almost exactly twice as fast as vmovdqa for writing main memory
21:23:22 <shachaf> I think it was this thread: https://twitter.com/trav_downs/status/1358288656188854272
21:23:30 <shachaf> Though that's talking about AMD, and I think there was something about Intel too.
21:23:55 <ais523> I bet vmovdqa has to read the line and then write it, whereas vmovdqnt doesn't have to read it at all
21:24:13 <ais523> although I don't bet very much, because I'm not that sure
21:28:13 <ais523> oh, another thing I discovered is that the number of bytes you write at a time speeds up accesses to L1 and L2 but not beyond
21:28:23 <ais523> (actually I only tested L2, but with L1 it's kind-of obvious)
21:29:10 <ais523> this isn't surprising, the L2/L3 bottleneck is one of the most common limiting factors on a program's speed
21:33:45 <shachaf> By the way, did you know there's a simple trick for making a queue that support insert, remove, and monoidal product in amortized constant time?
21:34:03 <shachaf> It's a variant of the two-stack queue trick.
21:34:13 <ais523> how is the product defined?
21:34:21 <ais523> appending one queue to another?
21:35:02 <shachaf> I mean product of the contents, sorry.
21:35:29 <ais523> I'm still not sure I grasp what the definition is
21:36:15 <shachaf> For example you can ask for the maximum of the elements currently in the queue.
21:36:30 <ais523> ah, basically a fold over the entire queue
21:37:19 <ais523> can't you just implement that by draining the entire queue and calculating the product as you go, amortizing it against the time spent to insert?
21:37:38 <ais523> or, hmm, is this nondestructive?
21:37:47 <ais523> in that case I can see how it might be doable but it's nontrivial
21:38:16 <shachaf> Oh, you can ask for the product without removing elements, if that's what you mean.
21:38:32 <shachaf> For example you can use this to calculate sliding window products over an input list.
21:40:08 <ais523> I'm guessing it involves some sort of cache of partial results for sections of the queue; alongside each element, you have a number of elements later than it for which the monoidal product has been calculated (maybe 0), and the value of the product over that many elements
21:40:51 <ais523> I'm not 100% sure this algorithm works, but if it's wrong it sounds fixable
21:40:52 <int-e> . o O ( data FingerQueue a = Empty | One a | Nest a (FingerQueue (a,a)) a )
21:40:56 <shachaf> Hmm, I suppose it does have that property, but I think it's much simpler than what you're thinking.
21:41:23 <int-e> oh, wait, you need digits too
21:41:37 <shachaf> The trick is like a two-stack queue, except the out-stack contains running products instead of values.
21:42:01 <shachaf> So you might store ([abc,bc,c],[f,e,d],def)
21:42:26 <int-e> . o O ( data FingerQueue a = Empty | OneTwo (Digit a) | Nest (Digit a) (FingerQueue (a,a)) (Digit a); data Digit a = One a | Two a a
21:42:36 <shachaf> (The "def" is the product of the entire in-stack.)
21:42:49 <shachaf> When the out-stack is empty, you compute running products of the in-stack into it.
21:42:51 <ais523> ah, I see, the product of the in-stack is calculated twice
21:43:00 <ais523> but that doesn't affect the constant-time nature
21:43:06 <ais523> because it's calculated exactly twice and 2 is a constant
21:43:42 <shachaf> You can also represent it with a regular mutable queue with a separator between products and values: [abc,bc,c|d,e,f] def
21:44:18 <ais523> this seems like a useful data structure
21:44:28 <ais523> although I'm not immediately clear on what you'd use it for
21:45:05 <ais523> I wonder if there's a way to make it work for random access (think a database table where you can add and remove rows in any order, and always want to know the maximum value of a particular column)
21:45:05 <shachaf> The original context I wanted it for was computing maximums of windows of an array.
21:45:29 <shachaf> Though for that case there's a more specialized algorithm which I think is more efficient.
21:51:34 <b_jonas> "also, for aligned writes, it doesn't matter whether you use a misalignment-capable writing instruction or not, the processor notices that the address is aligned" => yes, the manual explicitly says that there's no performance difference anymore between the aligned and non-aligned instruction on newer cpus, nor between the integer vector bitwise and floating point vector bitwise instructions, they're
21:51:40 <b_jonas> just separate because they had different performance on older cpus
21:53:54 <b_jonas> and also that misaligned access only has performance penalties if it's from different cache lines (or possibly if you're trying to write then very quickly read from an overlapping address, but that's not really an alignment trouble, it's just missing the optimization where quickly reading the same thing that you wrote can bypass even the L1 cache)
21:54:01 <ais523> hmm, following multiple links from shachaf's linked twitter thread, I found a Microsoft blog post that uses Intel syntax and the "movabs" instruction
21:54:01 <ais523> which leads me to think that x86 assembler syntax actually isn't properly standardised at all
21:54:01 <ais523> (both AMD and Intel call the instruction in question "mov", I think "movabs" is a gcc/gas thing)
21:54:24 <b_jonas> "this is a Whiskey Lake microarchitecture" => what ... is hat a real architecture name? they have the weirdest names these days
21:55:07 <ais523> there's other weirdness, like "xorps / movdqu" as a method of zeroing memory (I believe that both AMD and Intel have processors on which xorps / movups would be equivalent and faster with no downsides)
21:55:40 <ais523> xorps is one of those weird exceptions
21:56:05 <ais523> which is treated as an int instruction by some processors and a float instruction by others
21:56:23 <ais523> the whole movdqu/movups distinction is one of the weirdest bits of x86 as it is
21:57:32 <ais523> b_jonas: I think even some very recent processors have a latency (but not throughput) penalty if you access the same register with both an int instruction and a float instruction in consecutive cycles
21:57:56 <ais523> the problem being, they don't agree on whether xorps is an int or a float instruction, so there's no way to get a guaranteed match between it and another instruction
21:59:57 <ais523> you can work around the issue by using pxor, but that's one byte longer if targeting an xmm register with a source for which any registers involved are numbered in the range 0-7
22:00:39 <b_jonas> "<ais523> I'm not sure there's much use for misaligned writes that are guaranteed to fit within a single cache line" => there used to be one somewhat rare but important use, for when you want to shift a vector register by a variable number of bytes, and to do that, you do a maybe-unaligned read to get an index vector, then use that index vector with the PBLENDB or newer shuffle instruction. maybe the
22:00:43 <ais523> I think pxor and vpxor actually have different rules for this, just to make things even more complex
22:00:45 <b_jonas> specific use case is obsolete with some later instructions, but I think something similar still exists.
22:01:16 <ais523> that uses a misaligned *read*
22:01:20 <ais523> and it is still useful I think
22:01:35 <ais523> but writes are less obviously useful
22:01:37 <b_jonas> or maybe it's not for shifting bytes, where you don't need this too much, but something similar.
22:01:53 <b_jonas> ok, sorry, that's stupid, it's probably not actually a use case, ignore it
22:02:01 <ais523> (especially because the normal advice for small copies where the relative distance between source and destination is misaligned is "read misaligned, write aligned")
22:02:53 <ais523> I've been working on a program over the last few days where I have an infinitely repeating PSHUFB pattern but the repeat length is an odd number
22:03:09 <ais523> so I have the choice of making 32 copies of it so that I can read it aligned at any position, or of reading misaligned
22:03:34 <b_jonas> "<int-e> it's more about having a model for the slowdown, possibly even an explanation" => the L1 cache caches memory in 64-byte chunks called cache lines. if you're writing misaligned data, the L1 cache has to work harder because it has to update two of these, and also possibly communicate more with the L2 cache if the lines aren't yet in the L1 cache. isn't that enough of a mental model?
22:04:18 <ais523> b_jonas: I think int-e thinks that that's a possible explanation of the slowdown but doesn't know whether it's the *correct* model or not, and wants to find some way to establish whether or not it is
22:04:46 <ais523> I think it's been established that *page*-misaligned memory accesses are slow because you need to access the TLB twice, but that's a very rare case
22:05:46 <int-e> I'm off for tonight though, have fun!
22:07:37 <b_jonas> "<shachaf> By the way, did you know there's a simple trick for making a queue that support insert, remove, and monoidal product in amortized constant time?" => there's a heavyweight trick for that: use a finger tree, but in each node, cache the fold of the leaves descending from it. there's probably a much better way if you want just a queue. This is more useful for the general case, when you want a
22:07:43 <b_jonas> sequence that you can cut and paste anywhere and want to be able to get the product of keys in any subsequence.
22:08:27 <b_jonas> or similarly for an ordered keyed dictionary where you want to be able to compute the fold of a weight (secondary key) field for any range
22:08:53 <b_jonas> a structure that you can insert to, remove from, and answers questions like "what is the sum of weights of items with key between x and y"
22:09:28 <b_jonas> but it's possible that this doesn't actually give you amortized constant time, only amortized logarithmic time
22:11:02 <ais523> b_jonas: I was trying to work out if it's always possible to keep the tree balanced in amortized constant time, but it is, you can use any of the normal tricks for self-balancing trees and just recalculate the cache value of the modified nodes at every tree rotation
22:12:02 <shachaf> Hmm. I think you can keep B-trees balanced in amortized constant time, but not if you have folds in the internal nodes.
22:12:12 <ais523> of course, for invertible structures like sums, there are often much cheaper algorithms (e.g. if you want to know the sum of elements in a queue, just track the sum and add to it on push, subtract from it on shift)
22:12:14 <shachaf> Because then you need to ascend up the spine on every mutation.
22:12:30 <shachaf> Yes, invertibility makes it much easier.
22:13:19 <b_jonas> "<shachaf> The original context I wanted it for was computing maximums of windows of an array." => ah yes, that might be related to one of my favorite algorithmic problems, one I've already mentioned here: you get a list of machine word sized nonnegative integers, find the infix in it to maximize the product of the length of that infix and the minimum of its values
22:14:05 <ais523> hmm, I wonder if it's possible to make a sort of heap where the heap property is updated only lazily, and if this might be amortized-faster than more regular implementations
22:14:58 <shachaf> Which thing could be faster?
22:15:03 <b_jonas> "<ais523> b_jonas: I think even some very recent processors have a latency (but not throughput) penalty if you access the same register with both an int instruction and a float instruction in consecutive cycles" => even if they're the same vector size?
22:16:19 <ais523> b_jonas: both companies have seen that behaviour at various points in time, I forget which one it is which has it on recent processors, let me look it up
22:17:16 <b_jonas> also, if you're trying to use xorps for zeroing a register, I think all cpus except very old ones have optimizations specifically for zeroing registers, both general and vector, and these might make the ordinary problem of integer vs float instructions moot. but maybe that's not what you want the xorps for.
22:18:48 <ais523> apparently Intel Skylake+ has some such delays, but only for rare combinations like integer logical operation → floating-point multiply, they don't trigger on things like moves; AMD Zen 3 has a 1-cycle delay for any sort of mismatch
22:19:01 <b_jonas> ais523: oh, you want specifically misaligned *write* guaranteed within a cache line only. yeah, I don't know a good case for that.
22:19:07 <ais523> so that's the most recent architecture family for both Intel and AMD
22:20:49 <b_jonas> "<shachaf> Because then you need to ascend up the spine on every mutation." => no, not if you use a balanced tree optimized for persistent (no mutating in place) access
22:21:09 <b_jonas> and if it's a queue, inserting only at the ends, that can help
22:22:15 <b_jonas> ais523: re integer and float vector penalties, I see.
22:22:32 <b_jonas> and AMD Zen isn't the low power notebook family, right?
22:22:47 <b_jonas> if it's any sort of mismatch, that is serious
22:23:11 <b_jonas> if it's just certain floating point multiplies, that sounds less of a problem
22:25:22 <ais523> Zen is the super-high performance family
22:26:05 <b_jonas> I learned something new then
22:26:10 <ais523> but it's a latency penalty, not throughput, so I'm guessing that the issue is that there are integer vectors and floating point vectors on different physical parts of the chip
22:26:35 <ais523> an xor is cheap enough that having two separate xor units is probably cheaper than finding the wires to quickly move data from one end of the chip to the other
22:27:14 <b_jonas> yes, but a latency penalty often translates to a throughput penalty, because if you try to solve it, you run into a decoder bottleneck, even on AMD
22:27:19 <ais523> all this said, I'm rather confused at why all these floating-point logical operations exist at all, xor'ing floats is not the sort of thing that's particularly mathematically meaningful
22:27:32 <b_jonas> ais523: they exist for conditionals mostly
22:27:44 <b_jonas> also for negation and for ldexp and frexp
22:27:53 <ais523> using the all-1s "floating point" value as a boolean true
22:27:54 <b_jonas> negation isn't too important, you can use subtraction for that
22:28:10 <b_jonas> but ldexp and frexp and safe conversion to and from integers does have to involve them
22:28:12 <ais523> is all-1s +infinity, or is it a NaN?
22:28:14 <b_jonas> but mostly it's comparisons
22:28:32 <b_jonas> the floating point comparison instructions before AVX512 give floating point result in the performance sense
22:28:44 <ais523> b_jonas: compilers compile -x on floating point x to an xor with a sign-bit constant
22:29:03 <ais523> subtraction from 0 actually gives different results (related to negative versus positive zero and the sign bit on NaNs)
22:29:09 <b_jonas> ais523: yes, you have to subtract from -0
22:29:21 <b_jonas> because -0 is the additive unit of floating point, not 0
22:29:24 <ais523> that still has a different effect on the sign bit of NaNs
22:29:51 <ais523> also, subtraction from -0 is no more efficient than xor with the sign-bit number
22:29:59 <b_jonas> it has a different effect on NaNs, but compilers in general don't give good enough tools to handle NaNs deterministically:
22:30:07 <ais523> actually… is the sign bit number -0 when interpreted as a float?
22:30:29 <b_jonas> you can't force a compiler to not swap the operands of a floating point addition without inline assembly
22:30:32 <ais523> `! c printf("%g", 0x8000000000000000L);
22:30:47 <ais523> that looks suspiciously like a negative zero to me
22:30:58 <ais523> which would make sense, as it's just 0 with the sign bit flipped
22:31:11 <ais523> so I guess the question is "xor with -0 or subtract from -0?"
22:31:26 <ais523> floating point xor and subtraction are equally fast, xor might use less power though?
22:31:56 <b_jonas> ais523: my guess is that compilers (usually) compile negation that way because it's less likely that you run out of execution units that do floating point addition, which can in rare cases be a bottleneck in numeric code, like in matrix multiplication AI nonsense, if it's hyperoptimized, and otherwise the difference doesn't matter
22:32:34 <ais523> oh, that might be the case on old processors
22:32:44 <b_jonas> no, I mean on recent Intel cpus
22:32:51 <b_jonas> it's a difference that doesn't come up often, I admit
22:32:53 <ais523> I think in modern processors, floating point adds can be done on any vector unit, and floating point xors can't be done on a scalar unit
22:33:01 <b_jonas> but I don't see a reason to use subtraction instead of xor
22:33:34 <b_jonas> if you do not care about negative zero, then subtraction may be faster if you have register pressure, because loading 0 is faster. but usually you care about negative zero.
22:34:15 <ais523> on Coffee Lake, xorps and pxor both run on 0/1/5 (the three main ALUs), addps and addss only run on 0/1
22:34:32 <ais523> OK, so you're right, there is one vector unit that can handle floating point logic but not floating point add
22:34:42 <b_jonas> "<ais523> I think in modern processors, floating point adds can be done on any vector unit, and floating point xors can't be done on a scalar unit" => that sounds like you're either talking about a newer microarchitecture or another brand than the ones I'm the most familiar with
22:34:57 <ais523> the other ALU is 6, which can do XORs but probably isn't capable of reading vector registers
22:35:04 <ais523> b_jonas: no, I was just mistaken
22:35:40 <b_jonas> ais523: wait, is this for a vector negate, or for negating a single float?
22:35:43 <ais523> for a comparison, paddq runs on 0/1/5
22:36:50 <ais523> but it hardly matters, assuming you're using SSE/AVX as the FPU (which is the only sane default on x86-64), single-float and vector-float instructions are encoded and executed almost identically (the only difference is a single bit in the instruction encodings)
22:37:16 <ais523> also I think the vector instructions are actually faster in cases where the destination is a different register from the sources
22:37:20 <b_jonas> ais523: 32 bit or 64 bit? doesn't 64 bit float subtract have a longer instruction encoding by one byte?
22:37:32 <b_jonas> whereas for the xorps it doesn't matter
22:37:36 <ais523> b_jonas: oh right, there's the SSE special case
22:37:40 <b_jonas> because you can use xorps for 64-bit
22:37:47 -!- TheLie has joined.
22:37:53 <ais523> where …ps instructions are 1 byte shorter
22:37:57 <b_jonas> I don't really remember all the SSE2 encodings
22:38:34 <ais523> basically, the way to think about it is: you have two encoding families for vector instructions, SSE and VEX
22:39:09 <b_jonas> all I remember is that there's a rare special case when using MMX registers as temporaries when you run out of general purpose registers in a tight loop can be worth just because MMX has shorter encodings than SSE2
22:39:18 <ais523> the only functionality difference is that the VEX encodings have a V at the start of the instruction name, and zero the high 128 bits of a ymm register if you write the low 128 bits; the SSE instructions have no V and will leave the high 128 bits unchanged
22:39:50 <ais523> both encoding families have a special case where they're one byte shorter
22:40:00 <ais523> SSE is one byte shorter for …ps instructions, which is a simple enough rule to remember
22:40:02 <b_jonas> ais523: yes, I understand that part, but I mean the pre-AVX MMX/SSE/SSE2/SSSE3/SSE4.1/SSE4/2 instructions have a lot of encoding detail
22:40:22 <b_jonas> and I don't remember when the shortcuts apply, as in when you need fewer prefixes
22:41:13 <b_jonas> ais523: although while we're there, you say that modern AMD has latency penalties for mixing integer and floating point vector instrs. does it also have penalties for mixing 32-bit and 64-bit floating point instructions, including for the bitwise?
22:41:22 <ais523> also, instructions are allocated to various "maps"; map 1 instructions are 1 byte shorter in SSE
22:41:31 <ais523> whereas they're one byte shorter in VEX only if none of the input arguments access a register numbered 8-15 inclusive (it's OK if the output does)
22:42:44 <ais523> b_jonas: apparently not, you can type-pun float as double and back for free
22:43:38 <ais523> that's another reason I think it's likely that the issue is "this register is stored physically near the floating-point units" rather than anything to do with caching extra metadata about floats
22:45:56 <ais523> (also, for anyone reading this who doesn't know, it's worth remembering that registers named in machine code instructions generally no longer have any real relationship to physical registers on the processor, they're just a convenient way of telling the processor which instructions are meant to give their output to which other instructions)
22:46:13 <b_jonas> ais523: sure, but you could have 32-bit and 64-bit float units physically distant too, just as the integer vs float unints
22:46:54 <b_jonas> or are all the optimized-for-marketing fma units so important that that's not worth?
22:47:51 <ais523> fma is mildly useful, I'm not sure how heavily it drove into processor marketing
22:48:09 <ais523> although, it is an operation that is *very* hard to do efficiently without processor support, so the people who needed it probably really needed it
22:48:34 <ais523> (and it also made Intel rearchitecture significant parts of their processor internals, so they probably wouldn't have done it unless there was heavy demand)
22:49:17 <ais523> until FMA was added, Intel processors couldn't take more than two inputs to any internal instruction, and there were horrible things going on micro-architecturally to avoid having to do that
22:49:30 <ais523> but there's pretty much no way around taking three inputs to an FMA instruction
22:49:32 -!- tromp has quit (Remote host closed the connection).
22:49:43 <ais523> I guess if Intel are going to go to that effort, they're going to market it heavily
22:51:29 <b_jonas> it's certainly useful to optimize fma so that the CPU can do matrix multiplications quickly. but Intel pushed that a bit too far, pushing benchmarks for matrix multiplications or similar used for fashionable AI nonsense, apparently optimizing for the benchmarks of that one thing more than the large variety of tasks that a CPU should handle. that's when you got a cpu where you run out of execution units
22:51:35 <b_jonas> with floating point add faster than with fma. that was a few years ago admittedly.
22:53:04 <ais523> looks like FMA runs on 0/1 nowadays
22:53:29 <ais523> the same as floating point add, and floating point multiply, individually
22:55:58 <b_jonas> also some people claim that they're optimizing for marketing "number of instructions" the way AVX512 was defined, in a way that adds a bit too much historical load that they have to support in all future CPUs. I don't really buy that. I think it only seems like that partly because they're adding a larger variation of instructions that are *easy* to implement, which seems redundant but actually helps
22:56:04 <b_jonas> because of how often the decoder and neighbors is the bottleneck, and partly because of the AVX512 mask registers, which admittedly seems a weird design and possibly not the best idea, but might turn out to be good after all for all I know.
22:56:35 <b_jonas> "<ais523> looks like FMA runs on 0/1 nowadays / the same as floating point add, and floating point multiply, individually" => yes, what I mentioned was pre-AVX512
22:57:31 <ais523> this is a non-AVX-512 processor (although after it was invented)
22:57:52 <ais523> AVX-512 looks like something that Intel doesn't want to have as a "default" feature in consumer processors, at least not yet
22:58:18 <ais523> and actually, at some point increasing the size of the vector units isn't going to help much because the bottleneck will be memory/cache speed and not compute speed
23:00:24 <b_jonas> but, just like AVX and AVX2, AVX512 also adds new instructions or instruction modes on shorter vectors, most notably the additional vector registers and the mask registers, not only wider vectors
23:00:34 <b_jonas> it's not only about the width
23:02:02 <b_jonas> that's why it is worth to have low-power cpus that support the AVX2 instruction set but only 16 byte execution units and decoding the 32 byte wide instructions to run on two execution units, and similarly it can be worth to have AVX512 instructions that only have 32 byte wide execution units
23:02:23 <b_jonas> well, partly that's why. the other part is decoding.
23:03:19 <ais523> I think many modern procesors turn off half the vector registers most of the time to save power
23:03:42 <ais523> and will run 32-byte instructions by passing them through the low half of the vector units twice while the high half is busy turning on
23:20:21 -!- arseniiv has quit (Ping timeout: 264 seconds).
23:31:55 -!- delta23 has joined.
23:43:44 -!- tromp has joined.
23:46:51 -!- Arcorann has joined.
23:48:33 -!- tromp has quit (Ping timeout: 264 seconds).