00:03:06 -!- delta23 has quit (Quit: Leaving).
00:05:04 <esolangs> [[Mogus]] M https://esolangs.org/w/index.php?diff=87930&oldid=87578 * Oshaboy * (+27) Added Unimplemented Tag
00:39:28 -!- sprock has joined.
01:02:09 <nakilon> this 2015 article https://esolangs.org/wiki/Fungeoid says that it's kind of a Fungeoid's goal to have side-effects
01:05:04 <nakilon> isn't funge-98 stack of stacks going the opposite direction?
01:05:42 <nakilon> also threading with a separate stack
01:06:25 <nakilon> and my idea of having nested isolated funge spaces
01:07:17 <nakilon> also I don't agree with the "Goal" section
01:07:25 <nakilon> unless I'm missing something
01:14:08 <ais523> one (maybe the only?) motivating idea of the original Befunge was to be as difficult as possible to compile
01:14:44 <nakilon> with a runtime that allows splitting code into isolated toroids and giving them their own stack (for example, it pops N items from the top of the stack into a new zeroed one, and then returns M from the new one to the parent)
01:14:52 <ais523> things like the stack stack (which was added later) seem like they would make the code easier to analyse, and thus maybe to compile, but you can use it for obfuscation in addition to using it for purity
01:15:31 <nakilon> it would be possible to build a repository of common functions that won't have stack collisions with each other and would be used as building blocks easily
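A minimal Ruby sketch of the isolation nakilon describes, where a sub-toroid sees only the N items handed to it and returns its top M results; the helper name and calling convention here are illustrative assumptions, not from any existing Funge implementation:

    def call_isolated(stack, n, m)
      child = stack.pop(n)         # the callee starts with only these n items
      yield child                  # run the isolated block against its own stack
      stack.concat(child.pop(m))   # hand its top m results back to the parent
    end

    stack = [1, 2, 3, 4]
    call_isolated(stack, 2, 1) { |s| s << s.pop + s.pop }   # adds 3 and 4
    stack   # => [1, 2, 7]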
01:15:58 <ais523> I guess there's two points of view, the point of view of a programmer who is trying to make their code easy to understand and is using the language features for that purpose
01:16:25 <ais523> and the point of view of a compiler, or a programmer who is intentionally making their code hard to compile, where the features might be used in unusual ways to make the code hard to understand
01:16:45 <nakilon> not necessarily easy to understand, but easy to build
01:17:06 <nakilon> I'm ok with write-only code as long as it does what it has to
01:17:45 <ais523> I think write-only code only really works in environments where you know you'll never have to read it again
01:18:22 <ais523> one-off scripts, for example (and even then I try to do things like use meaningful variable names, just in case they somehow end up not being one-off after all)
01:26:56 <nakilon> "I disagreed, saying that there are some languages out there where and interpreter is easier to write than a compiler."
01:27:04 <nakilon> https://github.com/catseye/Befunge-93/blob/master/historic/bef-1.0rc1/bef.doc
01:28:37 <nakilon> I guess it was a random fact about befunge -- that it had features hard to compile, but it wasn't the goal; the goal was "It may go forward, or backward, or even LEFT OR RIGHT." -- for fun
01:29:25 <ais523> hmm, so it looks like the motivating feature was to be harder to compile than interpret, not necessarily to be impossible to compile
01:29:32 <nakilon> also this https://github.com/catseye/Befunge-93/blob/8fe4065c0415b6f6fa6f699798fa9b64737aadc1/historic/bef-1.0rc1/bef.doc#L27 says that the self-modification came much earlier than the debate about compilers with his friend
01:29:44 <ais523> there are some programming languages which fulfil this requirement by being trivially easy to interpret
01:30:06 <ais523> but there were other motivations too
01:31:55 <nakilon> I would call the difficulty of compiling befunge a goal if there had been that debate and he had then come up with things made on purpose to prove his point; but it was the opposite -- he already had the idea of self-modification, and the debate just motivated him to release something
01:34:24 <Corbin> Interestingly, Futamura conclusively showed that compilers arise from specialized interpreters, and their work implies that an interpreter is always easier than a compiler.
01:34:58 <ais523> I do think that interpreters are generally easier, except in the case where a naive transpilation works
01:35:21 <ais523> (e.g. compiling BF by string replacement of "+" into "(*ptr)++", ">" into "++ptr", etc.)
01:35:45 <ais523> err, I'm using my post- and pre-increments inconsistently, I guess it doesn't matter in this situation but it still looks bad
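For reference, that whole naive transpiler fits in a few lines of Ruby (sticking to post-increments throughout); a sketch, not a robust implementation:

    MAP = {
      "+" => "(*ptr)++;",      "-" => "(*ptr)--;",
      ">" => "ptr++;",         "<" => "ptr--;",
      "." => "putchar(*ptr);", "," => "*ptr = getchar();",
      "[" => "while (*ptr) {", "]" => "}",
    }

    def bf_to_c(src)
      body = src.chars.map { |c| MAP[c] }.compact.join("\n")
      "#include <stdio.h>\n" \
      "char tape[30000], *ptr = tape;\n" \
      "int main(void) {\n#{body}\nreturn 0;\n}\n"
    end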
01:35:46 <nakilon> btw I don't see a problem compiling befunge if you create an 80x25 grid of pointers to void functions that would occasionally overwrite other pointers to point at the "library of functions" that is a list of possible instructions
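A hypothetical Ruby sketch of that scheme (the op set is truncated and the vm interface is made up for illustration):

    NOOP = ->(vm) {}
    OPS = Hash.new(NOOP).merge(
      ">" => ->(vm) { vm.delta = [1, 0] },
      "<" => ->(vm) { vm.delta = [-1, 0] },
      "+" => ->(vm) { a, b = vm.stack.pop(2); vm.stack << a + b },
      # ... the rest of the Befunge-93 instruction library ...
    )
    grid = Array.new(25) { Array.new(80) { NOOP } }
    # self-modification (`p`) just repoints a cell at a different library entry:
    write_cell = ->(x, y, ch) { grid[y % 25][x % 80] = OPS[ch] }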
01:36:44 <ais523> this depends on what you consider a compiler to be
01:36:55 <ais523> you can compile any code by hardcoding the program into an interpreter
01:37:04 <Corbin> ais523: But note that, in that case, we can kind of reverse-engineer ourselves a corresponding interpreter which is even "easier" in the sense that it can delegate even more work to the C compiler.
01:37:06 <ais523> but some people consider that cheating and not really a compiler
01:37:48 <ais523> Corbin: well, running gcc is probably harder than writing the code you want it to compile into a file
01:37:52 <ais523> but, both are pretty easy
01:38:26 <Corbin> ais523: I'm thinking specifically of what's delegated to the toolchain, because Futamura's point was that a good specializer can turn any interpreter into a good compiler.
01:38:46 <Corbin> Hm, not the best way to put it. The specializer and interpreter both contribute to the quality of the compiler?
01:38:57 <ais523> the specializer would have to be /incredibly/ good
01:39:21 <ais523> given that there are optimizations that interpreters can't really do without getting halfway to being a compiler anyway, so given a naive interpreter, those optimizations would have to be done in the specializer
01:39:33 <ais523> and, I know that's the point of specializers, but it's hard to imagine one that's *that* good
01:39:51 <Corbin> Yeah. I use RPython, for example, and there's a lot of work put into making those interpreters "easy" to write.
01:41:33 <ais523> I guess you can apply the same argument to a specializer-virtual machine combination
01:41:39 <ais523> which should in theory give you an optimizing VM
01:41:44 <Corbin> (RPython automatically generates a JIT compiler for a given interpreter. Backend configuration, instruction selection, memory management, etc. are all done with automatic codegen. It takes like 20min to generate a JIT but it's worth the wait.)
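The first Futamura projection in miniature, as a Ruby sketch (the toy "language" is just a list of numbers to add; nothing here is RPython's actual API):

    # an interpreter: program x input -> output
    interpret = ->(program, input) { program.inject(input) { |acc, n| acc + n } }

    # a (trivial) specializer: freeze the program argument, leave input open
    specialize = ->(interp, program) { ->(input) { interp.call(program, input) } }

    compiled = specialize.call(interpret, [1, 2, 3])
    compiled.call(10)   # => 16

A real specializer would also optimize the residual program instead of just closing over it, which is the part ais523 doubts can be made *that* good.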
01:42:35 <Corbin> Yeah. I think that this is an interesting antipattern in language design: We often pair a low-level intermediate language with a high-level user-friendly language, and require interpreters to compile from the latter to the former.
01:42:37 <ais523> that's interesting
01:42:44 <ais523> how are the interpreters specified?
01:43:29 <ais523> also, I'm not sure that's an antipattern, unless you're suggesting that the interpreters should work with high-level code more directly
01:43:40 <Corbin> RPython interpreters are given in plain Python 2.7, using their support library for libc calls and FFI. The translator interprets the Python and then disassembles it in-memory, so codegen can be done in Python.
01:44:07 <ais523> I was wondering about that (although being locked to Python 2 is something of a problem nowadays)
01:44:40 <Corbin> I just think that it's signing folks up for extra work. Extra context, usually. Desugaring is simpler than an intermediate language.
01:44:48 <nakilon> I think the measure of cheating can be calculated; my compiler idea above results in compiled programs that differ only in the place where the starting befunge code was injected as an 80x25 table, with the rest being the same "interpreter"; while the "true compiler" that you might want to imagine is something that produces an absolutely different program for every different input source code; so the amount of "variation" can be benchmarked by compiling different inputs and measuring the diff
01:46:09 <ais523> Corbin: I agree with making the low-level intermediate language a subset of the high-level language, rather than necessarily a separate language
01:46:32 <ais523> but there are some cases where you want to relax restrictions in the downcompile
01:47:03 <nakilon> but then, there are compilers that produce 1mb binary even for hello world -- all the hello worlds will have the same 99% of "interpreter" that is some static library
01:47:07 <ais523> a common example is when the IR has goto statements, but the high-level language you're compiling from doesn't
01:47:34 <Corbin> ais523: Sure. I guess I'm saying that, in those situations, the IR should be explicit and narrow-waisted, rather than a required glue language. Like, Smalltalks put a tiny but real burden on implementations when they require bytecode compilers, e.g. in Python.
01:47:39 <ais523> nakilon: right
01:47:55 <ais523> in fact, ld-linux.so is even described as an "interpreter" in its documentation, and that of the Linux executable format
01:48:11 <Corbin> LLVM IR, GCC's GIMPLE, QBE, and libfirm's IR; they should all really commit to text formats and narrow-waist tools. This is one area where WebAssembly has done well.
01:48:33 <ais523> so pretty much all binaries on Linux are being run through an interpreter (although what it actually does is just interpret the code for loading, load the code, then jump to it and run it directly)
01:50:47 <keegan> what is a "narrow-waist tool"
01:52:04 <keegan> (I spent so many years being confused about what "waist" means before I realized the garment waist and the anatomical waist can be totally different!)
01:52:12 <zzo38> I want to ask the same question, too
01:52:19 -!- ais523 has quit (Quit: sorry about my connection).
01:52:27 <Corbin> keegan: A narrow-waisted design is a pattern which is meant to counter the Expression Problem by requiring either the input or output of every tool in a toolchain to be a single unified format.
01:52:34 -!- ais523 has joined.
01:52:52 <ais523> one thing that I do think is an antipattern is where important metadata, that the compiler knows, gets removed when going down to lower levels of abstraction
01:53:20 <keegan> Corbin: oh, like netpnm?
01:53:27 <ais523> one example is that it would be very useful to have a way to say "the value in this register isn't needed any more" right down to at least the asm level
01:53:27 <Corbin> e.g. pandoc can handle many different documentation formats. Normally this would evoke the Expression Problem and require a quadratic amount of code, but instead it's a linear amount of code because pandoc's IR acts as a narrow waist.
01:53:35 * keegan nods
01:53:46 <keegan> where you convert anything-to-pnm or pnm-to-anything or manipulate pnm's
01:53:51 <ais523> it even sometimes helps performance at the *machine code* level, despite taking up bytes, because this is information that the processor cares about too
01:53:59 <Corbin> keegan: Yeah. Uncompressed containers are good candidates for narrow waists.
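A toy Ruby version of the narrow-waist layout Corbin describes, where the IR is just an array of paragraph strings; n readers plus m writers replace n×m pairwise converters:

    READERS = {
      text: ->(src) { src.split(/\n{2,}/) },
      html: ->(src) { src.scan(%r{<p>(.*?)</p>}m).flatten },
    }
    WRITERS = {
      text: ->(ir) { ir.join("\n\n") },
      html: ->(ir) { ir.map { |p| "<p>#{p}</p>" }.join },
    }

    def convert(src, from:, to:)
      WRITERS.fetch(to).call(READERS.fetch(from).call(src))
    end

    convert("hello\n\nworld", from: :text, to: :html)
    # => "<p>hello</p><p>world</p>"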
01:54:12 <keegan> or how many programming languages adopt UTF-8 as the one true internal string representation
01:54:18 <keegan> with other encodings handled at the edges
01:54:24 <Corbin> ais523: Yes. I'm sure you've heard me rant about The Mill before; their "belt" concept would be great for that.
01:54:35 <keegan> (Unicode was designed with this use case in mind, and would be substantially simpler without it)
01:54:50 <ais523> which is why we have assembly language commands like VZEROUPPER, and certain special cases of XOR
01:55:17 <ais523> Corbin: I do like The Mill's approach, although it got me thinking about how I would do things and I came to different conclusions on some things
01:55:30 <Corbin> (The Mill's belt is just the most recent eight results from the ALU. By default, values are usable for a few operations and then disappear forever.)
01:55:34 <ais523> e.g. I think it would be more useful to flip the belt the other way round: commands say when their output is going to be used, not when it was generated
01:55:52 <ais523> also I didn't realise it was as short as 8, I was assuming it would be much longer
01:56:27 <zzo38> I think farbfeld is better than pnm; although it is the same kind of idea, it is a good one. Fortunately, ImageMagick now supports both formats (and my own Farbfeld Utilities also has encoders/decoders for netpnm formats, too).
01:57:31 <ais523> Corbin: widely-applicable intermediate representations are something we could do with having in a range of fields (and if they're easily interconvertible it probably also doesn't matter if we have too many)
01:57:40 <Corbin> Oh, I think it's eight minimum? ISTR Ivan saying that belt space is very expensive because it's interconnected to so many functional units, so maybe sixteen-ish on high-end chips.
01:58:36 <ais523> Corbin: so the great thing about a "when it's going to be used"-style belt is that the interconnect isn't too bad, because you can safely use a slow path if the value you see there is some high number of instructions away
01:58:38 <Corbin> ais523: Yes, although there's still a linear cost to maintaining each representation. I know that I will have to compile a few simple languages into Cammy for demonstration purposes, but I won't maintain optimizing compilers because there's usually no interesting source programs to optimize.
01:58:50 <ais523> and only need to fast-path the low-numbered choices
01:59:00 <ais523> it's information that lets you know whether you need a fast algorithm or can get by with a slow one
01:59:16 <ais523> what is Cammy?
01:59:22 <zzo38> I dislike UTF-8 (or any Unicode format, or any other encoding) as the one true internal string representation. I think having byte strings is better, and you can still have functions that treat them as UTF-8 if wanted, as well as ones dealing with byte strings. (Converting between text encodings perfectly isn't possible, even though they try to say otherwise.) I dislike many modern programming languages that use Unicode.
01:59:38 <Corbin> Aha, that's clever. Makes sense. Strange how time reversal sometimes dramatically changes what needs to be tracked.
02:00:17 <Corbin> ais523: Oh, [[Cammy]] on the wiki, just my current pet project. It's not yet Turing-complete, because I'm lazy and also Turing categories are hard.
02:00:27 <ais523> zzo38: I believe that it's normally correct to have "bytestring" and "character string" as the two main string types internally
02:00:49 <ais523> and that with a character string, it's generally preferable for the internal encoding to not be user-visible, although UTF-8 makes for a good choice
02:01:07 <nakilon> zzo38 I'll take you to my team of creators of a "cleaner unicode"
02:01:53 <ais523> one thing I dislike about UTF-8 is that it spends a lot of encoding space to get some fairly minor advantages
02:01:54 <zzo38> ais523: Having separate types is better, yes, although there are often the deficiencies that many things will not work properly with byte strings, I find
02:02:07 <ais523> IIRC there are six byte values that can't legally appear in UTF-8 at all
02:03:02 <ais523> actually, I think even better would be to have typed byte streams that were parameterised by encoding, and probably the ability to parameterise character strings by encoding too
02:03:33 <ais523> (where "encoding" here would handle things like the form of escaping that had been used, what metadata beyond normal characters it could contain, etc.)
02:03:57 <zzo38> Actually, yes, I suppose that can be helpful.
02:05:15 <zzo38> (Although, it will not always be relevant, sometimes it is helpful to be able to use such a feature (as an option, perhaps).)
02:05:23 <ais523> one thing I would like to see more languages provide is a standardised type for formatted text
02:05:26 <nakilon> $ irb
02:05:26 <nakilon> irb(main):001:0> "лол".encoding
02:05:26 <nakilon> => #<Encoding:UTF-8>
02:05:26 <nakilon> irb(main):002:0> "лол".b.encoding
02:05:26 <nakilon> => #<Encoding:ASCII-8BIT>
02:05:46 <ais523> that seems wrong?
02:06:35 <ais523> I would say that the /first/ one is a character string, and the /second/ one is the UTF-8 encoding of a character string
02:06:47 <nakilon> String instances in Ruby are a struct of a byte sequence and an encoding info, to help other functions preprocess the sequence when needed
02:07:31 <ais523> or if it genuinely is an 8-bit extended ASCII, perhaps ISO 8859-5, but that seems unlikely
02:07:49 <zzo38> Even when it is UTF-8, there are considerations what will be wanted. Sometimes you will want to store unpaired surrogates (which is WTF-8), sometimes you might want characters beyond Unicode range (UTF-8-E, etc), sometimes you might want null characters, and sometimes you might not want these things, depending on the application.
02:07:53 <ais523> nakilon: in that case I think it's wrong
02:08:10 <ais523> if you hand a function a string, it shouldn't have to decode it itself whenever it wants to do anything with it
02:08:50 <nakilon> > "лол".b
02:08:50 <nakilon> => "\xD0\xBB\xD0\xBE\xD0\xBB"
02:08:52 <lambdabot> error:
02:08:52 <lambdabot> • Couldn't match expected type ‘b0 -> c’ with actual type ‘[Char]’
02:08:52 <lambdabot> • In the first argument of ‘(.)’, namely ‘"лол"’
02:08:53 <ais523> zzo38: it'd be nice to have a standardised name for "UTF-8 except NUL is encoded as C0 A0"
02:09:22 <nakilon> String#b here is a method that just changes the encoding attribute, it does not change the data
02:09:33 <zzo38> ais523: Yes, I agree
02:09:52 <nakilon> so this string, for example, gets printed the wrong way because ASCII can't represent it
02:10:24 <ais523> I guess what Ruby's doing is a leaky abstraction
02:10:46 <nakilon> what's wrong with it?
02:11:15 <nakilon> those japanese folks defeated unicode problems 15 years before python
02:11:50 <zzo38> Another encoding is TRON, although it is uncommon and has a different set of problems
02:12:25 <zzo38> What I think is that no character encoding will be good for all purposes, although they can be good for some things (although still, improvements can be possible, sometimes)
02:12:27 <ais523> I guess the problem is that it's using the same type for bytestring and character string, *and* has a way to look at the internal representation of a character string when that shouldn't matter
02:12:48 <ais523> and it wouldn't surprise me if it doesn't enforce that the bytes stored in a string actually match the given encoding
02:13:11 <nakilon> all data are 0 and 1
02:13:12 <ais523> Perl also solved this problem substantially before Python did, and I don't think its solution is perfect, but I prefer it to Ruby's
02:13:21 <nakilon> it does not matter how many types you declare
02:13:26 <ais523> yes, but that doesn't mean you have to present it to the user as 0 and 1
02:13:59 <ais523> and sometimes, semantically there will be a requirement not to have 0s and 1s in particular places, and it's an advantage if the language can enforce that rather than making the user do it
02:14:21 <nakilon> Ruby has only a String and you are free to change the encoding at any moment; just know what you are doing or you'll get an exception, unless you pass the ignore/replace flags explicitly
02:14:49 <ais523> fwiw, Perl's solution is that a string is conceptually an array of numbers, which could represent either codepoints or raw bytes, and for string-related functions you can specify what interpretation to use
02:15:06 <nakilon> when you get data from the network, for example, it's ASCII by default, and the internet really has no idea what an encoding is -- it's your application's job to say "here we expect this encoding", so you apply it
02:15:22 <ais523> internally, strings that contain values above 255 are stored in a slightly extended UTF-8, and strings that don't are sometimes stored just as raw bytes, but you're not supposed to know or care about that detail
02:15:45 <ais523> nakilon: and if the string isn't valid for its claimed encoding?
02:16:50 <ais523> in Rust, there's &[u8] (byte slice) and &str (string slice – internally UTF-8 encoded); there's a fast-path function to reinterpret a &[u8] as a &str, but it still checks to make sure that the string is valid UTF-8 and refuses to produce output if it isn't (producing an error you have to handle, instead)
02:17:21 <ais523> and of course, if the byte slice isn't supposed to be UTF-8 you can still translate it to a string but there isn't a fast-path for that, as it'd need to be re-encoded into the internal UTF-8 representation
02:18:03 <ais523> &str does leak its UTF-8-ness in a few other ways, though, such as measuring length in bytes of UTF-8 for the purpose of taking substrings (I think this is a concession to performance)
02:18:14 <nakilon> ais523 the invalidity is a thing that exists only when you are doing some final rendering, like printing or converting between the passed enc_a and enc_b; in those cases you use flags to ignore the encoding, and there are also some methods to clean up the mess prematurely, like https://ruby-doc.org/core-2.4.0/String.html#method-i-scrub
02:19:38 <nakilon> there are separate methods .size and .bytesize
02:19:47 <zzo38> Does Rust have all of the string dealing functions can work with byte strings too, in case you do not want to use Unicode strings (either sometimes or all the time, depending on the program)? Not all text is Unicode text (and not all can be converted to/from Unicode properly either; sometimes it is useful to try anyways, but sometimes it is better not to)
02:20:07 <nakilon> and the fixing flags are here: https://ruby-doc.org/core-2.4.0/String.html#method-i-encode
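The behaviour nakilon is describing, condensed into an irb-style sketch (byteslice is used here just to manufacture an invalid string):

    s = "лол"
    s.size                        # => 3 characters
    s.bytesize                    # => 6 UTF-8 bytes
    b = s.b                       # same bytes, relabeled ASCII-8BIT
    b.force_encoding("UTF-8")     # relabel back; the data never changed

    bad = "лол".byteslice(1, 5)   # chop a character in half: invalid UTF-8
    bad.valid_encoding?           # => false
    bad.scrub("?")                # => "?ол" -- cleans up the mess prematurely
    bad.encode("UTF-16", invalid: :replace)   # replaces instead of raising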
02:20:08 <ais523> zzo38: sort-of; they aren't polymorphic but they're duplicated between the string and bytestring cases
02:20:15 <ais523> except in cases which don't make sense
02:20:25 <zzo38> One thing I like about C and PostScript is that it doesn't use Unicode.
02:21:12 <ais523> e.g. strings have both to_uppercase and to_ascii_uppercase, whereas bytestrings only have the ASCII version
02:21:22 <zzo38> Yes, sometimes some functions won't make sense for both, which is reasonable. That is one example, yes
02:22:35 <ais523> it's surprising how useful the ASCII-specific functions are, actually, they're good for things like handling programming languages with case-insensitive keywords (because the keywords are generally recognised only if written with ASCII characters)
02:22:48 <zzo38> (Although if they are stored internally as UTF-8 and known to be valid UTF-8, then it seems that some of them could be polymorphic, including to_ascii_uppercase since it is doing the same thing whether it is UTF-8 or ASCII presumably)
02:23:04 <ais523> hmm… now I'm wondering if there's a programming language with case-insensitive keywords and one of them contains the substring "ss"
02:23:21 <ais523> it'd be interesting to throw a ß at them to see if it would be recognised, I bet it wouldn't be
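Ruby's case operations illustrate the asymmetry ais523 is betting on; plain downcasing never turns ß into "ss", only full case folding does:

    "ß".upcase                                        # => "SS"
    "STRASSE".downcase                                # => "strasse" (not "straße")
    "straße".downcase == "strasse"                    # => false
    "straße".downcase(:fold) == "STRASSE".downcase(:fold)   # => true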
02:25:07 <ais523> zzo38: it'd be hard to make to_ascii_uppercase polymorphic in Rust without a special safety override – the compiler would see bytes internally within a string being mutated directly by to_ascii_uppercase, which isn't allowed by default in case you change the internal representation in a way that makes it invalid UTF-8
02:25:16 <ais523> so it's easier to just duplicate the code rather than using unsafe code
02:25:47 <ais523> you definitely *could* write a function to handle both but I don't think the compiler developers would want to
02:25:59 <zzo38> Maybe some might do that with "ss" depending on the implementation, in which case some implementations might be incorrect.
02:26:36 <nakilon> > "лол".b.reverse.force_encoding("utf-8")
02:26:36 <nakilon> => "\xBBол\xD0"
02:26:37 <lambdabot> error:
02:26:37 <lambdabot> Variable not in scope: force_encoding :: [Char] -> a -> [a0]
02:26:47 <nakilon> reversing the bytes broke it )
02:27:17 <ais523> hmm, at least that function has a sufficiently scary name
02:27:36 <ais523> Haskell would probably call it UnsafeForceEncoding or something like that
02:27:38 <zzo38> (Also, maybe the language might need to be specified for Unicode case functions, since e.g. some languages have dot and dotless "I"/"i" being separate, etc)
02:28:24 <ais523> Rust's super-scary functions generally have very normal/unassuming-looking names, but you need to use a special keyword for calling them to say that you're recognising the danger
02:29:01 <ais523> zzo38: I actually think Turkic case-folding might be the *only* case in which you can't infer the case-folding rules from the codepoints being used, but I'm not sure
02:29:14 <keegan> only for a particular definition of super-scary (might cause undefined behavior)
02:29:27 <nakilon> I suppose there is a segfault thing in rust
02:29:32 <keegan> using 'unsafe' to encode other properties of reasonable API usage is frowned upon
02:29:37 <ais523> it would have simplified things if "Turkic dotless ı" and "Turkic dotted i" were different codepoints from "non-Turkic lowercase i"
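Ruby exposes exactly this as an explicit option on its case methods, which is one way around the inference problem zzo38 raises:

    "i".upcase             # => "I"
    "i".upcase(:turkic)    # => "İ"  (U+0130, capital dotted I)
    "I".downcase           # => "i"
    "I".downcase(:turkic)  # => "ı"  (U+0131, small dotless i)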
02:30:18 <ais523> keegan: I guess, although undefined behaviour is by definition worse than anything else that might go wrong with a program
02:30:22 <keegan> but constructing a &str that points to invalid UTF8 can cause undefined behavior
02:30:36 <keegan> because other 'safe' &str APIs are free to assume that it is valid UTF-8
02:31:11 <keegan> so any function which converts &[u8] to &str (or Vec<u8> to String, etc.) needs to either check that it's valid UTF-8, or be 'unsafe' to leave that up to the programmer
02:31:13 <ais523> yes, e.g. seeing an FD byte means it's safe to read three more bytes
02:31:58 <ais523> which it might not be, if the internal format isn't being statically analysed to be correct
02:32:38 <ais523> something that many UB models do, but Rust doesn't, is distinguish between values that are UB to use and values that are UB to even construct
02:33:07 <ais523> Rust currently has a misencoded &str as something that's UB to even construct, like it does with its other UB values, although it seems unlikely that it would be a problem in practice (still, there's no reason to actually do that)
02:33:54 <ais523> I guess the advantage for the "UB to even construct" model is that it makes it easier to reason about things like signal handlers causing unwinds at unexpected points in the code
02:34:26 <ais523> (although, I believe Rust's model for panics is that it's safe to create regions of code in which a panic would be undefined behaviour as long as you can prove that there can't actually be a panic there)
02:34:35 <zzo38> One thing that Unicode is especially bad for, I think, is arranging text on a grid-based text display. (This doesn't mean it is bad for everything. For some things, Unicode is OK, but could be better for those things.)
02:35:22 <ais523> zzo38: I was writing a program to do that, but didn't get very far
02:35:39 <ais523> the furthest I got was working out how wide each character should be, and even then it isn't really specified in detail in Unicode
02:35:58 <nakilon> in Ruby raw socket data encoding is ASCII while the built-in http retrievers are utf-8 by default; let me check... open("https://www.google.com/", &:read).encoding => #<Encoding:ISO-8859-1> -- oh I'm not sure what it is
02:36:18 <zzo38> Yes, that is the greatest (but not only) problem with using Unicode for arranging text on a grid-based text display. Other encodings will be better.
02:36:33 <ais523> in HTTP, also in HTML, the convention is that the producer of the data (typically the server) states what encoding it is in
02:36:50 <ais523> although the recommendation is always to send UTF-8 with a label stating that it's UTF-8
02:37:17 <nakilon> hm, Chrome says www.google.com responds with content-type: text/html; charset=UTF-8 actually
02:37:25 <ais523> both HTTP and HTML provide separate mechanisms for stating what the encoding is; I think HTML's wins if there's a contradiction
02:37:30 <zzo38> If I was designing to include the encoding with the string, I might use the 16-bit code page numbers, such as: 0 means no encoding, 367 means ASCII only, 1209 means UTF-8, etc. I also made a list of some of things that as far as I know don't have existing code page numbers, so I made up my own for them, e.g. TRON, CLC-INTERCAL EBCDIC, Powerline UTF-8, etc.
02:37:59 <ais523> nakilon: they may both be right, nothing forces Google to respond in the same encoding every time
02:38:25 <nakilon> yeah, ruby sends ruby's user-agent
02:38:28 <nakilon> so idk
02:38:46 <zzo38> (I might also, in a terminal emulator, use one escape code to select the code page. If Unicode is implemented at all, the standard sequence to select UTF-8 might select the "Powerline UTF-8" code page, although I would think it better to use an encoding that doesn't confuse character widths and that stuff like Unicode does, instead.)
02:39:30 <ais523> huh, I just accessed google.com manually via telnet, asking for its homepage
02:39:49 <ais523> <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8"> <TITLE>301 Moved</TITLE></HEAD><BODY> <H1>301 Moved</H1> The document has moved <A HREF="http://www.google.com/">here</A>. </BODY></HTML>
02:40:13 <ais523> I was shocked to see a "Google has moved" message, but I guess this is what their webserver does for the human-readable part of a 301
02:40:21 <nakilon> it added www. to your request
02:40:31 <ais523> no, it told me to retry with a www.
02:40:32 <ais523> (I sent without)
02:40:53 <nakilon> yeah I mean that, just said it wrong
02:41:14 <ais523> this is some weird HTML, though
02:41:41 <ais523> upper-case tags, apart from one very old-fashioned character set tag (which duplicates an HTTP header)
02:41:48 <ais523> no doctype, so it's quirks mode
02:41:58 <nakilon> this reminds me of the crazy guy back in 2006 who wanted to make his own instant messaging application that would send screenshots of text instead of the text
02:42:08 <nakilon> he would defeat the encoding problem
02:42:53 <ais523> either this is a page that hardly ever gets looked at (which wouldn't surprise me – it does its job and is rarely seen by a human), or there's some advantage to writing it in that style (which wouldn't surprise me)
02:43:32 <zzo38> It would, although that approach has different problems (such as taking up more space, and being unable to change the fonts). DVD subtitles work similarly; they store the text as pictures.
02:43:33 <ais523> It looks automatically generated, especially given that this is going to only be seen by the most primitive of Web user agents, and people who access websites without a browser for some reason
02:44:11 <ais523> nakilon: maybe an interesting middle-ground would be to have a library of images of characters, and then encode the image by referencing them
02:44:25 <ais523> this would also solve the encoding problem for human-readability purposes
02:44:36 <ais523> but would be very bad for software accessibility, you're basically sending a CAPTCHA
02:45:01 <ais523> people often want to copy-and-paste out of messages into text fields, for example (and blind people may rely on a screen reader)
02:45:53 <ais523> hmm, I don't think the user agent has the ability to tell the server what character encoding it wants the server to send
02:46:21 <nakilon> "a library of images of characters, and then encode the image by referencing them" -- isn't it a font file? ..D
02:46:23 <ais523> so you'd therefore expect Google to send the same character encoding to both Ruby and Chrome
02:46:50 <ais523> nakilon: almost – font files are supposed to specify which codepoint is used for which of the images they contain
02:46:55 <zzo38> Character names is another thing that may be useful for some purposes and worse for others, such as how PostScript does when rendering fonts.
02:47:04 <ais523> whereas this would just be specifying the image to use by index into the file, which might have nothing to do with codepoints
02:47:24 <nakilon> "people often want to copy-and-paste out of messages" -- now this is about my idea of the universally accepted "chat copypasta" format
02:47:40 <ais523> incidentally, some LaTeX renderers appear to render via using fonts that specify codepoints unrelated to the actual characters being used, which can create weird results if you copy-and-paste from the resulting PDF
02:48:16 <ais523> nakilon: or visit a link in a message, which might not always (or usually) be done by copy-and-pasting but it's essentially the same problem
02:48:38 <nakilon> https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding
02:49:21 <fizzie> ais523: The user agent doesn't *directly* indicate which character encoding the client expects, but empirical tests show that copying my Chrome UA into curl -A (but changing nothing else) does make a difference in the encoding of the result.
02:49:21 <ais523> nakilon: that's the first one I tried
02:49:26 <ais523> but it's about gzip and the like
02:49:41 <esolangs> [[Cammy]] M https://esolangs.org/w/index.php?diff=87931&oldid=87614 * Corbin * (+53) The reference implementation now has its own typechecker, rather than calling OCaml and parsing the type.
02:49:50 <ais523> fizzie: that's interesting, I wonder what its criterion is
02:49:51 <zzo38> Even if it is just a index into the file, supporting code page conversion with it can still be helpful. (Also, some glyphs might correspond to a sequence of code points instead of a single one, depending on the font and on the encoding; this is even true of Unicode, and in the case of Unicode at least it can even be ambiguous. So, even then the index into the file is insufficient even if you do have conversion.)
02:50:05 <nakilon> wait, I'm wrong, that's the wrong header
02:50:09 <nakilon> I think I saw one
02:50:57 <fizzie> Apparently there *was* an Accept-Charset, but it is no more.
02:51:22 <ais523> zzo38: what criteria do you think that programs should use to decide what encodings their inputs are in, and what encodings to produce outputs in?
02:52:06 <ais523> I think the main options are "look at the locale environment variables (or OS equivalent)", "always use/assume UTF-8", and intermediate combinations of those
02:52:08 <zzo38> ais523: Depends on the program. It may be one or more of: the file itself, the file format it deals with, command-line arguments, environment variables (such as LANG), etc.
02:52:15 <nakilon> "specifying the image to use by index" -- reminds me inserting icons via CSS by addressing a shift and crop on a single png asset
02:52:45 <nakilon> so the character addresses don't have to be a natural number but can even be an overlapping region
02:53:17 <ais523> <HTML standard> The Encoding standard requires use of the UTF-8 character encoding and requires use of the "utf-8" encoding label to identify it. Those requirements necessitate that the document's character encoding declaration, if it exists, specifies an encoding label using an ASCII case-insensitive match for "utf-8". Regardless of whether a character encoding declaration is present or not, the actual character encoding used to encode the document must
02:53:18 <ais523> be UTF-8.
02:54:12 <ais523> knowing HTML, it wouldn't surprise me if it discusses what to do if it isn't, regardless
02:55:05 <zzo38> For example, a program that deals with JSON can assume UTF-8 for input, and using a switch to switch the output between UTF-8 and ASCII. Programs that just write descriptive text (such as status messages, error messages) should either assume ASCII or use the environment variables (and should never just assume UTF-8). For other things, other encodings are useful, or sometimes just ignoring encodings entirely and copying text as-is.
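One way to follow that advice in Ruby; a sketch that assumes the locale encoding is what the user wants for human-facing messages, with replacement characters as the ASCII-safe fallback:

    enc = Encoding.find("locale")   # derived from LC_ALL / LC_CTYPE / LANG
    msg = "status: fichier introuvable"
    warn msg.encode(enc, invalid: :replace, undef: :replace)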
02:55:06 <nakilon> "link in a message" -- maybe this is a thing to be changed in future; we are used to copypaste texts, then lazyness made us want to just click, maybe there are new ways to make a buddy visit your page, like forcing his software doing it without showing you the URL; also deeplinks
02:55:20 <ais523> interestingly, it also states that the encoding needs to be specified via one of a number of ways, but accepts BOM as one of those ways
02:55:41 <ais523> so it's standards-compliant to send an HTML page with just a UTF-8 BOM as the only thing saying it's UTF-8, but it wouldn't surprise me if browsers missed that case
02:56:24 <ais523> nakilon: I guess if you're sending images, you could send a QR code (although that just gets back into encoding issues again)
02:56:58 <fizzie> ais523: FWIW, supplying a Chrome-like user agent changes a lot of other things too. For example, using plain HTTP will serve a redirect to HTTPS to Chrome, but not to an arbitrary user agent. And of course the content of the page will be entirely different too. (All that from external observations. I don't think I can comment on the actual criteria.)
02:57:04 <ais523> zzo38: I was mostly thinking about text intended for humans; if the file format specifies an encoding, then obviously you need to use that one
02:58:21 <zzo38> ais523: For status messages, either assuming ASCII or using the environment variables is the way to do it, I think.
02:58:30 <ais523> yes
02:58:40 <ais523> I don't think I can assume ASCII because that makes my programs impossible to internationalize
02:58:58 <nakilon> qr code is like an analogue radio
02:59:16 <ais523> even in Canada, many users will want non-ASCII characters like œ and á in their output because many Canadians primarily speak French
02:59:19 <nakilon> you send the data all over the place via light rays depicting a black-white thingy
03:00:01 <ais523> that said, I think that if the environment variables ask for ISO-8859-1, the program should output that, at least for stderr output
03:00:51 <nakilon> hm, I now wonder if you can measure a "power of signal" of the qr code as the distance from which the image is readable
03:00:52 <ais523> I know I implemented locale-specific encoding code in ayacc, because I was trying to match the POSIX specification for yacc; although, I think most of the characters that yacc cares about are primarily stable between most commonly used encodings
03:00:58 <zzo38> If you want international status messages, the file with the international text can be for a specific encoding, and if the environment variable specifies that language but with a different encoding than any of the files, it can convert it, I suppose. However, I don't internationalize command-line status messages in my programs (although I will accept documentation and GUI in other languages)
03:01:03 <nakilon> so you could measure qr code in watts _<>
03:01:19 <zzo38> When just copying the input to output though, you could just pass the text "as-is" without worrying about the encoding.
03:01:33 <ais523> right
03:01:55 <ais523> my experience with programs like xterm (which originally used that method) is that it's probably better to have the program do encoding conversion on output, though
03:02:19 <ais523> you shouldn't need separate translation databases for FR_ca.UTF-8 and FR_ca.ISO-8859-1
03:03:20 <ais523> ugh, have my French locale files got uninstalled somehow? :-(
03:03:35 <ais523> apparently not
03:03:43 <zzo38> That helps when you don't have the data in the proper encoding, but when you do, you should just use them, to avoid problems with conversion.
03:04:13 <fizzie> I always generate the fi_FI locale on my systems, even though I never use it.
03:04:30 <ais523> I like to have a non-English locale around for testing locale issues, and partly just for fun
03:05:00 <ais523> zzo38: ah right, so the point is that you don't want to decode into codepoints and back into bytes if the data's in the correct encoding already
03:05:05 <nakilon> wikipedia: "ISO-8859-1 was (according to the standard, at least) the default encoding of documents delivered via HTTP with a MIME type beginning with "text/""
03:06:08 <ais523> aren't web browsers supposed to interpret "ISO-8859-1" as actually meaning Windows-1252, though?
03:06:41 <nakilon> it says websites then switched to 1252 slowly
03:06:46 <nakilon> as I understood
03:06:49 <ais523> ah, here we go: https://html.spec.whatwg.org/#determining-the-character-encoding
03:07:48 <ais523> the HTML spec actually says that the default encoding for unmarked webpages should be assumed based on the locale within which their web browser is running
03:09:01 <fizzie> MediaWiki has a special language code "qqx" that shows the message name as the translated message content, learned of that recently.
03:09:14 <nakilon> btw
03:09:37 <ais523> fizzie: oh wow, that would be so useful to have known about earlier (assuming it worked back then)
03:09:49 <nakilon> attaching encoding as an attribute to a byte sequence is like a header in image formats -- it has orientation, alpha correction, etc.
03:09:55 <fizzie> From 1.18 onwards apparently.
03:10:06 <nakilon> and there are plenty of bugs when software fucks up that metadata
03:10:21 <ais523> for ages I resorted to (web-browser-equivalent-of) grepping Special:AllMessages, which became a lot more frustrating when it stopped fitting on a page, and would have false positives due to text duplicated between messages
03:10:55 <ais523> although, is "qqx" in the right namespace for private-use language codes?
03:12:26 <ais523> RFC 5646 implies the name should start "x-", but maybe MediaWiki doesn't like the hyphen
03:12:45 <ais523> (amazingly, I actually guessed that the private use prefix was "x-", but wanted to look it up to be sure)
03:35:25 -!- dbohdan has quit (Read error: Connection reset by peer).
03:36:12 -!- dbohdan has joined.
03:36:38 -!- ais523 has quit (Quit: quit).
03:43:33 -!- chiselfuse has quit (Remote host closed the connection).
03:58:13 <zzo38> Do you know which (if any) games other than Escape (by Tom7) can have bizarro world (like Escape does), and what its features are? I wanted to implement bizarro world in Free Hero Mesh, so I should have an idea about its design.
04:38:10 -!- Everything has joined.
05:06:05 -!- oerjan has joined.
05:20:55 <shachaf> What is a bizarro world?
05:22:44 <zzo38> When using any cipher in CTR mode (including ChaCha20), do the nonce and counter have to be separate? If it is long enough, then can you do both?
05:24:30 <keegan> I think you can combine them
05:24:46 <keegan> that is, start the counter at a random value and include that with the message
05:25:10 <keegan> assuming that your block size is big enough that you are unlikely to ever use the same keystream block twice in two different messages
05:25:50 <zzo38> Yes, it is how I mean
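With AES-256-CTR via Ruby's OpenSSL binding, keegan's suggestion is just "use a random IV": the 16-byte IV is the initial counter block, so nonce and counter share one field:

    require "openssl"

    key = OpenSSL::Random.random_bytes(32)
    iv  = OpenSSL::Random.random_bytes(16)   # random starting counter = nonce
    cipher = OpenSSL::Cipher.new("aes-256-ctr")
    cipher.encrypt
    cipher.key = key
    cipher.iv  = iv
    ciphertext = cipher.update("attack at dawn") + cipher.final
    # ship iv alongside ciphertext; decryption starts its counter from it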
05:42:23 <zzo38> Does any cipher use CFB or OFB with a counter added?
05:45:53 <keegan> how would you add it
05:47:07 <zzo38> Probably XOR or addition with the input of each step
05:59:24 <zzo38> I suppose there are three kinds of functions that would be used: fixed->fixed (e.g. ChaCha20), (fixed,key)->fixed (e.g. most block ciphers), variable->fixed (e.g. most hash functions). There is also key whitening (XEX mode), and there is adding input to output like ChaCha20 does to make it difficult to reverse.
06:00:06 <zzo38> It is easy to see with ChaCha20 that if the input is zero, then the output will also be zero. But, even if only one bit is set of the input (I tried this), then the output will be all mixed up.
06:00:35 <nakilon> turns out it was gRPC compilation that was eating all the RAM
06:00:53 <nakilon> it takes like 5 minutes and almost 500mb
06:46:45 <nakilon> or rather 40 minutes already and does not finish, lol
06:49:27 <nakilon> oh, specifically it's Google's OpenSSL: https://github.com/google/boringssl
07:01:30 -!- Sgeo has quit (Read error: Connection reset by peer).
07:01:45 <b_jonas> "<ais523> one (maybe the only/) motivating idea of the original Befunge was to be as difficult as possible to compile" => I'd like to preemptively state that that wasn't my goal when designing Consumer Society, it's just a necessary side effect
07:16:50 <b_jonas> the interesting part is that it also seems hard to translate Consumer Society to the existing high-level non-eso programming languages, even though many of them seem to have all the features necessary. this part is more an accidental side effect than necessary.
07:17:05 <oerjan> ah, spam in finnish. (Although there seems to be an english version at the end.) it's been a while since the last one.
07:22:59 -!- riv has joined.
07:29:51 <nakilon> ok I think the compilation will just never end
07:30:19 <nakilon> because they assigned this issue https://github.com/grpc/grpc/issues/26655 to a pythonist who commits 3 days a week and already has 60 issues assigned
07:38:52 -!- Corbin has quit (Ping timeout: 245 seconds).
07:39:17 <nakilon> oh looks like he supports all the languages at the same time, and they've already increased CI build timeouts from 60 minutes to 90 https://github.com/grpc/grpc/pull/27230/files
07:58:57 <esolangs> [[Ppencode]] https://esolangs.org/w/index.php?diff=87932&oldid=86861 * YamTokTpaFa * (-2) /* Definition of Perl keywords */ '''I AM SORRY I FORGOT x'''
08:06:09 -!- hendursa1 has joined.
08:06:41 <esolangs> [[Ppencode]] https://esolangs.org/w/index.php?diff=87933&oldid=87932 * YamTokTpaFa * (+96)
08:09:36 -!- hendursaga has quit (Ping timeout: 276 seconds).
08:14:54 <esolangs> [[Ppencode]] https://esolangs.org/w/index.php?diff=87934&oldid=87933 * YamTokTpaFa * (+84) /* Definition of Perl keywords */
08:22:07 -!- Koen_ has joined.
08:24:04 <esolangs> [[Ppencode]] https://esolangs.org/w/index.php?diff=87935&oldid=87934 * YamTokTpaFa * (+222) /* Definition of Perl keywords */ WTF new news
08:38:10 -!- joast has quit (Ping timeout: 240 seconds).
08:44:05 -!- wib_jonas has joined.
08:45:27 -!- joast has joined.
08:51:24 <esolangs> [[Ppencode]] https://esolangs.org/w/index.php?diff=87936&oldid=87935 * YamTokTpaFa * (+118) /* Definition of Perl keywords */
08:55:14 <esolangs> [[Ppencode]] https://esolangs.org/w/index.php?diff=87937&oldid=87936 * YamTokTpaFa * (+132) /* Definition of Perl keywords */
08:56:20 -!- imode has quit (Ping timeout: 252 seconds).
08:56:36 <esolangs> [[Ppencode]] https://esolangs.org/w/index.php?diff=87938&oldid=87937 * YamTokTpaFa * (+20) /* Definition of Perl keywords */
09:00:08 -!- daggy1234[m] has quit (Quit: You have been kicked for being idle).
09:05:15 <wib_jonas> I mostly agree with ais523 here. Python 3 does the string stuff right for a high-level language: there are separate types for byte string and unicode string, and you generally don't need to know how the unicode strings are represented. It's not perfect, ideally you'd want unicode strings to have an internal form where they store utf-8 and decode
09:05:15 <wib_jonas> that only when necessary, since we do a lot of utf-8 IO, but it's still very good. Rust kind of has the approach right for a low-level language where you want to control the representation explicitly in the type system, but the standard library is somewhat lacking in string operations, so you might sometimes want to use extra libraries. But since
09:05:16 <wib_jonas> it's a low-level language, it does expose enough API that you can do this and can still convert to the standard library APIs and call the standard library functions when they make sense.
09:09:07 <wib_jonas> `<ais523> hmm… now I'm wondering if there's a programming language with case-insensitive keywords and one of them contains the substring "ss"' => is "keyword" relevant here? those languages usually also have c-i user-defined identifiers.
09:09:09 <HackEso> ​<ais523>? No such file or directory
09:10:54 <nakilon> there is no need for another class if all strings are just strings
09:12:51 <nakilon> you think of them as a sequence of codepoints and you don't care about the internal representation
09:13:07 <wib_jonas> ‘<ais523> zzo38: I actually think Turkic case-folding might be the *only* case in which you can't infer the case-folding rules from the codepoints being used, but I'm not sure’ => that is the only case I know of too, but we'll have to check the sources of libICU to be sure.
09:14:39 <wib_jonas> ‘<ais523> it would have simplified things if "Turkic dotless ı" and "Turkic dotted i" were different codepoints from "non-Turkic lowercase i"’ => I don't think so. The turkish i is one of those cases where there's no good solution, it's all tradeoffs. In particular if you did that, then you'd have problems when you copy a latin script proper
09:14:39 <wib_jonas> name into a turkish text and later try to uppercase it.
09:16:04 <wib_jonas> The only solution that might work is to go back in time and convince/bribe/force Kemal Atatürk to not start that convention, but this might be dangerous or impossible for time-travel-related reasons
09:20:36 <wib_jonas> ‘hmm, I don't think the user agent has the ability to tell the server what character encoding it wants the server to send’ => there's an Accept-Charset request header in HTTP/1.1, but I doubt it does much in practice
09:24:53 <wib_jonas> ‘I think the main options are "look at the locale environment variables (or OS equivalent)", "always use/assume UTF-8", and intermediate combinations of those’ => there's also automatically guessing from the (beginning of) input, and of course explicit command-line options or environment variables, with intermediate stuff between the four.
09:27:01 <wib_jonas> for example, I might write a script with a command-line option to set encoding of the input, a default that's either utf-8 or utf-16-le, and a warning if you keep the default utf-8 but the input is guessed to be utf-16 or vice versa, with the explicit input encoding command-line option silencing that warning.
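A sketch of that warning logic in Ruby (the function name and the BOM-only heuristic are illustrative; a real guesser would look at more than two bytes):

    def pick_encoding(bytes, explicit: nil)
      return explicit if explicit    # an explicit option silences the warning
      bom = bytes.byteslice(0, 2).b
      if bom == "\xFF\xFE".b || bom == "\xFE\xFF".b
        warn "input looks like UTF-16, but the default UTF-8 was assumed"
      end
      Encoding::UTF_8
    end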
09:32:06 <wib_jonas> ‘<ais523> I guess if you're sending images, you could send a QR code’ => I recently saw a large ad poster where most of the poster was printed in high resolution, presumably from a vector image, but it also contained a QR code that was blown up from a too-small-resolution bitmap with antialiasing, and blowing it up to the huge size and high-resolution printing of the poster made those artifacts show up as various 0.005 m sized gray squares on most of the borders of a 0.03 m sized QR code grid
09:33:41 <wib_jonas> ‘<fizzie> ais523: FWIW, supplying a Chrome-like user agent changes a lot of other things too.’ => does that depend on just the user-agent, rather than other HTTP stuff such as other request headers?
09:33:53 <wib_jonas> and eg. whether you send a HTTP/2 request
09:37:18 <wib_jonas> "<ais523> the HTML spec actually says that the default encoding for unmarked webpages should be assumed based on the locale within which their web browser is running" => that was the state of art, yes, but with an encoding setting in the menu of the browser client
09:40:16 <wib_jonas> ‘<ais523> although, is "qqx" in the right namespace for private-use language codes?’ => yes, https://en.wikipedia.org/wiki/ISO_639-2#Reserved_for_local_use
09:43:52 <wib_jonas> oerjan: I regularly get a significant part of the spam in hungarian. Some of them in broken badly auto-translated hungarian, some in well-phrased hungarian. I don't find that too surprising.
09:44:18 <oerjan> wib_jonas: the surprise is that i'm not finnish or in finland hth
09:44:57 <wib_jonas> oerjan: yeah. I occasionally get spam in all sorts of random common languages. those are easier to discard because they're obviously spam.
09:45:53 <oerjan> also, that i used to receive finnish spam ridiculously often some years ago.
10:00:08 -!- oerjan has quit (Quit: Later).
11:01:03 -!- arseniiv has joined.
11:26:28 <nakilon> omg future is today
11:26:51 <nakilon> self-driving taxi in Moscow starts operating this autumn
11:27:38 <nakilon> (maybe correct idiom is "future is now")
11:33:07 <int-e> . o O ( yay, killer drones )
12:20:53 -!- Koen__ has joined.
12:23:40 -!- Koen_ has quit (Ping timeout: 260 seconds).
12:27:30 <nakilon> `prefix
12:27:31 <HackEso> prefix? No such file or directory
12:27:54 <nakilon> ?prefix
12:27:54 <lambdabot> Unknown command, try @list
12:36:46 <nakilon> `help
12:36:46 <HackEso> Runs arbitrary code in GNU/Linux. Type "`<command>", or "`run <command>" for full shell commands. "`fetch [<output-file>] <URL>" downloads files. Files saved to $HACKENV are persistent, and $HACKENV/bin is in $PATH. $HACKENV is a mercurial repository, "`revert <rev>" can be used to revert, https://hack.esolangs.org/repo/ to browse. $PWD ($HACKENV/tmp) is persistent but unversioned, /tmp is ephemeral.
12:37:24 <nakilon> is `run the same as ```?
12:37:37 <nakilon> ^help
12:37:37 <fungot> ^<lang> <code>; ^def <command> <lang> <code>; ^show [command]; lang=bf/ul, code=text/str:N; ^str 0-9 get/set/add [text]; ^style [style]; ^bool
12:38:58 <nakilon> ?wiki wiki
12:38:59 <lambdabot> https://wiki.haskell.org/wiki
12:39:03 <wib_jonas> nakilon: not quite. `run is a builtin that's hard or impossible to override by messing up commands in /hackenv/bin . it's probably redundant given that you can also do `/bin/bash -c but it can't hurt and will have to stay for compatibility.
12:39:03 <nakilon> ?gwiki wiki
12:39:04 <lambdabot> No Result Found.
12:39:35 <wib_jonas> nakilon: also ``` pipes the output through a stupid meme filter that we should somehow get rid of but I don't dare to just remove it from the command
12:39:49 <nakilon> meme filter?
12:40:02 <wib_jonas> nakilon: yes. it's called rnooodl or something.
12:40:11 <wib_jonas> check the source codes if you want to know
12:40:36 <wib_jonas> it annoys me because it means any lines that aren't terminated by a newline byte are eaten if the command times out:
12:40:40 <nakilon> `cbt ```
12:40:40 <HackEso> cat: '/hackenv/bin/```': No such file or directory
12:40:52 <wib_jonas> ``` echo foo; echo -n bar; sleep 9999
12:41:08 <wib_jonas> nakilon: try with one less backtick
12:41:15 <nakilon> `cbt ``
12:41:16 <HackEso> ​#!/bin/sh \ export LANG=C; exec bash -O extglob -c "$@" | rnooodl
12:41:21 <wib_jonas> the first one is just the invocation character for HackEso
12:41:24 <nakilon> oh, sure
12:41:27 <nakilon> one is prepended
12:41:27 <HackEso> No output.
12:41:38 <wib_jonas> um
12:41:48 <wib_jonas> why doesn't that show foo?
12:41:56 <wib_jonas> maybe it's even more buffered than I thought?
12:41:58 <nakilon> `cbt rnooodl
12:42:00 <HackEso> perl -pe 's/([Nn])ooodl/"$1@{[o x(3+rand 7)]}dl"/ge'
12:42:09 <wib_jonas> `/bin/sh -cecho foo; echo -n bar; sleep 9999
12:42:10 <HackEso> ​/bin/sh: 0: Illegal option -h
12:42:22 <wib_jonas> oh yeah, that's why you can't do that
12:42:26 <wib_jonas> `run echo foo; echo -n bar; sleep 9999
12:43:02 <HackEso> foo \ bar
12:43:02 <wib_jonas> `perl -eprint("one\ntwo");sleep 9999
12:43:19 <nakilon> can't you set no buffering for the perl cmd?
12:43:33 <HackEso> No output.
12:43:52 <wib_jonas> nakilon: that wouldn't be enough. you'd have to implement it properly to only buffer one character, and even that only if it's the "d" in the right context
12:44:59 <wib_jonas> that would be possible but so far I was lazy to do it
12:45:08 <wib_jonas> mostly because I hate rnooodl even besides the buffering
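A character-buffered rewrite along the lines wib_jonas sketches is short in Ruby; it only withholds a suffix that could still grow into a match (an illustrative sketch, not HackEso's actual filter):

    STDOUT.sync = true
    buf = +""
    step = lambda do
      loop do
        if buf =~ /\A([Nn])ooodl/
          print "#{$1}#{"o" * (3 + rand(7))}dl"   # emit the rewritten match
          buf = $'                                # keep scanning what follows
        elsif !buf.empty? && !"nooodl".start_with?(buf.downcase)
          print buf.slice!(0, 1)   # this char can no longer start a match
        else
          break                    # empty, or a partial match that may grow
        end
      end
    end
    STDIN.each_char { |c| buf << c; step.call }
    print buf   # at EOF, flush any dangling partial match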
12:46:33 <nakilon> compounding commands here is a cool feature https://esolangs.org/wiki/Bfbot
12:47:28 <nakilon> not sure why this bot is in [People]
12:47:42 <nakilon> oh fungot too
12:47:42 <fungot> nakilon: also how can i know what you come up with to justify rolling my own... php witch i don't use it
12:48:19 <nakilon> fungot I thought you are b_jonas until I reached the php
12:48:19 <fungot> nakilon: is he a total fnord versions.
12:48:40 <wib_jonas> oh good
12:49:07 <wib_jonas> although "PHP! Witch! I don't use that" does sound kind of like me
12:51:29 <nakilon> why is there no php bot
12:52:31 <nakilon> I remember on Rusnet the main bot on #programming and all related channels was one written in Delphi
13:11:13 -!- oerjan has joined.
13:14:23 <oerjan> <nakilon> (maybe correct idiom is "future is now") <-- needs a "the" in front
13:16:43 <nakilon> thanks
13:17:31 <oerjan> `prefixes (i think you may have been looking for this)
13:17:33 <HackEso> Bot prefixes: fungot ^, HackEso `, EgoBot !, lambdabot @ or ? or > , thutubot +, metasepia ~, idris-bot ( , jconn ) , j-bot [ , bfbot =, velik \.
13:17:50 <nakilon> exactly
13:17:55 <fizzie> There's also an implicit cat in all the commands, if memory serves, as a way to ensure the standard output stream does not look like a terminal. (Some programs adapt their output format, and the argument was that output suitable for pipes is more likely to be output suitable for IRC.)
13:17:58 <nakilon> instead I went through the wiki article
13:18:54 <nakilon> lambdabot is a prefix squatter
13:19:47 <oerjan> technically it also squats :k and :t
13:20:27 <oerjan> as well as its own name in some cases
13:20:32 <oerjan> lambdabot, @run 2
13:20:33 <lambdabot> 2
13:21:05 <oerjan> but those are hard to do by accident and the list was getting too long
13:21:40 <oerjan> :k hm
13:21:41 <lambdabot> error: Not in scope: type variable ‘hm’
13:22:25 <oerjan> :k Monad
13:22:26 <lambdabot> (* -> *) -> Constraint
13:22:37 <fizzie> Full disclosure: I've not removed EgoBot's ! from that list, as a sneaky plan to squat ! for "official" commands handled by the `esolangs` bot, in case such commands ever start to exist.
13:23:51 <oerjan> the bfjoust bots also use ! but only with specific commands following.
13:24:38 <oerjan> and it worked because all the bots using ! were silent if there wasn't a command match
13:24:45 <oerjan> oh and fungot too
13:24:46 <fungot> oerjan: i think that i aren t sarahbot are me? hm.
13:24:56 <oerjan> !bf ,[.,]!Hi
13:25:01 <oerjan> no wait
13:25:14 <oerjan> ^bf ,[.,]!Hi
13:25:15 <fungot> Hi
13:25:24 <oerjan> !logs
13:25:34 <oerjan> that one used to respond in private
13:25:40 <fizzie> fungot: Yeah, sarahbot isn't even a bot on this channel.
13:25:40 <fungot> fizzie: imagine this, means? completely just 1
13:25:42 <oerjan> by ... glogbot, i think
13:26:00 <fizzie> (I think sarahbot was from #scheme.)
13:26:42 <oerjan> well fungot is also silent on non-matching commands, that part was right
13:26:43 <fungot> oerjan: the chan serv answers to all comers at apple stores. :o fnord hours left before the next statement and write a bytechanger interpreter in asm, c, pascal, perl, pntr, refc, roma, and the
13:26:50 <oerjan> ^nosuchcommand
13:28:19 <nakilon> fungot how much is fnord hours?
13:28:20 <fungot> nakilon: anyone knows where i can find out what extras those contain.
13:29:39 <fizzie> Heh, I wasn't aware Apple stores have ChanServ in (on?) them.
13:36:25 -!- hendursa1 has quit (Quit: hendursa1).
13:38:29 -!- hendursaga has joined.
13:43:20 -!- wib_jonas has quit (Quit: Client closed).
13:52:12 -!- Koen__ has quit (Remote host closed the connection).
13:56:27 -!- Sgeo has joined.
13:58:57 -!- wib_jonas has joined.
13:59:24 <wib_jonas> perlbot prefixes
13:59:25 <perlbot> wib_jonas: Bot prefixes: fungot ^, HackEso `, EgoBot !, lambdabot @ or ? or > , thutubot +, metasepia ~, idris-bot ( , jconn ) , j-bot [ , bfbot =, velik \.
14:34:01 <wib_jonas> `` TZ=Pacific/Auckland python3 -c '"Getting the current wallclock time in the local timezone and UTC."; import datetime as g; n = g.datetime.now; [print(t.strftime("%Y-%m-%dT%H:%M:%S%z %Z")) for t in [n().astimezone(), n(tz = g.timezone.utc)]];'
14:34:04 <HackEso> 2021-09-09T02:34:03+1200 NZST \ 2021-09-08T14:34:03+0000 UTC
14:34:19 <wib_jonas> ^ Python's datetime module is so nontrivial to use that I want to get these magic incantations into the channel log
14:35:04 <wib_jonas> the TZ override is there only to make this a better test, since otherwise HackEso defaults to the UTC timezone
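The same incantation unrolled for the log -- the identical two stdlib calls, nothing extra:

    import datetime

    # An aware datetime for the current wallclock time in the local zone:
    local_now = datetime.datetime.now().astimezone()
    # The current time directly in UTC, also aware:
    utc_now = datetime.datetime.now(tz=datetime.timezone.utc)

    fmt = '%Y-%m-%dT%H:%M:%S%z %Z'
    print(local_now.strftime(fmt))
    print(utc_now.strftime(fmt))
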
14:35:57 -!- sprock has quit (Ping timeout: 265 seconds).
14:41:59 <wib_jonas> by the way,
14:42:01 <wib_jonas> `` TZ=uWIw/Dtca/FlSM date +%Z # esoteric way to strip the part of a filename after the first slash
14:42:03 <HackEso> uWIw
14:43:00 <APic> Cool ☺
14:43:11 <wib_jonas> it doesn't quite work for all filenames
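How the trick seems to work, inferred from the observed output rather than from reading glibc: the value names no real zone file, so glibc falls back to treating TZ as a POSIX-style rule and keeps the leading run of letters, "uWIw", as the zone abbreviation that %Z reports. That would also explain why it doesn't work for all filenames -- anything that doesn't start with a plausible abbreviation (digits, punctuation, too few letters) won't survive. The same behavior is visible from Python on a glibc system:

    import os
    import time

    os.environ['TZ'] = 'uWIw/Dtca/FlSM'
    time.tzset()                # Unix-only; re-reads TZ
    print(time.strftime('%Z'))  # prints "uWIw" on glibc
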
14:45:28 -!- dyeplexer has joined.
15:12:23 <nakilon> $ ruby -e '[Time.now, Time.now.getutc].each{ |n| puts n.strftime "%FT%T%z %Z" }'
15:12:23 <nakilon> 2021-09-08T18:11:32+0300 MSK / 2021-09-08T15:11:32+0000 UTC
15:13:06 <wib_jonas> `` ruby -e '[Time.now, Time.now.getutc].each{ |n| puts n.strftime "%FT%T%z %Z" }' # does that work in this version of ruby?
15:13:08 <HackEso> 2021-09-08T15:13:08+0000 UTC \ 2021-09-08T15:13:08+0000 UTC
15:13:20 <wib_jonas> `` TZ=Pacific/Auckland ruby -e '[Time.now, Time.now.getutc].each{ |n| puts n.strftime "%FT%T%z %Z" }' # does that work in this version of ruby?
15:13:21 <HackEso> 2021-09-09T03:13:21+1200 NZST \ 2021-09-08T15:13:21+0000 UTC
15:13:22 <nakilon> oh I forgot it's installed
15:14:31 <wib_jonas> `` TZ=Pacific/Auckland; for r in "" -u; do date $r +"%Y-%m-%dT%H:%M:%S%z %Z"; done
15:14:32 <HackEso> 2021-09-08T15:14:32+0000 UTC \ 2021-09-08T15:14:32+0000 UTC
15:14:42 <wib_jonas> `` export TZ=Pacific/Auckland; for r in "" -u; do date $r +"%Y-%m-%dT%H:%M:%S%z %Z"; done
15:14:43 <HackEso> 2021-09-09T03:14:42+1200 NZST \ 2021-09-08T15:14:43+0000 UTC
15:14:54 <wib_jonas> looks good
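The failed first attempt is the classic unexported-variable trap: a bare `TZ=Pacific/Auckland;` sets a shell variable that the `date` child process never sees, while `export` (or writing the assignment directly in front of the command) puts it into the child's environment. The same distinction shown from Python, assuming a GNU `date` on PATH:

    import os
    import subprocess

    # Without TZ in the child's environment, date uses the default zone:
    print(subprocess.run(['date', '+%Z'], capture_output=True, text=True,
                         env=dict(os.environ)).stdout.strip())
    # With TZ placed in the child's environment, date honors it:
    print(subprocess.run(['date', '+%Z'], capture_output=True, text=True,
                         env=dict(os.environ, TZ='Pacific/Auckland')).stdout.strip())
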
15:17:33 <nakilon> `` ruby -rtzinfo -e 't = TZInfo::Timezone.get("Pacific/Auckland").to_local Time.now; [t, t.getutc].each{ |n| puts n.strftime "%FT%T%z %Z" }'
15:17:34 <HackEso> ​/usr/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59:in `require': cannot load such file -- tzinfo (LoadError) \ from /usr/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59:in `require'
15:18:07 <nakilon> this is the same, just getting the timezone via the standard tzinfo gem (which isn't installed)
15:18:47 <nakilon> so you could adjust the timezone with a runtime variable
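For comparison, Python's standard library has been able to do what the tzinfo gem does since 3.9 -- selecting the zone in code instead of through the TZ environment variable:

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo  # stdlib since Python 3.9

    t = datetime.now(tz=ZoneInfo('Pacific/Auckland'))
    fmt = '%Y-%m-%dT%H:%M:%S%z %Z'
    print(t.strftime(fmt))
    print(t.astimezone(timezone.utc).strftime(fmt))
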
15:30:01 <wib_jonas> `` TZ=Pacific/Auckland perl -e'use Date::Manip::Date; $b = Date::Manip::Date->new(); for $r ([],["setdate","now,UTC"]) { print $b->new("now",$r)->printf("%Y-%m-%dT%H:%M:%D%z %Z\n"); }'
15:30:08 <HackEso> 2021-09-09T03:30:09/09/21+1200 NZST \ 2021-09-09T03:30:09/09/21+0000 UTC
15:30:26 <wib_jonas> `` TZ=Pacific/Auckland perl -e'use Date::Manip::Date; $d = Date::Manip::Date->new("now"); for $_r (0,1) { print $d->printf("%Y-%m-%dT%H:%M:%D%z %Z\n"); $d->convert("UTC"); }'
15:30:28 <HackEso> 2021-09-09T03:30:09/09/21+1200 NZST \ 2021-09-08T15:30:09/08/21+0000 UTC
15:30:37 <wib_jonas> either of these works with the Date::Manip module (the %D in the format string expands to mm/dd/yy, which is why "09/09/21" is fused into the timestamps above; %S was presumably intended)
15:36:52 <nakilon> weird that the perl is so much longer here
15:37:39 <nakilon> `` TZ=Pacific/Auckland perl -e'use Date::Manip::Date; $d = Date::Manip::Date->new("now"); for $_r (0,1) { print $d->printf("%FT%T%z %Z\n"); $d->convert("UTC"); }'
15:37:41 <HackEso> Thursday, September 9, 2021T03:37:40+1200 NZST \ Wednesday, September 8, 2021T15:37:40+0000 UTC
15:37:48 <nakilon> oops
15:40:43 <wib_jonas> nakilon: more like it's not golfed. there were shorter ways to print these in perl if I wanted.
15:41:16 <wib_jonas> plus that's just one of the datetime modules available in perl
15:41:41 <wib_jonas> incidentally you can use the format "%O%z %Z" with Date::Manip.
15:42:12 <wib_jonas> but my goal here isn't to print the date in this one format, but to show how to get a date object (with which I could do arithmetic) and then print it in any format I choose
15:42:19 <nakilon> yeah I don't know how much space perl takes when it's not golfed
15:42:34 <wib_jonas> the python statement does that too; if I just wanted to print the current time in that format, it could be shorter
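One plausible shorter version -- a guess at what wib_jonas means, not his own golfed variant: if printing is all you want, isoformat() needs no format string at all:

    from datetime import datetime, timezone

    print(datetime.now().astimezone().isoformat())  # local, with offset
    print(datetime.now(timezone.utc).isoformat())   # UTC
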
15:43:48 <nakilon> `` TZ=Pacific/Auckland ruby -e 't = Time.now; p [t, t.class, t.zone]'
15:43:49 <HackEso> ​[2021-09-09 03:43:48 +1200, Time, "NZST"]
15:45:58 <nakilon> there are so many methods in the Time, Date, and DateTime classes -- impossible to remember them all :D
15:46:36 <nakilon> not even mentioning all the trash that Rails adds
15:47:43 -!- hendursaga has quit (Remote host closed the connection).
15:47:56 <nakilon> most of it is just the same thing under another name -- people invent fancy methods and shortcuts, implement them in Rails, and then when the same thing gets implemented in pure Ruby they don't remove it from Rails, so it just sits there like a tumour
15:48:07 -!- hendursaga has joined.
15:48:19 <wib_jonas> if you want something short, then try one of
15:48:26 <wib_jonas> ``` date +%FT%T%z\ %Z; date --rfc-3=s
15:48:27 <HackEso> 2021-09-08T15:48:26+0000 UTC \ 2021-09-08 15:48:26+00:00
15:49:07 <wib_jonas> throw in a -u switch for UTC
15:49:16 -!- oerjan has quit (Quit: leaving).
15:49:20 <wib_jonas> ``` export TZ=Pacific/Auckland; date +%FT%T%z\ %Z; date --rfc-3=s
15:49:21 <HackEso> 2021-09-09T03:49:20+1200 NZST \ 2021-09-09 03:49:20+12:00
15:49:31 <wib_jonas> ``` export TZ=Pacific/Auckland; date -u +%FT%T%z\ %Z; date -u --rfc-3=s
15:49:32 <HackEso> 2021-09-08T15:49:32+0000 UTC \ 2021-09-08 15:49:32+00:00
15:50:39 -!- wib_jonas has quit (Quit: Client closed).
15:50:42 <nakilon> ``` ruby -e '$><<`date +%FT%T%z\\ %Z`<<`date --rfc-3=s`'
15:50:44 <HackEso> 2021-09-08T15:50:43+0000 UTC \ 2021-09-08 15:50:43+00:00
15:50:52 <nakilon> the latter does not work on macos
15:51:02 <nakilon> date: illegal option -- -
15:51:20 <nakilon> i.e. on BSD I assume
16:31:12 -!- Koen_ has joined.
16:55:56 -!- sprock has joined.
17:06:07 <b_jonas> nakilon: ... ok. you might need to install GNU coreutils for that. I don't use OS X so I can't really help with that. (I could tell you for Windows.)
17:37:20 -!- scjosh has joined.
17:38:43 -!- VilgotanL has joined.
17:38:45 <VilgotanL> h
17:45:39 <b_jonas> ais523: re https://logs.esolangs.org/libera-esolangs/2021-09-08.html#l2c ‘I actually think Turkic case-folding might be the *only* case in which you can't infer the case-folding rules from the codepoints being used’, I'm looking at the ICU sources right now.
17:45:46 <b_jonas> there's an enum in ucase.h that defines the constants UCASE_LOC_UNKNOWN UCASE_LOC_ROOT UCASE_LOC_TURKISH UCASE_LOC_LITHUANIAN UCASE_LOC_GREEK UCASE_LOC_DUTCH UCASE_LOC_ARMENIAN which ICU uses internally to know what casefolding rules to apply. I believe this isn't an exposed API. the correct constant is computed from the locale by ucase.cpp:ucase_getCaseLocale
17:45:55 <b_jonas> ucase.cpp also handles (at least some of the) actual locale-dependent case-conversion rules. there are comments like ‘// և ligature ech-yiwn uppercases to ԵՒ=ech+yiwn by default and in Western Armenian, but to ԵՎ=ech+vew in Eastern Armenian.’
17:46:06 <keegan> oh my
17:46:18 <b_jonas> UCASE_LOC_LITHUANIAN has something to do with dots on i and j with accents, but I don't quite understand what; UCASE_LOC_DUTCH something with "IJ", and UCASE_LOC_GREEK does some magic with ancient greek accented characters.
17:47:20 <b_jonas> so, at least as far as libicu is concerned, turkish case folding isn't the only locale-dependent one, but some of this might be an artifact of ICU wanting to implement both practical casefolding and strict conformance to unicode casefolding
17:48:03 <b_jonas> this is from ICU4C version 69.1
17:48:06 -!- VilgotanL has quit (Remote host closed the connection).
17:48:18 <b_jonas> I don't think I want to delve into this deeper than that
17:48:26 <b_jonas> to delve into the source code that is
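A minimal illustration of the Turkish special casing being discussed, with no ICU involved; turkish_lower is a hypothetical helper covering only the two letters whose case pairs differ:

    def turkish_lower(s):
        # Turkish pairs 'I' with dotless 'ı' and dotted 'İ' with 'i';
        # handle those two before the default Unicode mapping.
        return s.replace('I', 'ı').replace('İ', 'i').lower()

    # Default Unicode lowercasing maps 'I' to 'i' (and 'İ' to 'i' plus a
    # combining dot above), which is wrong for Turkish text:
    print('DIŞ'.lower())         # 'diş'
    print(turkish_lower('DIŞ'))  # 'dış'
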
18:10:46 -!- Koen_ has quit (Quit: Leaving...).
18:16:42 -!- imode has joined.
18:16:52 -!- dyeplexer has quit (Remote host closed the connection).
18:17:48 -!- delta23 has joined.
19:28:20 -!- sprock has quit (Ping timeout: 260 seconds).
20:04:14 -!- sprock has joined.
20:15:11 <zzo38> I don't know how (or if) TRON handles case folding. I also think someone else on the esoteric programming IRC mentioned an idea of having split codes for the language and the glyph; that way you could handle case folding properly, too
20:17:52 <zzo38> Also, you can implement Turkish case folding with Turkish character encoding.
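One way to read zzo38's remark, using ISO-8859-9 (Latin-5) as the example Turkish encoding: the dotted and dotless letters get dedicated byte values, so an 8-bit casing table written for that encoding can pair I (0x49) with ı (0xfd) and İ (0xdd) with i (0x69) directly, with no locale parameter anywhere:

    # ISO-8859-9 byte values for the letters involved:
    for ch in 'Iıİiğ':
        print(ch, hex(ch.encode('iso-8859-9')[0]))
    # I 0x49, ı 0xfd, İ 0xdd, i 0x69, ğ 0xf0
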
20:51:51 <esolangs> [[Talk:Mogus]] N https://esolangs.org/w/index.php?oldid=87939 * ArthroStar11 * (+1621) Created page with "== Attempt at an interpreter == Hi, I attempted to make an interpreter for this language [https://drive.google.com/file/d/1RotVm5i9xgDN94vK47tBUwHFiOGVAmLv/view?usp=sharing li..."
20:53:01 -!- sprock has quit (Ping timeout: 252 seconds).
20:56:20 -!- Lord_of_Life_ has joined.
20:57:23 -!- Lord_of_Life has quit (Ping timeout: 252 seconds).
20:59:00 -!- Lord_of_Life_ has changed nick to Lord_of_Life.
20:59:28 -!- Everything has quit (Quit: leaving).
21:30:02 <fizzie> Bleh. I use this one place's free secondary DNS service, and they've been pretty laggy in the past in terms of responding to notifys to refresh the zone, so I've had to bump up the ACME dns-01 verification delay all the way to 300 seconds. But now they've gone and started to take like half an hour. So the certbot update of esolangs.org fails. :/
21:30:13 <fizzie> Maybe I need to do that thing where I delegate the `_acme-challenge` names as a subdomain that's only hosted by the primary nameserver. Since if it's down, it's not like the dynamic update would work anyway.
21:30:24 <fizzie> Or else I should've just used the HTTP-based challenge like normal people.
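The delegation fizzie describes, sketched as zone records (all names here are illustrative, not the actual esolangs.org configuration): make `_acme-challenge` a child zone served only by the primary, so dns-01 validation never waits on the laggy secondary:

    ; in the parent zone, on the primary nameserver:
    _acme-challenge.esolangs.org.  IN  NS  ns-primary.example.net.
    ; the ACME client then installs its TXT record only in the
    ; _acme-challenge child zone, which the secondary never serves.
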
21:39:40 -!- sprock has joined.
21:45:01 -!- riv has quit (Quit: Leaving).
22:17:13 -!- chiselfuse has joined.
22:18:47 -!- arseniiv has quit (Ping timeout: 252 seconds).
22:19:43 -!- lambdabot has quit (Remote host closed the connection).
22:20:39 -!- lambdabot has joined.