00:29:07 -!- LKoen has quit (Quit: “It’s only logical. First you learn to talk, then you learn to think. Too bad it’s not the other way round.”).
00:34:27 -!- Melvar has quit (Ping timeout: 256 seconds).
00:51:34 <esowiki> [[Jasp]] M https://esolangs.org/w/index.php?diff=78742&oldid=40623 * PythonshellDebugwindow * (+49) /* External resources */ cats and dogs
00:57:06 -!- Melvar has joined.
01:31:58 -!- Lord_of_Life_ has joined.
01:33:05 -!- Lord_of_Life has quit (Ping timeout: 240 seconds).
01:44:21 -!- FreeFull has quit.
02:10:40 <esowiki> [[LolKek]] M https://esolangs.org/w/index.php?diff=78743&oldid=74774 * PythonshellDebugwindow * (+0) Correct to better match initial wording
05:42:18 <esowiki> [[NDBall]] https://esolangs.org/w/index.php?diff=78744&oldid=78354 * Razetime * (+10) /* Interpriters */
06:52:50 -!- imode has quit (Ping timeout: 256 seconds).
07:10:54 -!- imode has joined.
07:11:16 -!- imode has quit (Client Quit).
07:11:30 -!- imode has joined.
08:50:25 <esowiki> [[NDBall]] https://esolangs.org/w/index.php?diff=78745&oldid=78744 * Aspwil * (+647) /* Instructions */
08:54:05 -!- sprocklem has quit (Ping timeout: 240 seconds).
08:54:16 -!- Sgeo has quit (Read error: Connection reset by peer).
09:07:22 -!- delta23 has quit (Quit: Leaving).
09:19:11 -!- imode has quit (Ping timeout: 256 seconds).
09:29:01 -!- sprocklem has joined.
10:07:03 -!- Arcorann__ has quit (Read error: Connection reset by peer).
10:29:42 -!- TheLie has joined.
10:35:00 -!- Arcorann has joined.
10:36:02 -!- Arcorann has quit (Read error: Connection reset by peer).
10:51:13 -!- Arcorann has joined.
11:04:56 -!- sprocklem has quit (Ping timeout: 240 seconds).
11:06:45 -!- LKoen has joined.
13:22:58 <esowiki> [[Chem]] M https://esolangs.org/w/index.php?diff=78746&oldid=60867 * PythonshellDebugwindow * (+157) /* Hello World #2 */ catd
13:23:04 <esowiki> [[Lisparser]] M https://esolangs.org/w/index.php?diff=78747&oldid=78587 * Hakerh400 * (+1)
13:24:41 -!- TheLie has quit (Remote host closed the connection).
13:26:46 <esowiki> [[Hashes]] M https://esolangs.org/w/index.php?diff=78748&oldid=42652 * PythonshellDebugwindow * (+52) I suppose this is a language
13:34:55 <esowiki> [[~ATH]] M https://esolangs.org/w/index.php?diff=78749&oldid=74934 * PythonshellDebugwindow * (+1) /* Nested loops */ Add semicolon
13:36:12 <esowiki> [[~ATH]] M https://esolangs.org/w/index.php?diff=78750&oldid=78749 * PythonshellDebugwindow * (-1) /* Examples */ remove semicolon
13:40:08 <fizzie> Somehow the name of ~ATH always makes me think of the AT modem hangup command, ATH0 (or +++ATH0 in some contexts), even though I expect there really isn't a c onnection.
13:40:33 <fizzie> (Just like there apparently isn't a connection between the "c" and "onnection" in "c onnection"...)
14:00:34 -!- LKoen has quit (Quit: “It’s only logical. First you learn to talk, then you learn to think. Too bad it’s not the other way round.”).
14:24:45 -!- Arcorann has quit (Ping timeout: 240 seconds).
14:35:09 -!- LKoen has joined.
15:00:04 -!- LKoen has quit (Read error: Connection reset by peer).
15:00:39 -!- LKoen has joined.
15:11:08 -!- Sgeo has joined.
15:41:30 -!- TheLie has joined.
16:46:24 -!- TheLie has quit (Remote host closed the connection).
16:48:22 <b_jonas> you know how like half a decade ago, suddenly everyone and everything called Isis started to have an unfortunate name for none of the fault of the namer, because of recent news? well a lot of things and even some people are named "Corona" or "Korona" and are now suffering the same fate.
16:49:10 <b_jonas> and there's not much we can do about this besides designing all systems to assume that what was a canonical name can become merely a synonym in the future. YES, I'M LOOKING AT YOU, UNICODE CHARACTER DATABASE.
16:57:25 -!- LKoen has quit (Remote host closed the connection).
17:03:04 -!- LKoen has joined.
18:00:04 <zzo38> There are also other kind of corona viruses; the modern kind isn't the only kind.
18:07:09 <b_jonas> Reputedly the same happened to the given name "Adolf" back in the world war
18:07:38 <b_jonas> I just wish something like this would happen to one of the annoying overused names like "Athene"/"Athena" or "Széchenyi".
18:07:45 <b_jonas> I mean if it has to happen at all
18:14:40 <zzo38> Words/names can have other meanings too so they can still be used.
18:16:21 <int-e> fungot: do you prefer your characters singed or unsinged?
18:16:22 <fungot> int-e: madam president, our debate is timely, and mrs jackson's point on amendment 12 which may change relations with turkey, we are not concerned with the serious environmental problems faced by small and medium-sized enterprises, and, finally, mr president of the fnord newspaper. i should like to thank my group, the liberals, myself included, voted for the report on monitoring the application of interim measures are quite str
18:26:58 -!- FreeFull has joined.
18:27:18 <b_jonas> int-e: I prefer them as possibly neither, like in the C standard: eg. I think it's allowed that they compare unsigned but still cause undefined behavior on an unsigned overflow
18:29:41 <b_jonas> one reason why we can deal with this easily is that we can just use unsigned char or signed char everywhere that it matters, because the standard gracefully guarantees that the pointers char *, unsigned char *, signed char *, and their const versions have the same representation, so you can safely cast a char ** to an unsigned char **, not only a char * to an unsigned char *. you can't just cast a char
18:29:47 <b_jonas> *** to an unsigned char *** safely, but at that point you'd be using structs, and pointers to structs also all have the same representation.
18:29:56 <int-e> but that doesn't answer the question
18:30:13 -!- rain1 has quit (Quit: Leaving).
18:30:38 <b_jonas> alas nothing is guaranteed about functions or pointers to functions, you can't even change a pointer argument to const and call it that way.
18:30:59 <int-e> (note the spelling)
18:31:07 <b_jonas> int-e: ok, in that case, I prefer if char is signed by default, but characters are unsigned
18:34:54 <b_jonas> zzo38: is there some way to represent unicode characters as integers that isn't just the UCS character code, but is optimized for working with utf-8 strings so that you have to do fewer shifts when you put or get a character to/from a string?
18:35:55 <zzo38> I always specify signed char or unsigned char if it matters, but sometimes I only need numbers 0 to 127 or some smaller range, so it doesn't matter.
18:36:18 <b_jonas> I was wondering about this because rust is specified to be able to use any representation the compiler wants for its built-in char type, and it could be something like this. it still needs to be able to convert between the UCS code and its char type, because it has "as" casts that do that.
18:36:36 <b_jonas> no wait, I think "as" casts don't do that
18:37:04 <b_jonas> so it doesn't even need those, except in some library functions that deal with utf-16 or utf-32
18:37:19 <zzo38> b_jonas: If you are limited to UTF-8-G (standard Unicode range), then you might avoid shifting by just putting the four bytes together in a 32-bit field, I suppose
18:37:23 <b_jonas> hmm I'm not sure, let me check
18:38:16 <zzo38> (or UTF-8-M it is called, not UTF-8-G)
18:39:42 <b_jonas> it looks like the rust "as" operator can cast from char to UCS code point... but possibly not backwards? I don't know
18:40:13 <b_jonas> zzo38: something like that, but which way do you align it and what endianness do you interpret the utf-8?
18:40:39 <b_jonas> utf-8 is natively big-endian, but you could want to interpret it as little-endian if that's the native endianness
18:41:53 <fizzie> b_jonas: I don't think it's allowed (in C) for `char` to be unsigned but cause undefined behavior on overflow.
18:41:59 <fizzie> It's implementation-defined whether plain `char` is signed or unsigned, but once that choice is made (and documented, which is a requirement for implementation-defined behavior too), it's required to behave just like any other signed or unsigned type.
18:42:18 -!- Lord_of_Life_ has changed nick to Lord_of_Life.
18:42:23 <zzo38> Either way you will probably have to copy the individual bytes, although how you do that (and what alignment is required) may depend on the computer, I think
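The packing zzo38 suggests (the character's UTF-8 bytes placed directly into a 32-bit field) can be sketched in Python to show the two alignment/endianness choices b_jonas asks about. This is an illustrative sketch only. The function names are hypothetical, and a real implementation in a systems language would copy bytes rather than call `int.from_bytes`. Neither variant ever shifts the bits of the code point itself; the sequence length is recovered from the lead byte.

```python
# Sketch: one UTF-8-encoded character packed into a 32-bit integer, two ways.

def utf8_length(lead: int) -> int:
    """Length of a UTF-8 sequence, determined from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte {lead:#04x}")


def pack_be(ch: str) -> int:
    """Big-endian, left-aligned: lead byte in the top 8 bits, unused low bytes zero."""
    return int.from_bytes(ch.encode("utf-8").ljust(4, b"\0"), "big")


def unpack_be(value: int) -> str:
    raw = value.to_bytes(4, "big")
    return raw[:utf8_length(raw[0])].decode("utf-8")


def pack_le(ch: str) -> int:
    """Little-endian: lead byte in the low 8 bits, like a 4-byte load from the
    string on a little-endian machine with the trailing bytes masked off."""
    return int.from_bytes(ch.encode("utf-8"), "little")


def unpack_le(value: int) -> str:
    raw = value.to_bytes(4, "little")
    return raw[:utf8_length(raw[0])].decode("utf-8")
```

One design consequence: because UTF-8 byte order preserves code point order and no valid sequence is a prefix of another, the big-endian left-aligned values compare in the same order as the code points. The little-endian form gives up that ordering in exchange for matching a native load on little-endian hardware.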
18:42:35 <fizzie> C11 6.2.5p15: "The implementation shall define `char` to have the same range, representation, *and behavior* as either `signed char` or `unsigned char`."
18:43:04 <fizzie> (It remains a distinct type, but I think the "and behavior" will also cover behavior on overflow.)
18:47:11 <zzo38> Some programs expect text to be a sequence of Unicode codepoints encoded as UTF-8, even if the text isn't Unicode. This makes it necessary to implement conversion to allow them to be stored in a format which is not invalid UTF-8.
18:48:39 <fizzie> Although it's also true that `char` is not included in the category of /unsigned integer types/, and the overflow rule explicitly says "a computation involving unsigned operands", so maybe there's a little bit of ambiguity of interpretation there on what exactly "same range, representation, and behavior" means. Since it's clearly not all-encompassing, because the types are still distinct at least in
18:48:45 <fizzie> terms of not being compatible types.
18:51:52 <zzo38> (Some programs furthermore expect there to be no null characters, so this is also necessary to work around similarly in some cases.)
18:53:40 -!- imode has joined.
18:56:09 <b_jonas> fizzie: yes, that's what I was confused about, because paragraph 9 there says "A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo [...]."
18:56:33 <b_jonas> so that seems like it doesn't apply to char, but then they throw in that "behaves the same" that I didn't notice
18:58:33 <b_jonas> zzo38: yeah. like embedding binary data into xml. it's ugly.
19:08:14 <b_jonas> I rather like python's solution, where it defines an extension of utf-8 that can decode any byte string to a string of "characters", in a reversible way, where 128 of the surrogates count as "characters" and python allows them in most character string operations. you can even encode those character strings to an extension of utf-16 (which can contain those 128 surrogates unpaired) and back and it's
19:08:20 <b_jonas> still round-trip compatible.
19:10:23 <b_jonas> one drawback of this is that if you want to extend this to one of those unofficial utf-8 variants with a larger range of characters possible, then the extended utf-8 encoding will be necessarily slightly incompatible with the extended utf-8 encoding for smaller range of characters. but even this is an incompatibility that isn't too ugly.
19:11:18 <b_jonas> the best use for this is to manipulate character strings internally only, inputting and outputting utf-8 or utf-16 etc only, to gracefully handle inputs that contain strings that are supposed to be utf-8 but aren't.
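The Python behavior b_jonas describes is available through the standard codec error handlers: `surrogateescape` maps each undecodable byte 0x80 to 0xFF to one of the 128 lone surrogates U+DC80 to U+DCFF, and `surrogatepass` lets such unpaired surrogates through the UTF-16 codecs. The sketch below uses only those handlers. The helper names `decode_lossless` and `encode_lossless` are hypothetical.

```python
def decode_lossless(raw: bytes) -> str:
    """Decode as UTF-8; each invalid byte 0x80-0xFF becomes a lone surrogate U+DC80-U+DCFF."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    """Inverse of decode_lossless: those lone surrogates turn back into the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


raw = b"caf\xe9 ok"              # 0xE9 is Latin-1 'é', which is not valid UTF-8 here
text = decode_lossless(raw)      # 'caf\udce9 ok'
shouted = text.upper()           # ordinary str operations still work: 'CAF\udce9 OK'

# Extended UTF-16: 'surrogatepass' carries the unpaired surrogate through both directions.
utf16 = text.encode("utf-16-le", errors="surrogatepass")
assert utf16.decode("utf-16-le", errors="surrogatepass") == text

# Re-encoding restores the original bytes exactly.
assert encode_lossless(text) == raw
assert encode_lossless(shouted) == b"CAF\xe9 OK"
```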
19:23:51 -!- sprocklem has joined.
19:39:03 -!- sprocklem has quit (Ping timeout: 260 seconds).
19:39:41 -!- sprocklem has joined.
19:48:32 <zzo38> While this is helpful if the text is expected to be Unicode text, I think that it is not a good idea in general; better would be for most things to assume data is a stream of 8-bit characters, and then have a function to reinterpret it as UTF-8 in case that is what it is.
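zzo38's alternative (keep data as a plain stream of 8-bit characters, and reinterpret it as UTF-8 only on request) might look like this minimal sketch. The function name `as_utf8` and the choice to return `None` on invalid input are assumptions made for illustration.

```python
from typing import Optional


def as_utf8(data: bytes) -> Optional[str]:
    """Reinterpret a byte string as UTF-8 text; return None if it is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

With this approach, the bytes remain the primary representation, and callers that do not care about text never pay for decoding.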
19:52:55 -!- deltaepsilon23 has joined.
19:53:10 -!- deltaepsilon23 has changed nick to delta23.
19:55:33 -!- zzo38 has quit (Ping timeout: 256 seconds).
19:59:14 -!- zzo38 has joined.
20:05:04 -!- TheLie has joined.
20:16:25 <imode> I agree. I'm struggling to figure out whether I should add string literals to my language, or should I force users to work in lists of numbers. and if I add literals, what encoding.
20:18:16 <zzo38> My own suggestion is to add string literals which are just strings of 8-bit characters. Their interpretation is up to the program; some functions may interpret them as UTF-8, but not all will.
20:19:04 <imode> what happens when someone inserts a multibyte character into their source file, then.
20:19:44 <zzo38> Outside of a string literal or comment, it is an error. Inside of a string literal, it is treated as the sequence of bytes that has been entered.
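The rule zzo38 gives here (non-ASCII bytes are an error outside a string literal but are kept verbatim inside one) can be sketched as a tiny tokenizer. The literal syntax is assumed: double quotes, no escape sequences, and comments are omitted. Words come out as `str` and literals as raw `bytes`.

```python
def tokenize(src: bytes) -> list:
    """Split source into whitespace-separated words and "..." literals.

    Hypothetical rules: literals keep their raw bytes (no decoding);
    any byte >= 0x80 outside a literal is a syntax error.
    """
    tokens, i, n = [], 0, len(src)
    while i < n:
        c = src[i]
        if c in b" \t\r\n":
            i += 1
        elif c == ord('"'):
            end = src.find(b'"', i + 1)
            if end < 0:
                raise SyntaxError("unterminated string literal")
            tokens.append(src[i + 1:end])   # raw bytes, whatever encoding they are in
            i = end + 1
        else:
            start = i
            while i < n and src[i] not in b' \t\r\n"':
                if src[i] >= 0x80:
                    raise SyntaxError(f"non-ASCII byte {src[i]:#04x} outside a string literal at offset {i}")
                i += 1
            tokens.append(src[start:i].decode("ascii"))
    return tokens
```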
20:20:22 <imode> eh. I'm almost in favor of just forcing you to use numbers instead.
20:49:51 <imode> the problem is that I want it to be expressive. technically nothing stops you from doing something like ( 72 101 108 108 111 44 32 119 111 114 108 100 33 ) "Hello,\sworld!" define !
20:50:23 <imode> but you can't write the literals directly, you'd need to define them as a symbol first. everything is a function that can be executed, even unknown symbols.
20:51:01 <imode> all things are separated by some kind of whitespace, the brackets are actually defined in terms of repeated composition and equality.
20:53:06 <imode> symbols by themselves don't have structure to them, and shouldn't have a structure to them. so strings should "reduce" to something smaller, like quotations of numbers. this forces you to think about encoding: what if something doesn't use UTF-8? you're going to have to do something like the above anyway.
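A rough model of the approach imode describes (split on whitespace, build a quotation of numbers, bind it to a symbol with `define`) is sketched below. This is a hypothetical simplification rather than imode's language. Here `(` and `)` are handled specially instead of being derived from composition, unknown symbols simply push themselves rather than executing, and the trailing `!` from the example is left out because its meaning is not stated.

```python
def run(source: str) -> dict:
    """Evaluate whitespace-separated tokens; return the dictionary of definitions."""
    stack: list = []
    words: dict = {}
    tokens = iter(source.split())
    for tok in tokens:
        if tok == "(":
            # Collect a flat quotation of numbers up to ")" (no nesting in this sketch).
            quote = []
            for t in tokens:
                if t == ")":
                    break
                quote.append(int(t))
            else:
                raise SyntaxError("missing )")
            stack.append(quote)
        elif tok == "define":
            name = stack.pop()   # the symbol being bound
            body = stack.pop()   # the quotation it stands for
            words[name] = body
        elif tok in words:
            stack.append(words[tok])
        else:
            stack.append(tok)    # unknown symbol: pushed as itself in this sketch
    return words


words = run(r'( 72 101 108 108 111 44 32 119 111 114 108 100 33 ) "Hello,\sworld!" define')
```

The numbers in imode's example are exactly the ASCII/UTF-8 bytes of `Hello, world!`, so in this model the defined symbol holds `list(b"Hello, world!")`.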
20:58:19 <esowiki> [[Dot]] M https://esolangs.org/w/index.php?diff=78751&oldid=32444 * PythonshellDebugwindow * (+24) /* Step-by-Step example */ feline
21:04:35 <zzo38> I still think that byte strings would work. If you need to, then I suppose you might have a "utf8" command, so that after the string literal you can write utf8 and then it converts into the list of Unicode code points rather than the raw byte values, I suppose.
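The `utf8` command zzo38 proposes could behave like this stack word, sketched in Python under the assumption that the stack is a list and that a string literal leaves a list of byte values on top of it. The name `utf8_word` is hypothetical.

```python
def utf8_word(stack: list) -> None:
    """Hypothetical `utf8` word: replace the list of byte values on top of the stack
    with the list of Unicode code points those bytes encode as UTF-8."""
    byte_values = stack.pop()
    text = bytes(byte_values).decode("utf-8")   # raises UnicodeDecodeError on invalid UTF-8
    stack.append([ord(ch) for ch in text])
```

For example, the bytes `72 195 169 226 130 172` (`Hé€` in UTF-8) become the code points `72 233 8364`.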
21:23:36 <imode> it also means extra work for the parser.
21:25:18 <zzo38> Well, yes, it would have to parse string literals, just as much as, it would also have to parse numbers, comments, etc.
21:28:23 <imode> not really. sorry, should've specified: my language is concatenative. the most I do for parsing is split on whitespace.
21:29:47 <zzo38> The other way is like how Forth is doing, I suppose.
21:30:44 <imode> yeah, define parsing words.
21:43:49 <b_jonas> zzo38: yes, it's only useful when the text is expected to be utf-8, but that is common. if you don't know the encoding, then you treat it as a byte string, or as iso-8859-1 encoded if you wish.
21:45:19 <b_jonas> but when I work with inputs and outputs some of which are encoded utf-8 and others are encoded utf-16-le, I need to be able to read input as utf-8 so I can match strings between the two sorts of input and output it as either encoding.
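The matching b_jonas describes works only after both sides are decoded to text, because the UTF-8 and UTF-16-LE byte layouts of the same string differ. The sketch below assumes strictly valid input and uses a hypothetical function name. For malformed input, Python's `surrogateescape` (UTF-8) and `surrogatepass` (UTF-16) error handlers could be substituted in the `decode` calls.

```python
from typing import List


def match_records(utf8_keys: List[bytes], utf16le_records: List[bytes]) -> List[bytes]:
    """Return, re-encoded as UTF-8, every UTF-16-LE record whose text equals one of the UTF-8 keys."""
    # Compare decoded text, never raw bytes: "café" has different bytes in each encoding.
    wanted = {key.decode("utf-8") for key in utf8_keys}
    matches = []
    for record in utf16le_records:
        text = record.decode("utf-16-le")
        if text in wanted:
            matches.append(text.encode("utf-8"))
    return matches
```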
21:46:33 -!- TheLie has quit (Remote host closed the connection).
22:26:27 -!- Lymia has quit (Ping timeout: 260 seconds).
22:27:21 -!- user3456_ has joined.
22:27:36 -!- Lymia has joined.
22:28:12 -!- user3456 has quit (Ping timeout: 260 seconds).
23:14:45 -!- Arcorann has joined.
23:28:45 -!- delta23 has quit (Ping timeout: 240 seconds).
23:52:22 -!- delta23 has joined.