exebook's comments

Tokenization can be bypassed like this (Mistral Large 2):

How many letters R are in the word "s-t-r-a-w-b-e-r-r-y"?

The word "s-t-r-a-w-b-e-r-r-y" contains three instances of the letter "R."

How many letters R contain the word strawberry?

The word "strawberry" contains two instances of the letter "R."


I'd guess these poor teenagers were mentally sick of something else before the social networks thing came to be. And they will be sick of another thing after social networks get replaced.

The important thing is that we have developed rules that parents teach kids about many things, but we developed those rules over centuries. With the internet, we will need generations before kids are as prepared for the dangers of the internet as they are for the dangers of the street, for example. Damn, we still have bullying in schools, because schools have only been around recently, like 100+ years, and we have not adapted yet.


Sounds like a crossover of Supernova Era and The Three-Body Problem. In the former, supernova radiation kills all people on Earth except kids under 13.


I made my own proportional font, Variable.ttf, and have refined it for over a decade. I use it daily. Over many iterations I refined the space width and the shape and size of each important coding character, and added some ligatures in the last few years. It's very good; I consider it a work of art, and I cannot imagine going back to fixed-width fonts. Only my friend and I use it.

https://github.com/exebook/variable


When you say community, where exactly do you find people who can help with wezterm?


And especially strings, which with all those escape codes are small programming languages in their own right.


It is also about performance.

I recently watched a video from a guy who compiled the entire Linux kernel in under 1 second, I believe. He used TinyC and noticed that something like 90% of compilation is tokenization of headers that are included many times; there are headers that are included thousands of times, in almost every C file, so he ended up caching tokens. So a big reason to have a separate tokenizer is that tokenization is a simpler task and can be optimized with all the low-level approaches: perfect hashes, hand-crafted nested switch/if tries, branchless algorithms, compiler intrinsics, etc.

A good tokenizer is about as fast as writing the output array of records into memory. That means it is important to choose the right memory layout for your tokenized data, so that when the parser reads tokens it has as few cache misses and indirect memory accesses as possible. Tokenization can be thought of as a sort of in-memory compression.
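To make the memory layout point concrete, here is a minimal sketch of what I mean, not taken from any real compiler. The `Token` record, the `TOK_*` kinds and the `tokenize` function are all names I made up for illustration. Each token is a small fixed-size record that points back into the source buffer, so the parser walks one contiguous array with no per-token allocation.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token kinds, just enough for the sketch. */
enum { TOK_IDENT = 1, TOK_NUMBER = 2, TOK_PUNCT = 3 };

/* One compact record per token (8 bytes on common ABIs). The token text
 * is not copied; offset/length refer back to the source buffer. */
typedef struct {
    uint32_t offset;  /* byte offset of the token in the source */
    uint16_t length;  /* token length; a sketch limit of 65535 bytes */
    uint8_t  kind;    /* one of the TOK_* values */
    uint8_t  pad;     /* explicit padding, kept zero */
} Token;

/* Writes up to cap tokens into out and returns how many were written. */
static size_t tokenize(const char *src, size_t len, Token *out, size_t cap) {
    size_t i = 0, n = 0;
    while (i < len && n < cap) {
        unsigned char c = (unsigned char)src[i];
        if (isspace(c)) { i++; continue; }
        size_t start = i;
        uint8_t kind;
        if (isalpha(c) || c == '_') {
            kind = TOK_IDENT;
            while (i < len && (isalnum((unsigned char)src[i]) || src[i] == '_')) i++;
        } else if (isdigit(c)) {
            kind = TOK_NUMBER;
            while (i < len && isdigit((unsigned char)src[i])) i++;
        } else {
            kind = TOK_PUNCT;  /* every other character is a 1-byte token */
            i++;
        }
        out[n].offset = (uint32_t)start;
        out[n].length = (uint16_t)(i - start);
        out[n].kind = kind;
        out[n].pad = 0;
        n++;
    }
    return n;
}
```

With records like these, caching the tokens of a header is just keeping the array around, and the parser reads it sequentially.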


Excellent summary. A couple other reasons for a separate tokenizer:

1. Sometimes all you need is a tokenizer - such as for highlighting in a code editor

2. D has a construct called a token string - where a string literal consists of tokens

3. A separate tokenizer means the lexer and parser can run in separate threads


Maybe the first thing a super civilisation would do is upgrade their bodies. The first thing I'd do is get rid of 50-100 kg of useless meat, bones and blood. I guess then the entire consciousness, memory and emotional sphere could fit in something the size of a microchip die. The next step might be finding a way to base consciousness on something like quarks; that would solve transportation, and each person would need only the tiniest amount of energy. Then I guess they just gather together and form a star. A normal day in their life would be our microsecond. So possibly every star in the universe is already a super advanced civilisation, and they just do not care to contact us. Who would want to contact someone who is a billion times larger and slower and depends on clumsy molecules?


Something like this indeed; imagine us in 1000 years (if we don’t blow ourselves up or something before that); we would have brain up/download, the tiniest and most energy-efficient chip-equivalent that can run your brain, and photorealistic VR.

Even with those simple extrapolations, it’s easy to see that we can stick a few billion ‘humans’ (brains-on-chip) in a solar powered spaceship and just let it hang there for eternity or it could do a journey to somewhere. But what’s the point of a journey as inside this relatively tiny ship, these brains will live in a world (probably a clone of older earth) with a universe around it they can explore.

This all seems rather feasible given time, and we would not seem very advanced from the outside; you would hardly be able to detect us at all, and yet 100s of billions of people would live in perpetual paradise (or hell, but again; why would you make it bad if you don’t have to).


Exploration seems like a natural way for a civilisation to live, but that's because exploration is needed to expand knowledge and grow more intelligent. We as a civilisation mostly became obsessed with intellect dozens of generations ago, if not fewer, and gathering knowledge and exploring seems so important to us right now. It might be less important, if important at all, for species who have been intelligent for a long time and have gathered orders of magnitude more knowledge than we have.

A related question is whether species really need intellect at all, and how much. It could be that the main purpose of intellect is to increase the chances of survival, but what if a civilisation has already figured out survival? Will they even need to stay intelligent? Maybe their goal is to be happier, or maybe to have a billion orgasms per second.

Another, more obvious reason to explore the universe is lack of resources; at least Earth was mostly explored to be exploited. If resources are solved, that is one less reason to explore.


> Exploration seems like a natural way for a civilisation to live, but that's because exploration is needed to expand knowledge and grow more intelligent. We as a civilisation mostly became obsessed with intellect dozens of generations ago, if not fewer, and gathering knowledge and exploring seems so important to us right now. It might be less important, if important at all, for species who have been intelligent for a long time and have gathered orders of magnitude more knowledge than we have.

Most of the financial impetus for exploration comes from the promise of access to scarce resources. But if you have the technology to traverse the stars, then your species is either able to harness unfathomable amounts of energy such that you can bend space-time, or you've done some kind of wild genetic engineering/cybernetic enhancement to the point where the species is not really bound by the limitations of whatever biology its planetary evolution set it up with.

In both cases, they would have the means to functionally be living in a post-scarcity society. Either their consciousness is stored in some kind of long-term solid-state storage that can survive millennia long trips through space or energy is so cheap and available that it's hard to imagine them needing to keep going and searching to find more stuff. After you've explored and catalogued a hundred planets it's hard to imagine any real impetus to keep going.

Human civilization is already projected to cap out in population at around 11 Billion, and that's driven primarily by cultural and economic factors deriving from technology around access to healthcare, education, and other sources of diversion/entertainment/fulfillment. The idea that an advanced civilization must necessarily keep growing and growing in size doesn't seem to hold up for our own experience on Earth.


Or, of course, if we manage to make infinite energy on earth, we don't even have to shoot anything into space; we can simply stack the earth full of these brainframes and stick it out until the sun starts to fail. Barring us ending humanity too early to get there, I find it hard to believe this will not, inevitably, happen.


> Then I guess they just gather together and form a star.

> So possibly every star in the universe is already a super advanced civilisation, they just do not care to contact us, who would want to contact someone who is billion times larger, slower and depend on clumsy molecules.

The Caleban have their own reasons for contacting organic sentient species. But be really careful about getting involved in any contracts with them.


No need to delete the flesh, just make living backups. Monitor them for "dangerous" thoughts like we do with LLMs. You can run them in various environments, have them interact with "aliens", etc. Most interesting would be the amount of variety in isolation, you know, the large, mostly empty galaxy with everything impossibly far away. They can check out 1 or 2 planets and some moons to prevent excessive boredom in the later stages, if they are somehow late in driving themselves extinct. You get that kind of outlier if you run a lot of trials.


> Monitor them for "dangerous" thoughts like we do with LLMs

We do that?


Not very well, given the existence of "DAN" and similar prompt hacks, but yes.


Occam's razor. Climate change is more likely to do us in.


This is an incredible thought experiment. Is there any fictional basis for your thoughts? I would love to explore fictional material in this area.


Are you an author?

I read a short sci-fi novel when I was a kid, back in the 80s. A future Earth has found signs of life on some distant planet and a science ship is sent to investigate. They get there but cannot find anything on the empty rocky planet. The somewhat detective-like story leads them to realize that the life is on the star of that system, not on the planet, and that the intelligent creatures are plasma-based life forms. Unfortunately I barely remember this story and cannot recall the name or author; I read hundreds of sci-fi books back then.

I also read some random popular science post about subatomic processes in cells. It gave some examples, of which I recall one that fascinated me: apparently there is a gate in a cell that is opened or closed by a single electron to let a water molecule in or out, I guess. This ignited my imagination: what if there is more subatomic stuff going on in our bodies? How far can it go in reality and in fiction? What if our memory is subatomic? Could we build something at the subatomic level, a logic gate for example? Can a life form be fully subatomic?

Also there is a galaxy on Orion's belt, and Orion is a cat; they briefly and somewhat humorously explored the idea in Men in Black, which I have watched many times.

There is the quantum realm in the Ant-Man series. If it existed, I guess a super civilisation would try to go there or stay there to be more energy efficient.

There is also Dr. Manhattan, whose blue body I believe is made of photons. Imagine he was not the only one in that experiment, and our entire human race followed him; what would our civilisation be? No buildings, no machines, no crops, no transportation; would we even stay on the planet? What would we do, freed from servicing our fragile bodies most of the time? I guess we would just gather together, have fun, party, make new life, teach each other endlessly. So maybe that's what stars are: each just has trillions of Dr. Manhattan's offspring, and that's why we cannot see any signs of life, because we're looking for meatballs.


No, I’m not. I have always been curious about the origin of these ideas.


Take a look at the Bobiverse series. Not quite the same, but similar.


Yes, it's called Psilocybin


I guess while Python interprets this branchless code it still can do some branching? Or am I missing something here?


There are some provisos to the “Branchless” tag that are covered in the article I drew inspiration from. One of which is that a “CMOVE” is not technically a branch :)


You have to do some really weird shit if you want generalised branchless ternary statements.

    begin += (arr[step+begin] < value)?step:0;
Something like:

    int mask = ((arr[step + begin] - value) >> 31);    //Depends on signed shift
    begin += (step & mask) | (0 & ~mask);
(Obviously in this case, it's simplifiable.)


I was considering doing it all in a ternary statement, but I feel that the current form is also branchless because it is simply a multiply and add. The extra bounds-checking condition can probably be omitted, but I haven't tested that.

    for (step >>= 1; step != 0; step >>= 1) {
        if ((next = begin + step) < size) {
            begin += PyObject_RichCompareBool(PyList_GetItem(list_obj, next), value, Py_LT) * step;
        }
    }
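For anyone who wants to play with the loop outside of the Python C API, here is a rough standalone sketch of the same multiply-and-add shape on a plain sorted `int` array. The function name `count_less_branchless` and the way the initial `step` is computed are my assumptions, not code from the extension. The bounds check is still an `if`, and whether the comparison compiles to a jump or not is up to the compiler.

```c
#include <stddef.h>

/* Returns how many elements of the sorted array arr[0..size) are < value,
 * which is also the index where value could be inserted. */
static size_t count_less_branchless(const int *arr, size_t size, int value) {
    if (size == 0) return 0;

    /* Assumed setup: smallest power of two that is >= size. */
    size_t step = 1;
    while (step < size) step <<= 1;

    /* begin ends up as the largest index i >= 1 with arr[i] < value,
     * or stays 0 if there is no such index. */
    size_t begin = 0;
    for (step >>= 1; step != 0; step >>= 1) {
        size_t next = begin + step;
        if (next < size) {
            /* The comparison yields 0 or 1, so this is a multiply and add. */
            begin += (size_t)(arr[next] < value) * step;
        }
    }

    /* Index 0 was never compared inside the loop, so account for it here. */
    return begin + (size_t)(arr[begin] < value);
}
```

For example, on `{1, 3, 5, 7, 9}` a search for 6 returns 3, because three elements are smaller than 6.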


My point was, anywhere there's a hidden 'if' can be branching.

If there's no calculation being done, it'll simplify.

    value = (test) ? const0 : const1;
But if calculations are being done, it won't.

    value = (test) ? calc0() : calc1();
If you want non-branching where the ternary options are calculated, you need to calculate both.

This matters most with SIMD operations.

Look at section 2.5.1 (Branch-Equivalent SIMD Processing)

http://ftp.cvut.cz/kernel/people/geoff/cell/ps3-linux-docs/C...


Ah, yeah I see what you mean. If I'm understanding you correctly, the fact that we are calling the Python interpreter internal functions during that calculation makes it branch because it is not pre-calculated?


Pretty much (at least as far as I understand it).

There's probably something at the instruction level which allows the constant ternary expressions to be non-branching.


Except that arr[...] - value may overflow, and signed overflow is UB in C.

Even removing the UB with something like __builtin_sub_overflow(), if arr[...] is INT_MAX and value is -2, then the difference will have the high bit set!

I haven't tried it, but this should compile to a cmove or seta, not a jump:

  int cond = arr[step+begin] < value;
  begin += step & -cond;
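Wrapped in a function so it can be checked in isolation, the idea could look like the sketch below. The name `add_step_if_less` is made up for illustration. It relies on `-1` being all ones (two's complement), and it never subtracts the operands, so the INT_MAX case above is not a problem.

```c
#include <limits.h>

/* Returns begin + step when elem < value, otherwise begin. */
static int add_step_if_less(int begin, int step, int elem, int value) {
    int cond = elem < value;        /* exactly 0 or 1 */
    return begin + (step & -cond);  /* -cond is 0 or -1 (all ones) */
}
```

Whether this really becomes a conditional move or a set instruction depends on the compiler and flags, so it is worth checking the generated assembly.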


My bad, but still worth mentioning the generalised case:

    int mask = -(arr[step+begin] < value);
    int result = (val0 & mask)|(val1 & ~mask);
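As a standalone sketch of that generalised case (the helper name `select_branchless` is mine, and it assumes two's complement so that `-1` is all ones): both `val0` and `val1` have to be computed before the call, which is the "calculate both" cost mentioned earlier in the thread.

```c
#include <limits.h>

/* Returns val0 when cond is nonzero, otherwise val1, using a mask. */
static int select_branchless(int cond, int val0, int val1) {
    int mask = -(cond != 0);               /* 0 or -1 (all ones) */
    return (val0 & mask) | (val1 & ~mask); /* keeps exactly one operand */
}
```

So `select_branchless(a < b, x, y)` behaves like `(a < b) ? x : y`, as long as evaluating both `x` and `y` has no side effects you care about.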


Do you want a bug report? The lowest row doesn't fit into the width of my phone's Chrome. I can only see a few millimeters of the rightmost and leftmost cubies. Pinch zoom seems to be disabled, so I cannot adjust. I tried switching to PC mode, but everything became too tiny. Rotating the phone also doesn't help. So for me it's unplayable.


Thank you, I'm aware that it sometimes looks off on certain Android phones, however I haven't had a case where it is unplayable. Could you tell me your phone model, what size/resolution is the screen?

I can't promise a speedy resolution since I don't have an Android but I'll definitely look into it when I get a chance. You can share your details here, send them to me by email or open an issue in the repo.


Huawei Y8p, Android 10, 2400x1080, Chromium 88. Hope this helps.


Posted here because I couldn't find the repository. I was looking here: https://github.com/memechain


The Android SDK includes software emulation of phones.

