
> SCP-2521 (also known as ●●|●●●●●|●●|●) is a Keter-level SCP not currently contained by the SCP Foundation. He is a creature who steals every piece of information about his nature, as long as the information is expressed in textual or verbal form. Because of that, nearly everything about him is registered by ideograms and pictures.

from https://villains.fandom.com/wiki/SCP-2521


You don't need so many layers of stuff (or API keys, signups, or other nonsense).

Llama.cpp (to serve the model) + the Continue VS Code extension are enough.

The rough steps are:

  Part A: Install llama.cpp and get it to serve the model:
  --------------------------------------------------------
  1. Clone the llama.cpp repo and run make.
  2. Download the relevant model (e.g. wizardcoder-python-34b-v1.0.Q4_K_S.gguf).
  3. Run the llama.cpp server (e.g., ./server -t 8 -m models/wizardcoder-python-34b-v1.0.Q4_K_S.gguf -c 16384 --mlock).
  4. Run the OpenAI-like API server [also included in llama.cpp] (e.g., python ./examples/server/api_like_OAI.py).
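
  To confirm Part A works before touching the editor, you can hit the llama.cpp server directly. A minimal sketch, assuming the server is listening on its default port 8080 and exposes the /completion endpoint (adjust if your build or flags differ):

    # Quick sanity check against the llama.cpp server itself (not the OpenAI-like wrapper from step 4).
    # Port 8080 and /completion are the defaults; change them if you passed different options to ./server.
    import requests

    resp = requests.post(
        "http://localhost:8080/completion",
        json={"prompt": "def fibonacci(n):", "n_predict": 64},
    )
    resp.raise_for_status()
    print(resp.json()["content"])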

  Part B: Install Continue and connect it to llama.cpp's OpenAI-like API:
  -----------------------------------------------------------------------
  5. Install the Continue extension in VS Code.
  6. In the Continue extension's sidebar, click through the tutorial and then type /config to access the configuration.
  7. In the Continue configuration, add "from continuedev.src.continuedev.libs.llm.ggml import GGML" at the top of the file.
  8. In the Continue configuration, replace lines 57 to 62 (or thereabouts) with:

    models=Models(
        default=GGML(
            # should match the context size passed to ./server via -c 16384
            max_context_length=16384,
            # the OpenAI-like wrapper from step 4 (default port 8081), not the raw llama.cpp server
            server_url="http://localhost:8081"
        )
    ),

  9. Restart VS Code, and enjoy!

You can now access your local coding LLM through the Continue sidebar.
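
If Continue can't reach the model, it helps to rule out the OpenAI-like wrapper first. A minimal sketch, assuming the wrapper from step 4 is on its default port 8081 (the same address as server_url above) and serves an OpenAI-style chat completions route; the exact path has varied between versions of api_like_OAI.py, so check the script if this 404s:

    # Talks to the api_like_OAI.py wrapper, i.e. the same endpoint Continue's GGML provider uses.
    # The /v1/chat/completions path is an assumption; some versions serve /chat/completions without the /v1 prefix.
    import requests

    resp = requests.post(
        "http://localhost:8081/v1/chat/completions",
        json={
            "model": "local",  # placeholder; the wrapper serves whatever model the llama.cpp server loaded
            "messages": [{"role": "user", "content": "Write a Python hello world."}],
        },
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])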

I figured that it was just for illustration, because the author couldn't think of a better example. Some real-life examples that turn up stupidly often:

1. The model uses click-through data as an input. Your frontend engineer moves the UI element being clicked upon to a different portion of the page for a certain category of results. This changes the baseline click-through rate. The model assumed this feature had a constant baseline across all results, so the new feature value now needs to be scaled to account for the different user behavior. Nobody thinks to do this.

2. The frontend engineer removes a seemingly wasted HTTP fetch to reduce latency. That fetch was actually being used to calibrate latency across datacenters, and it was a crucial input to a data pipeline feeding a system of servers (which in turn fed the ML model) that the frontend team didn't control and wasn't even aware of.

3. The frontend engineer accidentally triggers a browser bug in IE7 (gimme a break, it was 9 years ago) that prevents clicks from registering when RTL text is mixed with LTR. Click-through rates decline precipitously in Arabic-speaking nations. An ML model interprets this as all results performing poorly in those countries, so it promptly starts cycling through results, killing ones that had previously been shown but received no clicks.

4. A fiber cable is cut across the Pacific. This results in high latency for all Chinese users, which makes them abandon their sessions. An ML model interprets this as Chinese people being less interested in the news headlines of that day.

5. An ML model for detecting abusive traffic uses short-term spikes in the search volume of any single query as a signal (a toy sketch of this follows the list). Michael Jackson dies. The model flags everyone searching for him as a bot.

6. An ML model for search suggestions uses follow-up queries as a signal. The NYTimes crossword puzzle comes out. Everybody goes down the list of clues and Googles them. Suddenly, [houston baseball player] suggests [bird sound] as related.
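
Here's a toy sketch (not any real production system) of the kind of per-query spike signal in example 5, just to show why a legitimate worldwide event looks identical to a bot swarm from the model's point of view:

    # Flag a query as suspicious when its volume in the current window sits far above its own historical baseline.
    # A breaking-news event produces exactly this pattern for every user searching for the same thing at once.
    from statistics import mean, stdev

    def is_suspicious(history, current_count, threshold=6.0):
        mu, sigma = mean(history), stdev(history)
        z = (current_count - mu) / (sigma or 1.0)
        return z > threshold

    baseline = [95, 102, 99, 110, 97, 101, 104, 98]   # normal weeks of [michael jackson] searches
    print(is_suspicious(baseline, 250_000))           # True: every searcher gets flagged as a bot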


This is our long-running experiment in story re-upping. I've described it at https://news.ycombinator.com/item?id=10705926, but it might be time for a fresh explanation.

Moderators and a small number of reviewer users comb the depths of /newest looking for stories that got overlooked but which the community might find interesting. Those go into a second-chance pool from which stories are randomly selected and lobbed onto the bottom part of the front page. This guarantees them a few minutes of attention. If they don't interest the community they soon fall off, but if they do, they get upvoted and stay on the front page.

We want to turn this system into something that's open to all users who want to take time to review stories. We'll make it a form of community service that will be a new way to earn karma. However, it's still an open question how to pull this off without simply recreating the current upvoting system under another guise.

There's one glitch that occasionally confuses people. When the software lobs a story, it displays a rolled-back timestamp—not the original submission time, but a resubmission time relative to other items on the front page. If you see a timestamp inconsistency on HN, this is probably why. Edit: if this is the kind of detail that interests you, see https://news.ycombinator.com/item?id=19774614 for a more recent explanation.
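
For anyone who prefers reading the flow as code, here's a toy sketch of the mechanism described above (purely illustrative Python; HN itself is written in Arc and this is not its code):

    import random
    import time

    second_chance_pool = []   # stories reviewers fished out of /newest

    def lob_story(front_page):
        # Randomly pick a pooled story, place it on the lower part of the front
        # page, and give it a fresh display timestamp (the "resubmission time").
        if not second_chance_pool:
            return
        story = second_chance_pool.pop(random.randrange(len(second_chance_pool)))
        story["display_time"] = time.time()   # not the original submission time
        front_page.insert(max(0, len(front_page) - 5), story)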

