We have unveiled a new webpage that tells the story of GraphScope, charting its course from inception to its future prospects in the field of graph computing. The page explores how Alibaba Cloud has developed GraphScope from its early stages, overcoming challenges and achieving significant milestones to become a crucial tool in graph data processing. This journey highlights not only technological advancements but also practical lessons in scaling, performance, and adaptability.
The PipeDream-2BW paper [1] and the ZeRO-Offload paper [2] both show that a 1-step delayed asynchronous gradient update doesn't affect convergence (or perplexity) while improving training efficiency by a large margin (by fully utilizing the bubbles in pipeline parallelism).
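For intuition, here is a toy, framework-agnostic sketch of what a 1-step delayed update means: the gradient of the current step is computed against weights that do not yet include the previous step's update, and that previous update is applied concurrently (serialized here only for clarity). The model, data, and learning rate are made up for illustration.

```python
import numpy as np

# Toy sketch of a 1-step delayed ("stale") weight update, in the spirit of
# PipeDream-2BW's double-buffered weights and ZeRO-Offload's delayed update.
rng = np.random.default_rng(0)
w = rng.normal(size=4)   # model weights (a linear model, for illustration)
prev_grad = None         # gradient from the previous step, not yet applied
lr = 0.1

def loss_and_grad(weights, x, y):
    err = weights @ x - y             # 0.5 * (w.x - y)^2 least-squares loss
    return 0.5 * err ** 2, err * x

for step in range(10):
    x, y = rng.normal(size=4), rng.normal()
    # The gradient of this step is computed with weights that do NOT yet
    # include the previous step's update -- the 1-step staleness.
    loss, grad = loss_and_grad(w, x, y)
    # The previous step's update is applied "in parallel"; in a real pipeline
    # this overlaps with the forward/backward of the next micro-batches.
    if prev_grad is not None:
        w -= lr * prev_grad
    prev_grad = grad
```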
However, neither Megatron-LM [3] nor DeepSpeed [4] uses the PipeDream-2BW schedule. Could anyone share some insights or ideas about why such an efficient scheduling scheme hasn't become popular in the LLM pretraining community? Does it suffer from convergence/accuracy issues in practice? Or are there other concerns blocking it from becoming the default / most popular pipeline-parallelism schedule?
Optimizing distributed sampling and feature lookup looks really attractive. It's really challenging to deploy GNN training at industrial scale on a large graph.
Will GLT be part of GraphScope [1] and replace the current graphscope-for-learning implementation?
On March 30th, the Transaction Processing Performance Council (TPC) released its latest rankings: Tencent Cloud's TDSQL set a new record with a performance of 814 million tpmC and a cost of 1.27 RMB per tpmC. This achievement placed TDSQL in the top spot worldwide for both tpmC performance and cost-effectiveness.
More language bindings, e.g., Python and Rust, will be added later. Don't hesitate to open issues for feature requests if you are struggling with handling various kinds of I/O and their dependency-hell troubles and find this library might be helpful in your application!
Allowing "zero changes" state for a merge request does make sense, as during the development workflow developers may doing some rebase & force push work and they may forget commit changes after reset to main and before push to the pull requests.
Closing the "zero changes" merge request have confused in my previous experience when I continue to push re-added commits to the original pull request branch.
From Git's perspective, the two "forks" share the same HEAD commit, i.e., they are merged. That is what a merge does. A branch is a name that exists independently of the commit and points to it, so the commit history is not actually a property of the branch, despite what Git UIs tell you.
Thus, in the underlying data structure behind the GitHub interface, there really isn't an "event" here to identify. The PR branch points to the same commit as the base branch, so both branches are, in effect, "merged" with one another.
So GitHub would have to track changes to the PR branch that result in this state separately from the existing "merge from GUI" and "push PR branch to master" changes, which I could imagine is fraught with edge cases that could result in what you consider a merge event ending up as a closed event.
We recently noticed some strange email notifications from GitHub with the content "Merged #948 into main.", which claimed that pull requests on our repo had been merged by a stranger who doesn't actually have write permission to our repo!
After further inspection, we found that the merge event is triggered by the creator of the pull request pushing current main commits to the PR's "from" branch.
Moreover, when pushing current main to a pull request,
- The pull request is displayed as "Merged"
- A "PR merged into main" email is sent to all subscribers (mainly the repo owners)
- A "PR merged" contribution is displayed on the creator's Github profile
Closing dangling pull requests is quite a reasonable design, but marking them as "Merged" rather than "Closed" would confuse people into thinking they have been hacked at first glance (note that there is even an email notification "Merged #xxxx into main" sent to the repo's owners).
If such a feature is misused, it may lead to chaos in more open-source repos in the future, especially the famous ones.
This does strike me as kind of weird but in the same way it's weird that github lets you author commits with other people's email address and it shows up with a reference to their github account. A lot of github has very strange UI edge cases which come from the fact their features are a leaky abstraction over core git operations. Since the PR diff shows no changes were made and the merged commit is one that's authored by an actual contributor, I'm not sure it's as much a security vulnerability as a curiosity.
Git and Github both allow you to put whatever email you want. If you care about being certain who is committing to your repo, you should ignore the email and only look at the commit signature.
If you're talking about the issue OP is discussing, it should still be possible even if it's a signed commit. 61f3741 is a signed commit in the linked PR.
This just re-uses existing commits on the repository. The commits can be signed and github will still show "merged by X" if neither X nor the author of the signed commit merged the PR.
So really it's "if you care about being certain who is committing to your repo, you should ignore who github says is committing to your repo", which, to my earlier point, is technically understandable when you dig into it but nonetheless a little weird from a UX perspective.
If you're talking about forging the commit author, that's also weird. It makes sense in the decentralized context of git, but not in how most people use github. Nobody is saying that it isn't allowed, but the fact that github allows it is really an artifact of the fact that git allows it. In the github web app, your account is email verified, so it's weird that someone can generate commits which (in the UI) link to your email verified github account that were not actually created by you. Most people don't expect webapps to work this way, even if git might. It'd be similarly weird if facebook allowed people to create posts on your behalf and we told users "oh that's not weird, you should really verify the GPG key of your posts".
GitHub ties "things" to a commit hash - which causes all sorts of interesting/unwanted behaviors.
* create an action that only runs on branches with a specific pattern
* create another action that only runs on branches with a different pattern
* push a commit to a branch with the first pattern, create a PR
* create a new branch with the second pattern
* push the same commit on the second branch
* watch things get confused on the PR that was originally created
I had seen similar things, but slightly different: a user forked our repo (with open PRs), then merged one of the existing PRs from their fork into their fork. GitHub "helpfully" sent mail to the original PR author about it, with the link pointing to the fork (so it shows up (correctly) as _merged_ — in the fork).
> 1) Is it a real problem that allows anyone to push into any repository?
It is not a real problem that allows anyone to push into any repo, but it is a real problem that shows an incorrect/unexpected "Merged" pull request status on any repository.
> 2) Is it just a very confusing message?
From the links pasted in the comments you can see that GitHub shows the unauthorized user "merged" the pull request into main, and the repo's owner received an email that says:
FROM: XXXXX
Content: Merged #xxx into main.
It is exactly the SAME as the email notification for a normal, authorized merge event.
Basically, aside from GitHub's merge button (which does magic inaccessible to mere mortals [0]), the signal GitHub looks for to know whether a PR is merged is whether the PR's head commit is in the target branch.
So if you reset the PR's branch to the target (or any of its commits), as far as GitHub is concerned it's as if the PR had been merged.
[0] The ability to close PRs as merged was requested 3 years ago on the old Discourse forum, which was deleted when GitHub deployed the new community thingie; the request was reposted on the new site: https://github.com/community/community/discussions/12437
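That ancestry test can be checked locally with `git merge-base --is-ancestor`; this is only a guess at an equivalent check, not GitHub's actual implementation, and the ref names below are placeholders:

```python
import subprocess

def pr_looks_merged(pr_head: str, target: str = "origin/main") -> bool:
    """Return True if `pr_head` is already reachable from `target` -- the
    condition under which GitHub appears to flag a PR as merged."""
    # `git merge-base --is-ancestor A B` exits 0 iff A is an ancestor of B.
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", pr_head, target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

# Example with placeholder ref names:
# pr_looks_merged("origin/my-feature", "origin/main")
```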
+1 for the feature of marking as merged; in many projects, e.g., Apache Arrow, the pull requests are merged in a different way and all pull requests are shown as "closed" rather than "merged".
It would cause some confusion for project management, I guess.
Only if you get something actually accepted on a big OSS project can you make it look like someone else did it. You can't just magically get stuff committed to Linux or Chrome or whatever, that project would need to accept the PR. You could make people look bad/weird by making and accepting PRs for your own project though. But I think most people seem aware of this nature of unsigned git commits having any email address that the author wants
This issue is specifically that you can make GitHub think you merged a PR to a project for which you do not have merge permission. Your PR would be empty but GitHub would show that you were a contributor (potentially anyway, I don't know whether it's been verified that it gives you contributor status).
EDIT: Since bifenglin didn't get marked as a contributor in the second example from GP above I wonder if it's not actually possible. It could require a commit physically in the repo with an email linked to your account.
You don't need to get accepted to "merge" a PR. When you push the main branch to your PR, the PR gets closed as "Merged" and a "Merged" contribution badge shows up in your GitHub profile.
GraphScope is a one-stop graph computing system from Alibaba aimed at addressing the challenges of large-scale graph computation in real production environments. GraphScope v0.9 has been released, enabling data scientists to interactively develop graph computing workflows for analytical, interactive query, and GNN workloads on small graphs in Jupyter notebooks. Once development and debugging are finished, users can easily deploy their workflows to Kubernetes with a one-line change!
To try GraphScope, you can find it on Colab [1] or JupyterHub [2], or install GraphScope into your environment with pip:
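```bash
pip install graphscope
```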
Vineyard, an open-source immutable in-memory data manager designed to optimize data exchange between tasks inside a big-data analytical workflow, now supports Airflow and can serve as the data backend for efficient data sharing between tasks.
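As a rough sketch of how this looks from the user's side: tasks still exchange data through ordinary Airflow XComs, and vineyard is plugged in as a custom XCom backend so that large objects travel through shared memory instead of the metadata database. The backend class path in the comment is recalled from memory and should be checked against the vineyard documentation; the DAG and task names are made up.

```python
import pandas as pd
import pendulum
from airflow.decorators import dag, task

# With vineyard configured as Airflow's custom XCom backend, roughly
#   AIRFLOW__CORE__XCOM_BACKEND=vineyard.contrib.airflow.xcom.VineyardXCom
# (class path from memory -- check the vineyard docs), the DataFrame returned
# below is shared via vineyard instead of being pickled into the metadata DB.

@dag(schedule_interval=None, start_date=pendulum.datetime(2022, 1, 1), catchup=False)
def share_dataframe():
    @task
    def produce() -> pd.DataFrame:
        # The return value goes through whatever XCom backend is configured.
        return pd.DataFrame({"a": range(1000), "b": range(1000)})

    @task
    def consume(df: pd.DataFrame) -> None:
        print(df.describe())

    consume(produce())

share_dataframe()
```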