We have unveiled a new webpage that tells the story of GraphScope, charting its course from inception to its future prospects in the field of graph computing. The page explores how Alibaba Cloud has developed GraphScope from its early stages, overcoming challenges and achieving significant milestones to become a crucial tool in graph data processing. This journey highlights not only technological advancements but also practical lessons in scaling, performance, and adaptability.
The PipeDream-2BW paper [1] and the ZeRO-Offload paper [2] both show that a 1-step delayed asynchronous gradient update doesn't affect convergence (or perplexity) while improving training efficiency by a large margin (by fully utilizing the bubbles in pipeline parallelism).
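For intuition, here is a toy, framework-agnostic sketch of what a 1-step delayed update means: the gradient of the current step is computed against weights that do not yet include the previous step's update, and that previous update is applied concurrently (serialized here only for clarity). The model, data, and learning rate are made up for illustration.

```python
import numpy as np

# Toy sketch of a 1-step delayed ("stale") weight update, in the spirit of
# PipeDream-2BW's double-buffered weights and ZeRO-Offload's delayed update.
rng = np.random.default_rng(0)
w = rng.normal(size=4)   # model weights (a linear model, for illustration)
prev_grad = None         # gradient from the previous step, not yet applied
lr = 0.1

def loss_and_grad(weights, x, y):
    err = weights @ x - y             # 0.5 * (w.x - y)^2 least-squares loss
    return 0.5 * err ** 2, err * x

for step in range(10):
    x, y = rng.normal(size=4), rng.normal()
    # The gradient of this step is computed with weights that do NOT yet
    # include the previous step's update -- the 1-step staleness.
    loss, grad = loss_and_grad(w, x, y)
    # The previous step's update is applied "in parallel"; in a real pipeline
    # this overlaps with the forward/backward of the next micro-batches.
    if prev_grad is not None:
        w -= lr * prev_grad
    prev_grad = grad
```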
However, neither Megatron-LM [3] nor DeepSpeed [4] uses the PipeDream-2BW schedule. Could anyone share some insights or ideas about why such an efficient scheduling scheme hasn't become popular in the LLM pretraining community? Does it suffer from convergence/accuracy issues in practice? Or are there other concerns blocking it from becoming the default / most popular pipeline-parallelism schedule?
Optimizing distributed sampling and feature lookup looks really attractive. It's really challenging to deploy GNN training at industrial scale on a large graph.
Will GLT be part of GraphScope [1] and replace the current graphscope-for-learning implementation?
On March 30th, the Transaction Processing Performance Council (TPC) released its latest rankings: Tencent Cloud's TDSQL set a new record with a performance of 814 million tpmC and a cost of 1.27 RMB per tpmC. This achievement placed TDSQL in the top spot worldwide for both tpmC performance and cost-effectiveness.
More language bindings, e.g., Python and Rust, will be added later. Don't hesitate to open issues for feature requests if you are struggling with handling various kinds of I/O and their dependency-hell troubles and find this library might be helpful in your application!
Allowing "zero changes" state for a merge request does make sense, as during the development workflow developers may doing some rebase & force push work and they may forget commit changes after reset to main and before push to the pull requests.
Closing the "zero changes" merge request have confused in my previous experience when I continue to push re-added commits to the original pull request branch.
From Git's perspective, the two "forks" share the same HEAD commit, i.e., they are merged. That is what a merge does. A branch is a name that exists independently of the commit and points to it, so the commit history is not actually a property of the branch, despite what Git UIs tell you.
Thus, in the underlying data structure behind the GitHub interface, there really isn't an "event" here to identify. The PR branch points to the same commit as the base branch, so both branches are, in effect, "merged" with one another.
So GitHub would have to track changes to the PR branch that result in this state separately from the existing "merge from GUI" and "push PR branch to master" changes, which I could imagine is fraught with edge cases that could result in what you consider a merge event ending up as a closed event.
We recently noticed some strange email notifications from GitHub with the content "Merged #948 into main.", which claimed that pull requests on our repo had been merged by a stranger who doesn't actually have write permission to our repo!
After further inspection, we found that the merge event is triggered by the creator of the pull request pushing current main commits to the PR's "from" branch.
Moreover, when pushing current main to a pull request,
- The pull request is displayed as "Merged"
- A "PR merged into main" email is sent to all subscribers (mainly the repo owners)
- A "PR merged" contribution is displayed on the creator's Github profile
Closing dangling pull requests is quite a reasonable design, but marking them as "Merged" rather than "Closed" would confuse people into thinking they have been hacked at first glance (note that there is even an email notification "Merged #xxxx into main" sent to the repo's owners).
If such a feature is misused, it may lead to chaos in more open-source repos in the future, especially the famous ones.
This does strike me as kind of weird but in the same way it's weird that github lets you author commits with other people's email address and it shows up with a reference to their github account. A lot of github has very strange UI edge cases which come from the fact their features are a leaky abstraction over core git operations. Since the PR diff shows no changes were made and the merged commit is one that's authored by an actual contributor, I'm not sure it's as much a security vulnerability as a curiosity.
Git and Github both allow you to put whatever email you want. If you care about being certain who is committing to your repo, you should ignore the email and only look at the commit signature.
If you're talking about the issue OP is discussing, it should still be possible even if it's a signed commit. 61f3741 is a signed commit in the linked PR.
This just re-uses existing commits on the repository. The commits can be signed and github will still show "merged by X" if neither X nor the author of the signed commit merged the PR.
So really it's "if you care about being certain who is committing to your repo, you should ignore who github says is committing to your repo", which, to my earlier point, is technically understandable when you dig into it but nonetheless a little weird from a UX perspective.
If you're talking about forging the commit author, that's also weird. It makes sense in the decentralized context of git, but not in how most people use github. Nobody is saying that it isn't allowed, but the fact that github allows it is really an artifact of the fact that git allows it. In the github web app, your account is email verified, so it's weird that someone can generate commits which (in the UI) link to your email verified github account that were not actually created by you. Most people don't expect webapps to work this way, even if git might. It'd be similarly weird if facebook allowed people to create posts on your behalf and we told users "oh that's not weird, you should really verify the GPG key of your posts".
GitHub ties "things" to a commit hash - which causes all sorts of interesting/unwanted behaviors.
* create an action that only runs on branches with a specific pattern
* create another action that only runs on branches with a different pattern
* push a commit to a branch with the first pattern, create a PR
* create a new branch with the second pattern
* push the same commit on the second branch
* watch things get confused on the PR that was originally created
I had seen similar things, but slightly different: a user forked our repo (with open PRs), then merged one of the existing PRs from their fork into their fork. GitHub "helpfully" sent mail to the original PR author about it, with the link pointing to the fork (so it shows up (correctly) as _merged_ — in the fork).
> 1) Is it a real problem that allows anyone to push into any repository?
It is not a real problem that allows anyone to push into any repo, but it is a real problem that shows an incorrect/unexpected "Merged" pull request status on any repository.
> 2) Is it just a very confusing message?
From the links pasted in the comments you can see that GitHub shows the unauthorized user "merged" the pull request into main, and the repo's owner received an email that says:
FROM: XXXXX
Content: Merged #xxx into main.
It is exactly the SAME as the email notification for a normal, authorized merge event.
Basically, aside from GitHub's merge button (which does magic inaccessible to mere mortals [0]), the signal GitHub looks for to know whether a PR is merged is whether the PR's head commit is in the target branch.
So if you reset the PR's branch to the target (or any of its commits), as far as GitHub is concerned it's as if the PR had been merged.
[0] The ability to close PRs as merged was requested 3 years ago on the old Discourse forum, which was deleted when GitHub deployed the new community thingie; the request was reposted on the new site: https://github.com/community/community/discussions/12437
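That ancestry test can be checked locally with `git merge-base --is-ancestor`; this is only a guess at an equivalent check, not GitHub's actual implementation, and the ref names below are placeholders:

```python
import subprocess

def pr_looks_merged(pr_head: str, target: str = "origin/main") -> bool:
    """Return True if `pr_head` is already reachable from `target` -- the
    condition under which GitHub appears to flag a PR as merged."""
    # `git merge-base --is-ancestor A B` exits 0 iff A is an ancestor of B.
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", pr_head, target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

# Example with placeholder ref names:
# pr_looks_merged("origin/my-feature", "origin/main")
```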
+1 for the feature of marking as merged; in many projects, e.g., Apache Arrow, the pull requests are merged in a different way and all pull requests are shown as "closed" rather than "merged".
It would cause some confusion for project management, I guess.
Only if you get something actually accepted on a big OSS project can you make it look like someone else did it. You can't just magically get stuff committed to Linux or Chrome or whatever, that project would need to accept the PR. You could make people look bad/weird by making and accepting PRs for your own project though. But I think most people seem aware of this nature of unsigned git commits having any email address that the author wants
This issue is specifically that you can make GitHub think you merged a PR to a project for which you do not have merge permission. Your PR would be empty but GitHub would show that you were a contributor (potentially anyway, I don't know whether it's been verified that it gives you contributor status).
EDIT: Since bifenglin didn't get marked as a contributor in the second example from GP above I wonder if it's not actually possible. It could require a commit physically in the repo with an email linked to your account.
You don't need to get accepted to "merge" a PR. When you push the main branch to your PR, the PR gets closed as "Merged" and a "Merged" contribution badge shows up in your GitHub profile.
GraphScope is a one-stop graph computing system from Alibaba aimed at addressing the challenges of large-scale graph computation in real production environments. GraphScope v0.9 has been released, enabling data scientists to interactively develop graph computing workflows for analytical, interactive query, and GNN workloads on small graphs in Jupyter notebooks. Once development and debugging are finished, users can easily deploy their workflows to Kubernetes with a one-line change!
To try GraphScope, you can find it on Colab [1] or JupyterHub [2], or install GraphScope into your environment with pip:
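```bash
pip install graphscope
```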
Vineyard, an open-source immutable in-memory data manager designed to optimize data exchange between tasks inside a big-data analytical workflow, now supports Airflow and can serve as the data backend for efficient data sharing between tasks.
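As a rough sketch of how this looks from the user's side: tasks still exchange data through ordinary Airflow XComs, and vineyard is plugged in as a custom XCom backend so that large objects travel through shared memory instead of the metadata database. The backend class path in the comment is recalled from memory and should be checked against the vineyard documentation; the DAG and task names are made up.

```python
import pandas as pd
import pendulum
from airflow.decorators import dag, task

# With vineyard configured as Airflow's custom XCom backend, roughly
#   AIRFLOW__CORE__XCOM_BACKEND=vineyard.contrib.airflow.xcom.VineyardXCom
# (class path from memory -- check the vineyard docs), the DataFrame returned
# below is shared via vineyard instead of being pickled into the metadata DB.

@dag(schedule_interval=None, start_date=pendulum.datetime(2022, 1, 1), catchup=False)
def share_dataframe():
    @task
    def produce() -> pd.DataFrame:
        # The return value goes through whatever XCom backend is configured.
        return pd.DataFrame({"a": range(1000), "b": range(1000)})

    @task
    def consume(df: pd.DataFrame) -> None:
        print(df.describe())

    consume(produce())

share_dataframe()
```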