GitHub's source code leaked on GitHub last night … sort of
The source code leak quickly disappeared from GitHub itself – and didn't stay on web.archive.org long after that.
Yesterday evening, developer and privacy activist Resynth1943 announced that GitHub's source code had been leaked on GitHub itself, in GitHub's own DMCA repository. It will take some unpacking to explain, but first things first – this isn't as big a deal as it may seem.
GitHub Enterprise Server != GitHub.com
Shortly after Resynth1943 – who apparently broke the news and described the code as "just leaked" by an unknown person – re-shared the announcement on Hacker News, GitHub CEO Nat Friedman appeared on HN to provide context.
According to Friedman, the upload in question was actually GitHub Enterprise Server, not the GitHub website itself. While the two share a considerable amount of code, the distinction matters. Part of what it means is that GitHub itself wasn't actually hacked.
While neither GitHub nor GitHub Enterprise Server is open source, GitHub Enterprise Server source code is routinely delivered to customers, usually in a stripped-down and obfuscated format. According to Friedman, a few months ago GitHub accidentally delivered a full, de-obfuscated tarball of GHES to some customers. This is the code that was copied into GitHub's public DMCA repository.
Grinding a DMCA-related axe
It is likely that the "unknown person" referred to by Resynth1943 uploaded the leaked source code largely out of anger at the recent takedown of youtube-dl.
The code itself was copied to GitHub's DMCA repository, which serves as a public record of the DMCA takedown requests GitHub receives, similar to the Chilling Effects notices you may have seen in Google search results over the years.
Inspired by Lumen (formerly Chilling Effects) and Google, this repo contains the text of DMCA takedown notices and counter notices we have received here at GitHub. We publish them as they are received, with only personally identifiable information redacted.
The Resynth1943 announcement also criticizes Microsoft as hypocritical for not deliberately open-sourcing GitHub, and suggests that GitHub may be less secure now that the code has leaked.
How do I shoot fake commit?
The commit itself was labeled as having been made by user Nat – that is, Nat Friedman, the current CEO of GitHub. Like the contents of the commit, this is misleading: Git itself, the version control system on which GitHub is based, offers little protection against user impersonation. The commit in question was not marked "verified," which means it was not signed with Friedman's GPG key.
With Git commits, similar to e-mail messages, users can enter any information in the user.name and user.email fields. This makes spoofing this information trivial. Unless the commit is actually signed with a GPG key associated with that email address, there is no real confirmation that it originated from where it is specified.
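To make the point concrete, here is a minimal Python sketch of how a Git commit object is serialized and hashed. The names, email addresses, and timestamp are hypothetical placeholders; the tree ID is Git's well-known hash for an empty tree. The sketch only illustrates that the author line is free-form text covered by the hash, not checked against anything.

```python
import hashlib

# Git's well-known ID for the empty tree; used so the example needs no files.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def build_commit(tree, name, email, message, timestamp=1604534400, tz="+0000"):
    """Serialize an unsigned commit object the way Git stores it.

    name and email are whatever the committer typed into user.name and
    user.email; nothing in this format ties them to a real account.
    """
    ident = f"{name} <{email}> {timestamp} {tz}"
    body = (
        f"tree {tree}\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        f"\n"
        f"{message}\n"
    ).encode("utf-8")
    header = f"commit {len(body)}\0".encode("ascii")
    return header + body


def commit_id(raw):
    """A commit's ID is the SHA-1 of its serialized bytes."""
    return hashlib.sha1(raw).hexdigest()


# Two commits with identical content but different claimed authors
# (both identities are made up for this example).
honest = build_commit(EMPTY_TREE, "Alice Example", "alice@example.com", "demo")
forged = build_commit(EMPTY_TREE, "Someone Famous", "famous@example.com", "demo")

print(commit_id(honest))
print(commit_id(forged))
```

Both objects are equally valid to Git. The claimed identity changes the hash, but only a signature made with a key tied to that email address says anything about who actually wrote the commit.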
This leaves the question of how a random user's commit showed up in GitHub's DMCA repository in the first place – but the answer there doesn't involve any actual account compromise either.
Pushing a commit to a Git repository gives you a hash that identifies that commit and can be used to look up its tree. GitHub – part of which is the web application that exposes this underlying Git structure in the browser – stores all of the forks of a Git repository in a single underlying repository, although the URL structure generally doesn't show it.
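The consequence of that storage layout can be shown with a toy model. This is a sketch under stated assumptions, not GitHub's actual implementation: the `ForkNetwork` class, its methods, and the hash value are all hypothetical. It assumes one object store shared by a parent and its forks, with separate branch pointers per repository.

```python
class ForkNetwork:
    """Toy model: a parent repo and its forks share one object store."""

    def __init__(self):
        self.objects = {}  # commit hash -> commit data, shared by every fork
        self.refs = {}     # repo name -> {branch name -> commit hash}

    def add_repo(self, name):
        self.refs.setdefault(name, {})

    def push(self, repo, branch, commit_hash, data):
        # The object lands in the shared store; only this repo's branch moves.
        self.objects[commit_hash] = data
        self.refs[repo][branch] = commit_hash

    def lookup(self, repo, commit_hash):
        # Lookup by hash never asks which repo's branches reach the object.
        if repo not in self.refs:
            raise KeyError(repo)
        return self.objects.get(commit_hash)


net = ForkNetwork()
net.add_repo("github/dmca")
net.add_repo("attacker/dmca")  # hypothetical fork name

# The attacker pushes only to their own fork (hash is a placeholder).
net.push("attacker/dmca", "main", "abc123", "commit with forged author")

# No branch of the parent points at the commit...
print("abc123" in net.refs["github/dmca"].values())
# ...yet asking for the hash through the parent still finds it.
print(net.lookup("github/dmca", "abc123"))
```

In this model, the parent repository's branches are untouched, but any lookup by hash succeeds regardless of which repository name it is requested through.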
Use the forks, Luke
To create the illusion that GitHub CEO Nat Friedman committed to the GitHub DMCA repo, the unknown person first had to fork the DMCA repository. After forking the repository – making a copy they had permission to commit to – the next step was to commit the leaked source and forge Friedman's name and email address in user.name and user.email.
This would result in a forked repository containing the bogus commit. But it still wouldn't have looked quite right – the URL would still reveal both that it was a fork and the attacker's real GitHub username and account. Under the hood, however, both parent and fork are part of the same repository at the underlying Git level. This enabled the attacker to create a URL that made the commit appear to be in the main repository rather than the fork.
To complete the deception, the attacker started with https://github.com/github/dmca and then added /tree/$hash to the end, where $hash was the hash of the commit made on their own fork – and presto! The result was a URL that appeared to show a commit by CEO Nat Friedman to GitHub's own DMCA repository.
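The URL construction described above amounts to simple string assembly. The following Python sketch shows it; the function name is made up for illustration and the all-zero hash is a placeholder, not the real commit.

```python
import re


def tree_url(parent_repo_url, commit_hash):
    """Append /tree/<hash> to a repository URL.

    commit_hash is expected to be a hex Git object ID (full or abbreviated).
    """
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit_hash):
        raise ValueError("expected a hex commit hash")
    return f"{parent_repo_url.rstrip('/')}/tree/{commit_hash}"


# Placeholder hash standing in for the commit pushed to the attacker's fork.
fork_commit = "0" * 40
print(tree_url("https://github.com/github/dmca", fork_commit))
```

Nothing in the resulting URL mentions the fork or the account that pushed the commit, which is what made the link look like it belonged to the parent repository.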
GitHub wasn't "hacked" – but there is plenty of room for improvement
On the plus side, there is no real compromise here. The source code, although accidentally handed out to customers, was not exfiltrated from a compromised server. Likewise, Friedman didn't lose control of his own account, and GitHub didn't lose control of its DMCA repository. In Friedman's own rather whimsical words on Hacker News: "Everything is fine, the situation is normal, the lark is on the wing, the snail is on the thorn and everything is fine with the world."
While all of the tricks documented here are within expectations – if you want to verify your identity you should sign your commits with a GPG key – those expectations themselves may be much lower than they should be. Managing GPG is still onerous enough to present a significant barrier to entry for many developers. More importantly, GitHub doesn't offer any controls to highlight the presence or absence of such signatures.
We saw many suggestions for tooltips such as "This user normally signs their commits and this commit is unsigned," where appropriate. We also believe that the time has come to address the issue that allows an attacker to spoof the repository they have committed to, using the fork and manual URL creation technique described above.
Finally, it is probably time to seriously discuss whether unsigned commits should even be the default. We live in a world where even simple web browsing is expected to be authenticated and encrypted – which makes this kind of casual spoofing all the more surprising and unsettling.