Tuesday, September 27, 2005

Analyst: The nature and size of the open source community

I'm doing an analysis of commonalities between successful open source projects (If you can point me to good research on the topic, please email me!!!), and came across an interesting academic paper (Requires purchase) related to the subject. Here are some of its findings (none of which will be surprising to those who have followed Joel West and Siobhan O'Mahony's work, but surprising if you still believe in the magical open source community, numbered in the millions, anxiously waiting to contribute code to your project):

  1. It is interesting to compare horizontal applications (applications used to build other software, the end user is required to program and is, likely, a software professional) with vertical ones (applications used by an end user, no programming is required). Horizontal applications (categories Internet, System, Software development, Communications, Database, Security) account for 72%. The researchers interpret this data as evidence that the OS [open source] community is largely oriented to produce applications for the same community.

    [No news here. The difference only comes when we add commercial open source to the mix. Once we bring in the commercial entities, the only thing bounding open source development is the success of the companies' business models.]

  2. Open source projects [at least, as housed on FreshMeat (the source for the researchers' data set), which tends to host newer and, hence, smaller projects] tend to be small (82% - suitable to one or two developers) and young. 60% of open source projects (as measured in February 2001 - admittedly, an ancient data set) had been in development less than a year, 22% from one to two years, 15% two to three years, and around 2% more than three years.

  3. The GPL license prevails, at 77% of projects. The LGPL is second at 6%, and BSD trails in third at 5%. Every other license accounts for between 1% and 3%.

  4. C is the most used programming language (41.5%), followed by C++ and Perl (~14% each), then PHP, Java, and Python (5% - 8%).

  5. 49% of projects have only one person developing the application; 15% have two to three developers; 20% have four to 10; 9% have 11 to 20; and 6% have more than 20. Clearly, this calls into question the ideal of "community" in open source. Last time I checked, even with my multiple personalities, I'm not a community.

  6. The researchers assumed that larger projects would have more developers. Wrong. "Instead we find that there is no meaningful increase of size with developers." Apparently, "[fewer] developers produce the same amount of code...." This isn't surprising - I take it as a given that a minority of people in any company/project/etc. will produce a majority of code/product/whatever. The interesting thing to note is that an open source project can be wildly successful without a massive community contributing code to it. The key is code quality and contributor productivity.

  7. Related to the above, 73% of projects have only one stable developer (defined as a developer with a "prolonged collaboration with the project"). Another 10% of projects have two stable developers. That leaves just 17% of projects that have more than two committed developers.

  8. Added to the above, the researchers found that 55% of projects have no transient developers at all ("Transient" defined as those providing at most one patch in the development of any section of a project or up to three patches to the same part of the code base). Of the remainder, 9% have one transient developer, 8% have two, and 20% have between two and 10.

  9. How does a project attain a larger status, such that it can sustain 10 or more developers? The researchers find that such conditions include "a defined and clear architecture and an adequately appealing function offered; both conditions require a meaningful size of code." This means that the initial developer(s) must be committed to see the project through its young, immature phase. But this isn't surprising, as the same principle holds true for religious movements, political uprisings (the United States is one example), and various other projects. Bluntly put, you need a fanatic or two (in the nicest sense of the word) at the beginning to blindly push forward against all odds. Open source software development appears to be no different.

  10. 80% of projects have fewer than 11 users (measured in terms of those who "subscribe" to a project - i.e., those who register and download a piece of code).

  11. 15% of projects are actively developed - the remainder (85%) wither and die on the vine or are, at best, "lethargically" developed. Over the six months measured, 90% of the projects on FreshMeat did not change.

Undoubtedly, there are problems with the data set. But regardless of how many holes one can point out in these researchers' work, it is well corroborated by other academic work on the subject. The myth of a global, expansive open source development community is just that: a myth. The reality is more like severe clumping of development around Linux, Apache, and very few other projects. (Even JBoss and MySQL, as I've written before, are overwhelmingly developed by those respective companies, and not by a crowd of outside developers. 95% and 85%, respectively, I believe.)
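
As a rough sanity check, the percentages quoted in points 5 and 7 can be tallied in a few lines of Python. This is only a sketch: the bucket labels and variable names are illustrative and do not come from the paper, and the figures are simply the ones quoted above.

```python
# Share of projects (in %) by number of developers, as quoted in point 5.
# The bucket labels are illustrative, not the paper's own.
developers_per_project = {
    "1": 49,
    "2-3": 15,
    "4-10": 20,
    "11-20": 9,
    "21+": 6,
}

# Share of projects (in %) by number of stable developers, as quoted in point 7.
one_stable = 73
two_stable = 10

# The quoted buckets do not quite add up to 100, presumably due to rounding.
total = sum(developers_per_project.values())

# Projects small enough that "community" means three people at most.
at_most_three = developers_per_project["1"] + developers_per_project["2-3"]

# Projects with more than 10 developers.
more_than_ten = developers_per_project["11-20"] + developers_per_project["21+"]

# Whatever is left after one- and two-stable-developer projects.
more_than_two_stable = 100 - one_stable - two_stable

print(f"developer buckets sum to {total}%")
print(f"projects with at most three developers: {at_most_three}%")
print(f"projects with more than 10 developers: {more_than_ten}%")
print(f"projects with more than two stable developers: {more_than_two_stable}%")
```

On the quoted numbers, roughly two projects in three have three developers or fewer, and the 17% in point 7 is just the remainder after the one- and two-stable-developer projects.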

Does this mean open source is a sham? Not at all. It is still a great way to engage prospective customers, incorporating them into one's development. And it's a great way to replicate Google's "perpetual beta" development methodology, which allows them to innovate and deliver code faster, because it artificially sets expectations low.

It's also a reminder that companies engaging in open source should not delude themselves into thinking that some amorphous community will do their work for them. There is no community to do this. Whether one is a company or an individual developer, the onus of code production is on you. The community only comes when the project initiator has done the grueling, constant work to make the project worthwhile.

In this way, open source really isn't so different from closed source software.

9 comments:

real matt said...

Hmm, doesn't seem too surprising. I remember Marten Mickos said that the ratio of users/bug reporters/patch submitters was something like 1000/10/1 (MYSQLUC 2005 Keynote).

The most surprising points to me were 3 and 4. I would think there is a lot more LGPL and BSD licensed stuff.

Is all of this data from 2001? And just from freshmeat?

Regarding point 5, I think the community involves a lot more than developers. From what Marten said, his community would be 1011 (users + bug reporters + patchers/devs, assuming MySQL only had 1000 users), whereas this report says the same community would be size 1 (only counting developers)....

Vinod Khare said...

Yeah, I believe the community should also include the end users who actually use the software. I mean, what's the use of software if it's not used? Not everyone can write the code. Not everyone can compose music. But the audience is as important as (perhaps a little bit less than) the composer!

Anonymous said...

This article is very typical of the FUDL (fear, uncertainty, doubt and lies) spread by folks with ulterior motives against OSS. It's plainly obvious that "open source" is not just a bunch of projects registered on freshmeat.net. There are tons of them outside of freshmeat.net and their number is growing. For example, http://sourceforge.net/softwaremap/trove_list.php alone lists more than 100,000 projects. So you can do the math, after including all the developers in these projects (including contributing users). Most important of all, it is about a philosophy of generating/sharing information that has already taken root at the grass-roots level in many parts of the world. The author of this article has done the lazy job of visiting a single website to regurgitate some outdated statistics and put out some half-baked crap on a blog. So just move on....

/mna said...

With all due respect, Anonymous, I've been involved with open source since 1998, and every piece of data I've ever seen on open source corroborates this study's findings. Linux and Apache are the exceptions to the rule (in terms of size of the project).

But I think you misunderstood me, anyway, immediately jumping to the conclusion that small projects must equal bad code. If so, that's an insecurity on your part, but not on mine (or on the community's). I think open source does just fine without mythologizing it into this multi-millions-strong community of developers who spend all day long altruistically giving away code. It's mostly not that, so why pretend? But what it is, and what it does exceptionally well, is turn out great code. Just not exactly the way we've sometimes pretended.

So, please re-read my comments and the study itself. I think you'll find friends of open source, not enemies, in both.

Matt

Anonymous said...

Matt, I think the percentages you found are fairly accurate. You're leaving out one crucial component though. Just look at the SF.net website and you can easily find the top projects. BitTorrent, which has been the top project for God only knows how many months/years, has only 15 developers. This is for a project that gets 5,000,000 web hits a day and nearly 200,000 downloads a day. Not a lot of people contributing code, but certainly many thousands contributing to the success of the project: reporting bugs, requesting features, donating money, giving credit to authors, etc.

I'm involved in my 3rd OS project. With the first one I just picked the wrong thing to produce; someone had already done it, and done it much better, so I gave up. The second project is much more successful and still going strong (http://jnetstream.sf.net). I have many releases of it, many downloads, some help, and even registered a couple of developers, of which only one contributed some build-system type setup, which was extremely valuable to me since I knew very little about that side of development at the time.

I think it's all about interest. If you pick the topic of your project wrong, you will not get any help. On the other hand, if you build something a lot of people are interested in, you can bet that help will come.

My 3rd project (http://jnetflow.sf.net) just got kicked off. There is already a huge user community out there interested in NetFlow tools. I've picked my mark. I evaluated all the OS and commercial competition with the project and their capabilities. I will build a much better thing than those guys can, and I will make it easy for users to install on their systems. Free of course, this is OS. I'm betting that my 3rd OS venture is going to be wildly successful in terms of downloads and help from the community.

So out of 3 projects, 2 successful, and other 1 developer. Yes, you do have to be crazy to get through the initial part of the project.

One last comment, and sorry, I know this is long, but this is important to consider. I don't really want other developers at this stage of my project. I really want to make my own design decisions using my 15 years of experience in the area. I like suggestions, and I don't mind bouncing ideas around. I certainly write about them in BLOG entries; it helps me think. But I don't really want anyone mucking around in there but me. This is probably another contributor to the low external developer count. Either the project succeeds, after which the groundwork has been laid out (my 2nd project), or you screw it up and the project dies as a statistic (my 1st project).

In my opinion the community involvement is definitely there; I see it every day. It's got to be the right mix that makes the project survive and elevate to the next level for other people to get involved.

Interesting article.....

Cheers,
voytechs - Mark Bednarczyk

Anonymous said...

Hi Matt,

Good article - do you have any pointers to more recent studies on Open Source "Nature and Size of Community" - the academic article you cited had 2001 datasets, and I was wondering if there was a good and more recent study (including your own, if you have one). My email address is: verap1@earthlink.net

Thanks in advance for any help - I am trying to get a handle on what the OS community really looks like today, and how it is consolidating and/or fragmenting.

Shubbert said...

@Anonymous
Wikinomics is a book that might be better for stats, though I am only just beginning to read it now.
--
thanks
http://www.cmyos.com

Anonymous said...

Does anyone even use Freshmeat?

Take a look at Ohloh. Those stats are much more interesting.

And what about KohanaPHP? That is a PHP framework project with over 7 active developers, most of whom have been there for over a year.

They are all volunteer and not paid for by a corporation.

Matt Asay said...

Ohloh wasn't around when this research was done, but it's a valid point (and I'll try to find time to update the research).

As for KohanaPHP, I haven't heard of it, but it proves my point about core contributions coming from a small, dedicated team. I was by no means making the point that every open-source project is staffed by corporations, though all of the big projects are.