Citizens for a Better uPortal Tomorrow

In which I articulate some code quality steps I'd like uPortal to take.


These are steps about being a better environment for development of the open source product, not steps about being a better open source product for adopters.

These are related, of course: the better a development environment uPortal provides for its open source product, the faster and better that product will develop to serve its adopters.

These are in rough order, with later agenda items benefiting from earlier items.

Start from Needs

This is what I'd like uPortal to enable, the purpose of these steps.

  • Development of plugins coded to uPortal APIs that work across a predictable span of uPortal product versions.
  • Better GitHub presence and more natural usage of the GitHub tooling for monitoring, reporting, and reviewing.
  • More supportive automated testing of changesets, enabling higher-value code review and leading to greater code quality.
  • Comprehensive unit test coverage to reduce the defect and regression rate.

The theme of these needs is higher code and tooling quality, enabling faster development with lower defect and regression rates.

The Steps

  • Semantic Versioning
  • Modularize the codebase
  • Separate API from implementation
  • uPortal becomes its own GitHub organization
  • The separate .jars become separate GitHub repos
  • Adopt Google Style
  • Increase unit test coverage

Semantic Versioning

Adopt Semantic Versioning.

Bring how much change lands in which kind of release (MAJOR, MINOR, PATCH) into alignment with other projects that use Semantic Versioning.
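For concreteness, here is a minimal Java sketch of the Semantic Versioning 2.0.0 bump rules as they would apply to a uPortal module: an incompatible API change bumps MAJOR, a backward-compatible feature bumps MINOR, a backward-compatible bug fix bumps PATCH. The `SemVer` and `Change` names are hypothetical (nothing in uPortal today), and the sketch ignores pre-release and build metadata.

```java
/** Kinds of change a release can contain, per Semantic Versioning 2.0.0. */
enum Change { MAJOR, MINOR, PATCH }

/** Hypothetical MAJOR.MINOR.PATCH version (pre-release and build metadata omitted). */
record SemVer(int major, int minor, int patch) {

    static SemVer parse(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected MAJOR.MINOR.PATCH but got " + text);
        }
        return new SemVer(Integer.parseInt(parts[0]),
                Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]));
    }

    /** The next version number for a release containing the given kind of change. */
    SemVer bump(Change change) {
        switch (change) {
            case MAJOR:
                return new SemVer(major + 1, 0, 0);         // incompatible API change
            case MINOR:
                return new SemVer(major, minor + 1, 0);     // backward-compatible feature
            default:
                return new SemVer(major, minor, patch + 1); // backward-compatible fix
        }
    }

    /** True if code built against this version should keep working against {@code newer}. */
    boolean isCompatibleUpgradeTo(SemVer newer) {
        if (newer.equals(this)) {
            return true;
        }
        if (major == 0 || newer.major != major) {
            return false; // 0.y.z promises nothing; a new MAJOR signals a breaking change
        }
        return newer.minor > minor || (newer.minor == minor && newer.patch > patch);
    }

    @Override
    public String toString() {
        return major + "." + minor + "." + patch;
    }
}

class SemVerDemo {
    public static void main(String[] args) {
        SemVer current = SemVer.parse("4.3.1");
        for (Change change : Change.values()) {
            SemVer next = current.bump(change);
            System.out.println(change + " -> " + next
                    + (current.isCompatibleUpgradeTo(next) ? " (drop-in)" : " (breaking)"));
        }
    }
}
```

A consumer built against 4.3.1 can take 4.3.2 or 4.4.0 as a drop-in upgrade; 5.0.0 tells it to expect to change code.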

Modularize the codebase

More .jars of cohesive units of Java code.

With Semantic Versioning in place to coordinate the amount of change across modules, factor the monolithic uportal-war Java codebase into smaller, cohesive chunks that build to .jars published in Maven Central.

This makes it more feasible to be deliberate about what parts of the uPortal open source product one pulls into a local uPortal implementation. My uPortal instance uses that jar and this jar and that other jar, but not the yet-another-thing.jar that is irrelevant to me. Local uPortal exposure to change and to defects tightens to just the necessary exposure to achieve locally desired functionality.
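That "necessary exposure" is just the transitive closure of the modules a deployment selects. A small Java sketch, with hypothetical module names and dependency edges (not uPortal's actual module layout):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Computes which modules a local uPortal deployment is actually exposed to. */
final class Exposure {

    /**
     * Every module reachable from the locally selected modules by following
     * declared dependencies (a transitive closure, returned in sorted order).
     */
    static Set<String> exposedModules(Map<String, List<String>> dependsOn, List<String> selected) {
        Set<String> exposed = new TreeSet<>();
        Deque<String> toVisit = new ArrayDeque<>(selected);
        while (!toVisit.isEmpty()) {
            String module = toVisit.pop();
            if (exposed.add(module)) { // first visit: queue its own dependencies
                toVisit.addAll(dependsOn.getOrDefault(module, List.of()));
            }
        }
        return exposed;
    }

    public static void main(String[] args) {
        // Hypothetical module names and dependency edges, for illustration only.
        Map<String, List<String>> dependsOn = Map.of(
                "uportal-layout-impl", List.of("uportal-layout-api", "uportal-core-api"),
                "uportal-layout-api", List.of("uportal-core-api"),
                "uportal-groups-grouper-impl", List.of("uportal-groups-api"),
                "uportal-groups-api", List.of("uportal-core-api"),
                "uportal-core-api", List.of());
        // prints [uportal-core-api, uportal-layout-api, uportal-layout-impl]
        System.out.println(exposedModules(dependsOn, List.of("uportal-layout-impl")));
    }
}
```

A deployment that selects only the layout implementation never picks up the Grouper groups module, so defects and changes there cannot reach it.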

Separate API from implementation

Some of the .jars of cohesive units of Java code are "API" whereas others are "implementation".

More deliberate factoring of what constitutes "API" that must be semantically versioned and what constitutes "implementation detail" that can vary more aggressively.

This separation makes Semantic Versioning more palatable.

Exposing API .jars also makes it much more pleasant to code plugins for the uPortal platform. If there's an API .jar for the groups API, then I can code a new group store against that API, it can be a Maven project that depends upon that API, my new group store can itself publish to Maven Central, and adopters can adopt it declaratively by depending upon it.
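A sketch of what that split could look like, collapsed into one file for illustration: the interface would live in an API .jar, the implementation in a separately published plugin .jar, and portal code would see only the interface. `GroupStore`, `MapBackedGroupStore`, and `MembershipCheck` are hypothetical names, not uPortal's actual groups API.

```java
import java.util.Map;
import java.util.Set;

// Would live in a hypothetical uportal-groups-api.jar: the semantically versioned contract.
interface GroupStore {
    /** Keys of the groups that directly contain the given member. */
    Set<String> groupsContaining(String memberKey);
}

// Would live in a separately published plugin .jar that depends only on the API .jar.
final class MapBackedGroupStore implements GroupStore {
    private final Map<String, Set<String>> groupsByMember;

    MapBackedGroupStore(Map<String, Set<String>> groupsByMember) {
        this.groupsByMember = Map.copyOf(groupsByMember); // defensive, immutable copy
    }

    @Override
    public Set<String> groupsContaining(String memberKey) {
        return groupsByMember.getOrDefault(memberKey, Set.of());
    }
}

// Portal-side code compiled against the API alone; any GroupStore can be swapped in.
final class MembershipCheck {
    private final GroupStore store;

    MembershipCheck(GroupStore store) {
        this.store = store;
    }

    boolean isDirectMember(String memberKey, String groupKey) {
        return store.groupsContaining(memberKey).contains(groupKey);
    }

    public static void main(String[] args) {
        GroupStore store = new MapBackedGroupStore(Map.of("alice", Set.of("staff", "faculty")));
        MembershipCheck check = new MembershipCheck(store);
        System.out.println(check.isDirectMember("alice", "staff")); // prints true
        System.out.println(check.isDirectMember("bob", "staff"));   // prints false
    }
}
```

Swapping in, say, a Grouper-backed store would change only which plugin .jar the deployment depends on; `MembershipCheck` stays untouched.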

This creates a better path for experimentation, incubation, surfacing new implementations, and sharing code, one that doesn't require slamming all the source code of everything into one monolithic source tree.

uPortal becomes its own GitHub organization

Instead of everything piled into the Jasig GitHub organization alongside CAS stuff and other stuff, uPortal grabs its own namespace.

It becomes easier to understand what code is "part of the uPortal project", for which the uPortal committers bear responsibility and ownership, and what is other stuff.

This new organization is Apereo-branded, no doubt. uPortal doesn't become its own Foundation, heaven forfend. No. This is just a matter of clean organization of code repositories in GitHub.

The separate .jars become separate GitHub repos

This one is going to be a tough pill to swallow, but it's important and worth it.

Each and every separate module becomes its own GitHub repo in the uPortal GitHub organization. Separately semantically versioned.

This step is enabled by Semantic Versioning and the separation of API from implementation. With both in place, a given bit of code defines what other bits of code it works with through its formal dependency graph, not through whatever code happens to sit beside it in the monolithic Git repo at the same snapshot. One doesn't, indeed shouldn't, change multiple units of code in one commit. Any given commit should be an atomic changeset within one unit of code, bumping the exposed version number (upon release) to reflect the semantics of that change. No doubt that change might be made to enable wonderfulness in another module that takes advantage of it, but here's the truly wonderful thing about Semantic Versioning: that other module is exposed to the change only when it deliberately bumps its own dependency version declaration.
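A toy model of that opt-in exposure, in Java, assuming each module pins an exact version of each dependency (no version ranges): publishing a new release changes nothing for consumers until one of them bumps its declaration. `ArtifactRepository` and `PinnedDependency` are hypothetical stand-ins for a real repository such as Maven Central and a real build's dependency declaration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for an artifact repository: module name -> published versions. */
final class ArtifactRepository {
    private final Map<String, Set<String>> published = new HashMap<>();

    void publish(String module, String version) {
        published.computeIfAbsent(module, m -> new HashSet<>()).add(version);
    }

    boolean isPublished(String module, String version) {
        return published.getOrDefault(module, Set.of()).contains(version);
    }
}

/** One consuming module's exact-version dependency on another module. */
final class PinnedDependency {
    private final String module;
    private String version;

    PinnedDependency(String module, String version) {
        this.module = module;
        this.version = version;
    }

    /** The consumer's deliberate opt-in to a newer release. */
    void bumpTo(String newVersion) {
        version = newVersion;
    }

    /** Resolves exactly the pinned version; newer releases never leak in on their own. */
    String resolve(ArtifactRepository repository) {
        if (!repository.isPublished(module, version)) {
            throw new IllegalStateException(module + ":" + version + " has not been published");
        }
        return version;
    }

    public static void main(String[] args) {
        ArtifactRepository repository = new ArtifactRepository();
        repository.publish("uportal-groups-api", "1.4.0");
        PinnedDependency groupsApi = new PinnedDependency("uportal-groups-api", "1.4.0");

        repository.publish("uportal-groups-api", "2.0.0"); // upstream ships a breaking change
        System.out.println(groupsApi.resolve(repository)); // prints 1.4.0: consumer unaffected

        groupsApi.bumpTo("2.0.0");                         // consumer opts in on its own schedule
        System.out.println(groupsApi.resolve(repository)); // prints 2.0.0
    }
}
```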

Why have N GitHub repos when we could put All The Things in just one monolithic repo instead?

Because separate GitHub repos give the projects much, much better tooling. Separating repositories allows:

  • Paying more attention to the changes and Pull Requests that affect code you're more interested in.
  • Smaller units of code for natural actuation through commit-driven continuous integration solutions. As in: a Pull Request is opened against uportal-layout-impl, a Travis-CI job kicks off to build and test that module, and because the unit of code is of manageable size, that job has sufficient resources, returns fast, and can do more, going ahead and checking the style, the license headers, the unit test coverage, and so on.
  • Using GitHub Releases to document what is gained in each (semantically versioned) release of each module.
  • Potentially using GitHub's issue tracker.
  • More natural usage of GitHub reporting on activity on each module.
  • The potential to model more clearly in GitHub who is a committer on which module.

It's a change, no doubt, but think about it: one opens a Pull Request against uportal-grouper-groups-impl, the CI automation actually succeeds at trying out merging that changeset, running the unit tests, computing test coverage, and annotates the PR with the success or failure of all this. Awesome. Achieving continuous integration automation gets easy when the repos get smaller.
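Reduced to its logic, that per-Pull-Request automation is a list of named checks that all run and then roll up into one status for the PR. A minimal Java sketch with hypothetical names; the real work of merging, testing, measuring coverage, and annotating the PR would be done by the CI service (Travis-CI, say) and GitHub, not by code like this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical per-Pull-Request gate: run every check, then report a single status. */
final class PullRequestGate {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>(); // keeps insertion order

    PullRequestGate add(String name, BooleanSupplier check) {
        checks.put(name, check);
        return this;
    }

    /** Runs all checks, not stopping at the first failure, and returns a one-line status. */
    String run() {
        List<String> failed = new ArrayList<>();
        checks.forEach((name, check) -> {
            if (!check.getAsBoolean()) {
                failed.add(name);
            }
        });
        return failed.isEmpty() ? "success" : "failure: " + String.join(", ", failed);
    }

    public static void main(String[] args) {
        String status = new PullRequestGate()
                .add("merge", () -> true)
                .add("unit tests", () -> true)
                .add("coverage", () -> false)
                .add("style", () -> true)
                .run();
        System.out.println(status); // prints failure: coverage
    }
}
```

Running every check rather than stopping at the first failure means one round trip tells the contributor everything that needs fixing.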

Adopt Google Style

Adopt Google Style.

Separating the modules enables doing this one module at a time when modules are ready rather than all at once.

Separating the repos makes it feasible to offload style checking onto Travis-CI, such that style adherence is checked on every changeset, but developers don't have to locally pay the cost of style checking on every build.

Projects will succeed in adhering to a coding style guide only where automation enforces as much of the guide as can be enforced automatically. That's why this step is so far down the list: it comes once the codebase is modularized enough to support that automation.
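To make "automation enforces the style guide" concrete, here is a toy Java checker for two Google Java Style rules that are easy to automate: no tab characters, and a column limit of 100. It is a sketch only; it ignores the guide's exceptions (import statements, long URLs, and so on), and in practice a project would reach for an existing tool such as google-java-format or Checkstyle's Google configuration rather than write its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy checker for two mechanically checkable Google Java Style rules. */
final class StyleCheck {
    static final int COLUMN_LIMIT = 100;

    /** One message per violation, with 1-based line numbers; the guide's exceptions are not modeled. */
    static List<String> violations(List<String> sourceLines) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            String line = sourceLines.get(i);
            int lineNumber = i + 1;
            if (line.indexOf('\t') >= 0) {
                found.add(lineNumber + ": tab character");
            }
            // length() counts UTF-16 code units, which equals columns for ASCII source.
            if (line.length() > COLUMN_LIMIT) {
                found.add(lineNumber + ": longer than " + COLUMN_LIMIT + " columns");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "class Example {",
                "\tint indentedWithTab;",
                "  String tooLong = \"" + "x".repeat(100) + "\";",
                "}");
        // prints "2: tab character" then "3: longer than 100 columns"
        violations(source).forEach(System.out::println);
    }
}
```

Because a check like this runs in CI on every changeset, it costs developers nothing on their local builds.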

There are a few reasons to adopt Google Style.

  • Arguing about style is an amazing waste of time. Don't do it. Let Google have all those arguments for you and come to some answer.
  • Arguing about style is an amazing waste of time. Don't do it. Note that you can argue about style in code, not just in email and on Pull Request comments. Whenever developer 2 re-styles the code of developer 1, even in passing, that's noise in the changeset that makes it harder to understand what is really changing. This will happen inadvertently by well-intentioned good developers in a codebase without a style guide. The way to make it stop is to adhere to a style guide.
  • Consistent style makes code easier to read and helps developers get more in a mindset of writing code for a product, "in the voice of the product", as the open source product, rather than as egoizing individuals.
  • Some styles are actually better than others. I'll resist the urge to describe my pet style conventions here and why they're better than yours, because, see above, Don't Argue About Code Style. Nonetheless, Hypothesis: to the extent that there are actual advantages to better style vs worse style (in expected defect rate, say), the Google folks are getting this mostly right and riding along with them on their choices is a fine way to go.
  • You might want to work for Google someday. That'll probably be, like, hard work. It'll be ever so slightly less hard if you have experience with Google Style. (Yes, this bullet point is a bad joke.)

Increase unit test coverage

I don't believe software needs to have 100% unit test coverage. There's some point of diminishing returns before 100%.

Hypothesis: that point of diminishing returns comes somewhere far after uPortal's current 21% test coverage.

So, increase the test coverage, enabled by the steps that have come before. At this point we've got small, cohesive modules of code semantically versioned in tight GitHub repos with a working Travis-CI build process. With Coveralls.io integration that build process can provide feedback on every changeset as to its effect on test coverage rate. With the codebase factored into smaller focused repos, that difference in test coverage will be more like "+5.9%" and less like "+0.006%", which rounds to a rounding error.
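The arithmetic behind that claim, as a small Java sketch: one changeset adds tests covering 59 previously uncovered lines (and adds no new code), starting from the 21% coverage figure above. The module sizes, 1,000 lines versus 1,000,000 lines, are hypothetical, chosen only to show the scale effect; they are not measurements of uPortal.

```java
/** Line-coverage arithmetic; the module sizes used below are hypothetical. */
final class Coverage {

    /** Line coverage as a percentage of total lines. */
    static double percent(long coveredLines, long totalLines) {
        if (totalLines <= 0) {
            throw new IllegalArgumentException("totalLines must be positive");
        }
        return 100.0 * coveredLines / totalLines;
    }

    /** Percentage-point change when a changeset adds tests covering previously uncovered lines. */
    static double delta(long coveredBefore, long newlyCovered, long totalLines) {
        return percent(coveredBefore + newlyCovered, totalLines) - percent(coveredBefore, totalLines);
    }

    public static void main(String[] args) {
        // The same changeset, newly covering 59 lines, starting from 21% coverage.
        System.out.printf("small module:    %+.4f%n", delta(210, 59, 1_000));
        System.out.printf("monolithic repo: %+.4f%n", delta(210_000, 59, 1_000_000));
    }
}
```

The same work shows up as roughly +5.9 points in the small module and roughly +0.006 points in the monolith.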

Increasing test coverage

  • Improves code quality, because of the way it makes you think about factoring code for testability.
  • Improves code quality, because you just might find a bug.
  • Improves code quality, because of course you don't ever write any bugs, but hey, that other developer might catch his or her subsequent bug thanks to your unit test.
  • Enables bolder future development and refactoring, buoyed up by unit test coverage that catches regressions faster.
  • Documents the code tested through literate test cases.

Conclusion

There are non-functional steps uPortal can take toward higher code quality and a cleaner, more productive development environment for uPortal developers and adopters. uPortal should take these steps. Looking to an eventual next MAJOR release of uPortal, well, some of these steps would be a lot better done in the course of a MAJOR release because they involve a lot of structural change. The project should bite off some amount of this for uPortal 5. I suggest all of it.

Of course the purpose of this attention to code quality is to develop better adopter-benefiting features faster and with more panache, benefiting from the clarity afforded by the improved code quality.

Code quality isn't much value on its own, of course. It's a non-functional aspect in support of better delivering on the functional aspects.

Maybe you're not buying this vision. That's probably okay. Talking about vision is how we get to a better shared vision and to the inspiration to progress. What steps do you see uPortal taking?

Post cover photo credit: pagedooley : CC-BY-2.0.