Bowling together

We celebrated the shipment of Domino V10 at Brunswick Zone Lowell Lanes, about a 15-minute drive from our office in Chelmsford. Remembering the lavish R5 ship party, where everyone and his/her family was flown to the Bahamas for several days, I would personally put this one on par with that, or even above it. Not because I’m a good bowler, though I am not bad – okay, I won the first string with 3 consecutive strikes but ended up hurting my knees when I forgot all about the approach (it’s been a while). No, I liked it because we were together.

One of the subjects I read about – until it gets too dry – is sociology. I almost minored in anthropology in college, and let’s just say I find people – humankind – fascinating. Some years ago I read Bowling Alone, published in 2000 and written by Robert Putnam, a statistics-based sociologist who lives only 35 miles away from me. In it, he proves beyond debate the loss in the US of something he calls “social capital”, and how it’s returning. That loss is devastating to both society and individuals, but its return is equally therapeutic. One fact I always remember vividly is that joining a group – I mean a bridge club or knitting group – statistically has the same positive effect on health as quitting smoking. The title comes from the fact that more people are bowling than ever, but bowling leagues have sharply declined.

Well, Thursday afternoon we were bowling together. And it felt good. As one spends more time with a group of people, one of course becomes acquainted with more and more of them. That is perhaps why I found the bowling excursion so enjoyable; among the large group of R5 Iris folks at Paradise Island, I may have known 10. Oh yeah, my wife sprained her ankle badly too, which was a bummer.

Aside from the obvious, victorious, “in your face” achievement of shipping a new, full-point version of Domino in 11 months, which has never been done before (we’re getting better at this), I like to celebrate the “soft” human attributes that shone brightly during this time, the people things that no one takes note of. So here are some of those:

  1. There’s a unity of purpose and vision going on that’s infectious. Though it’s taken some time and it exists in different degrees, the larger team knows what it’s about and what it needs to do. I can’t go into detail about the gargantuan yeoman’s work it took to even get us operational outside IBM; suffice it to say that once people saw that, they realized how real and how alive both this group and this product are. And we have acted in kind.
  2. We are in closer contact with our “family” of business partners and champions than ever before. For a family it is. The work of Richard Jefts, Barry Rosen and a host of others to engage the human beings who remain faithful to the fruit of our labors has borne fruit, not only in those relationships but in the focus of the inner team. It’s always been a difficult balancing act to synthesize and vet the conflicting demands of future direction, support issues and business opportunity. Be assured that among the thousands of people who use our software, some insist they need “obvious”, polar-opposite functionality and pathways forward. Yet that way forward is clearer than ever.
  3. We’re getting better at this. I won’t play the age card, but people can do the math and guess that demographic, given the length of service of individuals working with this technology. But, testament to its (and their) malleability, the age of the dog has had the opposite effect on the tricks s/he can do. No, we’re not dogs and these aren’t tricks, but the overlooked quality of wisdom (vs. knowledge which at times has been demonstrably distracting) has us renovating and optimizing practices and systems.
  4. “The report of my death was an exaggeration,” said Mark Twain. Domino is not going away and neither is this team. While we were bowling together, the naysayers and negative Nancys literally did not exist. Destructive agendas, even self-destructive ones, yield to creativity and the grit of execution. Nothing dead about it.

I joked to Richard Jefts that we should build a bowling alley in the Chelmsford space for use when our “family” comes to visit. We laughed, but he said he might just take me up on it. Then you can all bowl together with us.


Concerning the Design Catalog …

Efficiency in programming languages is achieved by avoiding bad practices like excessive looping, too many I/O or network operations, and unnecessary data movement. Modern processors have raised the threshold at which CPU-consuming inefficiencies become unacceptable, but of course you can still loop yourself into a hole. Please don’t take that as a challenge.

With languages that have the power to execute very extensive (and therefore expensive) operations to achieve their ends, the same bottlenecks exist. I/O (even with SSDs) and network access can still dominate cost and time, and those two resources are related but not the same. Query and other high-level languages like Pig Latin, Hive, SQL, and the MongoDB® query language all work well by optimizing (minimizing and cutting the cost of) the underlying data and network access required to satisfy the requests they process.

Classic query optimization

Therefore it becomes very important to plan how to do the work. The first order of business is to make sure the request makes sense, that the participating objects exist and are configured to perform what they have been asked to do. To use an example most people are somewhat familiar with, consider the SQL statement:

SELECT order_number, order_origin FROM orders WHERE part_count > 250 AND back_order = 1

The orders table must be checked to see whether it exists, and so must the columns order_number, order_origin, part_count and back_order.
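A minimal sketch of that first validation step, assuming a hypothetical in-memory catalog that maps table names to sets of column names (real engines keep far richer structures). The query is represented as an already-parsed table name plus the columns it mentions, rather than raw SQL text.

```python
from dataclasses import dataclass

# Hypothetical in-memory catalog: table name -> set of column names.
CATALOG: dict[str, set[str]] = {
    "orders": {"order_number", "order_origin", "part_count", "back_order"},
}


@dataclass
class ParsedQuery:
    """A query after parsing: the table it reads and every column it mentions."""
    table: str
    columns: list[str]


def validate(query: ParsedQuery, catalog: dict[str, set[str]]) -> list[str]:
    """Return a list of errors; an empty list means planning can proceed."""
    if query.table not in catalog:
        return [f"table {query.table!r} does not exist"]
    known = catalog[query.table]
    return [f"column {c!r} does not exist in {query.table!r}"
            for c in query.columns if c not in known]


q = ParsedQuery("orders", ["order_number", "order_origin", "part_count", "back_order"])
print(validate(q, CATALOG))  # []

bad = ParsedQuery("orders", ["order_number", "ship_date"])
print(validate(bad, CATALOG))
```

Only when this step returns no errors is it worth spending effort on planning how to execute the query.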

You may know that relational databases all have system catalogs. These are sets of tables that can be queried like any other. The actual design of those tables varies by database, but most have something like a TABLES table, where every table CREATEd has a row or rows. Now, it would be cost-prohibitive for those engines to run queries against the catalog just to compile other queries, so they use a memory-resident, highly optimized copy of the TABLES table (and all other catalog tables) to do that work.
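SQLite makes the "catalog is just tables" idea easy to see, since its catalog table, sqlite_master, can be read with ordinary SQL, and PRAGMA table_info lists a table's columns. This is only an illustration using SQLite's own catalog; other engines name and shape their catalogs differently, and this code reads the catalog as a user would, not through the engine's internal in-memory copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_number INTEGER PRIMARY KEY, order_origin TEXT, "
    "part_count INTEGER, back_order INTEGER)"
)
conn.execute("CREATE INDEX idx_part_count ON orders (part_count)")

# sqlite_master is SQLite's catalog table; it is queried like any other table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
indexes = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'orders'")]

# PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]

print(tables)   # ['orders']
print(indexes)  # ['idx_part_count']
print(columns)  # ['order_number', 'order_origin', 'part_count', 'back_order']
```

Note that an INTEGER PRIMARY KEY column is an alias for SQLite's rowid, so it does not add an automatic index entry to the catalog.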

Concerning Domino®, the internal knowledge about design elements resides in intricate, interrelated field values on design documents. Since the same problem exists (runtime access to that data is too costly even just to validate a query), we had to create a separate, optimized instance of the design data. We call it the “Design Catalog”.

The second order of business in planning a query/request is to find any helpful optimizing strategies for the problem at hand. These combine data structures such as indexes, fast-path execution techniques such as pre-seeded query terms, and classic approaches such as nested-loop or sort-merge joins.

But the first decision to be made, one that seems simple yet is remarkably complex, is how to order the work. In general, equality terms are cheaper than range terms. Index-satisfiable terms are cheaper than those requiring direct data access. And for sharded and distributed databases, getting results for single terms on single nodes is the first order of work for map-reduce processing.
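The two rules of thumb above can be sketched as a sort key. This is a hypothetical, rule-based ordering only: Term, INDEXED_FIELDS and the extra order_origin term are made up for illustration, and ranking "index-satisfiable" ahead of "equality" is an assumption; real optimizers weigh statistics rather than fixed rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    field: str
    op: str        # "=", ">", "<", ">=", "<="
    value: object


# Hypothetical set of fields that have a usable index.
INDEXED_FIELDS = {"back_order", "part_count"}


def heuristic_cost(term: Term, indexed: set[str]) -> tuple[int, int]:
    """Smaller tuples run first: index-satisfiable before direct data access,
    then equality before range."""
    needs_data_access = 0 if term.field in indexed else 1
    is_range = 0 if term.op == "=" else 1
    return (needs_data_access, is_range)


def order_terms(terms: list[Term], indexed: set[str]) -> list[Term]:
    # sorted() is stable, so ties keep their original query order.
    return sorted(terms, key=lambda t: heuristic_cost(t, indexed))


terms = [
    Term("order_origin", "=", "Los Angeles"),
    Term("part_count", ">", 250),
    Term("back_order", "=", 1),
]
for t in order_terms(terms, INDEXED_FIELDS):
    print(t.field, t.op, t.value)
```

With these assumptions the indexed equality term (back_order = 1) runs first, then the indexed range term, and the term needing direct data access runs last.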

In relational engine system catalogs there is virtually always an INDEX table to be consulted for this part of the problem. And to finish the calculations needed for optimization, system catalogs contain COLUMN tables holding gathered or sampled counts of distinct values (aka cardinality) and other statistical data.
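Here is how such statistics feed a cost estimate, using the classic textbook formulas: an equality term selects 1/(distinct values) of the rows, and a "greater than" term selects the fraction of the low-to-high range above the constant. This is a sketch that assumes uniformly distributed values and independent predicates; the row count and column statistics are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    distinct: int   # number of distinct values (cardinality)
    low: float      # smallest value seen
    high: float     # largest value seen


def eq_selectivity(stats: ColumnStats) -> float:
    # Uniform-distribution assumption: each distinct value is equally common.
    return 1.0 / stats.distinct


def gt_selectivity(stats: ColumnStats, value: float) -> float:
    # Fraction of the [low, high] range lying above value, clamped to [0, 1].
    if stats.high == stats.low:
        return 0.0 if value >= stats.high else 1.0
    frac = (stats.high - value) / (stats.high - stats.low)
    return min(1.0, max(0.0, frac))


# Hypothetical statistics for the orders table.
row_count = 100_000
part_count = ColumnStats(distinct=1_000, low=0, high=1_000)
back_order = ColumnStats(distinct=2, low=0, high=1)

# WHERE part_count > 250 AND back_order = 1, assuming the predicates are independent.
sel = gt_selectivity(part_count, 250) * eq_selectivity(back_order)
print(round(row_count * sel))  # 37500
```

Estimates like this let a planner compare alternative orderings and access paths before touching any data; skewed or correlated data is exactly where these simple assumptions break down.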

What about Domino’s indexes and DQL optimization?

Across its history, Domino’s indexes have been foundational to its market value. The Notes Indexing Facility (NIF) is a many-splendored thing, with its trees of trees and optimized ordinal retrieval capabilities (“get me document 129093 in order by a given key” requires index walking in most engines). Domino’s indexes also house persistently indexed computed values. Though there may be other engines with something akin to this power, there are certainly none more robust, and available today.
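Why does ordinal retrieval usually require index walking? Because a plain sorted index knows order but not position. A toy, two-level sketch shows the alternative: if the upper level stores cumulative entry counts for each leaf page, the Nth entry in key order can be found by binary-searching the counts and then indexing into one page. This is a generic illustration of counted trees, not NIF's actual structure; CountedIndex and its page size are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class CountedIndex:
    """Toy two-level sorted index whose top level stores cumulative entry counts,
    so the Nth entry in key order is found without walking the preceding entries."""

    def __init__(self, sorted_keys: list, page_size: int = 4):
        # Leaf pages hold consecutive runs of already-sorted keys.
        self.pages = [sorted_keys[i:i + page_size]
                      for i in range(0, len(sorted_keys), page_size)]
        # cumulative[i] = number of entries in pages[0..i] inclusive.
        self.cumulative = list(accumulate(len(p) for p in self.pages))

    def nth(self, n: int):
        """Return the entry at 0-based ordinal position n."""
        total = self.cumulative[-1] if self.cumulative else 0
        if not 0 <= n < total:
            raise IndexError(n)
        # Binary-search the counts to find the page, then index within it.
        page_no = bisect_right(self.cumulative, n)
        before = self.cumulative[page_no - 1] if page_no else 0
        return self.pages[page_no][n - before]


keys = sorted(["pear", "apple", "fig", "kiwi", "plum", "date", "lime", "grape", "melon"])
idx = CountedIndex(keys, page_size=4)
print(idx.nth(0), idx.nth(5), idx.nth(8))  # apple lime plum
```

The cost of a lookup depends on the number of pages searched, not on how far into the ordering the requested position lies; the price is keeping the counts up to date on every insert and delete.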

So the Design Catalog needed to have quickly-available descriptions of available indexes in a database, meaning that design data needed to be extracted from its normal residence and itself indexed for quick lookup and use in optimization. However, this is complicated business.

For one thing, Domino’s industry-best security model allows privileges to be applied to design elements. Not all views (or their indexes) are available to all users. For V10.0.0 of Domino we have had to punt on that, and remove from consideration in DQL all views and folders that have readers fields on their design documents.

Second, views have implicit document restrictions. Consider the Pending view’s selection criteria:

select form = "order" & order_state = "pending"

Any use of that view’s index applies those selection criteria on top of the criteria in a “free form” query term (as opposed to a term that names the view explicitly, as below). So the term

order_origin = 'Los Angeles'

evaluated using the Pending view would actually mean the following three terms:

form = 'order' and order_state = 'pending' and order_origin = 'Los Angeles'

and that is not what the user intended. So in that general case we must NOT use views with anything other than “Select @All” selection criteria. If application developers do want to use the Pending view, we opened up the syntax

'Pending'.order_origin = 'Los Angeles'

which is far more efficient than the fully spelled-out three-term query, since the view’s index persists and already embodies the first two terms.
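A small simulation makes the difference concrete. Hypothetical in-memory documents stand in for a database, and a set comprehension stands in for the Pending view's index; nothing here is a Domino API.

```python
# Hypothetical documents keyed by note ID.
docs = {
    1: {"form": "order", "order_state": "pending", "order_origin": "Los Angeles"},
    2: {"form": "order", "order_state": "shipped", "order_origin": "Los Angeles"},
    3: {"form": "invoice", "order_origin": "Los Angeles"},
    4: {"form": "order", "order_state": "pending", "order_origin": "Boston"},
}


def pending_selection(doc: dict) -> bool:
    # Mirrors the Pending view's selection: form = "order" & order_state = "pending".
    return doc.get("form") == "order" and doc.get("order_state") == "pending"


# The Pending view's index only contains documents that pass its selection formula.
pending_view = {nid for nid, d in docs.items() if pending_selection(d)}


def origin_is_la(nid: int) -> bool:
    return docs[nid].get("order_origin") == "Los Angeles"


# What the free-form term means: every document in the database.
intended = {nid for nid in docs if origin_is_la(nid)}
# What it would silently mean if answered from the Pending view's index.
via_pending = {nid for nid in pending_view if origin_is_la(nid)}

print(sorted(intended))     # [1, 2, 3]
print(sorted(via_pending))  # [1]
```

The second result is exactly what a query naming the Pending view explicitly asks for, which is why the explicit syntax is safe while silently substituting the view for a free-form term is not.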

Further considerations

Given Domino’s data model, in which a single field can hold multiple values, we also restricted free-form query terms to use only indexes that explode those multiple values into individual index entries. And we had to restrict them to non-categorized indexes as well.
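Taken together, these rules amount to filtering the extracted design data and indexing what survives for quick lookup. The sketch below shows one way that could look; every class and field name is hypothetical, since the Design Catalog's real structure is not described here, and the exact-string test for "Select @All" is a simplification (a real implementation would parse the formula).

```python
from dataclasses import dataclass, field


@dataclass
class ViewColumn:
    name: str
    # Hypothetical flag: multiple values are shown as separate index entries.
    explodes_multi_values: bool = True


@dataclass
class ViewDesign:
    name: str
    selection: str
    has_readers_field: bool = False
    categorized: bool = False
    columns: list[ViewColumn] = field(default_factory=list)


def view_is_eligible(view: ViewDesign) -> bool:
    """Restrictions described above for free-form terms (no view named in the query)."""
    return (not view.has_readers_field
            and not view.categorized
            and view.selection.strip().lower() == "select @all")


def build_column_lookup(views: list[ViewDesign]) -> dict[str, list[str]]:
    """Map column name -> views whose index may serve a free-form term on that column."""
    lookup: dict[str, list[str]] = {}
    for view in views:
        if not view_is_eligible(view):
            continue
        for col in view.columns:
            if col.explodes_multi_values:
                lookup.setdefault(col.name, []).append(view.name)
    return lookup


views = [
    ViewDesign("AllOrders", "SELECT @All", columns=[ViewColumn("order_origin")]),
    ViewDesign("Pending", 'SELECT form = "order" & order_state = "pending"',
               columns=[ViewColumn("order_origin")]),
    ViewDesign("Private", "SELECT @All", has_readers_field=True,
               columns=[ViewColumn("order_origin")]),
    ViewDesign("ByCategory", "SELECT @All", categorized=True,
               columns=[ViewColumn("order_origin")]),
    ViewDesign("Tags", "SELECT @All",
               columns=[ViewColumn("tags", explodes_multi_values=False)]),
]
print(build_column_lookup(views))  # {'order_origin': ['AllOrders']}
```

Building the lookup once, ahead of time, is the point: validating and planning a query then becomes a dictionary lookup rather than a scan of design documents.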

Query-ability

So in comparison with the relational model above, what of the query-ability of the Design Catalog? Well, we have put the system catalog data into a non-replicated database called GQFDsgn.cat. By doing so, we have separated the design elements from their database context, and that is a liability. So at this writing I cannot guarantee that GQFDsgn.cat will continue to exist; for now it is a stopgap. That means querying its contents directly is very risky. No doubt people will do it anyway, and that’s fine.

For now, the Design Catalog gets the job done. Further instructions on its use will appear in its formal documentation.

DQL roots

A few years ago, three of my colleagues and I were drafted for a skunkworks effort, a throwaway project: prove the concept, save the relics and go back to your regular job. We were interested in taking a quite functional REST API that was serviced by much more expensive technology and having it use native Domino services instead. We worked for a few months, over the Christmas holiday season, to show a cheaper way to give the API what it needed to function.

Part of that work involved data transformation. JSON was the format of that API’s REST payloads, so it was something we needed to supply and consume. Fortunately, for the most part we had built-in libraries for that problem. But another part was query solving. We pulled together Domino services to satisfy the different query terms, and it worked! We delivered a demonstrable, cheap prototype that inspired later work.

I have a long history in query processing, going back to my 1986 work on the mainframe database Model 204®, now owned by Rocket Software. Its language, unceremoniously called “User Language”, thrives by using two kinds of indexes and direct data access in a way that anticipated, by decades, Lucene and the sharding and map-reduce engines of Hadoop. Its Boolean processing is both stingy, avoiding I/O via partitioning, and optimal in the actual low-level operations, using the machine-level instruction set to AND, OR and NOT bitmaps.
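The bitmap idea can be sketched with Python's arbitrary-precision integers standing in for bitmaps: bit i is set when record i satisfies a term, so AND, OR and NOT each become a single bitwise operation over a whole set at once. This illustrates the general technique only, not Model 204's implementation, and the record numbers are made up.

```python
def to_bitmap(record_numbers) -> int:
    """Set bit r for every record number r."""
    bits = 0
    for r in record_numbers:
        bits |= 1 << r
    return bits


def from_bitmap(bits: int) -> list[int]:
    """List the record numbers whose bits are set, in ascending order."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


UNIVERSE = to_bitmap(range(10))          # records 0..9 exist in this segment
big_orders = to_bitmap([1, 3, 4, 7, 9])  # e.g. part_count > 250
back_ordered = to_bitmap([3, 4, 5, 9])   # e.g. back_order = 1

print(from_bitmap(big_orders & back_ordered))  # AND -> [3, 4, 9]
print(from_bitmap(big_orders | back_ordered))  # OR  -> [1, 3, 4, 5, 7, 9]
print(from_bitmap(UNIVERSE & ~back_ordered))   # NOT -> [0, 1, 2, 6, 7, 8]
```

NOT has to be masked against the set of records that actually exist, since complementing a bitmap on its own would also "select" records that were never there.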


Later, the same technology was ported to the C language and to Unix/Windows, and I was part of an effort to support the full SQL-92 language. It ported well, and specialized in the same area – high-speed, complex Boolean processing.

I also have a long history with SQL. I appreciate its strengths and its standardized publication of the very well-defined relational algebra. But, working on Notes/Domino and diving deeply into the unique and valuable properties and capabilities of semi-structured and unstructured data, I have observed that the mapping to the SQL standard has always been a forced one, and the success of each attempt has been mixed at best. Enter NoSQL and its pundits. Indeed, enter the internet, where relational data plays a subsidiary role in the extensive unstructured data corpus.


Earlier this year (2018) we began working in earnest on providing NoSQL capabilities using Node.js to access Domino. We surveyed the landscape and found it populated by engines that had invested heavily in JSON as their native data format. Now, one of the most beautiful attributes of Domino has always been its malleability to support any number of front (and truth be told, back) ends. Node.js and JSON are no exception, though there is work to do. And they comprise what can only be described as a new standard.

The challenge for us in developing this new front end is to map, and make valuable, the data, processing and everything else possible in Domino in the new (well, new to Domino) format. I pledge to write a LOT about this work, both to seek input and to advertise the inherent power of the underlying engine, but one early deliverable was quickly identified: a query capability.

Domino has had the underlying structures to support a general query facility for a long time. It is NOT a relational engine, which is a very good thing for a NoSQL database. And its deep underpinnings in unstructured, relationally denormalized data are formative in this work.

Now, much of the Domino engine was built in support of the Notes® client and its browser-based counterparts. That is not a liability; there is very rich and useful functionality at our collective disposal. But in Node.js and a query facility, the usage of the indexes and document data has a different footprint. For instance, a call to render 100 index entries at a time while scrolling an inbox or view does a small fraction of the work needed to find the results for 5000 entries across the same view. And we need to take care not to overwhelm one kind of processing with the other.

But using the indexes of the Notes Indexing Facility (NIF, the part of Domino that comprises views) was an obvious approach in the aforementioned skunkworks, and it has borne fruit in the current effort. Given the semantics of a database-level query and the Domino data model, certain restrictions on view and view column design have been needed to have a working engine.

Set-based terms connected with Boolean primitives are the building blocks of any query engine. In that skunkworks we also identified the Domino IDTable functions as the avenue of choice for Boolean processing. Their speed is tremendous. The one restriction they bring is that NoteIDs are not portable to other replicas, but that affected no early user story or requirement and is worth living with for the performance benefit. IDTables are the currency of the query engine, and as such all data manipulation will be done via efficient post-processing, at least for now.
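One way to picture sets of NoteIDs as a "currency" is as ascending, duplicate-free integer lists, over which AND, OR and AND NOT each run as a single linear merge and each produce another sorted list. This is a hypothetical model for illustration: real IDTables are reached through Domino's IDTable functions and are not Python lists, and the NoteIDs below are made up.

```python
def intersect(a: list[int], b: list[int]) -> list[int]:
    """AND: IDs present in both ascending lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a: list[int], b: list[int]) -> list[int]:
    """OR: IDs present in either list, kept sorted and without duplicates."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def difference(a: list[int], b: list[int]) -> list[int]:
    """AND NOT: IDs in a that are not in b."""
    i = j = 0
    out = []
    while i < len(a):
        if j < len(b) and b[j] < a[i]:
            j += 1
        elif j < len(b) and b[j] == a[i]:
            i += 1
            j += 1
        else:
            out.append(a[i])
            i += 1
    return out


origin_la = [4, 8, 12, 20, 36]  # hypothetical NoteIDs matching one term
pending = [8, 16, 20, 28]       # hypothetical NoteIDs matching another term

print(intersect(origin_la, pending))   # [8, 20]
print(union(origin_la, pending))       # [4, 8, 12, 16, 20, 28, 36]
print(difference(origin_la, pending))  # [4, 12, 36]
```

Because every result is again a sorted ID set, results can feed directly into the next Boolean operation; fetching document data can wait until the final set is known.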

We also needed to define the language. Early on we identified the existing engines in the document-based NoSQL world. They were MongoDB® and CouchDB®, both well established and adopted in the field. They each had JSON query interfaces that have users building Boolean trees. So that was the first interface we built, DQL 1.0 if you like. But when we looked at it, and read developer reviews of those interfaces, we concluded it was not the way to go. That decision forged DQL in its current, shipping form. We didn’t focus on the language so much as the engine, so we called it DGQF (Domino General Query Facility), because Domino is a collection of facilities working together. But the language acronym, DQL, won the day to the praises of many (if it’s OK, we’ll still call it DQF internally).
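To show what "users building Boolean trees" looks like, here is one query written as a MongoDB-style JSON filter, with a tiny evaluator for a small subset of that style ($and, $or, $eq, $gt, $lt). This is a hypothetical illustration, not MongoDB's or CouchDB's implementation and not the DQL 1.0 interface described above; the documents are made up.

```python
import operator

# MongoDB-style Boolean tree for the flat, DQL-style condition:
#   order_origin = 'Los Angeles' and part_count > 250
query = {
    "$and": [
        {"order_origin": "Los Angeles"},
        {"part_count": {"$gt": 250}},
    ]
}

COMPARISONS = {"$eq": operator.eq, "$gt": operator.gt, "$lt": operator.lt}


def matches(doc: dict, node: dict) -> bool:
    """Evaluate a small subset of a MongoDB-style filter tree against one document."""
    for key, cond in node.items():
        if key == "$and":
            if not all(matches(doc, child) for child in cond):
                return False
        elif key == "$or":
            if not any(matches(doc, child) for child in cond):
                return False
        elif isinstance(cond, dict):
            # Field with operator expressions, e.g. {"$gt": 250}.
            if key not in doc:
                return False
            if not all(COMPARISONS[op](doc[key], val) for op, val in cond.items()):
                return False
        elif doc.get(key) != cond:
            # A bare value means equality.
            return False
    return True


orders = [
    {"order_origin": "Los Angeles", "part_count": 300},
    {"order_origin": "Los Angeles", "part_count": 100},
    {"order_origin": "Boston", "part_count": 900},
]
print([matches(o, query) for o in orders])  # [True, False, False]
```

The explicit tree is easy for programs to generate, but even this two-term condition is much bulkier to write by hand than the one-line form in the comment.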

There isn’t space here to go into all the variants and power the language already has. The formal documentation is undergoing its final edits and I will provide pointers once it’s available. The approach we took is sound and will yield newfound power in the hands of application developers, even into a new generation. We did our best – and will continue to do so – to bring existing capabilities into innovative use and to expose components, such as the IDTables that exist in views and folders, in the syntax. We think it hangs together pretty well.

So... enjoy. And here’s to Domino V11. You ain’t seen nothin’ yet!

The Iris bloodline

He came by helicopter. No one was sure where it landed, but they heard it fly in. He brought several of his direct reports and a company-wide meeting was held to announce something. IBM’s CEO then announced that he had just spent $3.5 billion to buy the 70 (!) people in the room. Well, not them, their company. Along with them came thousands of others working for the parent company Lotus, whose funds had been used to sustain the development of the product those 70 were so proud of.

Lou Gerstner told those present that he was amazed that such a small group of people could have built R3 of Lotus Notes and he pledged that he and all IBM management would stay far away from running their operation but that they would see a huge influx of capital to expand it. This meeting happened at 1 Technology Park Drive in Westford, Massachusetts, home of Iris Associates, a subsidiary of Lotus Development which was now a subsidiary of IBM.


Lou kept his pledge, and yours truly was hired as part of that expansion in 1998, to aid in the final phases of R5. After it shipped, and according to some unpublished agreement and schedule, Iris ceased to exist as a company in 2001 and became part of IBM. The investment in Notes/Domino continued for several years, as IBM made its money back severalfold.

I don’t want to bite too deeply the hand that fed me, but “fed” is past tense so some of that will come out in this post. I want to make plain what happened, not to practice resentment or articulate any schadenfreude; there is none. But I need to be a bit of an historian so I can really celebrate all that’s happened. And celebrate is the operative word in this blog entry.

We learned why Lou Gerstner was so impressed at the accomplishment of such a small group. In the years following the initial purchase of Lotus/Iris, several of the projects that we saw happening around us (and with us) were so large they could not succeed. Agile was adopted to cut the waste. But there’s nothing new here. Seminal accounts such as The Mythical Man-Month chronicle best what software project life can be like at Big Blue. The software that survives and even thrives is generally that which is needed to move iron and keep it operational and modern.

We saw the formation of product-line city-states within brand-based “nations”, first jockeying for market- and mind-share, then for survival as cuts ensued. People were doing good work, but it wasn’t seeing the light of day. And, sadly, cutting is arguably one of IBM’s greatest skills. Layoffs (sorry, “resource actions”) have project names and are carefully planned and executed.


I and my surviving colleagues are grateful for being employed these years – I mean that – for permanent employment is promised nobody. And to be completely fair, the way that Notes/Domino hit the market is an unusual phenomenon.  Engineer/market visionaries may boast of their acumen after such good fortune, but the confluence of so many factors involves a degree of luck and timing out of the control of the inventor.  Many are the start-ups with seemingly workable products that for one reason or another fell short of their sales targets.  Not so Notes/Domino.

I need to say strongly that IBM is a great place to work.  In many ways.  There are great people there that I love and with whom I have loved to work.

But of course every developer wants his/her efforts to meet market success. And the personal fulfillment that comes from that was extremely rare – in all honesty far too rare – for a number of reasons I will discuss some day under separate cover. I do presently hold out hope for Watson and the current efforts in Blockchain; I know some very good people working on those technologies.

So in stark contrast to how things began – and I do not only speak for myself – the environment became a progressively depressing, downward spiral.  Yet, many of the original Iris people nonetheless stayed around, still working on the software they knew and loved. They shifted their work to the cloud offering, SmartCloud Notes®.  Others moved on to other positions in the company.

Suddenly, in September 2017, a pens-down work stoppage was declared. There was complete silence from management, and those affected counted the possible scenarios that could produce this first-of-its-kind move. Most were very bad, but there was one good one: our business was being sold. And that one good scenario was the one that carried the day. HCL Technologies, an Indian high-tech services firm, was purchasing several undervalued products from IBM with hopes to shore up the customer base and integrate them into its offerings. The Products and Platforms division (pnp for short – it’s in my e-mail address) was a reasonable rebirth of what had been IBM Software Group (SWG) so many years before under Lou Gerstner and Steve Mills.

The reaction of the engineers varied. Personally, I was ecstatic, even giddy. As a friend and former Iris engineer (still at IBM) said, “You guys just had your white horse come to your rescue.”

The group of people developing Notes/Domino at HCL consists of MANY of the original Iris engineers, some very talented newbies and a very motivated management team that has helped this whole venture work. And work it does and work it will.

And Lou Gerstner’s comment about the 70 people? They’re baaaack.

The software has grown greatly, which spreads the efforts of those remaining thin, but the same spirit that started it all is alive and well. This feels like a startup even though its initial aims are to stop customer (oops, sorry Ginnie, “client”) erosion.

Watch this clip from Steven Spielberg’s Hook to get the full effect. It’s like that, complete with Rufio.