Prepared for Warpstock 2003 by Lynn Maxson (lmaxson@pacbell.net)

Opening Open Source

 Foreword

The following describes a current project of the SCOUG Programming SIG.
SCOUG, the Southern California OS/2 User Group, invites the reader to
visit their website at www.scoug.com.  Should the reader also want to
follow the progress of this Programming SIG project more actively, they
can subscribe to the scoug-programming mailing list from the website as
well.

 An Open Source Project

The SCOUG Programming SIG discussed how best to participate in open
source development for OS/2.  Obviously it could have chosen to join any
of the currently ongoing open source projects for OS/2.  Instead it has
decided to focus on the more general problem confronting OS/2 open
source development:  increasing the number of OS/2 programmers writing
open source.  This means overcoming the perceived barriers or inhibitors
currently facing those in the OS/2 community interested in
participating.

SCOUG membership itself reflects that of the OS/2
community as a whole.  The Programming SIG as a subset of that
membership reflects it as well.  We have a range of highly skilled
programmers down through several levels to those who rate their
programming skills at or near zero.  So how then do we raise the
average skill level of most, if not all, to a personal comfort zone for
engaging actively in open source programming?  How do we use our own
people resources to effect the necessary rise in skill levels?

The reader will note that the focus relies on raising individual skill
levels, a "smarting up", instead of a "dumbing down" of the skill level
to that of the individual.  In effect we propose to lower the bar by
raising the people.

 The Inhibitors: Mass versus Inertia

Not surprisingly, the abundance and variety of tools that many highly
skilled programmers, so-called techies, enjoy with such relish has just
the opposite effect on those considering increasing their skill levels.
The tools seem such an overwhelming mass in terms of sheer number that
they dampen the desire to proceed:  inertia.

At the core of that number lies multiplicity.  The multiplicity of
programming languages.  Within a programming language the multiplicity
of implementations.  Within an implementation the multiplicity of
utilities.  The GCC package, for example, comes with nearly four dozen
utilities, each with its own language and each with its own source to
maintain.

So how do we present this mass in a way that encourages those who need
to gain some mastery, some increase in their skill levels, to build the
momentum needed to overcome their initial inertia?

 The Strategies: Short Term, Long Term, and Merging

We cannot avoid the current situation with its multiplicities as the
starting point, point A. We need to define a situation without such
multiplicities, point B. We need a way that starts with point A that
eventually leads into a seamless fit with point B. This means executing
two concurrent strategies, one short and one long term, along with one
that at some point joins the short seamlessly with the long.

The SCOUG Programming SIG (SPS) has taken this three-strategy approach.

 Short Term Strategy

  Multiplicity of Programming Languages

In tackling this one head-on the SPS takes advantage of members'
expertise in various programming languages to support the languages
essentially in parallel.  For those with neither a language nor
programming experience we will offer the necessary tutorials.

We take a literate programming approach.  Literate programming involves
two languages, the informal descriptive language of the user and the
formal language of the source code.  In this instance we intend to use
the same informal description as far as possible in all sample code.  In
effect we will provide a basic form of comparative linguistics, allowing
the user to see the differences and similarities among the languages as
well as nuances within various implementations of the same language.

The sample code will range from single statements to different control
structures to complete algorithms to entire programs.  In that manner we
expect the user to more rapidly gain a sense of "construction" for
assembling code sequences in any language.
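As a minimal sketch of the pairing we intend, here in C, with the
informal description carried as a comment (the same description,
unchanged, would head the corresponding PL/I or assembly sample):

    /* Informal description:  compute the average of a list of
       scores, guarding against an empty list. */

    #include <stdio.h>

    static double average(const double *scores, int count)
    {
        double sum = 0.0;
        int i;

        if (count <= 0)            /* guard: empty list */
            return 0.0;
        for (i = 0; i < count; i++)
            sum += scores[i];
        return sum / count;
    }

    int main(void)
    {
        double s[3] = { 90.0, 80.0, 70.0 };

        printf("%.1f\n", average(s, 3));   /* prints 80.0 */
        return 0;
    }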

  Multiplicity of Implementations

For any given programming language we frequently find open source code
written for one implementation, e.g.  Watcom C/C++, that gives errors
when compiled under another, e.g.  GCC.  The same occurs among the
several different assembly language implementations.
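One small illustration of the kind of difference we mean:  GCC accepts
nested functions as a GNU C extension, so source that uses them
compiles cleanly under GCC yet draws errors from a standard-conforming
compiler such as Watcom C/C++.

    #include <stdio.h>

    int main(void)
    {
        int bias = 10;

        /* nested function: a GNU C extension that GCC accepts but
           that standard C, and hence Watcom C/C++, rejects */
        int add_bias(int n)
        {
            return n + bias;
        }

        printf("%d\n", add_bias(5));   /* prints 15 under GCC */
        return 0;
    }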

The SPS expects to take several different actions to assist in reducing
the impact of these errors.  First, we will deal with compiler options,
listing equivalent forms from each implementation.  Secondly, we will
report such errors, their cause, and means of correction.  This will
occur as part of tutorial support from the website.  In addition we will
use the scoug-programming mailing list to function in part as a help
desk.

  Multiplicity of Versions

Within implementations we have different versions, with a newer version
not fully backward compatible with an older one.  Where this occurs the
SPS will either modify the source until it compiles error free under
the newer version or will describe the source changes, their cause and
correction, again as part of the tutorial description on the website.
Also we will support the error detection and correction through the
scoug-programming mailing list.
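A small example of such a version break:  C89 permitted an implicit
"int" return type and implicit function declarations, while C99 removed
both, so source that an older compiler accepts silently draws
diagnostics from a newer one.

    /* accepted silently by a C89 compiler; a C99-conforming
       compiler must diagnose both the implicit 'int' return
       type and the call to the undeclared puts() */
    main()
    {
        puts("hello");    /* no #include <stdio.h> */
        return 0;
    }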

  Changes to the C/C++ compiler(s)

The SPS believes that we can modify the C/C++ compilers, specifically
the GCC compiler, to enhance programmer productivity.  Moreover we can
do this without affecting existing C/C++ source, thereby ensuring
backward compatibility.

   Change 1: Change compiler from single- to multi-pass

This change eliminates the need to declare forward references in
advance, e.g.  the function prototypes C requires for functions used
before they are defined.
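A minimal sketch of what the single-pass restriction costs today:
because the compiler reads the source once, a function used before its
definition needs a prototype the programmer must write and keep in
step.  A multi-pass compiler could infer it from the definition further
down.

    #include <stdio.h>

    static int helper(int n);    /* forward declaration a single-pass
                                    compiler requires; a multi-pass one
                                    could infer it from the definition */

    int main(void)
    {
        printf("%d\n", helper(6));   /* prints 42 */
        return 0;
    }

    static int helper(int n)
    {
        return n * 7;
    }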

   Change 2: Allow same naming convention for "main" procedures as for subroutines

This requirement, a holdover from the use of OCL in UNIX, does not apply
on any other platform.  Technically it exists as a UNIX-specific
implementation restriction, not a language one.
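A conceptual sketch, not actual runtime source, of why the fixed name
belongs to the implementation:  the startup code the C runtime links in
calls a function it expects to find under the name "main"; nothing in
the language grammar itself demands that name.

    #include <stdlib.h>

    extern int main(int argc, char **argv);

    /* hypothetical stand-in for the runtime's startup routine */
    void startup_sketch(int argc, char **argv)
    {
        exit(main(argc, argv));   /* the name "main" lives here, in
                                     the runtime, not in the language
                                     definition */
    }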

   Change 3: Eliminate the need for nested, i.e. internal, procedures

This leaves only the use of external procedures.  This change will allow
an unlimited number of unordered procedures (source) on input.  In
effect it eliminates the need for "make" and "build" while simplifying
the "link" process.

   Change 4: Allow unlimited "main" procedures on input

This comes as an extension to change 3.  Effectively it allows global
source changes across program boundaries.  It allows multiple object
modules, i.e.  "main" procedures, to result from a single compile.  The
process moves from compiling a single "external" procedure, the current
restriction, to compiling an unlimited number.  This offers a single
synchronization point for effecting global source changes.

   Change 5: Allow for either interpretive or compiled execution

This takes advantage of the fact that both interpreters and compilers
have the same four stages:  (1) syntax analysis, (2) semantic analysis,
(3) proof theory, and (4) meta theory.  Interpreters and compilers
engage in the first two stages, syntax and semantic analysis,
identically.  They differ in terms of executable results, interpretive
or compiled, in proof theory.

To make this change means incorporating both interpreter and compiler
functions as part of data entry, i.e.  an editor function.  The editor
then becomes the single, necessary user interface whose menu options
make it a complete IDE.  As we currently have "smart" editors, e.g.
LPEX, which do syntax analysis, this simply means making them even
"smarter" all the way through code generation (proof theory).

   Change 6: Allow for automatic test data generation and testing of interpretive output

This testing occurs at three levels:  (1) the individual source
statement, (2) a control structure (sequence, decision, iteration), and
(3) procedure.  This takes advantage of the fact that we can consider a
sequence as one or more statements or control structures.  Each
(statement, control structure, procedure) has definite boundaries.
Basically these boundaries have a single input and a single output.  As
such we can consider them as "pluggable" or "reusable" components.

Within the different boundaries we have data variables whose values
determine results in an assignment statement or within an "if" or "do"
clause.  In interpretive mode the software can present us with a list of
variables, assumptions about their default value ranges, and the
possibility of substituting for those default values.  Once we have set
the values, the software, as part of an exhaustive true/false proof,
the same as occurs in logic programming, can enumerate all possible
combinations of values for all the variables tested.
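A minimal sketch of what such an exhaustive proof looks like, assuming
hypothetical ranges of -3..3 for two variables and a one-line unit
under test:

    #include <stdio.h>

    int main(void)
    {
        int a, b, failures = 0;

        for (a = -3; a <= 3; a++)              /* assumed range for a */
            for (b = -3; b <= 3; b++) {        /* assumed range for b */
                int max = (a > b) ? a : b;     /* unit under test     */
                if (!(max >= a && max >= b)) { /* property to prove   */
                    printf("fails for a=%d b=%d\n", a, b);
                    failures++;
                }
            }
        printf("%d of 49 combinations failed\n", failures);
        return 0;
    }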

The reader needs to understand the implication of the exhaustive
true/false proof using enumerated sets of values for variables.  When
"properly" understood and implemented it eliminates the need for "alpha"
and "beta" testing along with the need for "alpha" and "beta" testers.
The reader needs to understand that this automatic software testing
process occurs millions of times faster, millions of times cheaper, and
millions of times more accurately than current "accepted" practice.

These changes represent a form of "paradigm busting", of attempting to
look at "old", i.e.  current, things in a new way.  Each of them
illustrates how implementations and their associated methodologies
operate to set an upper limit on individual productivity.

Somewhere in the process of implementing this short term strategy the
SPS will reach a consensus on when to set it aside to focus more fully
on the long term strategy.  The SPS will then merge the two strategies.

 Long Term Strategy

Interpreters have always offered an IDE based upon an editor interface.
Because they differ from compilers only in the form of the executables
produced, a meta theory option during the proof theory stage, it makes
little sense to continue to pursue the historical edit-compile-test
process of multiple, separate steps.  We need only one tool, one
interface.

Moreover we need only one language.  One language, one tool, one
interface can lower the skill bar to participating in open source
programming.

Three pre-1970 programming languages (LISP, APL, and PL/I) contain all
of the data elements and aggregates and operators found in all other
third generation (imperative) programming languages combined.  Three.
Equal to thousands.

Moreover all programming languages are specification languages.  On the
other hand not all specification languages are programming languages:
they have neither an interpreter nor a compiler.  We make this
distinction in order to make clear...or clearer...the relationship
between imperative programming languages (1st, 2nd, and 3rd generation)
and the Software Development Process (SDP).

  Software Development Process (SDP)

The SDP consists of five stages:  (1) specification, (2) analysis, (3)
design, (4) construction, and (5) testing.  Now if every programming
language is a specification language, why is it that coding appears
first in construction and not specification?  The answer lies in the use
of imperative languages like C, C++, and JAVA (also Python, PHP, Perl,
PL/I, etc.).  In imperative languages each of these stages has its own
form of manually prepared source, some textual, some graphical.  In
short, five stages, five separate manual sources to maintain in sync.

Declarative languages (AI, neural nets, Prolog, SQL, etc.), on the
other hand, allow the software to accept specifications as input,
automatically performing analysis, design, and construction upon them.
In some instances, as suggested in change 6 (automated test data
generation and testing) of the short term strategy, the software will
also perform automatic testing.  Thus we have the comparison of
imperative versus declarative languages:  five manual SDP stages in the
former and one in the latter.

               Imperative  Declarative
Specification  Manual      Manual
Analysis       Manual      Software
Design         Manual      Software
Construction   Manual      Software
Testing        Manual      Software

Obviously our insistence on staying with imperative languages in the
production of open source code places another limit on individual
productivity.

  A Big Challenge: Matching Solution Set to Problem Set

Sometimes in the search for better solutions we venture far from the
KISS principle.  We get so wrapped up in some elegant or esoteric form
that we forget why we came here in the first place.  We have programming
languages to describe, i.e.  communicate, real world events.  We use
programming languages basically for the same reason we use our native
languages:  to provide a linguistic map of the territory.

This linguistic map represents our solution set to a real world
situation, our problem set.  The degree to which our solution set
matches the problem set determines its rationality, how closely our map
fits the territory.  In this instance we can only map the logical
processes, the data and the operators describing them.

We engage in an irrational act when we attempt to make the territory
fit the map.  Perhaps no better example of this exists than in the
post-1970 use of 'int' (binary integer only) and 'float' (real
arithmetic).  This disallows the broader occurrences in reality, in
fact in real computers, of fixed decimal integers (as well as binary)
and fixed decimal reals (as well as binary reals).  It disallows the
variable precision available for binary and decimal variables as well
as the choice of either binary or decimal for floating point.
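A small illustration of the resulting mismatch:  with only 'int' and
'float' available, C must fake fixed decimal arithmetic, here as
integer cents scaled by hand, where PL/I would declare the values FIXED
DECIMAL and let the compiler carry the scale.

    #include <stdio.h>

    int main(void)
    {
        long price = 1999;                  /* 19.99 held as cents */
        long tax   = price * 825 / 10000;   /* 8.25% tax, scaled by
                                               hand                 */

        printf("tax: %ld.%02ld\n", tax / 100, tax % 100);  /* 1.64 */
        return 0;
    }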

Fortunately at least one pre-1970 programming language, PL/I, has native
data types for all arithmetic and string variables and constants.  It
supports native fixed- and variable-length bit and character strings,
as well as native string operators.  In short PL/I provides a closer
match, i.e.  a better map, to machine architecture or assembly language
than that erroneously claimed for C.

In addition PL/I and APL support aggregate operands, e.g.  the ability
to add, subtract, multiply, divide, and, or, and compare (equal to,
less than, greater than, equal to or greater than, equal to or less
than, not equal) arrays and structures.  If we add the operator
richness of APL to PL/I and then to that mix the list aggregate and
operators of LISP, we can in this synthesis do anything with equal
ease, in terms of writing effort and expression, possible in any other
imperative language.
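For contrast, where PL/I writes the aggregate assignment A = B + C;
over whole arrays, C must unroll the same operation into an explicit
loop:

    #include <stdio.h>

    int main(void)
    {
        int b[4] = { 1, 2, 3, 4 };
        int c[4] = { 10, 20, 30, 40 };
        int a[4];
        int i;

        for (i = 0; i < 4; i++)    /* the loop PL/I's A = B + C; hides */
            a[i] = b[i] + c[i];

        for (i = 0; i < 4; i++)
            printf("%d ", a[i]);   /* prints 11 22 33 44 */
        printf("\n");
        return 0;
    }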

This synthesis allows our solution set, our map, to directly correspond
to events, data and operations, in the problem set, the territory.  No
other combination of languages can provide a better mapping solution, a
better match of the solution set to the problem set, of the map to the
territory, than this single one.

We build here on capabilities operating now for over 50 years.  Moving
them to a single syntax, a single form, a single language occurs
without any loss in translation.  If we use a simple syntax like
PL/I's, where every program element is a statement and every statement
ends in a semi-colon, we reduce the learning curve significantly.

  A Bigger Challenge: Matching Dynamics of Solution Set to Problem Set

Assuming that we have now provided for the best possible fit of our
solution set to our problem set, a challenge we have met, we now face an
even bigger challenge:  having our solution set match the dynamics of
our problem set.  Changes occur in the problem set.  We need to
implement them in the solution set.  Moreover we need to implement them
at a rate at least equal to their occurrence in the problem set.
Failure to do so means the creation of a backlog.

Historically the persistent presence of an increasing backlog has
brought more than one promise of a "silver bullet" to the forefront.
In fact the explicit assertion by its advocates that it would resolve
the backlog situation brought about the current emphasis on
object-oriented technology.  That it has failed to live up to its
promise, and in fact has even worsened the situation, has led to recent
efforts in extreme programming and agile modeling as well as a
consideration of aspect-oriented programming.

At its core, resolving the backlog situation, i.e.  sustaining a rate
of change in the solution set, the software, at least equal to that of
the problem set, lies in improving individual productivity.  As people
make up the primary cost, as well as the primary delay, in software
development and maintenance, the resolution lies in doing two things
concurrently:  (1) reducing the number of people necessary and (2)
minimizing the remaining people effort.

We can offer the following guideline for achieving this:  "Let people
do what software cannot and software what people need not."  This means
minimizing the amount of clerical work people do by shifting it to,
i.e.  automating it in, software.  We have already seen an example of
this in the earlier SDP comparison between the use of imperative and
declarative programming languages.

  Impediments to Increased Productivity

We have already discussed one:  the continued reliance on imperative
programming languages.  We need to shift to a greater use of declarative
languages based on logic programming.  However, we need to recognize
that we have imperative languages because machine architectures,
through their instruction sets, have an imperative basis.  Thus any
declarative language needs also to include the imperative within its
scope.  Otherwise as a specification language it cannot specify itself
down to the machine, i.e.  instruction set, level.

A second impediment lies in our reliance on file systems, on files and
directories, for source code storage and maintenance.  We should shift
to the use of a data repository/directory based on a database manager.
This allows a manufacturing approach to source maintenance in which we
separately store our raw materials, the source statements and source
data.  It also allows the software to maintain assemblies of
statements, and assemblies of assemblies, as ordered lists of names.

A third impediment lies in using multiple libraries instead of a single
source library, a single specification pool.  The use of a single source
library does not eliminate incompatibilities, but in conjunction with
logic programming identifies them.  That identification allows the user
the full range of choices as well as the possibility of modifying the
source to eliminate them.

  The Data Repository/Directory

The Data Repository/Directory uses a database management approach to
automate the creation, retrieval, and maintenance of source code, source
text, and source data.  While the user explicitly names source data
elements and aggregates, the software can provide a content-based name
for every source statement.  This means that it stores each source
statement separately.  It also means that every statement assembly
exists as an ordered list of names of other source statements or
assemblies.  Thus we never replicate the source itself, only its name.
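A minimal sketch of one way to realize such content-based names,
assuming a simple string hash stands in for the repository's naming
scheme:  each statement's name derives from its text, and an assembly
reduces to an ordered list of those names.

    #include <stdio.h>

    /* derive a content-based name from a statement's text
       (djb2 string hash used here purely as an illustration) */
    static unsigned long name_of(const char *stmt)
    {
        unsigned long h = 5381;
        while (*stmt != '\0')
            h = h * 33 + (unsigned char)*stmt++;
        return h;
    }

    int main(void)
    {
        const char *stmts[2] = { "a = b + c;", "call report(a);" };
        unsigned long assembly[2];   /* an assembly: ordered names only */
        int i;

        for (i = 0; i < 2; i++) {
            assembly[i] = name_of(stmts[i]);
            printf("%08lx -> %s\n", assembly[i], stmts[i]);
        }
        return 0;
    }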

This allows a "pure" manufacturing approach supporting a
bill-of-material explosion of any assembly of all lower level assemblies
and source statements.  It also supports a "where-used" capability for
source statements and assemblies.  This simplifies source change
management when a change to the source has a global, cross-program,
cross-application effect.  In conjunction with the suggested change to
the interpreter/compiler to allow production of multiple object
modules, we can synchronize all global effects of any change in a
single unit of work, i.e.  a single compile.

This makes reuse available from the statement level on up through all
higher level assemblies.  It does the same for data elements up through
all higher level aggregates.  This then allows examination of any use
of any statement or data item throughout all applications.

 In Summary

We need a way to make more people in the OS/2 community comfortable and
competent in contributing open source.  We have ways of assisting this
in the current environment as well as ways of working toward a more
ideal, more productive one.  We do not take the approach of dumbing
down, but rather one of reducing what needs mastering.  This reduction
comes down to a single specification/programming language covering the
entire range of imperative and declarative capabilities.  It comes down
to a single software tool written in that language, based on a data
repository/directory also written in that language.

That means simplifying the current software environment and its
multiplicities to oneness:  one language, one tool, one source.  This
offers more comprehensive support than available in the current
software environment along with increased productivity, as the user
must learn less and do less yet achieve as much or more.

The SCOUG Programming SIG has started on this path.  Obviously it has a
long way to go.  Just as obviously it welcomes anyone interested in
bringing this to fruition as early as possible.
