This is pgin.tcl/INTERNALS, notes on internal implementation of pgin.tcl.
Last updated for pgin.tcl-2.0b1 on 2003-10-30
The project home page is: http://gborg.postgresql.org/project/pgintcl/
-----------------------------------------------------------------------------
INTERNAL IMPLEMENTATION NOTES:

This information is provided for maintenance, test, and debugging.

A connection handle is just a Tcl socket channel. The application using
pgin.tcl must not read from or write to this channel.

Internal procedures, result structures, and other data are stored in a
namespace called "pgtcl". The following namespace variables apply to
all connections:

    pgtcl::debug      A debug flag, default 0 (no debugging)
    pgtcl::version    pgin.tcl version string
    pgtcl::rn         Result number counter
    pgtcl::fnoids     Function OID cache; see FAST-PATH FUNCTION CALLS
    pgtcl::errnames   Constant array of error message field names

The following arrays are indexed by connection handle, and contain data
applying only to that connection:

    pgtcl::nulls()    Value to return for NULL results
    pgtcl::notice()   Command to execute when receiving a Notice
    pgtcl::xstate()   Transaction state
    pgtcl::notify()   Notifications; see NOTIFICATIONS

Additional namespace variables are described in the sections below.
Result structure variables are described next.

-----------------------------------------------------------------------------
RESULT STRUCTURES:

A result structure is implemented as a variable result$N in the pgtcl
namespace, where N is an integer. (The value of N is stored in pgtcl::rn
and is incremented each time a new result structure is needed.) The result
handle is passed back to the caller as $N, just the integer. The result
structure is an array which stores all the meta-information about the
result as well as the result values.

The result structure array indexes in use are:

  Variables describing the overall result:
    result(conn)      The connection handle (the socket channel)
    result(nattr)     Number of attributes (columns)
    result(ntuple)    Number of tuples (rows)
    result(status)    PostgreSQL status code, e.g. PGRES_TUPLES_OK
    result(error)     Error message if status is PGRES_FATAL_ERROR
    result(complete)  Command completion status, e.g. "INSERT 10101"
    result(error,C)   Error message field C if status is PGRES_FATAL_ERROR
                      C is one of the codes for extended error message fields.

  Variables describing the attributes (columns) in the result:
    result(attrs)     A list of the name of each attribute
    result(types)     A list of the type OID for each attribute
    result(sizes)     A list of attribute byte lengths or -1 if variable
    result(modifs)    A list of the size modifier for each attribute
    result(formats)   A list of the data format for each attribute
    result(tbloids)   A list of the table OIDs for each attribute

  Variables storing the query result values:
    result($irow,$icol)  Data for tuple (row) $irow, attribute number $icol.
       (irow goes from 0 to result(ntuple)-1.
        icol goes from 0 to result(nattr)-1.)
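
As a rough illustration of the layout above, the following sketch builds
and walks a result array. It uses a hypothetical namespace "demo" and
hypothetical procedures (new_result, add_row, rows) so that it runs without
pgin.tcl or a database. It is not the pgin.tcl code itself.

```tcl
# Hypothetical sketch: "demo" stands in for the pgtcl namespace.
namespace eval demo {
    variable rn 0    ;# result number counter, playing the role of pgtcl::rn
}

# Allocate a new result array demo::result$N and return the handle N.
proc demo::new_result {conn attrs} {
    variable rn
    set n [incr rn]
    upvar #0 demo::result$n result
    array set result [list conn $conn attrs $attrs \
        nattr [llength $attrs] ntuple 0 status PGRES_TUPLES_OK]
    return $n
}

# Append one row of column values to the result with handle n.
proc demo::add_row {n values} {
    upvar #0 demo::result$n result
    set irow $result(ntuple)
    set icol 0
    foreach v $values {
        set result($irow,$icol) $v
        incr icol
    }
    incr result(ntuple)
}

# Return all rows as a list of lists by walking result($irow,$icol).
proc demo::rows {n} {
    upvar #0 demo::result$n result
    set out {}
    for {set irow 0} {$irow < $result(ntuple)} {incr irow} {
        set row {}
        for {set icol 0} {$icol < $result(nattr)} {incr icol} {
            lappend row $result($irow,$icol)
        }
        lappend out $row
    }
    return $out
}

set r [demo::new_result sock3 {id name}]
demo::add_row $r {1 alice}
demo::add_row $r {2 bob}
puts [demo::rows $r]
```

Only the integer handle is handed around; each procedure re-links to the
array with upvar, which mirrors the "result$N" naming described above.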

The pg_exec call creates a new result structure. The pg_result call
retrieves information from the result structure and also frees the result
structure with the -clear option.  The result structure innards are also
directly accessed by some other routines, such as pg_select and pg_execute.
Result structure arrays are unset (freed) by pg_result -clear, and any
left-over result structures associated with a connection handle are freed
when the connection handle is closed by pg_disconnect.
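
The clean-up of left-over result structures at disconnect could look
roughly like the sketch below. The namespace "demo" and the procedure
free_results are hypothetical names used for illustration; the actual
pgin.tcl code may do this differently.

```tcl
namespace eval demo {}

# Unset every demo::result$N array whose conn element matches $conn.
# Returns the number of result arrays that were freed.
proc demo::free_results {conn} {
    set freed 0
    foreach name [info vars ::demo::result*] {
        if {[info exists ${name}(conn)] && [set ${name}(conn)] eq $conn} {
            unset $name
            incr freed
        }
    }
    return $freed
}
```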

The entire result of a query is stored before anything else happens (that
is, before pg_exec and pg_exec_prepared return, and before pg_execute and
pg_select process the first row).  This is also true of libpq and libpgtcl
(in their synchronous mode), but the pure-Tcl code in pgin.tcl can be slower.

Extended error message fields are new with PostgreSQL-7.4. Individual parts
of a received error message are stored in the result array indexed by
(error,$c) where $c is the one-letter code used in the protocol. See the
pgin.tcl documentation for "pg_result -errorField" for more information.

-----------------------------------------------------------------------------
BUFFERING

PostgreSQL protocol version 3 (PostgreSQL-7.4) uses a message-based
protocol.  To read messages from the backend, pgin.tcl implements a
per-connection buffer using several Tcl variables in the pgtcl namespace.
The name of the connection handle (the socket name) is part of the variable
name, represented by $c below.

       pgtcl::buf_$c    The buffer holding a message from the backend.
       pgtcl::bufi_$c   Index of the next byte to be processed from buf_$c
       pgtcl::bufn_$c   Total number of bytes in the buffer buf_$c.

For example, if the connection handle is "sock3", the variables are
pgtcl::buf_sock3, pgtcl::bufi_sock3, and pgtcl::bufn_sock3.

A few tests determined that the fastest way to fetch data from the buffers
in Tcl was to use [string index] and [string range], although this might
not seem intuitive.
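
The following sketch shows one way a buffer of this shape can be read with
[string index] and [string range], converting bytes with [binary scan].
The namespace "demo" and the procedure names (buf_load, get_byte,
get_int32, get_bytes, remaining) are hypothetical and chosen for
illustration; they are not the names used inside pgin.tcl.

```tcl
namespace eval demo {}

# Load a complete message body into the per-connection buffer variables.
proc demo::buf_load {c data} {
    upvar #0 demo::buf_$c buf demo::bufi_$c bufi demo::bufn_$c bufn
    set buf $data
    set bufi 0
    set bufn [string length $data]
}

# Fetch one byte as an unsigned integer and advance the index.
proc demo::get_byte {c} {
    upvar #0 demo::buf_$c buf demo::bufi_$c bufi
    binary scan [string index $buf $bufi] c b
    incr bufi
    return [expr {$b & 0xff}]
}

# Fetch a signed 32-bit integer in network (big-endian) byte order.
proc demo::get_int32 {c} {
    upvar #0 demo::buf_$c buf demo::bufi_$c bufi
    binary scan [string range $buf $bufi [expr {$bufi + 3}]] I i
    incr bufi 4
    return $i
}

# Fetch the next n bytes as a string.
proc demo::get_bytes {c n} {
    upvar #0 demo::buf_$c buf demo::bufi_$c bufi
    set s [string range $buf $bufi [expr {$bufi + $n - 1}]]
    incr bufi $n
    return $s
}

# Number of bytes not yet consumed from the buffer.
proc demo::remaining {c} {
    upvar #0 demo::bufi_$c bufi demo::bufn_$c bufn
    return [expr {$bufn - $bufi}]
}

# Example: one byte, one int32, then three bytes of text.
demo::buf_load sock3 [binary format cIa3 200 258 abc]
puts [list [demo::get_byte sock3] [demo::get_int32 sock3] \
    [demo::get_bytes sock3 3] [demo::remaining sock3]]
```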

-----------------------------------------------------------------------------
PARAMETERS

The PostgreSQL backend can notify a front-end client about some parameters,
and pgin.tcl stores these in the following variable in the pgtcl namespace:

    pgtcl::param_$c    Array of parameter values, indexed by parameter name

where $c is the connection handle (socket name).

Access to these parameters is through the pg_parameter_status command,
a pgin.tcl extension.
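
A minimal sketch of this per-connection array follows. The namespace
"demo" and the procedures set_param and parameter_status are hypothetical
stand-ins, and returning an empty string for an unknown parameter is an
assumption of the sketch rather than documented pgin.tcl behavior.

```tcl
namespace eval demo {}

# Record a parameter value reported by the backend for connection $c.
proc demo::set_param {c name value} {
    upvar #0 demo::param_$c param
    set param($name) $value
}

# Return the stored value, or an empty string if none was reported
# (an assumption of this sketch).
proc demo::parameter_status {c name} {
    upvar #0 demo::param_$c param
    if {[info exists param($name)]} {
        return $param($name)
    }
    return ""
}

demo::set_param sock3 client_encoding UNICODE
puts [demo::parameter_status sock3 client_encoding]
```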

-----------------------------------------------------------------------------
PROTOCOL ISSUES

This version of pgin.tcl speaks only to a Protocol Version 3 PostgreSQL
backend (7.4 or later). There is one concession made to Version 2, and
that is reading an error message. If a Version 2 error message is read,
pgin.tcl will recognize it and pretend it got a Version 3 message. This
is for use during the connection stage, to allow it to fail with a
proper message if connecting to a Version 2-only backend.

-----------------------------------------------------------------------------
NOTIFICATIONS

An array pgtcl::notify keeps track of notifications you want. The array is
indexed as pgtcl::notify(connection,name) where connection is the
connection handle (socket name) and name is the parameter used in
pg_listen. The value of an array element is the command to execute on
notification. Note that no data is passed - just the fact that the
notification occurred.
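
The (connection,name) indexing can be sketched as follows. The namespace
"demo" and the procedures listen and dispatch are hypothetical. Treating
an empty command as "stop listening" and running the command at global
level are assumptions of this sketch.

```tcl
namespace eval demo {
    variable notify    ;# array indexed by (connection,name)
    array set notify {}
}

# Register a command for a notification name on a connection.
# An empty command removes the entry (assumption of this sketch).
proc demo::listen {conn name {cmd ""}} {
    variable notify
    if {$cmd eq ""} {
        unset -nocomplain notify($conn,$name)
    } else {
        set notify($conn,$name) $cmd
    }
}

# Called when a notification named $name arrives on $conn. Runs the
# registered command, if any, with no data appended. Returns 1 if a
# command was run and 0 otherwise.
proc demo::dispatch {conn name} {
    variable notify
    if {[info exists notify($conn,$name)]} {
        uplevel #0 $notify($conn,$name)
        return 1
    }
    return 0
}
```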

-----------------------------------------------------------------------------
LARGE OBJECTS

The large object calls are implemented using the PostgreSQL "fast-path"
function call interface (same as libpq). See the next section for more
information.

The pg_lo_creat command takes a mode argument. According to the PostgreSQL
libpq documentation, lo_creat should take "INV_READ", "INV_WRITE", or
"INV_READ|INV_WRITE".  (pgin.tcl accepts "r", "w", and "rw" as equivalent
to those respectively, but this is not compatible with libpgtcl.) It isn't
clear why you would ever create a large object with a mode other than
"INV_READ|INV_WRITE".

The pg_lo_open command also takes a mode argument. According to the
PostgreSQL libpq documentation, lo_open takes the same mode values as
lo_creat.  But in libpgtcl the pg_lo_open command takes "r", "w", or "rw"
for the mode, for some reason. pgin.tcl accepts either form for mode,
but to be compatible with libpgtcl you should use "r", "w", or "rw"
with pg_lo_open instead of INV_READ, INV_WRITE, or INV_READ|INV_WRITE.
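
Accepting both spellings of the mode can be sketched as below. The
procedure name lo_mode_bits is hypothetical, and the parsing approach is
an illustration, not the pgin.tcl implementation. The bit values are those
defined for INV_WRITE and INV_READ in the PostgreSQL header
libpq/libpq-fs.h.

```tcl
# Translate a large object mode string into the integer mode bits.
# Accepts "r", "w", "rw", or INV_READ / INV_WRITE joined with "|".
proc lo_mode_bits {mode} {
    array set bits {
        INV_WRITE 0x20000
        INV_READ  0x40000
        w         0x20000
        r         0x40000
        rw        0x60000
    }
    set result 0
    foreach part [split $mode |] {
        set part [string trim $part]
        if {![info exists bits($part)]} {
            error "invalid large object mode \"$mode\""
        }
        set result [expr {$result | $bits($part)}]
    }
    return $result
}

puts [format 0x%x [lo_mode_bits "INV_READ|INV_WRITE"]]
```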


-----------------------------------------------------------------------------
FAST-PATH FUNCTION CALLS

Access to the PostgreSQL "Fast-path function call" interface is available
in pgin.tcl. This was written to implement the large object calls, and
general use is discouraged. See the libpq documentation for more details on
what this interface is and how to use it.

It is expected that the Fast-path function call interface in PostgreSQL
will be deprecated in favor of using the Extended Protocol to do
separate Prepare, Bind, and Execute steps. See PREPARE/BIND/EXECUTE.

Internally, backend functions are called by their PostgreSQL OID, but
pgin.tcl handles the mapping of function name to OID for you.  The
fast-path function interface in pgin.tcl uses an array pgtcl::fnoids to
cache object IDs of the PostgreSQL functions.  One instance of this array
is shared among all connections, under the assumption that these OIDs are
common to all databases. (It is possible that if you have simultaneous
connections to multiple database servers running different versions of
PostgreSQL this could break.) The index to pgtcl::fnoids is the name
of the function, or the function plus argument type list, as supplied
to the pgin.tcl fast-path function call commands. The value of each
array index is the OID of the function.
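
The caching behavior described above amounts to a simple memoized lookup,
sketched here. The namespace "demo", the procedures get_fnoid and
lookup_oid, and the OID values are all hypothetical. In particular,
lookup_oid is a stand-in for whatever query the real code sends to the
backend.

```tcl
namespace eval demo {
    variable fnoids     ;# cache: function name (or name plus types) -> OID
    array set fnoids {}
    variable lookups 0  ;# counts simulated backend lookups
}

# Stand-in for the backend lookup. The OIDs here are made up.
proc demo::lookup_oid {fname} {
    variable lookups
    incr lookups
    array set fake {version 9001 length(text) 9002}
    if {![info exists fake($fname)]} {
        error "no such function: $fname"
    }
    return $fake($fname)
}

# Return the OID for fname, asking the backend only on the first call.
proc demo::get_fnoid {fname} {
    variable fnoids
    if {![info exists fnoids($fname)]} {
        set fnoids($fname) [lookup_oid $fname]
    }
    return $fnoids($fname)
}
```

Because the array is keyed only by the function name, and not by
connection, a second connection asking for the same name gets the cached
OID, which is the sharing assumption noted above.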

PostgreSQL supports overloaded functions (same name, different number
and/or types of arguments). You can call overloaded functions with pgin.tcl by
specifying the argument type list after the function name. See examples
below. You must specify the argument list exactly like psql "\df" does - as
a list of correct type names, separated by a single comma and space. There
is currently no provision to distinguish functions by their return type. It
doesn't seem like there are any PostgreSQL functions which differ only by
return type.

Before PostgreSQL-7.4, certain errors in fast-path calls (such as supplying
the wrong number of arguments to the backend function) would cause the
back-end and front-end to lose synchronization, and the channel would be
closed. This was true of libpq as well.  This has been fixed with the
new protocol in PostgreSQL-7.4.


Commands:

   pg_callfn $db "fname" result "arginfo" arg...

     Call a PostgreSQL backend function and store the result.
     Returns the size of the result in bytes.

     Parameters:

       $db is the connection handle.

       "fname" is the PostgreSQL function name. This is either a simple
       name, like "encode", or a name followed by a parenthesized
       argument type list, like "like(text, text)". The second form
       is needed to specify which of several overloaded functions you want
       to call.

       "result" is the name of a variable where the value returned by the
       PostgreSQL backend function is to be stored. The number of bytes
       stored in "result" is returned as the value of pg_callfn.

       "arginfo" is a list of argument descriptors. Each list element is
       one of the following:
           I    An integer32 argument is expected.
           S    A Tcl string argument is expected. The length of the
                string is used (remember Tcl strings can contain null bytes).
           n (an integer > 0)
                A Tcl string argument is expected, and exactly this many
                bytes of the string argument are passed (padding with null
                bytes if needed).

       arg...   Zero or more arguments to the PostgreSQL function follow.
                The number of arguments must match the number of elements
                in the "arginfo" list. The values are passed to the backend
                function according to the corresponding descriptor in
                "arginfo".

  For PostgreSQL backend functions which return a single integer32 value,
  the following simplified interface is available:

   pg_callfn_int $db "fname" "arginfo" arg...
      
       The db, fname, arginfo, and other arguments are the same as
       for pg_callfn. The return value from pg_callfn_int is the
       integer32 value returned by the PostgreSQL backend function.

Examples:
    Note: These examples demonstrate the command, but in both of these
    cases you would be better off using an SQL query instead.

       set n [pg_callfn $db version result ""]
    This calls the backend function version() and stores the return
    value in $result and the result length in $n. 

       pg_callfn $db encode result {S S} $str base64
    This calls the backend function encode($str, "base64") with 2
    string arguments and stores the result in $result.

       pg_callfn_int $db length(text) S "This is a test"
    This calls the backend function length("This is a test"). Because
    there are multiple functions called length(), the argument type
    list "(text)" must be given after the function name. The length
    of the string (14) is returned by the function.
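
To make the "arginfo" descriptors described above more concrete, the
sketch below converts each argument to the bytes its descriptor calls for.
The procedure names encode_arg and encode_args are hypothetical, and the
sketch covers only the per-argument conversion, not the protocol message
that carries the arguments. It also assumes string arguments already hold
the bytes to be sent.

```tcl
# Convert one argument to bytes according to its arginfo descriptor.
proc encode_arg {descriptor value} {
    switch -regexp -- $descriptor {
        {^I$} {
            # 32-bit integer in network byte order
            return [binary format I $value]
        }
        {^S$} {
            # Whole string; its own length is used
            return $value
        }
        {^[1-9][0-9]*$} {
            # Exactly this many bytes, padded with null bytes if needed
            return [binary format a$descriptor $value]
        }
        default {
            error "bad arginfo descriptor \"$descriptor\""
        }
    }
}

# Convert all arguments; the counts must match.
proc encode_args {arginfo args} {
    if {[llength $arginfo] != [llength $args]} {
        error "number of arguments does not match arginfo"
    }
    set out {}
    foreach d $arginfo v $args {
        lappend out [encode_arg $d $v]
    }
    return $out
}

# Example: lengths in bytes of an I, an S, and a fixed-size (5) argument.
foreach a [encode_args {I S 5} 258 hello abc] {
    puts [string length $a]
}
```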

-----------------------------------------------------------------------------
PREPARE/BIND/EXECUTE

Starting with PostgreSQL-7.4, access to separate Prepare, Bind, and Execute
steps is provided by the protocol. pgin.tcl currently provides partial
support for this with pg_exec_prepared. This allows execution of a prepared
SQL statement after binding parameters. It supports binary data parameters,
and is the only way besides Fast-path calls to pass binary data to the
database server. It also supports returning binary data to the client,
like using binary cursors but with control over the format for each result
column.

There is no support for the protocol-level statement preparation; SQL
statements must be prepared using the SQL "PREPARE" command. The main
reason for this is that the protocol-level PREPARE requires the client
to translate parameter types to OIDs, which is undesirable. The
SQL PREPARE command accepts the usual data type names.

-----------------------------------------------------------------------------
MD5 AUTHENTICATION

MD5 authentication was added in PostgreSQL-7.2. This is a
challenge/response protocol which avoids having clear-text passwords passed
over the network. To activate this, the PostgreSQL administrator puts "md5"
in the pg_hba.conf file instead of "password". pgin.tcl supports this
transparently; that is, if the backend requests MD5 authentication during
the connection, pg_connect will use this protocol. The MD5 implementation
was coded by the original author of pgin.tcl. It does not use the tcllib
implementation, which is significantly faster but much more complex.
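
The response sent for an MD5 challenge is defined by the PostgreSQL
protocol as the string "md5" followed by md5(md5(password + username) +
salt), with each digest written as lower-case hexadecimal. The sketch
below shows only that composition. The procedure md5_auth_response is a
hypothetical name, and the hash is passed in as a command prefix so the
sketch does not depend on any particular MD5 implementation. This is not
the pgin.tcl code.

```tcl
# Build the MD5 authentication response.
#   hashcmd  - command prefix returning the lower-case hex MD5 digest
#              of its single argument (supplied by the caller)
#   salt     - the salt bytes sent by the backend in its request
proc md5_auth_response {hashcmd user password salt} {
    set inner [uplevel #0 [linsert $hashcmd end $password$user]]
    set outer [uplevel #0 [linsert $hashcmd end $inner$salt]]
    return "md5$outer"
}

# Demonstration with a fake "hash" that just brackets its input, which
# makes the order of concatenation visible. A real caller would pass a
# genuine MD5 command instead.
proc fakehash {s} {
    return "<$s>"
}
puts [md5_auth_response fakehash alice secret SALT]
```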

-----------------------------------------------------------------------------
