Thursday, June 20, 2013

The truth about SAP ABAP cluster tables




Hi folks,
Todays argument is cluster tables.
In the internet you find mainly a lot of confusion, starting from the sap help portal which states:
"Table Pool: A table in the database in which all records from the pooled tables assigned to the table pool are stored corresponds to a table pool."
And "Cluster TABLE: Several logical data records from different cluster tables can be stored together in one physical record in a table cluster.

LOL: It might be my poor english but imho it is more or less like saying a table cluster is a cluster of tables and a table pool is a pool of tables.

I'll try to stick with the cluster tables.

To keep it simple assume you are a bookkeeper for a veterinarian  ( the doctor who cares for the health of animals ) suppose working with dogs only and you have to keep track of his patients data .

You can have several booklets (tables) where you write different things:
  • One book for the money paid by each person for its dog (key dog name, chip-id)
  • One book where you write name and adrress of the person the dog belongs to (key dog name, chip-id)
  • One book where you write the illnesses (key dog name, chip-id)
  • One book for the medicines (key dog name, chip-id)
  • One book to keep track the individual visits (key dog name, chip-id)

An alternative could be to have a single book where you enter for the key (dog name,chip-id) all the data you need, placing sequentially after the name / chip id tag the different data related to the single dog.

In the first case you have different places ( transparent tables ) where you put data with the same key.
In the second case (cluster table) you have a single table where under a single key you save several different data stored. Even tables structures etc.

Technically speacking, from a programming point of view, and logically it is quite a big difference.

One might say that it is quite confortable to have all data available for a given key without selecting from many tables.
On the other hand, once one knows that the cluster table cannot be (SQL) selected on other then using the main key ( dog-chip id ) because it is compressed and saved in a row compressed (binary) format, one realizes that informations like "dogs sick this month" "medicines prescribed this month" etc. are not really accessible at a glance.
Maybe this is the reason why normal tables are called "transparent" (i guess)... the cluster tables could well be called "obscured tables" or "dark tables" because you do not really know what is inside, and if you do not really know what is inside ( i mean the abap structure definitions for the import export statements the exact sequence of the fields and tables in the export) you have really few chances of extracting this data.

someone uses this as an argument pro-cluster: the data is compressed and not readable at the database level. This means security for them.
Someone else who faced an upgrade where a ddic data structure changed faced the dark side of the security of cluster tables: corrupt data that cannot be retrieved. Let alone the case where someone switched unicode on on a system... i dont' dare thinking of what happens. this is because the data is saved in a binary form in the cluster. if the size of the ddic structures you read from the cluster changes and the cluster is not converted before you read corrupt data.

it is like writing a 16 bit integer ( to keep it simple ) and later reading 32 bit. You are likely to read 16 correct bits and 16 bits of junk.

To be more precise a cluster table contains a compressed set of binary data which reflect the internal representation of the corresponding abap data structures at the time of writing the cluster.

If the binary structure of this binary representation changes later on for whatever reason ( change in the programming language, endianess of the machine, unicode on/off, different codepages, structure boundaries,  new fields in table of in the export statements etc etc) you get corrupt data.

The impression is that as soon as SAP realizes the problems that arise they go for a solution at least they did in the past. For example now there is a check on the data type stored in the cluster. probably now they implemented conversion mechanism in case something changes. Whenever they change something in the cluster tables contents they release xpra programs to massively update cluster contents.

So there are many possible pitfalls when using cluster. And many attentions to use in practice.
Ill'come back to this topics in later posts.

Now let's try a definition:

Cluster tables are database tables that combine the data from several tables with identical keys into one physical record on the database.

    Data is written to a database in compressed form by means of special commands of the abap programming language.
    Retrieval of data is very fast (your mileage might vary).
    Cluster tables are defined in the data dictionary as transparent tables.
    External programs can NOT interpret the data in a cluster table.
    Special language elements EXPORT TO DATABASE, IMPORT TO DATABASE and DELETE  FROM DATABASE are used to process data in the cluster tables.

In HR there are well know cluster tables:

PCL1 - Database for HR work area;
PCL2 - Accounting Results (time, payroll);
PCL3 - Applicant tracking data
PCL4 - Documents  (change logs, etc)

Here is a typical cluster table ( in the abap dictionary )


Field    Data Element    Data Type    length    Checktable    Description
MANDT    MANDT    CLNT    000003    *    Client
RELID    INDX_RELID    CHAR    000002        Area
SRTFD    INDX_SRTFD    CHAR    000022        ID
SRTF2    INDX_SRTF2    INT4    000010        Counter
LOEKZ    SYCHAR01    CHAR    000001       
SPERR    SYCHAR01    CHAR    000001       
AEDAT    SYDATS    DATS    000008       
USERA    USERNAME    CHAR    000012        User name
PGMID    PROGNAME    CHAR    000040        Program Name
BEGDT    SYDATS    DATS    000008       
ENDDT    SYDATS    DATS    000008       
CLUSTR    INDX_CLSTR    INT2    000005        Data length
CLUSTD    INDX_CLUST    LRAW    002886        Data

Your key is saved typically in the srtfd field
Your data is saved in the "raw field" CLUSTD.
If the data is too much for a single raw field the abap language automatically creates as many lines with the same key (incrementing field srtf2) as needed.
Relid represent is the main key. In one cluster table you can have several different data clusters saved.

For example in table PCL1 you have cluster RELID = TE and cluster RELID = PC. This means that in the same table SAP stores data for travel exenses under the main key "TE" and personal calendar data under the index "PC".

I understand that cluster tables are a complex topic. Instead of trying to read write data from a new relid in the indx table we have tried understanding the real implications of creating a cluster table and the risks involved. SAP has troops of developers to take care of their cluster tables. In a company with custom developments based on a cluster the situation might be different. The developers might not be aware of the pitfalls and changes in systems settings/upgrades/SPs might cause unwanted devastating side effects on your data in the cluster.

Conclusion: I would suggest that unless you are not really sure of what you are doing and you know how to handle its maintainance during the years i would discard the idea of creating a new cluster table and i would go for a transparent database table design.