[The link bar feature is not available in this web]

Portland Chapter

Data Administration Management Association
P.O. Box 2343
Portland, OR 97208-2343
www.dama-pdx.org

Board Members
  
President 
Wayne Little

VP Membership
Krishna Moorthy

VP Education
James Lee

VP Communication
Geof Clark

VP Financial Services
Nancy Hallock

VP Online Services 
Atul Borkar

VP Secretarial Services

Don Rognlien

Past President
Hanteng Dai

Thursday, October 18th 2007 Meeting

"What's Wrong with ER Modeling?"

Presented by Dr. Gordon C. Everest

Time:  Registration/Refreshments -   8:30 - 9:00 a.m.
           Presentation                      -   9:00 - 11:00 a.m.
           Announcements                 - 11:00 - 11:30 a.m.

Location: 
    Standard Insurance Auditorium
                

Entity Relationship (ER) data modeling goes back to a paper by Peter Chen in 1976.  He proposed a data model diagraming scheme which would transcend thinking about physical records, by focusing on the entities and their interrelationships in the users' real world domain of interest being modeled in a database.  When relational DBMSs came on the scene, ER Diagraming was a good and natural scheme for designing relational databases.  Today, ER Diagrams and Relational DBMSs are widely used throughout the world in the development of computer-based information systems.

Data modeling is the foundation of information systems development - if you don't get the database design "right" then the systems you build will be like a house of cards, collapsing under the weight of inevitable future forces for revision, enhancement, integration, and quality improvement.  Therefore, we need a scheme to guide our data modeling efforts which helps produce data models that are a clear and accurate representation of the users' domain of discourse, and which facilitate human communication, understanding, and validation.

Many popular data modeling schemes and supporting design (CASE) tools in use today are essentially variations on ER Diagrams, including Teorey's Extended ER, Finkelstein's Information Engineering (IE, and adapted by Martin), Appleton's IDEF1X used by the U. S. Government, Barker in Oracle Designer*2000, PowerDesigner from Sybase, ERwin now from Computer Associates, Kroenke's Semantic Object Model (SOM), and even UML Class Diagrams.  This talk could equally be entitled, "What's Wrong with them?"

To answer the "What's Wrong..." question, we need to begin by asking:


- WHAT are we trying to do in Data Modeling? 
- WHY do we do data modeling? - the purpose and objectives.
- HOW do we do data modeling? - what drives or guides the process?
- By what criteria do we judge or evaluate a data modeling scheme?
- What is the essential, distinguishing characteristic which underlies all ER modeling schemes?
- How well does ER modeling satisfy the criteria for a good data modeling scheme?

Think about these questions.  Then we will try to come to some consensus before launching into a discussion of what's wrong with ER modeling, backed up with several examples.  Having convinced you that there is a problem, we will point you in the direction of a solution -- a better way of doing data modeling.

Also in preparation for this discussion, please do this simple, little exercise.  Given the following ER Diagram:

                         ENTITY
                        ---------------------------
                        |   X   |  A   |  B   | ...
                        ---------------------------

* What does a person understand about a data model when presented with an
ER diagram?
First of all, seeing an entity box labeled with the name of an entity  type, one generally presumes there are attributes contained within the box.  So it is reasonable to presume that one is really thinking about an entity type (as shown above) having an identifier, say X, and described by some attributes, A, B, etc.
    - What semantics are conveyed or presumed by this single-entity schema diagram?  by this record structure?  Assume you have ALL the information about X, A, and B, and it is ALL shown in this ER diagram.
    - It is helpful to frame the response as follows: 
    What does the diagram say (or presume)...
        * about X alone?
        * about A alone(and similarly about B, etc.)?
        * about the relationship between X and A (and similarly X-B)?
        * about the relationship between A and B?
The response should be in terms of: existence, attributes, relationships, and the characteristics of dependency/optionality (mandatory or NULLS ALLOWED), and multiplicity/?exclusivity (1:1, 1:many, M:N).  The response needs to go beyond the obvious and state all the semantics implied by this little diagram.  No counting observations which are true by construction, e.g., column names are unique, column ordering is immaterial, or attribute values may change over time.  Use only positive statements, that is, do NOT express what semantics are NOT implied by the diagram (there are hundreds of things you could say that the diagram does not say!).  Also, state only what is, not what semantics could be implied by the diagram.  You should be able to come up with at least 15 distinct semantic statements implied by the above single-entity schema diagram -- number your statements. 
 

As a start, (1) there exists a domain of values called X .
Furthermore, what does the diagram say about normalization, the Achilles heel of data modelers?  Is it normalized?  How do you know?

This talk argues that the distinction between entities and attributes is artificial, hence troublesome for the data modeler, and ultimately unnecessary for conceptual data modeling.  Prematurely clustering attributes into entity records gets the data modeler into
trouble.  Too much clustering can produce a data model with storage redundancies and processing anomalies.  Normalization is the test which helps to identify the problems stemming from incorrect clustering, and record decomposition is always the remedy to correct a violation of one of the normal forms.  Unfortunately, normalization is based upon semantics which are not explicitly represented in ER diagrams.

Data modeling schemes which use two basic constructs do not require the data modeler to a priori determine whether a piece of data represents an entity or an attribute.   They do not require the data modeler to cluster attributes into records.  Hence, normalization becomes unnecessary and irrelevant.  Clustering attributes into records should not be done in developing the conceptual data model.  Conceptual data modeling is done at a stage before "logical" data modeling.

This talk describes some problems that arise with a data modeling scheme, such as ER Diagrams, which uses three basic constructs (entity, attribute, relationship) and implicitly or explicitly clusters attributes into entity records.  Hopefully such a discussion will motivate the reader to explore and seriously consider using a data modeling scheme with two basic constructs, such as Object-Role Modeling (ORM).

Dr. Gordon C Everest is Professor Emeritus of MIS and Data Management in the Carlson School of Management at the University of Minnesota (but continues to teach as an adjunct at Univ. of Minnesota, Metropolitan State University, and others).  His Ph.D. dissertation from the University of Pennsylvania Wharton School entitled "Managing Corporate Data Resources" became the textbook entitled:  "Database Management:  Objectives, System Functions, and Administration" (McGraw-Hill, 1986, and remained in print until 2002!). He is also a contributing author of the CODASYL Systems Committee technical report entitled: "A Framework for Distributed Database Systems: Distribution Alternatives and Generic Architectures", and of the final technical report of the ANSI ASC X3 SPARC DBSSG Object-Oriented DBMS Task Group, released in 1991.  He participated in the ANSI standards community on developing a Common Unified Data Modeling Scheme and investigating Object-Oriented Database Management Systems. 

    Gordon has been teaching all about databases, database management systems, database administration, and data warehousing since he joined the University in 1970. Students learn the theory of databases, gain practical experience with real data modeling projects, and with hands-on use of data modeling tools and DBMSs.  Besides teaching about databases, he has helped many organizations and government agencies design their databases.  His approach transfers expertise to professional data architects within those organizations by having them participate in and observe the conduct of database design project meetings with the subject matter experts.

    Gordon's lecturing and research interests include advanced conceptual data modeling, logical database design methods and diagraming conventions, high-level data languages, selection and use of database management systems, object-oriented databases, data warehousing, metadata acquisition and management, CASE tools and the repository, data-centered systems development, client-server architectures, organization and functions of database administration, data privacy and security, and the legal aspects of computing.

[ Back ]

[The link bar feature is not available in this web]
Site contents copyright ? 2006 DAMA Portland Metro Chapter.