Sunday, February 24, 2008

SQL
A subset of SQL's statements forms a data definition language (DDL). These SQL statements define the structure of a database, including rows, columns, tables, indexes, and database specifics such as file locations. DDL statements are closely tied to the DBMS, and they vary considerably between SQL dialects. DDL SQL commands include the following:


CREATE statements
Create - To make a new database, table, index, or stored query. A CREATE statement in SQL creates an object inside a relational database management system (RDBMS). The types of objects that can be created depend on which RDBMS is being used, but most support the creation of tables, indexes, users, and databases. Some systems (such as PostgreSQL) allow CREATE, and other DDL commands, inside a transaction, so they may be rolled back.
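Transactional DDL can be demonstrated with a short sketch. This uses SQLite through Python's sqlite3 module (which, like PostgreSQL, allows DDL inside a transaction); the table name is invented for illustration.

```python
import sqlite3

# Sketch: a CREATE statement executed inside an explicit transaction
# and then rolled back, assuming SQLite's transactional DDL.
conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # take manual control of transactions

def table_names(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    return [r[0] for r in rows]

conn.execute("BEGIN")
conn.execute("CREATE TABLE audit_log (id INTEGER PRIMARY KEY, msg TEXT)")
inside = table_names(conn)   # the new table is visible inside the transaction
conn.execute("ROLLBACK")
after = table_names(conn)    # the CREATE has been undone

print(inside)  # ['audit_log']
print(after)   # []
```

Not every RDBMS behaves this way: in systems without transactional DDL, the CREATE would commit implicitly and the rollback would have no effect.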

CREATE TABLE
Perhaps the most common CREATE command is the CREATE TABLE command. The typical usage is:

CREATE [TEMP[ORARY]] TABLE [table name] ( [column definitions] ) [table parameters].

Column Definitions: A comma-separated list consisting of any of the following

Column definition: [column name] [data type] {NULL | NOT NULL} {column options}
Primary key definition: PRIMARY KEY ( [comma separated column list] )
Constraints: {CONSTRAINT} [constraint definition]
RDBMS specific functionality
For example, the command to create a table named employees with a few sample columns would be:

CREATE TABLE employees (
id INTEGER PRIMARY KEY,
first_name CHAR(50) NULL,
last_name CHAR(75) NOT NULL,
date_of_birth DATE NULL
);

DROP statements
Drop - To destroy an existing database, table, index, or view.

A DROP statement in SQL removes an object from a relational database management system (RDBMS). The types of objects that can be dropped depend on which RDBMS is being used, but most support the dropping of tables, users, and databases. Some systems (such as PostgreSQL) allow DROP and other DDL commands to occur inside a transaction and thus be rolled back.

The typical usage is simply DROP objecttype objectname. For example, the command to drop a table named employees would be:

DROP TABLE employees;
The DROP statement is distinct from the DELETE statement. A DELETE statement removes some (or all) rows from a table, whereas a DROP statement removes the entire table from the database.
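The DROP-versus-DELETE distinction can be sketched with an in-memory SQLite database via Python's sqlite3 module; the table and data are illustrative only.

```python
import sqlite3

# Sketch: DELETE empties a table but keeps it; DROP removes the table itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, "Smith"), (2, "Jones")])

# DELETE removes rows; the (now empty) table still exists.
conn.execute("DELETE FROM employees")
n_rows = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# DROP removes the table itself; a further SELECT on it would fail.
conn.execute("DROP TABLE employees")
n_tables = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name = 'employees'").fetchone()[0]

print(n_rows, n_tables)  # 0 0
```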


ALTER statements
Alter - To modify an existing database object.

An ALTER statement in SQL changes the properties of an object inside a relational database management system (RDBMS). The types of objects that can be altered depend on which RDBMS is being used.

The typical usage is ALTER objecttype objectname parameters. For example, the command to add a column named bubbles to an existing table named sink would be:

ALTER TABLE sink ADD bubbles INTEGER;
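The effect of that ALTER can be verified with a small sketch, again using SQLite through Python's sqlite3 module; the sink/bubbles names come from the example above.

```python
import sqlite3

# Sketch: run the ALTER TABLE ... ADD example and inspect the result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sink (id INTEGER PRIMARY KEY)")
conn.execute("ALTER TABLE sink ADD bubbles INTEGER")

# PRAGMA table_info lists one row per column of the altered table.
columns = [row[1] for row in conn.execute("PRAGMA table_info(sink)")]
print(columns)  # ['id', 'bubbles']
```

Note that the set of ALTER operations supported differs widely between systems; adding a column is among the most portable.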

Online transaction processing

Online transaction processing, or OLTP, refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. The term is somewhat ambiguous; some understand a "transaction" in the context of computer or database transactions, while others (such as the Transaction Processing Performance Council) define it in terms of business or commercial transactions (see also Transaction data).[1] OLTP has also been used to refer to processing in which the system responds immediately to user requests. An automatic teller machine (ATM) for a bank is an example of a commercial transaction processing application.
The technology is used in a number of industries, including banking, airlines, mail order, supermarkets, and manufacturing. Applications include electronic banking, order processing, employee time clock systems, e-commerce, and eTrading. The most widely used OLTP system is probably IBM's CICS.
Requirements
Online transaction processing increasingly requires support for transactions that span a network and may include more than one company. For this reason, new OLTP software uses client/server processing and brokering software that allows transactions to run on different computer platforms in a network.
In large applications, efficient OLTP may depend on sophisticated transaction management software (such as CICS) and/or database optimization tactics to facilitate the processing of large numbers of concurrent updates to an OLTP-oriented database.
For even more demanding decentralized database systems, OLTP brokering programs can distribute transaction processing among multiple computers on a network. OLTP is often integrated into service-oriented architecture and Web services.
Benefits
Online Transaction Processing has two key benefits: simplicity and efficiency.
Reduced paper trails and faster, more accurate forecasts of revenues and expenses are both examples of how OLTP makes things simpler for businesses. Its timely updating also provides a concrete foundation for a stable organization. Another simplicity factor is that it allows consumers to choose how they want to pay, making transactions that much more enticing.
OLTP has proven efficient because it vastly broadens an organization's consumer base, speeds up individual processes, and is available 24/7.
Disadvantages
OLTP is a great tool for any organization, but there are a few things to be wary of: security issues and economic costs.
One of the benefits of OLTP is also an attribute to a potential problem. The worldwide availability that this system provides to companies makes their databases that much more susceptible to intruders and hackers.
For B2B transactions, businesses must go offline to complete certain steps of an individual process, causing buyers and suppliers to miss out on some of the efficiency benefits that the system provides. As simple as OLTP is, even a small disruption in the system can cause a great deal of problems, wasting both time and money. Another economic cost is the potential for server failures, which can cause delays or even wipe out large amounts of data.

Online analytical processing

Online Analytical Processing, or OLAP (IPA: /ˈoʊlæp/), is an approach to quickly provide answers to analytical queries that are multidimensional in nature. OLAP is part of the broader category business intelligence, which also encompasses relational reporting and data mining. The typical applications of OLAP are in business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing).
Databases configured for OLAP employ a multidimensional data model, allowing for complex analytical and ad-hoc queries with a rapid execution time. They borrow aspects of navigational databases and hierarchical databases that are speedier than their relational kin.
Nigel Pendse has suggested that an alternative and perhaps more descriptive term to describe the concept of OLAP is Fast Analysis of Shared Multidimensional Information (FASMI).
The output of an OLAP query is typically displayed in a matrix (or pivot) format. The dimensions form the rows and columns of the matrix; the measures form the values.
Functionality
At the core of any OLAP system is the concept of an OLAP cube (also called a multidimensional cube or a hypercube). It consists of numeric facts called measures, which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table, and dimensions are derived from the dimension tables.
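A minimal star-schema sketch makes the fact/dimension split concrete: a fact table holding the measure, joined to a dimension table and rolled up along one of its attributes. This uses SQLite via Python's sqlite3 module, and all table and column names are invented for illustration.

```python
import sqlite3

# Sketch: a fact table of sales measures joined to a product dimension,
# then aggregated to the 'category' level of that dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Food'), (2, 'Food'), (3, 'Toys');
INSERT INTO fact_sales  VALUES (1, 10.0), (2, 5.0), (3, 7.5), (1, 2.5);
""")

# Roll the 'amount' measure up along the dimension hierarchy.
totals = dict(conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d USING (product_id)
    GROUP BY d.category
""").fetchall())
print(totals)  # {'Food': 17.5, 'Toys': 7.5}
```

A real cube would pre-compute many such groupings across several dimensions; this shows a single slice of that idea.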
Types
1. Multidimensional
MOLAP
MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP. MOLAP uses database structures that are generally optimal for attributes such as time period, location, product or account code. The way that each dimension will be aggregated is defined in advance by one or more hierarchies.
2. Relational
ROLAP
ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables, and new tables are created to hold the aggregated information. ROLAP depends on a specialized schema design.
3. Hybrid
HOLAP
There is no clear agreement across the industry as to what constitutes "Hybrid OLAP", except that a database will divide data between relational and specialized storage. For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data, and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed data.
Comparison
Each type has certain benefits, although there is disagreement about the specifics of the benefits between providers.
Some MOLAP implementations are prone to database explosion. Database explosion is a phenomenon causing vast amounts of storage space to be used by MOLAP databases when certain common conditions are met: a high number of dimensions, pre-calculated results, and sparse multidimensional data. The typical mitigation technique for database explosion is not to materialize all the possible aggregations, but only an optimal subset of aggregations based on the desired performance vs. storage trade-off.
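A back-of-the-envelope sketch shows why full pre-aggregation explodes: a cube with d dimensions has 2**d possible groupings (cuboids) that could be materialized, before even counting hierarchy levels within each dimension. The dimension names below are illustrative.

```python
from itertools import combinations

# Sketch: enumerate every possible grouping (cuboid) of a 4-dimension cube.
dims = ["time", "location", "product", "account"]
cuboids = [combo for r in range(len(dims) + 1)
           for combo in combinations(dims, r)]

# 2**4 = 16 cuboids already; real cubes multiply further per hierarchy
# level, which is why engines materialize only a chosen subset.
print(len(cuboids))  # 16
```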
MOLAP generally delivers better performance due to specialized indexing and storage optimizations. MOLAP also needs less storage space compared to ROLAP because the specialized storage typically includes compression techniques.
ROLAP is generally more scalable. However, large volume pre-processing is difficult to implement efficiently so it is frequently skipped. ROLAP query performance can therefore suffer.
Since ROLAP relies more on the database to perform calculations, it has more limitations in the specialized functions it can use.
HOLAP encompasses a range of solutions that attempt to mix the best of ROLAP and MOLAP. It can generally pre-process quickly, scale well, and offer good function support.
Other types
The following acronyms are also sometimes used, although they are not as widespread as the ones above:
WOLAP - Web-based OLAP
DOLAP - Desktop OLAP
RTOLAP - Real-Time OLAP
SOLAP - Spatial OLAP


Data mart
A data mart (DM) is a specialized version of a data warehouse (DW). Like data warehouses, data marts contain a snapshot of operational data that helps business people to strategize based on analyses of past trends and experiences. The key difference is that the creation of a data mart is predicated on a specific, predefined need for a certain grouping and configuration of select data. A data mart configuration emphasizes easy access to relevant information.
In practice, the terms data mart and data warehouse each tend to imply the presence of the other in some form. However, most writers using the term seem to agree that the design of a data mart tends to start from an analysis of user needs and that a data warehouse tends to start from an analysis of what data already exists and how it can be collected in such a way that the data can later be used. A data warehouse is a central aggregation of data (which can be distributed physically); a data mart is a data repository that may or may not derive from a data warehouse and that emphasizes ease of access and usability for a particular designed purpose. In general, a data warehouse tends to be a strategic but somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need.
One writer, Marc Demerest, suggests combining the ideas into a Universal Data Architecture (UDA). In practice, many products and companies offering data warehouse services also tend to offer data mart capabilities or services.
There can be multiple data marts inside a single corporation; each one relevant to one or more business units for which it was designed. DMs may or may not be dependent or related to other data marts in a single corporation. If the data marts are designed using conformed facts and dimensions, then they will be related. In some deployments, each department or business unit is considered the owner of its data mart including all the hardware, software and data. This enables each department to use, manipulate and develop their data any way they see fit; without altering information inside other data marts or the data warehouse. In other deployments where conformed dimensions are used, this business unit ownership will not hold true for shared dimensions like customer, product, etc.
Design schemas
The star schema, or dimensional model, is a fairly popular design choice, as it enables a relational database to emulate the analytical functionality of a multidimensional database.
snowflake schema
Reasons for creating a data mart
Easy access to frequently needed data
Creates collective view by a group of users
Improves end-user response time
Ease of creation
Lower cost than implementing a full Data warehouse
Potential users are more clearly defined than in a full Data warehouse
Dependent data mart
According to the Inmon school of data warehousing, a dependent data mart is a logical subset (view) or a physical subset (extract) of a larger data warehouse, isolated for one of the following reasons:
- A need for a special data model or schema: e.g., to restructure for OLAP
- Performance: to offload the data mart to a separate computer for greater efficiency or to obviate the need to manage that workload on the centralized data warehouse.
- Security: to separate an authorized data subset selectively
- Expediency: to bypass the data governance and authorizations required to incorporate a new application on the Enterprise Data Warehouse
- Proving Ground: to demonstrate the viability and ROI (return on investment) potential of an application prior to migrating it to the Enterprise Data Warehouse
- Politics: a coping strategy for IT (Information Technology) in situations where a user group has more influence than funding or is not a good citizen on the centralized data warehouse.
- Politics: a coping strategy for consumers of data in situations where a data warehouse team is unable to create a usable data warehouse.
According to the Inmon school of data warehousing, tradeoffs inherent with data marts include limited scalability, duplication of data, data inconsistency with other silos of information, and the inability to leverage enterprise sources of data.

DATA WAREHOUSE
A data warehouse is a repository of an organization's electronically stored data, set up to facilitate fast retrieval of correct data without slowing down the organization's operational systems. Besides speed, an oft-stated goal is to make data retrieval easier than it would be if data were retrieved directly from the operational systems.

Inmon and Kimball data warehouse definitions
Bill Inmon, one of the first authors on the subject of data warehousing, has defined a data warehouse in terms of the characteristics of the data repository:
Subject-oriented
The data in the database is organized so that all the data elements relating to the same real-world event or object are linked together.
Time-variant
The changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time.
Non-volatile
Data in the database is never overwritten or deleted; once committed, the data is static, read-only, and retained for future reporting.
Integrated
The database contains data from most or all of an organization's operational systems and this data is made consistent.
Ralph Kimball, another well known author on data warehousing, defines a data warehouse as "a copy of transaction data specifically structured for query and analysis."[1]
These two influential experts represent the traditional views on one aspect of data warehousing - whether it should be in one physical repository. Kimball, in 1997, stated that "...the data warehouse is nothing more than the union of all the data marts". In other words, he advocates a "bottom-up" data warehousing methodology in which individual "data marts" store subsets of an organization's data. The data marts could later be combined into an all-encompassing data warehouse.

Inmon responded in 1998 by saying, "You can catch all the minnows in the ocean and stack them together and they still do not make a whale," indicating the opposing view that the data warehouse should be designed from the top-down to include all corporate data. In this methodology, data marts are created only after the complete data warehouse has been created.

Data warehouses versus operational systems
Through the use of database normalization and an entity-relationship model, operational systems are optimized for preservation of data integrity and speed of recording of business transactions (see OLTP). Operational systems store each item of data only once. Data warehouses, by contrast, are optimized for speed of data retrieval. Frequently, data in data warehouses are denormalized via a dimension-based model. Also, to speed data retrieval, data warehouse data are often stored multiple times - in their most granular form and in summaries often called data aggregates.
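The idea of storing data multiple times can be sketched briefly: granular rows are kept, and a pre-summarized aggregate table is stored alongside them to speed retrieval. This uses SQLite via Python's sqlite3 module; the table names and figures are invented.

```python
import sqlite3

# Sketch of a data aggregate: detail rows plus a redundant summary table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_detail (day TEXT, amount REAL);
INSERT INTO sales_detail VALUES
    ('2008-02-01', 100.0), ('2008-02-01', 50.0), ('2008-02-02', 75.0);
-- The same data stored a second time, pre-summarized by day:
CREATE TABLE sales_by_day AS
    SELECT day, SUM(amount) AS total FROM sales_detail GROUP BY day;
""")

# Reports read the small aggregate instead of rescanning the detail rows.
daily = conn.execute(
    "SELECT day, total FROM sales_by_day ORDER BY day").fetchall()
print(daily)  # [('2008-02-01', 150.0), ('2008-02-02', 75.0)]
```

The cost of this redundancy is that the aggregate must be refreshed whenever the granular data changes.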


Data warehouse architecture
Data warehouse architecture consists of the following interconnected layers:
Operational and external database layer
The source data for the data warehouse
Informational access layer
The tools for end user reporting and analyzing of the data
Data access layer
The interface between the operational and informational access layer
Metadata layer
The data directory (which is often much more detailed than an operational system data directory).

