Showing posts with label database. Show all posts

DAY 5 : DISTRIBUTED DBMS



What is a Distributed Database Management System?

A distributed database management system is a software system that permits the management of the distributed database and makes the distribution transparent to the users. A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. Sometimes the term "distributed database system" is used to refer jointly to the distributed database and the distributed DBMS.
Distributed database management systems can be architected as client-server systems or peer-to-peer ones. In the former, one or more servers manage the database and handle user queries that are passed on by the clients. The clients usually have limited database functionality and normally pass the SQL queries over to the servers for processing. In peer-to-peer systems, each site has equal functionality for processing.

A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers.
Collections of data (e.g. in a database) can be distributed across multiple physical locations. A distributed database is divided into separate partitions, or fragments. Each fragment may be replicated to provide redundancy and fail-over, much as RAID does for disks.
Besides replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distribution. How these technologies are implemented depends on the needs of the business and on the sensitivity and confidentiality of the data to be stored, and hence on the price the business is willing to pay to ensure data security, consistency and integrity.

Basic architecture

A database server is the software managing a database, and a client is an application that requests information from a server. Each computer in a system is a node. A node in a distributed database system acts as a client, a server, or both, depending on the situation.
  • Horizontal fragments: subsets of tuples (rows) from a relation (table).
  • Vertical fragments: subsets of attributes (columns) from a relation (table).
  • Mixed fragment: a fragment which is both horizontally and vertically fragmented.
  • Homogeneous distributed database: uses one DBMS (e.g. Oracle).
  • Heterogeneous distributed database: uses multiple DBMSs (e.g. Oracle, MS-SQL and PostgreSQL).
Users access the distributed database through:
  • Local applications: applications which do not require data from other sites.
  • Global applications: applications which do require data from other sites.
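
The three fragment types defined above can be illustrated with a minimal Python sketch. The `employees` relation, its column names and the helper functions are all hypothetical, made up for this illustration. Keeping the key column in every vertical fragment is a choice made here so that fragments could later be joined back together; it is not something the definitions above require.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical relation: each dict is one tuple (row).
employees: List[Row] = [
    {"emp_id": 1, "name": "A", "site": "Dhaka", "salary": 500},
    {"emp_id": 2, "name": "B", "site": "Rajshahi", "salary": 650},
    {"emp_id": 3, "name": "C", "site": "Dhaka", "salary": 700},
]


def horizontal_fragment(relation: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep a subset of rows; every column is retained."""
    return [row for row in relation if predicate(row)]


def vertical_fragment(relation: List[Row], attributes: List[str], key: str) -> List[Row]:
    """Keep a subset of columns; the key column is always kept."""
    columns = [key] + [a for a in attributes if a != key]
    return [{c: row[c] for c in columns} for row in relation]


def mixed_fragment(relation: List[Row], predicate: Callable[[Row], bool],
                   attributes: List[str], key: str) -> List[Row]:
    """Select rows first, then project columns."""
    return vertical_fragment(horizontal_fragment(relation, predicate), attributes, key)


dhaka_rows = horizontal_fragment(employees, lambda r: r["site"] == "Dhaka")
pay_columns = vertical_fragment(employees, ["salary"], key="emp_id")
dhaka_pay = mixed_fragment(employees, lambda r: r["site"] == "Dhaka", ["salary"], key="emp_id")
```

Here `dhaka_rows` could live on a server at the Dhaka site, while `pay_columns` could live wherever payroll is processed.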

 

Important considerations

Care with a distributed database must be taken to ensure that:
  • The distribution is transparent: users must be able to interact with the system as if it were one logical system. This applies to the system's performance and methods of access, amongst other things.
  • Transactions are transparent: each transaction must maintain database integrity across multiple databases. Transactions must also be divided into subtransactions, each subtransaction affecting one database system.
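
As a rough sketch of the second point, the Python below splits one global transaction into subtransactions, one per site, and keeps the changes only if every subtransaction succeeds. Two in-memory SQLite databases stand in for two sites; the `account` table, the site names and the helper functions are assumptions made for this illustration. This is a deliberately simplified all-or-nothing coordinator: a site could still fail between the two commits, which is why real distributed DBMSs use a commit protocol such as two-phase commit.

```python
import sqlite3
from typing import List, Sequence, Tuple

# One subtransaction: (site connection, SQL statement, parameters).
Step = Tuple[sqlite3.Connection, str, Sequence]


def make_site(balance: int) -> sqlite3.Connection:
    """Create one hypothetical site: an in-memory database with one account."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO account (id, balance) VALUES (1, ?)", (balance,))
    conn.commit()
    return conn


def balance_of(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]


def run_global_transaction(steps: List[Step]) -> bool:
    """Run each subtransaction on its own site; undo all of them on any failure."""
    touched = []
    try:
        for conn, sql, params in steps:
            touched.append(conn)
            conn.execute(sql, params)  # one subtransaction on one site
    except sqlite3.Error:
        for conn in touched:
            conn.rollback()  # discard uncommitted work on every site touched
        return False
    for conn in touched:
        conn.commit()
    return True


CREDIT = "UPDATE account SET balance = balance + ? WHERE id = 1"
DEBIT = "UPDATE account SET balance = balance - ? WHERE id = 1"

dhaka, rajshahi = make_site(100), make_site(50)

# Succeeds: both sites can apply their part.
first_ok = run_global_transaction([(dhaka, DEBIT, (30,)), (rajshahi, CREDIT, (30,))])

# Fails: the debit would make the Dhaka balance negative, so the credit
# already applied at Rajshahi is rolled back as well.
second_ok = run_global_transaction([(rajshahi, CREDIT, (500,)), (dhaka, DEBIT, (500,))])
```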

Advantages of distributed databases

  • Reflects organizational structure: database fragments are located in the departments they relate to.
  • Local autonomy: a department can control the data about itself, as its staff are the ones familiar with it.
  • Improved availability: a fault in one database system will only affect one fragment, instead of the entire database.
  • Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules of the database in a distributed database.)
  • Economics: it costs less to create a network of smaller computers with the power of a single large computer.
  • Modularity: systems can be modified, added and removed from the distributed database without affecting other modules (systems).

Disadvantages of distributed databases

  • Complexity: extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database; for example, joins become prohibitively expensive when performed across multiple systems.
  • Economics: increased complexity and a more extensive infrastructure mean extra labour costs.
  • Security: remote database fragments must be secured, and because they are not centralized, the remote sites must be secured as well. The infrastructure must also be secured (e.g. by encrypting the network links between remote sites).
  • Difficult to maintain integrity: in a distributed database, enforcing integrity over a network may require too many networking resources to be feasible.
  • Inexperience: distributed databases are difficult to work with, and as this is a young field there is not much readily available experience of proper practice.

DAY 4 : What is Data Warehouse

What is a Data Warehouse?



A data warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated. This makes it much easier and more efficient to run queries over data that originally came from different sources. Another definition of a data warehouse is: "A data warehouse is a logical collection of information gathered from many different operational databases used to create business intelligence that supports business analysis activities and decision-making tasks, primarily, a record of an enterprise's past transactional and operational information, stored in a database designed to favour efficient data analysis and reporting (especially OLAP)". Generally, data warehousing is not meant for current "live" data, although 'virtual' or 'point-to-point' data warehouses can access operational data. A 'real' data warehouse is generally preferred to a virtual one because its stored data has been validated and is set up to provide reliable results for the common types of queries used in a business.

History of data warehousing

In the 1990s, as organizations of scale began to need more timely data about their business, they found that traditional information systems technology was simply too cumbersome to provide relevant data efficiently and quickly. Completing reporting requests could take days or weeks using antiquated reporting tools that were designed more or less to 'execute' the business rather than 'run' the business.
From this idea, the data warehouse was born as a place where relevant data could be held for completing strategic reports for management. The key here is the word 'strategic' as most executives were less concerned with the day to day operations than they were with a more overall look at the model and business functions.
As with all technology, over the course of the latter half of the 20th century the number and variety of databases increased. Many large businesses found themselves with data scattered across multiple platforms and variations of technology, making it almost impossible for any one individual to use data from multiple sources. A key idea within data warehousing is to take data from multiple platforms and technologies (as varied as spreadsheets, DB2 databases, IDMS records, and VSAM files) and place it in a common location that uses a common querying tool. In this way operational databases can be held on whatever system is most efficient for the operational business, while the reporting and strategic information is held in a common location using a common language. Data warehouses take this a step further by giving the data itself commonality: they define what each term means and keep it standard. (An example is gender, which can be recorded in many ways at the sources but should be standardized in a data warehouse with one common way of referring to each sex.)
All of this was designed to make decision support more readily available without affecting day-to-day operations. One aspect of a data warehouse that should be stressed is that it is NOT a location for ALL of a business's data, but rather a location for data that is 'interesting'. Interesting data is data that will assist decision makers in making strategic decisions relative to the organization's overall mission.

Design of data warehouses

Data warehouses often hold large amounts of information, which are sometimes subdivided into smaller logical units called dependent data marts. Dependent data marts allow for easier reporting by keeping relevant data together in one location.
Usually, two basic ideas guide the creation of a data warehouse:
  • Integration of data from distributed and differently structured databases, which facilitates a global overview and comprehensive analysis in the data warehouse.
  • Separation of data used in daily operations from data used in the data warehouse for purposes of reporting, decision support, analysis and controlling.
Since OLTP databases contain large volumes of data, it is critical to unload data quickly without adding significant overhead to the production database. Data is imported periodically from enterprise resource planning (ERP) systems and other related business software systems into the data warehouse for further processing. It is common practice to "stage" data prior to merging it into a data warehouse. In this sense, to "stage data" means to queue it for preprocessing, usually with an ETL tool. The preprocessing program reads the staged data (often from a business's primary OLTP databases), performs qualitative preprocessing or filtering (including denormalization, if deemed necessary), and writes it into the warehouse.
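
The staging step described above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the staged rows, the source names, the `sales_fact` table and the gender codes are hypothetical, and an in-memory SQLite database stands in for the warehouse. The sketch filters out rows whose amount cannot be parsed and standardizes the gender field to one common code before loading.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

# Hypothetical staged rows, queued from two source systems with different conventions.
staged_rows: List[Dict[str, str]] = [
    {"source": "erp", "customer": "C-001", "gender": "M", "amount": "120.50"},
    {"source": "crm", "customer": "C-002", "gender": "female", "amount": "80"},
    {"source": "crm", "customer": "C-003", "gender": "", "amount": "15.25"},
    {"source": "erp", "customer": "C-004", "gender": "F", "amount": "not-a-number"},
]

# One common way of referring to each value; "U" (unknown) is an assumed fallback.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def transform(row: Dict[str, str]) -> Optional[Tuple[str, str, float]]:
    """Clean one staged row, or return None to filter it out."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None  # unparseable amount: drop the row
    gender = GENDER_CODES.get(row["gender"].strip().lower(), "U")
    return (row["customer"], gender, amount)


def load(conn: sqlite3.Connection, rows: List[Dict[str, str]]) -> int:
    """Write the cleaned rows into the warehouse table; return how many were loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact (customer TEXT, gender TEXT, amount REAL)"
    )
    cleaned = [t for t in map(transform, rows) if t is not None]
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", cleaned)
    conn.commit()
    return len(cleaned)


warehouse = sqlite3.connect(":memory:")
loaded = load(warehouse, staged_rows)
```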

Dimensions and measures

A data warehouse is created by analyzing ways to categorize data using dimensions and ways to summarize data using measures. Dimensions can be used to filter and navigate summarised data by excluding results or by displaying data in different reporting styles (cross-tabbing). Measures are performance metrics which a business is interested in following up; these are mainly sums and averages of figures collected by OLTP systems. There is often some misunderstanding as to how data warehouses should be designed, since in many cases technical individuals do not fully understand the broader scope of the business of their organisations.
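
To make the distinction concrete, here is a small Python sketch using SQLite. The `sales` table, its rows and the `summarise` helper are hypothetical. `region` and `product` play the role of dimensions, and the sum and average of `amount` are the measures.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Tea", 100.0),
        ("North", "Coffee", 50.0),
        ("South", "Tea", 70.0),
        ("South", "Tea", 30.0),
    ],
)
conn.commit()

DIMENSIONS = {"region", "product"}


def summarise(dimension: str) -> List[Tuple[str, float, float]]:
    """Return (dimension value, SUM(amount), AVG(amount)) for each value of one dimension."""
    if dimension not in DIMENSIONS:
        # Column names cannot be bound as parameters, so only known names are allowed.
        raise ValueError(f"unknown dimension: {dimension}")
    sql = (
        f"SELECT {dimension}, SUM(amount), AVG(amount) "
        f"FROM sales GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(sql).fetchall()


by_region = summarise("region")
by_product = summarise("product")
```

The same measures are reported each time; only the dimension used to group them changes.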

Building blocks or Components

  1. Source Data
  2. Data Staging
  3. Data Storage
  4. Information Delivery
  5. Metadata
  6. Management and Control

Reporting

Business Intelligence reports (e.g., MIS reports) may then be generated from the data managed by the warehouse. In this way the data warehouse supplies the data for and supports the business intelligence tools that an organization might use.

DAY-1: DEFINITION



What is a Database?

A database is a collection of information organized into interrelated tables of data and specifications of data objects. Databases are designed to offer an organized mechanism for storing, managing and retrieving information. They do so through the use of tables. Database tables consist of columns and rows. Each column contains a different type of attribute and each row corresponds to a single record. For example, imagine that we were building a database table that contained Student_id, Name, Program_id, City, Division and Country.

Then we would simply start adding rows underneath those columns, containing the data we were planning to store. See the schema and data table below, which follow the discussion above. This example should make the relation between a schema and a data table clear.

Schema:

Entity/Field Name    Data-Type
Student_id           Number(10)
Name                 Varchar2(30)
Program_id           Varchar2(15)
City                 Varchar2(15)
Division             Varchar2(15)
Country              Varchar2(15)

Schema: 1

The above schema specifies that Student_id must be of the number data type and can hold up to 10 (ten) digits, so a numeric value of 11 (eleven) or more digits will not be accepted as a Student_id. The Name field accepts a maximum of 30 (thirty) characters. All the remaining fields follow the rules shown in the Data-Type column of the schema. The schema describes the structure of a database table in which we keep our data.
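
The limits in Schema 1 can be tried out with a short Python sketch. Note the substitution: the schema uses Oracle-style types (Number, Varchar2), while the sketch uses SQLite from the Python standard library, which does not enforce declared lengths. The `CHECK` constraints below are therefore an assumed approximation of the Oracle limits, not a reproduction of Oracle's behaviour, and `try_insert` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_id INTEGER CHECK (abs(student_id) <= 9999999999),  -- at most 10 digits
        name       TEXT CHECK (length(name) <= 30),
        program_id TEXT CHECK (length(program_id) <= 15),
        city       TEXT CHECK (length(city) <= 15),
        division   TEXT CHECK (length(division) <= 15),
        country    TEXT CHECK (length(country) <= 15)
    )
    """
)


def try_insert(row: tuple) -> bool:
    """Insert one row; return False if a CHECK constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?, ?, ?)", row)
    except sqlite3.IntegrityError:
        conn.rollback()
        return False
    conn.commit()
    return True


valid_row_accepted = try_insert(
    (200819222, "Adina", "English", "Noakhali", "Chittagong", "Bangladesh")
)
eleven_digit_id_accepted = try_insert(
    (12345678901, "Nishi", "English", "Noakhali", "Chittagong", "Bangladesh")
)
long_name_accepted = try_insert(
    (200819224, "X" * 31, "Physics", "Bogura", "Rajshahi", "Bangladesh")
)
```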

Data Table:

Student_id   Name    Program_id   City       Division     Country
200819222    Adina   English      Noakhali   Chittagong   Bangladesh
200819223    Nishi   English      Noakhali   Chittagong   Bangladesh
200819224    Iasha   Physics      Bogura     Rajshahi     Bangladesh
200819225    Asha    Chemistry    Dinajpur   Rajshahi     Bangladesh

Table: 1

The above table contains some sample data. It is easy to see that all the data kept in the table are interrelated: they belong to the students of an institute. We can find information about a particular student by examining the records. A separate post gives more examples of the INSERT command, which will help you learn how to insert (store) values in a table.
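
As a minimal sketch, the rows of Table 1 can be stored and queried from Python with SQLite. The column types are SQLite's rather than the Oracle types of Schema 1, and `students_in_division` is a hypothetical helper written for this example.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_id INTEGER, name TEXT, program_id TEXT, "
    "city TEXT, division TEXT, country TEXT)"
)

# The four records from Table 1.
rows = [
    (200819222, "Adina", "English", "Noakhali", "Chittagong", "Bangladesh"),
    (200819223, "Nishi", "English", "Noakhali", "Chittagong", "Bangladesh"),
    (200819224, "Iasha", "Physics", "Bogura", "Rajshahi", "Bangladesh"),
    (200819225, "Asha", "Chemistry", "Dinajpur", "Rajshahi", "Bangladesh"),
]
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?, ?)", rows)
conn.commit()


def students_in_division(division: str) -> List[Tuple[int, str]]:
    """Return (student_id, name) for every student in the given division."""
    cursor = conn.execute(
        "SELECT student_id, name FROM student WHERE division = ? ORDER BY student_id",
        (division,),
    )
    return cursor.fetchall()


rajshahi_students = students_in_division("Rajshahi")
```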


What is a Table?

A table in a relational database is a predefined format of rows and columns that define an entity. Database tables are composed of individual columns corresponding to the attributes of the object. A database may consist of many tables. In the above schema, student is the object that the table describes, and Student_id, Name and so on are its fields. The fields are laid out as columns, and the values of the fields for each student are laid out as a row, as in the data table above. In short, a table is used to keep records.

What is a Record?

A database record is one tuple of a given relational table, that is, one set of field values. In a relational database, records correspond to the rows of each table: a row consists of one set of attribute values (one tuple) corresponding to one instance of the entity that the table schema describes. Schema 1 describes the fields of the student table, and Table 1 shows the data of the student table, with one record for each student laid out as a row. Put simply, a collection of fields is called a record.

What is an Object?

A single data item related to a database object. The database schema associates one or more attributes with each database entity.

What is a Field?

In database systems, fields are the smallest units of information you can access. In spreadsheets, fields are called cells. In database management systems, a field can be required, optional, or calculated. A required field is one in which you must enter data, while an optional field is one you may leave blank. A calculated field is one whose value is derived from some formula involving other fields. You do not enter data into a calculated field; the system automatically determines the correct value.
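
The three kinds of field can be mirrored in a small Python sketch. The `OrderLine` record and its field names are hypothetical, and a Python dataclass stands in for a database table here; a real DBMS would declare these rules in the table definition instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrderLine:
    quantity: int               # required field: must be supplied
    unit_price: float           # required field: must be supplied
    note: Optional[str] = None  # optional field: may be left blank

    @property
    def total(self) -> float:
        """Calculated field: derived from other fields, never entered directly."""
        return self.quantity * self.unit_price


line = OrderLine(quantity=3, unit_price=2.5)
line_total = line.total
```

Leaving out a required field raises a `TypeError` when the record is created, and assigning to `total` raises an `AttributeError`, because the property has no setter.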

What is an Attribute?

In database management systems, the term attribute is sometimes used as a synonym for field. In database systems, a field can have various attributes. For example, if it contains numeric data, it has the numeric attribute. Most fields have certain attributes associated with them. For example, some fields are numeric whereas others are textual, some are long, while others are short. In addition, every field has a name, called the field name.

What is data?

Data is a distinct piece of information, usually formatted in a special way. All software is divided into two general categories: data and programs. In Table 1, the value "200819222" found in one cell of a row is data. It indicates the Id of a student.

What are data files?

In database management systems, data files are the files that store the database information, whereas other files, such as index files and data dictionaries, store administrative information, known as metadata.

Now, let us concentrate on the parts of SQL, which are the path to understanding SQL commands. These are covered in Day-2.


