•Information systems process data and convert it into information. Data can be stored, read, updated/modified, and deleted. At run time, a software system keeps its data in main memory, which is volatile. For persistence, data should be stored in non-volatile storage.
•There are two main ways of storing data
➤Files
➤Databases
•There are many formats for storing data
Plain-text, XML, JSON, tables, text files, images, etc…
⟹Persistence is "the continuance of an effect after its cause is removed". In the context of storing data in a computer system, this means that the data survives after the process with which it was created has ended. In other words, for a data store to be considered persistent, it must write to non-volatile storage.
Data
➱Data are raw facts
➱Can be processed (by application components) and converted to meaningful information
Database
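➱Databases are created and managed in database servers
➱SQL is used to process databases
•DDL (Data Definition Language) – create, alter, and drop databases and their objects
•DML (Data Manipulation Language) – CRUD on the data in databases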
A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques.
Database Server
The term database server may refer to both hardware and software used to run a database, according to the context. As software, a database server is the back-end portion of a database application, following the traditional client-server model. This back-end portion is sometimes called the instance. It may also refer to the physical computer used to host the database.
Database Management System
The database management system (DBMS) is the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS software additionally encompasses the core facilities provided to administer the database.
➛A database is a collection of data organized in a manner that allows access, retrieval, and use of that data
➛A file is a collection of related records stored on a storage medium such as a hard disk or optical disc
Discuss different arrangements of data, giving examples for each
Data
•Data are raw facts
•Can be processed (by application components) and converted to meaningful information
Data arrangement
•Un-structured
•Semi-structured
•Structured
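The three arrangements listed above can be illustrated with one small fact held in three shapes. This is a minimal sketch; the `StudentRow` class, the field names, and the sample values are hypothetical and only serve the illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DataArrangements {
    // Structured: a fixed schema, every record has the same typed fields (like a table row).
    static final class StudentRow {
        final int id;
        final String name;

        StudentRow(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Semi-structured: self-describing key/value pairs whose keys may differ
    // from record to record (the shape of a JSON or XML document).
    static Map<String, Object> semiStructured() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", 111);
        doc.put("name", "MyName");
        doc.put("hobbies", List.of("chess", "music")); // optional, nested field
        return doc;
    }

    // Unstructured: free text with no schema; meaning has to be extracted by processing.
    static String unstructured() {
        return "MyName (student 111) enjoys chess and music.";
    }

    public static void main(String[] args) {
        StudentRow row = new StudentRow(111, "MyName");
        System.out.println(row.id + ", " + row.name);
        System.out.println(semiStructured());
        System.out.println(unstructured());
    }
}
```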
Data arrangement
•Data warehouses
•Big Data
•Volume
•Variety
•Velocity
Explain different types of databases, providing examples for their use
Database
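•Databases are created and managed in database servers
•SQL is used to process databases
•DDL (Data Definition Language) – create, alter, and drop databases and their objects
•DML (Data Manipulation Language) – CRUD on the data in databases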
Database type
•Hierarchical databases
•Network databases
•Relational databases
•Non-relational databases (NoSQL)
•Object-oriented databases
•Graph databases
•Document databases
Compare and contrast data warehouse with Big data
A data warehouse is an architecture for storing data, i.e. a data repository.
Big data is a technology for handling huge volumes of data and preparing the repository.
A data warehouse handles only structured data (relational or not), whereas big data can handle structured, semi-structured, and unstructured data.
Explain how the application components communicate with files and databases
•Files and DBs are external components
•They are existing outside the software system
•Software can connect to the files/DBs to perform CRUD operations on data
• File – File path, URL
•DB – connection string
•To process data in DB
•SQL statements
•Prepared statements
•Callable statements
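For the file side of the list above, a minimal sketch of the four CRUD operations through a file path might look as follows. The temporary file and the comma-separated record layout are assumptions made for the example. The database side works analogously, except that the application hands a connection string to its database driver instead of a path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class FileCrudSketch {
    // Runs create, update, read, and delete against a temporary file and
    // returns the lines that were read back before the file was deleted.
    static List<String> run() {
        try {
            // The file lives outside the program; the Path is how the program reaches it.
            Path file = Files.createTempFile("students", ".txt");

            // Create: write the first record.
            Files.write(file, List.of("111,MyName"), StandardCharsets.UTF_8);

            // Update: append a second record to the existing content.
            Files.write(file, List.of("112,Other"), StandardCharsets.UTF_8,
                    StandardOpenOption.APPEND);

            // Read: load every line back into memory.
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);

            // Delete: remove the file from storage.
            Files.delete(file);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```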
Differentiate the SQL statements, Prepared statements, and Callable statements
SQL statements
➤Execute standard SQL statements from the application
(Used to execute normal SQL queries.)
Statement stmt = con.createStatement();
// The values are concatenated directly into the SQL text
stmt.executeUpdate("update STUDENT set NAME = '" + name + "' where ID = " + id);
Prepared statements
➤The query only needs to be parsed (or prepared) once, but can be executed multiple times with the same or different parameters.
➤PreparedStatement extends the Statement interface to provide extra features. It is useful when the same SQL statement has to be executed repeatedly. The PreparedStatement interface accepts input parameters at runtime. PreparedStatement objects are precompiled, hence their execution is faster.
(Used to execute dynamic or parameterized SQL queries)
PreparedStatement pstmt = con.prepareStatement("update STUDENT set NAME = ? where ID = ?");
pstmt.setString(1, "MyName");
pstmt.setInt(2, 111);
pstmt.executeUpdate();
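One practical consequence of the difference between the two styles can be seen without a database. The sketch below only builds SQL text; it assumes a hypothetical hostile input value and shows that concatenation lets the value rewrite the statement, while the parameterized text stays fixed and the values travel separately through `setString`/`setInt`. Whether the trailing `--` is treated as a comment depends on the database.

```java
public class ConcatenationRisk {
    // Mimics the plain Statement style: the value is pasted into the SQL text.
    static String concatenated(String name, int id) {
        return "update STUDENT set NAME = '" + name + "' where ID = " + id;
    }

    // The PreparedStatement style: the SQL text never changes, whatever the values are.
    static final String PARAMETERIZED = "update STUDENT set NAME = ? where ID = ?";

    public static void main(String[] args) {
        // Hypothetical hostile input that closes the quote and adds its own WHERE clause.
        String hostile = "x' where 1=1 --";

        // The input has become part of the statement itself.
        System.out.println(concatenated(hostile, 111));

        // With placeholders the same input would only ever be stored as a name.
        System.out.println(PARAMETERIZED);
    }
}
```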
Callable statements
➤Execute stored procedures
CallableStatement cstmt = con.prepareCall("{call anyProcedure(?, ?, ?)}");
cstmt.execute();
Argue the need for ORM, explaining the development with and without ORM
- If you’re going to use ORM, you should make your model objects as simple as possible. Be more vigilant about simplicity to make sure your model objects really are just Plain ol’ Data. Otherwise you may end up wrestling with your ORM to make sure the persistence works like you expect it to, and it’s not looking for methods and properties that aren’t actually there.
- If you’re not going to use ORM, you should probably define DAOs or persistence and query methods to avoid coupling the model layer with the persistence layer. Otherwise you end up with SQL in your model objects and a forced dependency on your project.
- If you know your data access patterns are generally going to be simple (like basic object retrieval) but you don’t know all of them up front, you should think about using an ORM. While ORMs can make complex queries confusing to build and difficult to debug, an ORM can save you huge amounts of time if your queries are generally pretty simple.
- If you know your data access pattern is going to be complex or you plan to use a lot of database-specific features, you may not want to use an ORM. While many ORMs (like Hibernate) let you access the underlying data source connection pretty easily, if you know you’re going to have to throw around a lot of custom SQL, you may not get a lot of value out of ORM to begin with because you’re constantly going to have to break out of it.
- Object Relational Mapping (ORM) maps the data in objects to the tables of a database
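The mapping idea can be sketched in a few lines of plain Java. This is a toy illustration, not how any particular ORM is implemented: a table row is represented as a `Map` from column name to value, and the hypothetical `map` helper copies each column into the field of the same name, which is the kind of repetitive code an ORM writes for you.

```java
import java.lang.reflect.Field;
import java.util.Map;

public class TinyMapper {
    // A plain model object; its field names are assumed to match the column names.
    static class Student {
        private int id;
        private String name;

        int getId() { return id; }
        String getName() { return name; }
    }

    // Copies every column of the row into the field with the same name.
    static <T> T map(Map<String, Object> row, Class<T> type) {
        try {
            T obj = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> column : row.entrySet()) {
                Field field = type.getDeclaredField(column.getKey());
                field.setAccessible(true);          // the fields are private
                field.set(obj, column.getValue());  // boxed values are unwrapped for primitive fields
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Row does not match " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for one row of a STUDENT table.
        Student s = map(Map.of("id", 111, "name", "MyName"), Student.class);
        System.out.println(s.getId() + " " + s.getName());
    }
}
```

Without such a mapper, every query result has to be copied into objects by hand, column by column, in each DAO method.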
ORM implementations in JAVA
•Java Beans
•JPA
PROS
- Facilitates implementing domain model pattern.
- Huge reduction in code.
- Takes care of vendor specific code by itself.
- Cache Management — Entities are cached in memory thereby reducing load on the DB.
CONS
- Increased startup time due to metadata preparation( not good for desktop applications).
- Steep learning curve.
- Relatively hard to fine tune and debug generated SQL.
- Not suitable for applications without a clean domain object model.
Discuss the POJO, Java Beans, and JPA, indicating their similarities and differences.
POJO
POJO stands for Plain Old Java Object. It is an ordinary Java object, not bound by any special restriction other than those imposed by the Java Language Specification, and it does not require any particular classpath. POJOs are used to increase the readability and reusability of a program, and they have gained wide acceptance because they are easy to write and understand. The POJO-based programming model was adopted in EJB 3.0 by Sun Microsystems.
A POJO should not:
•Extend pre-specified classes.
•Implement pre-specified interfaces.
•Contain pre-specified annotations.
Beans
•Beans are a special type of POJO. A POJO has to satisfy some restrictions to be a bean.
•All JavaBeans are POJOs but not all POJOs are JavaBeans.
•Beans should be serializable, i.e. they should implement the Serializable interface. Such classes are still regarded as POJOs, because Serializable is a marker interface and therefore not much of a burden.
•Fields should be private. This is to provide the complete control on fields.
•Fields should have getters or setters or both.
•A bean should have a no-arg constructor.
•Fields are accessed only through the constructor or the getters and setters.
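The restrictions above can be seen together in one small class. The sketch below uses a hypothetical `StudentBean` and, to show what Serializable buys, writes the bean to a byte array and reads it back as a new object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BeanSketch {
    // A JavaBean: serializable, private fields, getters/setters, no-arg constructor.
    public static class StudentBean implements Serializable {
        private static final long serialVersionUID = 1L;

        private int id;
        private String name;

        public StudentBean() { }

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Serializes the bean to bytes and deserializes it into a new instance.
    static StudentBean roundTrip(StudentBean bean) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(bean);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (StudentBean) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StudentBean bean = new StudentBean();
        bean.setId(111);
        bean.setName("MyName");
        StudentBean copy = roundTrip(bean);
        System.out.println(copy.getId() + " " + copy.getName());
    }
}
```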
JPA
•An API/specification for ORM
•Uses
•POJO classes
•XML based mapping file (represents the DB)
•A provider (implementation of JPA)
Identify the ORM tools available for different development platforms (Java, PHP, and .Net)
Discuss the need for NoSQL indicating the benefits, also explain different types of NoSQL databases
Not Only SQL (NoSQL)
•Relational DBs are good for structured data
•For semi-structured and un-structured data, some other types of DBs can be used
➮Key-value stores
➮Document databases
➮Wide-column stores
➮Graph stores
A NoSQL database is exactly the type of database that can handle the sort of unstructured, messy and unpredictable data that our system of engagement requires. NoSQL is a whole new way of thinking about a database. NoSQL is not a relational database. It is not built on tables and does not employ SQL to manipulate data.
Benefits of NoSQL
•When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address
•Large volumes of rapidly changing structured, semi-structured, and unstructured data
Types of NoSQL databases
➮Key-Value Store – It has a Big Hash Table of keys & values
➮Document-based Store- It stores documents made up of tagged elements
➮Column-based Store- Each storage block contains data from only one column
➮Graph Base NoSQL Database
1.Key Value Store NoSQL Database
The schema-less format of a key value database like Riak is just about what you need for your storage needs. The key can be synthetic or auto-generated, while the value can be a String, JSON, BLOB (binary large object), etc.
The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular item of data. A bucket is a logical group of keys – but they don’t physically group the data. There can be identical keys in different buckets.
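A minimal in-memory sketch of this model, assuming hypothetical bucket and key names, could look as follows. It is not the API of Riak or any other product; it only shows a hash table of unique keys per bucket, with the same key allowed in different buckets.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyValueSketch {
    // bucket name -> (key -> value); a bucket is only a logical namespace for keys.
    private final Map<String, Map<String, String>> buckets = new HashMap<>();

    void put(String bucket, String key, String value) {
        buckets.computeIfAbsent(bucket, b -> new HashMap<>()).put(key, value);
    }

    // Returns the value stored under the key in the bucket, or null if there is none.
    String get(String bucket, String key) {
        Map<String, String> keys = buckets.get(bucket);
        return keys == null ? null : keys.get(key);
    }

    public static void main(String[] args) {
        KeyValueSketch store = new KeyValueSketch();
        store.put("users", "42", "{\"name\":\"MyName\"}"); // the value is opaque to the store
        store.put("orders", "42", "book");                    // same key, different bucket
        System.out.println(store.get("users", "42"));
        System.out.println(store.get("orders", "42"));
    }
}
```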
2.Document Store NoSQL Database
A document store is quite similar to a key-value store: the data is still a collection of key-value pairs. The only difference is that the values stored (referred to as “documents”) provide some structure and encoding of the managed data. XML, JSON (JavaScript Object Notation), and BSON (a binary encoding of JSON objects) are some common standard encodings.
3.Column Store NoSQL Database
In a column-oriented NoSQL database, data is stored in cells grouped in columns of data rather than as rows of data. Columns are logically grouped into column families. Column families can contain a virtually unlimited number of columns, which can be created at runtime or in the definition of the schema. Reads and writes are done using columns rather than rows.
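The column-family layout can be pictured as nested maps. The sketch below is an in-memory illustration under assumed names (`profile`, `contact`, and so on), not the storage format of any real wide-column database; it shows that different rows may carry different columns and that new columns appear simply by writing to them.

```java
import java.util.Map;
import java.util.TreeMap;

public class WideColumnSketch {
    // row key -> column family -> column name -> cell value
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    // Writing to a column that does not exist yet creates it at runtime.
    void put(String rowKey, String family, String column, String value) {
        rows.computeIfAbsent(rowKey, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(column, value);
    }

    // Returns the cell value, or null if the row, family, or column is absent.
    String get(String rowKey, String family, String column) {
        Map<String, Map<String, String>> families = rows.get(rowKey);
        if (families == null) {
            return null;
        }
        Map<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        WideColumnSketch table = new WideColumnSketch();
        table.put("student:111", "profile", "name", "MyName");
        table.put("student:111", "contact", "email", "myname@example.com");
        table.put("student:112", "profile", "nickname", "Other"); // column only this row has
        System.out.println(table.get("student:111", "profile", "name"));
        System.out.println(table.get("student:112", "profile", "name")); // absent cell
    }
}
```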
4.Graph Base NoSQL Database
In a Graph Base NoSQL Database, you will not find the rigid format of SQL or the tables and columns representation. A flexible graphical representation is used instead, which is well suited to addressing scalability concerns. Graph structures with nodes, edges, and properties provide index-free adjacency. Data can be easily transformed from one model to the other using a Graph Base NoSQL database.
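Index-free adjacency means that each node holds its own connections, so following an edge does not require a lookup in a global index or a join. A minimal sketch with hypothetical node names and a hypothetical `KNOWS` edge label:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class GraphSketch {
    // node -> (edge label -> neighbouring nodes); each node carries its own adjacency.
    private final Map<String, Map<String, Set<String>>> edges = new HashMap<>();

    void addEdge(String from, String label, String to) {
        edges.computeIfAbsent(from, n -> new HashMap<>())
             .computeIfAbsent(label, l -> new TreeSet<>())
             .add(to);
    }

    // Follows all edges with the given label that leave the node.
    Set<String> neighbours(String from, String label) {
        Map<String, Set<String>> byLabel = edges.get(from);
        if (byLabel == null) {
            return Set.of();
        }
        return byLabel.getOrDefault(label, Set.of());
    }

    public static void main(String[] args) {
        GraphSketch graph = new GraphSketch();
        graph.addEdge("Ann", "KNOWS", "Bob");
        graph.addEdge("Ann", "KNOWS", "Cid");
        graph.addEdge("Bob", "KNOWS", "Dee");
        System.out.println(graph.neighbours("Ann", "KNOWS"));
    }
}
```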
Discuss what Hadoop is, explaining the core concepts of it
Hadoop
•The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
•It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
•Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Hadoop core concepts
•Hadoop Distributed File System (HDFS™):A distributed file system that provides high-throughput access to application data
•Hadoop YARN: A framework for job scheduling and cluster resource management.
•Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
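The MapReduce programming model can be illustrated with the classic word count. The sketch below deliberately does not use the Hadoop API and runs in a single JVM with Java streams; it only mirrors the phases: map each line to words, group equal words together, and reduce each group to a count. On a real cluster these phases would be distributed across machines.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {
    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                // Map phase: split every input line into individual words.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // Shuffle and reduce phases: group equal words, then sum one per occurrence.
                .collect(Collectors.groupingBy(
                        word -> word,
                        TreeMap::new,
                        Collectors.summingInt(word -> 1)));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("big data", "big clusters")));
    }
}
```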
Explain the concept of IR, identifying tools for IR
•Data in storage should be fetched, converted into information, and presented for proper use
•Information is retrieved via search queries
➼Keyword search
➼Full-text search
•The output can be
➼Text
➼Multimedia
The information retrieval process should be
➧Fast/performance
➧Scalable
➧Efficient
➧Reliable/Correct
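Keyword search is commonly made fast with an inverted index, which maps each term to the documents that contain it. The sketch below is a minimal in-memory version with hypothetical document ids and texts; it is not the implementation of any particular IR tool.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Splits the text into lower-case terms and records the document under each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A keyword search is a single map lookup instead of a scan over all documents.
    Set<Integer> search(String keyword) {
        return index.getOrDefault(keyword.toLowerCase(), Set.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch ir = new InvertedIndexSketch();
        ir.add(1, "Hadoop stores data in HDFS");
        ir.add(2, "NoSQL databases store data");
        System.out.println(ir.search("data"));
        System.out.println(ir.search("HDFS"));
    }
}
```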





