Thursday, June 25, 2015

Big Data Management and Big Data Analysis


What Is Big Data Management?


Big data management is the organization, administration and governance of large volumes of both structured and unstructured data.

The goal of big data management is to ensure a high level of data quality and accessibility for business intelligence and big data analytics applications. Corporations, government agencies and other organizations employ big data management strategies to help them contend with fast-growing pools of data, typically involving many terabytes or even petabytes of information saved in a variety of file formats. Effective big data management helps companies locate valuable information in large sets of unstructured and semi-structured data from a variety of sources, including call detail records, system logs and social media sites.

Most big data environments go beyond relational databases and traditional data warehouse platforms to incorporate technologies that are suited to processing and storing nontransactional forms of data. The increasing focus on collecting and analyzing big data is shaping new platforms that combine the traditional data warehouse with big data systems in a logical data warehousing architecture.

As part of the process, companies must decide what data must be kept for compliance reasons, what data can be disposed of, and what data should be kept and analyzed in order to improve current business processes or provide a competitive advantage. This process requires careful data classification so that, ultimately, smaller sets of data can be analyzed quickly and productively.
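The classification step described above can be illustrated with a small Python sketch. It is only a simplified illustration, not a real governance tool: the `DataSet` fields, the `Action` categories and the rule that compliance takes priority over analysis are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RETAIN = "keep for compliance"
    ANALYZE = "keep and analyze"
    DISPOSE = "dispose"


@dataclass
class DataSet:
    name: str
    source: str                # e.g. "call detail records", "system logs"
    compliance_required: bool  # hypothetical flag set by a governance review
    business_value: bool       # hypothetical flag: could it improve a process?


def classify(ds: DataSet) -> Action:
    """Assign one action per data set (assumed rule: compliance wins)."""
    if ds.compliance_required:
        return Action.RETAIN
    if ds.business_value:
        return Action.ANALYZE
    return Action.DISPOSE


inventory = [
    DataSet("billing_2014", "call detail records", True, True),
    DataSet("clicks_june", "social media sites", False, True),
    DataSet("debug_dump", "system logs", False, False),
]

# Map each data set name to the action chosen for it.
plan = {ds.name: classify(ds) for ds in inventory}

# Only the subset marked for analysis goes on to the analytics stage.
to_analyze = [name for name, action in plan.items() if action is Action.ANALYZE]
```

In practice a data set kept for compliance may also be analyzed; the single-label rule here is just to keep the sketch short.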

What Is Big Data Analysis?


Big data analytics enables organizations to analyse a mix of structured, semi-structured and unstructured data in search of valuable business information and insights.

The primary goal of big data analytics is to help companies make more informed business decisions by enabling data scientists, predictive modellers and other analytics professionals to analyse large volumes of transaction data, as well as other forms of data that may be untapped by conventional business intelligence (BI) programs. That could include Web server logs and Internet clickstream data, social media content and social network activity reports, text from customer emails and survey responses, mobile-phone call detail records and machine data captured by sensors connected to the Internet of Things. Some people exclusively associate big data with semi-structured and unstructured data of that sort, but consulting firms like Gartner Inc. and Forrester Research Inc. also consider transactions and other structured data to be valid components of big data analytics applications.

Big data can be analysed with the software tools commonly used as part of advanced analytics disciplines such as predictive analytics, data mining, text analytics and statistical analysis. Mainstream BI software and data visualization tools can also play a role in the analysis process. But the semi-structured and unstructured data may not fit well in traditional data warehouses based on relational databases. Furthermore, data warehouses may not be able to handle the processing demands posed by sets of big data that need to be updated frequently or even continually -- for example, real-time data on the performance of mobile applications or of oil and gas pipelines. As a result, many organizations looking to collect, process and analyse big data have turned to a newer class of technologies that includes Hadoop and related tools such as YARN, MapReduce, Spark, Hive and Pig as well as NoSQL databases. Those technologies form the core of an open source software framework that supports the processing of large and diverse data sets across clustered systems.
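To give a feel for the MapReduce style of processing mentioned above, here is a minimal sketch in plain Python. It does not use Hadoop or any cluster API; it only imitates the map, shuffle and reduce phases on one machine. The log format (HTTP status code as the last field) and all function names are assumptions made for this illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit (key, 1) pairs; the key is the last field of the log line."""
    fields = line.split()
    if fields:  # skip blank lines
        yield fields[-1], 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical Web server log lines: method, path, status code.
sample_log = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
]

counts = run_job(sample_log)
print(counts)
```

On a real cluster the map and reduce functions would run in parallel on different machines, and the framework would handle the shuffle, data distribution and failures.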

New Ph.D. Tracks in "Big Data"


 
The University of Washington is launching Ph.D. tracks in "Big Data" through a partnership between Computer Science & Engineering and Statistics.
The Big Data tracks will be an overlay on top of departments' regular requirements, leading to a new certificate en route to the Ph.D. degree.

The CSE track will have students select three out of the following four courses:
  •  Data Management: Principles of Database Management Systems, CSE 544
  •  Machine Learning: CSE 546 
  •  Data Visualization: A new course to be offered by Jeff Heer.
  •  Statistics: A new Big Data course offered jointly by CSE and Statistics, Stat 592 and CSE 5999
References: