What Is Big Data Management?
Big data management is the organization, administration and governance
of large volumes of both structured and unstructured data.
The goal of big data management is to ensure a high level of data quality
and accessibility for business intelligence and big data analytics applications.
Corporations, government agencies and other organizations employ big data management strategies to help
them contend with fast-growing pools of data, typically involving many terabytes or even petabytes of information saved in a
variety of file formats. Effective big data management helps companies locate
valuable information in large sets of unstructured data and semi-structured
data from a variety of sources, including call detail records, system logs and
social media sites. Most big data environments go beyond relational databases and traditional data warehouse platforms to incorporate
technologies that are suited to processing and storing nontransactional forms
of data. The increasing focus on collecting and analyzing big data is shaping
new platforms that combine the traditional data warehouse with big data systems in a
logical data warehousing architecture. As part of the process, companies must decide
what data must be kept for compliance reasons, what data can be disposed of and
what data should be kept and analyzed in order to improve current business
processes or provide a business with a competitive advantage. This process
requires careful data classification so that ultimately,
smaller sets of data can be analyzed quickly and productively.
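
The three-way retention decision described above (keep for compliance, keep and analyze, or dispose) can be sketched as a small classification routine. This is only an illustration: the `DataSet` fields, the category names and the rule that compliance takes priority are all assumptions made for the example, not part of any particular governance product.

```python
from dataclasses import dataclass
from enum import Enum


class Retention(Enum):
    KEEP_FOR_COMPLIANCE = "keep for compliance"
    KEEP_AND_ANALYZE = "keep and analyze"
    DISPOSE = "dispose"


@dataclass
class DataSet:
    name: str
    regulated: bool        # hypothetical flag: a retention rule applies
    business_value: bool   # hypothetical flag: worth analyzing


def classify(ds: DataSet) -> Retention:
    # Assumption: a compliance obligation outranks every other consideration.
    if ds.regulated:
        return Retention.KEEP_FOR_COMPLIANCE
    if ds.business_value:
        return Retention.KEEP_AND_ANALYZE
    return Retention.DISPOSE


# Made-up catalog entries, used only to exercise the rules.
catalog = [
    DataSet("call_detail_records", regulated=True, business_value=True),
    DataSet("social_media_mentions", regulated=False, business_value=True),
    DataSet("old_debug_logs", regulated=False, business_value=False),
]

decisions = {ds.name: classify(ds) for ds in catalog}
for name, decision in decisions.items():
    print(f"{name}: {decision.value}")
```

In practice the flags would come from a data catalog or policy engine rather than being set by hand; the point is that an explicit classification step narrows the data down to the subset worth analyzing.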
What Is Big Data Analytics?
Big data analytics enables organizations to
analyse a mix of structured, semi-structured and unstructured data in search of
valuable business information and insights.
The primary goal of big data analytics is to help
companies make more informed business decisions by enabling data
scientists, predictive modellers and other analytics professionals
to analyse large volumes of transaction data, as well as other forms of data
that may be untapped by conventional business intelligence (BI)
programs. That could include Web server logs and Internet clickstream
data, social media content and social network activity reports, text from
customer emails and survey responses, mobile-phone call detail records and
machine data captured by sensors connected to the Internet of
Things. Some people exclusively associate big data with
semi-structured and unstructured
data of that sort, but consulting firms like Gartner Inc. and
Forrester Research Inc. also consider transactions and other structured data to
be valid components of big data analytics applications.
Big data can be analysed with the software tools commonly used as part of
advanced analytics disciplines such as predictive analytics, data mining,
text analytics and statistical analysis. Mainstream BI software and data
visualization tools can also play a role in the analysis process.
But semi-structured and unstructured data may not fit well in traditional data
warehouses based on relational databases. Furthermore, data warehouses may not
be able to handle the processing demands posed by sets of big data that need to
be updated frequently or even continually -- for example, real-time data on the
performance of mobile applications or of oil and gas pipelines. As a result,
many organizations looking to collect, process and analyse big data have turned
to a newer class of technologies that includes Hadoop and related tools such as
YARN, MapReduce, Spark, Hive and Pig, as well as NoSQL databases. Those
technologies form the core of an open source software framework that supports
the processing of large and diverse data sets across clustered systems.
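
To give a feel for the MapReduce programming model mentioned above, here is a minimal single-process sketch in plain Python. It does not use Hadoop or any cluster. The log format, the sample lines and the function names are assumptions made for the example. It only shows the map, shuffle and reduce steps that a real framework would distribute across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit (path, 1) for each well-formed log line.
    # Assumed simplified format: "<client> <method> <path>"
    parts = line.split()
    if len(parts) == 3:
        yield parts[2], 1


def shuffle(pairs):
    # Group values by key, as a framework would between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Collapse all counts for one key into a single total.
    return key, sum(values)


# Made-up Web server log lines; the last one is skipped by map_phase.
log_lines = [
    "10.0.0.1 GET /home",
    "10.0.0.2 GET /cart",
    "10.0.0.1 GET /home",
    "malformed line",
]

mapped = chain.from_iterable(map_phase(line) for line in log_lines)
hits = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(hits)
```

Because each call to `map_phase` looks at one line and each call to `reduce_phase` looks at one key, both steps can in principle run in parallel on different nodes, which is what makes the model suitable for clustered systems.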
New Ph.D. Tracks in "Big Data"
The University of Washington is launching Ph.D. tracks in "Big
Data" through a partnership between Computer Science & Engineering and
Statistics.
The Big Data tracks will be an
overlay on top of departments' regular requirements, leading to a new
certificate en route to the Ph.D. degree.
The CSE track will have students select three out of the following four courses:
- Data Management: Principles of Database Management Systems, CSE 544
- Data Visualization: A new course to be offered by Jeff Heer.
http://searchdatamanagement.techtarget.com/definition/big-data-management
http://escience.washington.edu/blog/new-phd-tracks-big-data
https://www.cs.washington.edu/students/grad/specializedtracks/bigdata