Invited Talk: Performance Modeling in MapReduce Environments: Challenges and Opportunities
Dr. Lucy Cherkasova, Principal Scientist at HP Labs, Palo Alto
Unstructured data is the largest and fastest-growing portion of most enterprises' assets, often representing 70% to 80% of online data. This steep increase in the volume of information being produced often exceeds the capabilities of existing commercial databases. MapReduce and its open-source implementation, Hadoop, represent an economically compelling alternative that offers an efficient distributed computing platform for handling large volumes of data and mining petabytes of unstructured information. MapReduce is increasingly used across the enterprise for advanced data analytics and for enabling new applications related to data retention, regulatory compliance, e-discovery, and litigation.
Sharing a MapReduce cluster among multiple applications is a common practice in such environments. However, a key challenge in these shared environments is the ability to tailor and control the resources allocated to different applications so that they achieve their performance goals and service level objectives (SLOs). Currently, there is no job scheduler for MapReduce environments that, given a job completion deadline, can allocate the appropriate amount of resources to the job so that it meets the required SLO. Benchmarking Hadoop, optimizing cluster parameters, efficient job scheduling, and workload management are emerging topics that present an exciting set of challenges and opportunities for performance modeling in MapReduce environments.
Dr. Lucy Cherkasova is a principal scientist at HP Labs, Palo Alto, where she has worked since 1991. Her current research interests are in developing quantitative methods for the analysis, design, and management of concurrent and distributed systems (such as internet and media applications, virtualized environments, and next-generation data centers). Dr. Cherkasova has initiated, managed, and successfully led multiple R&D projects. She is an ACM Distinguished Scientist. She has received five Best Paper awards and was recognized with a Certificate of Appreciation from the IEEE Computer Society. Her most recent work has focused on designing new technologies for efficient management, capacity planning, and anomaly detection in internet and enterprise systems, with an emphasis on performance and scalability.