# Data distribution

In this assignment, we focus on describing data sets using numerical parameters. These parameters are referred to in several different ways. Most textbooks refer to them as measures of center or measures of variation. However, statistical software such as Excel or SPSS often refers to them as descriptive statistics. The essential idea is that we are trying to describe a data set using a small number of measures, somewhere between 2 and 10.

This assignment should be completed by constructing two files: a Word file and a PowerPoint file.

For the first file, which you will submit as a Word document, write a paper that uses the following structure:

- Begin with a one- or two-paragraph introduction that summarizes the meaning of the reading material.
- Answer all of the questions included in Parts 1 and 2 below. Be sure to answer questions using complete sentences and show all work in your calculations.
- Provide a written conclusion, when appropriate, for the problem that you are addressing.
- The last part of your paper should include a paragraph or two that explains the information that you learned in the assignment.
- Be sure to include two scholarly peer-reviewed references.

## Part 1 (10 points)

1. Describe three different ways to measure the center of a data set. Give an example where one measure of the center is preferred over another.
2. Explain the quartiles of a distribution in terms of percentiles.
3. Describe the different components of a box plot. Use the items included in the five-number summary.
4. Describe the IQR rule for identifying outliers. Then, create a mock data set with at least 12 data points and with at least two outliers. Justify the outliers by applying the IQR rule.
5. Write a short paragraph that defines standard deviation and explains its importance. Explain the difference between population standard deviation and sample standard deviation.
6. Find the sample standard deviation of the following data set: {10, 12, 16, 20, 22}. Show all steps of the calculation.
7. The prices of a gallon of gasoline at 12 New York City gas stations in August 2016 were: $2.15, $2.17, $2.19, $2.19, $2.39, $2.45, $2.49, $2.59, $2.79, $2.79, $2.89, $3.06, $3.99. Based on this data set of 12 gas stations:
   - a) Find the mean price of gasoline.
   - b) Find the median price of gasoline.
   - c) Find the range of gasoline prices.
   - d) Find the five-number summary for gasoline prices.
8. What is the 68-95-99.7 rule for a normal distribution? Find three different items that are normally distributed. Give references used.
9. What is meant by the phrase standard normal distribution? Explain what a z-score is and why it is important.
10. How can one determine from a histogram if a distribution is approximately normal?

## Part 2

Use SPSS or Excel to obtain the following. Place your results in your Word file.

1. Use the data below to answer the following.
   - a) Find the five-number summary for the data below. Hint: use the Excel statistical function QUARTILE.
   - b) Find the IQR and use it to determine if there are any outliers.

   30 28 33 29 37 39 57 27 16 25 35 37 37 38 34 25 21 3 34 35

2. Use Excel to determine the mean and sample standard deviation for the data given in Problem 1 of Part 2. Hint: use the Excel functions AVERAGE and STDEV.
3. The supply manager of a university orders all supplies, including items for the athletics department. Before the football season, he has to develop a separate inventory list for the football team. This list will include supplies for both the players and the department itself. Although the department budget is set in terms of inventory (based on historical data), the football team’s needs change based on the size of the team as well as its individual players. The historical data shows that the number of gallons of Gatorade consumed by a football team during a game follows a normal distribution with mean 20 and standard deviation 3.
   To help with the decision of how much Gatorade to order for each game, the supply manager would like to know the probability that the number of gallons consumed will be:
   - a) Greater than 18 gallons
   - b) Between 22 and 25 gallons
   - c) Less than 16 gallons

   Obtain the probabilities by first finding the appropriate z-score, then using the Standard Normal Cumulative Proportion table in the textbook.

4. Body Mass Index (BMI) may be determined by taking your weight in kilograms and dividing by the square of your height in meters. The National Center for Health Statistics found that the BMI of American men aged 20 to 29 follows an approximately normal distribution with mean 26.8 and standard deviation 5.2 (Cheryl D. Fryar et al., “Anthropometric reference data for children and adults: United States, 2007-2010,” Vital and Health Statistics, Series 11, Number 252 (October, 2012), at www.cdc.gov/nchs).
   - a) People with BMI less than 18.5 are often classified as “underweight.” What percent of men aged 20 to 29 are underweight by this criterion?
   - b) People with BMI more than 30 are often classified as “obese.” What percent of men aged 20 to 29 are obese by this criterion?
   - c) What percent of men aged 20 to 29 have a BMI between 25 and 30?
   - d) What percent of men aged 20 to 29 have a BMI between 18.5 and 25?

The second file is a PowerPoint presentation, which is described in Part 3 below.

## Part 3 (5 points)

Read the article, “Continuous Quality Improvement by Statistical Process Control,” by Gejdos in Procedia Economics and Finance 34 (2015) 565–572. Then, prepare a PowerPoint presentation that explains the role of the normal distribution in quality improvement. Pay particular attention to the role of the normal distribution as described by the author.

Part 1 and Part 2 paper:

- Length: 5-7 pages
- References: a minimum of two scholarly peer-reviewed references

Part 3 PowerPoint:

- Length: 10-15 slides, with additional talking points. Talking points should include 100 to 150 words per slide.
- References: a minimum of two scholarly resources
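
Part 1 asks for every step of a sample standard deviation calculation. As a rough illustration of what those steps look like, the sketch below works through the data set {10, 12, 16, 20, 22} in Python. The function name `sample_std_steps` is hypothetical and not part of the assignment; the assignment itself expects the work to be shown by hand.

```python
import math
import statistics


def sample_std_steps(data):
    """Return (mean, squared deviations, sample variance, sample std dev)."""
    n = len(data)
    mean = sum(data) / n
    # Squared distance of each value from the mean
    squared_deviations = [(x - mean) ** 2 for x in data]
    # A sample variance divides by n - 1; a population variance divides by n
    variance = sum(squared_deviations) / (n - 1)
    return mean, squared_deviations, variance, math.sqrt(variance)


data = [10, 12, 16, 20, 22]
mean, squared_deviations, variance, s = sample_std_steps(data)

print("mean:", mean)
print("squared deviations:", squared_deviations)
print("sample variance:", variance)
print("sample standard deviation:", round(s, 3))
# Cross-check against the standard library's sample standard deviation
print("statistics.stdev:", round(statistics.stdev(data), 3))
```

Dividing by `n` instead of `n - 1` in the variance step would give the population standard deviation, which is the distinction Part 1 asks you to explain.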
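
The five-number summary and the IQR rule from Parts 1 and 2 can also be sketched in code. The example below uses the 20 values given in Part 2 and Python's `statistics.quantiles` with `method="inclusive"`. That this method matches the interpolation used by Excel's QUARTILE is an assumption here, so compare the result with your own Excel or SPSS output; other quartile conventions can give slightly different values. The helper names are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # Assumption: the "inclusive" method mirrors Excel's QUARTILE interpolation
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def iqr_outliers(data):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return sorted(x for x in data if x < lower_fence or x > upper_fence)


values = [30, 28, 33, 29, 37, 39, 57, 27, 16, 25,
          35, 37, 37, 38, 34, 25, 21, 3, 34, 35]

summary = five_number_summary(values)
outliers = iqr_outliers(values)

print("five-number summary:", summary)
print("outliers by the IQR rule:", outliers)
```

The same two helpers could be applied to a mock data set to justify its outliers, as Part 1 requests.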
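
For the normal-distribution problems in Part 2, the assignment asks for a z-score followed by a table lookup. The sketch below shows the same two steps with Python's `statistics.NormalDist` standing in for the Standard Normal Cumulative Proportion table, using the Gatorade parameters (mean 20, standard deviation 3). It is only a way to check table-based answers, which may differ slightly because tables round z to two decimal places. The function and variable names are hypothetical.

```python
from statistics import NormalDist


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist()

mean_gallons, sd_gallons = 20, 3

# P(X > 18) = 1 - (cumulative proportion below z)
z_18 = z_score(18, mean_gallons, sd_gallons)
p_greater_18 = 1 - standard_normal.cdf(z_18)

# P(22 < X < 25) = difference of two cumulative proportions
z_22 = z_score(22, mean_gallons, sd_gallons)
z_25 = z_score(25, mean_gallons, sd_gallons)
p_between_22_25 = standard_normal.cdf(z_25) - standard_normal.cdf(z_22)

# P(X < 16) = cumulative proportion below z
z_16 = z_score(16, mean_gallons, sd_gallons)
p_less_16 = standard_normal.cdf(z_16)

print(round(p_greater_18, 4), round(p_between_22_25, 4), round(p_less_16, 4))
```

The BMI questions follow the same pattern with mean 26.8 and standard deviation 5.2.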