Data Science

eBook - The Executive Summary - A Technical Book for Non-Technical Professionals

€66.99
(incl. VAT)
E-Book Download

Bibliographic Data
ISBN/EAN: 9781119544166
Language: English
Extent: 208 pages, 2.48 MB
Edition: 1st edition, 2020
E-Book
Format: PDF
DRM: Adobe DRM

Description

Tap into the power of data science with this comprehensive resource for non-technical professionals

Data Science: The Executive Summary - A Technical Book for Non-Technical Professionals is a comprehensive resource for people in non-engineering roles who want to fully understand data science and analytics concepts. Accomplished data scientist and author Field Cady describes both the "business side" of data science, including what problems it solves and how it fits into an organization, and the technical side, including analytical techniques and key technologies.

Data Science: The Executive Summary covers topics like:

- Assessing whether your organization needs data scientists, and what to look for when hiring them
- When Big Data is the best approach to use for a project, and when it actually ties analysts' hands
- Cutting-edge artificial intelligence, as well as classical approaches that work better for many problems
- How many techniques rely on dubious mathematical idealizations, and when you can work around them

Perfect for executives who make critical decisions based on data science and analytics, as well as managers who hire and assess the work of data scientists, Data Science: The Executive Summary also belongs on the bookshelves of salespeople and marketers who need to explain what a data analytics product does. Finally, data scientists themselves will improve their technical work with insights into the goals and constraints of the business situation.

About the Author

Field Cady is a data scientist and author in the Seattle area. Most of his career has focused on consulting for clients of all sizes in a range of industries. More recently, he focused on using AI to mine scientific literature at the Allen Institute for Artificial Intelligence. His previous book, The Data Science Handbook, was published in 2017. His work has been covered in Wired, MIT Press, and the Wall Street Journal, among others.

Contents

1 Introduction 1

1.1 Why Managers Need to Know About Data Science 1

1.2 The New Age of Data Literacy 2

1.3 Data-Driven Development 3

1.4 How to Use this Book 4

2 The Business Side of Data Science 7

2.1 What Is Data Science? 7

2.1.1 What Data Scientists Do 7

2.1.2 History of Data Science 9

2.1.3 Data Science Roadmap 12

2.1.4 Demystifying the Terms: Data Science, Machine Learning, Statistics, and Business Intelligence 13

2.1.4.1 Machine Learning 13

2.1.4.2 Statistics 14

2.1.4.3 Business Intelligence 15

2.1.5 What Data Scientists Don't (Necessarily) Do 15

2.1.5.1 Working Without Data 16

2.1.5.2 Working with Data that Can't Be Interpreted 17

2.1.5.3 Replacing Subject Matter Experts 17

2.1.5.4 Designing Mathematical Algorithms 18

2.2 Data Science in an Organization 19

2.2.1 Types of Value Added 19

2.2.1.1 Business Insights 19

2.2.1.2 Intelligent Products 19

2.2.1.3 Building Analytics Frameworks 20

2.2.1.4 Offline Batch Analytics 21

2.2.2 One-Person Shops and Data Science Teams 21

2.2.3 Related Job Roles 22

2.2.3.1 Data Engineer 22

2.2.3.2 Data Analyst 22

2.2.3.3 Software Engineer 23

2.3 Hiring Data Scientists 25

2.3.1 Do I Even Need Data Science? 26

2.3.2 The Simplest Option: Citizen Data Scientists 27

2.3.3 The Harder Option: Dedicated Data Scientists 28

2.3.4 Programming, Algorithmic Thinking, and Code Quality 28

2.3.5 Hiring Checklist 31

2.3.6 Data Science Salaries 32

2.3.7 Bad Hires and Red Flags 32

2.3.8 Advice with Data Science Consultants 34

2.4 Management Failure Cases 36

2.4.1 Using Them as Devs 36

2.4.2 Inadequate Data 36

2.4.3 Using Them as Graph Monkeys 37

2.4.4 Nebulous Questions 37

2.4.5 Laundry Lists of Questions Without Prioritization 38

3 Working with Modern Data 41

3.1 Unstructured Data and Passive Collection 41

3.2 Data Types and Sources 42

3.3 Data Formats 43

3.3.1 CSV Files 43

3.3.2 JSON Files 44

3.3.3 XML and HTML 46

3.4 Databases 47

3.4.1 Relational Databases and Document Stores 48

3.4.2 Database Operations 49

3.5 Data Analytics Software Architectures 50

3.5.1 Shared Storage 51

3.5.2 Shared Relational Database 52

3.5.3 Document Store+Analytics RDB 52

3.5.4 Storage+Parallel Processing 53

4 Telling the Story, Summarizing Data 55

4.1 Choosing What to Measure 56

4.2 Outliers, Visualizations, and the Limits of Summary Statistics: A Picture Is Worth a Thousand Numbers 58

4.3 Experiments, Correlation, and Causality 60

4.4 Summarizing One Number 62

4.5 Key Properties to Assess: Central Tendency, Spread, and Heavy Tails 63

4.5.1 Measuring Central Tendency 63

4.5.1.1 Mean 63

4.5.1.2 Median 64

4.5.1.3 Mode 65

4.5.2 Measuring Spread 65

4.5.2.1 Standard Deviation 65

4.5.2.2 Percentiles 66

4.5.3 Advanced Material: Managing Heavy Tails 67

4.6 Summarizing Two Numbers: Correlations and Scatterplots 68

4.6.1 Correlations 68

4.6.1.1 Pearson Correlation 71

4.6.1.2 Ordinal Correlations 71

4.6.2 Mutual Information 72

4.7 Advanced Material: Fitting a Line or Curve 72

4.7.1 Effects of Outliers 75

4.7.2 Optimization and Choosing Cost Functions 76

4.8 Statistics: How to Not Fool Yourself 77

4.8.1 The Central Concept: The p-Value 78

4.8.2 Reality Check: Picking a Null Hypothesis and Modeling Assumptions 80

4.8.3 Advanced Material: Parameter Estimation and Confidence Intervals 81

4.8.4 Advanced Material: Statistical Tests Worth Knowing 82

4.8.4.1 χ²-Test 83

4.8.4.2 T-test 83

4.8.4.3 Fisher's Exact Test 84

4.8.4.4 Multiple Hypothesis Testing 84

4.8.5 Bayesian Statistics 85

4.9 Advanced Material: Probability Distributions Worth Knowing 86

4.9.1 Probability Distributions: Discrete and Continuous 87

4.9.2 Flipping Coins: Bernoulli Distribution 89

4.9.3 Adding Coin Flips: Binomial Distribution 89

4.9.4 Throwing Darts: Uniform Distribution 91

4.9.5 Bell-Shaped Curves: Normal Distribution 91

4.9.6 Heavy Tails 101: Log-Normal Distribution 92

4.9.7 Waiting Around: Exponential Distribution and the Geometric Distribution 93

4.9.8 Time to Failure: Weibull Distribution 94

4.9.9 Counting Events: Poisson Distribution 95

5 Machine Learning 101

5.1 Supervised Learning, Unsupervised Learning, and Binary Classifiers 102

5.1.1 Reality Check: Getting Labeled Data and Assuming Independence 103

5.1.2 Feature Extraction and the Limitations of Machine Learning 104

5.1.3 Overfitting 105

5.1.4 Cross-Validation Strategies 106

5.2 Measuring Performance 107

5.2.1 Confusion Matrices 108

5.2.2 ROC Curves 108

5.2.3 Area Under the ROC Curve 110

5.2.4 Selecting Classification Cutoffs 110

5.2.5 Other Performance Metrics 111

5.2.6 Lift Curves 112

5.3 Advanced Material: Important Classifiers 113

5.3.1 Decision Trees 113

5.3.2 Random Forests 115

5.3.3 Ensemble Classifiers 116

5.3.4 Support Vector Machines 116

5.3.5 Logistic Regression 119

5.3.6 Lasso Regression 121

5.3.7 Naive Bayes 121

5.3.8 Neural Nets 123

5.4 Structure of the Data: Unsupervised Learning 124

5.4.1 The Curse of Dimensionality 125

5.4.2 Principal Component Analysis and Factor Analysis 125

5.4.2.1 Scree Plots and Understanding Dimensionality 128

5.4.2.2 Factor Analysis 128

5.4.2.3 Limitations of PCA 129

5.4.3 Clustering 129

5.4.3.1 Real-World Assessment of Clusters 130

5.4.3.2 k-means Clustering 131

5.4.3.3 Advanced Material: Other Clustering Algorithms 132

5.4.3.4 Advanced Material: Evaluating Cluster Quality 133

5.5 Learning as You Go: Reinforcement Learning 135

5.5.1 Multi-Armed Bandits and ε-Greedy Algorithms 136

5.5.2 Markov Decision Processes and Q-Learning 137

6 Knowing the Tools 141

6.1 A Note on Learning to Code 141

6.2 Cheat Sheet 142

6.3 Parts of the Data Science Ecosystem 143

6.3.1 Scripting Languages 144

6.3.2 Technical Computing Languages 145

6.3.2.1 Python's Technical Computing Stack 145

6.3.2.2 R 146

6.3.2.3 Matlab and Octave 146

6.3.2.4 Mathematica 147

6.3.2.5 SAS 147

6.3.2.6 Julia 147

6.3.3 Visualization 147

6.3.3.1 Tableau 148

6.3.3.2 Excel 148

6.3.3.3 D3.js 148

6.3.4 Databases 148

6.3.5 Big Data 149

6.3.5.1 Types of Big Data Technologies 150

6.3.5.2 Spark 151

6.3.6 Advanced Material: The Map-Reduce Paradigm 151

6.4 Advanced Material: Database Query Crash Course 153

6.4.1 Basic Queries 153

6.4.2 Groups and Aggregations 154

6.4.3 Joins 156

6.4.4 Nesting Queries 157

7 Deep Learning and Artificial Intelligence 161

7.1 Overview of AI 161

7.1.1 Don't Fear the Skynet: Strong and Weak AI 161

7.1.2 System 1 and System 2 162

7.2 Neural Networks 164

7.2.1 What Neural Nets Can and Can't Do 164

7.2.2 Enough Boilerplate: What's a Neural Net? 165

7.2.3 Convolutional Neural Nets 166

7.2.4 Advanced Material: Training Neural Networks 167

7.2.4.1 Manual Versus Automatic Feature Extraction 168

7.2.4.2 Dataset Sizes and Data Augmentation 168

7.2.4.3 Batches and Epochs 169

7.2.4.4 Transfer Learning 170

7.2.4.5 Feature Extraction 171

7.2.4.6 Word Embeddings 171

7.3 Natural Language Processing 172

7.3.1 The Great Divide: Language Versus Statistics 172

7.3.2 Save Yourself Some Trouble: Consider Regular Expressions 173

7.3.3 Software and Datasets 174

7.3.4 Key Issue: Vectorization 175

7.3.5 Bag-of-Words 175

7.4 Knowledge Bases and Graphs 177

Postscript 181

Index 183

Information on E-Books

Congratulations on purchasing an e-book from BUCHBOX! Here is some practical information.

Adobe ID

If you have purchased e-books with copy protection (DRM), you will always need an Adobe ID. Please register by entering your name, email address, and a password of your choice. The combination of email address and password is your Adobe ID. Please note it down carefully.

Warning: If you download copy-protected e-books WITHOUT assigning an Adobe ID, you will never be able to read them on any device other than your PC!

Forgot your Adobe ID password? You can request a new one here.

Reading on a Tablet or Phone

If you want to read on your tablet, use a suitable reading app.

For iPad, iPhone, etc., get the Bluefire reading app from the iTunes Store.

For Android devices (e.g. Samsung), get the Bluefire reading app from the Google Play Store (alternatively: Aldiko).

Reading on an E-Book Reader or on a PC / Mac

To download the files to your PC and transfer them to your e-book reader, use the Adobe Digital Editions (ADE) software.

Other Devices / Software

Amazon Kindle: We do NOT recommend these devices.

EPUBs with Adobe DRM cannot be read on an Amazon Kindle. Neither the EPUB file format nor Adobe DRM copy protection is compatible with the Kindle. Conversely, e-books purchased from Amazon can only be read on Amazon's devices. Readers such as the Tolino, by contrast, are completely open: you can buy e-books for the Tolino from many thousands of online bookstores, including here at our shop.

Software for Sony E-Book Readers

If you have a Sony Reader, you can find the additional Sony software here.

Computer/Laptop with Unix or Linux

Adobe Digital Editions is not compatible with Unix or Linux. However, you can still access your e-books by running it under WINE.