---
editor_options:
  markdown:
    wrap: 72
---
# Functions
## What is a function?
A "function" in programming is a piece of code that does a specific task, but packages it so it can be called with a single line of code. Functions can be predefined by developers, and also may be custom to any script.
We've already been using functions: informally in our pseudocode, such as when we would include the `write` statement, and formally, such as the `print`, `cout`, or `echo` statements we saw in the conditional examples at the end of the previous chapter.
The `print` function does something very simple: it provides output that people can read, for example on the R or Python console. It is simple to use, but the internal code behind the print function is a little more complex (to say the least). Since communicating output is needed in nearly every program, each language includes some sort of print function as a built-in function, so programmers can focus on their own code.
Functions can do something very simple or very complex. But in essence, functions are used when you know you're going to do the same task over and over again.
Take our cheesy mashed potatoes example. A simple function might be something like `peel_potato()`, which would take our first example algorithm on peeling potatoes and turn it into a function.
The original algorithm
**Algorithm for peeling potatoes for Cheesy Mashed Potatoes**
1. Get three pounds of russet potatoes.
2. Get a bowl large enough to hold three pounds of russet potatoes.
3. Get a vegetable peeler.
4. Take one potato.
5. Peel potato.
6. Rinse potato with water.
7. Place peeled potato in bowl.
8. Repeat 4-7 until all potatoes are washed and peeled.
But we don't want to create a function for the whole thing. We want a function that lets us easily execute the part that needs to be repeated, Steps 4-7. So the function might look like:
```
define function peel_potato(potato, peeler):
    while(potato has skin):
        location <- on potato, locate skin
        potato <- at location on potato, with peeler remove strip of skin
        potato <- rotate potato
    // end-while (this is a comment, it begins with // )
    potato <- rinse potato with water
    return (potato)
// end function definition
```
A bonus of functions is that you can make them generic when you know the action applies to many kinds of objects. If you want to peel both potatoes and carrots, you can write one function for both.
```
define function peel_vegetable(vegetable, peeler):
    while(vegetable has skin):
        location <- on vegetable, locate skin
        vegetable <- at location on vegetable, with peeler remove strip of skin
        vegetable <- rotate(vegetable)
    // end-while
    vegetable <- rinse_with_water(vegetable)
    return (vegetable)
// end function definition
```
Now the `peel_vegetable` function can be used with multiple objects. (Well, it could before, but using the function name `peel_potato` would be confusing when trying to peel `carrot` or `zucchini`. Naming matters, especially for understandability!)
Now that we've created a function for peeling vegetables, our algorithm might look like this
```
potatoes <- 16
peeler <- "swiss peeler"
bowl <- [] // empty array
for potato in potatoes:
    bowl <- bowl+peel_vegetable(potato, peeler)
// end for-statement
```
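As a rough Python sketch of the pseudocode above: the details here are assumptions made only so the code can run. Each vegetable is a hypothetical dictionary that records how many strips of skin remain, and `potatoes` is a list of 16 such items rather than the number 16.

```python
def peel_vegetable(vegetable, peeler):
    """Peel and rinse one vegetable (a hypothetical dict, see above)."""
    while vegetable["skin_strips"] > 0:   # while the vegetable has skin
        vegetable["skin_strips"] -= 1     # remove one strip of skin
    vegetable["rinsed"] = True            # rinse with water
    vegetable["peeled_with"] = peeler     # remember which peeler was used
    return vegetable


peeler = "swiss peeler"
# 16 unpeeled potatoes with 8 strips of skin each (made-up numbers)
potatoes = [{"name": "potato", "skin_strips": 8, "rinsed": False}
            for _ in range(16)]

bowl = []  # empty list
for potato in potatoes:
    bowl = bowl + [peel_vegetable(potato, peeler)]

print(len(bowl))  # 16
```

Because the function only relies on the `skin_strips` entry, the same call would work for a carrot or zucchini dictionary, which is the point of the generic name.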
### New Programming Concepts
Sometimes, we may want to _add_ a variable, or function output, to an existing variable type that can accept new information. Other times, we may want to _change_ a variable in a completely different way.
#### Adding to a Variable
These two examples add to something that already exists:
- `bowl <- bowl+peel_vegetable(potato, peeler)`
- `x <- 0; x <- x + 1`
A key piece of info here is the __original variable needs to be of a data type that can accept the new piece of data__.\
We defined the bowl as an array `[]`, so we can keep adding the same data type (potato, maybe vegetable) to the array.\
```
potatoes <- 16
peeler <- "swiss peeler"
bowl <- [] // empty array
for potato in potatoes:
    bowl <- bowl+peel_vegetable(potato, peeler)
// bowl array after for loop finishes
[potato, potato, potato, potato,...] // 16 'potato' items in array
```
For the other example, if we tried to add two different data types
- `x <- "zero"; x <- x + 1`
The computer would return an error message, because adding text and numbers, in `"zero" + 1`, is not possible.
Adding content to something that already exists happens so often that some languages, such as C/C++ and Python, combine the assignment operator with a math operator, as in `+=`. In this case, `x += 1` is the same as `x = x + 1`.
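A minimal Python sketch of the ideas in this section. The variable names are illustrative, and a Python list stands in for the pseudocode array.

```python
bowl = []                  # a list can accept new items
bowl = bowl + ["potato"]   # same pattern as bowl <- bowl + ...
bowl += ["potato"]         # shorthand that gives the same result here

x = 0
x += 1                     # same as x = x + 1

try:
    y = "zero"
    y = y + 1              # text + number is not allowed
except TypeError as err:
    message = str(err)
    print("Error:", message)
```

The first two variables grow as expected, while the last assignment raises a `TypeError` because the data types do not match.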
#### Modifying a Variable
For this statement
`vegetable <- rotate(vegetable)`
the vegetable is rotated, and the result is reassigned to the same variable name. This is a little trickier and calls for caution, because you're completely __rewriting the value in `vegetable` instead of modifying it__. If the value were accidentally changed to a different data type, the program would still run with no error message for this assignment, because it simply sees a variable assignment.
```
x <- 123
x <- as.text(x) // accidentally uses the wrong function
// now the data type of x is a string
// many lines of code later
x <- x + 1 // this will cause an error in the code
```
The error will happen because a string and a numeric value can't be added together. But, this will be tricky to troubleshoot because the error will result from the `x <- x + 1` part of the code, even though the actual error was the assignment which happened many lines prior in `x <- as.text(x)`.
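The same situation can be sketched in Python, using the built-in `str()` in place of the pseudocode's `as.text()`. Checking the type of the variable at the point of failure is one way to trace the problem back to the earlier assignment.

```python
x = 123
x = str(x)      # accidental conversion: x is now the text "123"

# ... many lines of code later ...

try:
    x = x + 1   # the error is reported here, not at the str() call
except TypeError:
    print("x has type", type(x).__name__)  # shows that x is a str
    x = int(x) + 1                         # convert back before adding

print(x)  # 124
```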
<hr>
## Back to Functions
A complex function might be something like `make_cheesy_mashed_potatoes()`, which is the equivalent to having an in-home chef who can make you cheesy mashed potatoes on demand, as long as you provide the raw ingredients.
```
make_cheesy_mashed_potatoes(cheese, potatoes, milk, salt, butter)
```
In programming parlance, these ingredients listed `(cheese, potatoes, milk, salt, butter)` would be the "arguments".
### Arguments
Functions will have inputs _(usually)_ and outputs. The function inputs are formally called "arguments," and are specified by the programmer, entered within the parentheses as part of the function call. For cheesy mashed potatoes, this could be specifying what kind of cheese to use, or the quantity of each ingredient.
While our example is in pseudocode, luckily function calls look the same across most languages.
```
# these variables have quantities in oz
cheese <- 16
potatoes <- 32
milk <- 8
salt <- 0.125
butter <- 4
make_cheesy_mashed_potatoes(cheese, potatoes, milk, salt, butter)
paste("this function uses", cheese, "oz of cheese")
```
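For comparison, here is how the same call might look in Python. The function body is a hypothetical stand-in that simply totals the weight of the ingredients, since the real recipe steps are not defined in this chapter.

```python
def make_cheesy_mashed_potatoes(cheese, potatoes, milk, salt, butter):
    """Hypothetical stand-in: returns the total weight of the dish in oz."""
    return cheese + potatoes + milk + salt + butter


# these variables have quantities in oz
cheese = 16
potatoes = 32
milk = 8
salt = 0.125
butter = 4

# arguments matched by position
total = make_cheesy_mashed_potatoes(cheese, potatoes, milk, salt, butter)

# arguments matched by name, so the order no longer matters
same_total = make_cheesy_mashed_potatoes(butter=4, salt=0.125, milk=8,
                                         potatoes=32, cheese=16)

print("this function uses", cheese, "oz of cheese")
print(total)  # 60.125
```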
We already mentioned that `print` is a common built-in function. Most languages have a number of commonly used built-in functions, such as the `paste` function in R shown above. Let's look at another built-in function that is common to both R and Python: `round()`
Built-in functions have help pages which will allow you to see the syntax and arguments. The help files will include definitions of the arguments, if they're required or have a default, and examples for use.
The [syntax for Python](https://docs.python.org/3/library/functions.html#round) is `round(number, ndigits=None)`
The [syntax for R](https://search.r-project.org/R/refmans/base/html/Round.html) is `round(x, digits = 0, ...)`, where `x` is a numeric vector.
The second argument for both languages indicates the number of digits along with the default value. Since there's a default value, you don't necessarily need to include the argument in the function call.
An example in R is
``` {r function-round}
round(c(1.4, 2.5, 3.5))
```
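The equivalent calls in Python might look like the sketch below, which leaves out the optional `ndigits` argument in some calls and supplies it in others. The variable names are only for illustration.

```python
no_digits = round(2.567)               # ndigits omitted: nearest integer, 3
one_digit = round(2.567, 1)            # 2.6
two_digits = round(2.567, ndigits=2)   # 2.57, argument given by name

# Exact halves are rounded to the nearest even number
halves = [round(2.5), round(3.5)]      # [2, 4]

print(no_digits, one_digit, two_digits, halves)
```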
Another argument common in many functions which deal with calculating something is how to handle missing or `NA` values.
For example, the function [`mean()` in R](https://search.r-project.org/R/refmans/base/html/mean.html) is `mean(x, trim = 0, na.rm = FALSE)`. The default value of `na.rm` (which you can think of as "NA remove") is `FALSE`, indicating that `NA` values will _not_ be removed from the calculation.
Optional arguments like this come with a pre-supplied default. In this case the default is `na.rm = FALSE`, so any `NA` values are retained unless the argument is manually changed to `TRUE`. In R, the mean of a vector that still contains an `NA` is itself `NA`.
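To make the effect of a default concrete, here is a small Python sketch of a function with one required and one optional argument. It is not R's `mean()`: the function and its `na_rm` argument are written here for illustration, and `math.nan` stands in for R's `NA`.

```python
import math


def mean(values, na_rm=False):
    """Average of values; na_rm=True drops missing (nan) entries first."""
    if na_rm:
        values = [v for v in values if not math.isnan(v)]
    return sum(values) / len(values)


values = [2, 4, 7, 5, 9, math.nan]

default_result = mean(values)               # nan: the missing value is kept
cleaned_result = mean(values, na_rm=True)   # 5.4: the missing value is dropped

print(default_result, cleaned_result)
```

Both calls run without error, which is why it pays to know what the default is.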
## Function Takeaways
When working with functions, which is the vast majority of programming, it is important to keep in mind two things:
1. The arguments in a function
2. Which arguments are optional, and what are their defaults
A function will run as long as the required arguments are provided, but the resulting output may not match expectations unless you recognize which optional arguments were included with default values.
## Libraries
We've already discussed built-in functions. Built-in functions are often limited to basic tasks and do not include more complex or custom functions that you may wish to use. You can code more complex functions yourself, building on the built-in functions, but this takes a lot of time and requires more in-depth programming knowledge.
The good news is that most programming languages have optional "libraries" (or packages, or modules, depending on what term your programming language of choice uses) that include additional functions beyond the built-in ones. So before creating a new function from scratch, it is worthwhile to check whether a library exists that includes a function that does what you want to do.\
Using our cheesy mashed potatoes example, you might use a programming library to find a recipe that uses different ingredients. You want to make cheesy mashed potatoes, but don't have the time to make them from scratch. Luckily, someone else has created a function that uses instant mashed potatoes instead of raw potatoes.
Libraries are also useful because they mean that your computer and programming language of choice don't need to always have every single function on hand. This setup saves computer disk space, ensures you don't have to reinvent the wheel and make every function from scratch, and provides a level of standardization (e.g., everyone uses the same reference "recipe" so output should be the same for the same input, across users).\
A good rule of thumb is if it _seems_ like the function you want is broadly useful, then someone has likely created a library containing it. This is also true for niche or domain-specific functions: if the task is one that comes up a lot in analysis, there is likely a library that has functions for those analysis tasks.\
Finding the 'right' library for the function you need can be overwhelming, but a good starting point is the official library collection for a programming language, such as [CRAN for R](https://cran.r-project.org/web/views/) or [PyPI for Python](https://pypi.org/).
### Accessing Functions in Libraries
The syntax for accessing functions in libraries varies by programming language but follows the general process of:
1. Install the library from the source. You only need to do this once.
2. Load or import the library. You will need to do this every time you want to access a function in a library. By convention, libraries are loaded at the top of a script, so you, and other people, can see at a glance what libraries are needed to run the script.
3. Use the functions as normal.
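In Python, the three steps above might look like the sketch below. The `statistics` module used here ships with Python, so the install step is shown only as a comment with a placeholder name.

```python
# Step 1 (once, in a terminal rather than in the script):
#   pip install <library-name>
# The statistics module comes with Python, so nothing needs installing here.

# Step 2: load the library at the top of the script
import statistics

# Step 3: use the function as normal
average = statistics.mean([2, 4, 7, 5, 9])
print(average)  # 5.4
```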
### Function Names
There are only so many function names that make sense in the English language, so there may be functions from _different libraries_ that have the same name. How does the programming language know which function you are trying to access? By default, the language will use the function of the more recently loaded or imported package.
Let's say we have two `mean()` functions, one from `library_A` and one from `library_B`. They differ in their default settings:
- `library_A` defaults to `na.rm = TRUE`
- `library_B` defaults to `na.rm = FALSE`
If we load libraries in order `library_A` then `library_B` and then use `mean()` as a function, we will be using the `mean()` function from `library_B`.
```
load(library_A)
load(library_B)
values <- c(2, 4, 7, 5, 9, NA)
mean(values)
```
Our result will be `NA`.
Conversely, if we load `library_B` then `library_A`, we will use the `mean()` function from `library_A`.
```
load(library_B)
load(library_A)
values <- c(2, 4, 7, 5, 9, NA)
mean(values)
```
Our value will be `5.4`.
The tricky part is that all this happens invisibly. There may or may not - depending on your programming language - be a warning that two libraries contain functions of the same name. So, keeping track of your order of loading is important. If you get any unexpected results, you can double check which library the function you are using is from.\
A better method is to be explicit about which function you are calling. Most programming languages allow a syntax along the lines of `library:function()` to specify use of a function from a stated library. The exact punctuation varies: R writes it as `library::function()` and Python as `library.function()`.\
```
load(library_B)
load(library_A)
values <- c(2, 4, 7, 5, 9, NA)
library_B:mean(values)
#result will be `NA`
```
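A real example of the same effect in Python uses two modules that ship with the language, `math` and `cmath`, which both provide a function named `sqrt`. This is only a sketch of the masking behaviour, and the variable names are illustrative.

```python
import math

from math import sqrt    # loaded first
from cmath import sqrt   # loaded second: the name `sqrt` now points here

masked = sqrt(-1)        # cmath version, returns the complex number 1j

try:
    math.sqrt(-1)        # the math version rejects negative input
    math_error = None
except ValueError as err:
    math_error = str(err)

explicit = math.sqrt(16)  # explicit call to the math version: 4.0

print(masked, explicit)
```

No warning is shown when the second import replaces the first, so the explicit `math.sqrt()` form is the safer habit.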
### Aliases
You may encounter the concept of an "alias" for a library. This is common in Python, where users can set an alias for a library name, and use that going forward rather than writing out the full library name. This will mostly come up if you are looking for help online, or wondering why you are seeing abbreviations.
For example, Python uses the `import` term to load a library (or "package" as Python calls them) and allows setting an alias using `import package as alias` syntax. By convention, many Python users will use standard aliases for common packages, such as:
```
import pandas as pd
import numpy as np
```
Functions can then be called explicitly through the alias, using Python's `alias.function` syntax, such as `pd.DataFrame` to designate a `pandas` `DataFrame` object.
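The same pattern works with any module. As a small sketch that runs without installing anything, the built-in `statistics` module is given the alias `stats` here, which is an arbitrary choice rather than an established convention.

```python
import statistics as stats

# call the function through the alias instead of the full module name
middle = stats.median([2, 4, 7, 5, 9])
print(middle)  # 5
```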
### Exercise: Create Popcorn Function
_Timing: 15 minutes_
Let's return to the section 2 activity where your group created an algorithm to make popcorn.
- How would you adapt your step by step narrative algorithm to be a function called `make_popcorn()`?
- What arguments would be included, and would you have any default values set?
- Does your function look different than others in your group? How would you tell the computer which function you wanted to use?
<details>
<summary>Example functions</summary>
```
# specify which library we want to get a function from,
# in this case instructor Steph
from Steph import make_popcorn
# The arguments and defaults for this library are
#
# make_popcorn(quantity_popcorn, type = microwave, time = 2,
# butter = movie_theater_flavor, seasoning = TRUE)
#
# where time is in minutes and quantity_popcorn is in bags
# example in use
make_popcorn(quantity_popcorn = 1, time = 1.5)
```
```
# specify which library we want to get a function from,
# in this case instructor Kat
from Kat import make_popcorn
# The arguments and defaults for this library are
#
# make_popcorn(quantity_popcorn, quantity_oil = 1, type = stovetop,
# butter = TRUE, salt = "light")
#
# where quantity is in tbsp
# example in use
make_popcorn(2)
```
</details>
## Language examples: Formatting may vary [may not need in this section, since round() and mean() included above]
**Python**
```python
```
<hr>
**R**
```r
```
<hr>
**C/C++**
```c
```
<hr>
**PHP**
```php
```
<hr>
**Bash**
```bash
```
<hr>