Commit 4f41f67

anisa-hawes committed
Merge branch 'gh-pages' into daphne-alexandre-new-roles
2 parents 8c177f4 + 79874f8 commit 4f41f67

32 files changed

Lines changed: 68 additions & 114 deletions

en/lessons/basic-text-processing-in-r.md

Lines changed: 1 addition & 1 deletion
@@ -646,7 +646,7 @@ Many generic tutorials exist for all three of these, as well as extensive packag

 [^3]: All Presidential State of the Union Addresses were downloaded from The American Presidency Project at the University of California Santa Barbara. (Accessed 2016-11-11) [http://www.presidency.ucsb.edu/sou.php](http://www.presidency.ucsb.edu/sou.php).

-[^4]: Peter Norvig. "Google Web Trillion Word Corpus". (Accessed 2016-11-11) [http://norvig.com/ngrams/](http://norvig.com/ngrams/).
+[^4]: Peter Norvig. "Google Web Trillion Word Corpus". (Accessed 2016-11-11) [http://norvig.com/ngrams/](https://web.archive.org/web/20260326183858/http://norvig.com/ngrams/).

 [^5]: This does happen for a few written State of the Union addresses, where a long bulleted list gets parsed into one very long sentence.

en/lessons/beginners-guide-to-twitter-data.md

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ While this walkthrough proposes a specific workflow that we think is suitable fo

 First, we need to gather some data. George Washington University’s [TweetSets](https://tweetsets.library.gwu.edu/) allows you to create your own data queries from existing Twitter datasets they have compiled. The datasets primarily focus on the biggest (mostly American) geopolitical events of the last few years, but the TweetSets website states they are also open to queries regarding the construction of new datasets. We chose TweetSets because it makes narrowing and cleaning your dataset very easy, creating stable, archivable datasets through a relatively simple graphical interface. Additionally, this has the benefit of allowing you to search and analyze the data with your own local tools, rather than having your results shaped by Twitter search algorithms that may prioritize users you follow, etc.

-You could, however, substitute any tool that gives you a set of dehydrated tweets. Because tweets can be correlated to so much data, it’s more efficient to distribute dehydrated data sets consisting of unique tweet IDs, and then allow users to “hydrate” the data, linking retweet counts, geolocation info, etc., to unique IDs. More importantly, [Twitter's terms for providing downloaded content to third parties](https://web.archive.org/web/20190927151316/https://developer.twitter.com/en/developer-terms/agreement-and-policy), as well as research ethics, are at play. Other common places to acquire dehydrated datasets include Stanford’s [SNAP](https://snap.stanford.edu/data/) collections, the [DocNow Project](https://www.docnow.io) and data repositories, or the [Twitter Application Programming Interface (API)](https://developer.twitter.com/), directly. (If you wonder what an API is, please check this [lesson](/en/lessons/introduction-to-populating-a-website-with-api-data#what-is-application-programming-interface-api).) This latter option will require some coding, but Justin Littman, one of the creators of TweetSets, does a good job summarizing some of the higher-level ways of interacting with the API in this [post](https://gwu-libraries.github.io/sfm-ui/posts/2017-09-14-twitter-data).
+You could, however, substitute any tool that gives you a set of dehydrated tweets. Because tweets can be correlated to so much data, it’s more efficient to distribute dehydrated data sets consisting of unique tweet IDs, and then allow users to “hydrate” the data, linking retweet counts, geolocation info, etc., to unique IDs. More importantly, [Twitter's terms for providing downloaded content to third parties](https://web.archive.org/web/20190927151316/https://developer.twitter.com/en/developer-terms/agreement-and-policy), as well as research ethics, are at play. Other common places to acquire dehydrated datasets include Stanford’s [SNAP](https://snap.stanford.edu/data/) collections, the [DocNow Project](https://web.archive.org/web/20260316082621/https://www.docnow.io/) and data repositories, or the [Twitter Application Programming Interface (API)](https://developer.twitter.com/), directly. (If you wonder what an API is, please check this [lesson](/en/lessons/introduction-to-populating-a-website-with-api-data#what-is-application-programming-interface-api).) This latter option will require some coding, but Justin Littman, one of the creators of TweetSets, does a good job summarizing some of the higher-level ways of interacting with the API in this [post](https://gwu-libraries.github.io/sfm-ui/posts/2017-09-14-twitter-data).

 We find that the graphical, web-based nature of TweetSets, however, makes it ideal for learning this process. That said, if you want to obtain a dehydrated dataset by other means, you can just start at the [Hydrating](/en/lessons/beginners-guide-to-twitter-data#hydrating) section.

en/lessons/correspondence-analysis-in-R.md

Lines changed: 1 addition & 1 deletion
@@ -398,7 +398,7 @@ JUST 0.408 0.000 0.000

 The normalisation process does something interesting. Those who are members of multiple committees and/or who belong to committees with many members will tend to have normalisation scores that are lower, suggesting that they are more central to the network. These members will be put closer to the centre of the matrix. For example, the cell belonging to S Ambler and IWFA has the lowest score of 0.192 because S Ambler is a member of three committees and the IWFA committee has nine members in the graph represented.

-The next stage is to find the singular value decomposition of this normalised data. This involves fairly complex linear algebra that will not be covered here, but you can learn more from [these Singular Value Decomposition lecture notes](https://math.mit.edu/classes/18.095/2016IAP/lec2/SVD_Notes.pdf) or in more detail from [this pdf file on SVD](http://davetang.org/file/Singular_Value_Decomposition_Tutorial.pdf). I will try to summarize what happens in lay terms.
+The next stage is to find the singular value decomposition of this normalised data. This involves fairly complex linear algebra that will not be covered here, but you can learn more from [these Singular Value Decomposition lecture notes](https://math.mit.edu/classes/18.095/2016IAP/lec2/SVD_Notes.pdf) or in more detail from [this pdf file on SVD](https://perma.cc/F7MJ-EGET). I will try to summarize what happens in lay terms.

 * Two new matrices are created that show "dimension" scores for the rows (committees) and the columns (MPs) based on eigenvectors.
 * The number of dimensions is equal to the size of the columns or rows minus 1, which ever is smaller. In this case, there are five committees compared to the MPs eleven, so the number of dimensions is 4.

en/lessons/creating-apis-with-python-and-flask.md

Lines changed: 1 addition & 1 deletion
@@ -492,7 +492,7 @@ After incorporating these design improvements, a request to our API might look l

 Without documentation, even the best-designed API will be unusable. Your API should have documentation describing the resources or functionality available through your API that also provides concrete working examples of request URLs or code for your API. You should have a section for each resource that describes which fields, such as `id` or `title`, it accepts. Each section should have an example in the form of a sample HTTP request or block of code.

-A fairly common practice in documenting APIs is to provide annotations in your code that are then automatically collated into documentation using a tool such as [Doxygen](http://www.doxygen.org/) or [Sphinx](http://www.sphinx-doc.org/en/stable/). These tools create documentation from **docstrings**—comments you make on your function definitions. While this kind of documentation is a good idea, you shouldn't consider your job done if you've only documented your API to this level. Instead, try to imagine yourself as a potential user of your API and provide working examples. In an ideal world, you would have three kinds of documentation for your API: a reference that details each route and its behavior, a guide that explains the reference in prose, and at least one or two tutorials that explain every step in detail.
+A fairly common practice in documenting APIs is to provide annotations in your code that are then automatically collated into documentation using a tool such as [Doxygen](https://www.doxygen.nl/) or [Sphinx](http://www.sphinx-doc.org/en/stable/). These tools create documentation from **docstrings**—comments you make on your function definitions. While this kind of documentation is a good idea, you shouldn't consider your job done if you've only documented your API to this level. Instead, try to imagine yourself as a potential user of your API and provide working examples. In an ideal world, you would have three kinds of documentation for your API: a reference that details each route and its behavior, a guide that explains the reference in prose, and at least one or two tutorials that explain every step in detail.

 For inspiration on how to approach API documentation, see the [New York Public Library Digital Collections API](http://api.repo.nypl.org/), which sets a standard of documentation achievable for many academic projects. For an extensively documented (though sometimes overwhelming) API, see the [MediaWiki Action API](https://www.mediawiki.org/wiki/API:Main_page), which provides documentation to users who pass partial queries to the API. (In our example above, we returned an error on a partial query.) For other professionally maintained API documentation examples, consider the [World Bank API](https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-api-documentation), the various [New York Times APIs](https://developer.nytimes.com/), or the [Europeana Pro API](https://pro.europeana.eu/resources/apis).

en/lessons/detecting-text-reuse-with-passim.md

Lines changed: 1 addition & 1 deletion
@@ -906,4 +906,4 @@ MR gratefully acknowledges the financial support of the Swiss National Science F
 9. Hannu Salmi, Heli Rantala, Aleksi Vesanto, Filip Ginter. The long-term reuse of text in the Finnish press, 1771–1920. **2364**, 394–544 In *CEUR Workshop Proceedings*. (2019).
 10. Axel J Soto, Abidalrahman Mohammad, Andrew Albert, Aminul Islam, Evangelos Milios, Michael Doyle, Rosane Minghim, Maria Cristina de Oliveira. Similarity-Based Support for Text Reuse in Technical Writing. 97–106 In *Proceedings of the 2015 ACM Symposium on Document Engineering*. ACM, 2015. [Link](http://dx.doi.org/10.1145/2682571.2797068)
 11. Alexandra Schofield, Laure Thompson, David Mimno. Quantifying the Effects of Text Duplication on Semantic Models. 2737–2747 In *Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing*. Association for Computational Linguistics, 2017. [https://doi.org/10.18653/v1/D17-1290](https://perma.cc/KSK6-5TXP)
-12. Matteo Romanello, Aurélien Berra, Alexandra Trachsel. Rethinking Text Reuse as Digital Classicists. *Digital Humanities conference*, 2014. [Link](https://wiki.digitalclassicist.org/Text_Reuse)
+12. Matteo Romanello, Aurélien Berra, Alexandra Trachsel. Rethinking Text Reuse as Digital Classicists. *Digital Humanities conference*, 2014. [Link](https://web.archive.org/web/20140829121705/https://wiki.digitalclassicist.org/Text_Reuse)

en/lessons/installing-python-modules-pip.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ Lesson Goals

 This lesson shows you how to download and install Python modules. There
 are many ways to install external modules, but for the purposes of this
-lesson, we’re going to use a program called pip, easily installable on [mac/linux](https://pip.pypa.io/en/stable/) and [windows]( https://sites.google.com/site/pydatalog/python/pip-for-windows). As of Python 2.7.9 and newer, pip is installed by default. This tutorial will be helpful for anyone using older versions of Python (which are still quite common).
+lesson, we’re going to use a program called pip, easily installable on [mac/linux](https://pip.pypa.io/en/stable/) and [windows](https://pydatalog.readthedocs.io/en/latest/installation/#using-pip). As of Python 2.7.9 and newer, pip is installed by default. This tutorial will be helpful for anyone using older versions of Python (which are still quite common).

 Introducing Modules
 -------------------

en/lessons/research-data-with-unix.md

Lines changed: 2 additions & 2 deletions
@@ -128,8 +128,8 @@ _____

 In this lesson you have learnt to undertake some basic file counting, to query across research data for common strings, and to save results and derived data. Though this lesson is restricted to using the Unix shell to count and mine tabulated data, the processes can be easily extended to free text. For this we recommend two guides written by William Turkel:

-- William Turkel, '[Basic Text Analysis with Command Line Tools in Linux](http://williamjturkel.net/2013/06/15/basic-text-analysis-with-command-line-tools-in-linux/)' (15 June 2013)
-- William Turkel, '[Pattern Matching and Permuted Term Indexing with Command Line Tools in Linux](http://williamjturkel.net/2013/06/20/pattern-matching-and-permuted-term-indexing-with-command-line-tools-in-linux/)' (20 June 2013)
+- William Turkel, '[Basic Text Analysis with Command Line Tools in Linux](https://web.archive.org/web/20140925220046/http://williamjturkel.net/2013/06/15/basic-text-analysis-with-command-line-tools-in-linux/)' (15 June 2013)
+- William Turkel, '[Pattern Matching and Permuted Term Indexing with Command Line Tools in Linux](https://web.archive.org/web/20200925054120/http://williamjturkel.net/2013/06/20/pattern-matching-and-permuted-term-indexing-with-command-line-tools-in-linux/)' (20 June 2013)

 As these recommendations suggest, the present lesson only scratches the surface of what the Unix shell environment is capable of. It is hoped, however, that this lesson has provided a taster sufficient to prompt further investigation and productive play.

en/lessons/sonification.md

Lines changed: 3 additions & 3 deletions
@@ -195,7 +195,7 @@ Let us assume that you have a historic diary to which you've fitted a [topic mod
 Installing miditime is straightforward using [pip](/lessons/installing-python-modules-pip):

 `$ pip install miditime` or `$ sudo pip install miditime` for a Mac or Linux machine;
-`> python pip install miditime` on a Windows machine. (Windows users, if the instructions above didn't quite work for you, you might want to try [this helper program](https://sites.google.com/site/pydatalog/python/pip-for-windows) instead to get Pip working properly on your machine).
+`> python pip install miditime` on a Windows machine. (Windows users, if the instructions above didn't quite work for you, you might want to try [this helper program](https://pydatalog.readthedocs.io/en/latest/installation/#using-pip) instead to get Pip working properly on your machine).

 ### Practice
 Let us look at the sample script provided. Open your text editor, and copy and paste the sample script in:
@@ -240,7 +240,7 @@ Baa, Baa, black, sheep, have, you, any, wool?

 Can you make your computer play this song? (This [chart](https://web.archive.org/web/20171211192102/http://www.electronics.dit.ie/staff/tscarff/Music_technology/midi/midi_note_numbers_for_octaves.htm) will help).

-**By the way** There is a text file specification for describing music called '[ABC Notation](http://abcnotation.com/wiki/abc:standard:v2.1)'. It is beyond us for now, but one could write a sonification script in say a spreadsheet, mapping values to note names in the ABC specification (if you've ever used an IF - THEN in Excel to convert percentage grades to letter grades, you'll have a sense of how this might be done) and then using a site like [this one](http://trillian.mit.edu/~jc/music/abc/ABCcontrib.html) to convert the ABC notation into a .mid file.
+**By the way** There is a text file specification for describing music called '[ABC Notation](https://web.archive.org/web/20160617203735/http://abcnotation.com/wiki/abc:standard:v2.1)'. It is beyond us for now, but one could write a sonification script in say a spreadsheet, mapping values to note names in the ABC specification (if you've ever used an IF - THEN in Excel to convert percentage grades to letter grades, you'll have a sense of how this might be done) and then using a site like [this one](http://trillian.mit.edu/~jc/music/abc/ABCcontrib.html) to convert the ABC notation into a .mid file.

 ### Getting your own data in

@@ -439,7 +439,7 @@ The code is pretty clear: loop the 'bd_boom' sample with the reverb sound effect

 By the way, 'live-coding'? What makes this a 'live-coding' environment is that you can make changes to the code _while Sonic Pi is turning it into music_. Don't like what you're hearing? Change the code up on the fly!

-For more on Sonic Pi, [this workshop website](https://www.miskatonic.org/music/access2015/) is a good place to start. See also Laura Wrubel's [report on attending that workshop, and her and her colleague's work in this area](http://library.gwu.edu/scholarly-technology-group/posts/sound-library-work).
+For more on Sonic Pi, [this workshop website](https://web.archive.org/web/20150907155822/https://www.miskatonic.org/music/access2015/) is a good place to start. See also Laura Wrubel's [report on attending that workshop, and her and her colleague's work in this area](http://library.gwu.edu/scholarly-technology-group/posts/sound-library-work).

 # Nihil Novi Sub Sole
 Again, lest we think that we are at the cutting edge in our algorithmic generation of music, a salutary reminder was published in 1978 on 'dice music games' of the eighteenth century, where rolls of the dice determined the recombination of pre-written snippets of music. [Some of these games have been explored and re-coded for the Sonic-Pi by Robin Newman](https://rbnrpi.wordpress.com/project-list/mozart-dice-generated-waltz-revisited-with-sonic-pi/). Newman also uses a tool that could be described as Markdown+Pandoc for musical notation, [Lilypond](http://www.lilypond.org/) to score these compositions. The antecedents for everything you will find at _The Programming Historian_ are deeper than you might suspect!

en/lessons/sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md

Lines changed: 1 addition & 1 deletion
@@ -704,7 +704,7 @@ according to [this blog
 post](http://web.archive.org/web/20140120195538/http://mashable.com/2013/06/24/markdown-tools/))
 other, Markdown-specific alternatives to MS Word are available online,
 and often free of cost. From the standalone ones, we liked
-[Mou](http://mouapp.com/), [Write Monkey](http://writemonkey.com), and
+[Mou](http://mouapp.com/), [Write Monkey](https://web.archive.org/web/20260327163157/http://writemonkey.com/), and
 [Sublime Text](http://www.sublimetext.com/). Several web-based platforms
 have recently emerged that provide slick, graphic interfaces for
 collaborative writing and version tracking using Markdown. These

en/lessons/topic-modeling-and-mallet.md

Lines changed: 1 addition & 1 deletion
@@ -468,7 +468,7 @@ This command
 - opens your `tutorial.mallet` file
 - trains MALLET to find 20 topics
 - outputs every word in your corpus of materials and the topic it
-belongs to into a compressed file (`.gz`; see www.gzip.org on how to
+belongs to into a compressed file (`.gz`; see [www.gzip.org](https://web.archive.org/web/20260607182747/https://gzip.org/) on how to
 unzip this)
 - outputs a text document showing you what the top key words are for
 each topic (`tutorial_keys.txt`)
