Web basics

Website

Some basic knowledge to prepare for webscrapping.

Author

Chi Zhang

Published

August 20, 2023

Resources

https://jakobtures.github.io/web-scraping/rvest1.html

HTML tags

<!DOCTYPE html>

<html>
  <head>
    <title>Hello World!</title>
  </head>
  <body>
    <b>Hello World!</b>
  </body>
</html>

Line 1: declare which version of HTML. For now it is 5

The rest are different tags.

Important tags:

  • header, <h1>, <h2>, ..., <h6>
  • division or span, <div>, <span>
  • paragraph, <p>
  • line break, <br>. This does not need to be closed with </br>
  • bold, italics <b>, <i>
  • lists - ordered <ol>, unordered <ul>. Within the tags, use <li>
  • tables. Lines are defined by <tr> (table row), table header <th> and <td>, table data.
  • anchor, <a>, useful for url links: but it is different from <link> tag (which links to files such as JS or CSS).

Attributes

Basic syntax: <tag attribute="value">...</tag>. No space between equal sign and value.

Images

Two images, one with adjusted size

<img src="https://jakobtures.github.io/web-scraping/Rlogo.png">
<img src="https://jakobtures.github.io/web-scraping/Rlogo.png" width="100" height="100">

Can combine image with link, by puttinng the links within the anchor.

<a href="https://www.r-project.org/" target="_blank"><img src="https://jakobtures.github.io/web-scraping/Rlogo.png"></a> 

Entities

Coded representations of certain characters, &..;. For example,

  • &lt; less than, <
  • &quot;
  • &amp; &: ampersand
  • &nbsp; non-breaking space