COMPRESSION, ENCRYPTION AND HASHING

Purpose of compression
• Compression reduces the size of files, so more can be stored
• Useful for sharing data (file size limits, faster transfer)
• Images on websites need to be compressed so pages load quickly

Lossy vs lossless compression
• Lossy
  ○ Unnecessary data is permanently removed, at the expense of quality
  ○ e.g. MP3 removes frequencies that aren't audible to humans
• Lossless
  ○ No data is lost, so the original can be restored exactly, but the size reduction may be smaller than with lossy
  ○ Relies on recognising repeating patterns of data, allowing data to be stored more efficiently
  ○ Run-Length Encoding (RLE)
    § Stores each run of contiguous repeated values as the value and the number of times it repeats
    § Works well for images with large blocks of the same colour
  ○ Dictionary encoding
    § Uses a dictionary to store frequently used groups of data, like words or phrases
    § Each phrase is replaced by a shorter key, and the key-to-phrase mapping is stored in the dictionary
    § Good for text files where words are repeated

Encryption
• Encryption is the transformation of plaintext (original data) into ciphertext (encrypted data) to prevent unauthorised parties from understanding it
• A key is required to encrypt and decrypt the message

Symmetric vs asymmetric encryption
• Symmetric (private key)
  ○ Uses one key for both encryption and decryption
  ○ Sender encrypts with the key, and the receiver needs the same key to decrypt
  ○ Key must be transferred securely; if someone intercepts the key, they can read everything
  ○ Very fast
• Asymmetric
  ○ Uses a public key to encrypt and a private key to decrypt
  ○ Anyone can see the receiver's public key
  ○ The sender encrypts the message with the receiver's public key and sends it to the receiver
  ○ The receiver decrypts it with their private key
  ○ Much slower, as the maths is much more complicated
• As asymmetric encryption is much slower, usually the data is encrypted with symmetric encryption and only the symmetric key is sent using asymmetric encryption

Hashing
• Provides a one-way mapping from data of any length to a fixed
length hash
• Even a tiny change to the data causes a huge difference in the hash
• Used for passwords - the hash of the password is stored, then when logging in, the input is hashed and compared against the stored hash
• More secure because if hackers gain unauthorised access, they only get the hashes, which can't be reversed, not the actual passwords
• Also used to check file integrity - the hash of a file is compared before and after transfer; if they differ, the file has been changed or corrupted

DATABASES

Databases
• A database is an organised collection of data stored electronically, allowing easy storage, retrieval and management of data
• Field - one column/attribute
• Record - one row in a table
• Table - structured set of data organised into rows (records) and columns (fields)

Flat file vs relational database
• Flat file
  ○ Everything is stored in one table
  ○ Simple and easy to understand
• Relational database
  ○ A collection of tables that are linked to each other via keys
  ○ Less duplication, easier to update, more efficient storage

Primary, foreign and secondary key
• Primary key - a field that uniquely identifies a record (no two records can share a primary key) e.g. StudentID
• Foreign key - a field that links to the primary key of another table e.g.
TeacherID links to the Teachers table
• Secondary key - a field used to help search, sort or organise records, which does not have to be unique

Normalisation
• 1NF (first normal form)
  ○ Each record needs a primary key
  ○ Each field holds a single value that can't be split up (atomicity)
• 2NF (second normal form)
  ○ In first normal form
  ○ No partial dependencies - if a table has a composite primary key (made from more than one attribute), all other fields need to depend on all parts of it
• 3NF (third normal form)
  ○ In second normal form
  ○ No transitive dependencies - no non-key field depends on another non-key field
  ○ All fields depend only on the whole primary key
• Attributes depend on
  ○ The key (1NF)
  ○ The whole key (2NF)
  ○ Nothing but the key (3NF)
• Why normalise
  ○ Removes duplication/redundancy
  ○ Easier to make changes
  ○ Faster sorting/searching
• Why not normalise
  ○ Can make queries and code more complex
  ○ Performance - queries that join many tables can be slower
• A database that is not normalised is called denormalised

Entity relationship diagrams
• An entity is an item of interest about which information is stored in a table

Methods for capturing data
• Forms - contain blank fields that users fill in with data
• OCR (optical character recognition) - converts scanned text/images into editable text, but can make mistakes with unclear text
• OMR (optical mark recognition) - detects marks in specific places e.g. multiple choice exams. Highly accurate and fast, but only works with predefined positions
• Sensors - automatically collect real-world data e.g. temperature, motion, pressure

Selecting data
• Query by example (QBE)
  ○ You fill in the fields you want to display, the criteria, e.g.
*forest* means only fields containing "forest", and sorting options
  ○ Easier for beginners but less flexible than SQL
• SQL (structured query language) - language used to manage, manipulate and query data in databases

Managing data
• Changing data by manipulating it - arithmetic functions, adding, editing and deleting data

Exchanging data
• CSV (comma separated values) - simple text file, each line is a row, commas separate columns
  ○ Simple, small file size, widely supported
  ○ No data types/formatting
• JSON (JavaScript object notation) - organises data using arrays and objects (key-value pairs)
  ○ Structured, used heavily in web APIs
  ○ Larger than CSV
• Communication mediums
  ○ Electronic - email, memory stick, cloud
  ○ Non-electronic - paper-based (data is printed physically)

Interrogating and indexing data
• Interrogating - searching/questioning the database to find useful information
• Indexing - makes searching faster, but uses extra storage and indexes must be updated when records change

Data mining
• Analysing large data sets to find patterns, trends, correlations, anomalies and other useful information
• Can find useful information quickly, improves business decisions/profits, helps prediction
• But personal data may be analysed without users fully understanding, large data sets carry security risks, correlation does not always mean causation (so results can mislead), and if the data is biased, results may also be biased

SQL examples
• SELECT Name, Age FROM Students
• SELECT * FROM Students WHERE Age > 18 AND Grade = 'A'
• SELECT Name FROM Students WHERE StudentID IN ( SELECT StudentID FROM Awards WHERE Prize = 'Gold' ) (IN rather than = so the subquery can return more than one StudentID)
• % and LIKE
  ○ WHERE Name LIKE 'A%' - names beginning with "A"
  ○ WHERE Name LIKE '%n' - names ending with "n"
  ○ WHERE Name LIKE '%ann%' - names containing "ann"
• DELETE FROM Students WHERE StudentID = 3 (without WHERE, all rows are deleted)
• INSERT INTO Students (StudentID, Name, Age) VALUES (1, 'Alice', 18), (2, 'John', 17)
• SELECT Students.Name,
Grades.Grade FROM Students JOIN Grades ON Students.StudentID = Grades.StudentID

Referential integrity
• A foreign key in one table must always refer to a valid, existing primary key in another table
• So you cannot delete a record in one table if records in another table are still referencing it
• Keeps relationships between tables valid and consistent

Transaction processing
• A transaction is one or more queries treated as a single unit of work
• Transactions take place in banking systems, online shopping, airline booking etc.
• Problems - concurrent transactions (inconsistent/overwritten data), system failure (incomplete updates, corrupted data)

ACID rules
• Atomicity
  ○ A transaction must be fully completed or not happen at all
  ○ If the computer crashes during a transaction, nothing changes at all
• Consistency
  ○ Database is in a valid state before and after the transaction
  ○ Rules are maintained, including referential integrity
• Isolation
  ○ Transactions do not interfere with each other
  ○ Concurrent transactions give the same result as if they had run one after another
  ○ Prevents data corruption, overbooking and conflicting updates
• Durability
  ○ Once a transaction is complete and committed, the change is permanent
  ○ Even if the system crashes, the data must remain saved

Record locking
• When a user edits a record, the database temporarily locks it so others cannot edit it simultaneously
• Ensures isolation

NETWORKS

Purpose of a network
• A network is two or more devices connected together so they can communicate and share resources
• Purpose - share files, share peripherals, communicate, allow centralised storage, management and security

Protocols
• A protocol is a set of rules that devices follow to communicate over a network
• Allow hardware and software from different manufacturers to communicate
• Ensure reliability - handle error detection and correction e.g.
TCP requesting retransmission of packets if they do not arrive at their destination
• Security - some protocols encrypt data
• Protocols are an example of a standard
  ○ A standard is an agreed set of rules followed by devices or software
  ○ Ensures reliability and compatibility between software and hardware made by different manufacturers

Examples of protocols
• HTTP (Hyper Text Transfer Protocol) allows web browsers to request and receive data from web servers
• FTP (File Transfer Protocol) is used to transfer files between a client and a server
• SMTP is used to send emails to a mail server, which passes them on towards the recipient's mail server
• POP3 is used to receive emails by downloading them from a server that only stores them temporarily (deleted from the server after download)
• IMAP is used to access emails on a server that stores them permanently, so they can be read from several devices

Layering
• Layering divides network communication into manageable sections
• Each layer has a specific role and responsibility
• Layers work independently of each other, so changes in one layer do not affect the others
• Benefits of layering
  ○ Simplifies complex networking systems - decomposition
  ○ Allows interoperability between different hardware and software
  ○ Enables standardisation of network communication
  ○ Faults can be isolated to a specific layer
  ○ Encourages development and improvement of individual layers

TCP/IP stack
• Application layer
  ○ Provides network services to applications so software can communicate
  ○ Acts as the interface between software and the network
  ○ Performs encryption and decryption if necessary
  ○ e.g.
HTTP, FTP, IMAP, SMTP, POP3
• Transport layer
  ○ Handles end-to-end communication between devices
  ○ Ensures data is delivered correctly and in order
  ○ Breaks data into segments, which are reassembled in order at the destination
  ○ Checks for errors, retransmits missing data, keeps packets in order
  ○ Uses TCP
• Internet layer
  ○ Responsible for routing packets between networks (determines where packets need to go)
  ○ Adds source and destination IP addresses and determines the routes packets take across the internet
  ○ Uses IP
• Link layer
  ○ Handles physical transmission of data across network hardware
  ○ Adds the MAC address of the next device to the packet
  ○ Converts packets into electrical/radio/light signals
  ○ Uses Ethernet, Wi-Fi, fibre optics
• How data is transmitted
  ○ Application layer creates the data, then formats it according to the protocol's rules
  ○ Transport layer uses TCP to split the data into segments, adds sequence numbers so they can be reordered, adds a checksum and ensures reliable transmission
  ○ Internet layer adds source and destination IP addresses to each packet; routers read the destination IP and decide the route to forward each packet along
  ○ Link layer converts the data into signals, sending it through cables, fibre optics or Wi-Fi radio waves

Hardware to connect/build a network
• Modem
  ○ Converts signals between digital and analogue so data can travel over e.g. a telephone line
• Router
  ○ Connects networks together
  ○ Assigns IP addresses to devices
  ○ Examines data packets and forwards them
• Cable
  ○ Carries digital data from one device/NIC to the next
  ○ Connects wired devices to the network
• NIC (network interface card)
  ○ Gives each device a MAC address / unique ID
  ○ Allows a computer system to interface with a network
• Wireless access point (WAP)
  ○ Allows wireless devices to communicate with each other
  ○ Sends and receives radio waves
  ○ Examines data packets and forwards them
• Switch
  ○ Connects multiple wired devices to the network
  ○ Receives data and forwards it to the intended recipient
  ○ Examines data packets and forwards them
  ○ Forwards data based
on MAC addresses
  ○ Higher speed, better security and better scalability than a hub
• Hub
  ○ Connects multiple wired devices to the network
  ○ Receives data from a device and broadcasts it to all devices connected to it
• Firewall
  ○ Filters traffic coming into and out of a network

LAN vs WAN
• LAN (local area network)
  ○ Covers a small geographical area, privately owned and managed
  ○ Higher data transfer speed, more secure, cheaper
• WAN (wide area network)
  ○ Covers a large geographical area, uses public or leased infrastructure
  ○ Slower, lower security, more expensive

Domain Name System (DNS)
• The system that finds the IP address for a domain name
• User enters a URL, then the computer first checks its DNS cache
• If not found, the computer contacts a DNS resolver (usually run by the ISP) to find the address
• The resolver asks a root server, which points it towards the right Top-Level Domain (TLD) server e.g. .com, .org
• The TLD server returns the address of the authoritative DNS server for the domain
• The authoritative server returns the correct IP address to the DNS resolver, which sends it back to the computer
• The browser now knows the IP address for the domain and can communicate with the web server

Packet vs circuit switching
• Packet switching - sends data efficiently
  ○ Data is split into packets
  ○ Each packet contains part of the data, the source and destination IP addresses, and sequencing information so the packets can be reordered
  ○ Routers forward packets independently along the best route available at the time, so packets may take different routes
  ○ At the destination, packets are reordered and reassembled
  ○ Benefits
    § Efficient use of the network - routes can be shared by many users
    § Reliable - if one route fails another can be used
    § Good for data sent irregularly
    § More scalable
  ○ Drawbacks
    § Packets must be reordered
    § Packets may be lost in transmission
    § Each packet needs extra information stored in its header
• Circuit switching - dedicated communication path between 2 devices e.g.
old telephones
  ○ A dedicated circuit/path is established
  ○ All data travels along the same route
  ○ Resources are reserved for the connection until communication ends
  ○ Benefits
    § Consistent performance
    § Good for real-time communication e.g. calls
  ○ Drawbacks
    § Inefficient - reserved bandwidth may sit unused
    § Setup time required - connection must be established first
    § Single point of failure (if the circuit breaks, communication stops)
    § Poor scalability

Network security threats
• Hackers
  ○ Attempt to gain unauthorised access to a computer or network
  ○ Prevention
    § Firewalls, strong passwords, access control, regular software updates
• Viruses
  ○ A virus is malicious software that replicates itself and spreads between computers
  ○ Usually attaches itself to files or programs
  ○ Corrupts/deletes data, slows systems, spreads to other devices
  ○ Prevention
    § Anti-virus software, avoiding suspicious downloads, regular software updates
• Unauthorised access
  ○ When someone gains access to a system without permission
  ○ Risks of data theft, modification, privacy breaches
  ○ Prevention
    § Strong passwords, authentication/user IDs, encryption, access control
• Denial of service (DoS)
  ○ An attack that floods a network or server with traffic so legitimate users cannot access it
  ○ Websites/services become unavailable, disruption to businesses
  ○ Prevention
    § Traffic filtering (blocks suspicious or excessive traffic), firewalls, load balancing (spreading traffic across servers)
• Spyware
  ○ Malware that monitors user activity and collects information
  ○ Risk of stolen passwords, privacy breach (browsing history), identity theft
  ○ Prevention
    § Anti-spyware, avoiding suspicious links/downloads, regular software updates
• SQL injection
  ○ Malicious SQL code is entered into input fields to manipulate a database e.g.
entering SQL commands into a login box
  ○ Risks of unauthorised database access, stolen/deleted data
  ○ Prevention
    § Input validation, parameterised queries, limiting database permissions
• Phishing
  ○ Using fake emails/messages/phone calls to trick users into revealing sensitive information
  ○ Risks of stolen passwords, financial fraud
  ○ Prevention
    § User awareness/training, checking email senders carefully, multi-factor authentication
• Pharming
  ○ Redirects users from a legitimate website to a fake one without their knowledge
  ○ Done by compromising DNS servers or local devices
  ○ Risks of stolen login details, malware installation
  ○ Prevention
    § Secure DNS, anti-malware, checking HTTPS certificates

Network threat prevention techniques
• Firewalls - monitor and filter incoming and outgoing traffic based on predetermined security rules to block unauthorised access
• Secure passwords - make it harder for attackers to gain unauthorised access to accounts
• Anti-virus - detects, blocks and removes viruses and other malware
• Anti-spyware - detects and removes spyware
• Software updates - patch security weaknesses
• Encryption - so unauthorised users cannot read data
• Multi-factor authentication - requires extra verification beyond a password

Client-server vs peer-to-peer network
• Client-server
  ○ One or more central servers provide services to client computers
  ○ Clients request services/resources from the server e.g.
a browser requests a web page from a server
  ○ Benefits
    § Centralised management, better security, centralised backup, better scalability
  ○ Drawbacks
    § Expensive (servers), requires specialist staff, single point of failure, high maintenance
• Peer-to-peer
  ○ All computers are equal
  ○ Each device can act as a server or a client, so devices share resources directly with each other
  ○ Benefits
    § Cheaper (no dedicated server), simple (easy for small networks), no dependency on a central server
  ○ Drawbacks
    § Poor security (each device manages its own security), difficult backups, hard to manage, performance issues as devices act as servers, poor scalability

WEB TECHNOLOGIES

Purpose of HTML, CSS and JavaScript
• HTML defines the structure and content of the web page
• CSS controls the style and appearance of the web page
• JavaScript adds interactivity and behaviour to web pages

HTML
• <head> - container for metadata, information not displayed on the page (title, style, link, script)
• <body> - contains all visible webpage content (p, h1, h2, h3, img, forms, buttons)
• <img>
  ○ alt - alternative text shown if the image can't load, and read out for people who use screen readers
• <a href="…">Visit site</a>
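
As a minimal sketch of the page structure described above, the Python snippet below embeds a tiny HTML page as a string and uses the standard-library html.parser module to record which tags sit inside the head, which sit inside the body, and the alt text of each image. The page content, the file names (style.css, map.png), the example.com URL and the SectionRecorder class are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical example page: names, files and URL are invented for illustration.
PAGE = """<!DOCTYPE html>
<html>
  <head>
    <title>Forest walks</title>
    <link rel="stylesheet" href="style.css">
  </head>
  <body>
    <h1>Forest walks</h1>
    <p>Three routes through the forest.</p>
    <img src="map.png" alt="Map of the three routes">
    <a href="https://example.com">Visit site</a>
  </body>
</html>"""


class SectionRecorder(HTMLParser):
    """Records which tags open inside <head> and which inside <body>."""

    def __init__(self):
        super().__init__()
        self.section = None                    # "head", "body" or None
        self.tags = {"head": [], "body": []}   # tags seen in each section
        self.alt_texts = []                    # alt attribute of every <img>

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self.section = tag                 # entering a section
            return
        if self.section:
            self.tags[self.section].append(tag)
        if tag == "img":
            self.alt_texts.append(dict(attrs).get("alt"))

    def handle_endtag(self, tag):
        if tag in ("head", "body"):
            self.section = None                # leaving the section


recorder = SectionRecorder()
recorder.feed(PAGE)
print(recorder.tags["head"])   # ['title', 'link']
print(recorder.tags["body"])   # ['h1', 'p', 'img', 'a']
print(recorder.alt_texts)      # ['Map of the three routes']
```

The title and stylesheet link end up in the head list because they are metadata, while the heading, paragraph, image and link are the visible content in the body.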