1. Analyse the significance of data cleaning and transformation in analytics readiness
Data cleaning and transformation are the foundation of analytics readiness and high-quality decision-making. Without these steps, models suffer from "Garbage In, Garbage Out" (GIGO).
- Trust & Reliability: Cleaning removes duplicates and outliers, ensuring stakeholders trust the data.
- Structural Unity: Transformation (like Normalization) allows data from different sources (e.g., Sales and HR) to "talk" to each other.
- Feature Engineering: It turns raw data into useful variables, such as converting a birthdate into an "Age" category for better modeling.
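The three points above can be sketched in a few lines of pandas (the library choice, column names, and sample values are illustrative assumptions, not part of the original notes):

```python
# A minimal sketch of cleaning + transformation, assuming pandas and
# made-up customer data.
import pandas as pd

raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "birthdate": ["1990-05-01", "1990-05-01", "1985-11-20", "2001-02-14"],
    "spend": [120.0, 120.0, 95.5, None],
})

clean = raw.drop_duplicates()            # cleaning: remove duplicate rows
clean = clean.dropna(subset=["spend"])   # cleaning: drop records missing spend
clean["birthdate"] = pd.to_datetime(clean["birthdate"])

# Feature engineering: derive an "age" variable from the raw birthdate.
clean["age"] = (pd.Timestamp("2024-01-01") - clean["birthdate"]).dt.days // 365

print(clean[["customer_id", "age", "spend"]])
```

After these steps the duplicate row and the incomplete record are gone, and the model sees a usable `age` feature instead of a raw date string.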
2. Discuss the role of data governance in supporting organizational compliance
Data Governance (DG) provides the legal and operational "rulebook" for handling information assets.
- Accountability: DG defines Data Stewards—specific people responsible for data quality and security.
- Data Lineage: It creates an audit trail, showing exactly where data originated and how it was changed, which is vital for legal audits.
- Standardized Security: Ensures that privacy rules (like encryption) are applied across the entire company consistently.
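The lineage/audit-trail idea can be sketched as a simple append-only log (the record structure and steward names are illustrative assumptions, not a specific governance product):

```python
# A minimal sketch of a data-lineage audit trail.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    dataset: str
    operation: str      # e.g. "ingest", "dedupe", "mask"
    performed_by: str   # the accountable data steward
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

audit_trail: list[LineageRecord] = []
audit_trail.append(LineageRecord("sales_2024", "ingest", "steward_a"))
audit_trail.append(LineageRecord("sales_2024", "dedupe", "steward_a"))

# An auditor can now see exactly how the dataset was changed, and by whom.
for rec in audit_trail:
    print(rec.dataset, rec.operation, rec.performed_by, rec.timestamp)
```

Because every change appends a record naming the dataset, the operation, and the responsible steward, the trail directly supports the accountability and audit requirements described above.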
3. Evaluate privacy preserving techniques for sensitive data analytics
| Technique | Concept | Critical Evaluation |
| --- | --- | --- |
| Differential Privacy | Adding mathematical "noise" to data. | High security, but can make small datasets less accurate. |
| Anonymization | Removing names/IDs. | Fast and easy, but vulnerable to "re-identification" by linking with other datasets. |
| Homomorphic Encryption | Processing data while it stays encrypted. | Highest security; extremely slow and computationally expensive. |
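Differential privacy is the easiest of the three to demonstrate concretely. A common approach is the Laplace mechanism: add Laplace-distributed noise scaled to the query's sensitivity. The sketch below assumes a simple counting query and illustrative epsilon values:

```python
# A minimal sketch of the Laplace mechanism for differential privacy.
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponential samples follows a
    # Laplace(0, scale) distribution.
    rate = 1.0 / scale
    return random.expovariate(rate) - random.expovariate(rate)

def private_count(values, epsilon: float = 1.0) -> float:
    # A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
    return len(values) + laplace_noise(1.0 / epsilon)

salaries = [52_000, 61_000, 58_500, 75_000]
print(private_count(salaries, epsilon=0.5))  # true count is 4, plus noise
```

Note the trade-off from the table: a smaller epsilon (stronger privacy) means a larger noise scale, which hurts accuracy most when the dataset is small.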
4. Analyse the role of analytics engineering in modern enterprises, distinguish it from data science and software engineering and evaluate its impact on organizational decision making and operational efficiency
Analytics Engineering is the bridge between raw data collection and final analysis.
- Distinction: Unlike Software Engineering (building apps) or Data Science (predicting trends), Analytics Engineering focuses on clean, modeled data for business users.
- Efficiency: It automates the data pipeline, removing the "wait time" for custom reports.
- Decision Making: It creates a "Single Source of Truth," ensuring every department uses the same definitions for key metrics like "Revenue."
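The "Single Source of Truth" point can be made concrete: the metric is defined once, in one place, and every report reuses it (the function, field names, and sample orders are illustrative assumptions):

```python
# A minimal sketch of a shared metric layer: one definition of "Revenue"
# that every department imports, instead of each team re-deriving it.
def revenue(orders: list[dict]) -> float:
    """Company-wide definition: gross sales minus refunds, completed orders only."""
    return sum(
        o["amount"] - o.get("refund", 0.0)
        for o in orders
        if o["status"] == "completed"
    )

orders = [
    {"amount": 100.0, "status": "completed"},
    {"amount": 40.0, "status": "completed", "refund": 5.0},
    {"amount": 99.0, "status": "cancelled"},
]
print(revenue(orders))  # Finance and Marketing both get the same number
```

Because the refund and order-status rules live inside the one function, two departments can no longer report different "Revenue" figures for the same data.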
5. Compare monolithic, microservices and event driven architectures for analytics applications. Evaluate their suitability for large scale real time financial analytics platform
- Monolithic: All-in-one system. Easy to build, but a single codebase struggles to scale under heavy financial transaction loads.
- Microservices: Broken into independent parts. Good for growth, but the network "talking" between parts adds latency.
- Event-Driven: Recommended for Finance. Processes data as it happens (e.g., a bank transaction), allowing for instant fraud detection.
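The event-driven recommendation can be sketched as a tiny publish/subscribe loop where a fraud check reacts to each transaction the moment it is published (the event names and the threshold rule are illustrative assumptions):

```python
# A minimal sketch of an event-driven pipeline for instant fraud detection.
from collections import defaultdict
from typing import Callable

handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
alerts: list[dict] = []

def subscribe(event_type: str, handler: Callable[[dict], None]) -> None:
    handlers[event_type].append(handler)

def publish(event_type: str, event: dict) -> None:
    for handler in handlers[event_type]:   # each consumer reacts independently
        handler(event)

def fraud_check(tx: dict) -> None:
    if tx["amount"] > 10_000:              # naive rule, for illustration only
        alerts.append(tx)

subscribe("transaction", fraud_check)
publish("transaction", {"id": 1, "amount": 250})
publish("transaction", {"id": 2, "amount": 50_000})
print(len(alerts))  # → 1
```

The key property for finance is that detection happens inside `publish`, as the transaction arrives, rather than in a batch report hours later; in production the in-memory loop would be replaced by a broker such as Kafka.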
6. Analyse the performance optimization techniques used in analytics application
To ensure speed in big-data environments, developers use:
- Indexing: Like a book index, it helps the system find specific data without scanning everything.
- Materialized Views: Pre-computing and storing complex query results so the answer is ready before the user even asks.
- Columnar Storage: Storing data in columns (instead of rows) to speed up "average" or "sum" calculations across millions of records.
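The row-versus-column point can be illustrated with plain Python structures (the data is made up; real columnar engines such as Parquet apply the same idea at the byte level):

```python
# A minimal sketch of row vs columnar layout for an aggregate query.
rows = [
    {"id": 1, "region": "EU", "sales": 120.0},
    {"id": 2, "region": "US", "sales": 95.5},
    {"id": 3, "region": "EU", "sales": 42.0},
]

# Row storage: every whole record must be touched just to read one field.
total_row = sum(r["sales"] for r in rows)

# Columnar storage: "sales" is one contiguous array, so an aggregate
# scans only the values it actually needs.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "sales": [120.0, 95.5, 42.0],
}
total_col = sum(columns["sales"])

print(total_row == total_col)  # same answer, far less data scanned at scale
```

Over millions of records the difference is not the arithmetic but the I/O: the columnar layout reads one narrow column instead of every full row.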
7. An e-commerce company collects customer browsing behaviour, purchase history and geolocation data to train a recommendation model. Analyse whether the system complies with data privacy regulations and identify compliance gaps
Potential compliance failures include:
- Lack of Consent: Many companies track behavior without a clear "Opt-In" from the user.
- Data Minimization Gap: Collecting exact GPS locations for a product recommendation is often considered "excessive" under the law.
- Right to be Forgotten: Systems often lack a way to completely delete a user's history if requested.
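The data-minimization gap has a simple technical remedy: coarsen exact coordinates before they ever reach the recommendation model. The precision choice below is an illustrative assumption:

```python
# A minimal sketch of data minimization: round exact GPS coordinates to a
# coarse grid so only city-level location is stored.
def minimize_location(lat: float, lon: float, decimals: int = 1) -> tuple[float, float]:
    # One decimal degree is roughly an 11 km grid: enough for regional
    # recommendations, not enough to pinpoint a home address.
    return (round(lat, decimals), round(lon, decimals))

exact = (-1.286389, 36.817223)   # a precise point in Nairobi
stored = minimize_location(*exact)
print(stored)  # → (-1.3, 36.8)
```

Storing only the coarsened value keeps the recommendation use case intact while removing the "excessive" precision regulators object to.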
8. Evaluate how the general data protection regulation and the local data protection regulation affect data analytics design
Modern design must be "Privacy by Design":
- Data Residency: Laws may require data to stay within the country (e.g., Kenyan data must stay on Kenyan servers).
- Explainability: If an algorithm makes a decision (like a loan denial), the system must be designed to explain why.
- Automatic Masking: Designing systems so that analysts only see "hidden" versions of names and IDs by default.
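The masking-by-default idea can be sketched as follows (the masking scheme, salt handling, and field names are illustrative assumptions; a production system would manage the salt as a protected secret):

```python
# A minimal sketch of automatic masking: analysts see a partially hidden
# name and a salted hash instead of the real identifier.
import hashlib

def mask_name(name: str) -> str:
    return name[0] + "*" * (len(name) - 1)

def pseudonymize_id(raw_id: str, salt: str = "org-secret") -> str:
    # A salted hash keeps joins across tables possible without ever
    # exposing the real ID to the analyst.
    return hashlib.sha256((salt + raw_id).encode()).hexdigest()[:12]

record = {"name": "Wanjiku", "national_id": "12345678"}
masked = {
    "name": mask_name(record["name"]),
    "national_id": pseudonymize_id(record["national_id"]),
}
print(masked["name"])  # → W******
```

Because the masking runs in the access layer by default, seeing raw names or IDs becomes an explicit, auditable exception rather than the normal case.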