Data Science Project Management

Compiled from all unique exam questions · Aug 2024 · Nov 2024 · Apr 2025 · Dec 2025
01
Q Justify (one reason) why a feasibility study is important before undertaking a project.
It prevents wasted resources. A feasibility study reveals early whether the project is technically achievable, financially viable, and operationally practical. Without it, an organisation may commit large budgets, staff, and time to a project that cannot succeed — resulting in sunk costs and opportunity loss. In data science specifically, feasibility also checks whether the required data exists, is accessible, and is of sufficient quality to support the intended models.
Q Discuss Project Feasibility Analysis (PFA) in the context of project management.

PFA is a structured pre-project evaluation that examines whether a proposed project can be successfully planned and executed. It is conducted before the project is approved and resources are committed.

PFA typically assesses the following dimensions:

  • Technical feasibility: Can the project be built with existing technology, tools, and expertise? For DS projects this includes checking data availability and infrastructure.
  • Economic feasibility: Do the projected benefits (ROI, NPV, payback period) outweigh the costs?
  • Operational feasibility: Will the end-users and the organisation actually adopt and use the output?
  • Schedule feasibility: Can the project be completed within acceptable time constraints?
  • Legal/Ethical feasibility: Does the project comply with data-protection laws and ethical guidelines?

The result of PFA is a feasibility report that informs the go/no-go decision for project initiation.

Q Identify and describe three common project feasibility checkpoints. (Nov 2024 Q2c)
1 — Technical Checkpoint

Evaluates whether the organisation possesses (or can acquire) the hardware, software, data infrastructure, and skilled personnel needed to execute the project. For a DS project this means assessing data pipelines, compute resources, and model-deployment capabilities.

2 — Financial / Economic Checkpoint

Examines cost-benefit trade-offs. Metrics such as Net Present Value (NPV), Return on Investment (ROI), and payback period are used to determine whether the project is economically justified and fundable.

3 — Operational / Organisational Checkpoint

Assesses whether the organisation has the processes, change-management capacity, and stakeholder support for the project outputs to be used effectively. A technically sound DS model is worthless if end-users are not trained or willing to adopt it.

02
Q Justify using two reasons why understanding the business is crucial before or during a data science project.
Reason 1 — Alignment of Data Science Goals with Business Objectives

Without understanding the business, data scientists risk building technically excellent models that answer the wrong question. For example, a churn-prediction model built without understanding customer-retention strategy may optimise the wrong metric. Business context ensures the problem definition, target variable, and success criteria are meaningful.

Reason 2 — Identification of Relevant Data and Constraints

Business context reveals which data sources are available, which are trusted, and which face legal or ethical restrictions (e.g., GDPR, Kenya Data Protection Act). It also uncovers domain-specific rules that influence feature engineering and model interpretation — saving re-work late in the project.

03
Q Explain a project checklist and give one reason why it is important for a data science project.

A project checklist is a structured, sequential list of tasks, milestones, deliverables, and verification items that must be completed or confirmed at defined stages of the project. It acts as a quality-control and progress-tracking tool used by the project manager and team.

Importance for DS projects: Data science projects involve iterative, non-linear workflows (data collection → cleaning → modelling → evaluation → deployment). A checklist prevents steps from being skipped — for instance, ensuring data-quality checks are performed before model training, which avoids the classic "garbage in, garbage out" problem.
Q Identify a data science project and create a complete checklist covering at least five key contents. (Apr 2025 Q2a)

Project: Student Academic Performance Prediction System for KCA University

Objective: Predict students at risk of failing so that early interventions can be applied.

  1. Problem Definition & Business Case: Define prediction target (pass/fail/grade), establish stakeholder sign-off, document expected ROI and success metrics (e.g., reduction in dropout rate).
  2. Data Identification & Access: Identify data sources (SIS, LMS, attendance records), confirm data-sharing agreements, ensure compliance with student privacy laws. Verify data completeness and time coverage.
  3. Data Preparation & Quality Assurance: Handle missing values, remove duplicates, encode categorical variables, normalise numerical features. Document all transformations. Validate that class balance is acceptable.
  4. Model Development & Validation: Train multiple candidate models (Logistic Regression, Random Forest, XGBoost). Use stratified k-fold cross-validation. Compare using accuracy, F1-score, and AUC-ROC.
  5. Ethical Review & Bias Audit: Check that the model does not discriminate by gender, ethnicity, or disability. Obtain institutional ethics clearance before deployment.
  6. Deployment & Integration: Integrate the prediction dashboard into the university student support system. Train academic advisors. Set up monitoring for model drift and schedule quarterly re-training.
  7. Documentation & Closure: Produce technical report, user manual, and lessons-learned document. Archive code in a version-controlled repository. Obtain final sign-off from stakeholders.
Q Develop a project preparation and planning checklist for a healthcare disease-prediction project (at least 5 activities, each justified). (Dec 2025 Q2b)

Project: Disease Prediction from Patient Data for a Kenyan Healthcare Provider

  • Stakeholder Identification & Requirements Gathering — Clinicians, hospital admin, and IT staff have different needs. Early alignment prevents scope creep and ensures the model output is clinically usable.
  • Data Governance & Ethics Clearance — Patient data is sensitive. Obtaining IRB/ethics clearance and confirming compliance with Kenya's Data Protection Act (2019) before accessing records is legally mandatory.
  • Data Availability & Quality Assessment — Determine which electronic health record fields are consistently populated. Poor data quality is the leading cause of DS project failure. Early assessment allows remediation time.
  • Infrastructure & Resource Planning — Identify compute resources, storage, and software licenses needed. Healthcare facilities in Kenya often have limited IT infrastructure; cloud vs on-premise decisions must be made early.
  • Risk Register Creation — Document risks (biased training data, model misuse by untrained staff, connectivity outages) and assign mitigation owners. Proactive risk planning reduces firefighting during execution.
  • Project Schedule & Budget Baseline — Create a Gantt chart with milestones and a costed work breakdown structure. This enables earned-value monitoring and early detection of budget overruns.
04
Q Justify, using three reasons, why project planning and preparation is important before undertaking a project.
1 — Establishes Clear Scope and Objectives

Planning forces stakeholders to agree on what the project will and will not deliver. This reduces scope creep — one of the most common causes of cost and schedule overruns in DS projects.

2 — Enables Effective Resource Allocation

Preparation reveals the human, technical, and financial resources required. Without this, teams may discover mid-project that critical skills (e.g., ML engineers, domain experts) or data infrastructure are unavailable.

3 — Provides a Baseline for Monitoring and Control

A documented plan gives the project manager a baseline against which actual progress, cost, and quality can be compared. This enables early detection and correction of deviations before they become critical.

05
Q Discuss any two known criteria for selecting project models within project management.
1 — Realism

The model must reflect the true situation of the project and organisation. It should capture real constraints (budget, technology maturity, regulatory environment) and real objectives, rather than idealistic assumptions. A realistic model produces decisions that can actually be implemented.

2 — Capability

The selection model must be capable of adequately distinguishing between good and bad projects — i.e., it must be sensitive enough to differentiate project proposals on the dimensions that matter (ROI, risk, strategic fit). A model that rates all projects similarly has no discriminatory power and is useless for decision-making.

Other recognised criteria include: flexibility (adaptable to change), ease of use (understood by decision-makers), and cost-effectiveness (the cost of the selection process should not exceed its benefit).

06
Q Describe any two common Project Selection Models.
1 — Numeric (Quantitative) Models

These models use measurable financial or statistical criteria to rank and select projects. Common examples include Net Present Value (NPV), Internal Rate of Return (IRR), Payback Period, and Scoring Models (weighted criteria matrices). They are objective and comparable across projects but require reliable data estimates which may be difficult early in a project.
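To make two of these numeric criteria concrete, here is a minimal Python sketch of NPV and payback period; the cash-flow figures are invented purely for illustration:

```python
def npv(rate, cash_flows):
    """Net Present Value: discount each year's cash flow back to year 0.
    cash_flows[0] is the initial outlay (negative), at t = 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def payback_period(cash_flows):
    """Years until cumulative (undiscounted) cash flow first reaches zero."""
    cumulative = 0.0
    for t, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return t
    return None  # project never pays back within the horizon

# A hypothetical project: KSh 1,000k outlay, KSh 500k returned per year
flows = [-1000, 500, 500, 500]
print(round(npv(0.10, flows), 2))  # positive, so economically justified at 10%
print(payback_period(flows))       # pays back in year 2
```

A positive NPV at the organisation's discount rate supports a "go" decision; the payback period adds a liquidity view that NPV alone does not give.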

2 — Non-Numeric (Qualitative) Models

These rely on human judgment, strategic priorities, or political considerations rather than numbers. Examples include Sacred Cow (a project championed by senior leadership), Operating Necessity (required for the organisation to continue operating), and Comparative Benefit (comparison of projects by committee opinion). They capture intangible strategic value but can be subjective and inconsistent.

07
Q Discuss one key advantage of Nonnumeric models. (Nov 2024)
Captures Strategic and Intangible Value: Nonnumeric models allow decision-makers to select projects based on strategic alignment, competitive positioning, regulatory compliance needs, or reputational considerations that cannot be reduced to a financial number. For example, a project that improves community trust or fulfils a government mandate may have an NPV of zero yet be essential for the organisation's long-term survival.
Q Discuss two benefits and two drawbacks of choosing projects via nonnumeric/qualitative selection, and suggest one way to make such decisions fairer. (Dec 2025 Q1b)

Benefits:

  • Speed: Decisions can be made quickly without lengthy financial modelling, which is valuable when a project is time-critical.
  • Captures intangibles: Factors like brand reputation, staff morale, and societal impact — which resist quantification — can still influence the decision.

Drawbacks:

  • Bias and subjectivity: Decisions may reflect the preferences of influential leaders rather than the organisation's best interest (the "HiPPO effect" — Highest-Paid Person's Opinion).
  • Lack of consistency: Without a structured framework, different decision-makers may evaluate the same project very differently, making it hard to compare alternatives fairly.
Strategy to improve fairness: Introduce a weighted scoring matrix that forces decision-makers to rate each project against agreed strategic criteria (e.g., alignment with university vision, resource availability, ethical compliance). This preserves qualitative judgment but channels it through a transparent, documented, and consistent structure.
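A weighted scoring matrix of this kind is easy to operationalise. The sketch below uses invented criteria, weights, and ratings; the point is the mechanism (agreed weights applied identically to every proposal), not the specific numbers:

```python
def weighted_score(weights, ratings):
    """Combine per-criterion ratings (e.g., 1-5) into one project score.
    Weights must sum to 1 so scores are comparable across projects."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * ratings[criterion] for criterion, w in weights.items())

# Hypothetical criteria and weights, agreed before any project is rated
weights = {"strategic_fit": 0.5, "resource_availability": 0.3,
           "ethical_compliance": 0.2}
project_a = {"strategic_fit": 5, "resource_availability": 3,
             "ethical_compliance": 4}
project_b = {"strategic_fit": 2, "resource_availability": 5,
             "ethical_compliance": 5}
print(round(weighted_score(weights, project_a), 2))  # 4.2
print(round(weighted_score(weights, project_b), 2))  # 3.5
```

Because the weights are fixed before any project is rated, a dominant individual cannot quietly re-weight the criteria to favour a pet project.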
08
Q Discuss one key advantage of profit/profitability numeric models.
Objectivity and Comparability: Profitability numeric models (NPV, IRR, payback period, ROI) produce quantified, comparable scores for each project alternative. This removes personal bias from project selection — a project with NPV of KSh 5 million can be directly compared to one with NPV of KSh 2 million. The objectivity makes the selection process auditable and defensible to external stakeholders such as boards, funders, and regulators.
09
Q Describe the typical stages you would follow in developing a data science project for KCA University.

The following stages reflect an industry-standard DS lifecycle (aligned to CRISP-DM and agile DS practice):

  1. Business Understanding (Problem Definition): Engage stakeholders to clarify the business problem, define measurable success criteria, and document constraints (budget, timeline, data access). For KCA, this might mean confirming what "student at risk" means to academic advisors and what data the SIS holds.
  2. Data Understanding: Perform an initial exploration of available data — inventory sources, assess volume/velocity/variety, identify obvious quality issues, and document data lineage. Exploratory Data Analysis (EDA) begins here.
  3. Data Preparation: Clean and transform raw data into a model-ready dataset. This includes handling missing values, outlier treatment, encoding, normalisation, feature engineering, and train/test splitting. This stage typically consumes 60–80% of project time.
  4. Modelling: Select and train candidate algorithms (e.g., Logistic Regression, Decision Trees, Neural Networks). Tune hyperparameters. Track experiments using version control (MLflow, DVC).
  5. Evaluation: Assess models against business success criteria (not just technical metrics). Validate on holdout data. Conduct bias and fairness audits. Present results to stakeholders for approval.
  6. Deployment: Package the approved model as an API or dashboard and integrate into the operational system. Establish monitoring for data drift and model performance decay.
  7. Monitoring & Maintenance: Continuously track prediction accuracy in production. Schedule periodic re-training as new data accumulates. Document model version changes.
  8. Project Closure: Archive code, data, and documentation. Conduct a lessons-learned review. Obtain formal stakeholder sign-off and transition to operational support teams.
10
Q Discuss the data science project development approach that builds on and extends the 6-stage CRISP-DM standard.

CRISP-DM (Cross-Industry Standard Process for Data Mining) provides a cyclical 6-stage framework: Business Understanding → Data Understanding → Data Preparation → Modelling → Evaluation → Deployment. However, industry DS projects require extensions to address enterprise concerns. The extended approach adds:

Stage 0 — Project Initiation & Governance

Before CRISP-DM begins, the extended approach adds formal project charter creation, stakeholder mapping, ethics review, data-sharing agreements, and project selection sign-off. This grounds the DS work in project management discipline.

Stage 1 — Business Understanding (extended)

Beyond defining objectives, the extended approach requires mapping the business process the DS solution will support, identifying KPIs for success, and conducting a PFA. ROI calculations and strategic alignment are documented here.

Stage 2 — Data Understanding (extended)

Extends CRISP-DM by adding data-quality profiling tools, data lineage mapping, and a data-readiness assessment. Data infrastructure needs (warehouses, lakes, streaming platforms) are identified and provisioned.

Stage 3 — Data Preparation (extended)

Introduces automated data pipelines, feature stores, and reproducibility requirements (version-controlled datasets). This industrialises the preparation step for ongoing production use.

Stage 4 — Modelling (extended)

Adds MLOps practices: experiment tracking, model registry, and automated hyperparameter optimisation (AutoML). Multiple model architectures are evaluated systematically rather than ad hoc.

Stage 5 — Evaluation (extended)

Beyond technical metrics, the extended approach requires a business-impact evaluation (does it meet the KPIs from Stage 1?), an ethical audit (fairness, explainability, bias), and a risk review before deployment approval.

Stage 6 — Deployment (extended)

Extends CRISP-DM deployment to include: CI/CD pipelines for model updates, A/B testing frameworks, rollback strategies, SLA agreements, and handover documentation for operations teams.

Stage 7 — Monitoring, Maintenance & Project Closure

A new stage added by the extended approach. Covers drift detection, model retraining schedules, user feedback loops, and formal project termination with lessons-learned documentation. Critically, it treats the deployed model as a living system requiring ongoing stewardship.

The extended approach differs from pure CRISP-DM in that it is project-managed (with budgets, timelines, risk registers, and governance gates at each stage) rather than purely data-analytically driven. This makes it suitable for enterprise-scale, multi-stakeholder DS projects.
11
Q Discuss five key challenges unique to data science projects (compared to software development). (Apr 2025 Q3a)
  1. Data Dependency and Uncertainty: Software projects start with defined specifications; DS projects start with data of unknown quality, completeness, and relevance. Poor data quality cannot be compensated for by better algorithms, making DS projects inherently riskier from the outset.
  2. Iterative and Non-Linear Workflow: DS development requires frequent cycling back to earlier stages (e.g., discovering in the modelling phase that more feature engineering is needed). Traditional software SDLC is more sequential. This makes scheduling DS projects difficult and increases scope-creep risk.
  3. Experimentation and Reproducibility: DS involves testing many hypotheses (algorithms, features, hyperparameters). Without proper experiment tracking tools, results are hard to reproduce. Software projects produce deterministic code; DS produces probabilistic models whose outcomes can differ between runs.
  4. Model Decay / Concept Drift: A deployed DS model degrades over time as the real-world data distribution changes (e.g., consumer behaviour shifts after COVID-19). Software rarely "wears out" unless the business rules change. DS systems require ongoing monitoring and retraining post-deployment.
  5. Explainability and Ethics: Stakeholders increasingly demand interpretable models (especially in healthcare, finance, and law). A software feature is inherently explainable (it follows coded logic); a neural network's decision may not be. DS projects therefore carry ethical and regulatory risks not present in standard software development.
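Challenge 4 (model decay / concept drift) is usually caught by production monitoring. As a minimal sketch of one common check, here is a mean-shift test comparing a live feature window against the training-time reference; real monitoring stacks use richer statistics (e.g., population stability index), and the threshold here is an illustrative assumption:

```python
def mean_shift_drift(reference, live, threshold=3.0):
    """Flag drift when the live-window mean deviates from the reference
    mean by more than `threshold` standard errors."""
    n = len(reference)
    ref_mean = sum(reference) / n
    ref_var = sum((x - ref_mean) ** 2 for x in reference) / (n - 1)
    std_err = (ref_var / len(live)) ** 0.5
    live_mean = sum(live) / len(live)
    return abs(live_mean - ref_mean) > threshold * std_err

reference = [1, 2, 3, 4, 5] * 10      # feature values seen at training time
print(mean_shift_drift(reference, [3] * 10))  # stable window: no drift
print(mean_shift_drift(reference, [6] * 10))  # shifted window: drift flagged
```

When such a check fires, the monitoring stage triggers investigation and, if confirmed, re-training on recent data.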
Q Explain a DS project to a novice and discuss two features that set it apart from a typical IT project, in a Kenyan context. (Dec 2025 Q2a)

A data science project uses statistical methods, machine learning, and domain knowledge to extract insights from data and build predictive or decision-support systems. For example, M-Pesa (Safaricom's mobile money platform) uses data science to detect fraudulent transactions in real time by analysing patterns in millions of daily transactions.

Feature 1 — Outcome Uncertainty

Unlike a typical IT project (e.g., building an HR payroll system) where deliverables are defined upfront, a DS project does not guarantee a useful model. It is possible to complete all project activities and still find that the data cannot produce a model that meets business accuracy thresholds. In Kenya, where data collection infrastructure is still maturing, this uncertainty is amplified by patchy or inconsistent historical records.

Feature 2 — Data as the Core Asset

In a conventional IT project, the product is software with defined functionality. In a DS project, data quality and availability are the primary constraint. In Kenya, challenges such as limited digitisation of records (e.g., paper-based health records in rural clinics) or multilingual unstructured data (Swahili, Sheng, English) create data preparation challenges not encountered in a typical IT project.

12
Q Discuss three distinct common challenges in implementing DS projects and a solution for each. (Apr 2025 Q1e)
  1. Poor Data Quality: Real-world data is often incomplete, inconsistent, duplicated, or mislabelled. Models trained on poor data produce unreliable predictions regardless of algorithm sophistication. Solution: Implement a formal data quality management framework — automated profiling tools (e.g., Great Expectations), data-cleaning pipelines, and a data stewardship programme with clear ownership of data assets.
  2. Stakeholder Communication Gap: Business users and data scientists speak different languages. Decision-makers may not understand model outputs, leading to distrust or misuse of DS recommendations. Solution: Use model explainability tools (SHAP, LIME) and invest in data visualisation dashboards. Embed a "data translator" role — someone who bridges technical and business teams and presents findings in business-metric terms.
  3. Deployment and Integration Challenges: Many DS projects produce excellent prototypes that never reach production due to incompatibilities with existing IT infrastructure, security policies, or operational workflows. Solution: Adopt MLOps practices — containerise models (Docker/Kubernetes), use CI/CD pipelines for automated testing and deployment, and involve IT operations and security teams from the planning stage rather than at the end.
Q Discuss five key challenges experienced in DS or AI projects. (Nov 2024 Q2a)
  1. Data Availability and Quality: Acquiring sufficient, labelled, high-quality data is the most persistent challenge. In many African contexts, data exists in siloed, undigitised, or inconsistently structured forms.
  2. Talent Shortage: Skilled data scientists, ML engineers, and data engineers are scarce and expensive. Teams may lack the breadth of skills (statistics, coding, domain expertise, communication) needed for a full DS project.
  3. Ethical and Bias Issues: Models can encode historical biases present in training data, producing discriminatory outcomes. Ensuring fairness requires deliberate bias auditing which adds time and cost.
  4. Scalability and Infrastructure: Prototype models that work on small samples often fail at production scale. Cloud costs, latency requirements, and data-security constraints create new challenges at scale.
  5. Organisational Change Management: Even technically successful DS systems fail if the organisation is not prepared to change its workflows and trust algorithmic recommendations. Resistance from staff who fear job displacement is a real barrier.
13
Q Discuss any five key challenges faced in data preparation and how they can be addressed.
  1. Missing Data — Incomplete records reduce dataset size and can bias results. Solution: Use imputation techniques (mean/median/mode imputation, KNN imputation, or multiple imputation) or flag and explicitly model missingness as a feature.
  2. Inconsistent Data Formats — Dates, currencies, and categorical labels stored in multiple formats cause parsing errors. Solution: Standardise formats early using data pipeline schemas and enforce validation rules at the point of ingestion.
  3. Duplicate Records — Duplicates inflate sample size and bias model training toward repeated patterns. Solution: Apply deduplication algorithms and entity-resolution techniques; use unique identifiers where possible.
  4. Imbalanced Classes — In classification problems (e.g., fraud detection), the target class may represent <1% of records, causing models to learn to always predict the majority class. Solution: Apply oversampling (SMOTE), undersampling, or class-weight adjustments; use metrics like AUC-ROC instead of accuracy.
  5. High Dimensionality (Curse of Dimensionality) — Datasets with hundreds of features make model training slow and prone to overfitting. Solution: Apply feature selection (correlation analysis, mutual information, Lasso regularisation) or dimensionality reduction (PCA, t-SNE) to retain only informative features.
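Two of the fixes above (median imputation for challenge 1, deduplication for challenge 3) can be sketched in a few lines of pure Python; real pipelines would use pandas or scikit-learn, but the logic is the same:

```python
def impute_median(values):
    """Replace missing entries (None) with the median of observed values."""
    observed = sorted(v for v in values if v is not None)
    n = len(observed)
    median = (observed[n // 2] if n % 2
              else (observed[n // 2 - 1] + observed[n // 2]) / 2)
    return [median if v is None else v for v in values]

def deduplicate(records):
    """Drop exact duplicate records while preserving first-seen order."""
    return list(dict.fromkeys(records))

print(impute_median([70, None, 80, None, 90]))      # gaps filled with 80
print(deduplicate([("a", 1), ("a", 1), ("b", 2)]))  # one ("a", 1) kept
```

Both steps should be logged and documented so the transformations remain auditable and reproducible.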
14
Q Identify three numeric and two non-numeric ML techniques and explain how each works.

Numeric (quantitative output / regression) techniques:

  1. Linear Regression: Models the relationship between a continuous dependent variable and one or more independent variables by fitting a straight line (hyperplane) that minimises the sum of squared residuals. Produces a numeric output (e.g., predicted house price).
  2. Support Vector Regression (SVR): Extends SVM to regression tasks. It finds a function within an ε-margin of the true values while minimising model complexity. Effective for non-linear relationships using kernel functions.
  3. Neural Networks (Deep Learning Regression): Multi-layered networks of artificial neurons learn complex non-linear mappings from input features to a continuous output through backpropagation. Used for tasks like demand forecasting and price prediction.

Non-numeric (classification / categorical output) techniques:

  1. Decision Trees: Recursively splits the feature space into subsets based on feature thresholds that best separate classes. Each leaf node represents a class label. Highly interpretable and used for customer churn or disease classification.
  2. Naïve Bayes: A probabilistic classifier based on Bayes' theorem with the assumption that features are conditionally independent given the class. Computes the posterior probability of each class and assigns the most probable. Widely used in text classification (spam detection) and sentiment analysis.
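To make the first numeric technique concrete, here is a from-scratch simple (one-feature) linear regression using the closed-form least-squares solution; library implementations (scikit-learn, statsmodels) generalise this to many features:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature:
    slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # recovers y = 2x + 1 exactly
```

Minimising the sum of squared residuals is exactly what this closed form achieves; for noisy data the fitted line is the best linear approximation in that least-squares sense.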
15
Q Explain the key metrics and performance indicators used to evaluate the success of data science projects.

DS project success is evaluated at two levels — technical model performance and business impact:

Technical Metrics:

  • Accuracy: % of correct predictions (not reliable for imbalanced datasets).
  • Precision, Recall, F1-Score: More meaningful for imbalanced classification. F1 balances precision and recall.
  • AUC-ROC: Area Under the ROC Curve — measures model discrimination power across thresholds. AUC = 0.5 is random; AUC = 1.0 is perfect.
  • RMSE / MAE: Root Mean Square Error and Mean Absolute Error for regression tasks — measure average prediction error magnitude.
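The classification metrics above all fall out of confusion-matrix counts. A pure-Python sketch (in practice scikit-learn's metrics module does this):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive class, from raw counts."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 3 actual positives; the model finds 2 of them and raises 1 false alarm
p, r, f1 = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(round(p, 2), round(r, 2), round(f1, 2))
```

On an imbalanced problem such as fraud detection, these per-class numbers expose a "always predict the majority class" model that plain accuracy would reward.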

Business / Project-Level KPIs:

  • ROI (Return on Investment): Financial return generated by the DS solution relative to its cost.
  • Time-to-Insight: How much faster decisions are made using the DS system vs the previous process.
  • Adoption Rate: Percentage of intended users actively using the DS output.
  • Project Completion Metrics: On-time delivery, on-budget performance, scope adherence.
16
Q Explain the rationale for project risk management.

Project risk management exists because all projects operate under uncertainty. No project — regardless of how well planned — unfolds exactly as anticipated. Risks, if unmanaged, can cause schedule delays, cost overruns, quality failures, or complete project failure.

The core rationale is proactive rather than reactive management: identifying risks early, assessing their likelihood and impact, and implementing controls before risks become issues. In DS projects specifically, risks extend beyond conventional project risks to include data-quality failures, model bias, regulatory non-compliance, and technology obsolescence — all of which require active management from day one.

Ultimately, risk management protects project investment, increases stakeholder confidence, and improves the probability of achieving project objectives within constraints.

17
Q Discuss the six-step risk management process recommended by the Project Management Institute.
  1. Plan Risk Management: Decide how risk management activities will be conducted and resourced throughout the project. Produce a Risk Management Plan that defines methodology, roles, risk categories, probability/impact scales, and reporting thresholds.
  2. Identify Risks: Systematically determine which risks could affect the project. Techniques include brainstorming, Delphi technique, SWOT analysis, checklists from past DS projects, and expert interviews. The output is a Risk Register.
  3. Perform Qualitative Risk Analysis: Prioritise identified risks by assessing their probability of occurrence and potential impact using a probability-impact matrix. This focuses attention on the most significant risks without requiring detailed quantitative data.
  4. Perform Quantitative Risk Analysis: Apply numerical methods (Monte Carlo simulation, sensitivity analysis, decision trees) to the highest-priority risks to estimate their effect on project objectives (time, cost, scope) in measurable terms.
  5. Plan Risk Responses: Develop options and strategies for each significant risk. For threats: Avoid, Transfer, Mitigate, or Accept. For opportunities: Exploit, Share, Enhance, or Accept. Assign a risk owner for each response action.
  6. Monitor and Control Risks: Track identified risks, monitor trigger conditions, implement response plans, and identify new risks as the project progresses. Conduct regular risk reviews at project milestones.
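Step 4 names Monte Carlo simulation; the sketch below shows how it turns three-point task estimates into a schedule distribution. The task figures are invented, and real tools (e.g., schedule risk analysis software) also model task dependencies and correlated risks:

```python
import random

def simulate_schedule(tasks, n_trials=10_000, seed=42):
    """Monte Carlo estimate of total duration for sequential tasks.
    Each task is (optimistic, most_likely, pessimistic) in days; one
    duration is drawn per task from a triangular distribution and summed."""
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.triangular(opt, pess, likely) for opt, likely, pess in tasks)
        for _ in range(n_trials)
    )
    p50 = totals[len(totals) // 2]        # median finish date
    p90 = totals[int(len(totals) * 0.9)]  # 90%-confidence finish date
    return p50, p90

tasks = [(2, 3, 6), (4, 5, 9), (1, 2, 4)]  # hypothetical estimates (days)
p50, p90 = simulate_schedule(tasks)
print(round(p50, 1), round(p90, 1))
```

The gap between the p50 and p90 finish dates is exactly the schedule contingency that committing to the "most likely" total would silently forgo.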
18
Q Explain three critical success factors for the plan risk management process.
1 — Senior Management Support

Risk management requires resources (time, budget, tools) and the authority to escalate issues. Without visible commitment from senior leadership, risk processes are treated as bureaucratic exercises and are ignored when they conflict with schedule pressure.

2 — Early and Continuous Stakeholder Engagement

Risks must be identified with input from all stakeholder groups — technical teams, domain experts, end-users, and legal/compliance officers. No single person sees all risks. Continuous engagement ensures new risks (e.g., a regulatory change mid-project) are captured promptly.

3 — Clear Risk Ownership and Accountability

Every risk in the register must have a named owner responsible for monitoring and implementing the agreed response. Without ownership, risk responses are planned but never executed. In DS projects, data-quality risks may be owned by the data engineer, while ethical risks may be owned by the project sponsor.

19
Q Explain how you would handle project risks and uncertainties in a data science project.

Handling risk in a DS project involves both standard PM risk practices and DS-specific adaptations:

  • Risk Identification: At project kick-off, conduct a risk workshop covering data risks (availability, quality, privacy), model risks (bias, overfitting, interpretability), operational risks (infrastructure, adoption), and schedule/cost risks.
  • Risk Register: Document each risk with: description, probability (High/Medium/Low), impact (H/M/L), risk score, planned response, and owner.
  • Data-Specific Mitigations:
    • Run a data-quality audit before committing to timelines — adjust scope if data is found to be insufficient.
    • Establish data backup and fallback sources in case the primary data source is unavailable.
    • Include privacy impact assessments (PIAs) for projects using personal data.
  • Agile Iteration as Risk Management: Using short sprint cycles means risks are surfaced and resolved frequently rather than accumulating until late in the project. Each sprint review is effectively a risk checkpoint.
  • Contingency Reserves: Build schedule buffer (typically 10–15%) and cost contingency into the project plan to absorb realised risks without derailing the project.
  • Continuous Monitoring: Review the risk register at every project milestone. Update probability and impact ratings as the project progresses and new information becomes available.
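The risk register described above can be kept as plain records with a derived score (probability × impact on a 1-3 scale). A minimal sketch, with invented entries:

```python
LEVEL = {"L": 1, "M": 2, "H": 3}  # Low / Medium / High

def prioritise(register):
    """Score each risk (probability x impact) and sort highest first."""
    for risk in register:
        risk["score"] = LEVEL[risk["probability"]] * LEVEL[risk["impact"]]
    return sorted(register, key=lambda r: r["score"], reverse=True)

register = [
    {"risk": "Primary data source unavailable",
     "probability": "M", "impact": "H", "owner": "Data engineer"},
    {"risk": "Model bias against protected groups",
     "probability": "H", "impact": "H", "owner": "Project sponsor"},
    {"risk": "Sprint scope creep",
     "probability": "L", "impact": "M", "owner": "Scrum master"},
]
for risk in prioritise(register):
    print(risk["score"], risk["risk"], "->", risk["owner"])
```

Reviewing this at each milestone means re-rating probability and impact and re-sorting, so attention always flows to the currently highest-scoring risks.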
20
Q Explain Agile methodology in data science projects and how it differs from traditional project management.

Agile is an iterative, incremental approach to project management that emphasises flexibility, collaboration, and delivering working outputs in short cycles called sprints (typically 1–4 weeks).

Key Agile principles applied in DS:

  • Iterative development: Instead of delivering a complete model at project end, DS teams deliver prototype models or partial analyses at the end of each sprint for stakeholder feedback.
  • Scrum framework: Daily standups, sprint planning, sprint reviews, and retrospectives keep the team aligned and blockers visible. The product backlog contains DS user stories (e.g., "As a bank manager, I need a fraud-probability score per transaction").
  • Kanban boards: Used to visualise DS workflow stages (data collection → EDA → feature engineering → modelling → review) and manage work-in-progress limits.
  • Continuous stakeholder involvement: Business owners review model outputs at every sprint, reducing the risk of building a technically correct but business-irrelevant model.
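The work-in-progress limits mentioned above are the core mechanic of a Kanban board and can be sketched in a few lines. Stage names mirror the DS workflow described; the limits and task names are illustrative assumptions:

```python
# Minimal Kanban WIP-limit sketch: a task may only be pulled into a
# stage whose work-in-progress limit has not been reached. Limits
# here are illustrative, not recommended values.
from collections import defaultdict

WIP_LIMITS = {"data collection": 3, "EDA": 2, "feature engineering": 2,
              "modelling": 2, "review": 1}

board = defaultdict(list)  # stage -> list of tasks currently in it

def pull(task, stage):
    """Move a task into a stage only if its WIP limit allows it."""
    if len(board[stage]) >= WIP_LIMITS[stage]:
        return False  # the bottleneck is made visible: the pull is blocked
    board[stage].append(task)
    return True

# Usage: the review stage has a WIP limit of 1
pull("validate churn model", "review")             # accepted
blocked = pull("validate fraud model", "review")   # rejected: limit reached
print(blocked)  # prints False
```

A blocked pull is the signal that work is piling up in a stage, which is exactly the early-warning behaviour the bullet above describes.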
Agile vs Traditional (Waterfall), compared by dimension:

  • Planning: Agile plans on a rolling basis, adapting each sprint; Waterfall fixes an upfront baseline plan.
  • Deliverables: Agile delivers incremental working outputs every sprint; Waterfall makes a single delivery at project end.
  • Change management: Agile embraces change by adjusting the backlog; Waterfall resists change through formal change control.
  • Stakeholder involvement: Continuous in Agile (sprint reviews); at defined milestones only in Waterfall.
  • Risk exposure: Lower in Agile, since issues surface in each sprint; higher in Waterfall, where issues surface late.
  • Documentation: Lean in Agile (working model over documentation); heavy in Waterfall (detailed specs required).
Agile is particularly well-suited to DS projects because of their inherent uncertainty — the final model architecture cannot be known at the start. Short sprints allow teams to pivot based on what the data reveals.
21
Q Discuss how effective project scheduling (Gantt charts / Agile Kanban) supports budget control in a DS project; show how missed milestones cause overruns; propose two synchronisation strategies.

Scheduling and budget control: A project schedule (Gantt chart or Kanban board) translates the work breakdown structure into time-bounded tasks with dependencies. When each task is costed, the schedule becomes a time-phased budget baseline. Earned Value Management (EVM) then compares actual spend against the value of work completed:

  • Gantt charts show planned vs actual progress for each DS phase (data collection, model training, deployment). Slippage is immediately visible and budget impact can be calculated.
  • Kanban boards in Agile DS projects surface bottlenecks (e.g., data cleaning tasks piling up) that predict upcoming delays and cost increases before they materialise.
Example — Missed Milestone Leading to Budget Overrun: Suppose the data preparation milestone for a healthcare DS project was planned for Week 6 but is completed in Week 10. The DS engineers who were scheduled to begin model training in Week 7 are now idle or redeployed, wasting four weeks of salary cost. The cloud compute environment provisioned for model training from Week 7 incurs idle-time charges. A four-week delay in a team of 5 at an average cost of KSh 200,000/month per person generates roughly KSh 1,000,000 in unplanned personnel cost alone (about one month of salaries for the whole team), not counting infrastructure waste.

Two strategies to synchronise financial tracking with schedule updates:

  1. Earned Value Management (EVM) with monthly re-baselining: Link the project schedule directly to the cost account structure. Each time the schedule is updated (Gantt bars moved), the cost baseline is automatically recalculated. The project manager reviews Cost Performance Index (CPI) and Schedule Performance Index (SPI) at every sprint review or Gantt milestone.
  2. Milestone-based payment and budget release gates: Structure the project budget so funds are released only upon achieving verified milestones (e.g., "data pipeline signed off" before model development budget is released). This creates a direct link between schedule progress and financial authority, incentivising teams to complete stages before moving on.
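The EVM indices named in strategy 1 are computed from three standard quantities: planned value (PV), earned value (EV), and actual cost (AC). The sketch below uses the standard formulas CPI = EV/AC and SPI = EV/PV; the monetary figures are illustrative assumptions:

```python
def evm_indices(pv, ev, ac):
    """Return (CPI, SPI) from planned value, earned value, actual cost.

    CPI = EV / AC measures cost efficiency; SPI = EV / PV measures
    schedule efficiency. Values below 1.0 signal over-budget or
    behind-schedule work.
    """
    return ev / ac, ev / pv

# Illustrative milestone figures (KSh): KSh 1,000,000 of work planned,
# KSh 800,000 worth completed, KSh 950,000 actually spent.
cpi, spi = evm_indices(pv=1_000_000, ev=800_000, ac=950_000)
print(round(cpi, 2), round(spi, 2))  # prints 0.84 0.8
```

A CPI of 0.84 and SPI of 0.80 would tell the project manager, at the sprint review itself, that the work is both over budget and behind schedule, before the overrun compounds.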
22
Q Describe the steps involved in the termination of a data science project and explain why each step is essential. (Aug 2024 Q2b)
  1. Final Deliverable Acceptance: Obtain formal written sign-off from the client/sponsor confirming that all deliverables meet agreed specifications. Why essential: Prevents disputes about what was delivered and triggers contractual payment.
  2. Resource Release: Release project staff back to their functional departments or other projects; terminate cloud compute subscriptions; return leased equipment. Why essential: Prevents ongoing costs after project completion and frees resources for other initiatives.
  3. Contract Closure: Close all vendor and subcontractor agreements. Confirm that all payments are settled and warranties are documented. Why essential: Protects the organisation from future financial and legal liability.
  4. Documentation and Archiving: Archive all project artefacts — code repositories, datasets, model files, reports, meeting minutes — in a retrievable system with clear labelling. Why essential: Enables future model auditing, retraining, or project reactivation. Regulatory compliance may also require retention.
  5. Lessons Learned Review: Conduct a structured retrospective with the full project team to document what went well, what went wrong, and what should be done differently. Why essential: Organisational learning prevents repeating mistakes in future DS projects and builds institutional knowledge.
  6. Formal Project Closure Report: Produce a closure report summarising final costs, schedule performance, quality outcomes, and lessons learned. Distribute to all stakeholders. Why essential: Creates an official record that the project has ended and documents accountability.
Q Design a comprehensive termination plan for an ML project terminated due to ethical concerns in data usage. (Dec 2025 Q3b)

Context: A credit-scoring ML model for a Kenyan bank is terminated mid-deployment after an audit reveals the model uses demographic proxies that discriminate against rural customers.

1 — Immediate System Shutdown & Data Quarantine

Suspend all model predictions immediately. Quarantine the training dataset and any outputs that may have caused harm. Justification: Prevents further discriminatory decisions while the full impact is assessed; protects the bank from regulatory sanctions.

2 — Impact Assessment & Stakeholder Communication

Identify all customers whose loan decisions were influenced by the biased model. Communicate transparently with the regulator (CBK), affected customers, and internal stakeholders. Justification: Regulatory bodies expect prompt, honest disclosure. Delayed communication compounds reputational and legal risk.

3 — Product Handover & Reversal

Revert the credit-scoring process to the previous (manual or rule-based) system to ensure business continuity. Hand over interim operational procedures to the credit department. Justification: The business cannot pause lending operations while the remediation is designed.

4 — Contract Closure

Review vendor and data-supplier contracts for termination clauses. Ensure data-deletion obligations under the Kenya Data Protection Act (2019) are met. Justification: Failure to honour contractual termination terms creates financial liability; data-deletion is a legal obligation.

5 — Resource Release

Release DS team members to other projects. Archive but do not delete project assets (they may be needed for the ethics investigation). Decommission cloud environments except audit-required storage. Justification: Controls ongoing costs while preserving evidence for regulatory review.

6 — Documentation of Lessons Learned

Document: root cause of the bias (which features acted as proxies), when it should have been caught (ethics review gap), and what controls failed. Recommend: mandatory bias audits at model evaluation stage in all future DS projects. Justification: Institutional learning is the primary long-term benefit of any termination — this prevents recurrence across the organisation.

Q Explain the meaning of termination with a valid example. Critically examine three distinct termination strategies. (Dec 2025 Q4b)

Termination is the formal conclusion of a project — planned or unplanned — in which project activities cease, resources are released, and the project is formally closed. It may occur upon successful completion, or early due to changed circumstances, failure, or ethical issues.

Example: The Kenyan government terminates a crime-prediction DS pilot after civil society groups raise concerns about bias against specific communities — this is an ethical early termination.

Strategy 1 — Termination by Extinction (Natural Closure)

The project reaches its planned end: all deliverables are accepted, the system is deployed, and the project team is disbanded. This is the ideal scenario. Best suited for: Projects that achieve their objectives on schedule — e.g., a DS model for supermarket demand forecasting that is successfully integrated into the ERP and handed to operations. Critical point: Even natural closure requires formal documentation; many organisations skip lessons-learned reviews when projects end positively, losing valuable institutional knowledge.

Strategy 2 — Termination by Addition

The project is so successful that it becomes a permanent part of the organisation — it is absorbed into an ongoing business unit rather than closed. Best suited for: DS systems that generate continuous value and need ongoing maintenance (e.g., a fraud-detection model at a bank becomes a permanent ML Ops function). Critical point: The transition from project to operations must be carefully managed — project-style governance (sprint planning, daily standups) must transition to a service-management model (SLAs, change control).

Strategy 3 — Termination by Starvation (Budget Cancellation)

Resources are progressively cut until the project can no longer continue. This often signals a politically difficult termination where leadership does not want to formally cancel a project but withdraws support. Typically seen in: Projects that have lost strategic relevance or where results are disappointing but the organisation is unwilling to admit failure. Critical point: This is the most damaging strategy: it drains resources slowly, demoralises the team, and produces no clean closure. An explicit go/no-go decision at a defined review gate is better practice than allowing a project to die gradually.

23
Q An ML project delivers strong ROI and positive NPV but later audits reveal discrimination. Would you call it a success? Discuss the limits of profit-based models and the importance of ethics in data science.

Short answer: No — a project cannot be deemed fully successful if it causes demonstrable harm to users, even if financial targets are met.

Limits of profit-based success models:

  • They ignore externalities: Financial ROI captures value to the organisation but not costs imposed on affected users, communities, or society. A credit-scoring model that denies loans to creditworthy rural women may generate profit for the lender but perpetuates financial exclusion.
  • They are backward-looking: Positive NPV is calculated at a point in time. Regulatory fines, reputational damage, litigation costs, and forced system shutdowns that arise post-deployment can obliterate the financial case entirely. An ethical failure can turn a positive NPV project into a net loss.
  • They don't account for trust: Long-term organisational value depends on customer trust. A discriminatory DS system — even a profitable one — erodes the social licence to operate.
Importance of ethical and social responsibility in DS: Data scientists and project managers hold significant power — their systems make decisions that affect people's access to healthcare, credit, employment, and justice. This creates a duty of care that extends beyond shareholders to all affected stakeholders. Frameworks such as the Responsible AI principles (fairness, accountability, transparency, privacy) should be embedded as success criteria from the project definition stage — not as afterthoughts evaluated in post-implementation audits. In Kenya's context, where digital systems are rapidly being adopted for government services, ethical DS practice is also a matter of constitutional rights and social equity.
24
Q Discuss the challenges and opportunities of integrating multiple tools and platforms into a cohesive DS project management ecosystem.

Challenges:

  • Tool fragmentation: DS projects commonly use Jira (project management), Git (version control), MLflow (experiment tracking), Airflow (pipelines), and Tableau (visualisation) — all from different vendors with different APIs. Keeping these in sync creates integration overhead.
  • Data security across platforms: When data moves between on-premise databases, cloud storage, and SaaS tools, data-governance policies are harder to enforce consistently.
  • Skill requirements: Team members must be proficient in multiple tools. Tool sprawl can reduce productivity if team members spend more time managing the toolchain than doing DS work.
  • Vendor lock-in: Heavy reliance on proprietary platforms (e.g., AWS SageMaker, Azure ML) makes it costly to switch vendors if pricing or service quality changes.

Opportunities:

  • End-to-end automation: A well-integrated ecosystem enables CI/CD for ML — from data ingestion through model training, testing, and deployment — reducing manual errors and accelerating delivery.
  • Improved traceability: Integrated platforms link data versions, code versions, experiment results, and deployed models, creating a complete audit trail that satisfies regulatory requirements.
  • Scalability: Cloud-native integrated platforms scale compute and storage on demand, enabling DS projects to handle large datasets without upfront infrastructure investment.
25
Q You are leading a DS project for the Kenyan government's transportation agency to optimise urban traffic flow using real-time sensor data. Describe a comprehensive risk minimisation process throughout the project lifecycle. (Dec 2025 Q4a)

Phase 1 — Initiation & Planning (Risk Identification):

  • Conduct a structured risk workshop with the transport agency, Nairobi City County, traffic police, and sensor vendors to identify risks: sensor failure, data latency, data privacy (vehicle tracking), political interference, and budget cuts.
  • Create a risk register with probability/impact ratings. Given Kenya's infrastructure variability, assign high probability to connectivity outages and power failures at sensor nodes.

Phase 2 — Data Collection (Data Risk Controls):

  • Diversify data sources: combine fixed sensors with mobile probe data (e.g., matatu GPS, Google Maps APIs) to reduce single-source failure risk.
  • Implement real-time data quality monitoring — alert when sensor dropout rates exceed 10% in any corridor.
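The dropout-rate alert described above could be implemented as a simple threshold check over per-corridor sensor counts. This is a minimal sketch; the 10% threshold comes from the bullet above, but the function name, data shapes, and corridor figures are illustrative assumptions:

```python
DROPOUT_THRESHOLD = 0.10  # alert when >10% of a corridor's sensors drop out

def corridors_to_alert(expected, reporting, threshold=DROPOUT_THRESHOLD):
    """Return (corridor, dropout_rate) pairs exceeding the threshold.

    expected:  {corridor: number of installed sensors}
    reporting: {corridor: number of sensors that sent data this interval}
    """
    alerts = []
    for corridor, total in expected.items():
        dropout = 1 - reporting.get(corridor, 0) / total
        if dropout > threshold:
            alerts.append((corridor, round(dropout, 2)))
    return alerts

# Example: Uhuru Highway has 50 sensors but only 43 reported (14% dropout)
print(corridors_to_alert({"Uhuru Highway": 50, "Mombasa Road": 40},
                         {"Uhuru Highway": 43, "Mombasa Road": 39}))
# prints [('Uhuru Highway', 0.14)]
```

In production this check would run on a schedule against live sensor feeds; a corridor missing entirely from the reporting data is treated as 100% dropout and always raises an alert.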

Phase 3 — Modelling (Model Risk Controls):

  • Test models on historical traffic data before live deployment. Include scenarios for extreme events (Nairobi Marathon, state funerals).
  • Build fallback rule-based systems so traffic signal controllers revert to fixed timing if the ML system fails.

Phase 4 — Deployment & Monitoring (Operational Risk Controls):

  • Deploy in a single corridor first (e.g., Uhuru Highway) as a pilot before citywide rollout — limits blast radius of any failure.
  • Establish 24/7 operations monitoring with defined incident response procedures. Conduct quarterly model performance reviews and retrain as traffic patterns evolve (e.g., post-construction of Nairobi Expressway).
Cross-cutting control: Engage a local Kenyan DS partner to ensure cultural context (e.g., informal transport patterns, road naming conventions) is embedded in the model — a risk unique to projects that apply imported algorithmic frameworks to Kenyan urban realities.
26
Q Identify a real-world issue affecting a Kenyan industry and outline how it represents a business or social problem solvable using data science.

Issue: Crop Failure and Food Insecurity in Kenya's Agricultural Sector

Business/Social Problem: Kenya's agricultural sector employs roughly 40% of the workforce, but smallholder farmers — who produce the majority of food — face high crop-failure rates due to unpredictable rainfall, pest outbreaks, and poor soil management. This drives food insecurity (particularly in arid/semi-arid counties such as Turkana and Marsabit) and economic loss for farmers and agribusinesses that depend on reliable supply chains.

DS Solution Approach: A predictive crop-yield and risk model could integrate satellite imagery (NDVI indices from Sentinel-2), historical weather data (Kenya Meteorological Department), soil health sensor readings, and mobile-survey data from farmers via platforms like Digifarm (Safaricom). Using machine learning (Random Forests, LSTM for time-series weather patterns), the model would predict county-level yield estimates and flag high-risk zones 4–6 weeks before harvest, enabling government agencies and NGOs to pre-position food aid and input subsidies. This directly addresses both the business problem (supply chain planning for agribusinesses) and the social problem (early-warning food insecurity alerts for vulnerable communities).
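The satellite-imagery input mentioned above rests on the NDVI, a standard index computed from near-infrared and red band reflectances as (NIR − Red) / (NIR + Red). A minimal sketch of deriving it and flagging low-vegetation zones; the zone names, reflectance values, and 0.3 risk cutoff are illustrative assumptions:

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index: (NIR - Red) / (NIR + Red).
    Values near 1 indicate healthy vegetation; values near 0, bare soil."""
    return (nir - red) / (nir + red)

def high_risk_zones(readings, cutoff=0.3):
    """readings: {zone: (nir, red) band reflectances}.
    Return zones whose NDVI falls below an illustrative risk cutoff."""
    return [zone for zone, (nir, red) in readings.items()
            if ndvi(nir, red) < cutoff]

# Example readings: sparse vegetation in the first zone, dense in the second
print(high_risk_zones({"Turkana-A": (0.35, 0.30),
                       "Kericho-B": (0.60, 0.15)}))
# prints ['Turkana-A']
```

A real pipeline would compute NDVI per pixel from Sentinel-2 bands and feed zone-level aggregates, alongside weather and soil features, into the yield model described above.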