Endurance Consulting | Fractional CTO & Platform Engineering Leadership

About Adam Hicks

Wed, 06 May 2026 00:00:00 +0000

Experience & Case Studies

Wed, 06 May 2026 00:00:00 +0000

Services

Wed, 06 May 2026 00:00:00 +0000

Fractional CTO vs. Technical Consultant: What's the Difference?

Fri, 01 May 2026 00:00:00 +0000

Clients ask me this in almost every first call: “What’s the difference between a fractional CTO and a technical consultant?”

On the surface, they look alike: both work part-time, charge comparable rates, and claim to solve your technology problems. But the scope of responsibility is different, and hiring the wrong one is expensive.

The Consultant Model

Technical consultants solve defined problems.

You know what’s broken, you know roughly how to fix it, you need execution horsepower. Examples:

“Migrate our monolith to microservices”
“Implement Kubernetes with proper GitOps”
“Build an observability pipeline using OpenTelemetry”

Consultants are project-scoped. They deliver an artifact: an architecture document, a working platform, or an implementation roadmap. When the project ends, they leave.

What consultants don’t own:

Long-term technical strategy
Build vs. buy decisions
Engineering team performance and culture
Vendor relationships and negotiations
Technology budget allocation
Alignment between engineering roadmap and business goals

The Fractional CTO Model

Fractional CTOs own outcomes, not just deliverables.

You know you have a technology problem, but you’re not sure:

What the actual problem is
Whether it’s worth solving
What tradeoffs you’re making
Who should own it internally

Fractional CTOs operate at the executive level, not the project level. Examples of what I do:

Strategic Decision-Making

Client scenario: Should we build our ML platform in-house or use Databricks?
My role: Model TCO over 3 years, assess team capability, evaluate lock-in risk, make a recommendation, own the decision

Organizational Design

Client scenario: Our platform team is underwater. Do we hire more people or change scope?
My role: Audit current workload, identify low-value work, redesign team charter, help hire, establish OKRs

Technology Due Diligence

Client scenario: We’re evaluating an acquisition. Is their tech stack viable?
My role: Audit architecture, assess technical debt, estimate integration costs, flag deal-breakers

Vendor Negotiation

Client scenario: Our observability vendor wants to raise prices by 40%.
My role: Assess alternatives (including open-source), architect OTEL-based migration path, negotiate renewal from position of strength

Risk Assessment

Client scenario: Are we ready to scale from 10K to 1M users?
My role: Identify failure modes, design chaos experiments, prioritize infrastructure investments, establish SLOs

None of these are “projects.” They’re executive judgment calls requiring pattern recognition across technology, business, and organizational dynamics.

When You Need a Fractional CTO

You need fractional CTO-level thinking when:

You’re making multi-million-dollar technology bets: Build vs. buy decisions, platform investments, major vendor commitments
Technology is blocking business goals: Sales is losing deals because your platform can’t scale. Marketing can’t launch campaigns because your data pipeline is broken.
You have a Director of Engineering but no CTO: Your Director is great at execution but doesn’t have the scar tissue to make strategic calls
You’re post-Series A, pre-full-time-CTO: You’ve raised enough money that bad technology decisions are existential, but not enough to hire a $400K/year CTO
Your current CTO is underwater: They’re great technically but drowning in operations. They need strategic air cover.

When You Need a Consultant

You need consultant-level execution when:

The problem is well-defined: “Implement Kubernetes” is a consultant problem. “Should we use Kubernetes?” is a CTO problem.
You lack internal execution capacity: Your team knows what to do but doesn’t have bandwidth
You need deep domain expertise: Migrating from Oracle to Postgres requires database specialists, not strategic thinking
The engagement is time-bound: Platform migration projects have clear start/end dates

Why I Do Both

The honest answer is that most fractional CTO engagements include consulting work.

Example: I’m advising a client on observability strategy (fractional CTO work). My recommendation is “adopt OpenTelemetry.” They ask: “Can you help implement it?” Now I’m doing consultant work within a CTO engagement.

The difference is who owns the decision. I own the strategy. I recommended OTEL because I believe it’s the right long-term architecture. If I were only a consultant, I’d implement whatever they asked for, even if I thought it was the wrong call.

Pricing Differences

Consultants typically charge project-based or hourly:

$200-$400/hour for technical consultants
$50K-$200K for project engagements (migrations, implementations)

Fractional CTOs typically charge monthly retainers:

$10K-$25K/month for roughly 2-3 days of attention, priced for the judgment and the ongoing relationship rather than the hours
For comparison, a full-time CTO costs $300K-$500K per year fully loaded; the retainer buys the same level of judgment at a fraction of that cost

The retainer model reflects always-on availability for critical decisions. When your site goes down at 2 AM, I’m not going to invoice you for emergency advice. That’s included.

Real-World Hybrid Example

Current engagement with a mid-market SaaS company:

Fractional CTO scope:

Own technology roadmap and alignment with product
Advise on build vs. buy for analytics platform
Participate in board meetings to report on engineering health
Help recruit senior engineers
15 hours/month retainer

Consulting scope (within same engagement):

Implement OpenTelemetry golden paths
Design Kubernetes autoscaling strategy
Conduct architecture review for resilience
Billed hourly on top of retainer

This is common. The CTO work defines priorities. The consulting work executes them.

The Wrong Hire Is Expensive

If you hire a consultant when you need a CTO:

Your projects get delivered, but they don’t move the business forward
You make technology decisions without considering long-term consequences
You pay for implementation but not strategy

If you hire a fractional CTO when you need a consultant:

You pay executive rates for work that could be done by a senior engineer
The engagement drags because there’s not enough strategic decision-making to fill the retainer

Bottom Line

If you can write a scope of work with clear deliverables, hire a consultant.

If your problem is “I don’t know what I don’t know,” hire a fractional CTO.

Most organizations need both at different times. The key is knowing which hat you’re hiring for.

Not sure what you need? Let’s talk. The first conversation is free, and I’ll tell you straight whether you need fractional leadership or project execution.

GenAI Observability: What to Measure When Your Product Uses LLMs

Tue, 28 Apr 2026 00:00:00 +0000

After two quarters embedded with a Fortune 10 company’s Applied Machine Learning teams instrumenting GenAI workloads, I’ve seen the core mistake: most organizations instrument LLM applications like they’re REST APIs. They’re not.

Traditional observability (latency, error rate, throughput) tells you that something broke. GenAI observability tells you why your AI is failing to deliver value.

What Makes GenAI Different

When your API returns a 500 error, that’s unambiguous. When your LLM returns a response, you have no idea if it’s:

Correct
Hallucinated
Off-topic
Biased
Too expensive
Too slow for the user’s patience

You need different instrumentation.

The Four Layers of GenAI Observability

Layer 1: Infrastructure Metrics (Table Stakes)

These are your traditional observability signals, adapted for LLM workloads:

Latency Metrics:

TTFT (Time to First Token): How long before the user sees something? This determines perceived performance.
Tokens per Second: Throughput rate during generation. Affects user patience.
Total Request Duration: End-to-end latency including prompt processing.
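
The three latency signals above can all be captured by timing a streamed response on the client side. Here is a minimal sketch, assuming your provider SDK hands you an iterable of text chunks. The names `StreamTiming` and `time_stream` are hypothetical, and each chunk is treated as one token, which is only an approximation because providers often batch several tokens per chunk.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class StreamTiming:
    ttft_s: Optional[float]  # time to first token; None if nothing was streamed
    total_s: float           # end-to-end request duration
    output_chunks: int       # streamed chunks, used here as a stand-in for tokens
    chunks_per_s: float      # generation rate measured after the first chunk


def time_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamTiming:
    """Consume a streamed response and record latency signals."""
    start = clock()
    first = last = None
    count = 0
    for _ in chunks:
        last = clock()
        if first is None:
            first = last
        count += 1
    end = clock()

    # Rate is measured from the first chunk to the last, so prompt
    # processing time (which TTFT already captures) does not skew it.
    rate = 0.0
    if count > 1 and last > first:
        rate = (count - 1) / (last - first)
    return StreamTiming(
        ttft_s=None if first is None else first - start,
        total_s=end - start,
        output_chunks=count,
        chunks_per_s=rate,
    )
```

Pass in a lazy stream so the timer starts when the request does. If your provider reports exact output token counts, prefer those over the chunk count when computing throughput.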

Cost Metrics:

Cost per Request: (input tokens × input price) + (output tokens × output price). Track cached and uncached input separately, since prompt caching charges cached tokens at a fraction of the rate.
Cost per User Session: Aggregated across multi-turn conversations
Cost per Feature: Which parts of your product are burning money?
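
The cost arithmetic is simple enough to sketch. This is an illustration under stated assumptions: the `Pricing` fields and the per-million-token rates in the usage note are hypothetical placeholders, not any provider’s real price list, and the record layout is one I made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # USD per one million tokens. Placeholder fields; fill in from your
    # provider's current price sheet.
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def request_cost(uncached_in: int, cached_in: int, out: int, p: Pricing) -> float:
    """Cost of one LLM call, with cached input priced separately."""
    total = (
        uncached_in * p.input_per_m
        + cached_in * p.cached_input_per_m
        + out * p.output_per_m
    )
    return total / 1_000_000


def cost_by_feature(records, p: Pricing) -> dict:
    """Sum cost per feature from (feature, uncached_in, cached_in, out) tuples."""
    totals = defaultdict(float)
    for feature, uncached_in, cached_in, out in records:
        totals[feature] += request_cost(uncached_in, cached_in, out, p)
    return dict(totals)
```

With hypothetical rates of $2.00 input, $0.50 cached input, and $8.00 output per million tokens, a call with 1,000 uncached input tokens, 4,000 cached input tokens, and 500 output tokens costs $0.008. The same aggregation keyed on session ID instead of feature gives you cost per user session.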

Throughput:

Requests per Second: Standard, but important for capacity planning
Concurrent Requests: How many LLM calls are in-flight?
Queue Depth: Are you throttling before you hit provider rate limits?

Layer 2: LLM-Specific Signals

This is where GenAI observability diverges from traditional monitoring:

Token Metrics:

Input Token Count: How much context are you sending? Larger = slower + more expensive
Output Token Count: How verbose is your model? Can affect UX and cost
Cache Hit Rate (if using prompt caching): Are you paying for redundant processing?

Model Behavior:

Temperature: Are you using consistent sampling parameters?
Model Version: Track which model version generated each response (for A/B testing and rollback)
Retry Count: How often are you retrying failed requests?
Fallback Triggers: When did you fall back from your primary model to a cheaper or backup one, and why?

Rate Limiting:

Rate Limit Hits: How often are you throttled by your provider?
Quota Exhaustion: Are you hitting daily/monthly spending caps?

Layer 3: Quality Signals

Infrastructure can be perfect while your AI delivers garbage. You need quality metrics:

Response Quality (Automated):

Toxicity Score: Are you generating harmful content?
Relevance Score: Does the response match the prompt intent?
Hallucination Detection: Is the model making things up? (This is hard; more below)
PII Leakage: Are you exposing sensitive data without realizing it?

Response Quality (User Signals):

Thumbs Up/Down Ratios: The simplest signal
User Edits: Did the user have to fix the output?
Retry Rate: Did the user regenerate the response?
Abandonment: Did they give up and close the feature?

Prompt Engineering Effectiveness:

Prompt Version: Track which prompt template was used
Few-Shot Example Count: How many examples are you including?
RAG Context Size: How much retrieved context are you injecting?

Layer 4: Business Impact

The reason you’re building with LLMs is business value. Measure it:

User Engagement:

Feature Adoption: Are users using your AI features at all?
Session Length: Does AI make users stick around longer?
Churn Impact: Do AI users churn less?

Conversion:

AI-Assisted Conversions: Did the LLM help close a sale?
Content Generation Volume: For content- or code-generation products, output volume maps directly to revenue

Cost-Benefit:

Revenue per Dollar Spent on LLMs: Your AI P&L
Cost per Value Delivered: What are the unit economics?

What You Can Instrument Today vs. What’s Hard

Easy Wins (Implement These First)

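TTFT and Tokens/s: Time the first streamed chunk and the gap to the last one on the client side. Most provider APIs also return token counts with each response. Log all of it.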
Cost Tracking: Token counts × pricing. Track by user, by feature, by model.
Model Version & Parameters: Log which model you called and with what settings.
User Feedback: Add thumbs up/down buttons. You’d be shocked how few products do this.

Medium Difficulty

Prompt Versioning: Treat prompts like code. Version them, deploy them, track which version served each request.
RAG Observability: If you’re doing retrieval, log what you retrieved, how relevant it was, and whether it made it into the final response.
Trace Context: Use OpenTelemetry to connect your LLM call to the upstream request. When a user complains, you can trace back through your entire stack.
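
One way to treat prompts like code is to derive the version from the template text itself, so an edit can never ship under an old version label. This is a minimal sketch under that assumption. `PromptTemplate` and the `prompt.name` / `prompt.version` keys are hypothetical names for illustration, not part of any standard.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptTemplate:
    name: str
    template: str

    @property
    def version(self) -> str:
        # Content-addressed: any change to the template text yields a new id.
        digest = hashlib.sha256(self.template.encode("utf-8")).hexdigest()
        return digest[:12]

    def render(self, **variables: str) -> tuple[str, dict[str, str]]:
        """Return the filled-in prompt plus metadata to log with the request."""
        text = self.template.format(**variables)
        metadata = {"prompt.name": self.name, "prompt.version": self.version}
        return text, metadata
```

Attach the returned metadata to whatever log line or span records the LLM call. When a bad response shows up, the version tells you exactly which template text produced it.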

Hard Problems

Hallucination Detection: There’s no silver bullet. You need:
- Fact-checking against known sources (expensive)
- Consistency checks across multiple generations (slow)
- Human labeling of a sample (doesn’t scale)
Semantic Quality: “Is this response helpful?” is subjective. You’ll need:
- LLM-as-judge (use a second model to grade the first; yes, really)
- Human eval loops (label a sample, train a classifier)
- User behavior proxies (did they edit it? regenerate? abandon?)
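
To make the consistency check concrete: sample the same prompt several times and measure how much the answers agree. The sketch below uses word-set overlap (Jaccard similarity) as the agreement measure. That is my simplification for illustration, and it is a crude lexical proxy; a production check would more likely compare embeddings or extracted facts. Function names are hypothetical.

```python
import re
from itertools import combinations


def _tokens(text: str) -> set[str]:
    # Lowercased alphanumeric words; punctuation is ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def consistency_score(generations: list[str]) -> float:
    """Mean pairwise similarity across repeated generations of one prompt."""
    pairs = list(combinations(generations, 2))
    if not pairs:
        raise ValueError("need at least two generations to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A low score does not prove a hallucination, and a high score does not rule one out. It is a cheap filter for deciding which responses go to the more expensive fact-checking or human-labeling steps.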

OpenLLMetry: The Standard That’s Emerging

OpenLLMetry is to LLM observability what OpenTelemetry is to traditional observability: a vendor-neutral instrumentation standard.

It’s built on top of OpenTelemetry and adds semantic conventions for:

LLM provider and model name/version
Token counts (prompt, completion, and cached)
Cost
Prompt and completion payloads (with PII redaction)

Worth knowing where this is heading: OpenLLMetry pioneered these conventions, and the OpenTelemetry GenAI working group is now standardizing the same ground upstream as gen_ai.* semantic conventions. Instrumenting on OTEL today keeps you aligned with where the ecosystem is converging, not locked to one vendor’s schema.

If you’re starting from scratch, use OpenLLMetry. It gives you:

Vendor portability (swap observability backends without reinstrumenting)
Ecosystem compatibility (works with Datadog, Honeycomb, Grafana, etc.)
Future-proofing (as the space matures, tooling will standardize on OTEL)

Real-World Implementation: What I Built at Fortune 10 Scale

I can’t share specifics, but the architecture was:

OpenLLMetry SDK wrapping LLM provider calls
OTEL Collector for aggregation, sampling, and enrichment
Trace Backend (vendor withheld) for storage and visualization
Custom Dashboards showing:
- TTFT P50/P95/P99 by model and feature
- Cost burn rate and projections
- Token usage patterns (prompt size vs. output size)
- Model version distribution
Alerting on:
- TTFT degradation (user experience)
- Cost spikes (budget protection)
- Error rate increases (availability)
- Rate limit hits (capacity planning)
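
The dashboards and the TTFT alert come down to percentile math. This is a simplified illustration, not the code from that engagement: it uses the nearest-rank percentile definition, and the 1.5x degradation ratio is a hypothetical threshold you would tune to your own traffic.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    rank = min(len(ordered), max(1, rank))
    return ordered[rank - 1]


def ttft_summary(samples_ms: list[float]) -> dict[str, float]:
    """P50/P95/P99 for one (model, feature) series."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


def ttft_degraded(
    baseline_ms: list[float],
    current_ms: list[float],
    max_ratio: float = 1.5,  # hypothetical threshold
) -> bool:
    """Alert when current P95 exceeds the baseline P95 by max_ratio."""
    return percentile(current_ms, 95) > max_ratio * percentile(baseline_ms, 95)
```

Comparing against a rolling baseline rather than a fixed number keeps the alert meaningful when you change models, since each model has its own normal TTFT.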

Result: Engineering teams could diagnose production AI issues in minutes instead of days, and we caught a $50K/month cost leak from a prompt that was including full document context on every call.

Advice for Teams Starting Out

Start simple:

Log TTFT and cost for every LLM call
Add thumbs up/down feedback
Set a budget alert

Iterate toward quality:

Version your prompts
A/B test prompt variations
Sample and label responses for quality

Invest in infra when it pays:

If you’re spending $10K/month on LLMs, you can afford manual tracking
If you’re spending $100K/month, you need automated observability
If you’re spending $1M/month, you need a dedicated AI observability platform

The Tooling Landscape

Open Source:

OpenLLMetry: OTEL-native instrumentation SDK
Langfuse: Tracing, evals, and prompt management (self-hostable)
Phoenix (Arize): LLM tracing and evaluation

Commercial:

LangSmith: Tracing and eval platform from the LangChain team (works beyond LangChain)
Braintrust: Eval-focused, strong for LLM-as-judge and regression testing
Honeycomb: General-purpose tracing with LLM support
Datadog: APM expanding into LLM observability
Arize AI: Purpose-built for ML/LLM monitoring
Helicone: Cost tracking and prompt management

Avoid building from scratch unless you’re Netflix-scale. The tooling is maturing fast.

Bottom Line

GenAI observability is how you keep AI products from bleeding money or delivering garbage. Start with infrastructure metrics, add quality signals, then tie it to business impact.

And for the love of all that’s holy, instrument your prompts. If you don’t know which prompt template generated a bad response, you can’t fix it.

Building an LLM-powered product and not sure what to instrument? I’ve done this at Fortune 10 scale and scrappy startup scale. Let’s talk.

OpenTelemetry Adoption: Why Most Organizations Get It Wrong

Wed, 15 Apr 2026 00:00:00 +0000

After leading observability rationalization projects at Fortune 500 companies and serving as the OpenTelemetry SME at Groundcover, I’ve watched dozens of organizations attempt OTEL adoption. Most start with the wrong assumptions.

The Common Mistake: Big Bang Migration

Organizations treat OpenTelemetry adoption like a database migration: plan everything, execute once, celebrate. This fails because:

Legacy instrumentation has institutional knowledge baked in: your existing dashboards and alerts encode years of production incidents
Vendor SDKs are wired into everything: ripping them out breaks more than observability
Teams resist learning curves: especially when the old tools “work fine”

The Better Approach: Strategic Layering

Instead of replacing existing observability, add OpenTelemetry as a data layer:

Phase 1: New Services Only (Months 1-3)

Mandate OTEL for all new microservices
Create golden path instrumentation templates
Export telemetry in parallel to both your OTEL collector and the legacy backend
Build confidence without breaking production

Phase 2: Edge and Mesh Instrumentation (Months 3-6)

Instrument API gateways, load balancers, and (if you run one) the service mesh with OTEL
Capture entry-point and hop-level spans without touching application code
Push W3C trace-context propagation inward so those spans start linking into end-to-end traces, instead of staying isolated at the edge
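
Trace-context propagation is less mysterious than it sounds: each hop reads the `traceparent` header, keeps the trace ID, and sends a new header downstream with its own span ID. The sketch below shows that handoff using only the standard library. In practice the OTEL SDK propagators do this for you; the function names here are hypothetical, and the parser only accepts the four-field layout used by version 00.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceContext(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Parse an incoming traceparent header; None means start a new trace."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero ids are invalid per the spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(ctx: TraceContext) -> str:
    """Header for the next hop: same trace id, fresh span id."""
    span_id = secrets.token_hex(8)  # 16 hex characters
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{span_id}-{flags}"
```

The shared trace ID is what lets edge spans and application spans link into one end-to-end trace. A hop that drops or regenerates the header is where traces break into isolated fragments.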

Phase 3: High-Value Migrations (Months 6-12)

Migrate services where legacy observability is painful
Target the services that deploy often, break often, and have the worst instrumentation
Show ROI: faster troubleshooting, lower vendor costs

Phase 4: Observability Pipeline (Months 12+)

Introduce OTEL collector pipelines for sampling, enrichment, routing
Route less critical data to cheaper backends
Keep high-value data in premium tools

Why This Works

It separates data collection from data destinations. Your teams adopt OTEL instrumentation without also migrating dashboards, alerts, and runbooks. You de-risk the transition.

It proves value early. By month 4 you have distributed traces across your instrumented paths and the edge, showing request flows most orgs can’t see today, and you get there without migrating a single dashboard or alert.

It strengthens your hand at renewal. When your observability vendor sees OpenTelemetry collectors in production, renewal conversations get interesting. You’re no longer locked in.

The Build vs. Buy Decision

The dirty secret is that OpenTelemetry is infrastructure, not a product. You still need:

Trace storage and query engines
Dashboarding and visualization
Alerting and incident management
SLO tracking and error budgets

Some organizations build these (Uber, Netflix). Most should buy (Honeycomb, Grafana, Datadog’s OTEL support).

The win isn’t avoiding vendors. It’s avoiding vendor lock-in.

Real-World Outcomes

At a Fortune 500 semiconductor manufacturer, we:

Reduced observability spend by 40% over 18 months
Cut MTTR by 25% through distributed tracing
Migrated 60% of critical services without a single SEV-1 incident

At a major financial services firm:

Consolidated 9 observability vendors down to 3
Improved signal-to-noise ratio (alert fatigue dropped by 50%)
Gave teams flexibility to choose backends based on use case

Where Organizations Need Help

Most engineering teams can instrument a service with OTEL. What’s hard:

Sampling strategy: head-based sampling is cheap but blind, because it decides at the first span, before anyone knows whether the trace will error or run slow. Tail-based sampling keeps the errored and slow traces that matter, but forces you to buffer every span until the trace completes. Tuning that tradeoff at scale is the hard part.
Collector pipelines: enrichment, routing, backpressure handling, failover
Schema governance: preventing instrumentation drift across 200 microservices
Organizational change: getting SREs to trust new tooling
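
To show why tail-based sampling costs memory, here is a toy sampler that buffers spans per trace and decides only when the trace is complete. It is a sketch under loud assumptions: it treats the arrival of the root span as "trace complete," whereas real collectors typically wait a fixed window because spans arrive out of order. `Span`, `TailSampler`, and the 500 ms threshold in the usage note are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    trace_id: str
    name: str
    duration_ms: float
    is_error: bool = False
    is_root: bool = False


class TailSampler:
    """Keep a whole trace if any span errored or the root span was slow."""

    def __init__(self, slow_ms: float):
        self.slow_ms = slow_ms
        self._buffer: dict[str, list[Span]] = defaultdict(list)

    def add(self, span: Span) -> list[Span]:
        """Buffer a span. Returns the kept spans once the root arrives,
        or an empty list if the trace is incomplete or dropped."""
        self._buffer[span.trace_id].append(span)
        if not span.is_root:
            return []
        spans = self._buffer.pop(span.trace_id)
        keep = any(s.is_error for s in spans) or span.duration_ms >= self.slow_ms
        return spans if keep else []

    @property
    def buffered_spans(self) -> int:
        # This number is the memory cost of deciding late.
        return sum(len(spans) for spans in self._buffer.values())
```

With `TailSampler(slow_ms=500)`, a fast healthy trace is dropped in full, while a trace containing one errored span is kept in full. Watching `buffered_spans` under production load is a quick way to see what the tradeoff costs you.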

This is where fractional leadership pays off. You need someone who’s done this before, can architect the strategy, and doesn’t need 6 months to understand your infrastructure.

Bottom Line

OpenTelemetry adoption is an organizational transformation disguised as a technical problem. Treat it like one.

Need help with your observability strategy? I’ve led these transformations at enterprise scale. Let’s talk.