<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Endurance Consulting | Endurance Consulting | Fractional CTO &amp; Platform Engineering Leadership</title><link>https://enduranceconsulting.com/</link><atom:link href="https://enduranceconsulting.com/index.xml" rel="self" type="application/rss+xml"/><description>Endurance Consulting</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Wed, 06 May 2026 00:00:00 +0000</lastBuildDate><image><url>https://enduranceconsulting.com/media/logo_hu_98515048530bb274.png</url><title>Endurance Consulting</title><link>https://enduranceconsulting.com/</link></image><item><title>About Adam Hicks</title><link>https://enduranceconsulting.com/about/</link><pubDate>Wed, 06 May 2026 00:00:00 +0000</pubDate><guid>https://enduranceconsulting.com/about/</guid><description/></item><item><title>Experience &amp; Case Studies</title><link>https://enduranceconsulting.com/experience/</link><pubDate>Wed, 06 May 2026 00:00:00 +0000</pubDate><guid>https://enduranceconsulting.com/experience/</guid><description/></item><item><title>Services</title><link>https://enduranceconsulting.com/services/</link><pubDate>Wed, 06 May 2026 00:00:00 +0000</pubDate><guid>https://enduranceconsulting.com/services/</guid><description/></item><item><title>Fractional CTO vs. Technical Consultant: What's the Difference?</title><link>https://enduranceconsulting.com/blog/fractional-cto-vs-consultant/</link><pubDate>Fri, 01 May 2026 00:00:00 +0000</pubDate><guid>https://enduranceconsulting.com/blog/fractional-cto-vs-consultant/</guid><description>&lt;p&gt;Clients ask me this in almost every first call: &amp;ldquo;What&amp;rsquo;s the difference between a fractional CTO and a technical consultant?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;On the surface, they look similar. Both work part-time, both charge similar rates, both claim to solve your technology problems. But the &lt;strong&gt;scope of responsibility&lt;/strong&gt; is different, and hiring the wrong one is expensive.&lt;/p&gt;
&lt;h2 id="the-consultant-model"&gt;The Consultant Model&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Technical consultants solve defined problems.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;You know what&amp;rsquo;s broken, you know roughly how to fix it, you need execution horsepower. Examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Migrate our monolith to microservices&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Implement Kubernetes with proper GitOps&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Build an observability pipeline using OpenTelemetry&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Consultants are &lt;strong&gt;project-scoped&lt;/strong&gt;. They deliver an artifact: architecture document, working platform, implementation roadmap. When the project ends, they leave.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What consultants don&amp;rsquo;t own:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Long-term technical strategy&lt;/li&gt;
&lt;li&gt;Build vs. buy decisions&lt;/li&gt;
&lt;li&gt;Engineering team performance and culture&lt;/li&gt;
&lt;li&gt;Vendor relationships and negotiations&lt;/li&gt;
&lt;li&gt;Technology budget allocation&lt;/li&gt;
&lt;li&gt;Alignment between engineering roadmap and business goals&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-fractional-cto-model"&gt;The Fractional CTO Model&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Fractional CTOs own outcomes, not just deliverables.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;You know you have a technology problem, but you&amp;rsquo;re not sure:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What the actual problem is&lt;/li&gt;
&lt;li&gt;Whether it&amp;rsquo;s worth solving&lt;/li&gt;
&lt;li&gt;What tradeoffs you&amp;rsquo;re making&lt;/li&gt;
&lt;li&gt;Who should own it internally&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Fractional CTOs operate at the &lt;strong&gt;executive level&lt;/strong&gt;, not the project level. Examples of what I do:&lt;/p&gt;
&lt;h3 id="strategic-decision-making"&gt;Strategic Decision-Making&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Client scenario&lt;/strong&gt;: Should we build our ML platform in-house or use Databricks?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;My role&lt;/strong&gt;: Model TCO over 3 years, assess team capability, evaluate lock-in risk, make a recommendation, own the decision&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="organizational-design"&gt;Organizational Design&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Client scenario&lt;/strong&gt;: Our platform team is underwater. Do we hire more people or change scope?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;My role&lt;/strong&gt;: Audit current workload, identify low-value work, redesign team charter, help hire, establish OKRs&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="technology-due-diligence"&gt;Technology Due Diligence&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Client scenario&lt;/strong&gt;: We&amp;rsquo;re evaluating an acquisition. Is their tech stack viable?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;My role&lt;/strong&gt;: Audit architecture, assess technical debt, estimate integration costs, flag deal-breakers&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="vendor-negotiation"&gt;Vendor Negotiation&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Client scenario&lt;/strong&gt;: Our observability vendor wants to raise prices by 40%.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;My role&lt;/strong&gt;: Assess alternatives (including open-source), architect OTEL-based migration path, negotiate renewal from position of strength&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="risk-assessment"&gt;Risk Assessment&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Client scenario&lt;/strong&gt;: Are we ready to scale from 10K to 1M users?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;My role&lt;/strong&gt;: Identify failure modes, design chaos experiments, prioritize infrastructure investments, establish SLOs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;None of these are &amp;ldquo;projects.&amp;rdquo; They&amp;rsquo;re &lt;strong&gt;executive judgment calls&lt;/strong&gt; requiring pattern recognition across technology, business, and organizational dynamics.&lt;/p&gt;
&lt;h2 id="when-you-need-a-fractional-cto"&gt;When You Need a Fractional CTO&lt;/h2&gt;
&lt;p&gt;You need fractional CTO-level thinking when:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;You&amp;rsquo;re making multi-million-dollar technology bets&lt;/strong&gt;: Build vs. buy decisions, platform investments, major vendor commitments&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Technology is blocking business goals&lt;/strong&gt;: Sales is losing deals because your platform can&amp;rsquo;t scale. Marketing can&amp;rsquo;t launch campaigns because your data pipeline is broken.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You have a Director of Engineering but no CTO&lt;/strong&gt;: Your Director is great at execution but doesn&amp;rsquo;t have the scar tissue to make strategic calls&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You&amp;rsquo;re post-Series A, pre-full-time-CTO&lt;/strong&gt;: You&amp;rsquo;ve raised enough money that bad technology decisions are existential, but not enough to hire a $400K/year CTO&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Your current CTO is underwater&lt;/strong&gt;: They&amp;rsquo;re great technically but drowning in operations. They need strategic air cover.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="when-you-need-a-consultant"&gt;When You Need a Consultant&lt;/h2&gt;
&lt;p&gt;You need consultant-level execution when:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;The problem is well-defined&lt;/strong&gt;: &amp;ldquo;Implement Kubernetes&amp;rdquo; is a consultant problem. &amp;ldquo;Should we use Kubernetes?&amp;rdquo; is a CTO problem.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You lack internal execution capacity&lt;/strong&gt;: Your team knows what to do but doesn&amp;rsquo;t have bandwidth&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You need deep domain expertise&lt;/strong&gt;: Migrating from Oracle to Postgres requires database specialists, not strategic thinking&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The engagement is time-bound&lt;/strong&gt;: Platform migration projects have clear start/end dates&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="why-i-do-both"&gt;Why I Do Both&lt;/h2&gt;
&lt;p&gt;The honest answer is that &lt;strong&gt;most fractional CTO engagements include consulting work&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Example: I&amp;rsquo;m advising a client on observability strategy (fractional CTO work). My recommendation is &amp;ldquo;adopt OpenTelemetry.&amp;rdquo; They ask: &amp;ldquo;Can you help implement it?&amp;rdquo; Now I&amp;rsquo;m doing consultant work within a CTO engagement.&lt;/p&gt;
&lt;p&gt;The difference is &lt;strong&gt;who owns the decision&lt;/strong&gt;. I own the strategy. I recommended OTEL because I believe it&amp;rsquo;s the right long-term architecture. If I were only a consultant, I&amp;rsquo;d implement whatever they asked for, even if I thought it was the wrong call.&lt;/p&gt;
&lt;h2 id="pricing-differences"&gt;Pricing Differences&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Consultants typically charge project-based or hourly:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;$200-$400/hour for technical consultants&lt;/li&gt;
&lt;li&gt;$50K-$200K for project engagements (migrations, implementations)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Fractional CTOs typically charge monthly retainers:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;$10K-$25K/month for roughly 2-3 days of attention, priced for the judgment and the ongoing relationship rather than the hours&lt;/li&gt;
&lt;li&gt;A fraction of the $300K-$500K per year that a fully loaded full-time CTO costs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The retainer model reflects &lt;strong&gt;always-on availability for critical decisions&lt;/strong&gt;. When your site goes down at 2 AM, I&amp;rsquo;m not going to invoice you for emergency advice. That&amp;rsquo;s included.&lt;/p&gt;
&lt;h2 id="real-world-hybrid-example"&gt;Real-World Hybrid Example&lt;/h2&gt;
&lt;p&gt;Current engagement with a mid-market SaaS company:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fractional CTO scope:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Own technology roadmap and alignment with product&lt;/li&gt;
&lt;li&gt;Advise on build vs. buy for analytics platform&lt;/li&gt;
&lt;li&gt;Participate in board meetings to report on engineering health&lt;/li&gt;
&lt;li&gt;Help recruit senior engineers&lt;/li&gt;
&lt;li&gt;15 hours/month retainer&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Consulting scope (within same engagement):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Implement OpenTelemetry golden paths&lt;/li&gt;
&lt;li&gt;Design Kubernetes autoscaling strategy&lt;/li&gt;
&lt;li&gt;Conduct architecture review for resilience&lt;/li&gt;
&lt;li&gt;Billed hourly on top of retainer&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is common. The CTO work defines priorities. The consulting work executes them.&lt;/p&gt;
&lt;h2 id="the-wrong-hire-is-expensive"&gt;The Wrong Hire Is Expensive&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Hire a consultant when you need a CTO:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Your projects get delivered, but they don&amp;rsquo;t move the business forward&lt;/li&gt;
&lt;li&gt;You make technology decisions without considering long-term consequences&lt;/li&gt;
&lt;li&gt;You pay for implementation but not strategy&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Hire a fractional CTO when you need a consultant:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You pay executive rates for work that could be done by a senior engineer&lt;/li&gt;
&lt;li&gt;The engagement drags because there&amp;rsquo;s not enough strategic decision-making to fill the retainer&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="bottom-line"&gt;Bottom Line&lt;/h2&gt;
&lt;p&gt;If you can write a scope of work with clear deliverables, hire a consultant.&lt;/p&gt;
&lt;p&gt;If your problem is &amp;ldquo;I don&amp;rsquo;t know what I don&amp;rsquo;t know,&amp;rdquo; hire a fractional CTO.&lt;/p&gt;
&lt;p&gt;Most organizations need both at different times. The key is knowing which hat you&amp;rsquo;re hiring for.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Not sure what you need?&lt;/strong&gt; &lt;a href="https://enduranceconsulting.com/#contact"&gt;Let&amp;rsquo;s talk&lt;/a&gt;. The first conversation is free, and I&amp;rsquo;ll tell you straight whether you need fractional leadership or project execution.&lt;/p&gt;</description></item><item><title>GenAI Observability: What to Measure When Your Product Uses LLMs</title><link>https://enduranceconsulting.com/blog/genai-observability-getting-started/</link><pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate><guid>https://enduranceconsulting.com/blog/genai-observability-getting-started/</guid><description>&lt;p&gt;After two quarters embedded with a Fortune 10 company&amp;rsquo;s Applied Machine Learning teams instrumenting GenAI workloads, I&amp;rsquo;ve seen the core mistake: &lt;strong&gt;most organizations instrument LLM applications like they&amp;rsquo;re REST APIs&lt;/strong&gt;. They&amp;rsquo;re not.&lt;/p&gt;
&lt;p&gt;Traditional observability (latency, error rate, throughput) tells you &lt;em&gt;that&lt;/em&gt; something broke. GenAI observability tells you &lt;em&gt;why your AI is failing to deliver value&lt;/em&gt;.&lt;/p&gt;
&lt;h2 id="what-makes-genai-different"&gt;What Makes GenAI Different&lt;/h2&gt;
&lt;p&gt;When your API returns a 500 error, that&amp;rsquo;s unambiguous. When your LLM returns a response, you have no idea if it&amp;rsquo;s:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Correct&lt;/li&gt;
&lt;li&gt;Hallucinated&lt;/li&gt;
&lt;li&gt;Off-topic&lt;/li&gt;
&lt;li&gt;Biased&lt;/li&gt;
&lt;li&gt;Too expensive&lt;/li&gt;
&lt;li&gt;Too slow for the user&amp;rsquo;s patience&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You need different instrumentation.&lt;/p&gt;
&lt;h2 id="the-four-layers-of-genai-observability"&gt;The Four Layers of GenAI Observability&lt;/h2&gt;
&lt;h3 id="layer-1-infrastructure-metrics-table-stakes"&gt;Layer 1: Infrastructure Metrics (Table Stakes)&lt;/h3&gt;
&lt;p&gt;These are your traditional observability signals, adapted for LLM workloads:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Latency Metrics:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;TTFT (Time to First Token)&lt;/strong&gt;: How long before the user sees &lt;em&gt;something&lt;/em&gt;? This determines perceived performance.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tokens per Second&lt;/strong&gt;: Throughput rate during generation. Affects user patience.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total Request Duration&lt;/strong&gt;: End-to-end latency including prompt processing.&lt;/li&gt;
&lt;/ul&gt;
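&lt;p&gt;A minimal sketch of measuring these latency signals client-side, assuming the provider streams tokens through an iterator. The &lt;code&gt;measure_stream&lt;/code&gt; helper and its injectable &lt;code&gt;clock&lt;/code&gt; are hypothetical names for illustration, not part of any SDK:&lt;/p&gt;

```python
import time


def measure_stream(token_stream, clock=time.perf_counter):
    """Consume a token iterator and return (ttft_s, tokens_per_s, total_s).

    token_stream is a hypothetical iterator yielding one token at a time,
    standing in for a provider's streaming response.
    """
    start = clock()
    first_token_at = None
    count = 0
    for _ in token_stream:
        if first_token_at is None:
            first_token_at = clock()  # first visible output for the user
        count += 1
    end = clock()
    if first_token_at is None:
        return None, 0.0, end - start  # empty stream: no TTFT to report
    ttft = first_token_at - start
    generation_time = end - first_token_at
    # Rate over the generation phase only, so prompt processing does not skew it.
    rate = (count - 1) / generation_time if generation_time else 0.0
    return ttft, rate, end - start
```

&lt;p&gt;Passing the clock in as a parameter is a design choice for testability: a fake clock makes the measurement deterministic in unit tests.&lt;/p&gt;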
&lt;p&gt;&lt;strong&gt;Cost Metrics:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cost per Request&lt;/strong&gt;: Input tokens × input price + output tokens × output price. Track cached and uncached input separately, since prompt caching typically bills cached tokens at a fraction of the uncached rate&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost per User Session&lt;/strong&gt;: Aggregated across multi-turn conversations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost per Feature&lt;/strong&gt;: Which parts of your product are burning money?&lt;/li&gt;
&lt;/ul&gt;
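&lt;p&gt;The per-request arithmetic can be sketched as follows. The prices in &lt;code&gt;EXAMPLE&lt;/code&gt; are made-up placeholders, not any provider&amp;rsquo;s real rates, and the &lt;code&gt;Pricing&lt;/code&gt; and &lt;code&gt;cost_per_request&lt;/code&gt; names are hypothetical:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical USD prices per one million tokens (placeholders only)."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


def cost_per_request(pricing, uncached_input, cached_input, output):
    """Return the USD cost of one request from its three token counts."""
    total = (
        uncached_input * pricing.input_per_mtok
        + cached_input * pricing.cached_input_per_mtok
        + output * pricing.output_per_mtok
    )
    return total / 1_000_000


# Placeholder rates chosen only to make the arithmetic visible.
EXAMPLE = Pricing(input_per_mtok=3.0, cached_input_per_mtok=0.3, output_per_mtok=15.0)
```

&lt;p&gt;Summing this value by session id or feature tag gives the other two cost metrics in the list.&lt;/p&gt;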
&lt;p&gt;&lt;strong&gt;Throughput:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Requests per Second&lt;/strong&gt;: Standard, but important for capacity planning&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Concurrent Requests&lt;/strong&gt;: How many LLM calls are in-flight?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Queue Depth&lt;/strong&gt;: Are you throttling before you hit provider rate limits?&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="layer-2-llm-specific-signals"&gt;Layer 2: LLM-Specific Signals&lt;/h3&gt;
&lt;p&gt;This is where GenAI observability diverges from traditional monitoring:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Token Metrics:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Input Token Count&lt;/strong&gt;: How much context are you sending? Larger = slower + more expensive&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Output Token Count&lt;/strong&gt;: How verbose is your model? Verbosity affects both UX and cost&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cache Hit Rate&lt;/strong&gt; (if using prompt caching): Are you paying for redundant processing?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Model Behavior:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Temperature&lt;/strong&gt;: Are you using consistent sampling parameters?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Version&lt;/strong&gt;: Track which model version generated each response (for A/B testing and rollback)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Retry Count&lt;/strong&gt;: How often are you retrying failed requests?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fallback Triggers&lt;/strong&gt;: When did you fall back from your primary model to a cheaper or backup one, and why?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Rate Limiting:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Rate Limit Hits&lt;/strong&gt;: How often are you throttled by your provider?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Quota Exhaustion&lt;/strong&gt;: Are you hitting daily/monthly spending caps?&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="layer-3-quality-signals"&gt;Layer 3: Quality Signals&lt;/h3&gt;
&lt;p&gt;Infrastructure can be perfect while your AI delivers garbage. You need quality metrics:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Response Quality (Automated):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Toxicity Score&lt;/strong&gt;: Are you generating harmful content?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Relevance Score&lt;/strong&gt;: Does the response match the prompt intent?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hallucination Detection&lt;/strong&gt;: Is the model making things up? (This is hard; more below)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PII Leakage&lt;/strong&gt;: Are you exposing sensitive data without realizing it?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Response Quality (Human-Labeled):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Thumbs Up/Down Ratios&lt;/strong&gt;: The simplest signal&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User Edits&lt;/strong&gt;: Did the user have to fix the output?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Retry Rate&lt;/strong&gt;: Did the user regenerate the response?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Abandonment&lt;/strong&gt;: Did they give up and close the feature?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Prompt Engineering Effectiveness:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Prompt Version&lt;/strong&gt;: Track which prompt template was used&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Few-Shot Example Count&lt;/strong&gt;: How many examples are you including?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RAG Context Size&lt;/strong&gt;: How much retrieved context are you injecting?&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="layer-4-business-impact"&gt;Layer 4: Business Impact&lt;/h3&gt;
&lt;p&gt;The reason you&amp;rsquo;re building with LLMs is business value. Measure it:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;User Engagement:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Feature Adoption&lt;/strong&gt;: Are users using your AI features at all?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Session Length&lt;/strong&gt;: Does AI make users stick around longer?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Churn Impact&lt;/strong&gt;: Do AI users churn less?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Conversion:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AI-Assisted Conversions&lt;/strong&gt;: Did the LLM help close a sale?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Content Generation Volume&lt;/strong&gt;: For content- or code-generation products, output volume maps directly to revenue&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost-Benefit:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Revenue per Dollar Spent on LLMs&lt;/strong&gt;: Your AI P&amp;amp;L&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost per Value Delivered&lt;/strong&gt;: What are the unit economics?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-you-can-instrument-today-vs-whats-hard"&gt;What You Can Instrument Today vs. What&amp;rsquo;s Hard&lt;/h2&gt;
&lt;h3 id="easy-wins-implement-these-first"&gt;Easy Wins (Implement These First)&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;TTFT and Tokens/s&lt;/strong&gt;: With a streaming response you can timestamp the first and last tokens yourself. Log both.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost Tracking&lt;/strong&gt;: Token counts × pricing. Track by user, by feature, by model.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Version &amp;amp; Parameters&lt;/strong&gt;: Log which model you called and with what settings.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User Feedback&lt;/strong&gt;: Add thumbs up/down buttons. You&amp;rsquo;d be shocked how few products do this.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="medium-difficulty"&gt;Medium Difficulty&lt;/h3&gt;
&lt;ol start="5"&gt;
&lt;li&gt;&lt;strong&gt;Prompt Versioning&lt;/strong&gt;: Treat prompts like code. Version them, deploy them, track which version served each request.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RAG Observability&lt;/strong&gt;: If you&amp;rsquo;re doing retrieval, log what you retrieved, how relevant it was, and whether it made it into the final response.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Trace Context&lt;/strong&gt;: Use OpenTelemetry to connect your LLM call to the upstream request. When a user complains, you can trace back through your entire stack.&lt;/li&gt;
&lt;/ol&gt;
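&lt;p&gt;One way to make prompt versions trackable, sketched under the assumption that a content hash of the template is an acceptable version id. The function names, the example template, and the &lt;code&gt;example-model&lt;/code&gt; and &lt;code&gt;req-123&lt;/code&gt; values are all hypothetical:&lt;/p&gt;

```python
import hashlib
import json


def prompt_version(template):
    """Derive a short, stable version id from the template text."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def llm_call_record(template, model, request_id):
    """Build a structured log record tying one request to its prompt version."""
    return {
        "request_id": request_id,
        "model": model,
        "prompt_version": prompt_version(template),
    }


# Hypothetical template and identifiers, for illustration only.
record = llm_call_record(
    "Summarize the following text:\n{document}", "example-model", "req-123"
)
print(json.dumps(record, sort_keys=True))
```

&lt;p&gt;Hashing means any edit to the template yields a new id without anyone remembering to bump a number. Teams that prefer readable versions can store an explicit label next to the hash instead.&lt;/p&gt;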
&lt;h3 id="hard-problems"&gt;Hard Problems&lt;/h3&gt;
&lt;ol start="8"&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Hallucination Detection&lt;/strong&gt;: There&amp;rsquo;s no silver bullet. You need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fact-checking against known sources (expensive)&lt;/li&gt;
&lt;li&gt;Consistency checks across multiple generations (slow)&lt;/li&gt;
&lt;li&gt;Human labeling of a sample (doesn&amp;rsquo;t scale)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Semantic Quality&lt;/strong&gt;: &amp;ldquo;Is this response helpful?&amp;rdquo; is subjective. You&amp;rsquo;ll need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLM-as-judge (use a second model to grade the first; yes, really)&lt;/li&gt;
&lt;li&gt;Human eval loops (label a sample, train a classifier)&lt;/li&gt;
&lt;li&gt;User behavior proxies (did they edit it? regenerate? abandon?)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="openllmetry-the-standard-thats-emerging"&gt;OpenLLMetry: The Standard That&amp;rsquo;s Emerging&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/traceloop/openllmetry" target="_blank" rel="noopener"&gt;OpenLLMetry&lt;/a&gt; is to LLM observability what OpenTelemetry is to traditional observability: a vendor-neutral instrumentation standard.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s built on top of OpenTelemetry and adds semantic conventions for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLM provider and model name/version&lt;/li&gt;
&lt;li&gt;Token counts (prompt, completion, and cached)&lt;/li&gt;
&lt;li&gt;Cost&lt;/li&gt;
&lt;li&gt;Prompt and completion payloads (with PII redaction)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Worth knowing where this is heading: OpenLLMetry pioneered these conventions, and the OpenTelemetry GenAI working group is now standardizing the same ground upstream as &lt;code&gt;gen_ai.*&lt;/code&gt; semantic conventions. Instrumenting on OTEL today keeps you aligned with where the ecosystem is converging, not locked to one vendor&amp;rsquo;s schema.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re starting from scratch, &lt;strong&gt;use OpenLLMetry&lt;/strong&gt;. It gives you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Vendor portability (swap observability backends without reinstrumenting)&lt;/li&gt;
&lt;li&gt;Ecosystem compatibility (works with Datadog, Honeycomb, Grafana, etc.)&lt;/li&gt;
&lt;li&gt;Future-proofing (as the space matures, tooling will standardize on OTEL)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="real-world-implementation-what-i-built-at-fortune-10-scale"&gt;Real-World Implementation: What I Built at Fortune 10 Scale&lt;/h2&gt;
&lt;p&gt;I can&amp;rsquo;t share specifics, but the architecture was:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;OpenLLMetry SDK&lt;/strong&gt; wrapping LLM provider calls&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OTEL Collector&lt;/strong&gt; for aggregation, sampling, and enrichment&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Trace Backend&lt;/strong&gt; (vendor withheld) for storage and visualization&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Custom Dashboards&lt;/strong&gt; showing:
&lt;ul&gt;
&lt;li&gt;TTFT P50/P95/P99 by model and feature&lt;/li&gt;
&lt;li&gt;Cost burn rate and projections&lt;/li&gt;
&lt;li&gt;Token usage patterns (prompt size vs. output size)&lt;/li&gt;
&lt;li&gt;Model version distribution&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Alerting&lt;/strong&gt; on:
&lt;ul&gt;
&lt;li&gt;TTFT degradation (user experience)&lt;/li&gt;
&lt;li&gt;Cost spikes (budget protection)&lt;/li&gt;
&lt;li&gt;Error rate increases (availability)&lt;/li&gt;
&lt;li&gt;Rate limit hits (capacity planning)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Result: &lt;strong&gt;Engineering teams could diagnose production AI issues in minutes instead of days&lt;/strong&gt;, and we caught a $50K/month cost leak from a prompt that was including full document context on every call.&lt;/p&gt;
&lt;h2 id="advice-for-teams-starting-out"&gt;Advice for Teams Starting Out&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Start simple:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Log TTFT and cost for every LLM call&lt;/li&gt;
&lt;li&gt;Add thumbs up/down feedback&lt;/li&gt;
&lt;li&gt;Set a budget alert&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Iterate toward quality:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Version your prompts&lt;/li&gt;
&lt;li&gt;A/B test prompt variations&lt;/li&gt;
&lt;li&gt;Sample and label responses for quality&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Invest in infra when it pays:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you&amp;rsquo;re spending $10K/month on LLMs, you can get by with manual tracking&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;re spending $100K/month, you need automated observability&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;re spending $1M/month, you need a dedicated AI observability platform&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-tooling-landscape"&gt;The Tooling Landscape&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Open Source:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OpenLLMetry&lt;/strong&gt;: OTEL-native instrumentation SDK&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Langfuse&lt;/strong&gt;: Tracing, evals, and prompt management (self-hostable)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Phoenix&lt;/strong&gt; (Arize): LLM tracing and evaluation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Commercial:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;LangSmith&lt;/strong&gt;: Tracing and eval platform from the LangChain team (works beyond LangChain)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Braintrust&lt;/strong&gt;: Eval-focused, strong for LLM-as-judge and regression testing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Honeycomb&lt;/strong&gt;: General-purpose tracing with LLM support&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Datadog&lt;/strong&gt;: APM expanding into LLM observability&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Arize AI&lt;/strong&gt;: Purpose-built for ML/LLM monitoring&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Helicone&lt;/strong&gt;: Cost tracking and prompt management&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Avoid building from scratch&lt;/strong&gt; unless you&amp;rsquo;re Netflix-scale. The tooling is maturing fast.&lt;/p&gt;
&lt;h2 id="bottom-line"&gt;Bottom Line&lt;/h2&gt;
&lt;p&gt;GenAI observability is how you keep AI products from bleeding money or delivering garbage. Start with infrastructure metrics, add quality signals, then tie it to business impact.&lt;/p&gt;
&lt;p&gt;And for the love of all that&amp;rsquo;s holy, &lt;strong&gt;instrument your prompts&lt;/strong&gt;. If you don&amp;rsquo;t know which prompt template generated a bad response, you can&amp;rsquo;t fix it.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Building an LLM-powered product and not sure what to instrument?&lt;/strong&gt; I&amp;rsquo;ve done this at Fortune 10 scale and scrappy startup scale. &lt;a href="https://enduranceconsulting.com/#contact"&gt;Let&amp;rsquo;s talk&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>OpenTelemetry Adoption: Why Most Organizations Get It Wrong</title><link>https://enduranceconsulting.com/blog/opentelemetry-adoption-strategy/</link><pubDate>Wed, 15 Apr 2026 00:00:00 +0000</pubDate><guid>https://enduranceconsulting.com/blog/opentelemetry-adoption-strategy/</guid><description>&lt;p&gt;After leading observability rationalization projects at Fortune 500 companies and serving as the OpenTelemetry SME at Groundcover, I&amp;rsquo;ve watched dozens of organizations attempt OTEL adoption. Most start with the wrong assumptions.&lt;/p&gt;
&lt;h2 id="the-common-mistake-big-bang-migration"&gt;The Common Mistake: Big Bang Migration&lt;/h2&gt;
&lt;p&gt;Organizations treat OpenTelemetry adoption like a database migration: plan everything, execute once, celebrate. This fails because:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Legacy instrumentation has institutional knowledge baked in&lt;/strong&gt;: your existing dashboards and alerts encode years of production incidents&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vendor SDKs are wired into everything&lt;/strong&gt;: ripping them out breaks more than observability&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Teams resist learning curves&lt;/strong&gt;: especially when the old tools &amp;ldquo;work fine&amp;rdquo;&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="the-better-approach-strategic-layering"&gt;The Better Approach: Strategic Layering&lt;/h2&gt;
&lt;p&gt;Instead of replacing existing observability, &lt;strong&gt;add OpenTelemetry as a data layer&lt;/strong&gt;:&lt;/p&gt;
&lt;h3 id="phase-1-new-services-only-months-1-3"&gt;Phase 1: New Services Only (Months 1-3)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Mandate OTEL for all new microservices&lt;/li&gt;
&lt;li&gt;Create golden path instrumentation templates&lt;/li&gt;
&lt;li&gt;Export telemetry in parallel to both your OTEL collector and the legacy backend&lt;/li&gt;
&lt;li&gt;Build confidence without breaking production&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="phase-2-edge-and-mesh-instrumentation-months-3-6"&gt;Phase 2: Edge and Mesh Instrumentation (Months 3-6)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Instrument API gateways, load balancers, and (if you run one) the service mesh with OTEL&lt;/li&gt;
&lt;li&gt;Capture entry-point and hop-level spans without touching application code&lt;/li&gt;
&lt;li&gt;Push W3C trace-context propagation inward so those spans start linking into end-to-end traces, instead of staying isolated at the edge&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="phase-3-high-value-migrations-months-6-12"&gt;Phase 3: High-Value Migrations (Months 6-12)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Migrate services where legacy observability is painful&lt;/li&gt;
&lt;li&gt;Target the services that deploy often, break often, and have the worst instrumentation&lt;/li&gt;
&lt;li&gt;Show ROI: faster troubleshooting, lower vendor costs&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="phase-4-observability-pipeline-months-12"&gt;Phase 4: Observability Pipeline (Months 12+)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Introduce OTEL collector pipelines for sampling, enrichment, routing&lt;/li&gt;
&lt;li&gt;Route less critical data to cheaper backends&lt;/li&gt;
&lt;li&gt;Keep high-value data in premium tools&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="why-this-works"&gt;Why This Works&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;It separates data collection from data destinations.&lt;/strong&gt; Your teams adopt OTEL instrumentation without also migrating dashboards, alerts, and runbooks. You de-risk the transition.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;It proves value early.&lt;/strong&gt; By month 4 you have distributed traces across your instrumented paths and edge: request flows most orgs can&amp;rsquo;t see today, and you got there without migrating a single dashboard or alert.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;It strengthens your hand at renewal.&lt;/strong&gt; When your observability vendor sees OpenTelemetry collectors in production, renewal conversations get interesting. You&amp;rsquo;re no longer locked in.&lt;/p&gt;
&lt;h2 id="the-build-vs-buy-decision"&gt;The Build vs. Buy Decision&lt;/h2&gt;
&lt;p&gt;The dirty secret is that &lt;strong&gt;OpenTelemetry is infrastructure, not a product&lt;/strong&gt;. You still need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Trace storage and query engines&lt;/li&gt;
&lt;li&gt;Dashboarding and visualization&lt;/li&gt;
&lt;li&gt;Alerting and incident management&lt;/li&gt;
&lt;li&gt;SLO tracking and error budgets&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some organizations build these (Uber, Netflix). Most should buy (Honeycomb, Grafana, Datadog&amp;rsquo;s OTEL support).&lt;/p&gt;
&lt;p&gt;The win isn&amp;rsquo;t avoiding vendors. It&amp;rsquo;s &lt;strong&gt;avoiding vendor lock-in&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="real-world-outcomes"&gt;Real-World Outcomes&lt;/h2&gt;
&lt;p&gt;At a Fortune 500 semiconductor manufacturer, we:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reduced observability spend by 40% over 18 months&lt;/li&gt;
&lt;li&gt;Cut MTTR by 25% through distributed tracing&lt;/li&gt;
&lt;li&gt;Migrated 60% of critical services without a single SEV-1 incident&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At a major financial services firm:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Consolidated 9 observability vendors down to 3&lt;/li&gt;
&lt;li&gt;Improved signal-to-noise ratio (alert fatigue dropped by 50%)&lt;/li&gt;
&lt;li&gt;Gave teams flexibility to choose backends based on use case&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="where-organizations-need-help"&gt;Where Organizations Need Help&lt;/h2&gt;
&lt;p&gt;Most engineering teams can instrument a service with OTEL. What&amp;rsquo;s hard:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Sampling strategy&lt;/strong&gt;: head-based sampling is cheap but blind; tail-based sampling keeps the errored and slow traces that matter, but forces you to buffer every span until the trace completes. Tuning that tradeoff at scale is the hard part.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Collector pipelines&lt;/strong&gt;: enrichment, routing, backpressure handling, failover&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Schema governance&lt;/strong&gt;: preventing instrumentation drift across 200 microservices&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Organizational change&lt;/strong&gt;: getting SREs to trust new tooling&lt;/li&gt;
&lt;/ol&gt;
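&lt;p&gt;To make the sampling tradeoff concrete, here is a toy tail-based sampler. It is a sketch, not how any particular collector implements it: it assumes the caller knows when a trace is finished, whereas a real pipeline usually has to decide after a wait window. The &lt;code&gt;TailSampler&lt;/code&gt; class and its threshold are hypothetical:&lt;/p&gt;

```python
from collections import defaultdict


class TailSampler:
    """Buffer spans per trace and decide keep or drop once the trace is done."""

    def __init__(self, slow_threshold_ms):
        self.slow_threshold_ms = slow_threshold_ms
        self._buffer = defaultdict(list)  # trace_id mapped to buffered spans

    def add_span(self, trace_id, duration_ms, is_error=False):
        # Every span is held in memory until the decision is made;
        # this buffering is the cost of tail-based sampling.
        self._buffer[trace_id].append((duration_ms, is_error))

    def finish_trace(self, trace_id):
        """Return the buffered spans if the trace is worth keeping, else []."""
        spans = self._buffer.pop(trace_id, [])
        has_error = any(is_error for _, is_error in spans)
        is_slow = any(
            duration_ms >= self.slow_threshold_ms for duration_ms, _ in spans
        )
        return spans if has_error or is_slow else []
```

&lt;p&gt;A head-based sampler would make the same keep-or-drop call when the first span arrives, before it can know whether the trace will error or run slow. That is why it is cheap but blind.&lt;/p&gt;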
&lt;p&gt;This is where fractional leadership pays off. You need someone who&amp;rsquo;s done this before, can architect the strategy, and doesn&amp;rsquo;t need 6 months to understand your infrastructure.&lt;/p&gt;
&lt;h2 id="bottom-line"&gt;Bottom Line&lt;/h2&gt;
&lt;p&gt;OpenTelemetry adoption is an organizational transformation disguised as a technical problem. Treat it like one.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Need help with your observability strategy?&lt;/strong&gt; I&amp;rsquo;ve led these transformations at enterprise scale. &lt;a href="https://enduranceconsulting.com/#contact"&gt;Let&amp;rsquo;s talk&lt;/a&gt;.&lt;/p&gt;</description></item></channel></rss>