.NET Hooking Series: 7. Real-World harmony Cases in Advanced Debugging

2025-05-25 08:42

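If you have read the first six posts, basic harmony usage should no longer be a problem for you. You are now holding a hammer, and everything looks like a nail, so in this post I picked three classic nails from advanced debugging for you to hit.

1. The oversized ConcurrentBag problem

A problem that comes up again and again in advanced debugging is runaway managed memory, where you eventually find one enormous collection on the managed heap. The windbg output looks like this: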
0:014> !gcroot 028266c9ff30
HandleTable:
    0000028262d51328 (strong handle)
    -> 0282675459a0 System.Object
    -> 0282675459c8 System.Threading.ThreadLocal+WorkStealingQueue>+LinkedSlotVolatile
    -> 028267545a00 System.Threading.ThreadLocal+WorkStealingQueue>+LinkedSlot
    -> 028267545a30 System.Collections.Concurrent.ConcurrentBag+WorkStealingQueue
    -> 028267fe0198 Example_20_1_1.Student
    -> 028266c9ff30 Example_20_1_1.Student

0:014> !dumpobj /d 28267545a30
Name:        System.Collections.Concurrent.ConcurrentBag`1+WorkStealingQueue[[Example_20_1_1.Student, Example_20_1_1]]
File:        C:\Program Files\dotnet\shared\Microsoft.NETCore.App\8.0.13\System.Collections.Concurrent.dll
Fields:
              MT    Field   Offset            Type VT     Attr            Value Name
00007fff8fc31188  4000017       18    System.Int32  1 instance                0 _headIndex
00007fff8fc31188  4000018       1c    System.Int32  1 instance          1000000 _tailIndex
00007fff8fe76168  4000019        8  System.__Canon  0 instance 0000028267fe0198 _array
...

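From the windbg output you can see that the ConcurrentBag holds 1,000,000 records (_headIndex is 0 and _tailIndex is 1000000). What I really want to know now is which variable this ConcurrentBag belongs to, and who keeps calling Add on it. This is exactly where harmony shines. Because generic arguments that are reference types are all represented by __Canon, I patch the instantiation over their base class, object. Reference code: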
using System;
using System.Collections.Concurrent;
using HarmonyLib;

namespace Example_20_1_1
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Apply every [HarmonyPatch] class found in this assembly.
            var harmony = new Harmony("com.example.threadhook");
            harmony.PatchAll();

            RunWork();

            Console.ReadLine();
        }

        static void RunWork()
        {
            ConcurrentBag<Student> studentBags = new ConcurrentBag<Student>();

            studentBags.Add(new Student { Id = 1 });
            studentBags.Add(new Student { Id = 2 });

            ConcurrentBag<Person> personBags = new ConcurrentBag<Person>();
            personBags.Add(new Person { Id = 1 });
        }
    }

    // Patch the object instantiation; reference-type instantiations share the __Canon code.
    [HarmonyPatch(typeof(ConcurrentBag<object>), "Add", new Type[] { typeof(object) })]
    public class ConcurrentBagHook
    {
        public static void Prefix(object __instance) { }

        // __instance is the bag being modified, __0 is the item passed to Add.
        public static void Postfix(object __instance, object __0)
        {
            var count = Traverse.Create(__instance).Property("Count").GetValue();
            Console.WriteLine($"Generic argument: {__0.GetType()}, current Count={count}");
            Console.WriteLine(Environment.StackTrace);
        }
    }

    public class Student { public int Id { get; set; } }

    public class Person { public int Id { get; set; } }
}
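
In a dump like the one above, Add has been called a million times, and printing a full stack trace on every call would drown the log. The following is a minimal sketch of a sampling helper, not part of the original code: `AddSampler`, `ShouldLog` and the step value are hypothetical names and numbers chosen for illustration. It uses only `Interlocked`, so it takes no lock of its own.

```csharp
using System;
using System.Threading;

// Hypothetical helper: lets a hook log only the 1st call and every step-th call.
public static class AddSampler
{
    private static long _calls;

    // Returns true on the first call and then on every step-th call.
    public static bool ShouldLog(int step)
    {
        long n = Interlocked.Increment(ref _calls);
        return n == 1 || n % step == 0;
    }

    // Resets the counter, e.g. between test runs.
    public static void Reset() => Interlocked.Exchange(ref _calls, 0);
}

public static class AddSamplerDemo
{
    public static void Main()
    {
        int logged = 0;
        for (int i = 0; i < 1_000_000; i++)
        {
            if (AddSampler.ShouldLog(100_000)) logged++;
        }
        Console.WriteLine(logged); // 11: call 1 plus calls 100000, 200000, ..., 1000000
    }
}
```

Under these assumptions, `ConcurrentBagHook.Postfix` could start with `if (!AddSampler.ShouldLog(100_000)) return;` so that only a handful of stack traces are written while the bag grows.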

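From the program's output you can see the element count of each ConcurrentBag instantiation together with the call stack that led to the Add. With the call stack in hand the culprit is easy to find, even when it lives inside a third-party SDK.

2. UI control created on a non-main thread causes a hang

This is a classic problem in WPF/WinForms, and it cannot be covered too often. Whenever it shows up, the call stack looks like this: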
0:000:x86> !clrstack
OS Thread Id: 0x4eb688 (0)
Child SP       IP Call Site
002fed38 0000002b [HelperMethodFrame_1OBJ: 002fed38] System.Threading.WaitHandle.WaitOneNative(System.Runtime.InteropServices.SafeHandle, UInt32, Boolean, Boolean)
002fee1c 5cddad21 System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle, Int64, Boolean, Boolean)
002fee34 5cddace8 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
002fee48 538d876c System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
002fee88 53c5214a System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object, Boolean)
002fee8c 538dab4b [InlinedCallFrame: 002fee8c]
002fef14 538dab4b System.Windows.Forms.Control.Invoke(System.Delegate, System.Object)
002fef48 53b03bc6 System.Windows.Forms.WindowsFormsSynchronizationContext.Send(System.Threading.SendOrPostCallback, System.Object)
002fef60 5c774708 Microsoft.Win32.SystemEvents+SystemEventInvokeInfo.Invoke(Boolean, System.Object)
002fef94 5c6616ec Microsoft.Win32.SystemEvents.RaiseEvent(Boolean, System.Object, System.Object)
002fefe8 5c660cd4 Microsoft.Win32.SystemEvents.OnUserPreferenceChanged(Int32, IntPtr, IntPtr)
002ff008 5c882c98 Microsoft.Win32.SystemEvents.WindowProc(IntPtr, Int32, IntPtr, IntPtr)
...

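I explained the underlying mechanism in detail in an earlier post, so I will not repeat it here. For this case I win as soon as I can trace the control that should never have been born, namely MarshalingControl, a nested class inside Application. Reference code: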
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Reflection;
using System.Threading;
using System.Windows.Forms;
using HarmonyLib;

namespace WindowsFormsApp1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();

            var harmony = new Harmony("com.example.marshalingcontrolhook");
            harmony.PatchAll();
        }

        private void Form1_Load(object sender, EventArgs e) { }

        private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
        {
            // Creating a control and forcing its handle on a worker thread reproduces the problem.
            Button btn = new Button();
            var query = btn.Handle;
        }

        private void button1_Click(object sender, EventArgs e)
        {
            backgroundWorker1.RunWorkerAsync();
        }
    }

    [HarmonyPatch]
    public class MarshalingControlHook
    {
        // Application.MarshalingControl is not public, so resolve its constructor by name.
        [HarmonyTargetMethod]
        static MethodBase TargetMethod()
        {
            var methodInfo = AccessTools.Inner(typeof(Application), "MarshalingControl").Constructor();

            return methodInfo;
        }

        public static void Prefix()
        {
            Debug.WriteLine("");
            Debug.WriteLine($"Control created on thread: {Thread.CurrentThread.ManagedThreadId}");
            Debug.WriteLine(Environment.StackTrace);
        }
    }
}
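
The UI thread may legitimately create a MarshalingControl of its own, so the interesting hits are the ones that come from any other thread. As a rough sketch of how the Prefix could filter for those, the helper below remembers the UI thread's managed id and reports whether the current thread differs from it. `UiThreadGuard`, `CaptureUiThread` and `IsOffUiThread` are hypothetical names, and the sketch assumes that `CaptureUiThread` is called once from the UI thread (for example at the top of Main) before any patch fires.

```csharp
using System;
using System.Threading;

// Hypothetical helper: remembers the UI thread so a hook can ignore calls made on it.
public static class UiThreadGuard
{
    private static int _uiThreadId = -1;

    // Assumption: called exactly once, on the UI thread, during startup.
    public static void CaptureUiThread() =>
        Volatile.Write(ref _uiThreadId, Environment.CurrentManagedThreadId);

    // True when the calling thread is not the captured UI thread.
    public static bool IsOffUiThread() =>
        Environment.CurrentManagedThreadId != Volatile.Read(ref _uiThreadId);
}

public static class UiThreadGuardDemo
{
    public static void Main()
    {
        UiThreadGuard.CaptureUiThread();
        Console.WriteLine(UiThreadGuard.IsOffUiThread()); // False

        bool offThread = false;
        var worker = new Thread(() => offThread = UiThreadGuard.IsOffUiThread());
        worker.Start();
        worker.Join();
        Console.WriteLine(offThread); // True
    }
}
```

With this in place, the Prefix could begin with `if (!UiThreadGuard.IsOffUiThread()) return;` so that only controls created off the UI thread are logged.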

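From the output it is easy to see that the MarshalingControl was created by the user code in backgroundWorker1_DoWork, and the mystery is solved.

3. The orphaned lock problem

Most people instinctively assume that a lock that is entered is always exited, but in real systems a lock can be entered and never released. When does that happen? When the lock body calls into unmanaged code. If the unmanaged code unexpectedly terminates the current thread, you get the classic orphaned lock. Reference code: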
using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;
using HarmonyLib;

internal class Program
{
    [DllImport("Example_20_1_5", CallingConvention = CallingConvention.Cdecl)]
    public extern static void dowork();

    public static object lockMe = new object();

    static void Main(string[] args)
    {
        var harmony = new Harmony("com.example.monitorhook");
        harmony.PatchAll();

        for (int i = 0; i < 3; i++)
        {
            Task.Run(() =>
            {
                lock (lockMe)
                {
                    Console.WriteLine("1. Calling C++ code...");
                    dowork(); // the native code kills this thread while the lock is held
                    Console.WriteLine("2. C++ code finished...");
                }
            });
        }

        Console.ReadLine();
    }
}

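The dowork function in the code is implemented in C++ and exported with C linkage:

extern "C"
{
    _declspec(dllexport) void dowork();
}

#include <iostream>
#include <Windows.h>

using namespace std;

void dowork()
{
    // Terminates the calling thread immediately; the managed lock is never released.
    ExitThread(0);
}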

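After starting the program you will find that, in !syncblk, the thread that owns the object's lock has gone missing. Once it is lost, the object's header stays polluted, the other threads wait forever for the owner to release the lock, and the program ends up hung. The output looks like this: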
0:008> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
    1 02D562D4            5         1 02D4D400 0 XXX   0504c1b0 System.Object

Total           2
CCW             0
RCW             0
ComClassFactory 0
Free            0
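
Before attaching a debugger, a lock that is never released can sometimes be noticed from inside the process by probing it with a timeout. The sketch below is only an illustration of that idea using `Monitor.TryEnter`; it is not the approach used in this post, `LockProbe` and `CanAcquire` are hypothetical names, and the holder thread here stays alive and blocks instead of being killed by native code.

```csharp
using System;
using System.Threading;

// Hypothetical helper: checks whether a lock can be taken within a timeout.
public static class LockProbe
{
    // Returns true if the lock was acquired (and immediately released) in time.
    public static bool CanAcquire(object gate, TimeSpan timeout)
    {
        bool taken = false;
        try
        {
            Monitor.TryEnter(gate, timeout, ref taken);
            return taken;
        }
        finally
        {
            if (taken) Monitor.Exit(gate);
        }
    }
}

public static class LockProbeDemo
{
    public static void Main()
    {
        object gate = new object();
        using var held = new ManualResetEventSlim(false);
        using var release = new ManualResetEventSlim(false);

        // The holder takes the lock and keeps it until it is told to let go.
        var holder = new Thread(() =>
        {
            lock (gate)
            {
                held.Set();
                release.Wait();
            }
        });
        holder.IsBackground = true;
        holder.Start();
        held.Wait();

        Console.WriteLine(LockProbe.CanAcquire(gate, TimeSpan.FromMilliseconds(200))); // False
        release.Set();
        holder.Join();
        Console.WriteLine(LockProbe.CanAcquire(gate, TimeSpan.FromMilliseconds(200))); // True
    }
}
```

A probe like this only tells you that the lock is stuck. It does not tell you who took it, which is the question the rest of this section answers.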

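The XXX in the !syncblk output stands for the lost owning thread. The next question is which thread took the lock and then exited unexpectedly. This is another strength of harmony: we monitor Monitor.Enter, which is what lock compiles down to, and use the memory address of the locked object to see who called it. The complete modified code is as follows: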
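using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;
using HarmonyLib;

internal class Program
{
    [DllImport("Example_20_1_5", CallingConvention = CallingConvention.Cdecl)]
    public extern static void dowork();

    public static object lockMe = new object();

    static void Main(string[] args)
    {
        var harmony = new Harmony("com.example.monitorhook");
        harmony.PatchAll();

        for (int i = 0; i < 3; i++)
        {
            Task.Run(() =>
            {
                lock (lockMe)
                {
                    Console.WriteLine("1. Calling C++ code...");
                    dowork();
                    Console.WriteLine("2. C++ code finished...");
                }
            });
        }

        Console.ReadLine();
    }
}

[HarmonyPatch]
public class MonitorHook
{
    // lock (x) { ... } compiles to Monitor.Enter(object, ref bool).
    [HarmonyTargetMethod]
    static MethodBase TargetMethod()
    {
        var enterMethodInfo = AccessTools.Method(typeof(Monitor), "Enter", new Type[] { typeof(object), typeof(bool).MakeByRefType() });

        return enterMethodInfo;
    }

    // Requires unsafe code to be enabled for the project.
    public static unsafe void Postfix(object obj)
    {
        // Read the object reference itself so it can be matched against !syncblk.
        void** ptr = (void**)Unsafe.AsPointer(ref obj);

        // Note: do not call anything that takes a lock internally here, or the hook will
        // re-enter itself endlessly. Writing the output from C++ is recommended.
        Debug.WriteLine("");
        Debug.WriteLine($"Object reference address: 0x{(long)(*ptr):X8}, tid={Thread.CurrentThread.ManagedThreadId}, call stack:\n{Environment.StackTrace}");
    }
}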
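
The warning in the hook's comment is about re-entrancy: if the logging code itself ends up in Monitor.Enter, the Postfix triggers itself again. Besides writing the output from native code, one pattern often used in hooks is a per-thread re-entrancy flag. The sketch below is an assumption on my part and not something taken from the code above: `HookGuard` and `TryRun` are hypothetical names, and `FakePostfix` merely simulates a hook whose body calls back into the hooked method. Whether such a guard is enough for a Monitor.Enter patch depends on what the logging call does internally, so treat it as a starting point to verify.

```csharp
using System;

// Hypothetical helper: skips the body when the current thread is already inside it.
public static class HookGuard
{
    [ThreadStatic] private static bool _inside;

    // Runs body and returns true, or returns false if this thread is re-entering.
    public static bool TryRun(Action body)
    {
        if (_inside) return false;
        _inside = true;
        try
        {
            body();
            return true;
        }
        finally
        {
            _inside = false;
        }
    }
}

public static class HookGuardDemo
{
    private static int _depth;

    // Stand-in for a Postfix whose logging re-triggers the hooked method.
    private static void FakePostfix()
    {
        HookGuard.TryRun(() =>
        {
            _depth++;
            FakePostfix(); // the nested call is skipped by the guard
        });
    }

    public static void Main()
    {
        FakePostfix();
        Console.WriteLine(_depth); // 1
    }
}
```

Because the flag is `[ThreadStatic]`, each thread tracks its own state and the guard needs no lock.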

After running the program, compare the Output window with the windbg output:

Object reference address: 0x057CCFD8, tid=4, call stack:
   at System.Environment.get_StackTrace()
   at Example_20_1_1.MonitorHook.Postfix(Object obj) in D:\skyfly\20.20250116\src\Example\Example_20_1_1\Program.cs:line 61
   at System.Threading.Monitor.Enter_Patch1(Object obj, Boolean& lockTaken)
   at Example_20_1_1.Program.<>c.<Main>b__2_0() in D:\skyfly\20.2025011631
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__281_0(Object obj)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
   at System.Threading.Tasks.Task.ExecuteEntryUnsafe(Thread threadPoolThread)
   at System.Threading.Tasks.Task.ExecuteFromThreadPool(Thread threadPoolThread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()

0:008> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
    5 0AE90184            5         1 035EC578 0 XXX   057ccfd8 System.Object

Total           6
CCW             0
RCW             0
ComClassFactory 0
Free            0

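The address printed by the hook (0x057CCFD8) matches the owner column in !syncblk, so the call stack tells us that the object at 057ccfd8 was locked from the b__2_0 method. In a real application there may be several such call sites, but by this point the search has been narrowed down as far as it can go.

One more caveat: I used Debug.WriteLine instead of Console.WriteLine because the latter takes a lock internally, and using it here would immediately loop forever. I recommend writing a C export function to produce the output.

The three cases in this post are true classics in .NET advanced debugging. Used appropriately, these techniques should get you to the root of stubborn problems with far less effort.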

Source: opendotnet
