GAC Provider

Topics: Developer Forum, Project Management Forum
Developer
Feb 15, 2007 at 10:48 PM
I have one more thing to do which shouldn't take any more than an hour or two, but I've got a fully functioning GAC provider working now.

It initializes three drives - gac: , download: and ngen:

Wildcards, tab-completion etc. all work. The base path unit is a simple name like "system.web" or "system*".

I had tagged this for 1.2, but it looks like it could go into 1.1 now.

Thoughts?
Developer
Feb 15, 2007 at 10:50 PM
btw, it's checked in so you can test it now. Jachym, I know you might get that refactor itch, but please resist for the moment ;)
Developer
Feb 16, 2007 at 4:30 AM
ok, i'll try :) it's not easy, you know ;))

btw this assumption is not true:

        // NOTE: only one provider will ever be using
        // this class simultaneously.
        internal static Action<string> Progress = delegate { };
there can be many runspaces, which means many provider instances in a single app domain, accessing your static variable without any synchronization.
i also think you better not leak the AssemblyCacheProvider instance(s) from Gen 0. the provider objects are stateless and are created very often. therefore, there shouldn't be any GC roots pointing to them. it's not such a problem by itself, it's just a coding practice which could lead to an ugly memory leak later.
Developer
Feb 16, 2007 at 4:33 AM
also, you shouldn't be calling if(Stopping) { StopProcessing(); }. PowerShell will call it for you. You should just stop whatever you are doing.
Developer
Feb 16, 2007 at 4:44 AM
I've already merged this into 1.1 now ; I'll work on the fix for synchronization tomorrow. I've also merged in a changeset that lets Resolve-Assembly deal with ICollection<AssemblyName> objects from the provider (trust me - there's a good reason for this).

WRT the Stopping code, AFAICT it is needed. Look at the native FileSystem provider with Reflector - their code is full of these checks. The point is that ctrl+c won't work if the provider is busy inside a loop, which can happen with a dir gac:\ call, for example, which takes a few seconds to complete.
Developer
Feb 16, 2007 at 5:10 AM
you need to exit the loop, but you should not call StopProcessing. I believe it will be called by powershell when you return from the currently executing method.

WRT the synchronization: I don't think a lock() around the static variable is a good idea. I have the same issue with the DirectoryServices provider. I'll check in a PscxProviderContext<T> class, which will be a simple wrapper around a thread-static variable. You will only wrap your GetChildItems/GetChildNames methods in a using(PscxProviderContext<AssemblyCacheProvider>.Enter(this)) call, and you can safely access the current provider object from other classes. Without any locking!
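A minimal sketch of the thread-static wrapper idea described above. This is not the class that was checked in: the name ProviderContextSketch<T> and its members are hypothetical, and the only assumption carried over from the post is that Enter(this) publishes the instance for the current thread and that disposing the result ends the scope.

```csharp
using System;

// Hypothetical sketch, not the actual PSCX class: a per-thread "current
// instance" holder that other classes can read without taking a lock.
public static class ProviderContextSketch<T> where T : class
{
    // [ThreadStatic] gives every thread its own slot for this field.
    [ThreadStatic]
    private static T current;

    public static T Current
    {
        get { return current; }
    }

    // Publishes 'instance' for the calling thread until the returned scope is disposed.
    public static IDisposable Enter(T instance)
    {
        if (instance == null) throw new ArgumentNullException("instance");
        T previous = current;
        current = instance;
        return new Scope(previous);
    }

    private sealed class Scope : IDisposable
    {
        private readonly T previous;
        private bool disposed;

        public Scope(T previous)
        {
            this.previous = previous;
        }

        public void Dispose()
        {
            if (disposed) return;
            disposed = true;
            // Restore the earlier value so nested scopes unwind correctly and
            // the static slot does not keep the instance alive after the call.
            current = previous;
        }
    }
}
```

Because the slot is cleared when the scope is disposed, this shape also avoids leaving a long-lived static reference to a short-lived provider object.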
Developer
Feb 16, 2007 at 5:19 AM
It's there (1.1\PscxCore\Providers). If you use it, merge it into the trunk; otherwise I'll do it tomorrow.
Developer
Feb 16, 2007 at 4:11 PM
ok, thanks man. I wasn't aware that the providers were loaded in and out a lot like that. I figured there was only one instance and everything would be serialized through that. I guess I'm thinking along "old-school shell" lines again... I must stop that. It'll get me in trouble :)

Since this is a multithreaded environment, with a maximum of one instance of provider per-thread, this makes me think I should probably mark the static delegate with ThreadStatic too.
Developer
Feb 16, 2007 at 6:03 PM
Edited Feb 16, 2007 at 6:06 PM
ok, having sat and thought a while about the PscxProviderContext pattern and drank a coffee, I see the light. Correct me if I'm wrong:

The entry points at risk are all methods called by the powershell engine, e.g. getitem, getchilditems, getchildnames etc -- as long as these methods access "this" through the wrapper, we're all good. Right? e.g.

 
protected override void GetItem(string path)
{
    using (PscxProviderContext<AssemblyCacheProvider>.Enter(this))
    {
        AssemblyCacheProvider self = PscxProviderContext<AssemblyCacheProvider>.Current;

        self.WriteDebug("GetItem: " + path);

        AssemblyCacheType root = self.GetRoot(path);
        AssemblyNameCache cache = self.ProviderInfo.GetCache(root);
        ...

I'm going to check this into the trunk now -- it'd be great if you could cast your eye over it and let me know of more things to do. I'd also really appreciate some help with how best to approach the path stuff, e.g. supporting provider qualified paths like dir AssemblyCache::gac\system.*

Since learning about the multiple instances of a single provider issue, I guess we'll have to come up with a better plan for the AssemblyNameCaches - currently they're members on the providerinfo which made sense only when there's a single provider in memory. Hmmm.

As always, your insights and help are greatly appreciated.
Developer
Feb 16, 2007 at 6:39 PM
Edited Feb 16, 2007 at 6:39 PM
This doesn't solve the gc root issue though, right? The static delegate will prevent collection of any providers, regardless of how thread safe they are. Nggghhhh.. I've not had to deal with threading issues in about 5 years since I last worked in C. Managed code makes you soft...

[C:\]
PS > [system.threading.Thread]::CurrentThread.ManagedThreadId
25
[C:\]
PS > [system.threading.Thread]::CurrentThread.ManagedThreadId
20

it's all so obvious in retrospect...
Developer
Feb 16, 2007 at 11:06 PM
Edited Feb 16, 2007 at 11:13 PM
there can be many runspaces, which means many provider instances in a single app domain, accessing your static variable without any synchronization

hmm, call me crazy but after spending some time with a memory profiler and windbg, I cannot find any cases where more than a single instance of the provider is instantiated in powershell. I created a new runspace/pipeline, added a "get-childitem" command with parameters ("path", "gac:") and invoked it. I did before and after snapshots with scitech memory profiler and there was still only one instance of the provider. I tried various combinations of pipes etc, etc.

Where exactly did you read that there can be multiple instances of a provider, and how exactly can I repro these conditions?
Developer
Feb 16, 2007 at 11:14 PM
ok, never mind -- I stressed it some more and managed to get 4 simultaneous instances in play... w0000

I'm sorry I doubted you.

hangs head in shame
Developer
Feb 17, 2007 at 4:55 AM
man, I'm having a great conversation here with myself.

Anyway, I'm refactoring this again because with multiple provider instances, the assemblyname cache needs a redesign.
Developer
Feb 17, 2007 at 9:02 PM
Edited Feb 17, 2007 at 9:03 PM
I am deeply sorry. I did it again :) Please get my "assembly cache provider refactoring" shelveset.

You obviously don't need to use the "self" thing inside the provider, as the context class returns the very same object you gave it one statement before (which is "this").
It also makes sense to share the assembly name cache between providers (runspaces), since it represents machine-wide state and therefore there's no point in caching it per-runspace.
I removed the Progress event and I'm using the provider instance directly. Also removed the WriteProgressDirect hack, since we have now the actual provider instance available.
I temporarily removed the cache refreshing on -Force, because we specify -Force in our dir function. We need to find another way of refreshing the cache.
I also removed the default download and ngen drives. The download name is very, very misleading. I also don't think these two drives add much value, since you can do nothing with their content.
Developer
Feb 17, 2007 at 11:14 PM
no problem man, I look forward to reading through it.

WRT the assemblynamecache being machine-wide state, that was always the intention. If you remember, I thought there was only one instance of the provider in memory at any time. Of course, when I saw this wasn't the case, the next step was to lock it down as a proper shared instance. I haven't examined your code yet (nor can I, as I'm not on my own machine) but I was going down the path of using a ReaderWriterLock to synchronize a cache singleton. Anyhow, I'll have a look tomorrow.

I'm not sure I understand the PscxProviderContext<T> pattern properly -- just looking at it, it looks like it provides a ThreadStatic reference to "this", but if I don't access self through the .Current property, how is access serialized? Is there somewhere I can read more about this pattern? I'm not the strongest MT programmer, but I'd be interested to learn more about this particular pattern.

Thanks!
Developer
Feb 17, 2007 at 11:17 PM
WRT the ngen/download psdrives, that makes sense. I still want to allow provider qualified access though, e.g.

dir assemblycache::ngen\* where ngen,zap and gac are roots.

Developer
Feb 17, 2007 at 11:57 PM
We don't need nor want to serialize the access to the provider object, since it is used only on the runspace thread. ThreadStatic attribute means the value is stored in a TLS slot. I'm afraid I don't know about any description. But it's pretty much the same as ASP.NET's HttpContext.Current and others.

I don't think a reader-writer lock is required. The Refresh method creates a new MultiDictionary, and assigns it to the shared variable when complete. The race window is only a few seconds long, and it's very unlikely you'd want to refresh the cache from two runspaces simultaneously. The worst that could happen is two threads concurrently enumerating the GAC, with all but the last result being discarded.
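The "build a new table, then assign it to the shared variable" approach can be sketched roughly like this. It is an illustration under assumptions, not the shelveset code: NameCacheSketch, Refresh's delegate parameter and Lookup are hypothetical names, and a Dictionary of lists stands in for MultiDictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch: readers always see a fully built table, and a refresh
// replaces the whole table with one reference assignment instead of locking.
public static class NameCacheSketch
{
    // Never mutated after it has been published.
    private static Dictionary<string, List<string>> table =
        new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);

    // 'enumerate' stands in for the slow GAC enumeration; it yields
    // (simple name, full name) pairs.
    public static void Refresh(Func<IEnumerable<KeyValuePair<string, string>>> enumerate)
    {
        var fresh = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        foreach (KeyValuePair<string, string> pair in enumerate())
        {
            List<string> bucket;
            if (!fresh.TryGetValue(pair.Key, out bucket))
            {
                bucket = new List<string>();
                fresh.Add(pair.Key, bucket);
            }
            bucket.Add(pair.Value);
        }
        // Reference assignment is atomic. If two refreshes race, the one that
        // finishes last wins and the other result is simply discarded.
        Volatile.Write(ref table, fresh);
    }

    public static IList<string> Lookup(string simpleName)
    {
        // Take one snapshot so a concurrent Refresh cannot change it mid-lookup.
        Dictionary<string, List<string>> snapshot = Volatile.Read(ref table);
        List<string> bucket;
        return snapshot.TryGetValue(simpleName, out bucket)
            ? (IList<string>)bucket.AsReadOnly()
            : new string[0];
    }
}
```

The trade-off is the one stated above: two simultaneous refreshes waste some work, but no reader ever observes a half-filled table.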
Developer
Feb 18, 2007 at 11:05 PM
aha, I understand now that I've read through the shelveset. :) That's a really nice way of sharing the context, never seen it before. Man, so much to learn. ;)

Developer
Feb 19, 2007 at 8:14 PM
Edited Feb 19, 2007 at 9:17 PM
BTW, When a GetItem call resolves to more than one item, you should not write out the items individually. The established pattern for providers is that you write out the Collection object in a single WriteItemObject. It's GetItem, not GetItems. Powershell will automagically enumerate the collection for you if you want to view it. GetItem output piped to another command should only cause a single ProcessRecord call in the receiving cmdlet.

With the recent refactoring, (gi gac:).PsIsContainer evaluates to False, which is incorrect. Using wildcards also results in duplicate items written to pipeline:

[Gac:\]
PS > dir system.?eb
 
Version        Name
-------        ----
2.0.0.0        System.Web
1.0.3300.0     System.Web
1.0.5000.0     System.Web
2.0.0.0        System.Web
1.0.3300.0     System.Web
1.0.5000.0     System.Web
2.0.0.0        System.Web
1.0.3300.0     System.Web
1.0.5000.0     System.Web

I'm fixing up this and will backport (eeeek) it to 1.1
Developer
Feb 19, 2007 at 8:28 PM
Edited Feb 19, 2007 at 8:33 PM
(pointless addendum deleted)
Developer
Feb 19, 2007 at 11:11 PM
are you sure about the WriteItemObject? The FileSystem and Registry providers are writing each item in a separate call:

foreach (FileSystemInfo info1 in list1)
{
    if (base.Stopping)
    {
          return;
    }
    if (((info1.Attributes & FileAttributes.Hidden) == ((FileAttributes) 0)) || ((bool) base.Force))
    {
          if (nameOnly)
          {
                base.WriteItemObject(info1.Name, info1.FullName, false);
          }
          else
          {
                if (info1 is FileInfo)
                {
                      base.WriteItemObject(info1, info1.FullName, false);
                      continue;
                }
                base.WriteItemObject(info1, info1.FullName, true);
          }
    }
}
Developer
Feb 19, 2007 at 11:48 PM
I'm not sure which Method you're dumping there, but I'm talking about GetItem -- here's the FileSystem provider's GetItem method:

protected override void GetItem(string path)
{
      using (IDisposable disposable1 = FileSystemProvider.tracer.TraceMethod(path, new object[0]))
      {
            bool flag1 = false;
            if (string.IsNullOrEmpty(path))
            {
                  throw FileSystemProvider.tracer.NewArgumentException("path");
            }
            try
            {
                  FileSystemInfo info1 = this.GetFileSystemItem(path, ref flag1, false);
                  if (info1 != null)
                  {
                        base.WriteItemObject(info1, info1.FullName, flag1); // <-- the single write
                  }
                  else
                  {
                        string text1 = ResourceManagerCache.FormatResourceString("FileSystemProviderStrings", "ItemNotFound", new object[] { path });
                        Exception exception1 = new IOException(text1);
                        base.WriteError(new ErrorRecord(exception1, "ItemNotFound", ErrorCategory.ObjectNotFound, path));
                  }
            }
            catch (IOException exception2)
            ...

I.e. it dumps a single item.

Anyway, you should really be looking at the SessionStateProviderBase (environment,variable,function) for an example as they are based on ContainerCmdletProvider, just like the GAC provider is. NavigableCmdletProviders (like filesystem/registry) contain subcontainers; the GAC does not. My model is based on exactly how SessionStateProviderBase does things.

Also, things are a little different too since the relationship between Path and AssemblyName is technically "one to many", but I must treat it as "one to one," hence a path maps to either a single assemblyname, or a single collection.

Have a good look through the environment provider, it seems to be the simplest ContainerCmdletProvider implementation.
Developer
Feb 20, 2007 at 12:10 AM
Aha! I was thinking more in the context of the directory services provider. However, it seems GetItem returns a collection when required, and GetChildItems/GetChildNames always return single items. This is the SessionStateProviderBase.GetChildItems:

IDictionary dictionary1 = null;
try
{
    dictionary1 = this.GetSessionStateTable();
}
catch { /***/ }
 
foreach (DictionaryEntry entry1 in dictionary1)
{
    try
    {
          base.WriteItemObject(entry1.Value, (string) entry1.Key, false);
          continue;
    }
    catch { /***/ }
}

Regarding the one-to-many issue: perhaps we should return the simple assembly names as containers. These containers would contain the actual AssemblyNames by version/culture/etc... Hmmm?
Coordinator
Feb 20, 2007 at 3:18 AM
Edited Feb 20, 2007 at 3:18 AM
Very nice work guys! To give output a bit more consistent with the fusion shell extension, what if we changed the formatting to be effectively the same as:

dir | select Version, KeyPair, @{e={$_.ProcessorArchitecture};n='Arch'}, Name | ft -a
 
Version       KeyPair Arch Name
-------       ------- ---- ----
2.0.0.0               MSIL Accessibility
7.0.3300.0            None ADODB
2.0.0.0               MSIL AspNetMMCExt
6.0.6000.0             X86 BDATunePIA
3.0.0.0               MSIL ComSvcConfig
8.0.0.0               MSIL CppCodeProvider
10.2.3600.0           MSIL CRVsPackageLib
10.2.3600.0           MSIL CrystalDecisions.CrystalReports.Design
Note: the KeyPair column doesn't seem to be grabbing the public key token.
Developer
Feb 20, 2007 at 3:38 AM
Thanks! Yeah, I like that, but we'll have to override another PowerShell built-in formatting. And first, we need to decide whether to go the name container way, or keep it as-is, with the quite unusual one-to-many behavior.
Coordinator
Feb 20, 2007 at 3:43 AM
So does the one-to-many issue mean that you are considering gac:\MSIL, gac:\x86, etc? If so, then this wouldn't be too much different than just cd'ing into $env:windir\Assembly? I kind of like having this provider mimic the Explorer view of the world. In this case though you have to consider the name, version, culture, keypair (architecture??) as contributing to the 'unique' name for an assembly.
Developer
Feb 20, 2007 at 3:57 AM
No, one path maps to many assembly names; for example

[13] » get-item gac:\System.Web | ft Version, @{ E={ $_.GetPublicKeyToken() }; L='PublicKey' }, @{ E={$_.ProcessorArchitecture}; L='Arch' }, Name -a
 
Version PublicKey                            Arch Name
------- ---------                            ---- ----
2.0.0.0 {176, 63, 95, 127, 17, 213, 10, 58} Amd64 System.Web
2.0.0.0 {176, 63, 95, 127, 17, 213, 10, 58}   X86 System.Web

which is weird; no other PowerShell provider acts this way. The proposed solution would be to return a container for each unique assembly name, each containing all assemblies which share that name but differ by version/architecture/culture...
Developer
Feb 20, 2007 at 4:05 AM
and to get the same view as in Explorer, you'd use get-childitem -recurse
Coordinator
Feb 20, 2007 at 4:19 AM
OK that makes sense since that is the fundamental problem the GAC was meant to solve - one name can map to multiple versions/keys without stomping. So get-item gac:\System.Web would effectively treat System.Web as a container/folder? What would the sub-containers look like: gac:\System.Xml\MSIL\2.0.0.0__b77a5c561934e089? Or what if the names were like: gac:\System.Xml#MSIL#2.0.0.0#b77a5c561934e089? That would kind of solve the one-to-many issue but then again the cure might be worse than the disease. :-) But I could do gci gac:\MSIL to get all MSIL assemblies.

BTW could you change the final PublicKey formatting to do like GetHashCode and display in a binhex fashion (and without the {}).
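The hex-style rendering asked for here could look something like the following. It is a small illustrative helper, not PSCX code; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper: renders a public key token as lowercase hex,
// with no braces and no separators.
public static class TokenFormatSketch
{
    public static string ToHex(byte[] token)
    {
        if (token == null || token.Length == 0) return string.Empty;
        var sb = new StringBuilder(token.Length * 2);
        foreach (byte b in token)
        {
            sb.Append(b.ToString("x2")); // two hex digits per byte
        }
        return sb.ToString();
    }
}
```

For the token shown in the get-item output above, {176, 63, 95, 127, 17, 213, 10, 58}, ToHex returns b03f5f7f11d50a3a.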
Developer
Feb 20, 2007 at 4:35 AM
Edited Feb 20, 2007 at 5:18 AM
ok, I have to reassert my "vision" here again lads... ;)

The one to many relationship should remain and the provider should stay a ContainerCmdletProvider -- I don't want to complicate this by adding subcontainers. If you want to look at that view, I think cd FileSystem::c:\windows\assembly will get you there ;-) I think the perfect mechanism here is dynamic parameters to switch the source, e.g.

a driveless path (e.g. provider qualified):

ps> gci -root gacngen assemblycache::system.*

or mapping a drive:

ps> new-psdrive assemblycache -root <gac|ngen|shadow> <drivename>

I don't really see a problem with the one to many relationship; providers are flexible enough to cover a vast range of backing stores and this is the model that works for the GAC. It has to be this way if the paths are to remain simple. You can't compare this to the SessionStateProvider exactly, because functions, env vars and variables have a one to one relationship between their paths and their values. I think that there's more benefit in having a path that is divorced from the version number. If you want to get a specific version, use where-object, or perhaps Filter capabilities should be added to the provider. Having to know specific version numbers of assemblies in order to enumerate them defeats the purpose.
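To make the one-to-many model concrete, here is a rough sketch of an index where a simple name is the path and each path holds several full names. All names here (OneToManyIndexSketch, ChildNames, ChildItems) are hypothetical, plain strings stand in for AssemblyName objects, and only the * and ? wildcards are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: one simple name (the path) maps to many full assembly names.
public sealed class OneToManyIndexSketch
{
    private readonly SortedDictionary<string, List<string>> byName =
        new SortedDictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);

    public void Add(string simpleName, string fullName)
    {
        List<string> bucket;
        if (!byName.TryGetValue(simpleName, out bucket))
        {
            bucket = new List<string>();
            byName.Add(simpleName, bucket);
        }
        bucket.Add(fullName);
    }

    // The "one" side: each matching simple name is returned exactly once.
    public IEnumerable<string> ChildNames(string wildcard)
    {
        Regex pattern = ToRegex(wildcard);
        foreach (string key in byName.Keys)
        {
            if (pattern.IsMatch(key)) yield return key;
        }
    }

    // The "many" side: every full name stored under a matching simple name.
    public IEnumerable<string> ChildItems(string wildcard)
    {
        foreach (string key in ChildNames(wildcard))
        {
            foreach (string fullName in byName[key]) yield return fullName;
        }
    }

    // Translates the * and ? wildcards into an anchored, case-insensitive regex.
    private static Regex ToRegex(string wildcard)
    {
        string body = Regex.Escape(wildcard).Replace("\\*", ".*").Replace("\\?", ".");
        return new Regex("^" + body + "$", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

With three System.Web versions added, ChildNames("system.?eb") yields the single name while ChildItems("system.?eb") yields all three entries, each once. Enumerating names from the values instead of the keys is what produces repeated rows.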

The point of this provider -- as I see it -- is to enable the user to explore the GAC easily and flexibly, and to quickly load a type. Powershell's strengths lie in adapting quickly to its environment as an enabling tool to script whatever assemblies/products happen to be installed.

Let's try not to get hung up on getting it to look exactly like other providers (which are themselves limited by the 1980's hierarchical backing stores they represent). Powershell is breaking new ground here, so let's push it a bit further. It's also early days in Powershell's life, and there are many providers yet to be written. I guarantee they won't all be one-to-one models. I'm trying to keep it simple, and allow the best interaction with the powershell "glue." All of Powershell's intrinsic commands know how to unroll collections, and with this model, a plain gci gives you the view you see in Explorer; I don't think mimicking other providers' behaviour is a good enough reason in itself. The fact these other providers appear to look similar is a coincidence, nothing more. There is no cross provider support, so there is nothing holding us to make them work exactly the same.

The problem that was introduced accidentally while Jachym refactored was that GetChildNames returned the names of the Values (the many) instead of the Keys (the one). If you guys wouldn't mind, I would like to finish this provider myself, but I value the input greatly. Let's keep the suggestions in this discussion and out of TF until I'm happy to let it go. ;)

Cheers
Developer
Feb 20, 2007 at 5:17 AM

oisin wrote:
I think the perfect mechanism here is dynamic parameters to switch the source, e.g.

a driveless path (e.g. provider qualified):

ps> gci -root gacngen assemblycache::system.*

or mapping a drive:

ps> new-psdrive assemblycache -root <gac|ngen|shadow> <drivename>


I think parsing an AssemblyCache::GAC|NGen|Shadow\... path would be more appropriate than introducing a Root parameter to gci. However I like the idea of having -Version, -Architecture, -Culture, etc parameters on gci and gi for filtering. I guess there's nothing wrong with the 1-to-many relationship, especially if we provide means to get items not only by name.
Developer
Feb 20, 2007 at 5:27 AM
Oh, you're still around. You probably witnessed me edit that post half a dozen times. ;)

Yeah, if that seems more consistent with other provider qualified paths, I'll cede to that.

Where are you geographically Jachym? I'm in Montreal, QC, Canada (GMT -5). I thought you were Czech?
Coordinator
Feb 20, 2007 at 5:36 AM
I run a second clock on my Vista sidebar set to the timezone for Prague. It's 6:36 AM there. BTW I'm just thinking out loud and have no intention of touching your code. :-) I'm just trying to understand the mental model for this provider. BTW, could you integrate this with the Import-Assembly script in the Scripts dir?
Developer
Feb 20, 2007 at 2:59 PM
hah, I don't have any problem with people touching the code normally; we are a team after all. Jachym has great ideas, as do you, but I'm wary of people leaping before they look.

re: the import-assembly script, yeah, I had that in mind already. It just needs to wrap Resolve-Assembly -Import and it will work perfectly; they both work with AssemblyName objects.
Developer
Feb 20, 2007 at 4:57 PM
Edited Feb 20, 2007 at 5:02 PM
OK, I've had a change of heart. Having GetItem emit collections isn't as useful as I thought it should be. It felt like the right thing, but after some more playing, I am changing it (as Jachym suggested initially) to dump each item separately. The following now works (as without a doubt, it should):

ps> dir gac:\system.we? | ? {$_.Version -eq "2.0.0.0"}
 
Version        Name
-------        ----
2.0.0.0        System.Web
 
ps>

I probably got a bit too defensive after his refactoring totally broke everything. I always get there in the end though, so please don't think pride will get in the way of doing the "right thing." It won't. I have to follow my own preaching, so this is the model (as I see it) that gives us the most. Simple paths; works with foreach/select/where; one to many; and always emits assemblyname objects.



Developer
Feb 21, 2007 at 12:43 AM
BTW, are you going to implement the -Version, -Culture, -Architecture, (perhaps even -PublicKeyToken) parameters for gi/gci so one does not need to use the lengthy where-object syntax?
Coordinator
Feb 21, 2007 at 1:57 AM
I like that idea, especially for filtering by architecture-specific assemblies. Hmm, that makes me wonder if this should be a filter string? Of course, the problem with filter strings is that they provide absolutely no guidance on how they should be constructed.
Developer
Feb 21, 2007 at 2:00 AM
Edited Feb 21, 2007 at 7:03 PM
Yes, I will do that (-Version, -CultureInfo, -ProcessorArchitecture) -- as soon as I find a decent example how to implement it. I'm off now to have a look at your provider to see if you've done something similar...
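Leaving aside how the dynamic parameters get declared to PowerShell, the filtering step itself might be sketched as below. This is only an assumption about the shape of the logic: AssemblyNameFilterSketch and its parameters are hypothetical, a null argument means "don't filter on this", and architecture is left out to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical sketch of the filtering that -Version / -Culture style
// parameters would drive. It works on plain AssemblyName objects.
public static class AssemblyNameFilterSketch
{
    public static IEnumerable<AssemblyName> Filter(
        IEnumerable<AssemblyName> names, Version version, string culture)
    {
        foreach (AssemblyName name in names)
        {
            // Skip entries whose version differs from the requested one.
            if (version != null && !version.Equals(name.Version)) continue;

            if (culture != null)
            {
                // A culture-neutral assembly reports the invariant culture,
                // whose Name is the empty string.
                string actual = name.CultureInfo == null ? "" : name.CultureInfo.Name;
                if (!string.Equals(actual, culture, StringComparison.OrdinalIgnoreCase)) continue;
            }

            yield return name;
        }
    }
}
```

Called with new Version(2, 0, 0, 0) and a null culture, this keeps only the 2.0.0.0 entries, which is the same result as the where-object pipeline shown earlier.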
Developer
Feb 21, 2007 at 2:57 AM
ok, I've got a handle on the dynamic parameter stuff.

btw, here's a question that's stupidly late in the day:

is a GAC provider the right thing to do? might this be better encapsulated in a few cmdlets instead? I don't really have any clue what copy-item, move-item, rename-item, new-item (install assembly?), delete-item (uninstall assembly?) should be doing. I'm not sure now that it is the "powershell way" to have a provider so light on features. hmmm.

Developer
Feb 21, 2007 at 6:30 PM
After some more pondering, I guess the GAC provider is probably worth it despite the lack of functionality. I think the Tab Completion features clinch it for me anyway.
Coordinator
Feb 22, 2007 at 4:06 AM
There seems to be a minor issue with WriteProgress flashing on the screen when you execute:

> dir gac:
The first, lengthy progress display is fine but then when the provider starts dumping out all the individual assemblies a bunch of progress displays flicker which isn't terribly useful (and is kind of annoying). :-)
Developer
Feb 22, 2007 at 8:24 AM
I tried to fix that by adding the WriteProgressCompleted, but it didn't help. I think we can get away with it, the next version would probably enumerate the gac using a low-priority background job, so the user shouldn't see any progress at all.
Coordinator
Feb 22, 2007 at 8:42 AM
Blech, but at least it seems to be a one time deal. BTW it seems that you guys are caching the results (hence the initial progress bar). I assume that you are using a file system watcher or some such mechanism to look for independent adds/removes using gacutil?
Developer
Feb 22, 2007 at 3:45 PM
You can't get rid of that progress bar unfortunately because it is attached to the current ExecutionContext, which is get-childitem in that case. I already went down that path of trying to get rid of it before I dove into reflector to find out what's going on. Essentially what's happening is that the provider uses WriteProgress of the current Cmdlet using the provider; that's why it stays on screen until the command has finished executing.

I had looked at using the -Force parameter to update the GAC cache, but Jachym tells me that the dir.ps1 command has -Force turned on permanently. A file system watcher isn't really feasible right now IMO, at least not until we make the cache a full background thread.