
Multiple Lync Persistent Chat pool headaches

Why would you want more than one Persistent Chat pool? We can scale a single pool to 4 active servers each with 20,000 users, to support 80,000 concurrent PChat users – enough for a great many organizations. In many cases, a single central PChat pool makes a lot of sense. With good use of PChat categories, different groups of users can be separated if needed, and administration (such as creating new rooms) easily delegated.

However, this single-pool model doesn’t fit all. Sometimes different regions or countries host their own Lync infrastructure in a shared topology and each will need (or want) their own Persistent Chat pool.

Generally, this works fine but there’s a problem – the dreaded “Your chat room access may be limited due to an outage” message. You may have seen it before if you’ve broken something, but you will also see it if any PChat pool defined in the topology is down.

pchat-outage

Your local PChat pool may be up, users may be able to get to their chat rooms, but this can still cause a lot of helpdesk calls. It also looks very similar at a glance to the “Limited functionality” warning, which can impact users’ perception of how reliable Lync is.

There are, however, some practical things you can do to mitigate this:

  • Agree patching / outage windows for Persistent Chat pools, to avoid users’ typical working hours (in all regions!)
  • Care and planning are needed when adding a Persistent Chat pool. Ensure at least one server for the new pool is ready to be installed and configured as quickly as possible, and perform this out of hours.
  • Educate users about the message, and what it really means – it’s warning of an outage, but it may not be an outage on their pool!
  • Consider limiting the number of users enabled for Persistent Chat – if only 10% of users need it, they’re the only ones who should be enabled. Others won’t see the “outage” warning if they’re not enabled for PChat in the first place.
  • Consider limiting the number of PChat pools in the organization. While you can collocate PChat on a Standard Edition server, this only makes sense in small environments. Don’t let the failure of a single server for 100 users cause an outage warning for 100,000+ users!

Hopefully some better logic around this warning will come in future versions of the Lync (or Skype for Business) client. In the meantime, think carefully about whether you want multiple PChat pools and if you do, what you need to do to avoid the dreaded “outage” message!

Lync 2013 – Windows Fabric installation failure code 1603

An installation issue I saw recently at a customer. The prerequisites all installed fine, but during the Lync install I hit an issue with Windows Fabric. I was installing at the command line and saw:

Checking prerequisite WinFab…installing…
There is a problem with this Windows Installer package. A program run as part of the setup did not finish as expected. Contact your support personnel or package vendor.  failure code 1603

winfab-error1

Running via the Deployment Wizard shows the following:

Checking prerequisite WinFab…installing…failure code 1603
Prerequisite installation failed: WinFab

A Google search for those terms found this useful blog post, but its suggestions didn’t work for me. The firewall service was stopped, but starting it didn’t help. The system wasn’t set to the Italian region (which is on the Lync known issues list), and changing the time format settings didn’t help either.

The fix, as it turns out, was a simple one. The Performance Logs and Alerts service had been disabled in the customer’s standard Windows Server build. Setting this service to Manual or Automatic allowed Windows Fabric to install correctly and the Lync install to proceed.
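If you want to check for this before running the installer, the start type can be read from the output of `sc qc`. The following is a minimal sketch, not a tested deployment script: it assumes the short service name for Performance Logs and Alerts is `pla` (verify this on your own build), and the sample output is made up for illustration.

```python
import re
import subprocess
from typing import Optional

# Assumption: "pla" is the short name of the Performance Logs and Alerts
# service. Check this on your own server build before relying on it.
SERVICE_NAME = "pla"


def parse_start_type(sc_output: str) -> Optional[str]:
    """Return the START_TYPE label (e.g. 'DISABLED') from `sc qc` output."""
    match = re.search(r"START_TYPE\s*:\s*\d+\s+(\S+)", sc_output)
    return match.group(1) if match else None


def query_start_type(service: str = SERVICE_NAME) -> Optional[str]:
    """Run `sc qc <service>` (Windows only) and return its start type."""
    result = subprocess.run(
        ["sc", "qc", service], capture_output=True, text=True, check=False
    )
    return parse_start_type(result.stdout)


def needs_attention(start_type: Optional[str]) -> bool:
    """A disabled service is the condition that broke the install here."""
    return start_type == "DISABLED"


# Illustrative sample of `sc qc` output, not captured from a real server.
SAMPLE_OUTPUT = """
[SC] QueryServiceConfig SUCCESS

SERVICE_NAME: pla
        TYPE               : 20  WIN32_SHARE_PROCESS
        START_TYPE         : 4   DISABLED
"""

sample_start_type = parse_start_type(SAMPLE_OUTPUT)
```

On a real server you would call `query_start_type()` and, if `needs_attention()` returns true, set the service to Manual or Automatic before retrying the install.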

I did hit a few issues with Windows Fabric after this – if you see anything like the below (event 50006 on LS AppDomain Host Process), uninstall Windows Fabric via the Control Panel > Programs and Features, then reinstall Windows Fabric via the msi.

winfab-error2

 

Persistent Chat – Server could not process your request

A quick and simple one from this week’s deployment of Lync Server 2013.

We got Persistent Chat up and running, created a category, and gave our test user rights to create rooms. Signed in to the Lync client, we selected “Create a Chat Room…”.

pchat-create

This launches a browser to:

https://<pool-web-fqdn>/PersistentChat/RM?clientlang=en-US

..and we were hit with this error: Server could not process your request. Please try again later.

pchat-error

No errors in the Lync or Application event logs, and the only reference to the error I could find was this blog post at pro-lync.be. However, that problem relates to SBA-homed users, which definitely wasn’t the issue here.

The answer was in the IIS log for the Lync internal web site – by default, under C:\inetpub\Logs\W3SVC<siteID>\ (usually the lower siteID is the internal site). We saw the requests for /PersistentChat/RM?clientlang=en-US – but the username was that of the user who had logged in to Windows, not to Lync.

When we logged in to the PC as the test user, signed in to Lync and tried again, we got the chat room management page as expected and everything worked fine.
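The IIS log check described above can be sketched in a few lines. This is only an illustration: it assumes the log is in the W3C extended format with a `#Fields:` header that includes `cs-uri-stem` and `cs-username`, and the sample log lines and account names are invented.

```python
from typing import Iterable, Iterator, List, Tuple


def persistent_chat_requests(
    lines: Iterable[str], path_prefix: str = "/PersistentChat/RM"
) -> Iterator[Tuple[str, str, str]]:
    """Yield (date, time, username) for requests to the room management page."""
    fields: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns used by the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue
        row = dict(zip(fields, line.split()))
        if row.get("cs-uri-stem", "").startswith(path_prefix):
            yield row.get("date", ""), row.get("time", ""), row.get("cs-username", "-")


# Invented sample data; real logs will carry more fields than this.
SAMPLE_LOG = r"""#Software: Microsoft Internet Information Services
#Fields: date time cs-method cs-uri-stem cs-uri-query cs-username
2013-06-01 09:15:02 GET /PersistentChat/RM clientlang=en-US CONTOSO\winuser
2013-06-01 09:20:41 GET /PersistentChat/RM clientlang=en-US CONTOSO\testuser
2013-06-01 09:21:00 GET /some/other/page - -
"""

matches = list(persistent_chat_requests(SAMPLE_LOG.splitlines()))
for date, time, user in matches:
    print(date, time, user)
```

If the username printed is the Windows logon rather than the account signed in to Lync, you are probably looking at the same situation.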

Fictitious telephone numbers

Most countries publish a telephone numbering plan, and this comes in very useful when setting up dial plans and normalization rules for Lync. It wasn’t until I was delving into the UK numbering plan that I found that large blocks of numbers in the UK are specifically reserved for “drama purposes” – in other words to use in films and TV shows. They are not allocated to anyone, and should not be allocated in the foreseeable future. This is much like the 555 convention used in the USA.

The full list is here – a total of 20,000 numbers to choose from! They include most major UK cities/areas, as well as mobile, freephone, premium rate and non-geographic.

What does this have to do with Lync? There are a few situations where they can be useful:

  • Customer documentation, especially example numbers or in generic templates.
  • Testing dial plans and PSTN usages, particularly for premium rate where you might not want to connect to a real one and pay over £1 a minute!
  • Validating that numbers across different area codes normalize correctly.
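As a rough illustration of the last two points, here is a sketch that runs a handful of drama numbers through a pair of normalization patterns. Lync normalization rules are .NET regular expressions configured in the dial plan; the Python below only mimics the idea, and the two rules are hypothetical examples rather than a recommended UK dial plan. The test numbers fall in the reserved drama ranges as I understand them, so check them against the published list before use.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rules for illustration: (pattern, translation).
RULES: List[Tuple[str, str]] = [
    (r"^0([1-9]\d{8,9})$", r"+44\1"),  # UK national format to E.164
    (r"^00(\d+)$", r"+\1"),            # international prefix to E.164
]


def normalize(dialed: str) -> Optional[str]:
    """Strip punctuation, then apply the first rule that matches."""
    digits = re.sub(r"[\s\-()]", "", dialed)
    for pattern, translation in RULES:
        if re.match(pattern, digits):
            return re.sub(pattern, translation, digits)
    return None


# Numbers taken from the reserved drama ranges (verify against the list).
DRAMA_NUMBERS = [
    "020 7946 0123",  # London
    "0113 496 0456",  # Leeds
    "07700 900123",   # mobile
    "0909 879 0123",  # premium rate
]

for number in DRAMA_NUMBERS:
    print(number, "->", normalize(number))
```

Because the numbers are not allocated to anyone, a mistake in a rule or a PSTN usage won't result in a call to a real subscriber.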

Australia has a similar scheme – the ACMA have a list of numbers for drama.

I’ve not come across any for other countries though – if you know of any, please get in touch!

Error when installing Lync Archive or Monitoring databases

I hit an odd problem at a customer last week when trying to install the Lync Server 2013 Archiving and Monitoring databases on their existing SQL server. They had (sensibly) set up separate drives for SQL data and logs and wanted the Lync databases put on the correct drives. They usually put them on the root of the drive, no base folder.

I defined the database in Topology Builder, published the topology and got the Create Databases dialog. I manually specified the file locations:

sqlissue1

Yet the process fails with the little red cross – so what’s going on?

sqlissue2 sqlissue3

InstallDatabaseInternalFailure: An internal error has occurred while trying to create or update the database.

Error: Default path ‘H:’ obtained from Sql Server does not have a drive letter. Please check your SQL Server installations and try again.

The event viewer on the SQL Server shows events with ID 18456, saying that login failed – “Token-based server access validation failed with an infrastructure error”.

sqlissue4

Some things I tried or verified:

  • Account permissions (sysadmin on instance, domain admin)
  • UAC disabled
  • Firewall disabled
  • Plenty of free space on the drives
  • Creating a database manually from the SQL server
  • Lync server could get to the instance OK via Management Studio and create a database
  • Tried creating folders on the SQL server drives and installing DBs to there
  • Using Install-CSDatabase cmdlet via the shell instead of Topology Builder

The culprit: the default paths for the instance were set to the root of the volume. In this state, it doesn’t matter what path you specify to install the databases to, it will always fail.

Workaround: change the default paths for the instance (even if only temporarily).

In SQL Server Management Studio, right-click on the instance, and select Properties.

sqlissue7

Under Database Settings, Default Database Location, change both paths to something else (it has to be valid, but can be anything as long as it’s not at the root of the volume). Click OK to save changes.

sqlissue6

The problem seems to be that when you specify the root of a volume, SQL Server does not keep the trailing backslash (e.g. H:\ is stored as H:). The Install-CSDatabase cmdlet appears to retrieve the default path for the instance, and that path is then rejected as invalid, even if you’ve told it to install to a different path.
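To make the symptom concrete, here is a small sketch of the behaviour as I understand it. It is not how SQL Server or Install-CSDatabase is implemented (I haven’t seen that code); the function names are made up, and it only models a root path losing its trailing backslash and then looking like a bare drive.

```python
import ntpath


def stored_default_path(path: str) -> str:
    """Model the observed behaviour: the trailing backslash is dropped."""
    return path.rstrip("\\")


def is_bare_drive(path: str) -> bool:
    """True for a path that is only a drive specifier, such as 'H:'."""
    drive, rest = ntpath.splitdrive(path)
    return bool(drive) and rest == ""


for configured in ["H:\\", "H:\\SQLData", "L:\\SQLLogs"]:
    stored = stored_default_path(configured)
    print(configured, "->", stored, "bare drive" if is_bare_drive(stored) else "ok")
```

Any default path that still has a folder component after the backslash is stripped avoids the problem, which matches the workaround above.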

Trying again now, it works fine, even with the same settings as before (installing to the root of the drives):

sqlissue8

Probably not a common issue but one to consider if you have problems installing the Monitoring or Archiving databases for Lync Server.

The customer (and my lab for screenshots) are on SQL Server 2008 R2 RTM. I have also tested with SP2 and the issue is the same. I’ll try it on SQL Server 2012 at some point to see if that behaves the same way.

System Center Advisor for Lync

A colleague recently made me aware of System Center Advisor (SCA) – this is a cloud service offered by Microsoft that analyses local servers and workloads, recording configurations and offering useful advice for proactively keeping them working at their best. I wasn’t familiar with it, so decided to take a closer look.

The product was born from a desire to reduce the number of “common issue” support cases that Microsoft see – where they hear the symptoms and know very quickly that a particular patch, or config change, will solve the issue. By feeding information about a customer’s systems to Microsoft, they can then advise on any necessary actions to take, hopefully before it has any impact on a running service.

Lync 2010 support was added to the offering in November 2012, on top of the previously supported Windows Server, SQL Server, SharePoint and Exchange. Lync 2013 support is not yet present at time of writing, but is expected soon. A recent development is that Microsoft has made SCA available to all licensed customers, where previously it was a benefit offered only to Software Assurance customers.

There is a Microsoft Virtual Academy course on SCA available, with about an hour of material giving a good overview, although some details have changed since it was published. Some of the info here is taken from that course.

Sign-up and installation

A Microsoft account (aka Live ID, Passport, etc.) is needed to get started. One account can be used for multiple monitored organisations, useful for companies that support more than one customer.

Once logged in, it’s a simple process to download the installer and a certificate.

2013-05-08_173226

At least one “gateway” is needed, which will be a machine that can reach the internet. “Agents” run on each monitored server and report their results up to the gateway. The gateway then passes this on to the SCA service, which is hosted on Microsoft’s Azure platform. The installer covers both roles, and includes 32-bit and 64-bit versions.

Data is only sent once a day, so unless you hurry the process up with some manual intervention, you won’t see much for up to 24 hours.

Interface

A dashboard view gives a summary of alerts.

2013-05-08_172503

The main alerts view. All columns can be immediately sorted and filtered without refreshing, and the lower pane gives more detail about the issue, including a link to the relevant solution or KB article. Microsoft are trying to avoid the need to trawl the internet by taking you straight to the most relevant solution.

2013-05-08_172618

Once SCA detects that the issue has been resolved (for example, by applying a recommended patch), it will automatically close the issue. You can also manually ignore an issue to avoid seeing it.

Going into the alert rules shows us the various scenarios that will generate an alert – this will include items such as a certificate expiring soon, or a misconfigured network interface.

2013-05-08_172821

Configuration data is viewable within SCA. It is not intended to be an exhaustive list, but the most useful data needed to solve support issues. This is available both as a current snapshot of settings:

2013-05-08_173033

...and as a configuration history. This can be particularly useful, as one of the first questions in a support case is “what has changed recently?”. The history records the setting, its previous and new values, and when the change occurred.

2013-05-08_211104

Finally, we have screens showing lists of monitored servers (this is also where we set up new servers) and user accounts. Any additional users also need to have a Microsoft account to access the service.

2013-05-08_173151

2013-05-08_211822

Under the covers

The agent installed for SCA is the System Center Operations Manager 2007 R2 agent. If an existing agent is found on a monitored server, it will work alongside – effectively the server will go on being monitored by the existing SCOM agent but also send some data to SCA. This will also work side-by-side with the SCOM 2012 agent.

Data sent from the agent to gateway and on to Microsoft is all readable XML files, so it is easy to see what is (and is not) being sent outside the organisation. Uploads are archived locally (for 5 days by default) so that you can read the contents.

The agent needs about 75MB of system memory, and generates about 150KB of network traffic per server per day.

Microsoft go to great lengths to stress security and privacy of the data collected – they do not share any data with third parties, and will not use it for sales or licence validation purposes. They do not have any visibility of servers that do not have the agent deployed to them. The SCA account can be closed at any time and removal of data will happen within 90 days.

Communication between agents and the gateway is over port 80 by default. Uploads from the gateway to Microsoft are via HTTPS.

What does it NOT do?

SCA is not intended to be a real-time monitoring service. Data is only uploaded every 24 hours. A proper monitoring solution such as SCOM is recommended if you need to know when workloads are:

  • Unavailable
  • Underperforming
  • Causing business impact or downtime

It also does not cover every possible scenario, or recommend every patch that may be necessary – just what Microsoft CSS commonly see in support cases. It is still a great idea to implement change control, and to review published patches to see if they are applicable to your environment.

Conclusion

Now that SCA is effectively free to all licensed Microsoft customers, it is well worth evaluating.

I think the most benefit comes to smaller organisations, who may not have specialist teams looking after specific products, do not have any formal change control and don’t usually have time to rigorously check for new patches. Being informed of a possible problem and taken straight to the relevant page for the fix could save a lot of time. These organisations will often pay for individual support cases with Microsoft, so avoiding a common issue saves them money as well.

Where there is no in-house knowledge of products like Lync, SharePoint or Exchange, they may defer to an outside company or consultant for support, and this is a great way to give them an overview of the environment and keep them informed about any recommended work.

Larger organisations will typically be using a monitoring product such as SCOM, and have good processes in place for regularly patching servers and performing health checks. It may still be worth their while to use SCA, as it is easy for some items to be overlooked, such as expiring certificates, and as a reassurance that their systems are being managed properly.