All Systems Operational

Firefly Learning Platform Europe, Middle East and Africa




Operational

Firefly Learning Platform Asia




Operational

Firefly Learning Platform China




Operational

Firefly Learning Platform Australia




Operational

Firefly Learning Platform North America




Operational

Upcoming Maintenances: 0

Incidents Last 30 Days: 3

Maintenances Last 30 Days: 0

External Services

History (Last 7 days)

Incident Status

Degraded Performance


Components

Firefly Learning Platform Europe, Middle East and Africa


Locations

Firefly Learning Platform




March 25, 2020 10:59AM UTC
[Resolved] Performance degradation has been resolved

March 25, 2020 10:48AM UTC
[Monitoring] We have made some changes that have returned performance for the affected schools to normal levels. We will continue to monitor.

March 25, 2020 10:01AM UTC
[Investigating] Most schools are working very smoothly with further record levels of usage today. Unfortunately 27 schools are experiencing performance issues because of the load to their database cluster. We are working on reducing the load and adding capacity and those affected schools may see brief downtime.

Login component failure

Incident Status

Service Disruption


Components

Firefly Learning Platform Europe, Middle East and Africa


Locations

Firefly Learning Platform




March 24, 2020 9:07PM UTC
[Resolved] The platform is performing normally.

March 24, 2020 4:52PM UTC
[Monitoring] We believe we have identified and fixed the issue that was causing errors for around 5% of requests and affecting Office 365 login. We are now seeing errors on 0.3% of requests across Firefly. We continue to monitor and prepare for tomorrow.

March 24, 2020 3:46PM UTC
[Monitoring] Errors are now down to 1% of all traffic. However, we are aware that most of the failing requests relate to Office 365 single sign-on. Login does seem to succeed after repeated retries, and we are working on this now to remove the need to retry.

March 24, 2020 2:28PM UTC
[Monitoring] We are currently seeing a failure rate of about 5% (95% of pages are loading successfully). We are investigating now if there is a pattern to the pages that are not loading. Please advise users to try reloading the page while we investigate.

March 24, 2020 1:59PM UTC
[Monitoring] As schools have ramped up usage again we have seen increased error rates due to load. These appear to be intermittent - reloading the page can often resolve the issue. We are investigating further.

March 24, 2020 12:59PM UTC
[Monitoring] We have just reopened access to all school sites in Europe for us to monitor. The sites appear to be working well now. We will closely monitor before resolving the incident. We have successfully made the configuration changes so we are now using the new Redis infrastructure.

March 24, 2020 12:10PM UTC
[Identified] We are now expecting the system to be up and running again by 1:30pm. We are working through the configuration changes needed on our large fleet of web servers to access the new component.

March 24, 2020 11:30AM UTC
[Identified] We have been working hard on resolving the issue with our login component that was preventing sites from working. The problem relates to a component called Redis, which is typically extremely reliable and experienced no issues yesterday. We are bringing up a completely new instance of the Redis component and changing our large number of servers to use it instead. We will then need to test the new infrastructure, after which we will update again. We have to do this meticulously and carefully to set the infrastructure up for success when we re-open access to the system.

March 24, 2020 9:54AM UTC
[Identified] We’re still seeing phenomenal demand on our infrastructure as our schools move online. Despite scaling everything up to more than 16x its normal capacity yesterday, we’re now investigating a separate issue that’s preventing people from logging into our platform this morning. We’re really sorry about this. Our teams are working as hard as they can alongside our hosting partner Amazon to resolve it as quickly as possible. We know this is a critical time for you, and we’re doing everything in our power to get things up and running. We also know it’s important for you to be able to plan. We expect the issue to continue for the remainder of the morning, and will provide an update by 12.30pm (GMT) regarding this afternoon’s availability.

March 24, 2020 8:38AM UTC
[Investigating] We are still working with Amazon to try and resolve the issue with the underlying Redis system that supports our login system. We apologise for the disruption.

March 24, 2020 8:10AM UTC
[Investigating] We are working with our infrastructure partner Amazon to try and resolve the issue with the login component as quickly as possible.

March 24, 2020 7:37AM UTC
[Investigating] We are currently investigating a failure in the component that manages login to the platform (Redis). We hope to resolve this quickly.

Incident Status

Partial Service Disruption


Components

Firefly Learning Platform Europe, Middle East and Africa


Locations

Firefly Learning Platform




March 23, 2020 6:10PM UTC
[Resolved] We appreciate that today has been a challenging day for many students, parents, teachers and school leaders as we collectively face the “new normal” of distance learning with school closures globally. The Firefly team would like to thank you for your patience and support as we adapt together to enable distance learning at scale. We apologise that in the early part of today, due to unprecedented traffic levels (7x the previous peak), some Firefly school sites were loading slowly or intermittently. The majority of sites were working again by 11.30am (GMT) and have been up and running since. You can find out more about the actions we are taking ahead of tomorrow at https://helpcentre.fireflylearning.com/technical/cloud/communications/23rd-march-firefly-learning-continuity-preparedness-update-and-scheduled-maintenance

March 23, 2020 3:19PM UTC
[Monitoring] The infrastructure continues to remain stable for most schools. We are instituting a maintenance window tonight between 8pm and 6am (local time) in order to further prepare for tomorrow. We will also be providing further updates here as to our preparations for tomorrow and will leave this incident open.

March 23, 2020 1:11PM UTC
[Monitoring] Firefly continues to be stable for most schools now. There are a small number of schools seeing degraded performance and we are working on those individually. We will be instituting an earlier than usual maintenance window at the end of the school 'day' today to help prepare for tomorrow morning, and will communicate regarding this shortly.

March 23, 2020 12:13PM UTC
[Monitoring] The additional capacity we added in the last hour has allowed us to increase the performance and stability of the platform significantly. We are seeing consistently high levels of activity but much better performance than this morning. Our team will of course continue to monitor very closely.

March 23, 2020 11:33AM UTC
[Identified] Our infrastructure in Europe continues to struggle with the unprecedented load today, but we are working hard to mitigate this and are confident it can be resolved. The infrastructure team have already increased our available infrastructure by 4x, which has increased the number of students, teachers and parents that can access the site, but it is still running very slowly for many. The infrastructure team continue to increase capacity and are also actively investigating other options. We have doubled the size of the infrastructure team itself and continue to monitor the situation closely with both UK- and Australia-based staff. We are working very closely with our infrastructure partner Amazon. As soon as we have a further update we will share it here. We want to apologise again for the impact on students, teachers and parents at our schools and are working through the day and night. Please also note that our infrastructure is segregated, so this is not being affected by our offer of free access to other schools at this time.

March 23, 2020 10:35AM UTC
[Identified] We are still working hard to expand capacity fast enough to manage the load. At this time we do not have an ETA for normal operation but as soon as we do we will update you.

March 23, 2020 9:41AM UTC
[Identified] Due to the unprecedented level of load, many sites are loading slowly or intermittently. We have already added significant extra capacity to the system and are adding significantly more. This is being worked on as a top priority. We recognise that schools are teaching from home at the moment and apologise for the impact. We are doing everything we can to resolve this as soon as possible.